Using Analytics is Not a Privacy Violation

In 2016, Homebrew started collecting user behavior statistics using Google Analytics. The move was heavily criticized as soon as it was announced. A few of Homebrew’s users even called it a “betrayal.” How could an open-source project send data to Google, especially without making it opt-in? How could it commit such an egregious privacy violation?

But what if those users are wrong in that assumption?

Privacy is a sensitive topic, and for good reason. When I downloaded my location history data from Google, I was in disbelief when I realized that it was nearly half a gigabyte in size. Everywhere I had been in the last five or six years was in that file. It was mildly frightening to know how much a single company knew about me.

Note: Google has since altered its policy and no longer stores location or browsing history for more than 18 months.

However, app analytics doesn’t belong in the ‘scary tracking’ category. Not every bit of data being collected needs to be regarded as unethical.

Homebrew did everything it could to preserve people’s privacy. Its use of Google Analytics was more like a free and convenient key-value store.

No PII was sent with any request and IP addresses were anonymized, which means they could not possibly have traced an installation to a particular machine.
They were only interested in collecting aggregate statistics (total installations vs failures).
It was made clear which events would be tracked (none being particularly sensitive for the user).
They created a publicly available page to share the statistics.
They made it very easy for users to opt out if they wanted to.

In anonymizing IP addresses, Google says that “at no time is the full IP address written to disk as all anonymization happens in memory nearly instantaneously after the request has been received.” Unless Google is explicitly lying, there is no way anyone involved could say Person A installed Package B.
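To make the idea concrete, here is a minimal sketch of this kind of IP masking in Python. It is not Google's code. The function name is hypothetical, and the prefix widths (keep the first 24 bits of an IPv4 address and the first 48 bits of an IPv6 address, zero the rest) are an assumption based on how Google's documentation describes the feature.

```python
import ipaddress


def anonymize_ip(raw: str) -> str:
    """Zero the host-specific tail of an address before it is stored or logged."""
    ip = ipaddress.ip_address(raw)
    # Assumed widths: keep 24 bits for IPv4 (drop the last octet),
    # keep 48 bits for IPv6 (drop the last 80 bits).
    prefix = 24 if ip.version == 4 else 48
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("203.0.113.77"))                          # 203.0.113.0
print(anonymize_ip("2001:db8:85a3:8d3:1319:8a2e:370:7348"))  # 2001:db8:85a3::
```

After masking, every machine in the same block maps to the same value, so the stored address can say roughly where a request came from but not which machine sent it.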

People often fail to see valid reasons to collect data, because they see it as inherently evil. However, usage data is a critical piece in the product development cycle that enables managers to make better decisions.

If I collect data about the features people are using/not using, I can prioritize what to build.
If I have an onboarding wizard, and I am tracking what steps people are skipping, it can help me understand where the wizard is lacking and how I can improve it.
I can know if my app is being used or not and if I should put more effort into building or marketing.
Or, as in Homebrew’s case, tracking failures is an efficient way to understand where users are facing problems and how to fix them.

In all four cases, we are not interested in the individual behavior, but in how users (in aggregate) are interacting with our software. Where they are getting stuck. What problems they are facing.
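As a rough illustration of what "in aggregate" means, the sketch below keeps nothing but running totals per event. The class, event names, and package label are all hypothetical, and this is not how Homebrew or Google Analytics is implemented. The point is only that no user or machine identifier is ever accepted, so there is nothing to tie a count back to a person.

```python
from collections import Counter


class AggregateStats:
    """Stores per-event totals only; no user or machine identifier is kept."""

    def __init__(self):
        self._counts = Counter()

    def record(self, event, label):
        # The only thing stored is "this event happened once more for this label".
        self._counts[(event, label)] += 1

    def failure_rate(self, label):
        ok = self._counts[("install", label)]
        failed = self._counts[("install_failure", label)]
        total = ok + failed
        return failed / total if total else 0.0


stats = AggregateStats()
for _ in range(3):
    stats.record("install", "wget")
stats.record("install_failure", "wget")
print(stats.failure_rate("wget"))  # 0.25
```

A maintainer looking at this data can see that one install in four fails for a package, which is enough to prioritize a fix, but cannot see who any of the four were.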

In 2014, Uber described in a (now deleted) blog post how it is possible to identify one-night stands based on usage of the app.

It was while playing around with this idea of (blind!) rider segmentation that we came up with the Ride of Glory (RoG). A RoGer is anyone who took a ride between 10pm and 4am on a Friday or Saturday night, and then took a second ride from within 1/10th of a mile of the previous nights’ drop-off point 4-6 hours later (enough for a quick night’s sleep). (This time window may not be the best, but small changes don’t change the overall pattern.)

It’s no secret that data can be very revealing about the individual, depending on how closely the app’s usage is tied to their behavior and how many data points they give away while using it.

But if my app/website is small and does a very specific thing, it’s hard to imagine how one could probe into a user’s behavior by tracking their usage. If it’s a small e-commerce store or a SaaS app, can someone use the data to profile someone? Probably not. Even if you wanted to, the data would be too noisy to do that.

What about contributing to Google’s enormous data pile, though? Won’t they use it to track users wherever Google Analytics is installed? Recently, Google clarified its privacy policies regarding GA in a blog post:

It does not track people or profile people across the internet. Google Analytics cannot be used to track people across the web or apps. It does not create user profiles.

[…]

This kind of information also includes things like the type of device or browser used; how long, on average, visitors spend on their site or app; or roughly where in the world their visitors are coming from. These data points are never used to identify the visitor or anyone else in Google Analytics.

Of course, you have reason not to trust a big corporation. However, a public company that’s already on the antitrust radar doesn’t have a good reason to lie outright in public.

In my opinion, a privacy violation occurs when data is used, or to an extent has the potential to be used, to probe into an individual’s behavior. In many instances, though, the data itself is too weak to permit that. The chief interest of most apps and websites is in aggregate statistics that give insight into their product, not in profiling individual users.

It’s the difference between a supermarket knowing which products are selling faster vs trying to profile users based on their purchases. The latter is immoral; the former is not.