How will the ruling in favor of HiQ influence the scraping industry?

Posted at February 05, 2020 in Web scraping

The world wide web holds an endless amount of information. The use of all this data was recently brought up before the courts, and no, we are not referring to the whole social media fact-checking extravaganza. Nowadays, a growing number of companies offer data collection and analysis services. One major tool these companies use is data scraping. Usually, web scraping only involves collecting data from what is considered the open Internet. However, there is great opposition to this data collection practice. Many online social platforms claim their users’ data is their own, arguing that they are the only ones who have any right to access and use it.

A legal milestone in support of free access to public data was recently set in the HiQ vs. LinkedIn ruling. LinkedIn, the popular professional social network owned by Microsoft, sought to prevent the data analytics company HiQ from scraping and using information that LinkedIn users share on their public profiles.

After lengthy litigation, the United States Court of Appeals for the Ninth Circuit released its ruling in September 2019, a decision set to send shockwaves throughout the scraping industry.

Let’s start from the beginning

HiQ Labs is a data science company that provides businesses with valuable information based on data scraped from LinkedIn. Data scraping, aka web scraping, is one of the most efficient ways to scan and collect data from around the internet. Data analysis companies use both advanced software and scraping bots to gather information for a multitude of goals, like marketing content research, product pricing, and more.
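To make the idea of scraping concrete, here is a minimal sketch in Python. It is only an illustration, not how HiQ or any particular company works: the sample markup, the class names, and the scrape_profile helper are all made up for this example, and it parses an inline string rather than fetching a live page.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a publicly visible profile page.
SAMPLE_PAGE = """
<html><body>
  <h1 class="name">Jane Doe</h1>
  <p class="headline">Data Analyst at Example Corp</p>
  <p class="location">Springfield</p>
</body></html>
"""


class ProfileParser(HTMLParser):
    """Collects the text of elements whose class attribute is in FIELDS."""

    FIELDS = {"name", "headline", "location"}  # assumed class names

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.FIELDS:
            self._current = cls

    def handle_data(self, data):
        # Store non-empty text found inside a tracked element.
        if self._current and data.strip():
            self.record[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def scrape_profile(html):
    """Return a dict of the tracked fields found in the given HTML."""
    parser = ProfileParser()
    parser.feed(html)
    return parser.record


profile = scrape_profile(SAMPLE_PAGE)
print(profile)
# {'name': 'Jane Doe', 'headline': 'Data Analyst at Example Corp', 'location': 'Springfield'}
```

A real scraper would repeat this kind of extraction across many pages and feed the resulting records into its analysis pipeline.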

Like many other data analysis companies, HiQ depends on free access to publicly-available data in order to continue their operation. The data HiQ collects and analyzes consists only of what LinkedIn’s users have shared on their public profiles. This information was accessible to anyone who visited LinkedIn. From HiQ’s point of view, this meant that the data was fair game.

However, from LinkedIn’s perspective, that data belonged to them and was protected from external scraping under their Terms of Service (ToS). In the face of the growing number of data scraping enterprises, many of which used data from LinkedIn, the professional social media platform couldn’t stay silent. So, since LinkedIn’s ToS prohibit the use of bots or automatic data collection tools, the company decided to ban any IP addresses associated with data scraping.

Unfortunately, one of the banned businesses was HiQ. The data lab easily bypassed the IP ban, masking their IP addresses with proxy servers. In response, LinkedIn served HiQ with a cease-and-desist letter. But the story doesn’t end here.

Unwilling to give up, HiQ filed a lawsuit in district court and obtained a preliminary injunction, with the court finding that there is a foundation to HiQ’s claim that automated access to and use of public information is not a violation of the Computer Fraud and Abuse Act (CFAA).

The CFAA, dubbed the anti-hacking law, was passed back in 1986. This piece of federal legislation imposes both criminal and civil liability on anyone who accesses a computer connected to the Internet “without authorization” or “exceeds authorized access.” But since the legislation does not define exactly what “without authorization” means, a whole array of interpretations is possible.

In today’s online reality, this ambiguity has turned out to be a pain in the behind for courts all over the United States. HiQ vs. LinkedIn is just the latest in a string of lawsuits debating this issue in court.

Simply put, the US Appeals Court had to decide whether to rein in the CFAA, limiting the law to its original purpose of hack prevention, or to adopt a more expansive interpretation of the legislation and risk criminalizing normal and popular online practices.

Does scraping equal hacking?

Luckily, the Ninth Circuit concluded that collecting and analyzing public data doesn’t constitute “hacking,” allowing HiQ to continue with business as usual. Thankfully, the court recognized the potential harm of banning commercial access to publicly available information, such as the public LinkedIn profile information scraped by HiQ.

According to the Ninth Circuit ruling in HiQ vs. LinkedIn, information posted publicly to social networks is free to be scraped and aggregated, regardless of the sites’ ToS or any measures implemented to prevent data collection. The appeals court pointed out that “the default is free access,” with the bottom line being that launching automated scripts to scan and collect publicly shared data does not constitute the “computer hacking” the CFAA sought to control and abolish.

In its decision, the federal appeals court affirmed the preliminary injunction granted against LinkedIn, preventing the social media site from blocking HiQ’s IP addresses. This case will undoubtedly discourage social media platforms from claiming property rights over their users’ information. So, if it is publicly posted, it is free to mine.

Moreover, the court chose to limit the interpretation of the Computer Fraud and Abuse Act. Instead of allowing LinkedIn to use the law as a tool for preventing automated scraping of publicly visible data, the court opted to protect HiQ’s right to access the information, thus protecting all scraping-based industries.

The court also wasn’t impressed by LinkedIn’s argument of having “its members’ privacy interests in mind.” Since LinkedIn has developed its own data scraping and analysis tool for its own users’ data, the court dismissed its claim to be a protector of that information. It seems more likely that the professional community site was looking to protect its data aggregation venture and remain the sole beneficiary of possible information sales revenue.

So, how does the ruling matter?

Well, the most significant part of the Ninth Circuit ruling was the affirmation that the CFAA was put in place to combat hacking and should not be used for enforcing a website’s ToS, no matter how powerful and popular it may be.

Another major subject this ruling touches on is data privacy and ownership. As mentioned before, who actually owns our personal data, who has access to it, and who can use it are questions debated by the highest judicial and legislative bodies in the world. The US Appeals Court ruling somewhat affirms that we, the users, are the owners. Any platform we share that information with merely has a license to use it, but it does not own the data or have any claim to it over other platforms.

Finally, the ruling in favor of HiQ acknowledges that data scraping is an integral part of today’s global internet activity. It is part of an entire online ecosystem and shouldn’t be criminalized. True, the case will now go back to the district court for trial, but the precedent has already been set. So, at least for the foreseeable future, data scraping is safe.