Automated Bots Are Increasingly Scraping Data & Attempting Logins

April 21, 2020 TH Author

The share of bot traffic to online sites declines, but businesses are seeing an overall increase in automated scraping of data, login attempts, and other detrimental activity.

The volume of Internet traffic due to automated software — bots — has declined to its lowest point in at least six years, but the share of the traffic due to unwanted automated activity — “bad” bots — has increased to its highest level over the same period, according to cybersecurity firm Imperva in a report published on April 21.

In 2019, bad bots accounted for 24% of all Internet traffic seen by Imperva’s customers, 5.5 percentage points higher than the low point recorded in 2015, the company stated in its “Bad Bot Report 2020.” Bad bots are automated software programs that perform unwanted activities, such as scraping price data or availability information from websites, or conduct outright malicious activities, such as account-takeover attempts or credit card fraud.

Acceptable bot activity has fallen by nearly two-thirds to 13% of all traffic in 2019, down from 36% in 2014, the report states. The move to a data-driven economy has created an incentive for more bots while at the same time making their activities less acceptable, says Kunal Anand, chief technology officer for Imperva.

“The digital transformation and the movement of information to the Web is a major driver that makes running bots more lucrative,” he says. “There is also increased awareness, and companies are controlling what bots they allow through whitelisting or allow what are seen as good bots.”

Bots are a natural evolution of connecting computers and software to the Internet, but they are problematic for companies that have to expose their intellectual property online as part of their business. Airlines, for example, need to give flight information and pricing to customers, but at the same time, a competitor using bots can scrape that same data and gain valuable insight.

Businesses that see Internet efficiencies declining — such as poor conversion rates, content appearing on other sites, or increased failed logins — have likely been targeted by bots, according to Imperva’s report.

“The two biggest problems from bad bots are credential stuffing to attack account logins and scraping of data, [such as] pricing and/or content,” Anand says. “Almost every website suffers from both of these.”
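As a rough illustration of how credential stuffing can show up in login logs, the sketch below flags source addresses with many failed logins spread across many different accounts. This is a hypothetical heuristic, not a method described in Imperva’s report: the function name, the `(ip, username, success)` event shape, and both thresholds are assumptions made for the example.

```python
from collections import defaultdict


def suspected_credential_stuffing(events, min_failures=20, min_accounts=10):
    """Flag IPs with many failed logins across many distinct accounts.

    `events` is an iterable of (ip, username, success) tuples.
    The event shape and both thresholds are illustrative assumptions.
    """
    failures = defaultdict(int)   # failed attempts per source IP
    accounts = defaultdict(set)   # distinct usernames tried per source IP
    for ip, username, success in events:
        if not success:
            failures[ip] += 1
            accounts[ip].add(username)
    return sorted(
        ip for ip in failures
        if failures[ip] >= min_failures and len(accounts[ip]) >= min_accounts
    )


# One address tries 25 different accounts; another mistypes one password 3 times.
events = [("203.0.113.5", f"user{i}", False) for i in range(25)]
events += [("198.51.100.7", "alice", False)] * 3
print(suspected_credential_stuffing(events))
```

A per-address count like this would likely miss attacks spread across many IP addresses, so it is best read as a starting point rather than a complete defense.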

About a quarter of bots are considered simple, with traffic that comes from a single IP address and does not use a browser user-agent header to pretend that its traffic is legitimate. More-complex bots use browser automation software, such as Selenium, to masquerade as a legitimate visitor. Selenium is an open source project that is commonly used to automate browsers for testing websites. The most sophisticated bots move the mouse to mimic human behavior, Imperva states in the report.
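The “simple” category described above can be sketched as a log filter: a single address sending a large number of requests, none of which carries a browser-style user-agent header. The `Request` type, the `Mozilla/` prefix check, and the `min_requests` threshold are assumptions for this example and do not reflect how Imperva classifies traffic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    ip: str
    user_agent: str  # empty string when the header is missing


def looks_like_browser(user_agent):
    # Assumption: mainstream browsers send a User-Agent starting with "Mozilla/".
    return user_agent.strip().startswith("Mozilla/")


def flag_simple_bots(requests, min_requests=100):
    """Return IPs with high request volume and no browser-like User-Agent at all."""
    total = Counter()
    browser_like = Counter()
    for r in requests:
        total[r.ip] += 1
        if looks_like_browser(r.user_agent):
            browser_like[r.ip] += 1
    return sorted(
        ip for ip, n in total.items()
        if n >= min_requests and browser_like[ip] == 0
    )


log = [Request("203.0.113.5", "")] * 150
log += [Request("198.51.100.7", "Mozilla/5.0 (X11; Linux x86_64)")] * 150
log += [Request("192.0.2.1", "")] * 3
print(flag_simple_bots(log))
```

A filter like this only catches the least sophisticated traffic; bots that drive a real browser or spoof the header would pass straight through it.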

More than 55% of bots impersonated Google’s Chrome browser, the highest percentage yet, the company found.

Different industries see different levels of bad bot activity. The financial industry encountered very little “good” bot traffic, with a little less than 48% of traffic due to bad bots and a little more than 51% of traffic from humans. Similarly, the education and IT services sectors are seeing around 45% of their traffic accounted for by bad bots.

Online data firms and business service firms encountered the largest share of traffic from good bots, which accounted for 51% and 54% of their traffic, respectively.

Nearly 46% of unwanted bot traffic came from the United States in 2019, and in many cases, the bot activity is likely legal. In September 2019, a US appeals court upheld the ability of HiQ Labs, a provider of intelligence on employees, to scrape LinkedIn and other services to compile profiles of professionals.

“Our definition of good bot is typically a tool that the business is willing to allow to be on its site — search engines and SEO tools fall into this list,” Anand says. “Companies typically also whitelist other tools that they use themselves, like a vulnerability scanner that they control when it is being deployed. Bad bots are classified as those requests that don’t come from a recognized browser and are there for another reason that wasn’t authorized by the company.”
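Anand’s working definition can be expressed as a three-way classification: bots the business has explicitly allowed, recognized browsers, and everything else. The following is a deliberately naive sketch based only on the user-agent string; the function name, the token lists, and the labels are hypothetical and are not taken from the report.

```python
def classify(user_agent, allowed_bot_tokens,
             browser_tokens=("Chrome/", "Firefox/", "Safari/")):
    """Label a request as 'good bot', 'browser', or 'bad bot' by User-Agent.

    Both token lists are assumptions; a real allowlist would be
    maintained by the site operator.
    """
    ua = user_agent.lower()
    # Check the allowlist first: some crawlers also include browser tokens.
    if any(tok.lower() in ua for tok in allowed_bot_tokens):
        return "good bot"
    if any(tok.lower() in ua for tok in browser_tokens):
        return "browser"
    return "bad bot"


allowlist = ["Googlebot"]
print(classify("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", allowlist))
print(classify("Mozilla/5.0 (Windows NT 10.0) Chrome/80.0 Safari/537.36", allowlist))
print(classify("python-requests/2.23.0", allowlist))
```

Because the user-agent header is trivial to forge, and the report found that more than half of bad bots impersonate Chrome, a check like this would need to be combined with other signals before it could be relied on.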


Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT’s Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline …
