Perplexity is allegedly scraping websites it’s not supposed to, again

Web crawlers deployed by Perplexity to scrape websites are allegedly skirting restrictions, according to a new report from Cloudflare. Specifically, the report claims that the company’s bots appear to be “stealth crawling” sites by disguising their identity to get around robots.txt files and firewalls.

Robots.txt is a simple file that websites host to tell web crawlers whether they may scrape a website’s content. Perplexity’s official web crawling bots are “PerplexityBot” and “Perplexity-User.” In Cloudflare’s tests, Perplexity was still able to display the content of a new, unindexed website, even when those specific bots were blocked by robots.txt. The behavior also extended to websites with specific Web Application Firewall (WAF) rules that restricted web crawlers.
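To make the mechanism concrete, here is a minimal Python sketch of how a robots.txt check works, using the standard library's urllib.robotparser. The rules, the example.com URL and the helper name allowed are illustrative assumptions, not taken from Cloudflare's report. The point is that rules are matched against the user agent a crawler chooses to announce, so a request presenting a generic browser user agent does not match the blocks written for the named bots.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not from the report): block the two
# declared Perplexity crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # hypothetical URL
    chrome_like = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    )
    for agent in ("PerplexityBot", "Perplexity-User", chrome_like):
        print(agent[:20], allowed(agent, page))
```

Robots.txt is advisory: nothing in the file itself stops a client that ignores it or identifies itself differently, which is why sites also rely on firewall rules.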

A flowchart created by Cloudflare to illustrate the different ways Perplexity's web crawlers try to access the content of a website. (Image: Cloudflare)

Cloudflare believes that Perplexity is getting around those obstacles by using “a generic browser intended to impersonate Google Chrome on macOS” when robots.txt prohibits its normal bots. In Cloudflare’s tests, Perplexity’s undeclared crawler could also rotate through IP addresses not listed in Perplexity’s official IP range to get through firewalls. Cloudflare says that Perplexity appears to be doing the same thing with autonomous system numbers (ASNs), identifiers for groups of IP addresses operated by the same business, writing that it spotted the crawler switching ASNs “across tens of thousands of domains and millions of requests per day.”
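A rough Python sketch of why rotating IP addresses defeats a simple firewall rule follows. It assumes a site identifies a declared crawler by two signals: a user agent token and a source address inside the operator's published range. The range below is a reserved documentation block standing in for a real list, and the function and label names are hypothetical. This is not Cloudflare's detection method, only an illustration of the check being sidestepped.

```python
import ipaddress

# Hypothetical published crawler range (RFC 5737 documentation block),
# standing in for an operator's real IP list.
PUBLISHED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]

# User agent tokens of the declared bots named in the article.
DECLARED_TOKENS = ("perplexitybot", "perplexity-user")


def classify(user_agent: str, remote_ip: str) -> str:
    """Label a request using only its user agent and source address."""
    ip = ipaddress.ip_address(remote_ip)
    in_range = any(ip in net for net in PUBLISHED_RANGES)
    declared = any(tok in user_agent.lower() for tok in DECLARED_TOKENS)
    if declared and in_range:
        return "declared-crawler"
    if declared:
        return "claims-crawler-but-ip-unlisted"
    # A browser-like user agent from an unlisted address lands here and is
    # indistinguishable from ordinary visitor traffic under this rule.
    return "undeclared"


if __name__ == "__main__":
    print(classify("PerplexityBot/1.0", "192.0.2.10"))
    print(classify("Mozilla/5.0 (Macintosh) Chrome/124.0", "198.51.100.7"))
```

Under a rule like this, a request with a Chrome-style user agent from an address outside the published range matches neither signal, which is consistent with the report's claim that such traffic passed through firewalls that blocked the declared bots.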

Engadget has reached out to Perplexity for comment on Cloudflare’s report. We’ll update this article if we hear back.

Up-to-date information from websites is vital to companies training AI models, especially as services like Perplexity are used as replacements for search engines. Perplexity has also been caught in the past circumventing the rules to stay up-to-date. Multiple websites reported in 2024 that Perplexity was still accessing their content despite them forbidding it in robots.txt, something the company blamed on the third-party web crawlers it was using at the time. Perplexity later partnered with multiple publishers to share revenue earned from ads displayed alongside their content, seemingly as a make-good for its past behavior.

Stopping companies from scraping content from the web will likely remain a game of whack-a-mole. In the meantime, Cloudflare has removed Perplexity’s bots from its list of verified bots and implemented a way to identify and block Perplexity’s stealth crawler from accessing its customers’ content.
