Amazon is investigating buzzy AI startup Perplexity for allegedly violating its cloud division’s rules by improperly “scraping” content from other websites without permission, according to a report Friday.
Perplexity, which recently drew a $3 billion valuation, is allegedly ignoring a widely known web standard called the Robots Exclusion Protocol, commonly known as robots.txt, which news publishers and other sites use to tell automated bots which pages they aren’t allowed to scrape, tech outlet Wired reported.
While adhering to the standard isn’t required by law, most web companies opt to follow the protocol. Compliance can be mandatory for websites that rely on Amazon Web Services, such as Perplexity.
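For readers unfamiliar with how the protocol works in practice, the following is a minimal sketch of the check a well-behaved crawler might run before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt contents, the `example.com` URLs, the "OtherBot" name and the helper names `build_parser` and `may_fetch` are hypothetical and chosen only for illustration; they do not describe any publisher's actual rules or how Perplexity's crawler is built.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher (illustration only).
# The first group bars one named bot from the whole site; the second
# group bars every other bot from a subscribers-only section.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
for agent, url in [
    ("PerplexityBot", "https://example.com/story"),
    ("OtherBot", "https://example.com/story"),
    ("OtherBot", "https://example.com/subscribers/story"),
]:
    print(agent, url, may_fetch(parser, agent, url))
```

The check is voluntary: nothing in the file technically stops a crawler that skips it, which is why the dispute centers on whether the rules were honored rather than on whether they could be enforced.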
“AWS’s terms of service prohibit abusive and illegal activities and our customers are responsible for complying with those terms,” an Amazon Web Services spokesperson said in a statement. “We routinely receive reports of alleged abuse from a variety of sources and engage our customers to understand those reports.”
Scrutiny of Perplexity’s practices has intensified after Forbes accused the company earlier this month of “directly ripping off” articles written by its reporters and others by CNBC and Bloomberg, including some that were behind paywalls.
Wired approached Amazon after its own investigation determined that Perplexity allegedly used an “unpublished IP address” to scrape websites operated by its parent company Condé Nast, even though the publisher was attempting to block access.
The outlet said that representatives from other outlets, including Forbes, the New York Times and the Guardian, had detected the same IP address visiting their servers.
Perplexity spokesperson Sara Platnick pushed back on Wired’s report, calling it “inaccurate.”
“Our PerplexityBot, which runs on AWS, respects robots.txt, and we confirmed that Perplexity-controlled services are not crawling in any way that violates AWS Terms of Service,” Platnick said in a statement.
“AWS looked into WIRED’s media inquiry as part of a standard protocol for investigating reports of abuse of AWS resources,” Platnick added. “We had not heard anything from AWS prior to a WIRED reporter contacting them. To say that AWS is ‘investigating’ Perplexity outside of this specific WIRED inquiry is wrong. AWS is a helpful partner to Perplexity and we’re grateful for their ongoing collaboration.”
Platnick told Wired that the PerplexityBot would bypass the robots.txt protocol in the “very infrequent” circumstance that a user included a specific URL in their query.
Perplexity CEO Aravind Srinivas had previously slammed Wired’s findings, asserting that they “reflect a deep and fundamental misunderstanding of how Perplexity and the Web work.”
Forbes had taken issue with a feature called “Perplexity Pages,” a product that displays “curated” articles that pull details from articles written by third-party news outlets.
The original authors weren’t credited by name, even when the wording of Perplexity’s posts closely matched that of the source text.
Instead, Perplexity used what Forbes described as “small, easy-to-miss logos” linking back to the original sources.
In one egregious example, Perplexity’s chatbot churned out a version of an exclusive, paywalled Forbes report on ex-Google CEO Eric Schmidt’s military drone project.
“Our reporting on Eric Schmidt’s stealth drone project was posted this AM by @perplexity_ai,” Forbes Executive Editor John Paczkowski wrote on X at the time. “It rips off most of our reporting. It cites us, and a few that reblogged us, as sources in the most easily ignored way possible.”
Srinivas said the tool “has rough edges” but otherwise denied wrongdoing.