
OpenAI bot accused of launching DDoS attacks on small e-commerce websites and stealing data

Author: LoRA | Date: 11 Jan 2025

Recently, Triplegangers CEO Oleksandr Tomchuk received an alert that his company's e-commerce site was down. After investigating, he discovered that the culprit was a bot from OpenAI that was relentlessly trying to crawl the entire, enormous site. The site lists more than 65,000 products, each with a page containing at least three photos. OpenAI sent "tens of thousands" of server requests in an attempt to download all of it: hundreds of thousands of photos along with their detailed descriptions.

Tomchuk said OpenAI's crawler was taking the site down, which was essentially a DDoS attack. The company sells 3D object files and photos of everything from hands to hair, skin and full bodies to 3D artists, video game makers and anyone who needs to digitally recreate real human features.

Triplegangers' website is its business. The company has spent more than a decade building what it calls the web's largest database of "digital avatars," 3D image files scanned from real human models.

Tomchuk's team, which is based in Ukraine but also licensed in Tampa, Florida, in the U.S., has a terms-of-service page on its website that prohibits bots from taking its images without permission. But that alone made no difference. Websites must use a properly configured robots.txt file with tags that explicitly tell OpenAI's crawler, GPTBot, not to crawl the site.

[Image: OpenAI crawler log]

Robots.txt, also known as the Robots Exclusion Protocol, was created to tell search engine crawlers which parts of a website they should not crawl when indexing the web. OpenAI says on its information page that it respects such files when they are configured with its own set of no-crawl tags, but it also warns that it may take up to 24 hours for its crawlers to recognize an updated robots.txt file.
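As a rough illustration (based on OpenAI's published crawler guidance rather than on anything in Tomchuk's account), those no-crawl tags amount to a pair of robots.txt directives naming GPTBot as the user agent:

```
User-agent: GPTBot
Disallow: /
```

A narrower Disallow path blocks only part of a site in the same way.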

Tomchuk said that if a website doesn't use robots.txt correctly, OpenAI and other companies assume they can scrape its data as they please, even though this was never meant to be an opt-in system.
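For site owners who want to confirm that their robots.txt actually says what they intend, here is a minimal sketch using Python's standard-library parser; the domain and path are placeholders, not Triplegangers' real URLs:

```python
# Minimal sketch: check whether a site's robots.txt disallows OpenAI's GPTBot.
# The URL below is a placeholder; substitute your own domain.
from urllib.robotparser import RobotFileParser

ROBOTS_URL = "https://example.com/robots.txt"  # hypothetical site

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetch and parse the live robots.txt

# can_fetch() returns True if the named user agent may crawl the given URL
if parser.can_fetch("GPTBot", "https://example.com/products/"):
    print("GPTBot is still allowed to crawl /products/, check the directives")
else:
    print("GPTBot is disallowed from /products/")
```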

To make matters worse, not only was Triplegangers knocked offline by OpenAI's bot during U.S. business hours, but Tomchuk also expects a significantly higher AWS bill because of all the bot's CPU and download activity.

Robots.txt is not a foolproof solution either, since AI companies comply with it voluntarily. Last summer, another AI startup, Perplexity, famously came under scrutiny from a Wired investigation over evidence that it wasn't complying.

Tomchuk said he couldn't even find a way to contact OpenAI and ask about it. OpenAI did not respond to TechCrunch's request for comment, and it has so far failed to deliver its long-promised opt-out tools.

For Triplegangers, this is a particularly thorny issue. "We're in a business where rights issues are pretty serious because we're scanning real people," he said. Under laws such as Europe's GDPR, "they can't just take photos of anyone online and use them."


Ironically, it was the OpenAI bot's greed that made Triplegangers realize how exposed it was. Had the bot scraped more gently, Tomchuk said, he would never have known.

"It's scary because these companies appear to be exploiting a loophole to scrape data and they say 'if you update your robot.txt with our tags, you can opt out,'" Tomchuk said, but that leaves businesses It is the Lord's responsibility to know how to stop them.

He wants other small online businesses to know that the only way to find out whether an AI bot is taking a website's copyrighted assets is to proactively look for it. He is certainly not the only one being harassed by AI crawlers; other website owners recently told Business Insider how OpenAI's bots disrupted their sites and drove up their AWS bills.
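As one hedged example of what "proactively looking for it" can mean in practice, the sketch below tallies requests by user agent in a standard combined-format web server access log; the file name is a placeholder, and aggressive crawlers such as GPTBot would surface near the top of the output:

```python
# Minimal sketch: count requests per user agent in a combined-format access log
# (the common nginx/Apache layout, where the user agent is the last quoted field).
import re
from collections import Counter

LOG_PATH = "access.log"  # placeholder path to your web server log

ua_pattern = re.compile(r'"([^"]*)"\s*$')  # last double-quoted field on the line

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = ua_pattern.search(line)
        if match:
            counts[match.group(1)] += 1

# The ten busiest user agents; bot names like "GPTBot" show up here if present
for agent, hits in counts.most_common(10):
    print(f"{hits:8d}  {agent}")
```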

In 2024, the problem got worse. Research from digital advertising company DoubleVerify found that AI crawlers and scraping tools drove an 86% increase in "general invalid traffic" in 2024, that is, traffic that does not come from real users.
