
OpenAI bot accused of launching DDoS attacks on small e-commerce websites and stealing data

Author: LoRA | Date: 11 Jan 2025

Recently, Triplegangers CEO Oleksandr Tomchuk received an alert that his company's e-commerce site was down. After investigating, he discovered that the culprit was a bot from OpenAI that was relentlessly trying to crawl the entire, enormous site. The site lists more than 65,000 products, each with a page containing at least three photos. OpenAI sent "tens of thousands" of server requests in an attempt to download all of it: hundreds of thousands of photos along with their detailed descriptions.

Tomchuk said OpenAI's crawler was taking the site down, which was essentially a DDoS attack. The company sells 3D object files and photos of everything from hands to hair, skin and full bodies to 3D artists, video game makers and anyone who needs to digitally recreate real human features.

Triplegangers' website is its business. The company has spent more than a decade building what it calls the web's largest database of "digital avatars," 3D image files scanned from real human models.

Tomchuk's team, which is based in Ukraine but also licensed in Tampa, Florida, in the U.S., has a terms-of-service page on its website that prohibits bots from taking its images without permission. But that alone made no difference. Websites must use a properly configured robots.txt file with tags that explicitly tell OpenAI's crawler, GPTBot, not to crawl the site.

[Image: OpenAI crawler log]

Robots.txt, also known as the Robots Exclusion Protocol, was created to tell search engine crawlers which parts of a website they should not crawl when indexing the web. OpenAI says on its information page that it respects such files when they are configured with its own set of no-crawl tags, but it also warns that it may take up to 24 hours for its crawlers to recognize an updated robots.txt file.
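As a rough illustration (based on OpenAI's published crawler guidance rather than on anything in Tomchuk's account), those no-crawl tags amount to a pair of robots.txt directives naming GPTBot as the user agent:

```
User-agent: GPTBot
Disallow: /
```

A narrower Disallow path blocks only part of a site in the same way.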

Tomchuk said that if a website doesn't use robots.txt correctly, OpenAI and other companies assume they can scrape its data as they please, even though this was never meant to be an opt-in system.
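For site owners who want to confirm that their robots.txt actually says what they intend, here is a minimal sketch using Python's standard-library parser; the domain and path are placeholders, not Triplegangers' real URLs:

```python
# Minimal sketch: check whether a site's robots.txt disallows OpenAI's GPTBot.
# The URL below is a placeholder; substitute your own domain.
from urllib.robotparser import RobotFileParser

ROBOTS_URL = "https://example.com/robots.txt"  # hypothetical site

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetch and parse the live robots.txt

# can_fetch() returns True if the named user agent may crawl the given URL
if parser.can_fetch("GPTBot", "https://example.com/products/"):
    print("GPTBot is still allowed to crawl /products/, check the directives")
else:
    print("GPTBot is disallowed from /products/")
```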

To make matters worse, not only was Triplegangers knocked offline by OpenAI's bot during U.S. business hours, but Tomchuk also expects a significantly higher AWS bill because of all the bot's CPU and download activity.

Robots.txt is not a foolproof solution either, since AI companies comply with it voluntarily. Last summer, another AI startup, Perplexity, famously came under scrutiny from a Wired investigation over evidence that it wasn't complying.

Tomchuk said he couldn't even find a way to contact OpenAI and ask about it. OpenAI did not respond to TechCrunch's request for comment, and it has so far failed to deliver its long-promised opt-out tools.

For Triplegangers, this is a particularly thorny issue. "We're in a business where rights issues are pretty serious because we're scanning real people," he said. Under laws such as Europe's GDPR, "they can't just take photos of anyone online and use them."


Ironically, it was the OpenAI bot's greed that made Triplegangers realize how exposed it was. Had the bot scraped more gently, Tomchuk said, he would never have known.

"It's scary because these companies appear to be exploiting a loophole to scrape data and they say 'if you update your robot.txt with our tags, you can opt out,'" Tomchuk said, but that leaves businesses It is the Lord's responsibility to know how to stop them.

He wants other small online businesses to know that the only way to find out whether an AI bot is taking a website's copyrighted assets is to proactively look for it. He is certainly not the only one being harassed by AI crawlers; other website owners recently told Business Insider how OpenAI's bots disrupted their sites and drove up their AWS bills.
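As one hedged example of what "proactively looking for it" can mean in practice, the sketch below tallies requests by user agent in a standard combined-format web server access log; the file name is a placeholder, and aggressive crawlers such as GPTBot would surface near the top of the output:

```python
# Minimal sketch: count requests per user agent in a combined-format access log
# (the common nginx/Apache layout, where the user agent is the last quoted field).
import re
from collections import Counter

LOG_PATH = "access.log"  # placeholder path to your web server log

ua_pattern = re.compile(r'"([^"]*)"\s*$')  # last double-quoted field on the line

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = ua_pattern.search(line)
        if match:
            counts[match.group(1)] += 1

# The ten busiest user agents; bot names like "GPTBot" show up here if present
for agent, hits in counts.most_common(10):
    print(f"{hits:8d}  {agent}")
```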

In 2024, the problem got worse. Research from digital advertising company DoubleVerify found that AI crawlers and scraping tools drove an 86% increase in "general invalid traffic" in 2024, that is, traffic that does not come from real users.
