ChatGPT Maker Suspects China’s Dirt Cheap DeepSeek AI Models Were Built Using OpenAI Data — and the Irony Is Not Lost on the Internet

Author: Hunter

Mar 04, 2025

OpenAI suspects that China's DeepSeek AI models, which are significantly cheaper than their Western counterparts, were trained using OpenAI's data. The suspicion follows DeepSeek's rapid rise in popularity, which triggered a dramatic market downturn for major AI companies. Nvidia, whose GPUs are crucial for AI model development, suffered its largest-ever single-day stock loss, shedding nearly $600 billion in market value. Other tech giants, including Microsoft, Meta, and Alphabet, also saw significant declines.

DeepSeek's R1 model, built on the open-source DeepSeek-V3, reportedly cost far less to train than Western models, with its training cost estimated at $6 million. Some dispute that figure, but it has fueled concerns about the massive sums Western companies are investing in AI. The surge in DeepSeek's downloads further underscores its impact.

OpenAI and Microsoft are investigating whether DeepSeek violated OpenAI's terms of service by using its API for "distillation," a technique in which the outputs of a larger model are used to train a smaller one. OpenAI says that Chinese companies are actively attempting to replicate leading US AI models, and it emphasizes its commitment to protecting its intellectual property (IP) through countermeasures and collaboration with the US government.
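
For readers unfamiliar with the technique, the classic form of distillation can be sketched roughly as below. This is a toy Python illustration only, not a description of what DeepSeek or OpenAI actually do: the function names, the example scores, and the temperature value are all assumptions made for the example. It measures how far a small "student" model's predictions are from a larger "teacher" model's predictions; training the student to shrink that gap is the core idea. A public API typically returns generated text rather than a model's full output distribution, so distillation through an API would more plausibly mean fine-tuning on the larger model's responses, but the principle is similar.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Hypothetical helper for illustration: the loss is zero when the student
    reproduces the teacher exactly and grows as the two diverge.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]      # made-up scores from a large model
    good_student = [1.9, 1.1, 0.2]  # close to the teacher
    poor_student = [0.1, 1.0, 2.0]  # far from the teacher
    print(distillation_loss(teacher, good_student))
    print(distillation_loss(teacher, poor_student))
```

In a real training run, this loss would be minimized over many examples so that the smaller model gradually learns to imitate the larger one.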

David Sacks, President Trump's AI czar, supports the claim of data extraction, suggesting that OpenAI will likely implement measures to prevent future instances of distillation.

This situation highlights the irony of OpenAI's position, given its own history of using copyrighted material to train ChatGPT. OpenAI has previously argued that creating today's leading AI models without copyrighted material is impossible, a stance it set out in a submission to the UK's House of Lords. That stance is being challenged by lawsuits from the New York Times and 17 authors alleging copyright infringement. OpenAI maintains that its training practices constitute "fair use." The legal battles surrounding AI training data and copyright continue to unfold. In a case stemming from a 2018 application, the US Copyright Office ruled that AI-generated art cannot be copyrighted due to the lack of a "nexus between the human mind and creative expression," a finding a US federal judge upheld in August 2023.

DeepSeek is accused of using OpenAI's models to train a competing model through distillation. Image credit: Andrey Rudakov/Bloomberg via Getty Images.