DeepSeek’s AI shows 74% resemblance to ChatGPT, raising copyright concerns
Chinese AI startup DeepSeek has drawn fresh attention after a new study claimed that its AI-generated content bears a close resemblance to OpenAI’s ChatGPT, raising questions about the ethical and copyright implications of how the model was built. The study, conducted by AI detection firm Copyleaks, found a striking 74.2% similarity between the textual outputs of DeepSeek’s AI and ChatGPT, prompting investors and industry experts to scrutinize the methods DeepSeek used to develop its model.
DeepSeek emerged earlier this year with its cost-effective R1 V3-powered AI, which has been claimed to outperform OpenAI’s renowned models across various benchmarks, including mathematics, coding, and scientific reasoning—all at only a fraction of the cost. While DeepSeek’s representatives have proclaimed that the model was trained on a budget of approximately $6 million, allegations have surfaced proposing that the startup may have cut corners by utilizing copyrighted materials from Microsoft and OpenAI during its training process.
Multiple reports suggest that DeepSeek may in fact have invested an astounding $1.6 billion in hardware, including a fleet of 50,000 NVIDIA Hopper GPUs. Concerns escalated after OpenAI lodged a complaint alleging that copyrighted data had been improperly used to train DeepSeek’s AI. In this context, the term “distillation” has been cited: the process of using outputs from existing AI models (such as ChatGPT) to train new ones, substantially reducing the money and time required.
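For readers unfamiliar with the technique, the sketch below illustrates what distillation means in practice: a smaller “student” model is trained to match the output distribution of a larger “teacher” model rather than learning only from labeled data. This is a minimal, generic PyTorch example with made-up toy models and random inputs; it is not DeepSeek’s or OpenAI’s actual pipeline, and every model size and hyperparameter here is an illustrative assumption.

```python
# Minimal knowledge-distillation sketch (illustrative only, not any company's real pipeline).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: a larger "teacher" and a smaller "student" classifier.
teacher = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens both distributions before comparing them

for step in range(100):
    x = torch.randn(64, 128)            # placeholder input batch
    with torch.no_grad():
        teacher_logits = teacher(x)      # the teacher's outputs are the training signal
    student_logits = student(x)

    # KL divergence between the softened student and teacher distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key point is that the teacher’s outputs, rather than original human-labeled data, supply the training signal, which is why distilling from a commercial model’s responses raises the contractual and copyright questions described above.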
Copyleaks used algorithms that examine the writing styles of AI models and concluded that DeepSeek’s outputs overwhelmingly mirrored OpenAI’s. Shai Nisan, head of data science at Copyleaks, emphasized that the study’s results were unanimous, indicating a pronounced stylistic similarity with OpenAI’s models alone; no comparable similarity was observed with the other AI outputs sampled.
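Copyleaks has not published its exact methodology, but the general idea of stylistic fingerprinting can be sketched with off-the-shelf tools: represent each text by low-level features such as character n-grams and compare the resulting vectors. The toy example below, using scikit-learn and invented sample sentences, is only an assumption-laden illustration of that idea, not the firm’s actual classifiers.

```python
# Toy stylistic-similarity illustration (not Copyleaks' proprietary method).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented sample outputs from three hypothetical models.
samples = {
    "model_a": "The results suggest a measurable improvement across all benchmarks.",
    "model_b": "The results indicate a measurable improvement across every benchmark.",
    "model_c": "Honestly, the benchmarks look way better now, which is pretty cool.",
}

# Character n-grams capture low-level stylistic habits (punctuation, word endings).
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
matrix = vectorizer.fit_transform(samples.values())

scores = cosine_similarity(matrix)
names = list(samples)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: {scores[i, j]:.2f}")
```

In this contrived setup, the two formally phrased samples should score as far more similar to each other than either does to the casual one, which is the basic intuition behind attributing text to a model family by style.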
DeepSeek’s assertion that its writing was generated from established training data highlights a growing dilemma in AI ethics, as transparency around AI development and the use of training datasets comes under intensified scrutiny. If DeepSeek were found to have infringed copyright, the repercussions could be severe: extensive legal challenges, significant financial penalties, and damage to its reputation. Investors are already expressing anxiety over the potential fallout, given the high stakes in the AI industry.
The findings do not conclusively label DeepSeek’s model an outright copy of OpenAI’s technology. Nevertheless, the situation demands closer examination of DeepSeek’s architecture and development process to clarify the originality of its AI output. As Nisan pointed out, while the similarities do not definitively make DeepSeek derivative, they do raise pressing questions about its development practices.
The legal landscape around AI and copyright is still evolving. OpenAI itself is embroiled in its share of copyright lawsuits, including a notable case in which multiple publishers have contested the legality of the training methods used for its models. The complexity of AI-generated content and the boundaries of intellectual property create a murky scenario, one that may require a reevaluation of the legal frameworks governing AI technologies and the datasets that inform them.
In summary, as DeepSeek emerges as a significant player in the AI industry, the implications for copyright and ethical AI practice loom large. The moment presents an opportunity for AI developers to examine their methodologies and ensure lawful, ethical practices going forward, particularly when leveraging established models and training data. Investors, regulators, and the public will need to stay vigilant as the repercussions of these developments unfold in a rapidly advancing field.
