DeepSeek's surprisingly affordable AI model challenges industry giants. The Chinese startup claims to have trained its powerful DeepSeek V3 neural network for a mere $6 million using only 2,048 GPUs, a stark contrast to competitors' significantly higher costs. That figure, however, omits substantial expenses such as research, refinement, data processing, and infrastructure.
DeepSeek's approach relies on three key technologies: Multi-token Prediction (MTP), which predicts several tokens at once for better accuracy and efficiency; Mixture of Experts (MoE), which uses 256 expert neural networks to accelerate training and improve performance; and Multi-head Latent Attention (MLA), which helps the model focus on the most important elements of a sentence.
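To make the Mixture of Experts idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a generic illustration, not DeepSeek's actual router: the softmax gate, the toy experts, the random gate scores, and the `TOP_K = 8` setting are all assumptions chosen for the example. Only the count of 256 experts comes from the article. The point it shows is that each input activates only a small subset of the experts, so most of the model stays idle for any given token.

```python
import math
import random

def softmax(scores):
    """Convert raw gate scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k):
    """Weighted sum of the outputs of only the selected experts."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))

NUM_EXPERTS = 256  # expert count reported in the article
TOP_K = 8          # assumption: experts active per input, for illustration only

# Hypothetical toy experts: each one scales its input by a different factor.
experts = [lambda x, s=i: x * (s + 1) for i in range(NUM_EXPERTS)]

# Hypothetical gate scores; in a real model a learned layer produces these.
rng = random.Random(0)
gate_scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_top_k(gate_scores, TOP_K)
output = moe_forward(1.0, experts, gate_scores, TOP_K)
print(len(selected), "of", NUM_EXPERTS, "experts used")
```

Because only `TOP_K` experts run per input, the compute per token grows with the number of active experts rather than with the total number of experts, which is the usual argument for why MoE models can be cheaper to train and run than dense models of similar size.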
Image: ensigame.com
Contrary to DeepSeek's publicized figures, SemiAnalysis reports that the company operates a massive computational infrastructure of approximately 50,000 Nvidia Hopper GPUs, including H800, H100, and H20 units, spread across multiple data centers. The total server investment is estimated at $1.6 billion, with operational costs reaching $944 million.
DeepSeek, a subsidiary of the High-Flyer hedge fund, owns its data centers rather than relying on cloud providers as its competitors do, which gives it greater control and lets it implement innovations faster. Its self-funded status contributes to agility and rapid decision-making. The company attracts top talent, primarily from Chinese universities, with some researchers earning over $1.3 million annually.
DeepSeek's $6 million training cost claim is misleading: the company's overall investment exceeds $500 million. Its lean structure lets it deploy innovations efficiently, in contrast to larger, more bureaucratic companies. The company's success rests on substantial investment, technological advances, and a skilled team.
DeepSeek's story shows a well-funded independent AI company competing successfully with industry leaders. The narrative of revolutionary cost-effectiveness needs qualification, however, given the substantial overall investment. The contrast remains stark, though: DeepSeek's R1 model reportedly cost $5 million to train, compared with $100 million for ChatGPT4. Even with the full expenses accounted for, DeepSeek's efficiency still presents a compelling challenge to the established order.