How Chinese AI Startup DeepSeek Made a Model that Rivals OpenAI


Today, DeepSeek is one of the only leading AI companies in China that does not rely on funding from tech giants like Baidu, Alibaba or ByteDance.

A Young Group of Geniuses Eager to Prove Themselves

According to Liang, when he put together DeepSeek’s research team, he wasn’t looking for experienced engineers to build a consumer-facing product. Instead, he focused on PhD students from China’s top universities, including Peking University and Tsinghua University, who were eager to prove themselves. Many were published in leading journals and won awards at international academic conferences, but lacked industry experience, according to the Chinese tech publication QBitAI.

“Our core technical positions are mostly filled by people who graduated this year or in the past year or two,” Liang told 36Kr in 2023. The hiring strategy helped create a collaborative company culture where people were free to use extensive computing resources to pursue unorthodox research projects. It’s a completely different way of operating from established internet companies in China, where teams often compete for resources. (A recent example: ByteDance has accused a former intern—a prestigious academic award winner, no less—of sabotaging the work of his colleagues to amass more computing resources for his team.)

Liang said undergraduates may be better suited for high-investment, low-return research. “Most people, when they are young, can devote themselves completely to a mission without utilitarian considerations,” he explained. His pitch to prospective hires is that DeepSeek was created to “solve the most difficult questions in the world.”

The fact that these young researchers are almost entirely educated in China adds to their drive, experts say. “This younger generation also embodies a sense of patriotism, especially as they navigate US restrictions and choke points in critical hardware and software technologies,” explains Zhang. “Their determination to overcome these barriers reflects not only personal ambition but also a broader commitment to advancing China’s position as a global innovation leader.”

Innovation Born of Crisis

In October 2022, the US government began putting together export controls that severely restricted Chinese AI companies from accessing cutting-edge chips like Nvidia’s H100. The move presented a problem for DeepSeek. The company had started with a stockpile of 10,000 H100s, but it needed more to compete with companies like OpenAI and Meta. “The problem we face has never been financing, but the export control of advanced chips,” Liang told 36Kr in a second interview in 2024.

DeepSeek had to come up with more efficient methods to train its models. “They optimized their model architecture using a battery of engineering tricks – custom communication schemes between chips, reducing the size of fields to save memory, and an innovative use of the mixture model approach,” says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies. “Many of these approaches are not new ideas, but combining them successfully to produce a cutting-edge model is a remarkable achievement.”
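One of those tricks, shrinking the numeric fields used to store weights and activations, is easy to illustrate. The following is a minimal, hypothetical sketch in PyTorch (the library choice is an assumption; the article does not name DeepSeek’s tooling): casting a weight matrix from 32-bit to 16-bit floating point halves its memory footprint, which is the kind of saving Chang describes.

import torch

# Hypothetical illustration: one 4096 x 4096 weight matrix stored at two precisions.
w_fp32 = torch.randn(4096, 4096, dtype=torch.float32)
w_bf16 = w_fp32.to(torch.bfloat16)  # "reducing the size of fields to save memory"

def size_mib(t: torch.Tensor) -> float:
    # Memory occupied by the tensor's data, in MiB.
    return t.element_size() * t.nelement() / 2**20

print(f"fp32: {size_mib(w_fp32):.0f} MiB")  # 64 MiB
print(f"bf16: {size_mib(w_bf16):.0f} MiB")  # 32 MiB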

DeepSeek has also made significant progress on Multi-head Latent Attention (MLA) and Mixture-of-Experts, two technical designs that make DeepSeek models more cost-effective by requiring fewer computing resources to train. In fact, DeepSeek’s latest model is so efficient that it required a tenth of the computing power of Meta’s comparable Llama 3.1 model to train, according to the research institution Epoch AI.
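To give a rough sense of why Mixture-of-Experts saves compute: instead of pushing every token through one enormous feed-forward network, a small router sends each token to only a few “expert” sub-networks, so most of the model’s parameters sit idle for any given token. The sketch below is a simplified, hypothetical top-2-routing layer in PyTorch; the class name, dimensions, and routing details are illustrative, not DeepSeek’s actual architecture.

import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    # Simplified mixture-of-experts layer: each token is routed to its
    # top_k highest-scoring experts, so only a fraction of the total
    # parameters does work for any single token.
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = torch.softmax(weights, dim=-1)   # mixing weights per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask][:, k:k + 1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)                       # 10 token embeddings
print(TinyMoE()(tokens).shape)                     # torch.Size([10, 64])

Production systems add load-balancing losses, expert parallelism across chips, and capacity limits, but the core idea is the same: more total parameters without a matching increase in per-token compute.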

DeepSeek’s willingness to share these innovations with the public has earned it considerable goodwill within the global AI research community. For many Chinese AI companies, developing open-source models is the only way to play catch-up with their Western counterparts, as it attracts more users and contributors, who in turn help the models grow. “They have now demonstrated that cutting-edge models can be built using less, though still a lot, of money and that the current standards of model building leave a lot of room for optimization,” Chang says. “We’re sure to see a lot more testing in this direction going forward.”

The news could spell trouble for the current US export controls, which focus on creating computing resource bottlenecks. “Existing estimates of how much AI computing power China has, and what they can achieve with it, could be overturned,” Chang says.


