How the small Chinese AI start-up DeepSeek shocked Silicon Valley


A small Chinese artificial intelligence lab stunned the world this week by revealing the technical recipe for its flagship model, transforming its reclusive leader into a national hero who has resisted U.S. attempts to halt China’s high-tech ambitions.

DeepSeek, founded by hedge fund manager Liang Wenfeng, released its R1 model on Monday, explaining in a detailed paper how to build a large language model on a shoestring budget that can automatically learn and self-improve without human supervision.

US companies such as OpenAI and Google DeepMind pioneered the development of reasoning models, a relatively new area of AI research that attempts to make models match human cognitive abilities. In December, San Francisco-based OpenAI released the full version of its o1 model but kept its methods secret.

DeepSeek’s R1 release sparked a heated debate in Silicon Valley about whether better-resourced U.S. AI companies, including Meta and Anthropic, can maintain their technological edge.

Liang has now become a focus of national pride in his own country. This week he was the only AI leader selected to attend a highly publicized meeting of business leaders with the country’s second most powerful leader, Li Qiang. Entrepreneurs were told to “focus their efforts on disrupting key core technologies.”

In 2021, Liang began purchasing thousands of Nvidia GPUs for his AI side project while also running his quant trading fund High-Flyer. Industry insiders saw it as the eccentric actions of a billionaire looking for a new hobby.

“When we first met him, he was a very nerdy guy with a terrible haircut who was talking about building a 10,000-chip cluster to train his own models. We didn’t take him seriously,” said one of Liang’s business partners.

“He couldn’t articulate his vision other than to say, ‘I want to build this, and it will be a game-changer.’ We thought this was only possible through giants like ByteDance and Alibaba,” the person added.

Liang’s status as an outsider in the AI field was an unexpected source of strength. At High-Flyer, he built a fortune by using AI and algorithms to identify patterns that could impact stock prices. His team was adept at using Nvidia chips to make money trading stocks. In 2023, he launched DeepSeek and announced his intention to develop human-level AI.

“Liang has built an exceptional infrastructure team that really understands how the chips work,” said a founder of a rival LLM company. “He took his best people from the hedge fund to DeepSeek.”

After Washington banned Nvidia from exporting its most powerful chips to China, local AI companies were forced to find innovative ways to maximize the computing power of a limited number of onshore chips – a problem that Liang’s team already knew how to solve.

“DeepSeek’s engineers know how to unlock the potential of these GPUs, even if they are not state-of-the-art,” said an AI researcher close to the company.

Industry insiders say DeepSeek’s unique focus on research makes the company a dangerous competitor because it is willing to share its breakthroughs rather than protect them for commercial purposes. DeepSeek has not raised money from outside funds or taken any significant steps to monetize its models.

“DeepSeek is run like the early days of DeepMind,” said an AI investor in Beijing. “The focus is solely on research and technology.”

Liang, who is personally involved in DeepSeek’s research, uses the proceeds from his hedge fund trading to pay top salaries for the best AI talent. Along with TikTok owner ByteDance, DeepSeek is known for giving AI engineers in China the highest compensation available, with staff based in offices in Hangzhou and Beijing.

“DeepSeek’s offices feel like a university campus for serious researchers,” the business partner said. “The team believes in Liang’s vision: to show the world that the Chinese can be creative and build something from the ground up.”

DeepSeek and High-Flyer did not respond to a request for comment.

Liang has described DeepSeek as a uniquely “local” company, made up of graduate students from leading Chinese schools such as Peking, Tsinghua and Beihang universities, rather than experts from U.S. institutions.

In an interview with the domestic press last year, he said that his core team “did not have people returning from abroad. They are all local . . . We have to develop the top talent ourselves.” DeepSeek’s identity as a purely Chinese LLM company has earned it widespread praise domestically.

DeepSeek said it used just 2,048 Nvidia H800s and $5.6 million to train a model with 671 billion parameters, a fraction of what OpenAI and Google spent on training models of comparable size.

Ritwik Gupta, an AI policy researcher at the University of California, Berkeley, said DeepSeek’s recent model releases show that “there is no moat when it comes to AI capabilities.”

“The first person to train models has to spend a lot of resources to get there,” he said. “But the second mover gets there cheaper and faster.”

Gupta added that China has a much larger talent pool of systems engineers than the US who understand how to make the most of computing resources to train and run models more cost-effectively.

Industry insiders say that while DeepSeek has shown impressive results with limited resources, it remains an open question as to whether it can remain competitive as the industry evolves.

Revenue from High-Flyer, its big backer, lagged in 2024, which a person close to Liang attributed to the founder’s attention being focused primarily on DeepSeek.

DeepSeek’s US rivals are not standing still. They are building mega “clusters” of Nvidia’s next-generation Blackwell chips, creating the computing power that could once again open up a performance gap over Chinese rivals.

This week OpenAI said it was establishing a joint venture with Japan’s SoftBank, called Stargate, which plans to spend at least $100 billion on AI infrastructure in the US. Elon Musk’s xAI is massively expanding its Colossus supercomputer to more than 1 million GPUs to help train its Grok AI models.

“DeepSeek has one of the largest advanced computing clusters in China,” said Liang’s business partner. “At the moment they have enough capacity, but not for long.”

Additional reporting by Wenjie Ding in Beijing


