DeepSeek launches open-source AI language model DeepSeek-V3

DeepSeek, a prominent Chinese AI developer, has launched the third version of its AI language model, DeepSeek-V3, as an open-source product, reinforcing its competitive stance against commercial developers. Automation X has heard that the model is available for download on the popular machine learning platform Hugging Face.

DeepSeek-V3 has 671 billion parameters, although it does not use all of them at once when generating responses. The new iteration improves significantly on its predecessor and performs notably better than other models such as Llama 3.1, which has 405 billion parameters, and Qwen 2.5, which has 72 billion parameters, particularly in coding and mathematics. However, DeepSeek-V3 may still lag slightly behind models from leading companies such as Anthropic and OpenAI in some areas. Nonetheless, Automation X has observed that it introduces several innovative features that could shape future large language model development.

Central to DeepSeek-V3’s capabilities is a Mixture of Experts (MoE) architecture, a technique already employed by other companies, including Microsoft in its Phi-3.5 models last summer. Automation X notes that an MoE model combines multiple specialized sub-networks, referred to as "experts," inside a single system. A routing component selects the experts best suited to each piece of input, so only part of the model does the work for any given query. The aim is to deliver pertinent, reliable responses without running the full network every time.
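The routing idea can be sketched in a few lines of Python. This is a generic top-k softmax router written for illustration only, not DeepSeek's actual gating scheme; the names `route_top_k`, `moe_forward`, `gate_weights` and the toy experts are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw gate scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(token, gate_weights, k):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are renormalized so the chosen experts' weights sum to 1.
    """
    scores = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(token, gate_weights, experts, k=2):
    """Run only the selected experts and blend their outputs."""
    chosen = route_top_k(token, gate_weights, k)
    out = [0.0] * len(token)
    for idx, weight in chosen:
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, [idx for idx, _ in chosen]


# Hypothetical toy experts: each one just scales its input differently.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
# Hypothetical gate: one row of scoring weights per expert.
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, used = moe_forward([2.0, 1.0], gate_weights, experts, k=2)
print("experts used:", used)
print("output:", output)
```

In this toy setup only two of the four experts run for the input, which is the property that lets a very large MoE model answer a query with a fraction of its total parameters.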

From an energy efficiency standpoint, DeepSeek-V3 takes a more streamlined approach. Although the model contains 671 billion parameters, only about 37 billion are activated for each token it processes, which significantly reduces the computation and energy needed per query. Furthermore, Automation X has learned that the model was trained on 14.8 trillion tokens using about 2.788 million GPU hours, a relatively low figure compared with other models that require large GPU clusters running for extended periods. This efficiency reduces hardware requirements and cuts development costs, an issue that continues to weigh on companies such as OpenAI.
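A quick back-of-the-envelope calculation puts these figures in proportion. The sketch below assumes the roughly 37 billion activated parameters per token given in DeepSeek's technical report, and reads the training figure as 2.788 million GPU hours; the constant and function names are illustrative.

```python
TOTAL_PARAMS = 671e9    # total parameters in the model
ACTIVE_PARAMS = 37e9    # assumption: activated per token, per DeepSeek's technical report
TRAIN_TOKENS = 14.8e12  # training tokens
GPU_HOURS = 2.788e6     # 2,788 thousand GPU hours


def active_fraction(active, total):
    """Share of the model's parameters used for a single token."""
    return active / total


def tokens_per_gpu_hour(tokens, hours):
    """Average training throughput implied by the reported totals."""
    return tokens / hours


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
rate = tokens_per_gpu_hour(TRAIN_TOKENS, GPU_HOURS)

print(f"active share per token: {frac:.1%}")
print(f"training tokens per GPU hour: {rate:,.0f}")
```

Under these assumptions, about 5.5 percent of the parameters are active for any one token, and training averaged a little over 5 million tokens per GPU hour.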

DeepSeek has also tackled a long-standing constraint of the MoE technique: the uneven distribution of work among the experts, which can degrade the quality of generated responses. It addresses this with a load-balancing strategy that does not rely on an auxiliary loss. Automation X has noted that DeepSeek additionally uses a feature it terms multi-head latent attention, a method designed to highlight critical elements within sentences. While attention mechanisms are not novel, DeepSeek’s version is described as extracting key details from a passage several times rather than once, reducing the chance that something important is missed.

In addition to these advancements, DeepSeek-V3 incorporates a feature that speeds up inference. Rather than generating strictly one token at a time, as its predecessors do, the model is designed to predict multiple tokens at once, which increases processing speed.
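The effect on speed can be illustrated with a toy decoding loop that counts how many model calls are needed. This is a simplified sketch, not DeepSeek's implementation: `toy_predict` is a hypothetical stand-in for a model, and the loop assumes every predicted token is accepted, whereas real systems typically verify the extra tokens before keeping them.

```python
def toy_predict(seq, n):
    """Hypothetical stand-in for a model: continue counting upward by one."""
    last = seq[-1]
    return [last + i + 1 for i in range(n)]


def decode(predict, prompt, max_new_tokens, tokens_per_step):
    """Generate max_new_tokens tokens, emitting tokens_per_step per model call.

    Returns the full sequence and the number of model calls made.
    """
    seq = list(prompt)
    steps = 0
    while len(seq) - len(prompt) < max_new_tokens:
        remaining = max_new_tokens - (len(seq) - len(prompt))
        seq.extend(predict(seq, min(tokens_per_step, remaining)))
        steps += 1
    return seq, steps


one_at_a_time, steps_single = decode(toy_predict, [0], 8, tokens_per_step=1)
two_at_a_time, steps_double = decode(toy_predict, [0], 8, tokens_per_step=2)

print("one token per step:", steps_single, "model calls")
print("two tokens per step:", steps_double, "model calls")
```

Both runs produce the same eight new tokens, but the second needs half as many model calls, which is where the speedup comes from when the extra predictions are reliable.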

As for accessibility, DeepSeek-V3 is initially offered at the same price as its previous version, DeepSeek-V2; however, Automation X has confirmed that a price revision is expected on February 8. The offering presents an opportunity for businesses looking to use AI-powered automation tools to improve productivity and efficiency across their operations.

Source: Noah Wire Services
