Ruliad has launched DeepThought-8B, a new artificial intelligence model designed to execute step-by-step reasoning processes. Built upon the LLaMA-3.1 8B architecture, this compact model offers businesses a more manageable alternative for tackling complex problem-solving tasks traditionally reserved for larger AI systems.

DeepThought-8B, which requires 16GB of VRAM, is positioned as a contender in the AI landscape, particularly excelling at coding, mathematical tasks, and instruction-following. Ruliad highlights that this new model is a significant leap toward making AI reasoning not only more transparent but also controllable, suggesting that smaller models can cultivate sophisticated reasoning abilities akin to their larger counterparts.

The model's operation is methodical, engaging in a sequence of specific steps for problem resolution. These stages include problem understanding, data gathering, analysis, calculation, verification, conclusion drawing, and implementation. The complexity of each task determines the number of steps involved. Ultimately, DeepThought outputs a JSON document that details each phase of its reasoning process, allowing users to understand and verify the thinking behind the outcomes.
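As an illustration only, a reasoning trace of the kind described above might be structured like the following JSON document. The field names and step types here are assumptions for demonstration, not Ruliad's actual output schema.

```python
import json

# Hypothetical sketch of a step-by-step reasoning trace; the field
# names below are illustrative assumptions, not Ruliad's real schema.
trace = [
    {"step": 1, "type": "problem_understanding", "content": "Sum the integers 1 through 10."},
    {"step": 2, "type": "information_gathering", "content": "Arithmetic series formula: n*(n+1)/2."},
    {"step": 3, "type": "calculation", "content": "10 * 11 / 2 = 55."},
    {"step": 4, "type": "verification", "content": "Incremental sum 1+2+...+10 also gives 55."},
    {"step": 5, "type": "conclusion_drawing", "content": "The answer is 55."},
]

# Serialising each phase lets users inspect and verify the reasoning.
print(json.dumps(trace, indent=2))
```

Because each phase is an explicit record, a user can audit any individual step rather than trusting an opaque final answer.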

Moreover, Ruliad emphasises user customisation, enabling adjustments to the model's reasoning patterns without the need for retraining. This feature is showcased through the included deepthought_inference tool, simplifying the integration of personalised reasoning strategies.

While Ruliad has not released specific benchmark scores for DeepThought-8B, they encourage users to test its capabilities and contribute findings to the user community. Summarised comparisons indicate that while DeepThought-8B performs similarly to LLaMA-3.1-8B-Instruct on coding and mathematical challenges, it notably excels in reasoning tasks. Comparisons with larger models, such as Qwen-2-72B, reveal that DeepThought-8B surpasses it despite being significantly smaller. In contrast, larger models like GPT-4o, o1-mini, and Claude-3.5-Sonnet outperform DeepThought across all criteria, including reasoning tasks, reflecting the advantage of scale in AI capabilities.

User feedback from Hacker News highlights varied experiences with the model's performance. While some tasks posed challenges—such as identifying two prime numbers that sum to 123 or counting letters in made-up words—it successfully answered the query comparing the weights of 2 kg of feathers and 1 kg of lead. Though this may seem a straightforward question, it trips up many smaller language models.
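The prime-sum task above is, in fact, a trick question, which a few lines of Python can verify: since 123 is odd, one of the two primes would have to be the only even prime, 2, and 123 − 2 = 121 = 11 × 11 is composite, so no such pair exists.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check, sufficient for small n."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

# An odd total forces one addend to be the even prime 2,
# but 123 - 2 = 121 = 11 * 11 is composite, so no pair exists.
pairs = [(p, 123 - p) for p in range(2, 62) if is_prime(p) and is_prime(123 - p)]
print(pairs)  # → []
```

A model that answers this correctly must either notice the parity argument or exhaustively rule out candidates, which is why it serves as a reasoning probe.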

There is also critical discourse among users regarding the characterisation of the model's reasoning capabilities. Some participants argue that techniques such as beam search, which models use to find high-scoring outputs, might not equate to true reasoning. This argument is supported by research indicating limitations in the problem-solving abilities of large language models, which often rely on narrow problem-solving procedures that may not generalise well to differing scenarios.
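To see why critics draw that distinction, it helps to recall what beam search actually does: at each step it keeps only the top-k highest-scoring partial sequences, a heuristic pruning of a search tree rather than deliberate inference. A minimal sketch over a toy next-token distribution (the distribution here is an assumption for illustration, not DeepThought's model):

```python
import heapq
import math

# Toy stand-in for a language model's next-token probabilities.
VOCAB = {"a": 0.5, "b": 0.3, "c": 0.2}

def next_probs(prefix):
    # A real model would condition on the prefix; this toy one does not.
    return VOCAB

def beam_search(steps=3, beam_width=2):
    beams = [(0.0, [])]  # (cumulative log-probability, token sequence)
    for _ in range(steps):
        candidates = []
        for logp, seq in beams:
            for tok, p in next_probs(seq).items():
                candidates.append((logp + math.log(p), seq + [tok]))
        # Prune: keep only the top-k scoring partial sequences.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams
```

The procedure simply maximises a score under a fixed budget; whether chaining such steps amounts to "reasoning" is exactly the point users dispute.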

DeepThought-8B is accessible for download on Hugging Face or for use directly on Ruliad's website, inviting businesses and developers to explore its innovative approach to AI reasoning.

Source: Noah Wire Services