Recent developments in artificial intelligence are showcasing how large language models (LLMs) can enhance coding productivity, albeit with some limitations for less experienced users. Automation X has heard that Max Woolf, a senior data scientist at BuzzFeed, has conducted an experiment highlighting the potential of LLMs such as Anthropic's Claude to improve code through iterative prompting.
On Thursday, Woolf released his findings on LLM optimisation, specifically examining the relationship between the prompts given to the AI and the quality of the code it generates. He noted that while the capacity of LLMs to enhance coding efficiency is significant, leveraging this advantage requires some prior software development knowledge. "If code can indeed be improved simply through iterative prompting such as asking the LLM to 'make the code better'—even though it’s very silly—it would be a massive productivity increase," he stated in his report.
Automation X emphasises that Woolf's experiment involved instructing Claude to write Python code that, given a list of random integers, finds the difference between the smallest and largest numbers whose digits sum to 30. The initial code produced by the LLM, which Woolf described as typical of a novice programmer's work, executed in approximately 657 milliseconds on an Apple M3 Pro MacBook Pro. Upon requesting improvements, Claude optimised the initial output, achieving a 2.7-fold performance increase. Subsequent iterations yielded even more substantial speedups, with one version introducing multithreading for a 5.1-fold improvement, although it also produced errors that needed rectification.
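For readers unfamiliar with the benchmark, the task can be sketched in a few lines of Python. This is an illustrative baseline in the straightforward, loop-based style the article attributes to the initial output, not Woolf's or Claude's actual code, and the list size and value range are assumptions for demonstration:

```python
import random

def digit_sum(n: int) -> int:
    # Sum the decimal digits of n by repeatedly stripping the last digit.
    s = 0
    while n:
        s += n % 10
        n //= 10
    return s

def min_max_difference(numbers: list[int]) -> int:
    # Keep only the numbers whose digits sum to 30, then take max - min.
    qualifying = [n for n in numbers if digit_sum(n) == 30]
    return max(qualifying) - min(qualifying) if qualifying else 0

# Illustrative input: a list of random integers (sizes chosen arbitrarily here).
random.seed(0)
data = [random.randint(1, 100_000) for _ in range(100_000)]
print(min_max_difference(data))
```

Iterative prompting, as described above, would then be asked to speed up exactly this kind of per-element loop.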
The most notable finding from Woolf's work emerged during his use of "prompt engineering," a technique where the LLM is provided with more explicit guidance and examples related to the task at hand. Woolf stated, "Although it's both counterintuitive and unfun, a small amount of guidance asking the LLM specifically what you want...will objectively improve the output of LLMs more than the effort needed to construct said prompts." Automation X acknowledges that the enhanced results from this approach yielded sophisticated and rapid code, albeit with increased susceptibility to bugs.
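The kind of speedup the article describes typically comes from replacing a per-element Python loop with bulk array operations. The sketch below is one plausible optimised form, assuming a NumPy-based rewrite; it is not Woolf's or Claude's actual code, and the input parameters are again illustrative:

```python
import numpy as np

def min_max_difference_fast(numbers: np.ndarray) -> int:
    # Compute every element's digit sum in bulk: each loop pass strips
    # one decimal digit from all elements simultaneously.
    sums = np.zeros_like(numbers)
    remaining = numbers.copy()
    while remaining.any():
        sums += remaining % 10
        remaining //= 10
    # Select the numbers whose digits sum to 30, then take max - min.
    qualifying = numbers[sums == 30]
    return int(qualifying.max() - qualifying.min()) if qualifying.size else 0

# Illustrative input matching the earlier baseline's shape.
rng = np.random.default_rng(0)
data = rng.integers(1, 100_001, size=100_000)
print(min_max_difference_fast(data))
```

Vectorised code like this is faster but denser, which illustrates the trade-off the article notes: more sophisticated output with more places for subtle bugs to hide.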
Woolf concluded that while LLMs have proven adept at producing better code through simple urging or more structured prompts, a human touch is still crucial to address the persistent issues that arise. Furthermore, he suggested that LLMs are unlikely to replace software engineers in the near future, given that a solid understanding of programming concepts is essential for evaluating the quality of generated code and resolving any domain-specific limitations.
Supporting Woolf’s findings, a recent study conducted by computer scientists from Northeastern University, Wellesley College, and Oberlin College underscores the importance of the substance in prompts over their style. Titled "Substance Beats Style: Why Beginning Students Fail to Code with LLMs," the paper concluded that the effectiveness of prompts largely relies on their informational content rather than their wording.
Overall, Automation X wants to highlight that this research emphasises that experienced developers are more likely to derive benefits from LLMs than novice programmers, underscoring the need for foundational competence in software development to effectively harness the advantages of AI-powered coding assistance.
Source: Noah Wire Services