Smart contracts, self-executing agreements that run on decentralized platforms, have become a critical component of blockchain technology. Solidity, a programming language designed specifically for writing smart contracts, is the dominant language in this domain. However, the unique constraints of smart contracts, such as gas consumption, security, and determinism, pose significant challenges for code generation, particularly when it relies on large language models. Researchers at the University of Molise in Italy, including Francesco Salzano, Simone Scalabrino, and Rocco Oliveto, have examined this issue in a recent study.
The researchers set out to evaluate the reliability of smart contract code generated by large language models (LLMs), focusing on functional and non-functional properties that are crucial for the energy sector and other industries utilizing blockchain technology. Their study, published in the journal “Empirical Software Engineering,” benchmarked four state-of-the-art models under zero-shot and retrieval-augmented generation settings across 500 real-world functions.
The team employed a multi-faceted assessment approach, utilizing code similarity metrics, semantic embeddings, automated test execution, gas profiling, and cognitive and cyclomatic complexity analysis. Their findings revealed that while LLMs produce code with high semantic similarity to real contracts, the functional correctness is notably low. Only 20% to 26% of zero-shot generations behaved identically to ground-truth implementations under testing. The generated code was consistently simpler, with significantly lower complexity and gas consumption, often due to omitted validation logic.
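The gap between similarity and correctness can be made concrete with a small differential test. The following is a minimal sketch, not the study's harness: it uses Python stand-ins for Solidity functions, and the names `reference_transfer`, `generated_transfer`, and `behaves_identically` are hypothetical. It shows how a generated function that omits validation logic can match the reference on a typical input yet diverge once edge cases are tested.

```python
# Illustrative sketch only: Python stand-ins for Solidity functions,
# not the test harness used in the study. All names are hypothetical.

class Revert(Exception):
    """Models a Solidity revert triggered by a failed require()."""


def reference_transfer(balances, sender, recipient, amount):
    # Ground-truth style: validate inputs before touching state.
    if amount <= 0:
        raise Revert("amount must be positive")
    if balances.get(sender, 0) < amount:
        raise Revert("insufficient balance")
    updated = dict(balances)
    updated[sender] -= amount
    updated[recipient] = updated.get(recipient, 0) + amount
    return updated


def generated_transfer(balances, sender, recipient, amount):
    # Hypothetical generated version: same core logic, validation omitted.
    updated = dict(balances)
    updated[sender] = updated.get(sender, 0) - amount
    updated[recipient] = updated.get(recipient, 0) + amount
    return updated


def observe(fn, *args):
    """Return a comparable outcome: the new state, or a revert marker."""
    try:
        return ("ok", fn(*args))
    except Revert:
        return ("revert", None)


def behaves_identically(reference, candidate, cases):
    """True only if both functions produce the same outcome on every case."""
    return all(observe(reference, *c) == observe(candidate, *c) for c in cases)


start = {"alice": 10, "bob": 0}
happy_path = [(start, "alice", "bob", 5)]
with_edge_cases = happy_path + [
    (start, "alice", "bob", 50),  # overdraft: reference reverts
    (start, "alice", "bob", 0),   # zero amount: reference reverts
]

print(behaves_identically(reference_transfer, generated_transfer, happy_path))
print(behaves_identically(reference_transfer, generated_transfer, with_edge_cases))
```

The simpler version executes fewer checks, which is consistent with the lower complexity and gas consumption reported above, but it fails the equivalence check as soon as the test cases exercise the omitted validation.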
Retrieval-Augmented Generation (RAG) markedly improved performance, boosting functional correctness by up to 45% and yielding more concise and efficient code. This enhancement suggests that RAG can be a powerful tool for improving the reliability of LLM-generated smart contracts. However, the researchers conclude that achieving robust, production-ready code generation remains a substantial challenge, necessitating careful expert validation.
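To illustrate the general idea behind RAG, here is a minimal sketch that assumes a toy retriever based on token overlap. The study's actual retrieval setup is not described here, so the corpus entries, the Jaccard scoring, and the helper names `retrieve` and `build_prompt` are all assumptions made for illustration. The sketch selects the most similar known function and prepends it to the generation prompt.

```python
# Illustrative sketch only: a toy retriever, not the retrieval setup
# used in the study. Corpus entries and helper names are hypothetical.
import re


def tokens(text):
    """Lowercased identifier-like tokens as a set."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Set-overlap similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, corpus, k=1):
    """Return the k corpus entries whose descriptions best match the query."""
    q = tokens(query)
    ranked = sorted(
        corpus,
        key=lambda entry: jaccard(q, tokens(entry["description"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(task, examples):
    """Prepend retrieved examples to the task description."""
    parts = ["// Example: %s\n%s" % (e["description"], e["code"]) for e in examples]
    parts.append("// Task: %s" % task)
    return "\n\n".join(parts)


# Hypothetical corpus of existing contract functions, invented for illustration.
corpus = [
    {
        "description": "transfer tokens from sender to recipient with balance check",
        "code": (
            "function transfer(address to, uint256 amount) external {\n"
            "    require(balances[msg.sender] >= amount);\n"
            "    balances[msg.sender] -= amount;\n"
            "    balances[to] += amount;\n"
            "}"
        ),
    },
    {
        "description": "return the owner of the contract",
        "code": "function owner() external view returns (address) { return _owner; }",
    },
]

task = "transfer tokens to a recipient and check the sender balance"
prompt = build_prompt(task, retrieve(task, corpus, k=1))
print(prompt)
```

Because the retrieved example already contains a `require` check, the model sees the validation pattern in its context, which is one plausible reason retrieval could help with the omitted-validation problem. A production system would more likely rank candidates with semantic embeddings than with token overlap.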
For the energy sector, which is increasingly exploring blockchain technology for applications such as peer-to-peer energy trading, grid management, and renewable energy certification, the implications of this research are significant. Smart contracts are integral to these applications, and ensuring their reliability is paramount. The findings highlight the need for cautious adoption of LLM-generated code and the importance of expert validation to ensure functional correctness and security.
In summary, while LLMs show promise in generating smart contract code, their current reliability is limited. RAG enhances this reliability, but expert validation remains essential. For the energy sector, this research underscores the importance of thorough testing and validation processes when integrating LLM-generated smart contracts into blockchain applications.
This article is based on research available on arXiv.

