AI Copilot’s Impact on Productivity: Revolutionizing Ada Language Development
How can we boost Ada software developers’ productivity? We teamed up with AdaCore to create a proof-of-concept copilot solution for the Ada programming language. This is the story of how we approached the challenge.
Project Overview
The Copilot for the Ada programming language project aimed to research and develop a proof-of-concept code completion tool and evaluate its performance on Ada code generation. The idea is to boost Ada software developers’ productivity by providing intelligent code completions and suggestions, automating more tasks, and saving significant time on repetitive and boilerplate code.
Services provided
- AI Research
- AI Implementation
Key objectives
The solution aimed to prepare the groundwork for significantly boosting the effectiveness of Ada developers in the future by developing an intelligent code completion and suggestion solution.
Key outcomes
- A demo application for testing the LLM-based code completion solution on arbitrary Ada language samples,
- Fine-tuned checkpoints of selected coding LLMs (StarCoder and CodeGen) and a comparison against the baseline pre-trained models,
- Recommendations on further steps for fine-tuning and enhancing Ada-specific code completion models.
Accelerating AI Integration with a Proof of Concept
“deepsense.ai quickly delivered a Proof of Concept for a code completion tool, using a state-of-the-art technological stack, including the newest available LLMs and libraries. They also led an excellent LLM discovery workshop that jump-started AdaCore’s integration of LLM solutions into our business processes and products. Their technical knowledge and commitment to delivering tailored, top-notch services were evident throughout our collaboration. Partnering with deepsense.ai has helped us accelerate our understanding of AI, implement AI solutions, and gain a strategic edge in today’s competitive landscape.”
Client background
AdaCore specializes in software development tools and services, primarily focused on Ada language programming for high-integrity systems to meet rigorous requirements for reliability, safety, security, and maintainability. With headquarters in New York and Paris, AdaCore provides its expertise to leading global defense, healthcare, automotive, aerospace, and railway enterprises.
Challenges and solutions
We encountered several challenges while developing the copilot solution for the Ada programming language.
Challenges faced
- LLM Evaluation
Evaluating LLMs for program synthesis is difficult: without manually creating a set of programming challenges with unit tests, there is no automatically computable metric that correlates well with the model’s performance. To work around this, we used text-comparison metrics such as BLEU and chrF to compare the ground-truth code with the model’s generation (a scoring sketch appears after this list).
- Training Objective
Standard autoregressive training would not suffice because code completion needs information from both before and after the cursor. We therefore had to understand how LLMs are trained for fill-in-the-middle (FIM) tasks; the FIM data format is sketched after this list.
- Resources
Despite the availability of memory-efficient training methods, memory requirements remain challenging when fine-tuning large models, particularly those with extended context. We had to rely on small batch sizes, which slowed iteration: a single run fine-tuning on the largest dataset for more than one epoch could take several days.
- The fast pace of developments in coding LLMs
During the project, new versions of the CodeGen and StarCoder models were released. Because our training pipeline was designed with modularity in mind, we could incorporate them seamlessly.
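To make the chrF comparison concrete, below is a minimal sketch of how such a score can be computed with the sacrebleu library. The sample strings and default metric settings are illustrative assumptions, not the project’s actual evaluation harness.

```python
# Minimal chrF scoring sketch using sacrebleu; sample strings are illustrative.
from sacrebleu.metrics import CHRF

chrf = CHRF()  # defaults: character n-grams up to order 6, beta = 2

# One model completion per ground-truth code span
hypotheses = ['procedure Hello is begin Put_Line ("Hello"); end Hello;']
references = ['procedure Hello is begin Put_Line ("Hello!"); end Hello;']

# sacrebleu expects a list of reference streams, hence the extra nesting
score = chrf.corpus_score(hypotheses, [references])
print(score.score)  # similarity in [0, 100]; higher means closer to the reference
```

The FIM objective itself comes down to a data transformation: each file is split at two random points, and the pieces are rearranged with sentinel tokens so the model learns to generate the middle conditioned on both surrounding parts. The sketch below assumes StarCoder-family sentinel tokens; other models, including some CodeGen variants, use different conventions.

```python
import random

# StarCoder-family sentinels; an assumption, as token conventions vary by model
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def to_fim_sample(code: str, rng: random.Random) -> str:
    """Rearrange a source file into the prefix-suffix-middle (PSM) layout."""
    lo, hi = sorted(rng.sample(range(len(code)), 2))
    prefix, middle, suffix = code[:lo], code[lo:hi], code[hi:]
    # During training the model must reproduce `middle` after <fim_middle>,
    # so it conditions on the text both before and after the "cursor".
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

print(to_fim_sample("procedure Main is\nbegin\n   null;\nend Main;\n", random.Random(0)))
```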
Solution approach
Based on our experience in delivering AI solutions, we decided to take the following steps:
- Formulation of the problem in Data Science terms
- Decomposition of the challenge into smaller epics
- Setting development order and priorities
- Choice of metrics and methods for evaluating the performance of the code generation model
- Weekly meetings with AdaCore to present progress and gather feedback
Development process
In Data Science terms, the project’s goal was to fine-tune an LLM already pre-trained on code for Ada code synthesis. The overall strategy comprised three crucial components:
- Training dataset preparation
We took The Stack, an existing, cleaned corpus of GitHub repositories with permissive licenses. After keeping only files with valid Ada code and performing some additional preprocessing, we transformed the corpus into a form that allowed us to train the models on the FIM task.
- Evaluation
In the absence of better evaluation methods, we used the chrF metric, which measures the similarity between the predicted text and the ground truth, fully aware of its shortcomings. We evaluated the checkpoints on a held-out Ada corpus and on a dataset of short Ada programming challenges from the AdaCore website.
- Fine-tuning Runs
We ran and compared the performance of pre-trained and fine-tuned StarCoder and CodeGen models in different configurations, varying the number of model parameters, precision, context length, and memory-efficient fine-tuning methods such as LoRA and QLoRA; a configuration sketch follows this list.
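As a rough illustration of the memory-efficient setup described above, here is a hedged sketch combining 4-bit quantization of the frozen base weights with LoRA adapters (the QLoRA recipe), using the transformers, bitsandbytes, and peft libraries. The checkpoint name, target modules, and hyperparameters are illustrative assumptions, not the project’s actual configuration.

```python
# Hedged QLoRA sketch: 4-bit frozen base weights plus trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # quantize the frozen base weights to 4 bits
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoderbase-1b",        # a small StarCoder checkpoint (illustrative)
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],         # attention projections in StarCoder-style blocks
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()     # only the small adapter matrices are trained
```

Because only the adapter matrices receive gradients, the optimizer state stays small, which is what makes fine-tuning long-context models feasible on limited hardware.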
Data
We utilized three data sources throughout the project:
- The Stack, a collection of code repositories in 358 programming languages available under permissive licenses. We retained only file extensions associated with Ada sources (.ads, .adb, .ada), validated their correctness with the libadalang module, and filtered out files lacking Ada keywords in their contents (see the filtering sketch after this list). In the end, 30,528 files remained, representing 2.4% of the original dataset.
- Ada Course Labs, which consisted of short Ada exercises with descriptions. We utilized this dataset as a supplementary test set.
- Ada code from GitHub that was not in The Stack dataset, used both to enlarge the training corpus and to build a test set of Ada files unseen by the pre-trained models during their training phase.
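For illustration, a minimal filtering pipeline along the lines described above might look as follows. The directory layout and the keyword heuristic are assumptions, and libadalang is used only for a basic parse check.

```python
# Illustrative sketch of the Ada file filtering described above.
from pathlib import Path

import libadalang as lal

ADA_EXTENSIONS = {".ads", ".adb", ".ada"}
ADA_KEYWORDS = ("procedure", "function", "package")  # crude heuristic (assumption)

def is_valid_ada_file(path: Path, context: lal.AnalysisContext) -> bool:
    if path.suffix not in ADA_EXTENSIONS:
        return False
    text = path.read_text(errors="ignore").lower()
    if not any(keyword in text for keyword in ADA_KEYWORDS):
        return False  # no Ada keywords: likely mislabelled or trivial content
    unit = context.get_from_file(str(path))
    return not unit.diagnostics  # parse diagnostics indicate broken code

context = lal.AnalysisContext()
kept = [
    path
    for path in Path("the_stack_ada").rglob("*")  # placeholder directory
    if path.is_file() and is_valid_ada_file(path, context)
]
print(f"kept {len(kept)} files")
```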
Key contributors
Since it was a scoped project with clearly defined deliverables, deepsense.ai assembled a team of experienced consultants, managers, and developers that took ownership of the whole development process:
- Project Manager
- Technical Leader
- Data Scientists
- Principal ML Engineer (Technical Consultant)
- ML Engineers
In addition to the team from deepsense.ai, the client-side team was actively engaged throughout the project, ensuring collaboration and alignment with the project goals.
Outcomes and benefits
Fine-tuning improved the model’s performance on Ada code synthesis tasks compared to the pre-trained version. According to AdaCore’s feedback, the project is a significant step forward. In our evaluation, generations from the delivered fine-tuned model outperformed both the ground truth and GitHub Copilot, even though Copilot uses a much larger model with a longer context and a more complex prompting method.
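To give a flavor of how such a fine-tuned FIM checkpoint can be queried in a demo setting, here is a hedged sketch; the checkpoint path is a placeholder and the generation settings are illustrative.

```python
# Illustrative completion query against a fine-tuned FIM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/finetuned-checkpoint"  # placeholder, not a real model id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

before_cursor = "procedure Greet is\nbegin\n   "
after_cursor = "\nend Greet;\n"
prompt = f"<fim_prefix>{before_cursor}<fim_suffix>{after_cursor}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad warning
)
# Keep only the newly generated tokens: the model's proposal at the cursor
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```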
Lessons learned
- The field of coding LLMs is constantly evolving, with ever more capable models released at a fast pace. Even foundation models pre-trained on publicly available code repositories have a decent understanding of how to program in Ada. Their capabilities can be further improved by fine-tuning on an additional Ada-specific corpus of code repositories.
- For coding models, context length matters more than model size: even a 1B model with an 8k-token context can outperform a 15.5B model with only a 2k-token context.
- While useful as proxy measures of output quality, existing metrics and benchmarks for coding LLMs are unreliable for differentiating between models. Subjective evaluation by end users is still required to determine a model’s usefulness.
Summary
The copilot for the Ada programming language project aimed to enhance Ada developer productivity with an intelligent code completion tool. deepsense.ai overcame challenges in LLM evaluation, training, and resource constraints by decomposing the problem and choosing a suitable evaluation scheme. The project marked significant progress in AdaCore’s efforts to leverage ML to improve Ada developers’ productivity.