
AI Copilot’s Impact on Productivity in Revolutionizing Ada Language Development


Project Overview

The Copilot for Ada project aimed to research and develop a proof-of-concept code completion tool and to evaluate its performance on the Ada code generation task. The underlying idea is to boost Ada software developers’ productivity by providing intelligent code completions and suggestions, speeding up task automation, and saving significant amounts of time on repetitive and boilerplate code.

Services provided

  • AI Research
  • AI Implementation

Key objectives

The solution aimed to lay the groundwork for significantly boosting the effectiveness of Ada developers by building an intelligent code completion and suggestion solution.

Key outcomes

  • A demo application for testing the LLM-based code completion solution on arbitrary Ada language samples
  • Fine-tuned checkpoints of the selected coding LLMs, StarCoder and CodeGen, compared against the baseline pre-trained models
  • Recommendations on next steps for fine-tuning and enhancing Ada-specific code completion models

Accelerating AI Integration with a Proof of Concept

M. Anthony Aiello, Head of Product & Innovation at AdaCore

“deepsense.ai quickly delivered a Proof of Concept for a code completion tool, using a state-of-the-art technological stack, including the newest available LLMs and libraries. They also led an excellent LLM discovery workshop that jump-started AdaCore’s integration of LLM solutions into our business processes and products. Their technical knowledge and commitment to delivering tailored, top-notch services were evident throughout our collaboration. Partnering with deepsense.ai has helped us accelerate our understanding of AI, implement AI solutions, and gain a strategic edge in today’s competitive landscape.”

Client background

AdaCore specializes in software development tools and services, primarily focused on the Ada programming language for high-integrity systems that must meet rigorous requirements for reliability, safety, security, and maintainability. With headquarters in New York and Paris, AdaCore serves leading global defense, healthcare, automotive, aerospace, and railway enterprises.

Challenges and solutions

We encountered several challenges while developing the copilot solution for the Ada programming language.

Challenges faced

  • LLM evaluation: Evaluating LLMs is difficult because, without manually creating a set of programming challenges with unit tests, no automatically computable metric correlates well with a model’s performance on program synthesis. To work around this, we used text comparison metrics such as BLEU and chrF to compare the ground truth code with the model’s generation.
  • Training objective: Standard autoregressive training would not suffice, because completing code at a cursor requires information from both before and after it. We therefore needed to understand how LLMs are trained for fill-in-the-middle (FIM) tasks; a minimal sketch of the FIM format follows this list.
  • Resources: Despite the availability of memory-efficient training methods, memory requirements remain challenging when fine-tuning large models, particularly those with extended context. We had to rely on small batch sizes, which meant slow iteration: one training run could take several days to fine-tune on the largest dataset for more than one epoch.
  • The fast pace of developments in coding LLMs: During the project, new versions of the CodeGen and StarCoder models were released. Because our training pipeline was designed with modularity in mind, we could incorporate these models seamlessly.
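To make the training objective concrete, below is a minimal sketch of how a source file can be rearranged into the prefix-suffix-middle (PSM) format used for FIM training. The sentinel tokens follow the StarCoder convention (<fim_prefix>, <fim_suffix>, <fim_middle>); other model families use different sentinels, and the character-level split here is a simplification of what is normally done at the token level.

```python
import random

# StarCoder-style FIM sentinel tokens; other models use different names.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def to_fim_example(source: str, rng: random.Random) -> str:
    """Rearrange a source file into a prefix-suffix-middle training string.

    The model sees the code before and after the "cursor" and learns to
    generate the missing middle span after the <fim_middle> token.
    """
    # Pick two random cut points; the span between them becomes the middle.
    i, j = sorted(rng.sample(range(len(source) + 1), 2))
    prefix, middle, suffix = source[:i], source[i:j], source[j:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

rng = random.Random(0)
ada_snippet = (
    "procedure Hello is\n"
    "begin\n"
    '   Ada.Text_IO.Put_Line ("Hello, world!");\n'
    "end Hello;\n"
)
print(to_fim_example(ada_snippet, rng))
```

In practice the transformation is applied to only a fraction of training examples (the FIM rate), so that the model also retains its plain left-to-right completion ability.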

Solution approach

Based on our experience in delivering AI solutions, we decided to take the following steps:

  • Formulation of the problem in Data Science terms
  • Challenge decomposition into smaller epics
  • Setting development order and priorities
  • Choice of metrics and methods for evaluating the performance of the code generation model
  • Weekly meetings with AdaCore to present progress and gather feedback

Development process

The project’s goal, in data science terms, was to fine-tune an LLM already pre-trained on code for Ada code synthesis. The overall strategy comprised three crucial components:

  1. Training dataset preparation: We took The Stack, an existing, cleaned corpus of GitHub repositories with permissive licenses. After keeping only files with valid Ada code and applying some additional preprocessing, we transformed it into a form that allowed us to train the models on the FIM task.
  2. Evaluation: In the absence of better evaluation methods, we used the chrF metric, which measures the similarity of the predicted text to the ground truth, fully aware of its shortcomings (see the evaluation sketch after this list). We evaluated the checkpoints on a held-out Ada corpus and on a dataset of short Ada programming challenges found on the AdaCore website.
  3. Fine-tuning runs: We ran and compared pre-trained and fine-tuned StarCoder and CodeGen models in different configurations, varying the number of model parameters, numeric precision, context length, and memory-efficient fine-tuning methods such as LoRA and QLoRA (a fine-tuning sketch also follows this list).
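As a rough illustration of the evaluation step, the sketch below scores hypothetical model completions against ground-truth Ada code with chrF (and BLEU for comparison) using the sacrebleu library; the snippets are invented for the example.

```python
from sacrebleu.metrics import BLEU, CHRF

# Hypothetical model completions and the corresponding ground-truth code.
hypotheses = [
    "procedure Swap (X, Y : in out Integer) is\n   Tmp : Integer := X;",
    "function Max (A, B : Integer) return Integer is",
]
references = [
    "procedure Swap (X, Y : in out Integer) is\n   Temp : Integer := X;",
    "function Max (A, B : Integer) return Integer is",
]

chrf = CHRF()  # character n-gram F-score; fairly robust for code-like text
bleu = BLEU()  # word n-gram precision with a brevity penalty

# sacrebleu expects a list of reference streams, hence the extra nesting.
print(chrf.corpus_score(hypotheses, [references]))
print(bleu.corpus_score(hypotheses, [references]))
```

Because n-gram overlap only approximates functional correctness, such scores serve as a proxy and still need to be complemented by manual inspection of the generations.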
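For the memory-efficient runs, wrapping a pre-trained coding LLM with LoRA adapters via the Hugging Face peft library looks roughly as follows; the checkpoint name, rank, and target modules below are illustrative assumptions, not the exact values used in the project.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_NAME = "bigcode/starcoderbase-1b"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,  # reduced precision lowers memory use
)

lora_config = LoraConfig(
    r=16,                       # adapter rank: capacity vs. memory trade-off
    lora_alpha=32,              # scaling factor for the adapter updates
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projections in StarCoder-style blocks
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```

QLoRA goes one step further by quantizing the frozen base weights to 4 bits, trading extra compute for a further memory reduction.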

Data

We utilized three data sources throughout the project:

  • The Stack, a collection of code repositories in 358 programming languages, available under permissive licenses. We retained only files with extensions associated with Ada sources (.ads, .adb, .ada), validated their correctness with the libadalang module, and filtered out files lacking Ada keywords in their contents. In the end, 30,528 files remained, representing 2.4% of the original dataset (a filtering sketch follows this list).
  • Ada Course Labs, which consisted of short Ada exercises with descriptions. We utilized this dataset as a supplementary test set.
  • Ada code from GitHub that was not in The Stack, used both to extend the training corpus and to build a test set of Ada files unseen by the pre-trained models during their training phase.
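A sketch along the following lines can reproduce the first two filtering steps with the Hugging Face datasets library. The subset path and column names are assumptions based on the public dataset card, the keyword heuristic is a simplification, and the libadalang validity check is indicated only in a comment.

```python
from datasets import load_dataset

ADA_EXTENSIONS = (".ads", ".adb", ".ada")
# Simplified keyword heuristic; the real filter was more thorough.
ADA_KEYWORDS = ("procedure", "package", "function", "begin", "end")

# The Stack is organized into per-language subsets; "data/ada" is assumed
# to be the Ada one.
ds = load_dataset("bigcode/the-stack", data_dir="data/ada", split="train")

def looks_like_ada(example) -> bool:
    """Keep files with Ada extensions whose contents mention Ada keywords."""
    path = example["max_stars_repo_path"].lower()
    content = example["content"].lower()
    return path.endswith(ADA_EXTENSIONS) and any(k in content for k in ADA_KEYWORDS)

ds = ds.filter(looks_like_ada)
print(f"{len(ds)} candidate Ada files kept")

# Each remaining file can then be parsed with libadalang's Python API and
# kept only if parsing produces no diagnostics.
```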

Key contributors

Since this was a scoped project with clearly defined deliverables, deepsense.ai’s team of experienced consultants, managers, and developers took ownership of the whole development process:

  • Project Manager
  • Technical Leader
  • Data Scientists
  • Principal ML Engineer (Technical Consultant)
  • ML Engineers

In addition to the team from deepsense.ai, the client-side team was actively engaged throughout the project, ensuring collaboration and alignment with the project goals.

Outcomes and benefits

Fine-tuning improved the model’s performance on Ada code synthesis tasks compared to the pre-trained version. Judging by AdaCore’s feedback, the project is a significant step forward: the generations from the delivered fine-tuned model outperformed both the ground truth and GitHub Copilot, even though Copilot uses a much larger model with a longer context and a more complex prompting method.

Lessons learned

  • The field of coding LLMs is constantly evolving, with increasingly capable models released at a fast pace. Even foundation models pre-trained on publicly available code repositories have a decent grasp of how to program in Ada, and their capabilities can be further improved by fine-tuning on an additional Ada-specific corpus of code repositories.
  • For coding models, context length matters more than model size: even a 1B-parameter model with an 8k-token context can outperform a 15.5B-parameter model with only a 2k-token context.
  • While useful as proxy measures of output quality, existing metrics and benchmarks for coding LLMs are unreliable for differentiating models. Subjective evaluation by end users is still required to determine a model’s usefulness.

Summary

The Copilot for Ada project aimed to enhance Ada developer productivity with an intelligent code completion tool. deepsense.ai overcame challenges in LLM evaluation, training, and resource constraints through problem decomposition and a suitable evaluation scheme. The project marks significant progress for AdaCore in leveraging ML to improve Ada developers’ productivity.