Rethink Site Selection. Our AI Model Beats Traditional Choices 90% of the Time

90% of model-recommended sites outperformed legacy solutions in the US market

Meet our client

Client:

One of the biggest pharmaceutical companies

Industry:

Healthcare / Pharma

Market:

Europe

Technology:

LLM

In a Nutshell

Client’s Challenge

Running clinical trials is a daunting task: 9 out of 10 trials are delayed due to the wrong site choice, and each day of lost market time can cost as much as $8 million. Our client, a global pharmaceutical company, needed to optimize the site selection process for clinical trials to increase cost savings and accelerate enrollment. Their existing approach required frequent manual intervention, lacked a structured framework, and struggled to integrate Real World Data (RWD) with clinical trial history, leading to delays, poor site performance, and limited patient diversity.

Our Solution

Our goal was to develop a modular, AI-driven platform that integrates diverse data sources (RWD, clinical trials databases, external datasets) to recommend trial sites based on enrollment potential, diversity scores, and historical success. The solution includes supervised ML models, automated evaluation pipelines, and an interactive dashboard used by over 70 decision-makers in Clinical Operations.

Client’s Benefits

The AI system improved trial efficiency and site performance:

Across multiple clinical trials, our bespoke site selection recommendation engine put forward roughly 58% of the sites that were used.

Recommended sites accounted for approximately 58% of all trial locations.
90% of model-recommended sites outperformed legacy solutions in the US market.
The platform accelerated planning, improved diversity, reduced manual effort, and became a key tool in clinical operations.

A Deep Dive

1. Overview

The project focused on developing a data-driven platform to support more effective and efficient clinical trial planning. The platform was designed to recommend optimal healthcare facilities based on key factors, including patient enrollment potential, diversity metrics, and the historical performance of trial sites.

The primary objectives of the project were to optimize the site selection process to minimize delays and reduce the risk of trial failures, integrate Real World Data (RWD) with historical Randomized Controlled Trial (RCT) data to support more informed decision-making, and improve forecasting around patient diversity and inclusion. A key emphasis was also placed on delivering a platform that is explainable, user-friendly, and collaborative, ensuring accessibility for clinical operations teams with limited technical backgrounds.

Key Outcomes

Increased adoption of data-driven site recommendations across trial planning.
Recommended sites accounted for approximately 58% of all trial locations.
90% of model-recommended sites outperformed legacy solutions in the US market.
Over 70 Clinical Managers are actively using the platform for site evaluation and selection.

2. Client

A leading global pharmaceutical company focused on medical innovation and medtech development.
Achievements / Context:

One of the pharma leaders to integrate AI across trial planning at this scale.
Strategic focus on improving clinical trial efficiency through data and automation.
Collaboration with deepsense.ai has been ongoing for over 3 years.

3. Challenge

Business Challenge

The client needed to streamline the selection of trial sites to ensure faster startup times, better patient enrollment, and higher trial success rates. Manual methods and legacy systems lacked consistency, coverage, and data-driven insights.

Technology Challenge

Fragmented and incomplete datasets across RWD and RCT sources
No reproducibility or historical tracking in existing selection processes
Limited tools for explainability and communication with non-technical decision-makers
Difficulties predicting site-level metrics such as enrollment and patient diversity

4. Solution

Approach

We designed a modular, AI-powered system capable of addressing multiple modeling objectives essential to clinical trial planning. The system includes components for predicting patient enrollment, forecasting diversity and inclusion outcomes, and estimating site performance based on historical and real-world data. It also supports integration and exploration of multiple data sources, enabling more comprehensive and accurate decision-making.

Key Components & Technologies

Data Sources: ClinicalTrials.gov, CMS OpenPayments, external RWD vendors, and internal RWD
Data Processing: ETL pipelines for feature extraction and engineering using site-physician-trial relationships
Modeling: Supervised learning pipelines (modular and extendable)
Dashboard & Interface: Flask + Dash web application with collaborative features
Infrastructure: Microservices architecture with PostgreSQL and S3; access controlled via Entra ID

Functionality

The platform offers several key functionalities to support clinical trial planning. It provides site scoring based on Clinical Representation Excellence metrics, enrollment potential, and past trial performance. An interactive dashboard allows users to explore site recommendations and analyze results in a user-friendly interface. The system includes real-time monitoring of model performance to ensure reliability and transparency. Additionally, an automated evaluation pipeline continuously validates model outputs using historical data, ensuring the recommendations remain accurate and effective over time.
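
To make the site scoring idea more concrete, here is a minimal Python sketch of how several per-site metrics might be combined into a single ranking score. It is not the platform's actual scoring logic: the `SiteMetrics` fields, the `WEIGHTS` values, and the assumption that every metric is already scaled to the 0 to 1 range are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteMetrics:
    """Hypothetical per-site inputs, each assumed to be pre-scaled to [0, 1]."""
    site_id: str
    enrollment_potential: float
    representation_score: float
    past_performance: float


# Illustrative weights only; a real system would tune or learn these.
WEIGHTS = {
    "enrollment_potential": 0.5,
    "representation_score": 0.3,
    "past_performance": 0.2,
}


def site_score(metrics: SiteMetrics, weights: dict = WEIGHTS) -> float:
    """Return the weighted average of the site's metrics."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(getattr(metrics, name) * w for name, w in weights.items())
    return weighted_sum / total_weight


def rank_sites(sites: list, weights: dict = WEIGHTS) -> list:
    """Sort sites from highest to lowest combined score."""
    return sorted(sites, key=lambda m: site_score(m, weights), reverse=True)
```

A weighted average keeps each metric's contribution visible, which fits the explainability goal described earlier. A learned model could replace the fixed weights without changing the ranking interface.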

5. Process

To build a robust site selection platform, we followed a structured, end-to-end process that combined technical expertise with business alignment. Here’s how we delivered it and who was involved.

Steps Taken

Aligned on business KPIs and use cases (enrollment, Clinical Representation Excellence, etc.)
Built a matching pipeline between RWD and RCT data
Designed and trained ML models for each modeling objective
Developed interactive, role-specific dashboards for clinical decision-makers
Implemented an automated model evaluation pipeline
Integrated multiple external data sources to enhance accuracy
Supported end-users with training, documentation, and ongoing iteration
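
As an illustration of the matching step listed above, the sketch below links RWD facility records to RCT site records by normalized name and postal code. This is a deliberately simple stand-in, not the pipeline built in the project: the record fields (`rwd_site_id`, `rct_site_id`, `name`, `postal_code`), the `STOPWORDS` set, and the exact-key matching rule are all assumptions made for the example.

```python
import re

# Hypothetical noise words stripped from facility names before matching.
STOPWORDS = {"the", "of", "inc", "llc"}


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop noise words."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    tokens = [t for t in cleaned.split() if t not in STOPWORDS]
    return " ".join(tokens)


def match_sites(rwd_records: list, rct_records: list) -> dict:
    """Map each RWD site ID to the RCT site IDs sharing its name and postal code."""
    # Index RCT sites by (normalized name, postal code) for constant-time lookup.
    index = {}
    for rec in rct_records:
        key = (normalize_name(rec["name"]), rec["postal_code"])
        index.setdefault(key, []).append(rec["rct_site_id"])

    matches = {}
    for rec in rwd_records:
        key = (normalize_name(rec["name"]), rec["postal_code"])
        if key in index:
            matches[rec["rwd_site_id"]] = index[key]
    return matches
```

Exact-key matching like this misses spelling variants and relocated sites, so it would only be a first pass before fuzzier or relationship-based techniques.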

Expertise Involved

Data Scientists (modeling, RWD matching, evaluation metrics)
ML Engineers (pipeline architecture, modularization, automation)
Product & UX (collaboration design, user workflows)
DevOps & Backend (data infrastructure, cloud deployment, security)

6. Outcome

The site selection platform delivered measurable improvements in clinical trial planning. Below are the key quantitative and qualitative results that demonstrate its impact on the client’s operations, along with insights gained throughout the project.

Quantitative Results

Recommended sites accounted for approximately 58% of all trial locations
90% of trials using high-scoring sites showed better enrollment and diversity outcomes
More than 70 users actively using the dashboard across global trial planning teams
Reduction in manual overhead for site evaluation and report generation

Qualitative Results

Decision-making is now consistent, traceable, and explainable
Improved confidence and buy-in from clinical operations teams
Significant increase in data coverage and model trust over 3 years of iteration
Platform supports on-demand trial planning across multiple therapeutic areas

Lessons Learned

Throughout the project, we observed that cross-functional tools must be both interpretable and collaborative to ensure adoption across diverse teams. Effectively matching real-world data with historical randomized controlled trial data requires advanced techniques like graph modeling and the development of custom features. We also found that automated evaluation pipelines are essential for maintaining model accuracy, supporting continuous improvement, and building trust in AI-driven recommendations.
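
One way an automated evaluation step could work is a backtest that checks how many of the top-ranked sites for a past trial actually performed well. The sketch below computes precision at k for that purpose. The metric choice, the function names, and the input layout (`scores`, `successful_sites`) are assumptions for illustration, not a description of the evaluation pipeline used in the project.

```python
def precision_at_k(scores: dict, successful: set, k: int) -> float:
    """Fraction of the k highest-scored sites that appear in `successful`."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank site IDs by model score, highest first, and keep the top k.
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for site_id in top_k if site_id in successful)
    return hits / len(top_k)


def backtest(trials: dict, k: int) -> dict:
    """Compute precision at k for every historical trial.

    `trials` maps a trial ID to a dict with two hypothetical keys:
    "scores" (site ID -> model score) and "successful_sites" (set of
    site IDs that met the chosen success criterion in that trial).
    """
    return {
        trial_id: precision_at_k(t["scores"], t["successful_sites"], k)
        for trial_id, t in trials.items()
    }
```

Running such a backtest on a schedule and tracking the results over time is one way to notice when model quality drifts.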

7. Summary

Final Thoughts

The AI-powered site selection platform has significantly transformed how the client plans clinical trials by enabling informed decisions based on multiple data sources. With a modular approach to data integration, modeling, and user experience, the client now plans more effective, inclusive, and cost-efficient trials. The system is reproducible, scalable, and in active use across dozens of teams.