
90% of model-recommended sites outperformed legacy solutions in the US market
In a Nutshell
Client’s Challenge
Running clinical trials is a daunting task: 9 out of 10 trials are delayed due to the wrong site choice, and each day of lost market time can cost as much as $8 million. Our client, a global pharmaceutical company, needed to optimize the site selection process for clinical trials to reduce costs and speed up enrollment. Their existing approach required frequent manual intervention, lacked a structured framework, and struggled to integrate Real World Data (RWD) with clinical trial history, leading to delays, poor site performance, and limited patient diversity.
Our Solution
Our goal was to develop a modular, AI-driven platform that integrates diverse data sources (RWD, clinical trials databases, external datasets) to recommend trial sites based on enrollment potential, diversity scores, and historical success. The solution includes supervised ML models, automated evaluation pipelines, and an interactive dashboard used by over 70 decision-makers in Clinical Operations.
Client’s Benefits
The AI system improved trial efficiency and site performance:
- Across multiple clinical trials, sites recommended by our bespoke site selection engine accounted for approximately 58% of all trial locations.
- 90% of model-recommended sites outperformed legacy solutions in the US market.
- The platform accelerated planning, improved diversity, reduced manual effort, and became a key tool in clinical operations.
A Deep Dive
1. Overview
The project focused on developing a data-driven platform to support more effective and efficient clinical trial planning. The platform was designed to recommend optimal healthcare facilities based on key factors, including patient enrollment potential, diversity metrics, and the historical performance of trial sites.
The primary objectives of the project were to optimize the site selection process to minimize delays and reduce the risk of trial failures, integrate Real World Data (RWD) with historical Randomized Controlled Trial (RCT) data to support more informed decision-making, and improve forecasting around patient diversity and inclusion. A key emphasis was also placed on delivering a platform that is explainable, user-friendly, and collaborative, ensuring accessibility for clinical operations teams with limited technical backgrounds.
Key Outcomes
- Increased adoption of data-driven site recommendations across trial planning.
- Recommended sites accounted for approximately 58% of all trial locations.
- 90% of model-recommended sites outperformed legacy solutions in the US market.
- Over 70 Clinical Managers are actively using the platform for site evaluation and selection.
2. Client
A leading global pharmaceutical company focused on medical innovation and medtech development.
Industry: Pharmaceutical / Healthcare
Market Value: Multinational operations in over 60 countries, serving millions of patients globally.
Achievements / Context:
- One of the pharma leaders to integrate AI across trial planning at this scale.
- Strategic focus on improving clinical trial efficiency through data and automation.
- Collaboration with deepsense.ai has been ongoing for over 3 years.
3. Challenge
Business Challenge
The client needed to streamline the selection of trial sites to ensure faster startup times, better patient enrollment, and higher trial success rates. Manual methods and legacy systems lacked consistency, coverage, and data-driven insights.
Technology Challenge
- Fragmented and incomplete datasets across RWD and RCT sources
- No reproducibility or historical tracking in existing selection processes
- Limited tools for explainability and communication with non-technical decision-makers
- Difficulties predicting site-level metrics such as enrollment and patient diversity
4. Solution
Approach
We designed a modular, AI-powered system capable of addressing multiple modeling objectives essential to clinical trial planning. The system includes components for predicting patient enrollment, forecasting diversity and inclusion outcomes, and estimating site performance based on historical and real-world data. It also supports integration and exploration of multiple data sources, enabling more comprehensive and accurate decision-making.
Key Components & Technologies
- Data Sources: ClinicalTrials.gov, CMS OpenPayments, external RWD data vendors and internal RWD
- Data Processing: ETL Pipelines for feature extraction and engineering using site-physician-trial relationships
- Modeling: Supervised learning pipelines (modular and extendable)
- Dashboard & Interface: Flask + Dash web application with collaborative features
- Infrastructure: Microservices architecture with PostgreSQL and S3; access controlled via Entra ID
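To make the feature engineering step more concrete, here is a minimal sketch of how per-site features could be derived from site-physician-trial relationships. The record layout, field names, and the three features are our illustrative assumptions, not the production schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical participation records: one row per (site, physician, trial).
RECORDS = [
    {"site": "S1", "physician": "P1", "trial": "T1", "enrolled": 12},
    {"site": "S1", "physician": "P2", "trial": "T1", "enrolled": 8},
    {"site": "S1", "physician": "P1", "trial": "T2", "enrolled": 20},
    {"site": "S2", "physician": "P3", "trial": "T1", "enrolled": 5},
]


def site_features(records):
    """Aggregate simple per-site features from site-physician-trial rows."""
    trials = defaultdict(set)
    physicians = defaultdict(set)
    enrolled = defaultdict(lambda: defaultdict(int))  # site -> trial -> patients

    for row in records:
        trials[row["site"]].add(row["trial"])
        physicians[row["site"]].add(row["physician"])
        enrolled[row["site"]][row["trial"]] += row["enrolled"]

    return {
        site: {
            "n_trials": len(trials[site]),
            "n_physicians": len(physicians[site]),
            "mean_enrolled_per_trial": mean(enrolled[site].values()),
        }
        for site in trials
    }


features = site_features(RECORDS)
```

In a real pipeline, features like these would be one input among many to the supervised models, alongside RWD-derived signals.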
Functionality
The platform offers several key functionalities to support clinical trial planning. It provides site scoring based on Clinical Representation Excellence metrics, enrollment potential, and past trial performance. An interactive dashboard allows users to explore site recommendations and analyze results in a user-friendly interface. The system includes real-time monitoring of model performance to ensure reliability and transparency. Additionally, an automated evaluation pipeline continuously validates model outputs using historical data, ensuring the recommendations remain accurate and effective over time.
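The site scoring described above can be illustrated with a simple composite score. This sketch assumes each metric is min-max scaled and combined with fixed weights. The weights, metric names, and example values are hypothetical and are not the platform's actual scoring logic.

```python
# Hypothetical weights for combining metrics into one site score.
WEIGHTS = {"enrollment": 0.5, "diversity": 0.3, "past_performance": 0.2}


def min_max(values):
    """Scale values to [0, 1]; constant columns map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_sites(sites, weights=WEIGHTS):
    """Return (site, score) pairs sorted from best to worst."""
    names = list(sites)
    totals = {name: 0.0 for name in names}
    for metric, weight in weights.items():
        scaled = min_max([sites[name][metric] for name in names])
        for name, value in zip(names, scaled):
            totals[name] += weight * value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only.
SITES = {
    "A": {"enrollment": 40, "diversity": 0.2, "past_performance": 0.9},
    "B": {"enrollment": 25, "diversity": 0.6, "past_performance": 0.7},
    "C": {"enrollment": 10, "diversity": 0.4, "past_performance": 0.5},
}

ranking = score_sites(SITES)
```

A transparent weighted sum like this is easy to explain to non-technical users, which is one reason it is a common starting point before moving to learned rankings.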
5. Process
To build a robust site selection platform, we followed a structured, end-to-end process that combined technical expertise with business alignment. Here’s how we delivered it and who was involved.
Steps Taken
- Aligned on business KPIs and use cases (enrollment, Clinical Representation Excellence, etc.)
- Built a matching pipeline between RWD and RCT data
- Designed and trained ML models for each modeling objective
- Developed interactive, role-specific dashboards for clinical decision-makers
- Implemented an automated model evaluation pipeline
- Integrated multiple external data sources to enhance accuracy
- Supported end-users with training, documentation, and ongoing iteration
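As an illustration of the automated evaluation step, the sketch below backtests recommendations against historical trials. It assumes a recommended site "outperforms" when its observed enrollment is strictly above the median of the legacy-selected sites in the same trial. That definition, the data layout, and the numbers are our assumptions for the example, not the metric definition used in the project.

```python
from statistics import median

# Hypothetical history: observed enrollment (patients per site) per trial.
HISTORY = [
    {"trial": "T1", "recommended": [12, 9, 15], "legacy": [8, 10, 6]},
    {"trial": "T2", "recommended": [4, 7], "legacy": [5, 3]},
]


def outperformance_rate(trials):
    """Share of recommended sites that beat the legacy median in their trial."""
    wins = 0
    total = 0
    for trial in trials:
        baseline = median(trial["legacy"])
        for observed in trial["recommended"]:
            total += 1
            if observed > baseline:
                wins += 1
    return wins / total if total else float("nan")


rate = outperformance_rate(HISTORY)
```

Running a check like this on every model release makes regressions visible before new recommendations reach the dashboard.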
Expertise Involved
- Data Scientists (modeling, RWD matching, evaluation metrics)
- ML Engineers (pipeline architecture, modularization, automation)
- Product & UX (collaboration design, user workflows)
- DevOps & Backend (data infrastructure, cloud deployment, security)
6. Outcome
The site selection platform delivered measurable improvements in clinical trial planning. Below are the key quantitative and qualitative results that demonstrate its impact on the client’s operations, along with insights gained throughout the project.
Quantitative Results
- Recommended sites accounted for approximately 58% of all trial locations
- 90% of trials using high-scoring sites showed better enrollment and diversity outcomes
- More than 70 users actively using the dashboard across global trial planning teams
- Reduction in manual overhead for site evaluation and report generation
Qualitative Results
- Decision-making is now consistent, traceable, and explainable
- Improved confidence and buy-in from clinical operations teams
- Significant increase in data coverage and model trust over 3 years of iteration
- Platform supports on-demand trial planning across multiple therapeutic areas
Lessons Learned
Throughout the project, we observed that cross-functional tools must be both interpretable and collaborative to ensure adoption across diverse teams. Effectively matching real-world data with historical randomized controlled trial data requires advanced techniques like graph modeling and the development of custom features. We also found that automated evaluation pipelines are essential for maintaining model accuracy, supporting continuous improvement, and building trust in AI-driven recommendations.
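To give a flavor of graph-based matching between RWD and RCT records, here is a minimal sketch that treats records as nodes, links any two that share an identifier value, and groups them into connected components with union-find. The keys (`physician_id`, `address`) and the sample records are hypothetical, and real matching would need fuzzy comparison and custom features rather than exact equality.

```python
from collections import defaultdict


def link_records(records, keys=("physician_id", "address")):
    """Group records that share any key value (connected components)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[max(root_i, root_j)] = min(root_i, root_j)

    first_seen = {}  # (key, value) -> index of first record carrying it
    for idx, record in enumerate(records):
        for key in keys:
            value = record.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(idx, first_seen[token])
            else:
                first_seen[token] = idx

    groups = defaultdict(list)
    for idx, record in enumerate(records):
        groups[find(idx)].append(record["id"])
    return sorted(groups.values())


# Illustrative records from two hypothetical sources.
RECORDS = [
    {"id": "rwd-1", "physician_id": "D100", "address": "1 main st"},
    {"id": "rct-7", "physician_id": "D100", "address": None},
    {"id": "rct-9", "physician_id": "D200", "address": "1 main st"},
    {"id": "rwd-2", "physician_id": "D300", "address": "9 oak ave"},
]

clusters = link_records(RECORDS)
```

Here `rct-9` joins the first cluster only through the shared address, which shows how transitive links in a graph can connect records that no single key would match directly.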
7. Summary
Final Thoughts
The AI-powered site selection platform has significantly transformed how the client plans clinical trials by enabling informed decisions based on multiple data sources. With a modular approach to data integration, modelling, and user experience, the client now plans more effective, inclusive, and cost-efficient trials. The system is reproducible, scalable, and in active use across dozens of teams.