
We transformed an unstable product into a robust AI scheduling voicebot that responds 10x faster, uses 20x fewer tokens per call, and doubled the booking conversion rate from 10% to 20%.
Meet our client
Client: A global healthcare technology platform connecting patients with medical providers
Industry: Healthcare
Market: Global (13 countries)
Technology: LLM-powered voice assistant
In a Nutshell
Client’s Challenge
The client aimed to automate doctor-patient appointment scheduling, increasing self-scheduled bookings so that doctors could focus on core tasks and patients could book at any time. However, the existing AI solution was ineffective: only 10% of callers successfully booked appointments.
Our Solution
We redesigned the solution using advanced prompt engineering and added evaluation and monitoring tools to optimize conversations. We created a natural conversation flow that allowed the system to gather and provide all necessary information without relying on rigid scripts.
Client’s Benefits
Our automated solution increased the conversion rate from 10% to 20%, significantly improving the efficiency of the appointment scheduling process. We also optimized token usage to reduce costs and improved system latency for smoother, faster interactions.
A Deep Dive
Client
The client is a global healthcare technology platform that connects patients with medical providers. It operates in 13 countries, serving 90 million patients and over 300,000 doctors.
The platform offers a range of services and features, including online appointment booking, doctor reviews and ratings, telemedicine services, and more.
Overview
An AI-driven voicebot booking assistant was developed to automate patient appointment scheduling for a leading online healthcare platform.
The project aimed to:
- improve and streamline the existing booking process,
- increase the percentage of calls that result in scheduled visits,
- improve system performance.
Key objectives included automating repetitive scheduling tasks, boosting the booking conversion rate, and reducing response latency.
The implemented solution delivered substantial improvements, including a roughly doubled booking conversion rate, a ~10x faster response time, and a 20-fold drop in token usage per call.
Challenge
Improving the booking process was crucial to unlock growth from call-based appointments, reduce operational bottlenecks, and deliver a seamless patient experience that drives higher clinic utilization and revenue.
- Business challenge: The clinic’s appointment booking process was inefficient, yielding a low conversion rate. Few patient calls ended with a successful booking, limiting the platform’s ability to grow visits from the call-based stream.
- Technology challenge: The existing LLM-powered voice assistant was unstable and slow. It frequently failed to complete bookings (for example, omitting available dates or hallucinating doctor information). Its design concatenated the entire conversation into a vast system prompt each turn, leading to uncontrolled token usage and high latency (the sketch below illustrates how quickly this grows). This made the solution difficult to control, debug, or improve.
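To make the cost dynamics concrete, here is a back-of-the-envelope sketch; the token counts are illustrative assumptions, not measurements from the client's system. When the full history is resent every turn, the tokens sent per turn grow linearly and the total for a call grows quadratically:

```python
# Illustrative only: why resending the full conversation each turn is costly.
# Assumed sizes (not the client's actual figures): a 2,000-token system
# prompt plus ~500 new tokens of dialogue per turn.
SYSTEM_PROMPT_TOKENS = 2_000
TOKENS_PER_TURN = 500

def tokens_sent_on_turn(n: int) -> int:
    """Tokens sent to the model on turn n when the whole history is resent."""
    return SYSTEM_PROMPT_TOKENS + n * TOKENS_PER_TURN

print(tokens_sent_on_turn(20))                            # 12000
print(sum(tokens_sent_on_turn(n) for n in range(1, 21)))  # 145000
```

Even with these modest assumptions, a 20-turn call sends 12,000 tokens on its final turn alone and ~145,000 tokens in total, which matches the scale of the problem described above.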
Process
To deliver these results, the team followed a structured process from diagnostic analysis to deployment and tuning.
- Diagnostic audit: The team first analyzed the existing assistant, collecting data on failure points and performance. We prepared a detailed report that highlighted specific user workflows where the assistant struggled most (e.g., misinterpreted intents, dropped context) and pain points of the current design.
- Design new architecture: Next, we developed a stateful conversation architecture, building a framework inspired by RASA CALM. This framework allowed us to define structured conversation flows and systematically map out the sequence of actions the assistant should follow based on dialogue context and user intent (see the sketch after this list).
- Prompt redevelopment: We created a new prompt template and a synthetic dataset of booking dialogues. Starting from scratch, we wrote concise prompts to trigger the flow logic.
- Testing and rollout: The solution was first tested internally with scripted calls. It was deployed progressively, gradually expanding the user base while continuously monitoring performance and user drop-off points. Metrics and feedback guided further tuning.
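The sketch below illustrates the state-driven design from the architecture step above. It is a deliberately minimal simplification in the spirit of RASA CALM-style flows, not the client's production code or Rasa's actual API: a flow is an ordered list of slots to fill, and the next action is derived from the booking state rather than from the raw transcript.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical flow definition: the booking flow is the ordered list of
# slots the assistant must fill before it can confirm a visit.
BOOKING_FLOW = ["specialty", "doctor", "date", "time", "patient_name"]

@dataclass
class BookingState:
    """Explicit memory: what the conversation has established so far."""
    slots: dict = field(default_factory=dict)

    def next_slot(self) -> Optional[str]:
        """The first unfilled slot determines the assistant's next question."""
        for slot in BOOKING_FLOW:
            if slot not in self.slots:
                return slot
        return None  # all slots filled: ready to confirm the booking

def next_action(state: BookingState) -> str:
    slot = state.next_slot()
    return f"ask_{slot}" if slot else "confirm_booking"

def handle_turn(state: BookingState, extracted: dict) -> str:
    """One turn: merge slot values parsed from the user's utterance (the
    LLM's job, stubbed here), then pick the action from the updated state."""
    state.slots.update(extracted)
    return next_action(state)

state = BookingState()
print(handle_turn(state, {"specialty": "dermatology"}))  # -> "ask_doctor"
```

Because the next action is a function of the state rather than the raw transcript, the assistant follows a predictable booking sequence, and the model's role shrinks to parsing utterances and phrasing responses.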
Solution
To address these challenges, the team overhauled the assistant’s architecture and prompts; the sketches after this list illustrate the main mechanisms:
- Structured conversation flow: We introduced explicit memory management, conversation tracking, and predefined dialogue flows to enhance the conversation experience. Each conversation turn updates the current booking state, which lets the assistant maintain a predictable booking sequence (the flow sketch in the Process section illustrates this state-driven approach).
- Simplified, efficient prompts: We rewrote the system prompts to be concise and targeted. By starting from scratch, building a dialogue evaluation dataset, and iterating on prompt design, we minimized token consumption. Smaller prompts meant each call used far fewer tokens, making the assistant significantly cheaper and faster.
- Logging and traceability: Every step of the conversation is logged. This full traceability enables engineers to inspect how the assistant arrived at each decision. It also makes it possible to roll back or tweak individual actions without side effects.
- Quality control and evaluation: We built an evaluation framework with test dialogue scenarios. The assistant was iteratively tested against these cases, and metrics (e.g., drop-off stage in the flow) were used to identify and address bottlenecks.
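The sketches below illustrate these mechanisms under stated assumptions. First, the small-prompt idea: the template text and field names are invented for illustration (the client's actual prompts are not public), but the principle is that each turn's prompt is assembled from the compact booking state plus only the latest utterance, so prompt size stays flat however long the call runs.

```python
# Hypothetical prompt template: built from the compact booking state and the
# latest utterance only, never from the full transcript.
PROMPT_TEMPLATE = """You are a medical appointment scheduling assistant.
Booking state so far: {state}
Next goal: {next_action}
The patient just said: "{utterance}"
Respond briefly and move the booking forward."""

def build_prompt(state: dict, next_action: str, utterance: str) -> str:
    return PROMPT_TEMPLATE.format(
        state=state, next_action=next_action, utterance=utterance
    )

print(build_prompt({"specialty": "dermatology"}, "ask_doctor",
                   "Whoever is free soonest is fine."))
```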
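For traceability, a minimal sketch of per-turn structured logging (the field names are assumptions): each record ties the utterance, the state, and the chosen action together, so engineers can replay exactly how a decision was reached.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("voicebot")

def log_turn(call_id: str, turn: int, utterance: str,
             state: dict, action: str) -> None:
    """Emit one structured record per conversation turn for later replay."""
    log.info(json.dumps({
        "ts": time.time(),
        "call_id": call_id,
        "turn": turn,
        "utterance": utterance,
        "state": state,
        "action": action,
    }))

log_turn("call-0042", 3, "Tuesday afternoon works",
         {"specialty": "dermatology", "doctor": "any"}, "ask_time")
```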
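Finally, a hedged sketch of the evaluation framework; the scenario format and the assistant's respond() interface are illustrative assumptions. Each scenario scripts the patient's side of a dialogue and states the outcome the assistant should reach, and a failed scenario reports where the flow stalled, which is how drop-off bottlenecks were located.

```python
# Illustrative test scenarios: scripted patient turns plus the action the
# assistant is expected to end on.
SCENARIOS = [
    {
        "name": "happy_path_booking",
        "turns": ["I need a dermatologist", "Any doctor is fine",
                  "Next Tuesday", "3 pm", "Jane Doe"],
        "expected_final_action": "confirm_booking",
    },
]

def run_scenario(assistant, scenario: dict) -> dict:
    """Replay scripted turns against an assistant exposing respond()."""
    action = None
    failed_turn = None
    for i, utterance in enumerate(scenario["turns"]):
        action = assistant.respond(utterance)  # assistant under test
    if action != scenario["expected_final_action"]:
        failed_turn = i  # last turn reached before the flow went off track
    return {"name": scenario["name"],
            "passed": failed_turn is None,
            "drop_off_turn": failed_turn}
```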
Benefits
- Business results: The booking conversion rate rose from ~10% to ~20% after the new version of the assistant went live.
- Efficiency gains: Token usage per conversation fell dramatically: previously, a single model call often exceeded 30,000 tokens and a full conversation totalled hundreds of thousands, whereas the new design uses only 3,000–7,000 tokens per conversation turn. API response latency improved roughly 10-fold, with median call time dropping from over 5 seconds to ~0.5 seconds.
- System stability: By enforcing structured actions, the assistant no longer hallucinates availability or prematurely ends calls. Full logging provides visibility into each turn, allowing rapid debugging.
- Lessons learned: We found that defining explicit dialogue states and actions is critical for a reliable assistant. Iterative prompt testing with real user scenarios exposed edge cases that needed fixes. Continuous monitoring of conversation metrics and user feedback is vital for ongoing improvement. Modular design (small prompts and controlled flows) enabled safe rollouts and quick fixes.
Summary
By combining LLMs with structured flow frameworks and precise prompt engineering, the team delivered a faster, more reliable, and easier-to-maintain solution, driving higher clinic efficiency and patient throughput.