Building a Scalable Telemedicine Platform: How CareBridge Delivered 2 Million Virtual Consultations in 18 Months

CareBridge Health, a fast-growing telehealth provider serving 14 U.S. states, went from 30,000 video consultations per month to an annual run rate of more than 2 million within 18 months. This case study breaks down how a lean four-person platform team replaced a patchwork of video SDKs and manual scheduling scripts (including a cron-driven Google Sheet scheduler and a Flask app backed by SQLite) with a production-grade AWS architecture. The result: 99.97% uptime, sub-500 ms P95 application latency, and a 64% reduction in average wait time from appointment request to confirmed slot. We cover the full technical migration: the HIPAA compliance framework, the real-time notification layer, the AI-powered patient intake pipeline that cut per-visit processing time from 18 minutes to about 3 minutes, the FHIR-compliant EHR integration layer that unlocked three multi-year health system partnerships, and every major infrastructure decision, from CDK-driven IaC to Calico-based egress filtering and the selective WebRTC-MCU architecture that saved $380,000 per year. We also cover the sprint-driven rollout plan that met the 90-day HIPAA deadline four days early.

From Scheduling Spreadsheets to 2 Million Consultations

In March 2024, CareBridge Health was running on spreadsheets. The founding team, two practicing physicians and a developer who had written the first prototype over four weekends, had built a contact-triage system in Python and Flask, embedded a Twilio video widget, and tracked clinician availability in a shared Google Sheet that human coordinators updated twice daily. The company delivered roughly 30,000 virtual consultations per month across five U.S. states. Revenue was growing at around 80% month-over-month. The spreadsheet was already collapsing.

When we joined as the platform team, the situation was acute on multiple fronts. Clinicians were triaging patients via personal phones during video calls because the platform had no built-in triage UI. Every inbound patient-intake form generated a PDF averaging 40 pages that staff manually filed into the EHR. The scheduling engine, backed by a single SQLite file, crashed under concurrent writes, capping bookings at twelve per minute and creating deadlocks at peak hours. There was no HIPAA audit trail, no access control, and no encryption at rest; clinicians accessed patient records via direct SQL queries from a Flask shell. The company could not sign any major health system contracts in that posture.

Eighteen months later, CareBridge Health runs well over 2 million consultations annually across 14 states, has reduced average patient wait time from appointment request to confirmation from 22 minutes to 8 minutes, cut no-show rates from 32% to 11%, and reached 99.97% platform availability, all while reducing core infrastructure spend by 41% through a deliberate shift to serverless compute and a tiered media pipeline that replaced per-minute Twilio billing with selective WebRTC routing.

This case study examines every decision, from the CDK module structure that made environments reproducible in 90 minutes, to the WebRTC mesh-vs-MCU arbitration that saved $380,000 per year, to the AI workflow that reduced average patient intake time from 18 minutes to about 3 minutes per visit.

Challenge

The Monolithic Prototype

The original CareBridge architecture was a single Flask application served from one 5 GB DigitalOcean droplet with SQLite as its database, no load balancer, no TLS termination layer, no CI/CD pipeline, and no role-based access control. The integration between scheduling and video calls was a cron job that ran every ten minutes, read a Google Sheet, and pushed video-room links via SMS.

Key Pain Points

Database bottleneck: A single SQLite file holding patient records, appointment logs, clinician schedules, and billing records crashed under concurrent writes. The in-house lock-based write queue capped bookings at twelve per minute, creating scheduling deadlocks at peak hours.
Video infrastructure cost: Each Twilio video room billed at $0.0015 per participant minute. At 400 concurrent sessions, annualized costs ran above $138,000, and any call above ten participants required a manual bridging step that failed roughly 23% of the time.
No HIPAA audit trail: Clinicians accessed patient records via direct SQL queries in the Flask shell. No audit log, no RBAC, no encryption at rest. The company operated in a regulatory grey zone and could not sign any major health system contracts.
Manual triage pipeline: Every inbound patient-intake form required a coordinator to review, transcribe into the EHR, and forward to the clinician — an average 18 minutes per form and a throughput ceiling of 30 new consults per coordinator daily.
No-show revenue bleeding: With no reminder system, 32% of confirmed appointments ended as no-shows. At average reimbursement of $185 per visit, that was $1.1M in annual lost revenue at then-current volume.

Hard Constraints

Three non-negotiable constraints shaped every architectural decision before any technical work began: (1) HIPAA BAA qualification before the SoftBank-backed partnership mobilization, a 90-day hard deadline; (2) a 15-person engineering ceiling that mandated extreme automation discipline; and (3) demonstrated positive ROI within 12 months so the platform paid for its own build.

Goals

Tier 1 — Survival and Compliance

HIPAA BAA qualification with full audit trace before day 90.
Complete platform restoration with no data loss during migration.
44% uptime across Texas and Florida during the first 90 production days.
Zero confirmed PHI access incidents.
Billing integration with three major clearinghouses by go-live.

Tier 2 — Operational Transformation

Reduce patient wait time from 22 minutes to under 10.
Reduce no-show rate from 32% to under 18%.
Complete EHR integration for all 142 clinicians within the first four months.
Enable self-service availability calendars for all clinicians.
Reduce intake processing time per visit from 18 minutes to under 8.

Tier 3 — Scale and Technical Ambition

Support 500 concurrent video sessions with room for 1,000 at peak.
Hit sub-500 ms P95 application latency for non-video requests.
FHIR-compliant APIs within six months of launch.
AI patient intake targeting a 70% time reduction over manual processing.

Approach

Strangler Fig Architecture

We selected the strangler fig pattern over a big-bang rewrite: a full rewrite would have missed the 90-day launch deadline; incremental patches would have perpetuated the underlying fragile patterns. The strangler fig approach — incrementally replacing routed monolith components while keeping the original system fully operational during the transition — permitted parallel workstreams, weekly releases, and safe rollback at the route level.

An Amazon API Gateway front-door with path-based routing was the migration anchor. Individual paths moved to the new microservices layer in order of business impact and technical independence: patient registration first (low dependency, high maintenance cost), then video session orchestration, then clinical records, then billing, and the scheduling engine last — deliberately, because scheduling coupled to every other concern in the platform.
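The route-level cutover described above can be pictured as a prefix table sitting in front of both systems. The following is a minimal sketch in plain Python, not the API Gateway configuration itself; the path prefixes, backend names, and the `resolve_backend` helper are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical route table: path prefix -> service that now owns it.
# Prefixes and backend names are illustrative, not CareBridge's real routes.
MIGRATED_ROUTES: Dict[str, str] = {
    "/patients/register": "registration-service",
    "/video/sessions": "video-orchestration-service",
}
LEGACY_BACKEND = "flask-monolith"


def resolve_backend(path: str, routes: Dict[str, str] = MIGRATED_ROUTES) -> str:
    """Return the backend for a request path using longest-prefix match.

    Any path without a migrated prefix falls through to the monolith, so
    rolling a route back amounts to deleting its entry from the table.
    """
    matches = [p for p in routes if path == p or path.startswith(p + "/")]
    if not matches:
        return LEGACY_BACKEND
    return routes[max(matches, key=len)]
```

In this sketch the scheduling paths simply have no entry until the final phase, so they keep resolving to the monolith while everything around them migrates.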

Technology Stack

Layer	Choice	Why
API Gateway	Amazon API Gateway	Managed auth, WAF, per-route throttle
Compute	Lambda + Fargate	Lambda for event tasks; Fargate for long-running services
Primary DB	Aurora PostgreSQL + 2 read replicas	Transactional stability, JSONB for FHIR-compatible health records
Sessions & cache	ElastiCache Redis (cluster mode)	Sub-ms session lookup and JWT blacklist
Recording Storage	S3 + KMS + CloudFront	HIPAA-eligible, globally replicated for partner EHR access
Observability	Datadog + X-Ray + CloudWatch	Business KPIs and full distributed tracing in one UI
IaC	AWS CDK (TypeScript)	Single source of truth, reversible, peer-reviewed in PR

Implementation

Months 1–2: Compliance-First Foundation

Before any business service ran, the build team spent two months instrumenting the platform for compliance and observability. In HIPAA-regulated environments, observability is not an afterthought; it is the primary mechanism for demonstrating an accountable security posture. Without it, every subsequent service would have been operating blind.

The CDK app was structured into six fully isolated stacks: NetworkStack (VPC, TGW, WAF), DataStack (Aurora, Redis, S3), AppStack (API routes, Lambda, EventBridge), MediaStack (MediaSoup pods, TURN relay, S3 recording buckets), ObservabilityStack (Datadog, X-Ray, alerting), and CIStack (self-hosted GitHub Actions runners on Fargate). Each stack had independent unit tests and a dedicated CodePipeline stage; a PipelineStack orchestrated the hierarchy so any infrastructure change triggered rollback detection before promoting to production.

All services ran in private subnets with no public IP addresses. Egress from Lambda and Fargate passed through a dedicated Calico egress firewall on two c5.xlarge NAT instances, allowing only explicitly listed public endpoints required for Twilio signaling and MediaSoup TURN relays. Explicit allow, deny everything else—this principle prevented a WebSocket library misconfiguration from ever reaching any endpoint that could carry PHI.
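The "explicit allow, deny everything else" rule was enforced at the network layer, but the same default-deny logic is easy to illustrate in application code. This is a minimal sketch under stated assumptions: the hostnames below are placeholders on the reserved `.example` domain, and `is_egress_allowed` is a hypothetical helper, not part of Calico or of the platform described here.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real endpoint list is not given in this article.
ALLOWED_EGRESS_HOSTS = frozenset({
    "signaling.video-vendor.example",
    "turn.carebridge.example",
})


def is_egress_allowed(url: str, allowed=ALLOWED_EGRESS_HOSTS) -> bool:
    """Default-deny: permit a destination only if its host is listed exactly."""
    host = urlsplit(url).hostname  # None when the URL has no network location
    return host is not None and host in allowed
```

Exact-match lookups are deliberate in this sketch: suffix or wildcard matching would widen the set of reachable endpoints beyond what was reviewed.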

Months 3–5: Video — WebRTC Mesh with Selective MCU

We evaluated three architectures: (1) pure Twilio; (2) a self-hosted open-source WebRTC mesh; and (3) selective routing, in which one-on-one calls run over direct WebRTC peer connections backed by a coturn TURN relay on Fargate across three AZs, and group calls of more than four participants are routed to a self-hosted MediaSoup MCU.

Usage data showed 96% of all sessions were one-on-one consultations. Routing the bulk of traffic to direct peer connections eliminated Twilio per-minute charges for those sessions entirely. The MCU handled the remaining 4% of group sessions at a fraction of the cost. The selective MCU cut video media costs by 87%, saving roughly $380,000 per year at projected 1.2 M-consult volume.
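The routing decision itself reduces to a participant-count check. The sketch below assumes the threshold stated above (more than four participants go to the MCU) and further assumes that calls of three or four participants stay on peer connections, which the text does not spell out; `choose_media_route` and `MediaRoute` are illustrative names.

```python
from enum import Enum


class MediaRoute(Enum):
    PEER_TO_PEER = "p2p"  # direct WebRTC, with the TURN relay when needed
    MCU = "mcu"           # self-hosted MediaSoup for larger group calls


# Threshold from the text: calls with more than four participants use the MCU.
MCU_THRESHOLD = 4


def choose_media_route(participants: int) -> MediaRoute:
    """Pick the media path for a session based on its participant count."""
    if participants < 2:
        raise ValueError("a session needs at least two participants")
    if participants > MCU_THRESHOLD:
        return MediaRoute.MCU
    return MediaRoute.PEER_TO_PEER
```

With 96% of sessions being one-on-one, nearly all traffic takes the first branch, which is where the per-minute savings come from.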

Months 4–6: Patient Records, FHIR, and Clinical Workflow

The most architecturally complex piece was the EHR integration layer. CareBridge clinicians used Epic, Cerner, and athenahealth, and every health system partner required its own FHIR server integration. We treated this not as a one-off connector but as a canonical data model: a Patient Domain Service normalized records from all upstream sources into a single canonical form stored in Aurora JSONB columns, and every mutation emitted a FHIR-compliant event to EventBridge for downstream adapters to consume.

An MLTriage Worker pre-processed inbound intake forms using a fine-tuned RoBERTa model (async Lambda, 4 vCPU) that extracted ICD-10 codes, medication entries, and chief-complaint categorizations in an average of 3.2 minutes per visit—down from 18 minutes of manual entry by coordinating staff. Clinicians reported a 28% reduction in pre-call preparation time.

Months 6–9: Scheduling Engine — Last, Deliberately

Scheduling sat at the center of every concern in the platform: video room creation, patient notifications, billing events, clinician calendar syncs, and EHR write confirmations all passed through it. For that reason it was deliberately the last service to be extracted. We rebuilt it as two independent services: an AvailabilityService (read-optimized, DynamoDB-backed, a single O(1) lookup per search, sub-50 ms at the 99th percentile) and a BookingService (the write path, on Aurora PostgreSQL, using a two-phase-commit pattern with per-slot locking and atomic confirmation on downstream acknowledgment).
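To make the hold-then-confirm idea concrete, here is a minimal sketch that swaps in an in-memory SQLite database for Aurora PostgreSQL so it can run anywhere. The `slots` table, its columns, and the `hold_slot` and `finalize_slot` helpers are assumptions for illustration, not the BookingService schema. The conditional `UPDATE` is what keeps two patients from claiming the same slot.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one open slot (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slots (id INTEGER PRIMARY KEY, status TEXT NOT NULL, patient_id TEXT)"
    )
    conn.execute("INSERT INTO slots (id, status) VALUES (1, 'open')")
    conn.commit()
    return conn


def hold_slot(conn: sqlite3.Connection, slot_id: int, patient_id: str) -> bool:
    """Phase 1: claim the slot only if it is still open."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE slots SET status = 'held', patient_id = ? "
            "WHERE id = ? AND status = 'open'",
            (patient_id, slot_id),
        )
    return cur.rowcount == 1


def finalize_slot(conn: sqlite3.Connection, slot_id: int, patient_id: str,
                  downstream_ok: bool) -> bool:
    """Phase 2: confirm on downstream acknowledgment, otherwise release the hold.

    Returns True if this patient's hold existed and was transitioned.
    """
    new_status = "confirmed" if downstream_ok else "open"
    new_patient = patient_id if downstream_ok else None
    with conn:
        cur = conn.execute(
            "UPDATE slots SET status = ?, patient_id = ? "
            "WHERE id = ? AND status = 'held' AND patient_id = ?",
            (new_status, new_patient, slot_id, patient_id),
        )
    return cur.rowcount == 1
```

A second caller racing for the same slot gets `False` from `hold_slot` and can be offered another time, while a failed downstream step releases the slot instead of leaving it stuck in a held state.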

The BookingService pushed a confirmation job to SNS–SQS, which triggered multi-channel reminders (SMS, email, in-app push) at 72 hours, 24 hours, and 2 hours before each appointment. A waitlist promotion protocol was built in: if a patient had not connected within 10 minutes of the scheduled start, the system moved them to the next available slot and notified the clinician within 15 minutes, reducing both patient wait time and clinician idle time. No-show rates dropped from 32% to 11%, recovering approximately $3.2 M in annual revenue by month 8.
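The reminder cadence above is simple date arithmetic. A minimal sketch, assuming timezone-aware datetimes and a hypothetical `reminder_times` helper; reminders whose send time has already passed (for example, on a same-day booking) are skipped rather than sent late.

```python
from datetime import datetime, timedelta
from typing import List

# Offsets from the text: 72 hours, 24 hours, and 2 hours before the appointment.
REMINDER_OFFSETS = (timedelta(hours=72), timedelta(hours=24), timedelta(hours=2))


def reminder_times(appointment_at: datetime, now: datetime) -> List[datetime]:
    """Return the send times for reminders that are still in the future."""
    send_times = [appointment_at - offset for offset in REMINDER_OFFSETS]
    return [t for t in send_times if t > now]
```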

Results

Metric	Pre-Migration	Post-Migration	Change
Monthly video consultations	30,000	166,000+	+453%
Platform uptime	89%	99.97%	+10.97 pp
Booking latency (P95)	4,200 ms	480 ms	89% faster
DB crash frequency	Every 11 days	Zero	100% eliminated
No-show rate	32%	11%	66% reduction
Cost per consult	$0.48	$0.09	81% cheaper
Clinician idle time/visit	18 min	5 min	72% reduction
Patient wait (request-confirm)	22 min	8 min	64% improvement
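The Change column in the table above follows from the before and after values. A short sketch for checking it, using a hypothetical `percent_change` helper and the figures from the table:

```python
def percent_change(before: float, after: float) -> int:
    """Signed percent change from before to after, rounded to a whole percent."""
    return round((after - before) / before * 100)


# Figures taken from the results table.
monthly_consults = percent_change(30_000, 166_000)  # increase
booking_latency = percent_change(4_200, 480)        # decrease
no_show_rate = percent_change(32, 11)               # decrease
patient_wait = percent_change(22, 8)                # decrease
```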

Recovered no-show revenue crossed $3.2 M within eight months of launch, and the platform paid back its build cost in month 10. Clinician onboarding time dropped from 6.5 hours to 45 minutes (an 88% reduction) thanks to automated setup and self-service availability calendars. Patient NPS rose from 31 to 63, driven primarily by shorter wait times and reliable appointment reminders. The HIPAA BAA qualification was achieved on day 86 of the 90-day deadline, unlocking three major health system partnerships that required it as a contractual precondition.

Metrics

Tier 1 — Business-Facing SLIs

Video success rate: Target ≥ 97.5%. Actual: 98.2%.
Booking latency P95: Target ≤ 1 s. Actual: 480 ms P95 / 280 ms P50.
Clinician credentialing SLA: Target ≤ 14 hours. Actual: 8.3 hours median.
Cost per consult: Target ≤ $0.15 by month 12. Actual: $0.09 and declining.

Tier 2 — Per-Service SLOs

Service	Error Rate	Latency P99	Throughput
Availability Service	<0.1%	<50 ms	5,000 RPM
Booking Service	<0.2%	<200 ms	1,000 RPM
Video Orchestration	<0.5%	<800 ms	500 conc.
Patient Records / FHIR	<0.1%	<300 ms	3,000 RPM
MLTriage Worker	<1% async fail	(async)	100 RPS
E-Prescribe Pipeline	<0.5%	<3 s	200 conc.

Post-launch, the platform operated 210 consecutive days without a P0 incident. Eight P1 incidents occurred during the first year; all were resolved within one hour. The highest-severity migration incident was a DynamoDB provisioned-capacity underestimate during an onboarding surge in month 11; it was resolved in 22 minutes using on-demand capacity rescaling.

Lessons Learned

1. Compliance Is Architecture, Not a Checklist

Embedding audit logging as a structural concern in API Gateway and every Lambda function—rather than layering it on as a post-hoc add-on—eliminated a six-month audit-remediation effort that a comparable platform undertook in parallel. In regulated industries the audit chain must be a first-class design constraint from day one.
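One way to make audit logging structural rather than optional is to wrap every handler so that a record is written whether the call succeeds or fails. The following is a minimal sketch, not the platform's implementation: the `audited` decorator, the event fields `actor` and `resource`, and the in-memory `AUDIT_LOG` list are assumptions, and a real system would write to an append-only, access-controlled store.

```python
import functools
import json
import time
from typing import Any, Callable, Dict, List

AUDIT_LOG: List[str] = []  # stand-in sink for illustration only


def audited(action: str) -> Callable:
    """Wrap a Lambda-style handler so every invocation leaves an audit record."""
    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(event: Dict[str, Any], context: Any = None):
            record = {
                "ts": time.time(),
                "action": action,
                "actor": event.get("actor", "unknown"),
                "resource": event.get("resource"),
                "outcome": "error",  # overwritten only if the handler returns
            }
            try:
                result = handler(event, context)
                record["outcome"] = "ok"
                return result
            finally:
                AUDIT_LOG.append(json.dumps(record))
        return wrapper
    return decorator


@audited("patient_record.read")
def get_patient_record(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    # Hypothetical handler body; a real one would query the records service.
    return {"patient_id": event["resource"]}
```

Because the record is appended in a `finally` block, a handler that raises still produces an entry with outcome `error`, so failed access attempts are not silently dropped from the trail.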

2. Async Timeout Defaults Are Wrong for Health-Data Shapes

SQS was initially configured with a 5-minute visibility timeout, the standard recommendation. In production, processing of complex multi-page patient-intake forms occasionally exceeded that window, causing messages to re-queue three times and then land in the DLQ unprocessed. The fix was a second SQS queue with a 30-minute visibility timeout dedicated to the ML extraction step. Health-data pipeline defaults must be derived from actual production data shape, not documentation benchmarks.
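Deriving the timeout from observed processing times rather than a default can be sketched in a few lines. This is an illustration under stated assumptions: the `recommend_visibility_timeout` helper, the safety factor of 2, and the sample durations are hypothetical, and nothing here claims to be how the 30-minute value was actually chosen. The 12-hour cap is the SQS maximum for a visibility timeout.

```python
import math
from typing import Sequence

SQS_MAX_VISIBILITY_SECONDS = 12 * 60 * 60  # SQS upper limit (12 hours)


def recommend_visibility_timeout(durations_s: Sequence[float],
                                 safety_factor: float = 2.0) -> int:
    """Suggest a visibility timeout in seconds from observed processing times.

    Takes the slowest observed run, multiplies it by a safety factor, and
    caps the result at the SQS maximum.
    """
    if not durations_s:
        raise ValueError("need at least one observed duration")
    suggested = math.ceil(max(durations_s) * safety_factor)
    return min(suggested, SQS_MAX_VISIBILITY_SECONDS)


# Hypothetical sample: most forms finish quickly, one complex form takes 14 minutes.
sample_durations = [40.0, 190.0, 840.0]
suggested_timeout = recommend_visibility_timeout(sample_durations)
```

For the hypothetical sample above, the slow 14-minute form drives the suggestion to 28 minutes, well beyond a 5-minute setting that would have re-queued it mid-processing.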

3. Short-Lived JWT Tokens Win Over Long-Lived Sessions

Connection-based eager revocation in Redis consumed 4.2 GB of memory after two weeks and forced a session restart that briefly invalidated every active clinician session. Switching to 4-hour JWTs with refresh-token rotation and an 8-hour idle timeout cut Redis memory usage by 94% and eliminated the restart problem without reducing perceived session uptime.

4. Video Cost Engineering Is Real ROI, But Clinical Workflow Is Bigger

Eighty-seven percent video cost savings ($380,000/year) was significant. But the same engineering effort put into the intake pipeline and scheduling engine produced $3.2 M in annual recovered no-show revenue in the same engagement window. The highest ROI on a health platform consistently comes from improving clinical and administrative workflows, not video room efficiency.

5. IaC Permissions Must Be as Granular as Data Permissions

In month 10, a misconfigured IAM policy would have given a staging role access to a production S3 bucket holding video session recordings. It was caught before promotion only because CDK assertion tests blocked the PR. The fix: CDK synth assertions as a blocking step in every pipeline, plus stack-scoped CDK roles, so that the scheduling engineer, for example, deploys only the AvailabilityService and BookingService stacks. Infrastructure permissions in regulated environments must match data permission granularity, not operate with broad, root-level access.
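The kind of check that blocks such a PR can be approximated without any cloud tooling by scanning a synthesized policy document. The sketch below is a plain-Python stand-in, not CDK's own assertions module; the `cross_env_violations` helper, the `-prod-` naming marker, and the bucket ARNs are hypothetical.

```python
from typing import Any, Dict, List


def cross_env_violations(policy: Dict[str, Any], forbidden_marker: str) -> List[str]:
    """Return resources in Allow statements that contain the forbidden marker.

    Intended for a staging role's policy, where any resource naming a
    production asset should fail the build.
    """
    violations: List[str] = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):  # IAM allows a single resource string
            resources = [resources]
        violations.extend(r for r in resources if forbidden_marker in r)
    return violations


# Hypothetical staging-role policy with one offending statement.
staging_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::carebridge-staging-recordings/*"},
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": ["arn:aws:s3:::carebridge-prod-recordings/*"]},
    ],
}
found = cross_env_violations(staging_policy, "-prod-")
```

A pipeline step would fail the build whenever `found` is non-empty. A name-based marker is only as reliable as the naming convention behind it, so this complements rather than replaces stack-scoped deployment roles.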

Conclusion

CareBridge Health's 18-month journey from spreadsheet-based prototype to production-grade telehealth platform defies the more common narrative of graduated cloud rewrites. The platform scaled from 30,000 consultations per month to more than 166,000 per month, an annual run rate of over 2 million. Uptime moved from 89% to 99.97%. Cost per consult dropped 81%. Patient wait time from request to confirmation fell by 64%. Recovered no-show revenue crossed $3.2 M by month 8, before the initial build contract was even complete.

The decisions that mattered most were architectural discipline: build the compliance layer before the business layer; decouple async processing early to prevent cascading failures; route video per utilization level rather than provisioning a one-size-fits-all MCU; and build the scheduling engine last deliberately, as the apex dependency, because only that ordering avoids scheduling-to-scheduling rollback fragility. Most cloud health platforms cannot restart from zero after a day-1 misstep. We built one that could.
