Webskyne

22 May 2026 · 21 min read

From 40% Attrition to Industry-Leading Engagement: How LearnPath Built a Platform That Students Actually Stayed On

When LearnPath launched in 2021 as an upskilling platform targeting mid-career professionals in Southeast Asia, the early numbers were brutal but honest: 40% of new users vanished within their first week, only 7% completed any course, and the median student spent a total of 47 minutes on the platform before churning. The founding team ran three flawed experiments (adding courses, bundling mentoring, deploying a referral program) before an interview project led by a behavioral scientist revealed that the platform was not bad or jarring; it made commitment feel overwhelming and progress invisible. Over 18 months, a systematic rebuild driven by that insight and executed by a lean team of eight engineers turned those numbers around. Course completion sits at 51% today, 74% of graduates earn a promotion or a new role within six months of finishing, and the platform serves 127,000 active learners across six countries. This case study walks through every diagnostic insight, architectural decision, technical experiment, and hard-won organizational lesson behind that transformation, from onboarding redesign and session-integrity architecture to a full data-layer migration and a notification engine rebuilt on behavioral science rather than generic scheduling.

Case Study
Tags: edtech, platform engineering, user retention, behavioral design, SaaS metrics, personalization, startup architecture, case study

Overview

In early 2021, LearnPath was founded by three former bootcamp instructors who wanted structured, job-relevant upskilling accessible to working professionals across Southeast Asia. They raised a pre-seed round of $800,000, built an MVP on a Node.js + MongoDB stack, and launched in August with two courses in data analytics and product management.

The initial traction was enough to generate second-round interest. Within three months, 4,200 people had registered, 900 had started a course, and 320 of those had paid for a premium track, which looked like a healthy funnel. But volume masked a structural problem: 40% of enrolled students stopped logging in within seven days of signing up. The median student spent 47 minutes total on the platform before they stopped. In a category where 12-week courses are the norm and completion is the engine of word-of-mouth growth, this was a death spiral wearing a startup disguise.

The founding team’s instinct was to add more courses, improve the marketing funnel, and iterate on the UI. Three months and $60,000 in largely wasted spend later, those levers had moved the numbers by less than 3%. The real insight came when the team hired a behavioral scientist who spent two weeks interviewing churned students and analyzing session-level data. The signal was clear: students were not abandoning the platform because of course quality. They were dropping off because the platform made it feel like an overwhelmingly large commitment they could not reliably fit into their lives.

From May 2021 to October 2022, LearnPath ran a comprehensive platform and experience redesign that touched every layer of the system, from the database schema to the notification engine to the team’s internal quality benchmarks. This case study walks through the problems they identified, the architecture and product decisions they made, and the results that followed. It is a story about how product hardening starts at the infrastructure layer but must be supported by organizational, behavioral, and cultural shifts at every level.

By the end of this study, you will see how a lean team of 8 engineers, one designer, and one behavioral researcher executed a transformation comparable in scope to a much larger platform — using a combination of deliberate architecture, ruthless prioritization, and rigorous measurement.

Challenge

The Problem Was Real and Sprinting Faster Would Not Fix It

The LearnPath team initially believed the answer lay in content expansion. A third course in cloud engineering was added. Mentoring sessions were bundled with premium tracks. A referral program was rolled out. Each initiative generated a small bump in monthly revenue and simultaneously hid a worsening efficiency problem. New students were costing $18 in marketing and platform spend to acquire, but generating only 4 cents in revenue per engaged user in the first 10 days. The unit economics could hold only as long as the top of the funnel grew fast enough, and it was not growing fast enough.

What the acquisition metrics papered over was not visible in any dashboard. It was buried in session-level data that no one had looked at closely.

Hidden Problems the Dashboard Was Not Showing

A forensic analysis of the platform’s event logs — a Google Analytics 4 + Mixpanel hybrid that nobody had built automated segment definitions for — revealed five compounding issues that no single metric had highlighted before:

1. Onboarding was a cliff, not a ramp. The registration flow had 11 steps, two of them unrelated to the user’s course selection, and each additional step lost 18–24% of the users who still had active intent. The stakes were high: users who started Day 1’s first video in week one were 46x more likely to reach week three than those who had not. Step 6, a free-text request for career goals, had a 71% drop-off rate and contributed nothing to personalized recommendations, because the recommendation algorithm never consumed its output.

2. Session fragmentation destroyed habit formation. Each lesson was composed of 4–7 micro-lessons of 3–5 minutes embedded in a scroll-heavy page. The second half of each video loaded asynchronously without a loading indicator. On 3G connections — plausible for 60% of Southeast Asia’s active mobile user base at the time — half of students who started a session saw a frozen video before the page had a chance to detect the failure and offer a retry. That failure produced an immediate tab closure, and researchers observed the same behavioral pattern in 80% of chat-interview responses.

3. Progress was invisible until completion. The course progress UI showed only a linear bar at the top of the page. Nested quizzes, peer-review components, office-hours recordings, and project briefs were accessible only through drop-down menus two layers deep. Students completed weeks one and two at 73% retention; from week three onward, the rate dropped below 30%. When researchers interviewed students who had left after week two, the consistent explanation was that none of them had a mental model of where they were or what was expected of them in the near term.

4. Feedback loops were absent and post-lesson actions were non-existent. After finishing a lesson, students landed on a page with three links to “Related Resources” — links that should have been pre-contextualized but were largely empty — and nothing else. No summary or key-takeaway was offered. No quick quiz locked the lesson. No peer-discussion prompt appeared. Students exited and the platform treated that exit as a completed lesson, inflating progress numbers and demonstrating nothing about actual learning.

5. The notification strategy was counterproductive. Email reminders were triggered after 72 hours of inactivity, regardless of the student’s actual pace or available time. Push notifications ignored timezone and context and arrived twice per day with generic messaging. 34% of students who received more than four emails during their first two weeks unsubscribed or marked messages as spam before they had engaged with a single lesson.

Business Impact of These Hidden Problems

The day-to-day reality for the team was a set of choices that felt only partly under their control. Marketing spend was either flat or rising. Engagement was structurally blocked by platform shortcomings that no messaging or pricing change could reach inside. The team’s sense of what was going on began to diverge from what was actually happening in the product.

Within the executive conversation, a growing anxiety took hold: the story was not breaking well enough to sustain a Series B. Word-of-mouth referrals were already tracking ahead of revenue growth, and retention was trending in the wrong direction. In plain institutional terms: the board was asking questions about unit economics, the CEO was asking about growth plans, and the engineers were building new courses while working around the worst of the existing platform.

Goals

The Six Goals That Defined the Transformation

1. Reduce Day-7 platform attrition to below 15%. This was the first-order problem — if students stayed, the rest of the funnel would have a chance to breathe.

2. Increase course completion from 7% to 40% within two years. Completion, not enrollment, was the true velocity metric for the business. Money, social proof, and press all grew from post-completion engagement, and there was no path to any of those things without a structural upward change in the baseline.

3. Achieve a 99% uptime SLA and sub-400 millisecond median page latency. The platform had passed through three reliability incidents in its first year running Node.js on a single t2.medium instance behind a load balancer. The cost of an outage — in student trust, refunds, and brand — was significantly higher than the cost of the infrastructure that would prevent it.

4. Rebuild the data layer to enable per-student personalization. The existing MongoDB schema tracked course progress as a flat document keyed by user_id with a JSON-encoded list of completed steps. A student who had completed the first three lessons of week two but not the first two quizzes of week three still appeared complete in the record. Personalized recommendations, adaptive pacing, and targeted review prompts were all impossible without a data model to support them.

5. Reduce infrastructure spend per active learner by 50% while doubling concurrent capacity. The team planned for the worst while proposing the best: the existing model would cost more to scale as load rose, but a cleaner architecture with serverless components could both lower costs and scale far more nimbly.

6. Enable weekly feature releases with no more than one hour of planned downtime per quarter. The existing release practice was a monthly deploy script that bundled database migrations, static assets, and application code into a single archive, verified manually by two engineers before going live. Deploys routinely produced side effects that occupied an engineer for two to four hours afterward. The team wanted daily deployments, automated testing, and a routine release process.
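Goal 4 is easier to picture with a small data-model sketch. The following is a minimal illustration, not LearnPath’s actual schema: the type names, field names, and the `nextIncompleteItem` helper are all assumptions. It shows the kind of question ("what is the next thing this student has not finished?") that becomes answerable once each curriculum item and each completion is its own record rather than an entry in a flat list.

```typescript
// Hypothetical shapes; every name here is an assumption for illustration.

// One row per curriculum item, with explicit ordering and kind.
type CurriculumItem = {
  id: string;
  week: number;
  position: number; // order within the week
  kind: "lesson" | "quiz";
};

// One row per (student, item) completion.
type CompletionRecord = { userId: string; itemId: string; completedAt: Date };

// Returns the earliest curriculum item the student has not completed,
// or null when every item is done.
function nextIncompleteItem(
  items: CurriculumItem[],
  completions: CompletionRecord[],
  userId: string,
): CurriculumItem | null {
  const done = new Set(
    completions.filter((c) => c.userId === userId).map((c) => c.itemId),
  );
  const ordered = [...items].sort(
    (a, b) => a.week - b.week || a.position - b.position,
  );
  return ordered.find((item) => !done.has(item.id)) ?? null;
}
```

With records shaped like this, a student who has finished some lessons of week two but skipped a quiz is no longer indistinguishable from one who is fully caught up.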

Approach

Instead of modular improvements around the edges of a working product, the team chose a systematic, layered approach that tackled root-cause problems in order of leverage rather than perceived impact. They identified four interdependent layers of work that had to happen concurrently for the full system to support an upward rebalance.

Layer 1 — Behavioral Diagnostics Before Architecture

The team’s first hiring decision was a behavioral scientist — before they had hired any new engineers. Her first two weeks were invested entirely in understanding why students left, not just from platform telemetry, but from live chat interviews with 87 students who had churned after their first week, and from an analysis of 2,300 session transcripts reviewing patterns invisible to heatmaps and funnel analysis. The synthesis produced eight behavioral design principles that would guide every component of the rebuild; the team laminated them and taped them to the whiteboard in every corner of the office.

Principle 1: Onboarding should ask only what belongs to the activation moment. The registration and course-selection flows were decoupled completely. Students needed only an email address and preferred language to complete registration. Course selection was deferred until day two, by which point students had already tasted the platform’s UI and invested 30 minutes in introductory content.

Principle 2: Progress must be visible at all times, not just on arrival. The team redesigned the entire course UI around a persistent vertical progress map showing, at a glance, the lessons completed, the lesson currently in progress, and the next two upcoming lessons. The map was visible from every page and students could navigate directly to earlier lessons without leaving the flow.

Principle 3: Every lesson must end before the next can begin. Quizzes were introduced as mandatory unlock steps, not for gatekeeping’s sake, but to close the feedback and reinforcement loop that students valued in interviews and to provide concrete progress data for personalization.

Principle 4: Notifications should protect a student’s schedule, not invade it. The notification engine was rebuilt around user-selected preferred send times, timezone awareness, and content-relevance scoring. A student who had spent the previous three evenings taking lessons between 7 pm and 9 pm received reminders in that same 7–9 pm window. Notifications that went unread three times were automatically demoted. Unsubscribed students received no further follow-ups.
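Principle 4 implies deriving a send time from observed behavior. One simple way to do that is sketched below, under stated assumptions: the `LessonStart` shape, the `preferredSendHour` name, and the 7 pm fallback are hypothetical, and the article does not say how LearnPath computed its windows. The sketch picks the local hour in which a student most often started lessons.

```typescript
// Hypothetical input: one entry per lesson start, with the hour already
// converted to the student's own timezone (0-23).
type LessonStart = { localHour: number };

// Returns the most frequent start hour. Ties resolve to the earliest hour.
// The fallback hour is an assumption used when there is no history yet.
function preferredSendHour(starts: LessonStart[], fallbackHour = 19): number {
  const counts = new Array<number>(24).fill(0);
  let valid = 0;
  for (const s of starts) {
    if (Number.isInteger(s.localHour) && s.localHour >= 0 && s.localHour < 24) {
      counts[s.localHour] += 1;
      valid += 1;
    }
  }
  if (valid === 0) return fallbackHour;
  let best = 0;
  for (let h = 1; h < 24; h++) {
    if (counts[h] > counts[best]) best = h;
  }
  return best;
}
```

A student with lesson starts at 19, 19, 20, and 8 would get 19 as the preferred hour, which matches the evening-window behavior described above.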

Layer 2 — Technology Stack That Enabled the Behavioral Layer

The behavioral design principles drove technology choices rather than the other way around. The team knew they needed the ability to serve personalized content at speed, to handle concurrent sessions from students in six different time zones with correspondingly different connection qualities, and to manage educational state that was substantially more complex than flat progress records.

| Layer | Technology | Why |
| --- | --- | --- |
| Frontend runtime | Next.js 12 (React + ISR) | Server-side rendering for fast first-paint; incremental static regeneration for course pages updated frequently without full rebuilds; edge CDN for low-latency global delivery. |
| Application backend | GraphQL with Apollo Server + TypeScript microservices | Over-fetching on course content pages was a measurable contributor to mobile latency. GraphQL let the client ask for only what is needed, reducing first-screen load time by 40% on mobile. |
| Primary data store | PostgreSQL via Supabase | Relational model required for personalized pacing, quiz results, and transaction integrity. Supabase added realtime subscriptions and managed auth with minimal ops overhead. |
| Session store and cache | Redis via Upstash | Serverless Redis for lesson content and peer discussion caching; reduced DB load and enabled session continuity across stateless workers. |
| Notification engine | SendGrid + trigger.dev workflow jobs | Reliable transactional email in Southeast Asia; workflow-based scheduling modeled notification logic as code rather than cron jobs, enabling per-user scheduling and depth-capped delivery. |
| Hosting and CI/CD | Vercel (frontend) + AWS Fargate (services) + GitHub Actions | Optimized frontend deploys with ISR; containerized services auto-scaled on Fargate per-service; GitHub Actions provided tests, linting, and deployment with environment guards for staging before production. |

A secondary benefit of this stack was the reduction in operational complexity it brought to the content team. Supabase realtime subscriptions opened content-review feedback loops between instructors and curriculum developers that no longer required engineering intervention for severity triage.

Layer 3 — The Behavioral Implementation Roadmap

The team broke the 18-month transformation into four distinct phases, each with a clear exit condition that prevented scope creep.

Phase 1 — Diagnostic, Architecture, and Platform Foundation (Months 1–3). The behavioral scientist joined first. The existing event schema was audited. Eight behavioral first principles were defined. A new PostgreSQL schema was designed to reflect the actual educational structure: course, module, lesson, quiz, enrollment, completion record, status. The CDK-based AWS architecture was written, peer-reviewed, and deployed before the first behavioral-prototype wireframe was sketched. The guiding principle: do not build new user-facing features until the telemetry to measure them is in place.

Phase 2 — Session Integrity and Content Infrastructure (Months 4–7). The progressive web-app migration began. Course content was refactored into Next.js ISR pages with proper edge CDN caching. Video loading behavior was redesigned so that each 3-minute lesson loaded as a complete unit with a meaningful loading state before playback started — even on 2G connections. The notification engine was rebuilt and preferences were migrated.

Phase 3 — Personalization and Adaptivity (Months 8–13). The relational schema and analytics pipeline were mature enough by month eight to drive real individualization: adaptive pacing, where lesson assignments adjusted based on quiz performance, and personalized recommendation logic that surfaced supplementary content triggered by concrete curriculum milestones. This phase also included the session-level JavaScript SDK rebuild, which cut client latency on mobile by an order of magnitude.

Phase 4 — Scale, Reliability, and Platform Hardening (Months 14–18). This phase included the database migration from MongoDB to PostgreSQL — one of the highest-risk steps of the entire program — and a centralized observability stack using Datadog and X-Ray for distributed performance tracking.
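The adaptive pacing described in Phase 3, where lesson assignments adjust to quiz performance, can be sketched as a small decision rule. This is an illustration only: the action names and the 0.8 and 0.5 thresholds are assumptions, and the article does not describe LearnPath’s actual pacing logic.

```typescript
// Hypothetical pacing actions and thresholds, chosen for illustration.
type PacingAction = "take-quiz" | "advance" | "review" | "retry";

// quizScores holds the student's attempts on one lesson's quiz, oldest first,
// each as a fraction between 0 and 1.
function nextPacingAction(quizScores: number[]): PacingAction {
  // Quizzes are mandatory unlock steps, so no attempt means no progress yet.
  if (quizScores.length === 0) return "take-quiz";
  const latest = quizScores[quizScores.length - 1];
  if (latest >= 0.8) return "advance"; // move on to the next lesson
  if (latest >= 0.5) return "review";  // surface supplementary material first
  return "retry";                      // repeat the lesson before re-attempting
}
```

Only the most recent attempt is considered here; a production rule might also weigh the number of attempts or time since the last session.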

Implementation

The Behavioral Diagnostic Phase in Detail

The behavioral scientist asked students to reconstruct their last week moment by moment — not why they left, but what happened to them. Eighty-seven reconstructions revealed a consistent pattern: students opened the platform with genuine intention, got stuck within 3–5 minutes of interaction, closed it frustrated, and were willing to try again only if the motivation to resume was stronger than the irritation of the anticipated friction.

The team restructured onboarding to remove cross-functional questionnaire complexity, protect the student’s first activation session from context-switching, and establish an immediate feedback loop so the first 30 minutes left the student feeling like they had accomplished something measurable. The registration-to-first-lesson flow was cut from a median of 14 clicks and 11 minutes to 3 clicks and under 90 seconds.

Session Recording Architecture Overhaul

The platform had been using a naive client-side analytics middleware that fired HTTP events on every page navigation and video pause. On a 3G connection a single page-load event could queue 5–10 dispatch calls — the browser throttled concurrent outbound requests to 6 — producing analytically noisy and incomplete sessions. The solution was a client-side event queue backed by IndexedDB with a declaratively bounded flush strategy: batched payloads dispatched on intervals or session close, whichever came first. Analytic noise fell from 62% to 6% of total event volume, no measurement precision was lost, and mobile bandwidth utilization dropped 47% on 3G.

Figure: Session analytics queue architecture. The revised event-queue architecture batched analytics events locally and flushed them on an interval or at session close, cutting analytic noise from 62% to 6% of event volume and reducing mobile bandwidth usage on 3G by nearly half.
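The batching idea behind the event queue can be sketched in a few lines. This is a simplified illustration under stated assumptions: the queue below is held in memory so the example runs anywhere, whereas the article describes an IndexedDB-backed queue; the class name, the `send` callback, and the `maxBatchSize` bound are all hypothetical.

```typescript
// Hypothetical event shape for illustration.
type QueuedAnalyticsEvent = { kind: string; at: number };

// Buffers events locally and dispatches them in batches.
// In-memory only; a browser version would persist `pending` in IndexedDB.
class BatchedEventQueue {
  private pending: QueuedAnalyticsEvent[] = [];
  private readonly send: (batch: QueuedAnalyticsEvent[]) => boolean;
  private readonly maxBatchSize: number;

  // `send` returns true when the batch was accepted for delivery.
  constructor(send: (batch: QueuedAnalyticsEvent[]) => boolean, maxBatchSize = 50) {
    this.send = send;
    this.maxBatchSize = maxBatchSize;
  }

  enqueue(ev: QueuedAnalyticsEvent): void {
    this.pending.push(ev);
    // Flush early once a full batch has accumulated (assumed size bound).
    if (this.pending.length >= this.maxBatchSize) this.flush();
  }

  // Intended to be called from an interval timer and from the
  // session-close handler, whichever fires first.
  // Returns the number of events dispatched.
  flush(): number {
    if (this.pending.length === 0) return 0;
    const batch = this.pending;
    this.pending = [];
    if (!this.send(batch)) {
      // Keep the events for the next attempt instead of dropping them.
      this.pending = batch.concat(this.pending);
      return 0;
    }
    return batch.length;
  }

  pendingCount(): number {
    return this.pending.length;
  }
}
```

In a browser, `flush` could plausibly be wired to `setInterval` and to a `pagehide` or `visibilitychange` listener, with `navigator.sendBeacon` as the sender, since it is synchronous and returns a boolean. That wiring is a suggestion, not something the article specifies.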

MongoDB to PostgreSQL Data Migration

The old schema treated course progress as a single document per student with an embedded lessons array, which was structurally incompatible with per-learner adaptive personalization queries. The team designed a fully normalised 18-table PostgreSQL schema whose tables included courses, modules, lessons, quizzes, questions, quiz_attempts, lesson_sessions, enrollments, course_reviews, notification_preferences, and event_logs. The migration ran in dual-write mode for 15 days with a nightly reconciliation job on a 5% sample. All 47 inconsistencies found among 2.1 million records were corrected before live traffic touched the new database. A query that had taken 3.2 seconds in MongoDB executed in 12 milliseconds in PostgreSQL, turning individualization from a hypothetical into the dominant approach to feature engineering.
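A sampled reconciliation pass of the kind described above might look like the sketch below. It is an illustration under assumptions: the `ProgressRow` shape, the hash-based sampler, and the use of in-memory maps in place of real database reads are all hypothetical, and the article does not describe how LearnPath selected its 5% sample.

```typescript
// Hypothetical record shape shared by both stores during dual-write.
type ProgressRow = { id: string; userId: string; lessonId: string; completed: boolean };

// Deterministic sampler: for a given seed, the same id is always in or out
// of the sample, so a night's run can be reproduced. Changing the seed each
// night rotates which records are checked.
function inSample(id: string, nightSeed: number, percent: number): boolean {
  let h = nightSeed >>> 0;
  for (let i = 0; i < id.length; i++) {
    h = (Math.imul(h, 31) + id.charCodeAt(i)) >>> 0;
  }
  return h % 100 < percent;
}

// Compares sampled rows from the legacy store against the new store and
// returns the ids that are missing or differ.
function reconcile(
  legacy: Map<string, ProgressRow>,
  target: Map<string, ProgressRow>,
  nightSeed: number,
  percent = 5,
): string[] {
  const mismatched: string[] = [];
  legacy.forEach((oldRow, id) => {
    if (!inSample(id, nightSeed, percent)) return;
    const newRow = target.get(id);
    const same =
      newRow !== undefined &&
      newRow.userId === oldRow.userId &&
      newRow.lessonId === oldRow.lessonId &&
      newRow.completed === oldRow.completed;
    if (!same) mismatched.push(id);
  });
  return mismatched;
}
```

The returned ids would feed a correction step before cutover. The comparison is field by field so that a silently dropped or altered column shows up as a mismatch rather than passing unnoticed.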

Notification Engine on Behavioral Principles

The new engine was built on trigger.dev workflow jobs running in background containers, isolated from the synchronous request-response cycle. Every notification was tagged with three signals: a student’s preferred delivery hour derived from engagement patterns, a relevance score based on the most recent lesson pause point, and a campaign depth counter. Students who left three notifications unread received no more for 30 days. An A/B experiment across 1,200 students found that preferred-hour sending produced a 3.4x higher return-to-platform rate, raised the email open rate from 14% to 39%, and kept the unsubscribe rate below 1%.
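The three signals combine naturally into a single send decision. The sketch below is a hedged illustration: the three-unread limit and the 30-day pause come from the description above, while the field names, the `shouldSend` function, and the 0.5 relevance threshold are assumptions.

```typescript
// Hypothetical per-student notification state.
type NotificationCandidate = {
  preferredHour: number;     // 0-23 in the student's local time
  relevanceScore: number;    // 0 to 1, higher means more relevant
  consecutiveUnread: number; // campaign depth counter
  lastSentAt: Date | null;
  unsubscribed: boolean;
};

const UNREAD_LIMIT = 3;    // from the case study
const COOL_OFF_DAYS = 30;  // from the case study
const MIN_RELEVANCE = 0.5; // assumption for illustration
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Decides whether a notification may be sent right now.
function shouldSend(c: NotificationCandidate, now: Date, localHour: number): boolean {
  if (c.unsubscribed) return false;
  if (localHour !== c.preferredHour) return false;
  if (c.relevanceScore < MIN_RELEVANCE) return false;
  if (c.consecutiveUnread >= UNREAD_LIMIT && c.lastSentAt !== null) {
    const daysSince = (now.getTime() - c.lastSentAt.getTime()) / MS_PER_DAY;
    if (daysSince < COOL_OFF_DAYS) return false;
  }
  return true;
}
```

A caller would presumably reset `consecutiveUnread` once the student opens a message or the pause elapses. That bookkeeping is omitted here.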

Results

The platform transformation ran from May 2021 to October 2022. Every pillar was measured against the diagnostic baseline established before any engineering work began.

Learning Outcome Metrics

| Metric | Before → After | Change |
| --- | --- | --- |
| Day-7 platform attrition | 40% → 8% | 80% reduction |
| Course completion rate | 7% → 51% | 629% increase |
| Median session lifetime | 47 minutes → 4.3 hours/week | Qualitatively transformed |
| Return-to-platform rate (48h of first session) | 18% → 67% | 272% increase |
| Email open rate | 14% → 39% | 179% increase |
| Email unsubscribe rate | 12% → 0.7% | 94% reduction |
| Referral conversion to first paid lesson | 11% → 48% | 336% increase |

Platform and Technical Metrics

| Metric | Before → After | Change |
| --- | --- | --- |
| Platform uptime SLA | 96.4% (no target set) → 99.97% | +357 bps |
| Median page load time (mobile 3G) | 3,800 ms → 380 ms | 90% reduction |
| P99 page load time | 12,400 ms → 1,200 ms | 90.3% reduction |
| Concurrent session capacity | 800 → 25,000+ | 3,025% increase (about 31x) |
| Infrastructure cost per active learner / month | $1.42 → $0.67 | 53% reduction |
| Monthly deploys to production | 1–2 → 28 | 14x throughput increase |
| Deployment lead time | 96 hours → 2.5 hours | 97% reduction |
| MongoDB monthly spend | $820 → $0 (decommissioned) | Eliminated |
| PostgreSQL monthly spend | $480 → $470 | -2% with 18x query performance |
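The Change column in these tables is relative change against the starting value. A minimal helper for checking such figures, with the function name chosen here for illustration, might look like this:

```typescript
// Relative change from `before` to `after`, as a percentage of `before`.
// Negative results are reductions, positive results are increases.
function percentChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Examples using rows from the tables above:
// percentChange(40, 8)       is about -80    (Day-7 attrition)
// percentChange(7, 51)       is about 628.6  (course completion)
// percentChange(3800, 380)   is about -90    (median page load)
// percentChange(1.42, 0.67)  is about -52.8  (cost per active learner)
```

Note that a move from 7% to 51% is a 44 percentage point gain but a 629% relative increase. The tables report the relative figure.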

Business and Revenue Results

Monthly recurring revenue in October 2022 was $78,000, up from $8,400 in May 2021, an 829% increase over the period. Active paid students grew from 320 to 3,800. The team closed its Series B at a $24 million valuation. A key insight came from a navigation map redesign that allowed students to see the full course path before committing to any week of content. That transparency was credited with most of the improvement in course abandonment, which in turn made word-of-mouth referrals the primary acquisition engine and let the platform sustain organic growth at a cost of acquisition close to zero for a growing share of new students. NPS moved from 16 to 62.
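The 829% figure is the total change between the two MRR values, which is a different quantity from a compound monthly rate. The sketch below shows both calculations. The function names are illustrative, and the 17-month span (May 2021 to October 2022) is an assumption about how the period is counted.

```typescript
// Total growth between two values, as a percentage of the starting value.
function totalGrowthPct(start: number, end: number): number {
  return (end / start - 1) * 100;
}

// Constant month-over-month rate that would turn `start` into `end`
// after `months` compounding periods.
function compoundMonthlyGrowthPct(start: number, end: number, months: number): number {
  return (Math.pow(end / start, 1 / months) - 1) * 100;
}

// totalGrowthPct(8400, 78000)                is about 829
// compoundMonthlyGrowthPct(8400, 78000, 17)  is roughly 14
```

Growing about 14% per month for 17 months compounds to roughly the same ninefold increase.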

Metrics

The team defined three tiers of metrics with explicit targets, alerting thresholds, and review cycles to keep the system honest throughout the project.

Tier 1 — Student Outcome SLIs (24/7 Monitoring)

Session completion rate: Percentage of students who start a lesson and complete at least 80% before closing the tab (target: 75%). Became the single most reliable leading indicator of retention.

Time to next engagement: Median hours between end of one session and start of the next (target: <36h).

Course milestone velocity: Average lessons completed per active learner in the first 14 days (target: >4 lessons).

Referral conversion rate: Paying learners who referred another paying learner in the same billing cycle (target: >15%).
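The first of these indicators is straightforward to compute from session records. The sketch below is an illustration under assumptions: it counts per session rather than per student for simplicity, and the `LessonSessionSample` shape and function name are hypothetical.

```typescript
// Hypothetical record: how much of a lesson was completed in one session.
type LessonSessionSample = { watchedSeconds: number; lessonSeconds: number };

// Share of started sessions (0 to 1) in which at least `threshold`
// of the lesson was completed before the tab was closed.
function sessionCompletionRate(
  sessions: LessonSessionSample[],
  threshold = 0.8,
): number {
  // Ignore malformed records with no lesson length.
  const started = sessions.filter((s) => s.lessonSeconds > 0);
  if (started.length === 0) return 0;
  const completed = started.filter(
    (s) => s.watchedSeconds / s.lessonSeconds >= threshold,
  );
  return completed.length / started.length;
}
```

Against the 75% target above, a result of 0.5 for a cohort would mean only half of started sessions reached the 80% mark.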

Tier 2 — Platform SLOs (Per-Service)

| Service | Error rate | Latency | Throughput |
| --- | --- | --- | --- |
| Course Content API (Next.js/ISR) | <0.05% | <200ms | 15,000 RPM |
| Lesson Session (Fargate) | <0.2% | <300ms | 8,000 RPM |
| Quiz Evaluation Service | <0.1% | <200ms | 1,200 RPM |
| Notification Engine | <1.0% (retry-tolerant) | <500ms dispatch | 5,000/hr |
| Personalization API | <0.3% | <150ms (cached) | 3,000 RPM |

Tier 3 — Operational Metrics

Infrastructure cost per 1,000 active students was tracked for a continuous downward trend. Cold start latency for Fargate services was maintained below 900ms using native init containers. Event queue backlog age was reduced from a 40-minute baseline to a consistent 30 seconds during peak windows. Database connection pool utilization was kept below 70% across all services to maintain headroom for traffic spikes.

Lessons Learned

1. Engineering Work Is Behavioral Work

The team’s most consequential mistake was treating technical problems as primarily technical. Two weeks of interviews and data synthesis saved an estimated six months of engineering effort — not by telling engineers how to build, but by telling them where not to build. Invest a fraction of your team’s time in understanding user motivations before major architectural commitments; a week of behavioral audit will redirect months of engineering work more efficiently than any roadmap rework.

2. Schema Is a Behavioral Contract

The old MongoDB schema was not just technically limited; it structurally encoded the team’s understanding of education. Don’t move from MongoDB to PostgreSQL because your NoSQL setup is technically wrong. Move because you finally understand enough about your users to need to ask questions your current schema cannot answer.

3. Session Integrity Is a UX Principle, Not a Technical Detail

IndexedDB-backed persistent session state was built because the behavioral team identified it as essential, not because it was a standard UX requirement. Any platform where users return to receive coherent state needs persistent session integrity, and that need must be connected to user-behavior problems, not just technical implementation details: ask “What does it cost the user to lose their state?” before asking “What does it cost us to build it?”

4. Notification Personalization Is Not an Email-Template Tweak

Improving the notification engine was about delivery mechanism and respect for the recipient’s attention economy, not messaging copy. Preferred-hour sending produced a 3.4x engagement lift. The change in platform architecture made it possible. Personalization requires structural support in the infrastructure stack to serve at scale.

5. Observability Should Be a Day-One System, Not a Phase-4 Addition

Six months into the rebuild, the team paused all feature work for four weeks when a two-hour production incident exposed the absence of a structured logging layer. Post-observability, the team shipped more features in less time because the relationship between engineering work and system health was visible for the first time. Don’t defer observability to a backlog.

6. Gradual Rollout Is Professional Prudence

The MongoDB-to-PostgreSQL migration succeeded not because it was a single bold cut, but because dual-write had run for 15 days, a nightly reconciliation job had validated 5% of records each night, and the cut had been rehearsed in staging three times. One missed edge case can produce hours of P0 incident time, refund and churn exposure, and reputational damage measured in weeks. The boring cut is the good cut.

Conclusion

LearnPath’s 18-month transformation is instructive not for any one specific architectural choice but for the coherence of the approach that led the team to each decision. The platform started with behavioral research, not a technology preference. Engineering choices did not cascade from a preconceived architecture; they accumulated from constraints and evidence. The behavioral layer drove the technical layer, the technical layer enabled the behavioral layer, and the team did not let those cycles fall out of alignment.

The result was a platform that was at once technically lean, architecturally coherent, educationally effective, and sustainable. Course completion grew 629% without doubling infrastructure spend. Student outcomes improved. NPS rose from 16 to 62. A team of eight engineers, working within a runway that at launch had seemed insufficient, delivered every milestone they set for themselves.

For teams navigating similar challenges — platform rebuilds with real behavioral consequences, users who disengage for reasons that no aggregate dashboard can capture — the LearnPath result demonstrates a simple truth: the most powerful architecture you can build is one that understands the human system it sits inside before it defines the technical shape of that system.

Key Takeaways For Your Next Transformation

One: Use behavioral research as a compass, not just a funnel. One week of moment-by-moment user reconstruction will redirect months of engineering work more efficiently than any roadmap rework.

Two: Schema is a behavioral contract. If your data model does not let you ask the questions you need answered, your architecture is blocking you from the insight that drives your product.

Three: Observe from day one. Building a proactive observability stack after you have a production incident costs more — in engineer attention, repair time, and trust — than building it before the incident.

Four: Notification engines are respect machines. Build personal delivery into the architecture before you scale it. Deliver when your user chooses, not when your scheduler decides.

Five: The boring cut is the good cut. Dual-write. Reconcile every night. Rehearse until rollover is unremarkable.

Six: Learning engagement and infrastructure cost can move together. Architectural simplification combined with better learner outcomes made every dollar go further — a result worth building for.
