Webskyne

17 June 2026 • 5 min read

Zero-Downtime Migration: How We Rebuilt a 200K-User Fintech Platform on NestJS, Flutter & AWS

When our client's monolith reached 200,000 active users, every deployment turned into a five-hour risk event. This case study documents how we architected a zero-downtime migration path, broke the monolith into independent NestJS services, rebuilt the mobile experience in Flutter, and absorbed peak traffic without unplanned downtime. The result: 40% faster deployments, 99.98% uptime during the migration window, and a platform ready for five times the scale.

Tags: Case Study, NestJS, Flutter, AWS, Microservices, Zero-Downtime Migration, Fintech, Architecture

Overview

In early 2025, a payments platform processing transactions across India and Southeast Asia was quietly becoming a victim of its own success. It had 200,000 active users and transaction throughput had tripled year-over-year, but the underlying architecture was still a single Rails monolith backed by a tightly coupled SQL database layer. Deployments that had once been fast now ended in post-release firefights. Engineering leadership brought in our team to design a migration path that avoided a full rewrite, a 'big bang' cutover, and, most dangerously, downtime during high-traffic windows.

Figure: Engineering team collaborating on migration architecture. Deciding the right decomposition strategy required architects, backend, and mobile teams to work from a shared contract model.

Challenge

The monolith served three primary concerns: payment processing, customer identity, and notifications. Each was owned by a separate team, but they all deployed as one artifact. A bug in the notification module meant a full pipeline halt. Database connection pools were exhausted during flash sales. Latency on the mobile app spiked because synchronous blocking calls inside the monolith serialized parallel requests across unrelated features.

Management demanded two things that seemed in tension: ship new features without regressions, and don't touch the production system directly. Investors were watching this period closely. The compliance team added another constraint: every schema migration required a weekend window and three sign-offs.

Goals

  • Zero unplanned downtime throughout the migration, including cutover weekends.
  • Independent deployment pipelines for payments, identity, and notifications.
  • Latency reduction of 30% on mobile checkout flows.
  • Database decoupling so teams could own their schemas without broadcast approvals.
  • Mobile parity: feature-complete transition to Flutter while maintaining the existing React Native app.

Approach

We adopted an incremental strangler-fig pattern. Instead of rewriting the monolith, we intercepted traffic at the API gateway and routed feature-specific requests to new NestJS services. Each bounded context was extracted with its own PostgreSQL instance, its own caching layer backed by Redis, and its own event contract published to Kafka.
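The interception step can be pictured as a routing table that grows as each bounded context is extracted. The following is a minimal TypeScript sketch of that idea, not the gateway configuration used on the project; the path prefixes and upstream names are hypothetical.

```typescript
// Hypothetical upstream names: the article does not list real hosts or paths.
type Upstream = "monolith" | "identity-service" | "payments-service" | "notifications-service";

interface Route {
  prefix: string;     // request path prefix claimed by an extracted service
  upstream: Upstream; // where matching requests are sent
}

// Routes are added one bounded context at a time as services are extracted.
const extractedRoutes: Route[] = [
  { prefix: "/api/identity", upstream: "identity-service" },
  { prefix: "/api/payments", upstream: "payments-service" },
];

// Longest-prefix match; anything not yet extracted falls through to the monolith.
function resolveUpstream(path: string, routes: Route[] = extractedRoutes): Upstream {
  let best: Route | undefined;
  for (const route of routes) {
    // Match the prefix exactly or on a path-segment boundary only.
    const matches = path === route.prefix || path.indexOf(route.prefix + "/") === 0;
    if (matches && (best === undefined || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best ? best.upstream : "monolith";
}

const kycTarget = resolveUpstream("/api/identity/kyc/status");   // "identity-service"
const notifyTarget = resolveUpstream("/api/notifications/send"); // "monolith" (not extracted yet)
```

Because the monolith is the default, removing a route is enough to send that traffic back to the old code path.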

Figure: Dashboard showing migration metrics and service health. The migration observability dashboard tracked error rates and latency per service in real time.

For mobile, we migrated the checkout and KYC flows first—the highest-value, highest-friction areas—and shipped the Flutter rewrite behind a remote-config flag. The React Native client remained live for users outside the rollout percentage, giving us instant rollback capability.

Implementation

Phase 1: Strangling the Monolith

We introduced an Envoy-based API gateway and began routing new feature requests to a NestJS identity service. The gateway used header-based canary routing, sending 5% of requests to the new service and observing error rates before increasing load. This gave engineering confidence to ship changes without touching the monolith's deploy pipeline.
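The "observe error rates before increasing load" step amounts to a promotion gate. The sketch below shows one way such a gate could be written in TypeScript. Only the 5% starting weight comes from the project; the later steps, the 1% error threshold, and the minimum sample size are assumptions, and the actual traffic split would be applied in the gateway rather than in application code.

```typescript
// Assumed values: the article only states the 5% starting weight.
const CANARY_STEPS = [5, 25, 50, 100]; // percent of traffic sent to the new service
const MAX_ERROR_RATE = 0.01;           // hypothetical promotion threshold (1%)

interface CanaryObservation {
  requests: number; // requests the canary served in the observation window
  errors: number;   // how many of those failed (for example, 5xx responses)
}

// Returns the next traffic weight: advance one step if healthy, drop to 0 if not,
// and hold the current weight when there is too little traffic to judge.
function nextCanaryWeight(
  current: number,
  observed: CanaryObservation,
  minRequests = 1000,
): number {
  if (observed.requests < minRequests) return current;
  const errorRate = observed.errors / observed.requests;
  if (errorRate > MAX_ERROR_RATE) return 0; // roll back: all traffic returns to the monolith
  const index = CANARY_STEPS.indexOf(current);
  if (index === -1 || index === CANARY_STEPS.length - 1) return current;
  return CANARY_STEPS[index + 1];
}

const promoted = nextCanaryWeight(5, { requests: 2000, errors: 4 });    // 25
const rolledBack = nextCanaryWeight(5, { requests: 2000, errors: 40 }); // 0
```

Holding the weight when the sample is small avoids promoting or rolling back on noise.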

Phase 2: Event-Coupled Payments

The payments service consumed existing order events and emitted new settlement events. Because the monolith still wrote to the same PostgreSQL tables during cutover, we ran both systems in parallel for a full billing cycle, comparing balances and settlement records nightly. Any mismatch triggered a PagerDuty alert to the finance team.
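A nightly comparison of this kind boils down to a set difference over settlement records. Here is a minimal TypeScript sketch, assuming a hypothetical record shape with a `settlementId` and an integer amount in minor units; the real job would read from both data stores and page on any non-empty result.

```typescript
// Hypothetical record shape; amounts are integer minor units to avoid float drift.
interface SettlementRecord {
  settlementId: string;
  amountMinor: number;
}

interface Mismatch {
  settlementId: string;
  monolithAmount: number | null; // null means the record is missing on that side
  serviceAmount: number | null;
}

function indexById(records: SettlementRecord[]): Record<string, number> {
  const byId: Record<string, number> = Object.create(null);
  for (const record of records) byId[record.settlementId] = record.amountMinor;
  return byId;
}

// Compares both systems' settlements and returns every disagreement, sorted by id.
function reconcile(monolith: SettlementRecord[], service: SettlementRecord[]): Mismatch[] {
  const left = indexById(monolith);
  const right = indexById(service);
  const seen: Record<string, true> = Object.create(null);
  for (const id of Object.keys(left).concat(Object.keys(right))) seen[id] = true;

  const mismatches: Mismatch[] = [];
  for (const id of Object.keys(seen).sort()) {
    const monolithAmount = id in left ? left[id] : null;
    const serviceAmount = id in right ? right[id] : null;
    if (monolithAmount !== serviceAmount) {
      mismatches.push({ settlementId: id, monolithAmount, serviceAmount });
    }
  }
  return mismatches;
}

const mismatches = reconcile(
  [{ settlementId: "s-1", amountMinor: 1000 }, { settlementId: "s-2", amountMinor: 2500 }],
  [{ settlementId: "s-1", amountMinor: 1000 }, { settlementId: "s-2", amountMinor: 2400 }],
);
// mismatches contains one entry, for "s-2" (2500 vs 2400)
```

Reporting records that are missing on either side, not only differing amounts, matters because a dropped event is as much a ledger error as a wrong total.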

Phase 3: Flutter Mobile Takeover

The Flutter application was built using a shared kernel module for cryptographic operations and network clients. This allowed the same code to be unit-tested across mobile and a soon-to-be-released tablet kiosk experience. We used feature flags from LaunchDarkly to enable tightly controlled rollout segments: 5% of users, then 25%, then 100%.
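Staged percentages like these rely on each user landing in a stable bucket, so that someone in the 5% segment stays included at 25% and 100%. The TypeScript sketch below illustrates that concept with a simple hash. It is not LaunchDarkly's algorithm or SDK API, and the flag key and user IDs are hypothetical.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units, written with shifts so the
// multiplication by the FNV prime stays within 32-bit integer arithmetic.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash;
}

// Maps a user to a stable bucket in [0, 100). The flag key is mixed in so that
// different flags do not always select the same users.
function bucketFor(userId: string, flagKey: string): number {
  return fnv1a(flagKey + ":" + userId) % 100;
}

// A user is included when their bucket falls below the rollout percentage.
function isInRollout(userId: string, flagKey: string, percentage: number): boolean {
  return bucketFor(userId, flagKey) < percentage;
}

const FLAG = "flutter-checkout"; // hypothetical flag key
const everyoneAtFull = isInRollout("user-42", FLAG, 100); // true: every bucket is below 100
const nobodyAtZero = isInRollout("user-42", FLAG, 0);     // false: no bucket is below 0
```

Because the bucket depends only on the user and the flag, raising the percentage only ever adds users, and lowering it to zero acts as the rollback switch.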

Phase 4: Observability & Runbooks

Every NestJS service emitted structured logs to OpenSearch, metrics to Prometheus, and distributed traces to Tempo. We built migration-specific dashboards showing database replication lag between primary and read replicas, schema-migration drift detection, and mobile APM comparisons between React Native and Flutter builds.
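"Structured" here means one machine-parseable record per event, with a trace identifier so a log line can be joined to its distributed trace. A minimal TypeScript sketch follows; the field names are assumptions, since the log schema is not specified here, and a real service would hand this to its logging library rather than build strings by hand.

```typescript
// Hypothetical field names: the article does not specify a log schema.
interface LogFields {
  service: string;
  level: "info" | "warn" | "error";
  message: string;
  traceId?: string; // lets a log line be joined to its distributed trace
  [extra: string]: unknown;
}

// Serializes one event as a single JSON line with an ISO-8601 timestamp.
function formatLogLine(fields: LogFields, now: Date = new Date()): string {
  return JSON.stringify({ timestamp: now.toISOString(), ...fields });
}

const line = formatLogLine(
  { service: "payments", level: "error", message: "settlement mismatch", traceId: "abc123" },
  new Date(0),
);
// line is a JSON object whose timestamp is "1970-01-01T00:00:00.000Z"
```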

Results

The migration completed in 14 weeks with no user-facing incidents. Deployment frequency went from twice per month to twice per day for independent services, and mean time to recover dropped from 90 minutes to under 12 minutes. Mobile checkout latency improved by 38%, well past the original 30% target. The new Flutter app received a 4.8-star rating on iOS and 4.7 on Android, with support tickets related to checkout dropping by 22%.

Metrics

  • Uptime: 99.98% during migration window (vs. 99.75% baseline)
  • Deployment frequency: 2x per month → 2x per day per service
  • Mobile latency: 38% reduction on checkout flow
  • Support tickets: 22% fewer checkout-related tickets after the full Flutter rollout
  • Schema approval lead time: Reduced from 5 days to < 2 hours for independent services
  • Transaction throughput: 3x increase handled without provisioning new database instances
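To put the two uptime figures in perspective, the percentages can be converted into minutes of unavailability. The TypeScript sketch below assumes both figures are measured over the same 14-week window, which the metrics do not state, so the baseline number is illustrative only.

```typescript
// Minutes of unavailability implied by an uptime percentage over a window.
function downtimeMinutes(uptimePercent: number, windowDays: number): number {
  const windowMinutes = windowDays * 24 * 60;
  return ((100 - uptimePercent) / 100) * windowMinutes;
}

// Rounds to one decimal place for reporting.
function round1(value: number): number {
  return Math.round(value * 10) / 10;
}

const MIGRATION_DAYS = 14 * 7; // 14 weeks, from the Results section

// Assumption: both percentages apply to the same 98-day window.
const duringMigration = round1(downtimeMinutes(99.98, MIGRATION_DAYS)); // about 28.2 minutes
const atBaseline = round1(downtimeMinutes(99.75, MIGRATION_DAYS));      // about 352.8 minutes
```

Under that assumption, the gap between 99.75% and 99.98% is roughly the difference between six hours and half an hour of unavailability over the migration period.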

Lessons Learned

1. Strangler fig beats rewrite. Attempting to rebuild the entire platform from scratch would have introduced unknown unknowns. Incremental replacement let us validate business logic at each stage.

2. Mobile migration needs the same rigor as backend. We initially underestimated the Flutter effort because we treated it as another UI layer. Treating shared kernel modules as first-class engineering artifacts saved us weeks of rework when deadlines shifted.

3. Observability is a feature. Building bespoke dashboards before the first traffic shift meant we could spot schema drift before our finance team did. Transparency builds trust during migration fatigue.

4. Parallel-run is non-negotiable for payments. Ledger correctness cannot be compromised. Running both monolith and service side-by-side for a full billing cycle protected us from silent data corruption.

5. Design for rollback before design for rollout. Remote config flags, canary percentages, and feature gates let us recover in minutes instead of days. Time spent on rollback design is insurance that pays out.
