From Legacy Systems to Modern Commerce: How Webskyne Engineered a 10M+ User Logistics Platform
When a major logistics provider approached Webskyne in early 2024, their monolithic dispatch system was buckling under 5x traffic growth and frequent outages during festival seasons. Over the next eight months, we architected and delivered a distributed, mobile-first logistics platform using Flutter, Next.js, NestJS, and AWS — cutting API latency by 62%, eliminating major outages, and enabling real-time driver tracking for over ten million users. This case study walks through the full journey: from the messy legacy audit and ambitious goal-setting, through our phased implementation strategy, to the measurable business outcomes and the hard-won engineering lessons that now inform every major engagement we take on.
## Overview
In January 2024, Webskyne was brought in by a mid-sized logistics provider serving South and Southeast Asia to rethink their core customer-facing technology stack. The incumbent system was a five-year-old monolith running on bare metal, with two legacy mobile apps that had not been updated in eighteen months, and a backend API layer that became increasingly brittle during peak demand windows. The client’s business was simultaneously expanding — driven by cross-border e-commerce partnerships and a growing network of last-mile delivery partners — which meant the existing architecture could no longer be incrementally patched.
Our mandate was clear: design and deliver a modern, scalable, and maintainable logistics platform that could support growing user demand, real-time tracking, and a seamless omnichannel experience, without the recurring outages that had plagued festive seasons.
---
## Challenge
The client was experiencing a growing disconnect between what their business needed and what their technology could reliably provide. Customer support tickets related to tracking delays and failed payments had doubled year-over-year, and the engineering team was spending more than half of its sprint capacity on firefighting rather than feature delivery.
Specifically, the system suffered from three foundational issues:
1. **Monolithic Architecture Bottlenecks.** A single Java monolith handled API routing, order management, driver dispatch, and payment processing in one process. During login peaks, every module contended for the same database, which became the single hotspot. Any code change risked bringing down adjacent modules.
2. **Fragile Mobile Experience.** Two separate native codebases, one for customers and one for delivery partners, made it difficult to roll out consistency fixes, bug patches, or design updates in parallel. An update to one app often lagged the other by weeks, creating divergent feature sets.
3. **Observability and Incident Response Gaps.** Logging was siloed across on-prem databases, RabbitMQ, and a third-party SMS gateway. There was no centralized tracing, so root-cause analysis for customer-visible issues routinely took hours rather than minutes.
In addition, the business had aggressive expansion targets: entering three new countries in twelve months, integrating with two new e-commerce marketplaces, and supporting 10 million monthly active users by the end of 2025.
---
## Goals
We established clear, measurable goals at kickoff to keep the project accountable to both business and technical outcomes.
- **Scalability:** Support 10 million monthly active users and 5x peak transactional throughput without degradation in P95 API latency.
- **Reliability:** Achieve 99.95% availability across the platform during high-traffic windows (festive seasons handled as a dedicated load scenario).
- **Omnichannel Consistency:** Launch feature-parity customer and driver apps within two weeks of each backend release.
- **Operational Excellence:** Reduce mean time to recovery (MTTR) for critical incidents from hours to under fifteen minutes.
- **Cost Discipline:** Keep infrastructure cost growth sub-linear in user volume, targeting no more than a 1.2x cost increase for every 2x traffic increase.
These goals gave us non-negotiable guardrails. Any architectural decision that would compromise one of these areas automatically triggered a review meeting with the client’s technical leadership.
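The availability and cost targets above translate into concrete budgets. The following is a minimal sketch of that arithmetic. The function names and the 30-day window are illustrative assumptions, not part of the engagement's tooling.

```typescript
// Illustrative helpers only: names and the default 30-day window are assumptions.

/** Minutes of allowed downtime for an availability target over a window of days. */
function downtimeBudgetMinutes(availability: number, windowDays: number = 30): number {
  const totalMinutes = windowDays * 24 * 60;
  return totalMinutes * (1 - availability);
}

/**
 * Checks the cost-discipline goal: at most 1.2x cost for every 2x traffic.
 * Generalised to any growth factor via allowed = trafficGrowth ^ log2(1.2).
 */
function withinCostTarget(trafficGrowth: number, costGrowth: number): boolean {
  const allowedExponent = Math.log2(1.2); // roughly 0.263, i.e. sub-linear growth
  const allowedCostGrowth = Math.pow(trafficGrowth, allowedExponent);
  return costGrowth <= allowedCostGrowth + 1e-9; // small tolerance for float error
}
```

Under these assumptions, a 99.95% target leaves about 21.6 minutes of downtime per 30 days, and a 4x traffic increase may cost at most about 1.44x.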
---
## Approach
Rather than a big-bang rewrite, we proposed a phased strangler-fig migration. This allowed the client to keep running their existing monolith while new services were validated in production.
**Phase 1: Architecture and Platforming.** We conducted a full architectural audit, defined bounded contexts, and stood up the foundational infrastructure on AWS. Our team worked closely with the client’s senior engineers to understand domain logic deeply and avoid surprising them with a black-box replacement.
**Phase 2: Core Services Migration.** We extracted the first bounded context, real-time driver tracking and dispatch, into a new NestJS microservice backed by PostgreSQL and Redis, running on AWS ECS with autoscaling policies driven by CloudWatch metrics.
**Phase 3: Mobile Parity and UX Refresh.** Using Flutter, we delivered a single codebase that produced both the customer and driver apps. We adopted a shared design system to ensure UI consistency and built feature toggles so newer modules could be rolled out independently.
**Phase 4: Observability and Gradual Cutover.** We instrumented every service with OpenTelemetry, set up centralized dashboards in Grafana, and performed a gradual traffic shift using feature flags and canary deployments. This ensured the client’s operations team built confidence before 100% cutover.
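One common way to implement a gradual traffic shift of this kind is deterministic percentage bucketing: each user hashes to a stable bucket, and the rollout percentage decides which backend serves them. The sketch below illustrates the idea. The backend labels, function names, and the simple polynomial hash are assumptions for illustration, not the production routing logic.

```typescript
// Hypothetical sketch of percentage-based cutover between old and new backends.

type CutoverBackend = "legacy-monolith" | "new-service";

/** Simple polynomial string hash (unsigned 32-bit); stable across processes. */
function stableHash(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

/** Maps a user to a bucket in the range [0, 100). */
function rolloutBucket(userId: string): number {
  return stableHash(userId) % 100;
}

/** Sends the user to the new service once the rollout percentage covers their bucket. */
function chooseBackend(userId: string, rolloutPercent: number): CutoverBackend {
  return rolloutBucket(userId) < rolloutPercent ? "new-service" : "legacy-monolith";
}
```

Because the bucket depends only on the user ID, a user who lands on the new service at 5% stays there as the percentage rises, which keeps each user's experience consistent during the cutover.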
---
## Implementation
The implementation phase was where theory met production reality. Let’s walk through the key technical decisions and how we executed them.
### Backend: NestJS Microservices
We chose NestJS for its opinionated, TypeScript-first structure and mature support for microservice patterns, including Kafka integration, distributed tracing, and built-in validation. The driver-tracking service became the first bounded context we extracted.
We separated commands from queries using a lightweight CQRS pattern. Write paths (location updates, dispatch events) went through a Kafka pipeline, keeping the primary PostgreSQL cluster from becoming a hotspot during peak load. Read paths were served from Redis-backed caches with short TTLs tuned to the expected frequency of driver movement.
Amazon ElastiCache (Redis) handled session state, rate limiting counters, and a 30-second location cache. AWS RDS (PostgreSQL) remained the system of record for orders, payments, and driver metadata.
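The read path described above follows a read-through pattern: serve from the cache while an entry is fresh, and fall back to the system of record on a miss. Here is a minimal sketch that uses an in-memory map as a stand-in for Redis. The class, type, and function names are hypothetical, and a real service would use an asynchronous Redis client rather than a synchronous loader.

```typescript
// In-memory stand-in for the Redis location cache; all names are illustrative.

interface DriverPosition {
  driverId: string;
  lat: number;
  lng: number;
}

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

/** Read-through lookup: cache hit if fresh, otherwise load and repopulate. */
function readDriverPosition(
  cache: TtlCache<DriverPosition>,
  driverId: string,
  loadFromStore: (id: string) => DriverPosition,
): DriverPosition {
  const cached = cache.get(driverId);
  if (cached !== undefined) return cached;
  const fresh = loadFromStore(driverId);
  cache.set(driverId, fresh);
  return fresh;
}
```

With a 30-second TTL (`new TtlCache<DriverPosition>(30000)`), repeated reads for the same driver within that window never reach the database.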

### Mobile: Flutter Single Codebase
We delivered two applications — ShipTrack for customers and DriveMate for delivery partners — from a single Flutter repository using shared business logic packages and platform-specific wrapper modules.
Customer-facing features included real-time order tracking via Mapbox, push notifications using Firebase Cloud Messaging, and an in-app wallet with transaction history. The driver app introduced route optimization suggestions using Google’s Directions API, a simplified two-step OTP-based delivery verification, and offline-first synchronization so drivers could capture evidence and scan parcels even when connectivity was intermittent.
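Offline-first synchronization is often built around an outbox: events are queued locally and flushed in order once connectivity returns. The sketch below shows that idea in TypeScript to match the backend stack, although the production apps are written in Flutter. The event shape and class name are hypothetical.

```typescript
// Hypothetical outbox sketch; the real driver app is written in Flutter/Dart.

interface DeliveryEvent {
  id: string;
  type: "scan" | "proof-photo";
  payload: string;
}

class OfflineOutbox {
  private pending: DeliveryEvent[] = [];

  /** Records an event locally, regardless of connectivity. */
  enqueue(event: DeliveryEvent): void {
    this.pending.push(event);
  }

  /**
   * Tries to send queued events oldest-first. Stops at the first failure so
   * ordering is preserved, and returns how many events were sent.
   */
  flush(send: (event: DeliveryEvent) => boolean): number {
    let sent = 0;
    while (this.pending.length > 0) {
      const next = this.pending[0];
      if (next === undefined || !send(next)) break;
      this.pending.shift();
      sent++;
    }
    return sent;
  }

  size(): number {
    return this.pending.length;
  }
}
```

Stopping at the first failed send keeps events in capture order, which matters when a parcel scan must reach the server before its proof-of-delivery photo.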
We defined a feature toggle layer using a custom RemoteConfig implementation. This allowed us to deploy updates to both platforms simultaneously but enable features for only a subset of geographies or user segments during rollout.
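A toggle layer like this typically evaluates each flag against the user's context. The following sketch shows one possible rule shape for geography and segment targeting. It is written in TypeScript for illustration, and the type names, field names, and matching semantics are assumptions rather than the actual RemoteConfig schema.

```typescript
// Illustrative flag evaluation; the schema here is an assumption.

interface RolloutRule {
  countries?: string[]; // omitted means "any country"
  segments?: string[];  // omitted means "any segment"
}

interface FeatureToggle {
  key: string;
  enabled: boolean;     // global kill switch
  rules: RolloutRule[]; // empty means "everyone" when enabled
}

interface UserContext {
  country: string;
  segment: string;
}

/** A flag is on when enabled globally and at least one rule matches the user. */
function isFeatureEnabled(toggle: FeatureToggle, user: UserContext): boolean {
  if (!toggle.enabled) return false;
  if (toggle.rules.length === 0) return true;
  return toggle.rules.some(
    (rule) =>
      (rule.countries === undefined || rule.countries.includes(user.country)) &&
      (rule.segments === undefined || rule.segments.includes(user.segment)),
  );
}
```

Keeping a global `enabled` switch separate from the targeting rules means a misbehaving feature can be turned off everywhere without editing its rollout rules.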
### Frontend: Next.js Dashboard
The operations dashboard for fleet managers and support teams was built on Next.js 14 with the App Router. Server components rendered frequently accessed fleet metrics to reduce client hydration costs, while API routes acted as a thin gateway between the frontend and NestJS backend services.
We implemented role-based access using Casbin policies at the API gateway level and incorporated SSO via the client’s existing Azure AD tenant. This ensured the dashboard was secure and familiar for internal users.
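Casbin's basic model expresses access rules as subject, object, and action tuples. The sketch below mimics that shape in plain TypeScript rather than calling Casbin itself, so that it stays self-contained. The role names, resource names, and the `*` wildcard convention are hypothetical.

```typescript
// Plain-TypeScript illustration of subject/object/action policies.
// This does not use the Casbin library; names and rules are made up.

interface AccessPolicy {
  role: string;
  resource: string;
  action: string; // "*" is treated as "any action" in this sketch
}

const dashboardPolicies: AccessPolicy[] = [
  { role: "fleet-manager", resource: "fleet-metrics", action: "read" },
  { role: "support-agent", resource: "tickets", action: "*" },
];

/** Default-deny check: access is granted only if some policy matches. */
function isAllowed(
  policies: AccessPolicy[],
  role: string,
  resource: string,
  action: string,
): boolean {
  return policies.some(
    (p) =>
      p.role === role &&
      p.resource === resource &&
      (p.action === "*" || p.action === action),
  );
}
```

Enforcing the check at the gateway, as the text describes, means individual backend services do not each need their own copy of the policy table.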
### Infrastructure and DevOps
We provisioned the environment using Terraform modules stored in the client’s private registry. ECS Fargate was chosen for the NestJS services to avoid patching overhead, and ALB with path-based routing directed traffic to the appropriate service.
CI/CD pipelines were built on GitHub Actions. Every merge to main triggered linting, unit tests, integration tests, image builds, and staged rollouts to non-production environments. The client’s team inherited full ownership of these pipelines within the first month of engagement.
We also introduced chaos testing of critical paths in staging using Gremlin. Although this sounded aggressive at first, the client’s operations team later described it as the catalyst for their increased confidence in production.
---
## Results
By the end of the eight-month engagement, the platform had been fully migrated and was serving production traffic for the client’s entire user base.
Customer satisfaction scores rose from an average of 3.1 out of 5 to 4.6 out of 5 within the first quarter of launch, driven largely by the elimination of tracking delays and the introduction of proactive delivery notifications. The client’s operations team reported a 70% reduction in escalation tickets related to tracking or payment failures.
Engineers who had expected roughly six months of remediation work before they could deliver new features instead found themselves shipping customer-requested modules in weeks rather than months. Team morale, measured in anonymous quarterly surveys, improved markedly: 82% of engineers reported that they found their work “meaningful and impactful”, compared with 31% the previous year.
On the business side, the platform’s reliability during the festive season — historically the most problematic period — became a marketing asset. The client used their near-perfect uptime record to close two new e-commerce marketplace integrations earlier than expected.
---
## Metrics
Here is a quantitative view of the improvements we observed during and after the migration.
| Metric | Baseline (Pre-Engagement) | Post-Launch | Change |
|--------|---------------------------|-------------|--------|
| Peak API P95 Latency | 820ms | 310ms | -62% |
| Monthly Active Users Supported | 2.1M | 11.4M | +443% |
| Platform Availability | 99.2% | 99.97% | +0.77pp |
| Critical Incidents (Per Quarter) | 6 | 1 | -83% |
| MTTR | 4.2 hours | 11 minutes | -96% |
| Infrastructure Cost per 1000 deliveries | $14.20 | $9.80 | -31% |
These numbers are not merely vanity metrics. They represent real money saved, real customer trust gained, and real engineering time reclaimed from unplanned work.
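For readers who want to check the Change column, the figures follow from a simple relative-change calculation on the table's own baseline and post-launch values. The helper names below are illustrative.

```typescript
// Reproduces the "Change" column from the table's baseline and post-launch values.

/** Relative change in percent, rounded to the nearest whole number. */
function percentChange(baseline: number, postLaunch: number): number {
  return Math.round(((postLaunch - baseline) / baseline) * 100);
}

/** Difference between two percentages, in percentage points (2 decimals). */
function percentagePointDelta(baseline: number, postLaunch: number): number {
  return Math.round((postLaunch - baseline) * 100) / 100;
}

const latencyChange = percentChange(820, 310);            // -62
const mauChange = percentChange(2.1, 11.4);               // 443
const incidentChange = percentChange(6, 1);               // -83
const mttrChange = percentChange(4.2 * 60, 11);           // -96 (hours converted to minutes)
const costChange = percentChange(14.2, 9.8);              // -31
const availabilityDelta = percentagePointDelta(99.2, 99.97); // 0.77
```

Availability is reported in percentage points rather than as a relative change, because a relative change between two values near 100% would understate the reduction in downtime.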
---
## Implementation Timeline
To give a sense of pace, the project unfolded across six production milestones:
**Weeks 1–4:** Discovery, domain modeling, and environment setup. We interviewed every stakeholder and mapped seven critical user journeys.
**Weeks 5–10:** First NestJS service (driver tracking) released to internal beta with 5% production traffic.
**Weeks 11–18:** Flutter mobile apps launched to beta customers, feature toggle layer deployed, and Redis caching tuned for peak load.
**Weeks 19–26:** Dashboard rolled out to operations teams, autoscaling policies pressure-tested, and observability stack fully instrumented.
**Weeks 27–32:** Phased production cutover completed. 100% traffic routed to new backend services. Monolith decommissioned.
**Weeks 33–36:** Post-launch optimization, knowledge transfer workshops, and client handover with ongoing L3 support available on standby.
---
## Lessons
This engagement surfaced several lessons that have since shaped how Webskyne approaches large-scale platform rebuilds.
**1. Strangler Fig Beats Big Bang.** Incremental extraction of bounded contexts protected the client’s revenue stream and gave the engineering team confidence that any failure could be contained. We would not choose a big-bang rewrite under similar constraints.
**2. Shared Codebases Reduce Technical Debt Faster Than Documentation.** Moving the client’s two native apps to a shared Flutter codebase eliminated an entire category of inconsistencies. Documentation helps, but aligned code paths enforced parity automatically.
**3. Observability Is Not a Post-Launch Phase.** Investing in OpenTelemetry, structured logging, and dashboards from day one transformed what could have been a chaotic week of debugging into a controlled rollout. Instrumentation is infrastructure, not overhead.
**4. Introduce Chaos Early.** Gremlin testing in staging exposed single points of failure that manual reviews had missed. A controlled blast radius is far preferable to an uncontrolled one in production.
**5. Knowledge Transfer Begins on Day One.** Our engineers paired with the client’s team from week one, not week thirty-six. By the time the engagement concluded, the client’s in-house engineers were fully capable of operating, scaling, and extending the new platform without our direct involvement.
---
## Conclusion
The success of this logistics platform stands as a testament to what is possible when strategic patience, rigorous engineering, and close client collaboration come together. We did not simply deliver a new system — we delivered a more resilient operation, a more engaged engineering team, and a stronger competitive position for our client in an increasingly demanding market.
For organizations facing similar modernization challenges, the most important lesson may be this: the question is rarely whether to modernize, but how to do it in a way that safeguards the business while unlocking new possibilities. Webskyne continues to refine this approach on every engagement, and we are eager to help your team navigate the same transformation.