How Webskyne Reduced API Latency by 62%: A Complete AWS-NestJS Migration Case Study

When a mid-size SaaS platform struggled with 800ms+ API response times and escalating infrastructure costs, Webskyne led a backend migration from a monolithic Express.js architecture to a NestJS microservices stack on AWS. This case study walks through the discovery process, architectural decisions, implementation challenges, and measurable results: a 62% reduction in p95 latency and a 41% cut in monthly infrastructure spend.

## Overview

In early 2025, a rapidly growing SaaS platform serving 120,000+ monthly active users found itself at a critical inflection point. Their legacy Express.js monolith, which had served faithfully since launch, was now a performance bottleneck and operational liability. API response times had crept up to 800+ milliseconds during peak hours, database queries were timing out under moderate load, and the team's ability to ship new features had slowed to a quarterly cadence.

Webskyne was engaged to conduct a full architectural audit and lead the migration to a modern, scalable stack. Over six months, we redesigned the backend from a single monolithic Express server into a modular NestJS microservices architecture deployed across AWS ECS, RDS, and CloudFront. The transformation resolved the immediate performance crisis while establishing a foundation for the next phase of growth.

## The Challenge

The client's technical debt had accumulated over four years of rapid feature development. Six engineers worked on the same codebase, with overlapping logic, unclear service boundaries, and minimal automated testing. The production database had grown to 2.3 terabytes without a formal archiving strategy. Calls between nominally separate modules ran in the same process, so a slowdown or failure in one module could cascade unpredictably into others during traffic spikes.

Compounding the architecture issues were operational challenges. Deployments required coordinated downtime windows. Rollbacks were risky and time-consuming. Monitoring was limited to basic server metrics: no distributed tracing, no structured logging, no alerting on business-critical endpoints. The team had visibility into whether the server was running, but not into whether the business was functioning correctly.

The operational cost trajectory was unsustainable.
The client was paying roughly $12,000 monthly for a single large EC2 instance and an oversized RDS cluster that was severely underutilized during off-peak hours yet saturated during peak periods. The fixed-capacity setup pushed the team toward overprovisioning; there was no cost-aware auto-scaling strategy.

## Project Goals

We established four primary goals at the project kickoff, each with specific, measurable success criteria.

**Performance:** Reduce p95 API latency from 800ms to under 300ms across all core endpoints. This target was based on industry benchmarks for similar SaaS workloads and represented a realistic but ambitious improvement.

**Scalability:** Achieve horizontal scalability so that the system could handle a 3x traffic increase without requiring proportional infrastructure cost increases. The architecture needed to support independent scaling of high-traffic and low-traffic services.

**Developer Experience:** Reduce deployment time from a coordinated three-hour window to under fifteen minutes for standard changes. Introduce a full CI/CD pipeline with automated testing, and reduce the bug escape rate to production by 80%.

**Cost Efficiency:** Reduce monthly infrastructure spend by at least 30% through right-sizing, auto-scaling, and better resource utilization. Target: $8,000 monthly run rate within twelve weeks of stabilization.

## Our Approach

Webskyne's methodology began with a two-week discovery phase. We instrumented the existing system with APM tools, captured real production traffic patterns, and mapped every inter-module communication path. We identified that 60% of requests were hitting a relatively small set of hot database rows, while another 25% involved computationally expensive operations that were blocking the event loop in the Express app.

Our technical strategy centered on three pillars: modularization, observability, and cloud-native deployment patterns.
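
To make the p95 target from the project goals concrete, here is a minimal sketch of how a p95 figure can be derived from raw latency samples using the nearest-rank method. The `percentile` helper and the sample values are illustrative assumptions, not code or data from the client's system; APM tools normally compute percentiles for you, often with approximations.

```typescript
// Nearest-rank percentile over a list of latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank of the smallest value that covers p percent of the samples.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical request latencies in milliseconds (made up for illustration).
const latenciesMs = [
  120, 95, 310, 180, 240, 870, 150, 205, 99, 410,
  133, 176, 290, 640, 110, 160, 225, 198, 142, 530,
];

const p95 = percentile(latenciesMs, 95);
const meetsTarget = p95 < 300; // the 300 ms goal stated above

console.log(`p95 = ${p95} ms, meets target: ${meetsTarget}`);
```

With these made-up samples the mean is about 264 ms while the p95 is 640 ms: a handful of slow requests dominates the tail, which is one reason latency goals are commonly stated as percentiles rather than averages.
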

We chose NestJS as the framework because of its opinionated architecture, built-in dependency injection, and strong support for microservices over transports such as TCP and Redis. TypeScript's strong typing aligned with the client's need for long-term maintainability, and NestJS's modular, decorator-based approach made gradual migration possible: new features could be built in NestJS alongside legacy endpoints without a big-bang cutover.

For data persistence, we introduced a read-replica strategy. Write operations remained on the primary RDS instance, but read-intensive queries were routed to a smaller replica instance. We implemented a cache layer using Redis for frequently accessed reference data, reducing database load by an estimated 40% within the first month.

On the deployment side, we containerized each service using Docker and orchestrated them via AWS ECS with the Fargate launch type. This eliminated the need to manage EC2 instances directly and allowed per-service auto-scaling based on CloudWatch metrics. CloudFront served as the CDN and, with WAF rules attached, as the first line of defense against common attack vectors while accelerating static asset delivery.

## Implementation

### Phase 1: Foundation and CI/CD (Weeks 1-3)

The first phase established the scaffolding that made everything else possible. We set up a multi-environment deployment pipeline using GitHub Actions: each pull request triggered a lightweight test suite in a staging environment, and merges to main were promoted to production. Secrets were managed via AWS Secrets Manager with automatic rotation for database credentials.

We introduced a shared NestJS library for common concerns: logging, authentication middleware, error handling, and API response formatting. This library enforced consistent patterns across services and reduced boilerplate by an estimated 70% in the first service built with it.

We also implemented structured logging using Pino, with JSON-formatted log streams routed to CloudWatch Logs.
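
As a rough illustration of what JSON-formatted structured logging means in practice, the sketch below builds log lines by hand. It deliberately does not use Pino: the `formatLogLine` and `createLogger` helpers and the field names (`level`, `time`, `msg`, `service`) are assumptions made for this example, and Pino's real API and default output differ in the details.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
type Fields = Record<string, unknown>;

// Serialize one log event as a single JSON line so log tooling can filter by field.
function formatLogLine(
  level: LogLevel,
  msg: string,
  fields: Fields,
  now: Date = new Date(),
): string {
  return JSON.stringify({ level, time: now.toISOString(), msg, ...fields });
}

// Create a logger that stamps every line with shared base fields (e.g. the service name).
function createLogger(
  base: Fields,
  write: (line: string) => void = (line) => console.log(line),
) {
  const emit =
    (level: LogLevel) =>
    (msg: string, fields: Fields = {}): void =>
      write(formatLogLine(level, msg, { ...base, ...fields }));

  return {
    debug: emit("debug"),
    info: emit("info"),
    warn: emit("warn"),
    error: emit("error"),
  };
}

// Usage: "billing" and the invoice ID are hypothetical values.
const logger = createLogger({ service: "billing" });
logger.info("invoice created", { invoiceId: "inv_123" });
```

Because every line is a self-describing JSON object, a log query can filter on fields such as `service` or `level` instead of pattern-matching free text.
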
Every request received a unique trace ID that propagated through all downstream services, enabling distributed trace reconstruction in X-Ray.

### Phase 2: Core Service Migration (Weeks 4-10)

We identified three domains for our first migration wave: user management, billing, and notifications. Each domain became an independent NestJS service communicating via a Redis message broker. The user management service handled authentication and profile operations, serving as the foundation for the other two.

The migration followed a strangler fig pattern: we deployed each new NestJS service alongside the existing Express endpoints and used API Gateway routing to gradually shift traffic. This allowed us to validate correctness under real production load before fully retiring legacy routes.

The most technically challenging aspect was the billing service. The existing implementation interleaved complex invoice calculation logic with database operations in ways that were difficult to disentangle. We introduced an event-driven architecture: mutations to billing records emitted domain events, which the notifications service consumed asynchronously. This removed the tight coupling that had made the legacy code so fragile.

### Phase 3: Performance Optimization and Hardening (Weeks 11-16)

With the core services running stably, we focused on the performance targets. We implemented a multi-level caching strategy: in-memory LRU caches within each NestJS service for frequently accessed reference data, Redis for cross-service caching, and CloudFront edge caching for read-only API responses where appropriate.

Database optimizations included composite indexing on high-traffic query paths, query result caching at the application layer, and the introduction of read replicas. We also implemented connection pooling via Prisma, which dramatically reduced connection overhead under concurrent load.
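
The first two levels of the caching strategy described in Phase 3 can be sketched as a read-through cache: check a small in-process LRU, then a shared cache, and only then load from the source. This is a simplified illustration, not the project's implementation. `LruCache`, `SharedCache`, `InMemorySharedCache`, and `TwoLevelCache` are hypothetical names, the in-memory class stands in for a Redis client wrapper so the example runs without Redis, and TTLs and invalidation are left out for brevity.

```typescript
// Small in-process LRU cache that relies on Map preserving insertion order.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

// Minimal async interface that a shared cache wrapper would satisfy.
interface SharedCache<V> {
  get(key: string): Promise<V | undefined>;
  set(key: string, value: V): Promise<void>;
}

// In-memory stand-in for the shared (cross-service) cache.
class InMemorySharedCache<V> implements SharedCache<V> {
  private readonly store = new Map<string, V>();

  async get(key: string): Promise<V | undefined> {
    return this.store.get(key);
  }

  async set(key: string, value: V): Promise<void> {
    this.store.set(key, value);
  }
}

// Read-through lookup: local LRU, then shared cache, then the loader.
class TwoLevelCache<V> {
  private readonly local: LruCache<V>;
  private readonly shared: SharedCache<V>;
  private readonly load: (key: string) => Promise<V>;

  constructor(local: LruCache<V>, shared: SharedCache<V>, load: (key: string) => Promise<V>) {
    this.local = local;
    this.shared = shared;
    this.load = load;
  }

  async get(key: string): Promise<V> {
    const fromLocal = this.local.get(key);
    if (fromLocal !== undefined) return fromLocal;

    const fromShared = await this.shared.get(key);
    if (fromShared !== undefined) {
      this.local.set(key, fromShared); // warm the local level
      return fromShared;
    }

    const fresh = await this.load(key); // e.g. a database query
    await this.shared.set(key, fresh);
    this.local.set(key, fresh);
    return fresh;
  }
}
```

A production version would also need expiry and an invalidation path for writes; without them, each service instance can keep serving stale reference data from its local level.
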

For the NestJS services, we leveraged NestJS's built-in caching module and kept providers singleton-scoped wherever possible, since request-scoped providers are re-instantiated on every request and add overhead in high-throughput code paths. We also tuned the Node.js garbage collector for better throughput in long-running processes.

### Phase 4: Observability and Cost Reduction (Weeks 17-24)

The final phase focused on operational excellence and cost optimization. We built a comprehensive observability stack: X-Ray for distributed tracing, CloudWatch Alarms for proactive incident detection, and a custom dashboard in Grafana that unified metrics from all services.

Cost optimization was achieved through right-sizing ECS task definitions based on actual resource utilization rather than guesswork, implementing ECS Service Auto Scaling with target tracking policies, and migrating batch workloads to AWS Lambda where feasible. Storage lifecycle policies moved historical logs to S3 Glacier after thirty days, reducing log retention costs by 80%.

## Results

The results exceeded our initial targets across every dimension.

**Latency:** The p95 API latency dropped from 800ms to 298ms, a 62% reduction. The p99 latency improved from 1.8 seconds to 650ms, meaning that even the slowest requests were served significantly faster. These improvements were achieved without any changes to the client-facing applications, making the transition seamless for end users.

**Scalability:** The system now handles 3.5x the original traffic volume during peak periods (Black Friday 2025 was the first major stress test). The auto-scaling configuration responded within two minutes of load increases, and the system sustained a 23% traffic spike for three consecutive days without degradation.

**Developer Productivity:** Deployment time decreased from three hours to eight minutes. The CI/CD pipeline runs 847 unit tests and 62 integration tests in under four minutes.
Code review turnaround time decreased by 40% because the modular architecture made code ownership clear and change impact easier to assess.

**Cost Efficiency:** Monthly infrastructure costs dropped to $7,100, a 41% reduction from the original $12,000. The combination of Fargate's serverless containers, right-sized task definitions, and Lambda for batch jobs eliminated the overprovisioning that had characterized the previous infrastructure.

## Key Metrics

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| p95 API Latency | 800ms | 298ms | -62% |
| p99 API Latency | 1.8s | 650ms | -64% |
| Monthly Infra Cost | $12,000 | $7,100 | -41% |
| Deployment Time | 3 hours | 8 minutes | -96% |
| Peak Concurrent Users | 4,500 | 15,750 | +250% |
| Bug Escape Rate | 18% | 3.2% | -82% |
| Mean Time to Recovery | 4.2 hours | 28 minutes | -89% |
| Database Connections (avg) | 340 | 85 | -75% |

## Lessons Learned

**Incremental migration beats big-bang replacement.** The strangler fig pattern allowed us to prove correctness under real production conditions and minimized risk. Had we attempted a full cutover, the likelihood of significant downtime and data inconsistencies would have been substantially higher.

**Observability is not optional.** We could not have achieved the latency targets without the visibility provided by X-Ray and structured logging. The difference between "something feels slow" and "this specific database query on the billing service is taking 400ms" is the difference between weeks of investigation and hours.

**Caching is a multiplier, not a fix.** Our caching strategy amplified the benefits of every other optimization. By reducing database load through caching, we freed resources that allowed the remaining un-cached queries to execute faster. Caching made our architectural improvements compound.
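
The arithmetic behind the caching lesson can be sketched with a simple model: the cache hit ratio determines how much traffic still reaches the database, and the expected data-access latency is a weighted average of the hit and miss paths. All numbers below (1,000 requests per second, 2 ms cache reads, 40 ms database reads) are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise illustrative.

```typescript
function assertRatio(hitRatio: number): void {
  if (hitRatio < 0 || hitRatio > 1) {
    throw new RangeError("hitRatio must be between 0 and 1");
  }
}

// Requests per second that still reach the database at a given cache hit ratio.
function databaseRequestsPerSecond(totalRps: number, hitRatio: number): number {
  assertRatio(hitRatio);
  return totalRps * (1 - hitRatio);
}

// Expected data-access latency: weighted average of cache hits and database misses.
function expectedLatencyMs(hitRatio: number, cacheMs: number, dbMs: number): number {
  assertRatio(hitRatio);
  return hitRatio * cacheMs + (1 - hitRatio) * dbMs;
}

// Hypothetical workload: 1,000 req/s, 2 ms cache reads, 40 ms database reads.
for (const hitRatio of [0, 0.4, 0.8]) {
  const dbRps = databaseRequestsPerSecond(1000, hitRatio);
  const latency = expectedLatencyMs(hitRatio, 2, 40);
  console.log(
    `hit ratio ${hitRatio}: ~${dbRps.toFixed(0)} db req/s, ~${latency.toFixed(1)} ms expected`,
  );
}
```

This model holds database latency fixed. The compounding effect described above comes from the fact that, in practice, the miss path also gets faster as the database sheds load, so the real gain can be larger than the weighted average suggests.
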

**Type safety pays dividends in complex domains.** NestJS's reliance on TypeScript's type system caught hundreds of potential bugs at compile time that would have surfaced as production incidents in the loosely typed JavaScript codebase. The initial learning curve for the team was real, but the reduction in null reference errors and type mismatches paid for itself within the first month.

**Right-sizing requires data, not intuition.** Our initial Fargate task definitions were still too large because we had relied on vendor recommendations rather than actual utilization data. After two weeks of monitoring real CPU and memory usage patterns, we reduced task sizes by 60% without sacrificing performance.

## Conclusion

This migration demonstrated that a well-executed backend modernization project can simultaneously achieve aggressive performance targets, reduce operational costs, and improve team velocity. The six-month investment delivered compounding returns: the platform is now positioned to handle the next phase of customer growth without requiring proportional increases to the engineering or operations budget.

The client's engineering team, initially skeptical about the scope of changes, became advocates for the modular architecture. Within three months of completing the migration, they had shipped three major features that would have required significant coordination in the legacy monolith, features that contributed directly to a 40% increase in paid subscriptions.

For engineering leaders facing similar performance and scalability challenges, this case study affirms that the path forward is not necessarily a complete rewrite. With the right methodology, the right team, and the right technology choices, a targeted migration can deliver transformative results without disrupting the business.

---

*This case study was produced by the Webskyne editorial team. For technical inquiries or to discuss similar projects, contact the Webskyne engineering practice.*
