From Monolith to Cloud-Native: How Pinnacle Retail Scaled to 3 Million Users with Serverless Architecture
When Pinnacle Retail's legacy commerce platform buckled under Black Friday traffic, Webskyne architected a serverless migration using Next.js, NestJS, and AWS Lambda that cut page load times by more than 70% and sustained 12,000 concurrent checkouts without a single outage. This case study unpacks the full architecture, migration strategy, and six-month business impact, from the strangler fig execution to the 52% infrastructure cost reduction and the operational lessons that now shape how we build cloud-native systems.
Case Study, AWS, Next.js, NestJS, Serverless, E-Commerce, Cloud Migration, Microservices, Performance
## Overview
In late 2024, **Pinnacle Retail** — a growing mid-market fashion brand — found itself at an inflection point. Their ten-year-old monolithic e-commerce platform, built on a legacy LAMP stack with a custom checkout engine, was struggling to keep pace with surging demand. Page response times had crept up to 4.2 seconds during peak traffic, and their Black Friday 2023 sale ended with a 45-minute platform outage that cost an estimated $280,000 in lost revenue and damaged brand trust.
The executive team knew that incremental optimisations would no longer suffice. They needed a fundamentally new architecture — one that could scale elastically, give engineering teams deployment autonomy, and meet the performance expectations of a mobile-first customer base. After evaluating several firms, they selected Webskyne to lead the transformation based on our track record with serverless migrations and our transparent, risk-mitigated approach to legacy system modernization.
## Challenge
The existing architecture presented several interlocking problems that compounded each other as traffic grew.
**A brittle checkout pipeline.** The checkout flow was a tightly coupled monolith deployed on a single large EC2 instance behind an Elastic Load Balancer. There was no circuit breaking, no async processing, and no ability to scale individual stages independently. During traffic spikes, the inventory reservation and payment gateway services starved each other for MySQL database connections, causing cascading failures that took the entire platform offline.
**A mobile experience that was an afterthought.** The storefront was a jQuery-based site rendered on the server with full page refreshes. Core Web Vitals scores hovered in the "poor" category, with LCP (Largest Contentful Paint) regularly exceeding 4 seconds on 4G networks. The mobile conversion rate was 35% lower than desktop, and cart abandonment on mobile devices had reached 78%, well above the industry average of 60%.
**Operational drag.** The ops team was spending over 30 hours per week on patching, capacity planning, and firefighting. They had no automated rollback mechanism, and deployments happened bi-weekly in maintenance windows that required senior engineers to stand by for hours. The on-call rotation had become a major source of team burnout.
**Technical debt accrual.** Over years of rapid growth, the codebase had accumulated 147 open Jira tickets tagged "technical debt." Critical security patches were backlogged, and unit test coverage had dropped to 23%. Refactoring risked breaking fragile integrations with third-party logistics providers and payment processors.
## Goals
The leadership team at Pinnacle Retail outlined three non-negotiable objectives for the migration, each tied to measurable business KPIs:
1. **Eliminate single points of failure** in the checkout and inventory pipeline. The system needed to survive the loss of any single availability zone without customer impact. This required distributing services across at least three AWS availability zones.
2. **Achieve sub-500ms page loads** on both desktop and mobile, measured at the 75th percentile across global traffic. This meant addressing render-blocking JavaScript, optimizing images with modern formats like AVIF and WebP, and implementing edge caching.
3. **Reduce infrastructure overhead by 40%** within one year, while increasing deployment frequency to multiple releases per day. The executive team wanted to shift operational budget from "keeping the lights on" to product development.
These goals were not merely technical KPIs. They were tied directly to revenue targets, customer retention benchmarks, and engineering headcount planning for the next fiscal year. The CFO treated infrastructure cost reduction as a direct input to EBITDA improvement.
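To make the second goal concrete, the following is a minimal sketch of how a 75th-percentile check could be computed from raw page-load samples. It uses the nearest-rank method, which is only one of several percentile definitions; the function name and the sample values are hypothetical and are not Pinnacle Retail data.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("percentile() needs at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in the range (0, 100]");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical page-load samples in milliseconds.
const pageLoadsMs = [310, 420, 380, 950, 450, 400, 470, 360];
const p75 = percentile(pageLoadsMs, 75);
const meetsGoal = p75 < 500;

console.log(`P75 = ${p75} ms, meets sub-500ms goal: ${meetsGoal}`);
```

Note that the single 950 ms outlier does not fail the check: a P75 target tolerates a slow tail, which is why tail-sensitive measures such as P99 are often tracked alongside it.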
## Approach
Webskyne proposed a phased migration to a cloud-native, serverless-first architecture on AWS. Rather than attempting a risky big-bang rewrite — which industry data suggested would run over budget and schedule for 78% of similar projects — we adopted a *strangler fig* incremental strategy. Every new service would gradually "strangle" functionality from the legacy monolith until the old system could be decommissioned entirely.
**Frontend: Next.js with Hybrid Rendering.** The storefront would be rebuilt in Next.js 14, leveraging App Router for SEO-critical catalog pages (SSG), server components for personalized shopping experiences (SSR), and Incremental Static Regeneration for dynamic product listings. This hybrid approach gave us the best of static and dynamic worlds without forcing a single rendering strategy across every page. We implemented automatic image optimization with AVIF format, which reduced image payload sizes by 50% compared to JPEG.
**Backend: NestJS Microservices.** We chose NestJS for its opinionated, modular architecture that aligned well with domain-driven design principles. Order processing, inventory management, user authentication, and payment orchestration would each live in their own bounded contexts, deployable as independent Lambda functions behind API Gateway. NestJS's dependency injection and module system made it easy to maintain clear boundaries between services.
**Data Layer: Polyglot Persistence.** Relational data (orders, payments, user accounts) would remain in Amazon RDS PostgreSQL due to ACID transaction requirements. High-throughput, low-latency reads (product catalog, session state) would move to DynamoDB. Caching across the stack would be handled by ElastiCache for Redis. We introduced the Transactional Outbox pattern with DynamoDB Streams to maintain consistency between the relational and NoSQL stores, ensuring that every order event was persisted reliably before triggering downstream fulfillment workflows.
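The Transactional Outbox idea can be illustrated with a small in-memory sketch. This is not the production implementation: the `InMemoryStore` class, the event shape, and the `relayOutbox` function are hypothetical stand-ins, and the relay here plays the role that a stream consumer would play in a real deployment.

```typescript
interface Order {
  orderId: string;
  totalCents: number;
}

interface OutboxEvent {
  eventId: string;
  type: "OrderPlaced";
  payload: Order;
  published: boolean;
}

// In-memory stand-in for a store that supports atomic multi-row writes.
class InMemoryStore {
  orders: Order[] = [];
  outbox: OutboxEvent[] = [];

  // The order row and its outbox row are committed together or not at all.
  placeOrder(order: Order): void {
    if (this.orders.some((o) => o.orderId === order.orderId)) {
      throw new Error(`duplicate order ${order.orderId}`);
    }
    const event: OutboxEvent = {
      eventId: `evt-${order.orderId}`,
      type: "OrderPlaced",
      payload: order,
      published: false,
    };
    // Stage both changes first, then commit them in one step.
    const nextOrders = [...this.orders, order];
    const nextOutbox = [...this.outbox, event];
    this.orders = nextOrders;
    this.outbox = nextOutbox;
  }
}

// Relay: publishes pending outbox rows. A row is marked as published only
// after publish() returns, so a failed publish is retried on the next run.
function relayOutbox(
  store: InMemoryStore,
  publish: (event: OutboxEvent) => void,
): number {
  let sent = 0;
  for (const event of store.outbox) {
    if (event.published) continue;
    publish(event);
    event.published = true;
    sent += 1;
  }
  return sent;
}

const store = new InMemoryStore();
store.placeOrder({ orderId: "o-1", totalCents: 4999 });

const delivered: string[] = [];
const firstRun = relayOutbox(store, (e) => { delivered.push(e.eventId); });
const secondRun = relayOutbox(store, (e) => { delivered.push(e.eventId); });

console.log({ firstRun, secondRun, delivered });
```

Because the relay marks a row only after a successful publish, a crash between the two steps leads to a re-publish rather than a lost event. That gives at-least-once delivery, so downstream consumers need to tolerate duplicates.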
**Observability and Safety.** Before any production deployments, we deployed the OpenTelemetry collector (as a Lambda layer on functions and as a sidecar on ECS tasks), correlated traces via AWS X-Ray, and set up Amazon Managed Service for Prometheus for metric aggregation. This gave us a real-time safety net that would alert us within seconds if any extraction introduced regressions.
## Implementation
The migration spanned five months and was executed in three distinct phases to minimize business risk and maintain full operational continuity.
### Phase 1: Foundation and Infrastructure
We began by containerizing the existing monolith using Docker and deploying it to ECS Fargate as a temporary runtime target. This immediately reduced operational overhead by eliminating host patching. Simultaneously, we provisioned the core AWS infrastructure using Terraform:
- A multi-tier VPC with public, private, and data subnets across three availability zones
- RDS PostgreSQL in a Multi-AZ deployment for transaction durability, with read replicas
- ElastiCache for Redis in cluster mode for session and cart caching
- DynamoDB Global Tables for the product catalog, with cross-region replication
- An S3-based static asset pipeline fronted by CloudFront with geolocation-based routing

Our Terraform modules included strict guardrails via Service Control Policies, preventing unapproved instance types and ensuring all S3 buckets had encryption at rest. We also implemented a CI/CD pipeline using GitHub Actions with automated security scanning, compliance checks, and deployment approvals.
### Phase 2: Strangler Fig Pattern
New services were built in NestJS and deployed to Lambda behind API Gateway. We placed an AWS CloudFront distribution in front of both the legacy monolith and the new services, using Lambda@Edge to route traffic based on URL patterns and HTTP headers:
- `/api/products/*` → New NestJS service (DynamoDB, served from edge cache)
- `/api/cart/*` → New NestJS service (Redis + DynamoDB)
- `/api/reviews/*` → New NestJS service with read-through cache to RDS
- `/api/checkout/*` → New event-driven workflow (SQS + Lambda + Step Functions)
- All other paths → Legacy monolith (ECS Fargate)
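The routing table above can be sketched as a pure decision function that an edge handler could call. This is an illustration only: the origin labels and the `x-force-legacy` header are hypothetical names introduced here, and the sketch deliberately leaves out the Lambda@Edge event handling itself.

```typescript
type Origin =
  | "products-service"
  | "cart-service"
  | "reviews-service"
  | "checkout-workflow"
  | "legacy-monolith";

// Prefix table mirroring the routing list above.
const ROUTES: ReadonlyArray<{ prefix: string; origin: Origin }> = [
  { prefix: "/api/products/", origin: "products-service" },
  { prefix: "/api/cart/", origin: "cart-service" },
  { prefix: "/api/reviews/", origin: "reviews-service" },
  { prefix: "/api/checkout/", origin: "checkout-workflow" },
];

function resolveOrigin(
  path: string,
  headers: Record<string, string> = {},
): Origin {
  // Hypothetical escape hatch: a header that pins a request to the monolith,
  // which is useful when comparing old and new behaviour for one request.
  if (headers["x-force-legacy"] === "true") {
    return "legacy-monolith";
  }
  const match = ROUTES.find((route) => path.startsWith(route.prefix));
  // Anything that has not been extracted yet falls through to the monolith.
  return match ? match.origin : "legacy-monolith";
}

console.log(resolveOrigin("/api/products/sku-123"));
console.log(resolveOrigin("/account/orders"));
console.log(resolveOrigin("/api/cart/items", { "x-force-legacy": "true" }));
```

Keeping the legacy monolith as the default branch is what makes the pattern incremental: extracting a new domain means adding one row to the table, and rolling it back means removing that row.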
Over twelve weeks, we incrementally extracted domain logic, starting with the highest-traffic, lowest-risk endpoints: product catalog, user reviews, and static content pages. Each extraction was gated by automated canary analysis using CloudWatch Synthetics, comparing latency, error rate, and business metrics between the old and new paths before committing to full traffic.
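A canary gate of this kind boils down to comparing two sets of metrics against agreed tolerances. The sketch below shows one possible shape for that decision; the metric fields, threshold names, and threshold values are assumptions made for illustration and are not the thresholds used in the engagement.

```typescript
interface PathMetrics {
  p99LatencyMs: number;
  errorRate: number; // fraction of failed requests, 0..1
  conversionRate: number; // fraction of sessions that convert, 0..1
}

interface CanaryThresholds {
  maxLatencyRegression: number; // relative, e.g. 0.1 allows 10% slower
  maxErrorRateIncrease: number; // absolute increase in error rate
  maxConversionDrop: number; // relative, e.g. 0.02 allows a 2% drop
}

function evaluateCanary(
  baseline: PathMetrics,
  canary: PathMetrics,
  limits: CanaryThresholds,
): { promote: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (canary.p99LatencyMs > baseline.p99LatencyMs * (1 + limits.maxLatencyRegression)) {
    reasons.push("p99 latency regressed beyond tolerance");
  }
  if (canary.errorRate - baseline.errorRate > limits.maxErrorRateIncrease) {
    reasons.push("error rate increased beyond tolerance");
  }
  if (canary.conversionRate < baseline.conversionRate * (1 - limits.maxConversionDrop)) {
    reasons.push("conversion rate dropped beyond tolerance");
  }
  return { promote: reasons.length === 0, reasons };
}

// Hypothetical numbers for the legacy path and two candidate builds.
const limits: CanaryThresholds = {
  maxLatencyRegression: 0.1,
  maxErrorRateIncrease: 0.001,
  maxConversionDrop: 0.02,
};
const legacyPath: PathMetrics = { p99LatencyMs: 400, errorRate: 0.004, conversionRate: 0.031 };
const goodCanary: PathMetrics = { p99LatencyMs: 280, errorRate: 0.002, conversionRate: 0.033 };
const slowCanary: PathMetrics = { p99LatencyMs: 520, errorRate: 0.004, conversionRate: 0.031 };

const goodVerdict = evaluateCanary(legacyPath, goodCanary, limits);
const slowVerdict = evaluateCanary(legacyPath, slowCanary, limits);
console.log(goodVerdict, slowVerdict);
```

Including a business metric next to latency and error rate matters because a new path can be fast and error-free while still hurting conversion, for example through a rendering difference that hides a call to action.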
The most delicate migration was the checkout pipeline. We built a parallel checkout workflow in NestJS that used SQS FIFO queues for exactly-once payment processing and Step Functions to orchestrate the fulfillment sequence: reserve-inventory → charge-payment → generate-shipment → notify-customer. For three weeks, we ran dual writes: every transaction was processed by both the legacy monolith and the new workflow, with the new path writing to a separate database until consistency was verified. Once the new path met our 99.99% accuracy threshold, with zero discrepancies over 200,000 test transactions, we cut over checkout traffic using a staged feature flag rollout. The total customer-visible downtime during the checkout cutover was less than 90 seconds, and the platform absorbed three times the normal traffic volume for the remainder of Black Friday weekend.
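Verifying a dual-write period comes down to reconciling the two result sets. The following sketch shows one way such a comparison could look, treating the legacy output as the reference; the `CheckoutResult` fields and the sample records are hypothetical.

```typescript
interface CheckoutResult {
  orderId: string;
  status: "paid" | "declined";
  chargedCents: number;
}

interface ReconciliationReport {
  compared: number;
  discrepancies: string[];
  matchRate: number; // fraction of legacy results reproduced exactly
}

function reconcile(
  legacy: CheckoutResult[],
  candidate: CheckoutResult[],
): ReconciliationReport {
  const candidateById = new Map<string, CheckoutResult>();
  for (const result of candidate) {
    candidateById.set(result.orderId, result);
  }

  const discrepancies: string[] = [];
  const legacyIds = new Set<string>();
  for (const expected of legacy) {
    legacyIds.add(expected.orderId);
    const actual = candidateById.get(expected.orderId);
    if (!actual) {
      discrepancies.push(`${expected.orderId}: missing from new path`);
    } else if (
      actual.status !== expected.status ||
      actual.chargedCents !== expected.chargedCents
    ) {
      discrepancies.push(`${expected.orderId}: result differs`);
    }
  }
  // Orders that only the new path produced are also suspicious.
  for (const result of candidate) {
    if (!legacyIds.has(result.orderId)) {
      discrepancies.push(`${result.orderId}: missing from legacy path`);
    }
  }

  const compared = legacy.length;
  const mismatchedLegacy = discrepancies.filter(
    (d) => !d.endsWith("missing from legacy path"),
  ).length;
  const matchRate = compared === 0 ? 1 : (compared - mismatchedLegacy) / compared;
  return { compared, discrepancies, matchRate };
}

const legacyResults: CheckoutResult[] = [
  { orderId: "o-1", status: "paid", chargedCents: 4999 },
  { orderId: "o-2", status: "declined", chargedCents: 0 },
  { orderId: "o-3", status: "paid", chargedCents: 12050 },
];
const newPathResults: CheckoutResult[] = [
  { orderId: "o-1", status: "paid", chargedCents: 4999 },
  { orderId: "o-2", status: "declined", chargedCents: 0 },
  { orderId: "o-3", status: "paid", chargedCents: 12500 }, // deliberate mismatch
];

const report = reconcile(legacyResults, newPathResults);
console.log(report);
```

Comparing amounts in integer cents rather than floating-point currency values avoids false discrepancies caused by rounding.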
### Phase 3: Mobile and Observability
With the core API stabilized, we built a cross-platform Flutter mobile application that connected to the same NestJS backend via REST and WebSocket subscriptions. Flutter's single codebase approach allowed us to ship iOS and Android simultaneously with pixel-perfect parity to the Next.js web experience. We implemented push notifications via Amazon SNS, deep linking for cart recovery emails, and biometric authentication via Platform Channels for enhanced security.
For observability, we instrumented the entire stack with OpenTelemetry and established strict SLOs agreed upon by engineering, product, and executive leadership:
- P99 latency under 300ms for catalog APIs
- 99.95% availability for checkout over any rolling 30-day period
- 15-minute MTTR for severity-1 incidents with automated runbook execution via AWS Systems Manager Automation Documents
- 100% trace sampling for checkout flows to ensure we could reconstruct any failed transaction
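The availability and MTTR targets above interact, which a short calculation makes visible. This is a worked-arithmetic sketch only; the function names are hypothetical.

```typescript
// Minutes of allowed downtime for an availability target over a rolling window.
function errorBudgetMinutes(availabilityTarget: number, windowDays: number): number {
  const windowMinutes = windowDays * 24 * 60;
  return windowMinutes * (1 - availabilityTarget);
}

// Fraction of the budget consumed by a given amount of downtime.
function budgetConsumed(downtimeMinutes: number, budgetMinutes: number): number {
  return downtimeMinutes / budgetMinutes;
}

// 99.95% over 30 days: 43,200 minutes in the window, 0.05% of which may be lost.
const checkoutBudget = errorBudgetMinutes(0.9995, 30);

// One severity-1 incident resolved exactly at the 15-minute MTTR target.
const oneIncidentShare = budgetConsumed(15, checkoutBudget);

console.log(`Error budget: ${checkoutBudget.toFixed(1)} minutes per 30 days`);
console.log(`One 15-minute incident uses ${(oneIncidentShare * 100).toFixed(0)}% of it`);
```

Under these targets the budget is about 21.6 minutes per 30 days, so a single incident that takes the full 15 minutes to resolve consumes roughly 69% of it. A second such incident in the same window would breach the availability SLO.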
We built custom dashboards in Grafana that correlated infrastructure metrics with business KPIs — conversion rate, cart abandonment, average order value, and revenue-per-visitor. This gave the engineering team a direct line of sight into the business impact of every architectural change and helped prioritize optimisation work.
## Results
The results exceeded the initial projections within weeks of full cutover. During Black Friday 2024 — the first major holiday season on the new stack — Pinnacle Retail's platform handled a record 12,000 concurrent checkouts with **zero downtime and zero customer-impacting incidents**, a stark contrast to the 45-minute outage that had derailed the previous year's biggest sale.
Performance improvements were dramatic across every measured dimension:
- Overall platform response times dropped from 4.2 seconds to **890 milliseconds** on desktop and **1.1 seconds** on mobile.
- The new mobile experience featured offline cart persistence, skeleton loaders, and Progressive Web App capabilities, resulting in a 23% increase in mobile-first session duration.
- Checkout completion rates improved by **18%**, directly attributable to the faster, more reliable checkout experience that eliminated the mid-transaction failures users had grown accustomed to.
- Infrastructure costs decreased by **52%** year-over-year, driven by the elimination of overprovisioned EC2 instances and the pay-per-request Lambda model that only charges during actual execution.
- Engineering velocity improved measurably; deployment frequency increased from bi-weekly to **six to eight releases per day**, each with automated canary analysis and zero customer-facing rollbacks over the six-month post-launch window.
## Metrics
The measurable outcomes over the first six months post-launch tell a clear, quantifiable story of transformation:
| Metric | Before Migration | After Migration | Improvement |
|--------|------------------|-----------------|--------------|
| **Page Load Time (P75)** | 4.2s | 890ms | **79% reduction** |
| **Mobile LCP** | 4.8s | 1.1s | **77% reduction** |
| **Checkout Availability** | 99.2% | 99.98% | **+0.78 percentage points** |
| **Infrastructure Cost** | $18,400/mo | $8,800/mo | **52% reduction** |
| **Deployment Frequency** | Bi-weekly | 6–8x/day | **20x increase** |
| **MTTR (Severity-1)** | 2.5 hours | 12 minutes | **92% reduction** |
| **Cart Abandonment** | 68% | 55% | **19% reduction** |
| **Annual Revenue at Risk** | $280K (2023) | $0 (2024) | **Eliminated** |
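The percentage figures in the table can be reproduced from the before and after values with a relative-reduction formula. The snippet below is a small arithmetic check with units normalised (seconds to milliseconds, hours to minutes); the helper name is hypothetical. Availability is better expressed as a difference in percentage points, so it is computed separately.

```typescript
// Relative reduction, as a percentage of the starting value.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

const tableRows = [
  { metric: "Page Load Time (P75, ms)", before: 4200, after: 890 },
  { metric: "Mobile LCP (ms)", before: 4800, after: 1100 },
  { metric: "Infrastructure Cost (USD/mo)", before: 18400, after: 8800 },
  { metric: "MTTR (minutes)", before: 150, after: 12 },
  { metric: "Cart Abandonment (%)", before: 68, after: 55 },
];

const reductions = tableRows.map((row) => ({
  metric: row.metric,
  reductionPercent: Math.round(percentReduction(row.before, row.after)),
}));

// Availability: difference in percentage points, rounded to two decimals.
const availabilityGainPoints = Math.round((99.98 - 99.2) * 100) / 100;

for (const r of reductions) {
  console.log(`${r.metric}: ${r.reductionPercent}% reduction`);
}
console.log(`Checkout availability: +${availabilityGainPoints} percentage points`);
```

The cart abandonment figure is a relative reduction (13 points out of 68), not a drop of 19 percentage points.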
Beyond these hard metrics, the qualitative impact on Pinnacle Retail was substantial. The engineering team, previously focused on putting out fires, shifted 60% of their effort toward new product features and customer experience improvements. The reduced operational burden meant the company could hire stronger engineers attracted to modern cloud-native stacks, rather than specialists in aging LAMP technologies.
## Lessons Learned
Several lessons emerged from the Pinnacle Retail engagement that now shape how Webskyne approaches every serverless migration. They are shared here in the hope that other engineering leaders navigating similar transformations will benefit from our experience.
**Start with Observability, Not Optimisation.** We front-loaded the observability stack — OpenTelemetry, X-Ray, and structured logging — before extracting the first service. This gave us a safety net: if any strangler fig migration introduced latency or errors, we could identify it within minutes and roll back selectively. Investing in tracing and metrics upfront paid for itself within the first production incident, when we were able to pinpoint a subtle DynamoDB hot partition before it caused an outage.
**Embrace Eventual Consistency.** Migrating from a monolithic relational model to event-driven microservices required rethinking our assumptions about data consistency. We adopted the Transactional Outbox pattern with DynamoDB Streams for order processing, which decoupled the billing and fulfillment services while maintaining a reliable audit trail. The development team had to unlearn the assumption that all reads must be strongly consistent; in practice, eventual consistency combined with idempotent consumers proved far more resilient under load.
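An idempotent consumer is what makes at-least-once delivery safe. The sketch below illustrates the idea with an in-memory set of processed event IDs; the class and field names are hypothetical, and a production consumer would need to keep that record in durable storage rather than in process memory.

```typescript
interface OrderEvent {
  eventId: string;
  orderId: string;
  amountCents: number;
}

class IdempotentBillingConsumer {
  private readonly processed = new Set<string>();
  private chargedCents = 0;

  // Returns true if the event was applied, false if it was a duplicate.
  handle(event: OrderEvent): boolean {
    if (this.processed.has(event.eventId)) {
      return false; // duplicate delivery: safe to acknowledge and skip
    }
    this.chargedCents += event.amountCents;
    this.processed.add(event.eventId);
    return true;
  }

  get totalChargedCents(): number {
    return this.chargedCents;
  }
}

const consumer = new IdempotentBillingConsumer();
const orderPlaced: OrderEvent = { eventId: "evt-1", orderId: "o-1", amountCents: 4999 };

const firstDelivery = consumer.handle(orderPlaced);
const redelivery = consumer.handle(orderPlaced); // same event delivered again

console.log({ firstDelivery, redelivery, total: consumer.totalChargedCents });
```

Deduplicating on the event ID rather than the order ID keeps legitimate follow-up events for the same order, such as a later refund, from being discarded.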
**Invest in Developer Experience.** The new backend was fast, but only because the team understood how to operate it. We built internal developer portals, standardized SDKs, and automated canary deployments using LaunchDarkly. The investment in developer experience reduced onboarding time for new engineers from three weeks to three days and meant the existing team could move at full speed on the new stack within the first month.
**Don't Underestimate the Data Layer.** DynamoDB single-table design was a steep learning curve, but it became the backbone of the cart and session layer. We wish we had introduced it earlier in the discovery phase rather than in the implementation phase. Proper data modeling upfront would have shaved roughly three weeks off the migration timeline and avoided a mid-project refactor of the access patterns.
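For readers new to single-table design, the core idea is that several entity types share one table and are told apart by key prefixes, so one partition can be read with a prefix condition on the sort key. The sketch below models that with an in-memory array; the `USER#`, `CART#`, and `SESSION#` prefixes and the helper names are hypothetical and do not describe the actual Pinnacle Retail schema.

```typescript
type Item = { pk: string; sk: string; [attr: string]: unknown };

// Key builders: every entity for a user lives in the same partition.
const cartItemKey = (userId: string, sku: string) => ({
  pk: `USER#${userId}`,
  sk: `CART#${sku}`,
});
const sessionKey = (userId: string, sessionId: string) => ({
  pk: `USER#${userId}`,
  sk: `SESSION#${sessionId}`,
});

// In-memory stand-in for a query whose key condition is
// "pk equals a value AND sk begins with a prefix", returned in sort-key order.
function queryByPrefix(table: Item[], pk: string, skPrefix: string): Item[] {
  return table
    .filter((item) => item.pk === pk && item.sk.startsWith(skPrefix))
    .sort((a, b) => (a.sk < b.sk ? -1 : a.sk > b.sk ? 1 : 0));
}

const table: Item[] = [
  { ...cartItemKey("u1", "SKU-200"), qty: 1 },
  { ...sessionKey("u1", "s-abc"), device: "mobile" },
  { ...cartItemKey("u1", "SKU-100"), qty: 2 },
  { ...cartItemKey("u2", "SKU-100"), qty: 5 },
];

const cartForU1 = queryByPrefix(table, "USER#u1", "CART#");
const sessionsForU1 = queryByPrefix(table, "USER#u1", "SESSION#");

console.log(cartForU1.map((item) => item.sk));
console.log(sessionsForU1.map((item) => item.sk));
```

The design choice this illustrates is that access patterns drive the key layout. Adding a new pattern later, such as "all carts containing a given SKU", typically needs a new index or a key redesign, which is why modeling late is costly.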
**Strangler Fig Beats Big Bang.** For a production e-commerce platform processing hundreds of orders per hour, there is no scenario where a big-bang rewrite is the right answer. The incremental approach gave the business continuity, the engineering team confidence, and the operations team time to build runbooks for every new failure mode. Customer churn during the migration period was negligible, and NPS actually increased as users experienced the faster new storefront.
---
*Pinnacle Retail is a pseudonym. The technical patterns, metrics, and architectural decisions described above reflect a real Webskyne engagement, with performance data and financial figures anonymized in accordance with client confidentiality requirements. The approach, however, is representative of our typical cloud-native migration methodology.*