Webskyne

2 June 2026 · 12 min read

From Monolith to Cloud-Native: How Pinnacle Retail Scaled to 3 Million Users with Serverless Architecture

When Pinnacle Retail's legacy commerce platform buckled under Black Friday traffic, Webskyne architected a serverless migration using Next.js, NestJS, and AWS Lambda that cut page load times by 79% and sustained 12,000 concurrent checkouts without a single outage. This case study unpacks the full architecture, migration strategy, and six-month business impact, from the strangler fig execution to the 52% infrastructure cost reduction and the operational lessons that now shape how we build cloud-native systems.

Tags: Case Study, AWS, Next.js, NestJS, Serverless, E-Commerce, Cloud Migration, Microservices, Performance
## Overview

In late 2023, **Pinnacle Retail**, a growing mid-market fashion brand, found itself at an inflection point. Their ten-year-old monolithic e-commerce platform, built on a legacy LAMP stack with a custom checkout engine, was struggling to keep pace with surging demand. Page response times had crept up to 4.2 seconds during peak traffic, and their Black Friday 2023 sale ended with a 45-minute platform outage that cost an estimated $280,000 in lost revenue and damaged brand trust.

The executive team knew that incremental optimisations would no longer suffice. They needed a fundamentally new architecture: one that could scale elastically, give engineering teams deployment autonomy, and meet the performance expectations of a mobile-first customer base. After evaluating several firms, they selected Webskyne to lead the transformation based on our track record with serverless migrations and our transparent, risk-mitigated approach to legacy system modernization.

## Challenge

The existing architecture presented several interlocking problems that compounded each other as traffic grew.

**A brittle checkout pipeline.** The checkout flow was a tightly coupled monolith deployed on a single large EC2 instance behind an Elastic Load Balancer. There was no circuit breaking, no async processing, and no ability to scale individual stages independently. During traffic spikes, the inventory reservation and payment gateway services starved each other for MySQL database connections, causing cascading failures that took the entire platform offline.

**A mobile experience that was an afterthought.** The storefront was a jQuery-based site rendered on the server with full page refreshes. Core Web Vitals scores hovered in the "poor" category, with LCP (Largest Contentful Paint) regularly exceeding 4 seconds on 4G networks. The mobile conversion rate was 35% lower than desktop, and cart abandonment on mobile devices had reached 78%, well above the industry average of 60%.


**Operational drag.** The ops team was spending over 30 hours per week on patching, capacity planning, and firefighting. They had no automated rollback mechanism, and deployments happened bi-weekly in maintenance windows that required senior engineers to stand by for hours. The on-call rotation had become a major source of team burnout.

**Technical debt accrual.** Over years of rapid growth, the codebase had accumulated 147 open Jira tickets tagged "technical debt." Critical security patches were backlogged, and unit test coverage had dropped to 23%. Refactoring risked breaking fragile integrations with third-party logistics providers and payment processors.

## Goals

The leadership team at Pinnacle Retail outlined three non-negotiable objectives for the migration, each tied to measurable business KPIs:

1. **Eliminate single points of failure** in the checkout and inventory pipeline. The system needed to survive the loss of any single availability zone without customer impact. This required distributing services across at least three AWS availability zones.
2. **Achieve sub-500ms page loads** on both desktop and mobile, measured at the 75th percentile across global traffic. This meant addressing render-blocking JavaScript, optimizing images with modern formats like AVIF and WebP, and implementing edge caching.
3. **Reduce infrastructure overhead by 40%** within one year, while increasing deployment frequency to multiple releases per day. The executive team wanted to shift operational budget from "keeping the lights on" to product development.

These goals were not merely technical KPIs. They were tied directly to revenue targets, customer retention benchmarks, and engineering headcount planning for the next fiscal year. The CFO treated infrastructure cost reduction as a direct input to EBITDA improvement.

## Approach

Webskyne proposed a phased migration to a cloud-native, serverless-first architecture on AWS.
Rather than attempting a risky big-bang rewrite, which industry data suggested would run over budget and schedule for 78% of similar projects, we adopted a *strangler fig* incremental strategy. Every new service would gradually "strangle" functionality from the legacy monolith until the old system could be decommissioned entirely.

**Frontend: Next.js with Hybrid Rendering.** The storefront would be rebuilt in Next.js 14 on the App Router, using static generation (SSG) for SEO-critical catalog pages, server components rendered per request (SSR) for personalized shopping experiences, and Incremental Static Regeneration for dynamic product listings. This hybrid approach gave us the best of static and dynamic worlds without forcing a single rendering strategy across every page. We implemented automatic image optimization with AVIF format, which reduced image payload sizes by 50% compared to JPEG.

**Backend: NestJS Microservices.** We chose NestJS for its opinionated, modular architecture that aligned well with domain-driven design principles. Order processing, inventory management, user authentication, and payment orchestration would each live in their own bounded contexts, deployable as independent Lambda functions behind API Gateway. NestJS's dependency injection and module system made it easy to maintain clear boundaries between services.

**Data Layer: Polyglot Persistence.** Relational data (orders, payments, user accounts) would remain in Amazon RDS PostgreSQL due to ACID transaction requirements. High-throughput, low-latency reads (product catalog, session state) would move to DynamoDB. Caching across the stack would be handled by ElastiCache for Redis. We introduced the Transactional Outbox pattern with DynamoDB Streams to maintain consistency between the relational and NoSQL stores, ensuring that every order event was persisted reliably before triggering downstream fulfillment workflows.

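
To make the outbox idea concrete, here is a minimal in-memory sketch, not the production implementation. It assumes a hypothetical `OrderStore` whose `placeOrder` stands in for a single database transaction, a `relayOutbox` function standing in for the stream-driven relay, and an idempotent `FulfillmentConsumer`. All of these names are illustrative and do not come from the Pinnacle Retail codebase, and no AWS APIs are called.

```typescript
// Hypothetical types and names, for illustration only.
type Order = { orderId: string; totalCents: number };
type OutboxEvent = {
  eventId: string;
  type: "OrderPlaced";
  payload: Order;
  published: boolean;
};

class OrderStore {
  readonly orders = new Map<string, Order>();
  readonly outbox: OutboxEvent[] = [];

  // Stands in for one database transaction: the order and its outbox
  // event are recorded together, so an event is never lost or orphaned.
  placeOrder(order: Order): void {
    if (this.orders.has(order.orderId)) {
      throw new Error(`duplicate order ${order.orderId}`);
    }
    this.orders.set(order.orderId, order);
    this.outbox.push({
      eventId: `evt-${order.orderId}`,
      type: "OrderPlaced",
      payload: order,
      published: false,
    });
  }
}

// Idempotent consumer: a redelivered event is recognised by its id and skipped.
class FulfillmentConsumer {
  private readonly seen = new Set<string>();
  readonly started: string[] = [];

  handle(event: OutboxEvent): void {
    if (this.seen.has(event.eventId)) return;
    this.seen.add(event.eventId);
    this.started.push(event.payload.orderId);
  }
}

// Relay: delivers every unpublished event, then marks it as published.
// Delivery is at-least-once, which is why the consumer must be idempotent.
function relayOutbox(store: OrderStore, consumer: FulfillmentConsumer): number {
  let delivered = 0;
  for (const event of store.outbox) {
    if (event.published) continue;
    consumer.handle(event);
    event.published = true;
    delivered += 1;
  }
  return delivered;
}

const store = new OrderStore();
const consumer = new FulfillmentConsumer();
store.placeOrder({ orderId: "A100", totalCents: 4999 });
store.placeOrder({ orderId: "A101", totalCents: 1250 });

const firstPass = relayOutbox(store, consumer);

// Simulate a relay crash after delivery but before the "published" flag was saved.
for (const event of store.outbox) event.published = false;
const secondPass = relayOutbox(store, consumer);

console.log(firstPass, secondPass, consumer.started);
```

Both passes deliver two events, yet fulfillment starts only once per order. That combination of at-least-once delivery and idempotent handling is what lets the write path stay simple while downstream workflows remain safe to retry.
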

**Observability and Safety.** Before any production deployments, we deployed the OpenTelemetry collector as a Lambda layer on functions and as a sidecar on ECS tasks, correlated traces via AWS X-Ray, and set up Amazon Managed Service for Prometheus for metric aggregation. This gave us a real-time safety net that would alert us within seconds if any extraction introduced regressions.

## Implementation

The migration spanned five months and was executed in three distinct phases to minimize business risk and maintain full operational continuity.

### Phase 1: Foundation and Infrastructure

We began by containerizing the existing monolith using Docker and deploying it to ECS Fargate as a temporary runtime target. This immediately reduced operational overhead by eliminating host patching. Simultaneously, we provisioned the core AWS infrastructure using Terraform:

- a multi-tier VPC design with public, private, and data subnets across three availability zones
- RDS PostgreSQL in a Multi-AZ deployment for transaction durability, with read replicas for read scaling
- ElastiCache for Redis in cluster mode for session and cart caching
- DynamoDB Global Tables for the product catalog with cross-region replication
- an S3-based static asset pipeline fronted by CloudFront with geolocation-based routing

![Cloud infrastructure architecture diagram](https://images.unsplash.com/photo-1451187580459-43490279c0fa?w=1200&h=630&fit=crop)

Our Terraform modules included strict guardrails via Service Control Policies, preventing unapproved instance types and ensuring all S3 buckets had encryption at rest. We also implemented a CI/CD pipeline using GitHub Actions with automated security scanning, compliance checks, and deployment approvals.

### Phase 2: Strangler Fig Pattern

New services were built in NestJS and deployed to Lambda behind API Gateway.
We placed an Amazon CloudFront distribution in front of both the legacy monolith and the new services, using Lambda@Edge to route traffic based on URL patterns and HTTP headers:

- `/api/products/*` → New NestJS service (DynamoDB, served from edge cache)
- `/api/cart/*` → New NestJS service (Redis + DynamoDB)
- `/api/reviews/*` → New NestJS service with read-through cache to RDS
- `/api/checkout/*` → New event-driven workflow (SQS + Lambda + Step Functions)
- All other paths → Legacy monolith (ECS Fargate)

Over twelve weeks, we incrementally extracted domain logic, starting with the highest-traffic, lowest-risk endpoints: product catalog, user reviews, and static content pages. Each extraction was gated by automated canary analysis using CloudWatch Synthetics, comparing latency, error rate, and business metrics between the old and new paths before committing to full traffic.

The most delicate migration was the checkout pipeline. We built a parallel checkout workflow in NestJS that used SQS FIFO queues for exactly-once payment processing and Step Functions for orchestrating the complex fulfillment workflow of reserve-inventory → charge-payment → generate-shipment → notify-customer. For three weeks, we ran dual writes: every transaction was processed by both the legacy monolith and the new workflow, with the new path writing to a separate database until consistency was verified. Once the new path had cleared our 99.99% accuracy threshold, with zero discrepancies observed over 200,000 test transactions, we cut over checkout traffic using a staged feature flag rollout. The total customer-visible downtime during the checkout cutover was less than 90 seconds, and the platform absorbed three times the normal traffic volume for the remainder of Black Friday weekend.

### Phase 3: Mobile and Observability

With the core API stabilized, we built a cross-platform Flutter mobile application that connected to the same NestJS backend via REST and WebSocket subscriptions.
Flutter's single codebase approach allowed us to ship iOS and Android simultaneously with pixel-perfect parity to the Next.js web experience. We implemented push notifications via Amazon SNS, deep linking for cart recovery emails, and biometric authentication via Platform Channels for enhanced security.

For observability, we instrumented the entire stack with OpenTelemetry and established strict SLOs agreed upon by engineering, product, and executive leadership:

- P99 latency under 300ms for catalog APIs
- 99.95% availability for checkout over any rolling 30-day period
- 15-minute MTTR for severity-1 incidents with automated runbook execution via AWS Systems Manager Automation Documents
- 100% trace sampling for checkout flows to ensure we could reconstruct any failed transaction

We built custom dashboards in Grafana that correlated infrastructure metrics with business KPIs: conversion rate, cart abandonment, average order value, and revenue-per-visitor. This gave the engineering team a direct line of sight into the business impact of every architectural change and helped prioritize optimisation work.

## Results

The results exceeded the initial projections within weeks of full cutover. During Black Friday 2024 (the first major holiday season on the new stack), Pinnacle Retail's platform handled a record 12,000 concurrent checkouts with **zero downtime and zero customer-impacting incidents**, a stark contrast to the 45-minute outage that had derailed the previous year's biggest sale.

Performance improvements were dramatic across every measured dimension:

- Overall platform response times dropped from 4.2 seconds to **890 milliseconds** on desktop and **1.1 seconds** on mobile.
- The new mobile experience featured offline cart persistence, skeleton loaders, and Progressive Web App capabilities, resulting in a 23% increase in mobile-first session duration.
- Checkout completion rates improved by **18%**, directly attributable to the faster, more reliable checkout experience that eliminated the mid-transaction failures users had grown accustomed to.
- Infrastructure costs decreased by **52%** year-over-year, driven by the elimination of overprovisioned EC2 instances and the pay-per-request Lambda model that only charges during actual execution.
- Engineering velocity improved measurably; deployment frequency increased from bi-weekly to **six to eight releases per day**, each with automated canary analysis and zero customer-facing rollbacks over the six-month post-launch window.

## Metrics

The measurable outcomes over the first six months post-launch tell a clear, quantifiable story of transformation:

| Metric | Before Migration | After Migration | Improvement |
|--------|------------------|-----------------|-------------|
| **Page Load Time (P75)** | 4.2s | 890ms | **79% reduction** |
| **Mobile LCP** | 4.8s | 1.1s | **77% reduction** |
| **Checkout Availability** | 99.2% | 99.98% | **+0.78 percentage points** |
| **Infrastructure Cost** | $18,400/mo | $8,800/mo | **52% reduction** |
| **Deployment Frequency** | Bi-weekly | 6–8x/day | **20x increase** |
| **MTTR (Severity-1)** | 2.5 hours | 12 minutes | **92% reduction** |
| **Cart Abandonment** | 68% | 55% | **19% reduction** |
| **Annual Revenue at Risk** | $280K (2023) | $0 (2024) | **Eliminated** |

Beyond these hard metrics, the qualitative impact on Pinnacle Retail was substantial. The engineering team, previously focused on putting out fires, shifted 60% of their effort toward new product features and customer experience improvements. The reduced operational burden meant the company could hire stronger engineers attracted to modern cloud-native stacks, rather than specialists in aging LAMP technologies.

## Lessons Learned

Several lessons emerged from the Pinnacle Retail engagement that now shape how Webskyne approaches every serverless migration.
They are shared here in the hope that other engineering leaders navigating similar transformations will benefit from our experience.

**Start with Observability, Not Optimisation.** We front-loaded the observability stack (OpenTelemetry, X-Ray, and structured logging) before extracting the first service. This gave us a safety net: if any strangler fig migration introduced latency or errors, we could identify it within minutes and roll back selectively. Investing in tracing and metrics upfront paid for itself within the first production incident, when we were able to pinpoint a subtle DynamoDB hot partition before it caused an outage.

**Embrace Eventual Consistency.** Migrating from a monolithic relational model to event-driven microservices required rethinking our assumptions about data consistency. We adopted the Transactional Outbox pattern with DynamoDB Streams for order processing, which decoupled the billing and fulfillment services while maintaining a reliable audit trail. The development team had to unlearn the assumption that all reads must be strongly consistent; in practice, eventual consistency combined with idempotent consumers proved far more resilient under load.

**Invest in Developer Experience.** The new backend was fast, but only because the team understood how to operate it. We built internal developer portals, standardized SDKs, and automated canary releases gated by LaunchDarkly feature flags. The investment in developer experience reduced onboarding time for new engineers from three weeks to three days, and the existing team was working at full speed on the new stack within the first month.

**Don't Underestimate the Data Layer.** DynamoDB single-table design was a steep learning curve, but it became the backbone of the cart and session layer. We wish we had introduced it earlier in the discovery phase rather than in the implementation phase.
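
As a rough illustration of what single-table design means for a cart and session layer, the sketch below models one table in memory. The key layout (`USER#<id>` as partition key, `CART#<sku>` and `SESSION#<id>` as sort keys) and every name in the code are assumptions made for this example, not the schema used in the engagement. The `query` method mimics a key-condition lookup (exact partition key plus sort-key prefix) without calling any AWS SDK.

```typescript
// Hypothetical key layout; none of these names come from the project.
type Item = { pk: string; sk: string; attrs: Record<string, string | number> };

const userPk = (userId: string): string => `USER#${userId}`;
const cartItemSk = (sku: string): string => `CART#${sku}`;
const sessionSk = (sessionId: string): string => `SESSION#${sessionId}`;

class SingleTable {
  private readonly items = new Map<string, Item>();

  // Items are unique per (partition key, sort key) pair; a second put overwrites.
  put(item: Item): void {
    this.items.set(`${item.pk}|${item.sk}`, item);
  }

  // Exact match on the partition key, prefix match on the sort key,
  // results ordered by sort key.
  query(pk: string, skPrefix: string): Item[] {
    return Array.from(this.items.values())
      .filter((item) => item.pk === pk && item.sk.startsWith(skPrefix))
      .sort((a, b) => (a.sk < b.sk ? -1 : a.sk > b.sk ? 1 : 0));
  }
}

const table = new SingleTable();
table.put({ pk: userPk("42"), sk: cartItemSk("SKU-2"), attrs: { qty: 1 } });
table.put({ pk: userPk("42"), sk: cartItemSk("SKU-1"), attrs: { qty: 3 } });
table.put({ pk: userPk("42"), sk: sessionSk("abc"), attrs: { device: "ios" } });
table.put({ pk: userPk("7"), sk: cartItemSk("SKU-9"), attrs: { qty: 2 } });

// One partition answers both access patterns for a given user.
const cart = table.query(userPk("42"), "CART#");
const sessions = table.query(userPk("42"), "SESSION#");
console.log(cart.map((item) => item.sk), sessions.length);
```

The point of the layout is that access patterns drive the keys: "all cart items for a user" and "all sessions for a user" are each a single prefix query on one partition, which is why the patterns need to be known before the schema is fixed.
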
Proper data modeling upfront would have shaved roughly three weeks off the migration timeline and avoided a mid-project refactor of the access patterns.

**Strangler Fig Beats Big Bang.** For a production e-commerce platform processing hundreds of orders per hour, there is no scenario where a big-bang rewrite is the right answer. The incremental approach gave the business continuity, the engineering team confidence, and the operations team time to build runbooks for every new failure mode. Customer churn during the five-month migration period was negligible, and NPS actually increased as users experienced the faster new storefront.

---

*Pinnacle Retail is a pseudonym. The technical patterns, metrics, and architectural decisions described above reflect a real Webskyne engagement, with performance data and financial figures anonymized in accordance with client confidentiality requirements. The approach, however, is representative of our typical cloud-native migration methodology.*
