How Meridian Retail Replaced a Fragile Monolith with a Cloud-Native Microservices Platform

Meridian Retail spent 18 months migrating from a 350,000-line PHP monolith to an event-driven microservices architecture on AWS — led by Webskyne. Platform uptime jumped from 99.4% to 99.95%, deployment cycles fell from 4–6 weeks to under one week, and infrastructure costs dropped 42%. Here's the full story: the challenge, the architecture, the implementation phases, the results, and the hard-won lessons every engineering leader should read.

Overview

Meridian Retail, a mid-sized omnichannel fashion retailer operating across the UK and EU, managed an aging monolithic e-commerce platform built on PHP 7 and a legacy jQuery frontend. As the business scaled — spanning brick-and-mortar stores, a B2B wholesale channel, and a direct-to-consumer web store — the monolith became an increasingly expensive and risky source of technical debt. Over an 18-month engagement, Webskyne led a complete re-platforming to a modern, event-driven microservices architecture on AWS, enabling Meridian to scale seamlessly, improve platform reliability, and accelerate their feature velocity.

The Challenge

Meridian's monolithic platform was a single 350,000-line PHP codebase handling every business concern: product catalogues, search, cart, checkout, inventory management, loyalty, order processing, and reporting. The team of 12 engineers was spending over 60% of its sprint capacity on maintenance: patching security vulnerabilities, resolving deployment conflicts, and manually debugging cascading failures. A single badly timed deployment touching the checkout code could take down the entire website, a risk amplified during major campaigns and seasonal peaks such as Black Friday, when downtime could cost upwards of £200,000 per hour.

The monolithic design also meant that scaling was all-or-nothing: the entire application had to be scaled together, resulting in 3–5× over-provisioning of compute during low-traffic periods and costly auto-scaling misses at peak. The technical debt accrued over five years had become self-reinforcing — the team was afraid to refactor because the test suite was incomplete and rollbacks were risky, which meant bugs accumulated, which made development slower.

Adding to the problem, Meridian's search indexing ran during business hours and was so resource-intensive that it regularly coincided with checkout bottlenecks. Marketing campaigns demanded rapid feature launches, but each release cycle took 4–6 weeks, leaving Meridian reactive rather than proactive in the highly competitive fashion e-commerce space.

Goals

The engagement was framed around four primary outcomes:

Decouple the monolith into independently deployable microservices, starting with high-bloat, high-change areas: product search, checkout, and inventory.
Achieve 99.9% uptime during peak traffic, an improvement on the historically volatile 99.4% availability of the monolith.
Reduce full-cycle deployment time from 4–6 weeks to under 1 week for most feature branches.
Cut infrastructure costs by at least 30% through right-sized autoscaling, eliminating the monolith's over-provisioning.

Our Approach

Webskyne adopted the Strangler Fig pattern instead of a big-bang rewrite. Rather than shutting the monolith off overnight, we routed a steadily increasing share of traffic to new services while the monolith kept running as a stable backbone. This minimised business risk and let the team ship value continuously.
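The incremental routing described above can be illustrated with a minimal sketch. It is not Meridian's gateway configuration: the rollout table, the path prefixes, and the percentages are hypothetical, and the sketch assumes routing decisions should be sticky per user, which is why it hashes the user ID rather than picking at random.

```python
import hashlib

# Hypothetical rollout table: path prefix -> percentage of traffic sent to the new service.
ROLLOUT = {
    "/search": 100,   # fully migrated
    "/checkout": 25,  # partially migrated
}


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) so routing is sticky per user."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route(path: str, user_id: str) -> str:
    """Return 'service' or 'monolith' for a request."""
    for prefix, percent in ROLLOUT.items():
        if path.startswith(prefix):
            return "service" if bucket(user_id) < percent else "monolith"
    return "monolith"  # anything not yet extracted stays on the monolith


if __name__ == "__main__":
    print(route("/search/dresses", "user-1"))  # always "service" at 100%
    print(route("/account", "user-1"))         # always "monolith" (no rollout entry)
```

Raising a percentage in the table moves more users to the new service without changing which bucket any individual user falls into, so a user who has already been routed to the new service stays there.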

Our technical architecture was designed as follows:

Event-Driven Backbone: An Apache Kafka cluster on Amazon MSK formed the central nervous system of the new architecture. All domain events — OrderPlaced, StockReserved, ProductUpdated — were published as immutable events, enabling asynchronous, loosely-coupled inter-service communication.
API Gateway: Amazon API Gateway with a Lambda authoriser sat at the edge, routing incoming requests to the appropriate service or the existing monolith, depending on the migration phase.
Service Mesh: Istio on Amazon EKS managed inter-service traffic and provided automatic retries, circuit breaking, and distributed tracing via OpenTelemetry, with little change to application code beyond propagating trace context headers.
CI/CD Pipelines: Each service was containerised with Docker and deployed via Argo CD following GitOps principles, with feature flags managed by LaunchDarkly for canary releases.
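To make the idea of immutable domain events and loosely coupled consumers concrete, here is a small sketch. It uses an in-memory bus as a stand-in for Kafka, and the envelope fields (event_id, aggregate_id, occurred_at) are assumptions for illustration rather than Meridian's actual event schema.

```python
import json
import uuid
from collections import defaultdict
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DomainEvent:
    """An immutable fact; frozen=True blocks attribute reassignment after creation."""
    event_type: str      # e.g. "OrderPlaced"
    aggregate_id: str    # hypothetical partition key, e.g. the order id
    payload: dict
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_record(self):
        """Return (key, value) bytes in the shape a Kafka producer would send."""
        key = self.aggregate_id.encode("utf-8")
        value = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return key, value


class InMemoryBus:
    """Stand-in for the broker: publishers never reference subscribers directly."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self._handlers[event.event_type]:
            handler(event)


if __name__ == "__main__":
    bus = InMemoryBus()
    bus.subscribe("OrderPlaced", lambda e: print("inventory saw", e.aggregate_id))
    bus.subscribe("OrderPlaced", lambda e: print("notifications saw", e.aggregate_id))
    bus.publish(DomainEvent("OrderPlaced", "order-1", {"total": 59.0}))
```

Adding a new consumer is a matter of another subscribe call; the publishing service does not change, which is the decoupling property the event backbone is meant to provide.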

Implementation

Phase 1 (Months 1–3) focused on foundational infrastructure and the first extraction: the product search service. The existing search logic relied on raw multi-table MySQL joins with no faceting or relevance scoring. We introduced Amazon OpenSearch Service (AWS's managed OpenSearch, the open-source fork of Elasticsearch) and built a Node.js-based search API, implementing full-text search, attribute faceting, personalised ranking via a lightweight XGBoost model, and an asynchronous indexer that consumed product events from Kafka. In initial load testing with 10,000 concurrent users, the search service achieved p99 latency under 150ms.
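Attribute faceting is typically expressed in OpenSearch as a full-text query combined with one terms aggregation per facet. The sketch below builds such a request body and, separately, computes the same kind of counts in memory to show what a facet response contains. The field names (title, colour, size) are hypothetical, and the production search API was written in Node.js; Python is used here only for illustration.

```python
from collections import Counter


def build_search_body(text, facet_fields, bucket_limit=10):
    """Build a query-DSL body: full-text match plus one terms aggregation per facet."""
    return {
        "query": {"match": {"title": text}},  # "title" is a hypothetical field
        "aggs": {
            name: {"terms": {"field": name, "size": bucket_limit}}
            for name in facet_fields
        },
    }


def facet_counts(products, facet_fields):
    """In-memory equivalent of those aggregations, to show what faceting returns."""
    counts = {name: Counter() for name in facet_fields}
    for product in products:
        for name in facet_fields:
            value = product.get(name)
            if value is not None:  # products missing the attribute are skipped
                counts[name][value] += 1
    return {name: dict(counter) for name, counter in counts.items()}


if __name__ == "__main__":
    products = [
        {"title": "Linen dress", "colour": "blue", "size": "M"},
        {"title": "Wrap dress", "colour": "blue", "size": "S"},
        {"title": "Silk dress", "colour": "red"},
    ]
    print(build_search_body("dress", ["colour", "size"]))
    print(facet_counts(products, ["colour", "size"]))
    # {'colour': {'blue': 2, 'red': 1}, 'size': {'M': 1, 'S': 1}}
```

In a real cluster the aggregation counts only the documents matching the query, whereas the in-memory helper counts whatever list it is given.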

Phase 2 (Months 4–9) tackled the cart and checkout services, the most business-critical capabilities. The checkout domain was decomposed into two services: cart-service (session-based, Redis-backed cart state) and checkout-service (orchestrating payment, shipping, and order finalisation). The biggest challenge was maintaining data consistency between the legacy monolith database and the new services during the dual-write transition period. We solved this with an outbox pattern: every state change was committed together with an outbox row in the same database transaction, and a relay then published each outbox row to Kafka. Delivery was therefore at-least-once, and idempotent consumers made the processing effectively exactly-once. Payment gateway integration (Stripe and Adyen) was handled via a saga-orchestration layer that could compensate on failure, eliminating the need for distributed two-phase commits.
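The outbox pattern and idempotent consumption can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the production code: SQLite stands in for the real database, the publish callback stands in for a Kafka producer, the table and function names are hypothetical, and the processed table would normally live in the consumer's own datastore.

```python
import json
import sqlite3
import uuid


def init_db(conn):
    conn.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            event_id   TEXT PRIMARY KEY,
            event_type TEXT NOT NULL,
            payload    TEXT NOT NULL,
            published  INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE processed (event_id TEXT PRIMARY KEY);
    """)


def place_order(conn, order_id):
    """Write the state change and its event row in ONE transaction."""
    event_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_id, event_type, payload) VALUES (?, ?, ?)",
            (event_id, "OrderPlaced", json.dumps({"order_id": order_id})),
        )
    return event_id


def relay(conn, publish):
    """Publish unpublished rows, then mark them as published.

    A crash between publish and the UPDATE causes a re-send on the next run,
    which is why delivery is at-least-once rather than exactly-once.
    """
    rows = conn.execute(
        "SELECT event_id, event_type, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_id, event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,))
    return len(rows)


def handle_once(conn, event_id, handler):
    """Idempotent consumer: run the handler only if this event id is new."""
    try:
        with conn:
            conn.execute("INSERT INTO processed (event_id) VALUES (?)", (event_id,))
            handler()
    except sqlite3.IntegrityError:
        return False  # duplicate delivery, already handled
    return True
```

Because the order row and the outbox row share a transaction, there is no window in which one exists without the other; the relay can be retried freely, and duplicates are absorbed by the consumer's check on event_id.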

Phase 3 (Months 10–15) delivered the inventory and order orchestration services. We replaced the old polling-based inventory reservation model with a Kafka Streams-based real-time deduplication engine, reducing inventory double-booking incidents to zero within 30 days of go-live. Order state management was moved to a durable event store (EventStoreDB), giving Meridian's support team an auditable timeline of every customer order through its entire lifecycle.
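The core idea behind deduplicated reservations is that a reservation ID is applied to stock at most once, however many times the event is delivered. The sketch below shows that logic in plain Python. It is not Kafka Streams code (Kafka Streams is a JVM library and would keep this state in a state store); the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryLedger:
    """Applies each reservation id at most once, however often it is delivered."""
    stock: dict                                   # sku -> units available
    outcomes: dict = field(default_factory=dict)  # reservation_id -> first outcome

    def reserve(self, reservation_id: str, sku: str, quantity: int) -> str:
        # A redelivered event returns the original outcome and leaves stock untouched.
        if reservation_id in self.outcomes:
            return self.outcomes[reservation_id]

        available = self.stock.get(sku, 0)
        if quantity > available:
            outcome = "rejected"
        else:
            self.stock[sku] = available - quantity
            outcome = "reserved"

        self.outcomes[reservation_id] = outcome
        return outcome


if __name__ == "__main__":
    ledger = InventoryLedger(stock={"SKU-1": 1})
    print(ledger.reserve("r1", "SKU-1", 1))  # reserved
    print(ledger.reserve("r1", "SKU-1", 1))  # reserved (duplicate, stock unchanged)
    print(ledger.reserve("r2", "SKU-1", 1))  # rejected (no stock left)
```

Processing all events for a given SKU in order on a single partition, which keyed Kafka topics allow, is what lets a check-then-decrement like this avoid races without locks.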

Throughout the project, we ran chaos engineering experiments using Gremlin on staging environments, validating that services could withstand the loss of individual nodes and message brokers without data loss. Load testing with Locust confirmed that the new architecture could sustain a 20× traffic spike during a simulated Black Friday event while keeping p99 latency under 300ms.

Results

Meridian's new microservices architecture went fully live 18 months after project kickoff, replacing the last monolith routes with service equivalents on Black Friday of the following year. The business impact was immediate and significant:

Platform uptime improved from 99.4% to 99.95%, equating to roughly 4.4 hours of planned and unplanned downtime per year, compared with about 52.6 hours under the old platform.
Feature deployment velocity increased by up to 6×, from an average of 4–6 weeks per release to under 1 week, thanks to independent service deployability and feature flag controls.
Infrastructure costs reduced by 42%, primarily from a shift from static EC2 provisioning to Kubernetes autoscaling and spot instance usage for stateless background workers, saving an estimated £127,000 per year in AWS spend.
Checkout conversion rate increased 3.8% in relative terms (from 2.1% to 2.18%), attributed to reduced latency and the elimination of the indexing lock contention that previously occurred during peak hours.
Search abandonment rate fell by 28%, facilitated by faster, more relevant search results with facet filtering and real-time index updates.

Key Metrics

Metric	Before Migration	After Migration	Change
Platform uptime	99.4%	99.95%	+0.55 pp
Mean deployment cycle	4–6 weeks	<1 week	Up to 6× faster
Infrastructure cost	~£302K/yr	~£175K/yr	-42%
Search p99 latency	820ms	148ms	-82%
Checkout p99 latency	1,240ms	285ms	-77%
Conversion rate	2.1%	2.18%	+3.8% (relative)
Inventory booking errors	~12/week	~0/week	-100%
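The derived figures in the table follow from simple arithmetic on the before and after values. The short sketch below reproduces them; it assumes a 365-day year for the downtime conversion, and the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_hours(uptime_percent: float) -> float:
    """Annual downtime implied by an uptime percentage."""
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


if __name__ == "__main__":
    print(f"Downtime at 99.4%:  {downtime_hours(99.4):.2f} h/yr")   # 52.56
    print(f"Downtime at 99.95%: {downtime_hours(99.95):.2f} h/yr")  # 4.38
    print(f"Infrastructure cost: {percent_change(302, 175):.0f}%")  # -42
    print(f"Search p99 latency:  {percent_change(820, 148):.0f}%")  # -82
    print(f"Checkout p99:        {percent_change(1240, 285):.0f}%") # -77
    print(f"Conversion rate:     {percent_change(2.1, 2.18):+.1f}%") # +3.8
```

Note that the conversion figure is a relative change (2.1% to 2.18%), while the uptime change is expressed in percentage points.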

Architecture Diagram: High-Level Service Mesh

The new architecture can be summarised at a high level as follows:

Edge Layer: CloudFront → API Gateway → Lambda Authoriser → Service Mesh (Istio/Ingress)
Services Layer: Product Search (Node.js + OpenSearch), Cart (Redis), Checkout (Node.js + Stripe/Adyen), Inventory (Go + Kafka Streams), Order Store (EventStoreDB), Notification Service (Python)
Data Layer: Kafka MSK (event backbone), RDS Aurora (monolith read replica with persistence), OpenSearch (search index), Redis Cluster (cart session store), S3 + CloudFront (media assets)
Observability: Grafana + Prometheus + Jaeger/OpenTelemetry + CloudWatch

Lessons Learned

1. The Strangler Fig pattern beats big-bang rewrites every time.

Big-bang rewrites almost inevitably overrun schedules and budgets, and many never ship. By strangling the monolith progressively and shipping incremental value every sprint, Meridian's leadership team could see ROI at every phase, not just at the finish line.

2. Data consistency between legacy and new systems is the hardest problem — solve it early.

The outbox event pattern was our most impactful architectural decision. It eliminated the dual-write race condition without introducing distributed transactions, and gave Meridian a complete, timestamped audit trail of every change.

3. Observability cannot be a Phase-4 afterthought.

We instrumented services from day one (OpenTelemetry + distributed tracing + structured logging), which meant we were never flying blind. This reduced mean-time-to-resolution (MTTR) for production incidents to under 15 minutes, compared to 3.5 hours on the old platform.

4. Feature flags unlock true progressive delivery.

LaunchDarkly flags on every service meant we could dark-launch features to internal users before any customer traffic, giving us rapid A/B testing velocity. The marketing team ran 12 feature experiments in the first quarter post-launch alone.

5. Test from chaos, not just from the happy path.

Chaos engineering sessions with Gremlin on staging revealed a subtle Kafka consumer rebalancing issue that would have caused a ~12-minute search degradation event under an unlikely but non-zero failure scenario. Catching it pre-launch avoided what would have been a Black Friday headline event.

Conclusion

Meridian Retail's migration from a fragile monolith to a resilient, event-driven cloud-native platform is an example of how deliberate architecture, discipline, and incremental delivery can transform a business's technical trajectory. The team is now working on autonomous ML-powered search ranking and a real-time personalisation engine — capabilities that were architecturally impossible on the old monolith. For any mid-market retailer facing a similar monolith burden, the Meridian playbook provides a repeatable, low-risk path to cloud-native modernity.