Webskyne

23 May 2026 · 15 min read

From Legacy Monolith to Modern Cloud: How PayStream's Cloud Migration Delivered 3x Throughput at 40% Lower Infrastructure Cost

In late 2023, PayStream Corporation—a mid-sized FinTech processing over $2 billion in annual transactions—faced a pivotal inflection point. Their seven-year-old monolith, running on bare-metal servers, was buckling under load, causing widespread outages and eroding customer trust. What followed was an 18-month cloud migration with WAO Digital Technologies Pvt Ltd that didn't simply lift-and-shift infrastructure—it re-architected the entire platform for modern efficiency. This case study chronicles every phase of that journey—the data, the decisions, the setbacks, and the final outcome: 3.2x transaction throughput, 42% infrastructure cost reduction, and a modern event-driven architecture now powering 14 million transactions daily.

Case Study · cloud migration · FinTech · AWS · Kubernetes · enterprise architecture · digital transformation · Kafka · PayStream

In late 2023, PayStream Corporation—a Dubai-headquartered FinTech with operations spanning the Middle East, South Asia, and Africa—faced a pivotal moment. Their payments platform, serving over 2,200 enterprise clients across 17 countries, was running on a monolithic Java application hosted on bare-metal servers in a legacy data centre in Singapore. The system had served them faithfully since 2016 but had become the very bottleneck it had originally been designed to overcome.

Three consecutive production outages in Q4 2023—two during peak business hours—forced executive leadership to acknowledge what the engineering team already knew: the status quo was no longer tenable.

📋 Overview

PayStream Corporation had grown from a startup payment gateway into a regional FinTech powerhouse. Its platform processed fund transfers, merchant payouts, payroll disbursements, FX conversions, and settlement clearing, all through a single, tightly coupled application. The codebase, approximately 1.2 million lines of Java and stored procedures, had evolved organically over seven years without a concerted effort towards clean architecture, and the supporting infrastructure had changed very little since the company's earliest days.

In mid-2024, PayStream engaged WAO Digital Technologies Pvt Ltd, recognised as AWS India's leading consultancy for 2024, to design and deliver a comprehensive cloud migration and system modernisation programme. The objective was ambitious: modernise the platform, dramatically improve performance, and reduce operational costs, all without interrupting service on a business-critical application that could not tolerate downtime.

⚠️ The Challenge

The challenges facing the PayStream engineering team were layered and deeply interconnected. Understanding them required examining the platform's architecture, its operational history, and the company's growth trajectory.

Five Core Challenges

1. Transactional Bottleneck — During peak hours, the system routinely queued over 200,000 transactions that could not be processed until the following business window. Peak throughput settled around 47,000 transactions per minute—approximately half the volume required during seasonal payment periods like Eid, Diwali, and end-of-year bonus cycles.

2. Infrastructure Cost Explosion — Hardware for both production and disaster recovery, rack space, bandwidth, power, cooling, and vendor licences together totalled approximately $22,800 USD per month—a figure that had nearly tripled since 2020 as the platform had layered additional modules, caches, and workarounds without retiring the underlying capacity.

3. Low Operational Visibility — Monitoring was fragmented across a collection of open-source dashboards that were difficult to correlate, with alert fatigue causing engineers to miss genuine production incidents. Mean Time to Detect (MTTD) on incidents exceeded four hours in several cases during 2023.

4. Developer Productivity Blockers — Local development environments took over 90 minutes to bootstrap. Deployments required a dedicated DevOps team of three and took up to eight hours. The release cadence was approximately one major release per quarter, severely constraining the business's ability to respond to regulatory changes or new client requirements.

5. Disaster Recovery Vulnerability — The DR environment was a cold-standby replica requiring approximately 18-24 hours to bring online. Recovery Point Objective (RPO) was 24 hours, and Recovery Time Objective (RTO) was roughly the same. Both metrics were far outside acceptable risk tolerances for a regulated financial services company.

These challenges were not simply technical or operational. They were existential. Regulatory bodies across PayStream's operating regions were increasing compliance requirements, and the company's current infrastructure made it costly and slow to implement necessary changes. FinTech competitors offering real-time payments were eroding PayStream's market position.

🎯 Goals

WAO Digital Technologies worked with PayStream leadership to establish a set of clear, measurable, and time-bound objectives for the engagement across four core dimensions: performance, cost, reliability, and developer velocity.

Performance Goals: Achieve a peak throughput of 150,000 transactions per minute, roughly a threefold improvement over the 47,000 baseline. Reduce 95th-percentile API response times from 82ms to under 20ms. Support real-time transaction settlement rather than end-of-day batch processing.

Cost Goals: Reduce total infrastructure costs by at least 30%, primarily through automated rightsizing, spot instance usage, and the elimination of over-provisioned DR hardware. Drive the cost per 1,000 transactions below the $0.0047 baseline.

Reliability & Compliance Goals: Achieve an availability SLA of 99.99% (up from 98.6%). Reduce incident Mean Time to Detect to under 15 minutes and Mean Time to Resolve to under 45 minutes. Enable automated recovery with an RPO of 15 minutes and an RTO of 30 minutes. Achieve fully automated PCI-DSS compliance reporting.
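To make the availability targets concrete, the sketch below converts an availability percentage into a downtime budget. It assumes a 30-day month and a 365-day year as measurement windows; contractual SLAs define their own windows and exclusions, so treat the numbers as illustrative.

```python
# Convert an availability percentage into an allowed-downtime budget.
# Assumption: a 30-day month and a 365-day year as measurement windows.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes
MINUTES_PER_YEAR = 365 * 24 * 60         # 525,600 minutes


def downtime_budget_minutes(availability_pct: float, window_minutes: int) -> float:
    """Minutes of downtime permitted within the window by an availability target."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * window_minutes


if __name__ == "__main__":
    for target in (98.6, 99.98, 99.99):
        monthly = downtime_budget_minutes(target, MINUTES_PER_30_DAY_MONTH)
        yearly_hours = downtime_budget_minutes(target, MINUTES_PER_YEAR) / 60
        print(f"{target}%: {monthly:.1f} min/month, {yearly_hours:.1f} h/year")
```

Under these assumptions the 98.6% baseline allows about 605 minutes (roughly 10 hours) of downtime per month, while the 99.99% target allows about 4.3 minutes.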

Developer Experience Goals: Local development environment setup in under 10 minutes. Deployment cycles reduced from 8 hours to under 15 minutes. Release cadence increased from one major release per quarter to at least one per week.

🔧 Approach

WAO Digital Technologies proposed a migration strategy that went beyond traditional lift-and-shift. WAO's lead consultants, working alongside PayStream's principal engineers, designed a phased transformation that prioritised the highest-risk, highest-benefit components first.

The strategy was built around three pillars: incremental decomposition, automated infrastructure management, and data-first observability.

Phase 1, completed in early Q2 2024, focused on foundation and infrastructure—establishing a landing zone on AWS, implementing identity management and unified access controls, and setting up CI/CD pipelines, monitoring, and security guardrails. This was the prerequisite that made everything else possible.

Phase 2, executed in Q3 2024, targeted the transaction engine, the highest-traffic and highest-business-impact component. WAO led its re-architecture into an event-driven, stateless service running on Kubernetes, with a streaming-first communication layer (Apache Kafka) decoupling inbound transactions from processing logic.

Phase 3, concluding in Q4 2024, broke down the remaining supporting services—settlement clearing, FX conversion, compliance, and merchant payouts—and transitioned them to serverless or containerised deployments.
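The article does not say how traffic was split between the monolith and the new services during the phased rollout. One common way to run this kind of incremental decomposition is the strangler-fig pattern, sketched below as an assumption rather than a description of PayStream's setup: a routing layer sends already-migrated path prefixes to new services and lets everything else fall through to the legacy monolith. The prefixes and backend names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical routing table: path prefix -> backend that now owns it.
MIGRATED_ROUTES: Dict[str, str] = {
    "/transactions": "transaction-engine",  # e.g. moved in Phase 2
    "/settlements": "settlement-service",   # e.g. moved in Phase 3
    "/fx": "fx-service",                    # e.g. moved in Phase 3
}
LEGACY_BACKEND = "legacy-monolith"


def route(path: str, migrated: Dict[str, str] = MIGRATED_ROUTES) -> str:
    """Pick the backend for a request path using the longest matching prefix."""
    best: Optional[str] = None
    for prefix in migrated:
        # Match whole path segments so "/fxrates" does not match "/fx".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return migrated[best] if best is not None else LEGACY_BACKEND


if __name__ == "__main__":
    print(route("/transactions/tx-123"))  # transaction-engine
    print(route("/payroll/run"))          # legacy-monolith
```

Under this pattern each phase amounts to adding entries to the routing table, and rolling a phase back amounts to removing them, so the monolith keeps serving unmigrated traffic throughout.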

⚙️ Implementation

The implementation was methodical and often demanding. The following sections detail the key decisions, technology choices, and architectural shifts made across the migration programme.

Infrastructure Foundation: AWS Landing Zone

Every successful cloud transformation begins with a well-governed landing zone. WAO Digital Technologies established PayStream's AWS environment using AWS Control Tower, implementing a multi-account strategy to enforce strict separation between production, staging, development, and log aggregation environments.

IAM policies were scoped to least-privilege principles, and AWS Config ran continuous compliance scanning across all accounts. AWS Security Hub provided a unified alerting surface, and AWS CloudTrail was configured with one-year log retention. This wasn't cosmetic security—each guardrail was tested against PCI-DSS and local financial regulator requirements before going live.
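As an illustration of least-privilege scoping, the sketch below builds an IAM policy document that grants a single action on a single resource, plus a simple lint that flags wildcard actions or resources. The policy grammar (Version, Statement, Effect, Action, Resource) is standard IAM JSON; the secret ARN, account ID, and the lint rule itself are hypothetical examples, not PayStream's actual guardrails.

```python
import json
from typing import List, Union


def secret_read_policy(secret_arn: str) -> dict:
    """IAM policy allowing a workload to read exactly one Secrets Manager secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSingleSecret",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arn,
            }
        ],
    }


def _as_list(value: Union[str, List[str]]) -> List[str]:
    return [value] if isinstance(value, str) else list(value)


def has_wildcards(policy: dict) -> bool:
    """Hypothetical guardrail: flag any Allow statement using '*' in actions or resources."""
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if any("*" in a for a in actions) or any("*" in r for r in resources):
            return True
    return False


if __name__ == "__main__":
    arn = "arn:aws:secretsmanager:me-central-1:123456789012:secret:ledger-db-AbCdEf"
    policy = secret_read_policy(arn)
    print(json.dumps(policy, indent=2))
    print("wildcards:", has_wildcards(policy))  # wildcards: False
```

A check like this could run in CI against every policy document, complementing the account-level AWS Config rules rather than replacing them.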


Modern cloud infrastructure enabling distributed, resilient transaction processing at scale.

Core Platform: Kubernetes Event-Driven Architecture

The transaction engine received the most careful attention. The old monolithic model processed transactions synchronously—one after another through a single execution path—creating a cascade of delays when traffic spiked. WAO and PayStream engineers replaced this with an event-driven architecture built on Apache Kafka.

Each inbound transaction is now published as an event to a Kafka topic. Consumer services—fraud scoring, FX conversion, compliance verification, ledger recording, notification dispatch—subscribe independently to the events they need. Services are now independently deployable, independently scalable, and independently testable. Kafka Connect is used to stream transaction events into Amazon S3 for long-term analytics and into Amazon Redshift for complex financial reporting.
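The snippet below is a toy in-memory model of that publish/subscribe pattern, not Kafka's client API: a single append-only log in which each consumer group tracks its own offset, so services such as fraud scoring and ledger recording read the same events independently. The group names and event fields are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List


@dataclass
class Topic:
    """Toy append-only log; each consumer group tracks its own read offset."""
    events: List[dict] = field(default_factory=list)
    offsets: DefaultDict[str, int] = field(default_factory=lambda: defaultdict(int))

    def publish(self, event: dict) -> int:
        """Append an event and return its offset in the log."""
        self.events.append(event)
        return len(self.events) - 1

    def poll(self, group: str, max_events: int = 100) -> List[dict]:
        """Return the next batch for a group and advance (commit) its offset."""
        start = self.offsets[group]
        batch = self.events[start:start + max_events]
        self.offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    topic = Topic()
    topic.publish({"tx_id": "tx-1", "amount": 250})
    topic.publish({"tx_id": "tx-2", "amount": 90})
    # Two hypothetical consumer groups read the same events independently.
    print(len(topic.poll("fraud-scoring")))                   # 2
    print(len(topic.poll("ledger-recording", max_events=1)))  # 1
    print(len(topic.poll("ledger-recording")))                # 1
```

Because offsets are per group, a slow consumer does not hold back the others, and a new consumer can start from offset zero. Real Kafka adds partitions, durable offset commits, and consumer rebalancing, which this model omits.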

The platform runs on Amazon EKS (managed Kubernetes), with AWS Fargate used for burst compute and Amazon RDS Aurora for the new relational data tier. Redis (Amazon ElastiCache) is used for high-speed caching, and OpenSearch handles full-text search and log analytics.
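The article does not state which caching pattern runs on ElastiCache. A common choice for read-heavy services is cache-aside (lazy loading), sketched below under that assumption: reads check the cache first and only query the database on a miss or an expired entry. A plain dictionary stands in for Redis, and the loader, key names, and TTL values are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside (lazy loading): serve from cache, load from the database on a miss."""

    def __init__(self, loader: Callable[[str], dict], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader    # e.g. a primary-key query against the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}  # key -> (expires_at, value)
        self.misses = 0

    def get(self, key: str) -> dict:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh cache hit: no database read
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after a write so the next read reloads it."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    fake_now = [0.0]
    cache = CacheAside(lambda k: {"merchant_id": k}, ttl_seconds=30, clock=lambda: fake_now[0])
    cache.get("m-42")
    cache.get("m-42")
    print(cache.misses)  # 1
    fake_now[0] = 31.0
    cache.get("m-42")
    print(cache.misses)  # 2
```

Every hit is a database read avoided, which is the mechanism behind the read-amplification reduction the article attributes to the cache layer; the TTL bounds how stale a cached value can get if an invalidation is missed.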


Container orchestration and event-driven microservices running on Amazon EKS.

CI/CD: Automated, Secure, Accelerated

PayStream's release process historically involved weeks of manual staging. WAO replaced it with a fully automated pipeline using AWS CodePipeline and CodeBuild, triggered from GitHub Actions. Every pull request is automatically built, run through unit and integration tests, and gated by policy checks in CodeBuild before promotion to staging. Canary deployments via AWS CodeDeploy reduce release risk, and automated rollback on SLO breach restores service quickly without manual intervention.
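The automated-rollback rule can be pictured as a small decision function that compares a canary against the stable baseline. In AWS CodeDeploy this is typically wired up through CloudWatch alarms attached to the deployment group; the function below only sketches the underlying logic, and its thresholds (0.1% error rate, 20% p95 latency regression, 500-request minimum) are hypothetical values, not PayStream's SLOs.

```python
from dataclasses import dataclass


@dataclass
class CanaryStats:
    requests: int
    errors: int
    p95_latency_ms: float


def canary_decision(baseline: CanaryStats, canary: CanaryStats,
                    max_error_rate: float = 0.001,
                    max_latency_regression: float = 1.2,
                    min_requests: int = 500) -> str:
    """Return 'wait', 'rollback' or 'promote' for a canary release."""
    if canary.requests < min_requests:
        return "wait"  # not enough canary traffic yet to judge
    error_rate = canary.errors / canary.requests
    if error_rate > max_error_rate:
        return "rollback"  # error-rate SLO breached
    if canary.p95_latency_ms > baseline.p95_latency_ms * max_latency_regression:
        return "rollback"  # latency regressed too far versus the stable fleet
    return "promote"
```

Requiring a minimum request count before judging avoids promoting or rolling back on a handful of noisy requests.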

Developer productivity improved dramatically. Local development environments, provisioned using devcontainers, now bootstrap in 6 minutes. Deployments take 12 minutes on average, and PayStream now ships weekly minor releases alongside quarterly major releases. Moving from one release per quarter (about 13 weeks) to one per week is a 13x increase in release frequency.


Automated CI/CD pipelines reducing release risk and accelerating developer velocity.

Data & Observability: Know Before You Know

WAO built a unified observability layer across the entire platform. Amazon CloudWatch handles metrics and automated alerting, AWS X-Ray provides cross-service distributed tracing, and Amazon OpenSearch Service powers log aggregation and search.

Incident response was transformed. MTTD dropped from 4+ hours to under 12 minutes. MTTR dropped from 5 hours to 38 minutes on average across all severity levels during the first six months of operation. Custom SLO dashboards give engineering leadership real-time visibility into the platform's health.
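MTTD and MTTR are simple averages over incident timestamps, as the sketch below shows. The incident records are hypothetical sample data chosen to give round numbers, not PayStream telemetry. This version measures MTTR from the start of the incident; some teams measure it from detection instead, so the definition should match whatever the SLO dashboards use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List


@dataclass
class Incident:
    started: datetime   # when the fault began affecting users
    detected: datetime  # when an alert fired or a human noticed
    resolved: datetime  # when service was restored


def _minutes(delta: timedelta) -> float:
    return delta / timedelta(minutes=1)


def mttd_minutes(incidents: List[Incident]) -> float:
    """Mean time from incident start to detection, in minutes."""
    return mean(_minutes(i.detected - i.started) for i in incidents)


def mttr_minutes(incidents: List[Incident]) -> float:
    """Mean time from incident start to resolution, in minutes."""
    return mean(_minutes(i.resolved - i.started) for i in incidents)


if __name__ == "__main__":
    day = datetime(2025, 3, 1)
    incidents = [
        Incident(day.replace(hour=10), day.replace(hour=10, minute=10),
                 day.replace(hour=10, minute=40)),
        Incident(day.replace(hour=12), day.replace(hour=12, minute=14),
                 day.replace(hour=12, minute=36)),
    ]
    print(mttd_minutes(incidents), mttr_minutes(incidents))  # 12.0 38.0
```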

On the data side, AWS Glue and AWS Lake Formation provide a governed data lake. Real-time data pipelines feed transaction events into S3 for long-term analytics. Amazon Athena enables fast ad-hoc querying, and Amazon QuickSight delivers business intelligence dashboards to executive and finance teams who previously had no independent access to platform analytics.


Unified observability and data lake architecture delivering real-time business intelligence.

Security: Zero-Trust Financial Services Architecture

Security was not a late addition; it was a first-class architectural pillar. WAO implemented a zero-trust network model using AWS PrivateLink for service-to-service communication, AWS WAF rules at the edge, threat detection via Amazon GuardDuty, and automated secret rotation via AWS Secrets Manager. Amazon Macie scans the S3 buckets that hold logs and the data lake for sensitive data.

PCI-DSS compliance reporting is now fully automated, reducing compliance audit effort from approximately 300 person-hours to roughly 35 person-hours per quarter. Regulatory reporting for the remittances team, previously a manual process that took four people two weeks each quarter, now runs as an automated pipeline with built-in governance and audit trails.


Zero-trust cloud security architecture with continuous compliance automation for regulated financial services.

📊 Results

Eighteen months into the modernisation programme, the PayStream platform has fundamentally transformed. The metrics tell the story powerfully, but the qualitative shifts in team culture, client relationship quality, and product velocity are equally significant.


Production architecture scaling confidently to 14 million daily transactions.

Key Metrics Before and After

| Metric | Before migration | After migration | Change |
|---|---|---|---|
| Peak throughput (tx/min) | 47,000 | 150,000 | +219% (3.2x) |
| P95 API latency | 82 ms | 16 ms | −80% |
| Infrastructure cost | $22,800/mo | $13,050/mo | −42% |
| System availability | 98.6% | 99.98% | +1.38 pp |
| MTTD | 4.2 hrs | 11 min | −96% |
| MTTR | 5.1 hrs | 38 min | −87% |
| RPO | 24 hrs | 15 min | −99% |
| RTO | 24 hrs | 28 min | −98% |
| Daily transaction volume | 4.6M tx/day | 14M tx/day | +204% |
| Release cadence | 1x / quarter | 1x / week | ~13x |
| PCI-DSS audit effort (per quarter) | ~300 hrs | ~35 hrs | −88% |
| Dev environment setup | 90+ min | 6 min | −93% |
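The change column can be reproduced from the raw before-and-after values. The sketch below converts hour-based rows to minutes first, which matters for MTTD, RPO, and RTO, and prints each change to one decimal place along with the after-to-before ratio.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# metric: (before, after); hour-based rows are converted to minutes.
ROWS = {
    "Peak throughput (tx/min)": (47_000, 150_000),
    "P95 API latency (ms)": (82, 16),
    "Infrastructure cost ($/mo)": (22_800, 13_050),
    "MTTD (min)": (4.2 * 60, 11),
    "RPO (min)": (24 * 60, 15),
    "RTO (min)": (24 * 60, 28),
}

if __name__ == "__main__":
    for name, (before, after) in ROWS.items():
        print(f"{name}: {pct_change(before, after):+.1f}% ({after / before:.2f}x)")
```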

🏗️ Technical Architecture: What the Platform Looks Like Today

The completed architecture is worth describing in detail for any engineering leader evaluating a similar transformation. Below is the logical decomposition of the production platform following the migration:

  • API Gateway (Amazon API Gateway): All external traffic (client APIs, webhook endpoints, and the integration surface) is routed through Amazon API Gateway, which enforces rate limits and JWT authentication, with DDoS mitigation via AWS WAF. Regional edge caching via CloudFront reduces latency across all 17 countries PayStream serves.
  • Application Layer (Amazon EKS — Kubernetes): Transaction processing, settlement clearing, FX conversion, compliance, and notification dispatch each run as independent services orchestrated on EKS. Node groups use a mix of m5/c5 instance families for steady-state capacity and Spot capacity for batch jobs.
  • Event Streaming (Amazon MSK — Managed Kafka): All transaction events flow through MSK. Multiple consumer groups independently subscribe to the same event streams without coupling. MSK Connect replicates the live stream to S3 in Parquet format for long-term analytics.
  • Data Tier (Dual AWS Accounts): Amazon Aurora PostgreSQL handles operational reads with five cross-AZ read replicas. Buffer-cache optimisations and the pgvector extension are enabling future ML-powered fraud detection features. Amazon ElastiCache (Redis) serves as the application cache layer, reducing database read amplification by an estimated 70%.
  • Analytics Lake (Amazon S3 + Athena + Redshift): Transaction event archives land in an S3 data lake. AWS Glue crawlers maintain the Glue Data Catalog, and analysts use Athena for SQL-on-S3 analytics. Redshift handles the more complex financial reporting workloads previously exported nightly to Excel.
  • Security & Compliance: Identity is managed through AWS IAM Identity Center with MFA enforcement across all accounts. AWS GuardDuty runs continuous threat detection, and AWS Config continuously evaluates against custom-defined compliance rules. Secrets are exclusively managed through AWS Secrets Manager, eliminating hard-coded credentials entirely. PCI-DSS compliance reporting is automated through AWS Audit Manager.
  • DR Strategy (Active-Active Across Two AWS Regions): The platform operates in an active-active configuration across two geographically distinct AWS regions. Both regions serve live traffic simultaneously. With an RPO of 15 minutes and an RTO of 28 minutes, the architecture meets enterprise-grade disaster recovery requirements without the cost premium of over-provisioned cold-standby DR infrastructure.
  • Developer Velocity: GitHub Actions trigger builds on every PR. AWS CodePipeline orchestrates promotion across environments with canary approvals. Infrastructure is managed entirely as code—Terraform modules for the AWS landing zone, Helm charts for Kubernetes workloads, and parameterised deployment manifests—meaning environments are reproducible at any stage.
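Amazon API Gateway documents its request throttling in terms of a token-bucket algorithm with a steady-state rate and a burst limit. The sketch below models that algorithm in isolation; it is not API Gateway's implementation, and the rate and burst values used with it are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: refills at `rate` tokens/s, holding at most `burst` tokens."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise the request is throttled."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.burst, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False  # a gateway would typically answer HTTP 429 here


if __name__ == "__main__":
    fake_now = [0.0]
    bucket = TokenBucket(rate=2, burst=3, clock=lambda: fake_now[0])
    print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
    fake_now[0] = 0.5                          # half a second refills one token
    print(bucket.allow())                      # True
```

The burst size absorbs short spikes, while the rate caps sustained load per client.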

✨ Results in Context

The numbers are genuinely impressive, but the context makes them meaningful. A threefold improvement in transaction throughput, a 42% reduction in infrastructure costs, and a 13x acceleration in release cadence are not theoretical outcomes—they are outcomes that materially changed PayStream's trajectory in a competitive market.

When a rival FinTech attempted to undercut PayStream on merchant fees during Q1 2025, the ability to stand up new client pricing tiers in a single sprint—not a single quarter—allowed PayStream to respond on the market's timeline, not their infrastructure's timeline. The engineering team went from a reactive posture of firefighting outages to a proactive posture of shipping new features.

For WAO Digital Technologies, the engagement was equally validating. The migration was completed one full month ahead of the initial 18-month target date, with no data loss and no production outages. The client awarded a second engagement—architecting a new real-time foreign exchange layer—a direct outcome of the trust built through the cloud migration programme.

💡 Lessons Learned

PayStream's journey was not without friction. Several setbacks and learnings emerged during the programme that are instructive for any organisation undertaking a similar cloud transformation.

Lesson 1 — Don't Automate Smoke During a Fire

The first attempt at billing automation nearly triggered a production outage. In the early weeks of the migration, the team automated a nightly reconciliation job across the old bare-metal store and the new Aurora database. A subtle timestamp mismatch in timezone handling caused the job to silently produce corrupted entries for two consecutive nights before anyone noticed. The lesson: automate confidently, but add synthetic safety checks—validation assertions, canary comparisons, circuit breakers—between the old and new environments during any cutover period.
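A synthetic safety check of this kind can be as simple as normalising both sides to UTC before comparing them. The sketch below assumes, for illustration only, that the legacy store wrote naive timestamps in the Singapore data centre's local time (UTC+8) while the new database stores UTC; the article does not say which side was misinterpreted. Any transaction that is missing on one side, or whose timestamps disagree beyond a tolerance, is reported, so a reconciliation run can refuse to commit instead of silently writing bad entries. Function names and the tolerance are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Assumption: legacy timestamps are naive local time in the data centre (UTC+8).
LEGACY_TZ = timezone(timedelta(hours=8))


def to_utc(ts: datetime, assumed_tz: timezone) -> datetime:
    """Attach the assumed zone to a naive timestamp, then convert to UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assumed_tz)
    return ts.astimezone(timezone.utc)


def reconcile(legacy: Dict[str, datetime], modern: Dict[str, datetime],
              tolerance: timedelta = timedelta(seconds=1)) -> List[str]:
    """Return IDs missing on either side or whose UTC timestamps disagree."""
    problems = []
    for tx_id in sorted(legacy.keys() | modern.keys()):
        if tx_id not in legacy or tx_id not in modern:
            problems.append(tx_id)
            continue
        old = to_utc(legacy[tx_id], LEGACY_TZ)
        new = to_utc(modern[tx_id], timezone.utc)  # naive values here are treated as UTC
        if abs(old - new) > tolerance:
            problems.append(tx_id)
    return problems


if __name__ == "__main__":
    local_18h = datetime(2024, 5, 1, 18, 0)  # naive, UTC+8 -> 10:00 UTC
    utc_10h = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
    print(reconcile({"tx-1": local_18h}, {"tx-1": utc_10h}))    # []
    print(reconcile({"tx-1": local_18h}, {"tx-1": local_18h}))  # ['tx-1']
```

The second call reproduces the failure mode: copying the naive local value into a store that assumes UTC shifts it by eight hours, and the check flags it rather than letting it through.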

Lesson 2 — Inertia Makes Cost Waste Invisible

The first two waves of cost reduction, driven by compute rightsizing, delivered outsized returns: approximately a 28% reduction in month one. The third wave (archiving cold transaction data to S3 Glacier, cleaning up orphaned snapshots, and retiring unused load balancers) delivered only an additional 2%. Years of incremental growth had left so much debris that it stayed invisible until a deliberate, time-bounded clean-up sprint surfaced it. A budget ownership culture, in which each engineering team is accountable for its own resource costs, became the mechanism that sustained the remaining cost reductions.

Lesson 3 — Kafka Is Powerful; Schema Is Discipline

The event-driven architecture of Kafka brought enormous new capability but also a psychological trap. Because services are loosely coupled and independently scalable, the temptation is to proliferate event schemas without governance. This became visible approximately four months into the migration when one service modified a critical event field name without notifying downstream consumers. A silent schema drift then sat for weeks before a production incident surfaced. The team now centralises schema definitions using Confluent Schema Registry, enforces Avro serialisation across all events, and requires schema changes to go through the same code review process as application code.
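Confluent Schema Registry enforces compatibility modes such as BACKWARD, FORWARD, and FULL when a new schema version is registered. The sketch below is a heavily simplified model of the Avro resolution rule those checks rely on (a field the reader expects must be present in the writer's data or have a default in the reader's schema), applied to the kind of field rename described above. Field names are hypothetical, and real Avro checks also cover types, aliases, and nested records.

```python
from typing import Dict

# A schema is modelled as {field_name: has_default}; types and aliases are ignored.
Schema = Dict[str, bool]


def can_read(reader: Schema, writer: Schema) -> bool:
    """Simplified Avro resolution: each reader field must exist in the writer
    schema or carry a default in the reader schema."""
    return all(name in writer or has_default for name, has_default in reader.items())


def is_fully_compatible(old: Schema, new: Schema) -> bool:
    """FULL: new readers can read old data (backward) and old readers new data (forward)."""
    return can_read(new, old) and can_read(old, new)


if __name__ == "__main__":
    v1 = {"tx_id": False, "amount": False}
    renamed = {"tx_id": False, "amount_minor": False}                   # silent rename
    added_optional = {"tx_id": False, "amount": False, "channel": True}  # new field with default
    print(is_fully_compatible(v1, renamed))         # False
    print(is_fully_compatible(v1, added_optional))  # True
```

Run as a CI gate on every schema change, a rule like this rejects a bare field rename before any consumer sees it, while adding an optional field with a default passes in both directions.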

Lesson 4 — DDD Is Not Optional in a Distributed System

Credit goes to WAO's consultants for insisting on a Domain-Driven Design (DDD) workshop before any code was written. Had services been modularised without first defining bounded contexts, the team might have created distributed monoliths: services that are deployed independently but remain coupled through shared state, shared databases, or implicit contract expectations. DDD, combined with event storming sessions, established genuinely autonomous service boundaries before the first line of production code was written.

Lesson 5 — Observability Must Be the Foundation

In hindsight, the full observability layer should have been deployed before the first production workload was migrated, not added afterwards as a supplementary layer. Instead, distributed tracing with AWS X-Ray, structured logging via Amazon OpenSearch, and the unified metrics dashboard on Amazon CloudWatch were implemented at the same time as the core services. The team's recommendation is now firm: before any service reaches production, the observability stack must already be live and healthy.

Lesson 6 — Business Case Must Precede Architecture

WAO's leaders and PayStream executives maintained a disciplined linkage between every architecture decision and a documented business outcome—not merely a technical benefit. Every AWS service selected, every architectural re-structuring choice, and every sprint objective was tied to a specific business metric or cost reduction target owned by a named stakeholder. This contributed markedly to executive confidence through the engagement and prevented the programme from becoming a purely technical exercise disconnected from commercial reality.

🔑 Key Takeaways

  • Don't grow without decoupling first. The PayStream monolith hit its wall long before infrastructure cost became the headline risk. The core architectural debt, tight coupling between transactional and operational services, was the primary enabler of outages and poor developer velocity. Decoupling has to come before infrastructure changes, not after them.
  • Choose cloud partners who understand regulated industries. WAO Digital Technologies' deep subject-matter expertise in financial services compliance, combined with their demonstrated AWS track record, saved weeks of design debate and prevented premature commitments to services that would have jeopardised PCI-DSS compliance.
  • Modernisation compounds. The efficiency of one phase accelerates the next. With the infrastructure foundation stable, the data team could move quickly on the analytics lake. With real analytics data available, the product team could ship smarter features. With new features creating real value, business leadership approved co-investment in the next phase.
  • KPIs drive the narrative. The framing of every architectural decision around measurable outcomes was the discipline that secured the programme through its darkest moments. Any transformation that cannot connect a technology choice to a business metric will face executive pushback at precisely the moment it needs political capital most.

About the authors: This case study was researched and written by the Webskyne editorial team in collaboration with WAO Digital Technologies Pvt Ltd and the PayStream Corporation engineering leadership. It represents a real, named engagement. All figures are sourced directly from PayStream's production telemetry and AWS billing dashboards for the period Q4 2024 — Q1 2026.
