23 May 2026 • 15 min read
From Legacy Monolith to Modern Cloud: How PayStream's Cloud Migration Delivered 3x Throughput at 40% Lower Infrastructure Cost
In late 2023, PayStream Corporation—a mid-sized FinTech processing over $2 billion in annual transactions—faced a pivotal inflection point. Their seven-year-old monolith, running on bare-metal servers, was buckling under load, causing widespread outages and eroding customer trust. What followed was an 18-month cloud migration with WAO Digital Technologies Pvt Ltd that didn't simply lift-and-shift infrastructure—it re-architected the entire platform for modern efficiency. This case study chronicles every phase of that journey—the data, the decisions, the setbacks, and the final outcome: 3.2x transaction throughput, 42% infrastructure cost reduction, and a modern event-driven architecture now powering 14 million transactions daily.
In late 2023, PayStream Corporation—a Dubai-headquartered FinTech with operations spanning the Middle East, South Asia, and Africa—faced a pivotal moment. Their payments platform, serving over 2,200 enterprise clients across 17 countries, was running on a monolithic Java application hosted on bare-metal servers in a legacy data centre in Singapore. The system had served them faithfully since 2016 but had become the very bottleneck it had originally been designed to overcome.
Three consecutive production outages in Q4 2023—two during peak business hours—forced executive leadership to acknowledge what the engineering team already knew: the status quo was no longer tenable.
📋 Overview
PayStream Corporation had grown from a startup payment gateway into a regional FinTech powerhouse. Their platform processed fund transfers, merchant payouts, payroll disbursements, FX conversions, and settlement clearing, all through a single, tightly coupled application. The codebase, approximately 1.2 million lines of Java and stored procedures, had evolved organically over seven years without a concerted effort towards clean architecture, and the supporting infrastructure had changed very little since the company's earliest days.
In mid-2024, PayStream engaged WAO Digital Technologies Pvt Ltd, recognised as AWS India's leading consultancy for 2024, to design and deliver a comprehensive cloud migration and system modernisation programme. The objective was ambitious: modernise the platform, dramatically improve performance, and reduce operational costs, all without interrupting service on a business-critical platform that could not tolerate downtime.
⚠️ The Challenge
The challenges facing the PayStream engineering team were layered and deeply interconnected. Understanding them required diving into the platform's architecture, its operational history, and its parent company's growth trajectory.
Five Core Challenges
1. Transactional Bottleneck — During peak hours, the system routinely queued over 200,000 transactions that could not be processed until the following business window. Peak throughput settled around 47,000 transactions per minute—approximately half the volume required during seasonal payment periods like Eid, Diwali, and end-of-year bonus cycles.
2. Infrastructure Cost Explosion — Hardware for both production and disaster recovery, rack space, bandwidth, power, cooling, and vendor licences together totalled approximately $22,800 USD per month—a figure that had nearly tripled since 2020 as the platform had layered additional modules, caches, and workarounds without retiring the underlying capacity.
3. Low Operational Visibility — Monitoring was fragmented across a collection of open-source dashboards that were difficult to correlate, with alert fatigue causing engineers to miss genuine production incidents. Mean Time to Detect (MTTD) on incidents exceeded four hours in several cases during 2023.
4. Developer Productivity Blockers — Local development environments took over 90 minutes to bootstrap. Deployments required a dedicated DevOps team of three and took up to eight hours. The release cadence was approximately one major release per quarter, severely constraining the business's ability to respond to regulatory changes or new client requirements.
5. Disaster Recovery Vulnerability — The DR environment was a cold-standby replica requiring approximately 18-24 hours to bring online. Recovery Point Objective (RPO) was 24 hours, and Recovery Time Objective (RTO) was roughly the same. Both metrics were far outside acceptable risk tolerances for a regulated financial services company.
These challenges were not simply technical or operational. They were existential. Regulatory bodies across PayStream's operating regions were increasing compliance requirements, and the company's current infrastructure made it costly and slow to implement necessary changes. FinTech competitors offering real-time payments were eroding PayStream's market position.
🎯 Goals
WAO Digital Technologies worked with PayStream leadership to establish a set of clear, measurable, and time-bound objectives for the engagement across four core dimensions: performance, cost, reliability, and developer velocity.
Performance Goals: Achieve a peak transaction throughput of 150,000 transactions per minute, a threefold improvement over the 47,000 baseline. Reduce API response times at the 95th percentile from 82ms to under 20ms. Support real-time transaction settlement rather than end-of-day batch processing.
Cost Goals: Reduce total infrastructure costs by at least 30%, primarily through automated rightsizing, spot instance usage, and the elimination of over-provisioned DR hardware. Drive the cost per 1,000 transactions below the $0.0047 baseline.
Reliability & Compliance Goals: Achieve an availability SLA of 99.99% (up from 98.6%). Reduce incident Mean Time to Detect to under 15 minutes and Mean Time to Resolve to under 45 minutes. Automate recovery with an RPO of 15 minutes and an RTO of 30 minutes. Achieve fully automated PCI-DSS compliance reporting.
Developer Experience Goals: Local development environment setup in under 10 minutes. Deployment cycles reduced from 8 hours to under 15 minutes. Release cadence increased from one major release per quarter to at least one per week.
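To make the availability targets concrete, the following is a minimal sketch of the downtime arithmetic behind them. It assumes a 30-day month as the measurement window; the function name `downtime_budget_minutes` is illustrative and not taken from the engagement.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes (assumed window)


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Return the minutes of downtime permitted in a period at a given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_minutes * (100 - availability_pct) / 100


# Baseline availability versus the target SLA stated in the goals.
for pct in (98.6, 99.99):
    budget = downtime_budget_minutes(pct)
    print(f"{pct}% availability -> {budget:.2f} minutes of downtime per 30-day month")
```

Under these assumptions, the 98.6% baseline allows roughly ten hours of downtime a month, while 99.99% allows only a few minutes, which is why the detection and recovery targets sit alongside the SLA.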
🔧 Approach
WAO Digital Technologies proposed a migration strategy that moved beyond traditional lift-and-shift. WAO's lead consultants, working alongside PayStream's principal engineers, designed a phased transformation approach prioritising the highest-risk and highest-benefit components first.
The strategy was built around three pillars: incremental decomposition, automated infrastructure management, and data-first observability.
Phase 1, completed in early Q2 2024, focused on foundation and infrastructure—establishing a landing zone on AWS, implementing identity management and unified access controls, and setting up CI/CD pipelines, monitoring, and security guardrails. This was the prerequisite that made everything else possible.
Phase 2, executed in Q3 2024, targeted the transaction engine—the highest-traffic, highest-business-impact component. WAO led a retrofit of the engine into an event-driven, stateless service running on Kubernetes with a streaming-first communication layer (Apache Kafka) decoupling inbound transactions from processing logic.
Phase 3, concluding in Q4 2024, broke down the remaining supporting services—settlement clearing, FX conversion, compliance, and merchant payouts—and transitioned them to serverless or containerised deployments.
⚙️ Implementation
The implementation was methodical, and often demanding. The following details the key decisions, technology choices, and architectural shifts made across the migration programme.
Infrastructure Foundation: AWS Landing Zone
Every successful cloud transformation begins with a well-governed landing zone. WAO Digital Technologies established PayStream's AWS environment using AWS Control Tower, implementing a multi-account strategy to enforce strict separation between production, staging, development, and log aggregation environments.
IAM policies were scoped to least-privilege principles, and AWS Config ran continuous compliance scanning across all accounts. AWS Security Hub provided a unified alerting surface, and AWS CloudTrail was configured with one-year log retention. This wasn't cosmetic security—each guardrail was tested against PCI-DSS and local financial regulator requirements before going live.
Modern cloud infrastructure enabling distributed, resilient transaction processing at scale.
Core Platform: Kubernetes Event-Driven Architecture
The transaction engine received the most careful attention. The old monolithic model processed transactions synchronously—one after another through a single execution path—creating a cascade of delays when traffic spiked. WAO and PayStream engineers replaced this with an event-driven architecture built on Apache Kafka.
Each inbound transaction is now published as an event to a Kafka topic. Consumer services—fraud scoring, FX conversion, compliance verification, ledger recording, notification dispatch—subscribe independently to the events they need. Services are now independently deployable, independently scalable, and independently testable. Kafka Connect is used to stream transaction events into Amazon S3 for long-term analytics and into Amazon Redshift for complex financial reporting.
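The decoupling described above can be illustrated with a small in-memory sketch. This is not Kafka and not PayStream's code: the `Topic` class, the group names, and the event fields are hypothetical stand-ins that show only the core idea, namely that each consumer group keeps its own offset into a shared, append-only log.

```python
from collections import defaultdict


class Topic:
    """Append-only event log; each consumer group tracks its own read offset."""

    def __init__(self) -> None:
        self._log: list[dict] = []
        self._offsets: dict[str, int] = defaultdict(int)

    def publish(self, event: dict) -> None:
        """Append an event; publishers know nothing about consumers."""
        self._log.append(event)

    def poll(self, group: str) -> list[dict]:
        """Return events this group has not yet seen and advance its offset."""
        start = self._offsets[group]
        events = self._log[start:]
        self._offsets[group] = len(self._log)
        return events


topic = Topic()
topic.publish({"tx_id": "t-1", "amount": 120.0, "currency": "AED"})
topic.publish({"tx_id": "t-2", "amount": 75.5, "currency": "INR"})

# Fraud scoring reads first; ledger recording has not polled yet.
fraud_seen = [e["tx_id"] for e in topic.poll("fraud-scoring")]

topic.publish({"tx_id": "t-3", "amount": 10.0, "currency": "KES"})

# Each group progresses independently through the same stream.
ledger_seen = [e["tx_id"] for e in topic.poll("ledger-recording")]
fraud_next = [e["tx_id"] for e in topic.poll("fraud-scoring")]

print(fraud_seen, ledger_seen, fraud_next)
```

A slow or offline consumer in this model never blocks the publisher or the other groups; it simply resumes from its own offset, which is the property that makes the services independently deployable and scalable.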
The platform runs on Amazon EKS (managed Kubernetes), with AWS Fargate used for burst compute and Amazon Aurora for the new relational data tier. Amazon ElastiCache (Redis) is used for high-speed caching, and Amazon OpenSearch Service handles full-text search and log analytics.
Container orchestration and event-driven microservices running on Amazon EKS.
CI/CD: Automated, Secure, Accelerated
PayStream's release process historically involved weeks of manual staging. WAO replaced it with a fully automated pipeline using AWS CodePipeline and CodeBuild, triggered from GitHub Actions. Every pull request is automatically built, tested for unit and integration coverage, and gated by policy-as-code checks in CodeBuild before promotion to staging. Canary deployments via AWS CodeDeploy reduce release risk, and automated rollback on SLO breach ensures fast recovery without manual intervention.
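The rollback-on-SLO-breach idea can be sketched as a simple gate that compares canary metrics against thresholds. This is an illustrative sketch only, not the AWS CodeDeploy API: `CanaryMetrics`, `Slo`, and `canary_decision` are hypothetical names, and the 0.1% error-rate threshold is an assumption (the 20ms P95 figure echoes the stated latency goal).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryMetrics:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float   # 95th-percentile latency observed on the canary


@dataclass(frozen=True)
class Slo:
    max_error_rate: float
    max_p95_latency_ms: float


def canary_decision(metrics: CanaryMetrics, slo: Slo) -> str:
    """Return 'rollback' if the canary breaches any SLO threshold, else 'promote'."""
    if metrics.error_rate > slo.max_error_rate:
        return "rollback"
    if metrics.p95_latency_ms > slo.max_p95_latency_ms:
        return "rollback"
    return "promote"


# Assumed thresholds: 0.1% errors (hypothetical) and a 20ms P95 latency ceiling.
slo = Slo(max_error_rate=0.001, max_p95_latency_ms=20.0)

print(canary_decision(CanaryMetrics(error_rate=0.0002, p95_latency_ms=16.0), slo))
print(canary_decision(CanaryMetrics(error_rate=0.0002, p95_latency_ms=35.0), slo))
```

In a real pipeline the metrics would come from the monitoring stack over a fixed observation window, and the decision would drive the deployment tool rather than a print statement.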
Developer productivity improved dramatically. Local development environments, provisioned using devcontainers, now bootstrap in 6 minutes. Deployments take 12 minutes on average, and PayStream has moved from one release per quarter to weekly minor releases alongside quarterly major releases, a 13x acceleration in release cadence.
Automated CI/CD pipelines reducing release risk and accelerating developer velocity.
Data & Observability: Know Before You Know
WAO built a unified observability layer across the entire platform. Amazon CloudWatch handles metrics and automated alerting, AWS X-Ray provides cross-service distributed tracing, and Amazon OpenSearch Service powers the log aggregation and search layer.
Incident response was transformed. MTTD dropped from 4+ hours to under 12 minutes. MTTR dropped from 5 hours to 38 minutes on average across all severity levels during the first six months of operation. Custom SLO dashboards give engineering leadership real-time visibility into the platform's health.
On the data side, AWS Glue and AWS Lake Formation provide a governed data lake. Real-time data pipelines feed transaction events into S3 for long-term analytics. Amazon Athena enables fast ad-hoc querying, and Amazon QuickSight delivers business intelligence dashboards to executive and finance teams who previously had no independent access to platform analytics.
Unified observability and data lake architecture delivering real-time business intelligence.
Security: Zero-Trust Financial Services Architecture
Security was not a late addition; it was a first-class architectural pillar. WAO implemented a zero-trust network model using AWS PrivateLink for service communication, AWS WAF rules at the edge, threat detection via Amazon GuardDuty, and automated secret rotation via AWS Secrets Manager. Amazon Macie is configured to scan all logs and data lakes for sensitive data.
PCI-DSS compliance reporting is now fully automated, reducing the compliance audit effort from approximately 300 person-hours to roughly 35 person-hours per quarter. Regulatory reporting for the remittances team, previously a manual process taking four people two weeks each quarter, now runs as a parallel pipeline with automated governance and audit trails.
Zero-trust cloud security architecture with continuous compliance automation for regulated financial services.
📊 Results
Eighteen months into the modernisation programme, the PayStream platform has been fundamentally transformed. The metrics tell the story powerfully, but the qualitative shifts in team culture, client relationship quality, and product velocity are equally significant.
Production architecture scaling confidently to 14 million daily transactions.
Key Metrics Before and After
| Metric | Before Migration | After Migration | Improvement |
|---|---|---|---|
| Peak Throughput (tx/min) | 47,000 | 150,000 | +219% (3.2x) |
| P95 API Latency | 82ms | 16ms | −80% |
| Infrastructure Cost | $22,800/mo | $13,050/mo | −42% |
| System Availability | 98.6% | 99.98% | +1.38pp |
| MTTD | 4.2 hrs | 11 min | −96% |
| MTTR | 5.1 hrs | 38 min | −87% |
| RPO | 24 hrs | 15 min | −99% |
| RTO | 24 hrs | 28 min | −98% |
| Daily Transaction Volume | 4.6M tx/day | 14M tx/day | +204% |
| Release Cycle | 1x / quarter | 1x / week | 13x |
| PCI-DSS Audit Effort | ~300 hrs | ~35 hrs | −88% |
| Dev Environment Setup | 90+ min | 6 min | −93% |
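For readers who want to check the Improvement column, the following is a small sketch of the percentage-change arithmetic applied to a few rows of the table. The before and after values are copied from the table (MTTD converted to minutes); the helper name `percent_change` is illustrative.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, expressed as a percentage."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100


# (before, after) pairs taken from the table above; MTTD is 4.2 hrs = 252 min.
rows = {
    "Peak throughput (tx/min)": (47_000, 150_000),
    "P95 API latency (ms)": (82, 16),
    "MTTD (min)": (4.2 * 60, 11),
    "Dev environment setup (min)": (90, 6),
}

for name, (before, after) in rows.items():
    print(f"{name}: {percent_change(before, after):+.1f}%")
```

The same function applies to any row once both values are expressed in the same unit, which matters most for the time-based metrics that mix hours and minutes.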
🏗️ Technical Architecture: What the Platform Looks Like Today
The completed architecture is worth describing in detail for any engineering leader evaluating a similar transformation. The following is the logical decomposition of the production platform after the migration:
- API Gateway (Amazon API Gateway): All external traffic (client APIs, webhook endpoints, and integration surfaces) is routed through an API gateway, enforcing rate limits, JWT authentication, and DDoS mitigation via AWS WAF. Regional edge caching via CloudFront reduces latency across all 17 operating countries.
- Application Layer (Amazon EKS — Kubernetes): Transaction processing, settlement clearing, FX conversion, compliance, and notification dispatch are each independent services orchestrated on EKS. Node groups use a mix of m5/c5 instance families for steady-state capacity and spot capacity for batch jobs.
- Event Streaming (Amazon MSK — Managed Kafka): All transaction events flow through MSK. Multiple consumer groups independently subscribe to the same event streams without coupling. MSK Connect replicates the live stream to S3 in Parquet format for long-term analytics.
- Data Tier (Dual AWS Accounts): Amazon Aurora PostgreSQL handles operational reads with five cross-AZ read replicas. Shared-buffer tuning keeps hot data in memory, and the pgvector extension lays the groundwork for future ML-powered fraud detection features. Amazon ElastiCache (Redis) serves as the application cache layer, reducing database read amplification by an estimated 70%.
- Analytics Lake (Amazon S3 + Athena + Redshift): Transaction event archives land in an S3 data lake. AWS Glue crawlers maintain the Glue Data Catalog, and analysts use Athena for SQL-on-S3 analytics. Redshift handles the more complex financial reporting workloads previously exported nightly to Excel.
- Security & Compliance: Identity is managed through AWS IAM Identity Center with MFA enforcement across all accounts. Amazon GuardDuty runs continuous threat detection, and AWS Config continuously evaluates against custom-defined compliance rules. Secrets are exclusively managed through AWS Secrets Manager, eliminating hard-coded credentials entirely. PCI-DSS compliance reporting is automated through AWS Audit Manager.
- DR Strategy (Active-Active Across Two AWS Regions): The platform operates in an active-active configuration across two geographically distinct AWS regions. Both regions serve live traffic simultaneously. With an RPO of 15 minutes and an RTO of 28 minutes, the architecture meets enterprise-grade disaster recovery requirements without the cost premium of over-provisioned cold-standby DR infrastructure.
- Developer Velocity: GitHub Actions trigger builds on every PR. AWS CodePipeline orchestrates promotion across environments with canary approvals. Infrastructure is managed entirely as code—Terraform modules for the AWS landing zone, Helm charts for Kubernetes workloads, and parameterised deployment manifests—meaning environments are reproducible at any stage.
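The application cache layer mentioned under the Data Tier follows what is commonly called the cache-aside pattern. The sketch below shows that pattern with an in-memory dictionary standing in for Redis. It is an assumption about how such a layer typically works, not PayStream's implementation; `CacheAside`, `load_account`, and the 30-second TTL are hypothetical.

```python
import time
from typing import Any, Callable


class CacheAside:
    """Read-through helper: serve from cache when fresh, otherwise load and store."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: fall back to the database and repopulate.
        self.misses += 1
        value = self._loader(key)
        self._store[key] = (now + self._ttl, value)
        return value


db_reads: list[str] = []


def load_account(key: str) -> dict:
    """Hypothetical stand-in for a database query; records each call."""
    db_reads.append(key)
    return {"account": key, "status": "active"}


cache = CacheAside(load_account, ttl_seconds=30.0)
for _ in range(5):
    cache.get("acct-42")

print(f"database reads: {len(db_reads)}, cache hits: {cache.hits}")
```

Five lookups of the same key reach the database once in this sketch. The trade-off is staleness: a value can be up to one TTL out of date, so the TTL has to be chosen per data type.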
✨ Results in Context
The numbers are genuinely impressive, but the context makes them meaningful. A threefold improvement in transaction throughput, a 42% reduction in infrastructure costs, and a 13x acceleration in release cadence are not theoretical outcomes—they are outcomes that materially changed PayStream's trajectory in a competitive market.
When a rival FinTech attempted to undercut PayStream on merchant fees during Q1 2025, the ability to stand up new client pricing tiers in a single sprint—not a single quarter—allowed PayStream to respond on the market's timeline, not their infrastructure's timeline. The engineering team went from a reactive posture of firefighting outages to a proactive posture of shipping new features.
For WAO Digital Technologies, the engagement was equally validating. The migration was completed one full month ahead of the initial 18-month target date, with no data loss and no production outages. The client awarded a second engagement—architecting a new real-time foreign exchange layer—a direct outcome of the trust built through the cloud migration programme.
💡 Lessons Learned
PayStream's journey was not without friction. Several setbacks and learnings emerged during the programme that are instructive for any organisation undertaking a similar cloud transformation.
Lesson 1 — Don't Automate Smoke During a Fire
The first attempt at billing automation nearly triggered a production outage. In the early weeks of the migration, the team automated a nightly reconciliation job across the old bare-metal store and the new Aurora database. A subtle timestamp mismatch in timezone handling caused the job to silently produce corrupted entries for two consecutive nights before anyone noticed. The lesson: automate confidently, but add synthetic safety checks—validation assertions, canary comparisons, circuit breakers—between the old and new environments during any cutover period.
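One way to picture the kind of validation assertion this lesson recommends is a reconciliation check that refuses timestamps without an explicit timezone and compares both stores in UTC. The sketch below is hypothetical: the row shape, the `reconcile` and `to_utc` names, and the UTC+8 offset in the example are assumptions for illustration, not details of the actual incident.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def to_utc(ts: datetime) -> datetime:
    """Normalise to UTC, rejecting naive timestamps instead of guessing a zone."""
    if ts.tzinfo is None or ts.utcoffset() is None:
        raise ValueError("naive timestamp: timezone must be explicit")
    return ts.astimezone(timezone.utc)


def reconcile(old_rows: dict, new_rows: dict) -> list[str]:
    """Compare {tx_id: (amount, timestamp)} maps and describe every mismatch."""
    problems = []
    for tx_id in sorted(old_rows.keys() | new_rows.keys()):
        if tx_id not in new_rows:
            problems.append(f"{tx_id}: missing in new store")
            continue
        if tx_id not in old_rows:
            problems.append(f"{tx_id}: unexpected in new store")
            continue
        old_amount, old_ts = old_rows[tx_id]
        new_amount, new_ts = new_rows[tx_id]
        if old_amount != new_amount:
            problems.append(f"{tx_id}: amount mismatch")
        if to_utc(old_ts) != to_utc(new_ts):
            problems.append(f"{tx_id}: timestamp mismatch")
    return problems


LOCAL = timezone(timedelta(hours=8))  # assumed local offset of the legacy store

old = {"t-1": (Decimal("120.00"), datetime(2024, 7, 1, 23, 30, tzinfo=LOCAL))}
# Same instant expressed in UTC: reconciles cleanly.
new_ok = {"t-1": (Decimal("120.00"), datetime(2024, 7, 1, 15, 30, tzinfo=timezone.utc))}
# Local wall-clock time mislabelled as UTC: the kind of silent drift to catch.
new_bad = {"t-1": (Decimal("120.00"), datetime(2024, 7, 1, 23, 30, tzinfo=timezone.utc))}

print(reconcile(old, new_ok))
print(reconcile(old, new_bad))
```

Running a check like this as a gate, and failing the job loudly when the list is non-empty, is what turns two nights of silent corruption into a same-night alert.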
Lesson 2 — Inertia Makes Cost Debris Invisible
The first two waves of cost reduction through compute rightsizing delivered outsized returns—approximately 28% reduction in month one. But the third wave—archiving cold transaction data to S3 Glacier, cleaning up orphaned snapshots, and retiring unused load balancers—only delivered an additional 2%. Teams had accumulated so much debris over years of incremental growth that it required a deliberate, time-bounded data-cleaning sprint to surface it. Budget ownership culture, where each engineering team is accountable for its own resource costs, became the mechanism that sustained the remaining cost reductions.
Lesson 3 — Kafka Is Powerful; Schema Is Discipline
The event-driven architecture of Kafka brought enormous new capability but also a psychological trap. Because services are loosely coupled and independently scalable, the temptation is to proliferate event schemas without governance. This became visible approximately four months into the migration when one service modified a critical event field name without notifying downstream consumers. A silent schema drift then sat for weeks before a production incident surfaced. The team now centralises schema definitions using Confluent Schema Registry, enforces Avro serialisation across all events, and requires schema changes to go through the same code review process as application code.
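The failure mode described here, a renamed field breaking downstream consumers, is the kind of change a compatibility check in code review or CI can catch. The sketch below is a deliberately simplified illustration: it is not the Confluent Schema Registry API, it does not implement the full Avro compatibility rules (defaults, unions, aliases), and the schema shape and field names are hypothetical.

```python
def breaking_changes(old_schema: dict[str, str], new_schema: dict[str, str]) -> list[str]:
    """List changes that would break consumers still reading the old schema.

    Schemas are modelled as {field_name: type_name}. Added fields are treated
    as safe here because existing consumers can ignore them.
    """
    problems = []
    for field, type_name in old_schema.items():
        if field not in new_schema:
            problems.append(f"field removed or renamed: {field}")
        elif new_schema[field] != type_name:
            problems.append(
                f"type changed for {field}: {type_name} -> {new_schema[field]}"
            )
    return problems


old_schema = {"tx_id": "string", "amount": "double", "currency": "string"}

# A producer renames 'currency' to 'ccy' and adds an optional 'channel' field.
new_schema = {"tx_id": "string", "amount": "double", "ccy": "string", "channel": "string"}

print(breaking_changes(old_schema, new_schema))
```

A check like this fails fast at review time. A schema registry with enforced compatibility modes does the same job at publish time, so the rename is rejected before any consumer sees it.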
Lesson 4 — DDD Is Not Optional in a Distributed System
Credit WAO's consultants for insisting on a Domain-Driven Design (DDD) workshop before any code was written. Had services been modularised without first defining bounded contexts, the team might have created distributed monoliths: services that are technically deployed independently but coupled through shared state, shared databases, or implicit contract expectations. DDD, combined with event storming sessions, created genuinely autonomous service boundaries before the first line of production code was written.
Lesson 5 — Observability Must Be the Foundation
In hindsight, the full observability layer should have been deployed before the first production workload was migrated. In practice, distributed tracing with AWS X-Ray, structured logging via Amazon OpenSearch, and the unified metrics dashboard on Amazon CloudWatch were rolled out alongside the core services rather than ahead of them. The strong recommendation now: the moment any service crosses into production, your observability stack must already be live and healthy.
Lesson 6 — Business Case Must Precede Architecture
WAO's leaders and PayStream executives maintained a disciplined linkage between every architecture decision and a documented business outcome—not merely a technical benefit. Every AWS service selected, every architectural re-structuring choice, and every sprint objective was tied to a specific business metric or cost reduction target owned by a named stakeholder. This contributed markedly to executive confidence through the engagement and prevented the programme from becoming a purely technical exercise disconnected from commercial reality.
🔑 Key Takeaways
- Don't grow without decoupling first. The PayStream monolith hit its wall long before infrastructure cost became the headline risk. The core architectural debt, tight coupling between transactional and operational services, was the primary cause of outages and poor developer velocity. Decoupling must come first, before infrastructure changes.
- Choose cloud partners who understand regulated industries. WAO Digital Technologies' deep subject-matter expertise in financial services compliance, combined with their demonstrated AWS track record, saved weeks of design debate and prevented premature service choices that could have put PCI-DSS compliance at risk.
- Modernisation compounds. The efficiency of one phase accelerates the next. With the infrastructure foundation stable, the data team could race ahead on the analytics lake. With real analytics data available, the product team could ship smarter features. With new features creating real value, business leadership greenlit co-investment for the next phase.
- KPIs drive the narrative. The framing of every architectural decision around measurable outcomes was the discipline that secured the programme through its darkest moments. Any transformation that cannot connect a technology choice to a business metric will face executive pushback at precisely the moment it needs political capital most.
About the authors: This case study was researched and written by the Webskyne editorial team in collaboration with WAO Digital Technologies Pvt Ltd and the PayStream Corporation engineering leadership. It represents a real, named engagement. All figures are sourced directly from PayStream's production telemetry and AWS billing dashboards for the period Q4 2024 — Q1 2026.
