Webskyne

23 March 2026 • 9 min

How Zento Payments Scaled from 10K to 2M Daily Transactions with Microservices Migration

A deep dive into how a regional fintech startup overcame monolithic architecture limitations, reduced latency by 67%, and achieved 99.99% uptime through strategic microservices migration. This comprehensive case study explores the technical decisions, implementation challenges, and measurable business outcomes that transformed Zento Payments into a market leader.

Case Study · Microservices · FinTech · Digital Transformation · AWS · Kubernetes · Node.js · NestJS · Cloud Architecture
## Overview

Zento Payments, a regional fintech startup founded in 2021, began as a simple payment gateway for e-commerce businesses in Southeast Asia. Within 18 months the company experienced explosive growth, going from 10,000 daily transactions at launch to over 2 million by late 2025. This rapid scale brought unexpected challenges that threatened to collapse their infrastructure.

The company's founding team, three engineers with backgrounds at established banks, initially built the platform on a monolithic architecture using Node.js and PostgreSQL. While this approach allowed rapid initial development, it became a significant bottleneck as transaction volumes surged. By mid-2025, system outages were occurring weekly, customer complaints were mounting, and technical debt had reached critical levels.

This case study examines how Zento Payments partnered with Webskyne to execute a comprehensive microservices migration that not only resolved the immediate technical issues but positioned the company for sustained growth. The project spanned six months and delivered measurable improvements across performance, reliability, and developer productivity.

---

## Challenge

### The Monolith Problem

Zento Payments' original architecture was a typical startup solution: everything built as a single Node.js application backed by a PostgreSQL database. All features resided in one codebase: user authentication, payment processing, merchant management, reporting, and webhook delivery. While this simplified initial development, the system began showing severe strain under production load.

**Performance Degradation:** During peak hours (8 PM to 11 PM local time), API response times increased from 200ms to over 3 seconds. The monolithic database became a single point of contention, with complex queries blocking each other. The engineering team implemented caching with Redis, but this merely masked the underlying architectural issues.
**Deployment Fear:** Every code deployment was a nerve-wracking event. A single merchant-management update required deploying the entire application, risking payment-processing stability. The team adopted a "deploy on Fridays" policy (ironically, the worst time for troubleshooting), leading to prolonged incident responses.

**Scaling Limitations:** Vertical scaling hit hard limits. The largest available instance still couldn't handle the combined load of payment processing and reporting. The team explored horizontal scaling but realized the monolith wasn't designed for distributed deployment.

**Data Integrity Risks:** A bug in the merchant reporting module once corrupted transaction records, requiring manual database repairs. The lack of service isolation meant one module's failure could cascade through the entire system.

---

## Goals

Zento Payments approached Webskyne with clear objectives:

1. **Achieve 99.99% uptime:** eliminate the weekly outages that damaged customer trust
2. **Reduce API latency to under 300ms:** improve user experience and merchant integration reliability
3. **Enable independent deployments:** allow teams to ship features without coordinating across the entire organization
4. **Support 10x growth:** build infrastructure that can scale beyond current demands without architectural changes
5. **Improve developer velocity:** reduce cycle time from code commit to production deployment
6. **Enhance security posture:** implement better isolation for sensitive payment data

The company also had a strict constraint: maintain backward compatibility with the existing merchant API to avoid forcing customers to rewrite their integrations.

---

## Approach

### Phase 1: Architectural Assessment and Strategy

We began with a comprehensive analysis of the existing system.
Over three weeks, we:

- Instrumented the monolith to capture detailed performance metrics
- Mapped all service dependencies within the codebase
- Interviewed the engineering team about pain points and domain knowledge
- Analyzed transaction patterns and peak-load characteristics

This analysis revealed that the system contained five distinct bounded contexts that could be extracted as separate services: Authentication, Payments, Merchant Management, Reporting, and Webhooks.

### Phase 2: Strangler Fig Pattern Migration

Rather than a risky "big bang" rewrite, we implemented the Strangler Fig pattern, gradually replacing specific functionality while keeping the monolith running. This approach minimized risk and allowed continuous business operations.

**Migration Sequence:**

1. **Authentication Service:** first target, because it was relatively isolated and had clear boundaries
2. **Merchant Management:** next, as it had the least dependency on payment processing
3. **Webhooks:** a high-traffic component that benefited from independent scaling
4. **Reporting:** a resource-intensive module moved to asynchronous processing
5. **Payments Core:** the critical path, migrated last with extensive testing

### Phase 3: Infrastructure Modernization

Beyond code migration, we restructured the supporting infrastructure:

- Containerized all services with Docker
- Orchestrated them with Kubernetes on AWS EKS
- Implemented AWS Lambda for event-driven workloads
- Deployed Amazon RDS for PostgreSQL with read replicas per service
- Established Prometheus and Grafana for observability
- Created CI/CD pipelines with GitHub Actions

---

## Implementation

### Service Extraction: A Detailed Look

**Authentication Service:** The authentication module was the first to migrate. We extracted the JWT token generation, refresh logic, and session management into a dedicated NestJS service.
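The gradual cut-over hinges on an edge proxy that routes already-extracted paths to the new services and everything else to the monolith. The routing decision can be sketched in a few lines of TypeScript; this is a minimal illustration, not Zento's actual gateway, and the route table and internal URLs are hypothetical:

```typescript
// Hypothetical route table: path prefixes already "strangled" out of
// the monolith, mapped to their new upstream services.
const extractedRoutes: Record<string, string> = {
  "/auth": "http://auth-service.internal",
  "/merchants": "http://merchant-service.internal",
};

const monolithUrl = "http://monolith.internal";

// Resolve which upstream should handle a request path. Longest-prefix
// match keeps nested routes (e.g. /auth/refresh) on the new service;
// anything unmatched still falls through to the monolith.
function resolveUpstream(path: string): string {
  const prefix = Object.keys(extractedRoutes)
    .filter((p) => path === p || path.startsWith(p + "/"))
    .sort((a, b) => b.length - a.length)[0];
  return prefix ? extractedRoutes[prefix] : monolithUrl;
}

console.log(resolveUpstream("/auth/refresh")); // handled by the new auth service
console.log(resolveUpstream("/payments/123")); // still served by the monolith
```

As each service goes live, the team adds one entry to the table; deleting the monolith's copy of the code can wait until the route has proven itself in production.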
Key implementation details:

- Implemented OAuth 2.0 with support for merchant-specific authentication flows
- Deployed Redis for session storage with a 24-hour TTL
- Created a JWT validation middleware that all downstream services use
- Built a migration layer that allowed the monolith to trust tokens issued by the new service

**Payments Core:** This was the most complex extraction. Payment processing required:

- Event sourcing for transaction state management
- Idempotency keys to prevent duplicate processing
- Two-phase commit for cross-service transactions
- Circuit breakers to handle third-party payment provider failures

We implemented the Saga pattern for payment workflows that span multiple services. When a payment is initiated, it creates a saga that coordinates between the payment service, webhook service, and notification service.

```typescript
// Simplified saga implementation example
async function processPayment(payment: Payment): Promise<void> {
  const saga = new PaymentSaga(payment);
  try {
    // Execute the steps in order; the saga records each completed step
    await saga.execute([
      () => paymentService.reserveFunds(payment),
      () => paymentService.capture(payment),
      () => webhookService.notifyMerchant(payment),
      () => notificationService.sendReceipt(payment),
    ]);
  } catch (error) {
    // Roll back completed steps in reverse order
    await saga.compensate();
    throw error;
  }
}
```

### Database Per Service

Each microservice now owns its data store:

- **Payments Service:** PostgreSQL with PgBouncer connection pooling
- **Merchant Service:** PostgreSQL with vertical partitioning for tenant isolation
- **Reporting Service:** ClickHouse for analytical queries (10TB+ of data)
- **Webhooks Service:** PostgreSQL with Redis Streams for the delivery queue

### Observability Implementation

We built a unified observability stack:

- **Distributed Tracing:** AWS X-Ray integrated into all services
- **Logging:** ELK Stack (Elasticsearch, Logstash, Kibana) with structured JSON logs
- **Metrics:** Prometheus exporters in every service with Grafana dashboards
- **Alerting:** PagerDuty integration with severity-based routing

A custom correlation ID middleware ensures every request can be traced across service boundaries. When a merchant support ticket arrives, support staff can enter the transaction ID and see the complete request journey.

---

## Results

### Performance Improvements

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Average API Latency | 2,100ms | 180ms | 91% reduction |
| P99 Latency | 8,500ms | 450ms | 95% reduction |
| Peak Hour Throughput | 500 TPS | 8,000 TPS | 16x increase |
| Deployment Frequency | Weekly | 15x daily | 75x increase |
| Mean Time to Recovery | 45 minutes | 3 minutes | 93% reduction |

### Reliability Metrics

- **Uptime:** Achieved 99.997% over the six-month post-migration period
- **Incident Rate:** Reduced from 4–5 per month to 0–1 (non-critical)
- **Rollback Success:** 92% of deployments complete without requiring rollback
- **False Alert Rate:** Under 5% thanks to improved alerting logic

### Business Impact

- **Merchant Acquisition:** 340% increase in new merchant signups (attributed to improved API reliability)
- **Customer Satisfaction:** NPS improved from 34 to 72
- **Support Tickets:** 67% reduction in technical support tickets
- **Revenue Growth:** Processing volume grew from $180M to $890M annually

### Developer Productivity

- **Cycle Time:** Reduced from 2 weeks to 2 days for feature development
- **Code Review Turnaround:** Improved from 48 hours to 4 hours
- **Onboarding Time:** New engineers became productive in 2 weeks versus 2 months

---

## Metrics Deep Dive

### Cost Analysis

While the migration required significant upfront investment, the cost-to-revenue ratio actually improved:

- **Infrastructure Costs:** Increased 40% (from $12K to $16.8K monthly)
- **Engineering Costs:** Increased 25% (from $48K to $60K monthly)
- **Revenue Processed:** Increased 394% ($180M to $890M annually)
- **Cost per Transaction:** Decreased 72%

The improved scalability meant Zento
could process more transactions without proportional infrastructure cost increases. Kubernetes auto-scaling ensures they pay only for what they use.

### Technical Debt Reduction

We tracked technical debt using a custom scoring system:

- **Initial Debt Score:** 8.2/10 (critical)
- **Post-Migration Score:** 2.1/10 (healthy)
- **Code Coverage:** Increased from 34% to 84%
- **Security Vulnerabilities:** Patching time reduced from weeks to hours

---

## Lessons Learned

### What Worked Well

1. **Incremental Migration:** The Strangler Fig pattern proved essential. We could validate each service in production without risking complete system failure.
2. **Domain-Driven Design:** The bounded-context mapping ensured services were correctly sized: neither too granular (chatty) nor too coarse (a monolith in disguise).
3. **Comprehensive Observability:** Investing in tracing and logging upfront paid dividends during debugging. The correlation ID system alone saved hundreds of engineering hours.
4. **Backward Compatibility:** By maintaining API contracts and building adapter layers, we avoided forcing merchants to update their integrations.

### Challenges and Solutions

1. **Data Consistency:** The distributed architecture introduced eventual-consistency challenges. We implemented idempotency at every API endpoint and built reconciliation jobs for edge cases.
2. **Service Discovery:** Early in the project, we struggled with service discovery. We standardized on AWS Cloud Map with a fallback to hardcoded endpoints for critical services.
3. **Testing Complexity:** Testing microservices requires more sophisticated approaches. We built contract testing between services and implemented chaos engineering in staging.
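Idempotency at every endpoint comes down to one rule: the first response seen for a client-supplied key is cached and replayed for any retry of that key. Here is a minimal in-memory sketch; a production service would back the store with Redis and a TTL, and the function names are illustrative rather than Zento's actual API:

```typescript
// In-memory idempotency store. In production this would be Redis with a
// TTL so retries within the window receive the cached result.
const processed = new Map<string, { status: string; amount: number }>();

// Process a charge exactly once per idempotency key: duplicates get the
// stored response replayed instead of triggering a second charge.
function chargeOnce(
  idempotencyKey: string,
  amount: number
): { status: string; amount: number } {
  const cached = processed.get(idempotencyKey);
  if (cached) return cached; // duplicate request: replay prior result
  const result = { status: "captured", amount }; // stand-in for the real capture call
  processed.set(idempotencyKey, result);
  return result;
}

const first = chargeOnce("key-123", 500);
const retry = chargeOnce("key-123", 500); // same key: no double charge
console.log(first === retry); // prints true
```

The same shape works as middleware: hash the `Idempotency-Key` header plus the request body, and short-circuit the handler on a cache hit.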
### Recommendations for Similar Projects

- **Start with monitoring:** Before extracting any service, ensure you can observe it properly
- **Define clear boundaries:** Resist the temptation to extract too finely; aim for 5–10 services initially
- **Plan for failures:** Design for degraded functionality rather than complete failure
- **Invest in CI/CD:** Automated testing and deployment are non-negotiable
- **Budget for unknowns:** Add a 30–40% buffer to timeline estimates for unexpected complications

---

## Conclusion

The microservices migration transformed Zento Payments from a startup with scaling problems into a platform ready for IPO-level growth. Beyond the technical achievements, the project demonstrated that architectural decisions have direct business implications: reliability enables trust, and trust drives merchant acquisition.

Zento Payments now processes over $890 million annually and has become the dominant payment gateway in its regional market. The platform handles Black Friday levels of traffic without degradation, a scenario that would have caused complete system failure pre-migration.

The journey isn't over. As Zento approaches 10 million daily transactions, the team is already exploring edge computing for payment processing and machine learning for fraud detection. But they now have an architecture that can evolve with their ambitions.

**Key Takeaway:** Technical debt isn't just a developer problem; it's a business risk. Strategic infrastructure investment, when executed with clear goals and proper methodology, delivers compounding returns across performance, reliability, and growth.
