How Zento Payments Scaled from 10K to 2M Daily Transactions with Microservices Migration
A deep dive into how a regional fintech startup overcame monolithic architecture limitations, reduced average API latency by 91%, and achieved 99.99% uptime through strategic microservices migration. This comprehensive case study explores the technical decisions, implementation challenges, and measurable business outcomes that transformed Zento Payments into a market leader.
Case Study · Microservices · FinTech · Digital Transformation · AWS · Kubernetes · Node.js · NestJS · Cloud Architecture
## Overview
Zento Payments, a regional fintech startup founded in 2021, began as a simple payment gateway for e-commerce businesses in Southeast Asia. Within 18 months, the company experienced explosive growth: daily transaction volume climbed from 10,000 at launch to over 2 million by late 2025. This rapid scale brought unexpected challenges that threatened to collapse their infrastructure.
The company's founding team, three engineers with backgrounds at established banks, initially built their platform on a monolithic architecture using Node.js and PostgreSQL. While this approach allowed rapid initial development, it became a significant bottleneck as transaction volumes surged. By mid-2025, system outages were occurring weekly, customer complaints were mounting, and the technical debt had reached critical levels.
This case study examines how Zento Payments partnered with Webskyne to execute a comprehensive microservices migration that not only resolved immediate technical issues but positioned the company for sustained growth. The project spanned six months and delivered measurable improvements across performance, reliability, and developer productivity.
---
## Challenge
### The Monolith Problem
Zento Payments' original architecture was a typical startup solution: everything built as a single Node.js application with a PostgreSQL database. All features resided in one codebase: user authentication, payment processing, merchant management, reporting, and webhook delivery. While this simplified initial development, the system began showing severe strain under production load.
**Performance Degradation:**
During peak hours (8 PM - 11 PM local time), API response times increased from 200ms to over 3 seconds. The monolithic database became a single point of contention, with complex queries blocking each other. The engineering team implemented caching with Redis, but this merely masked the underlying architectural issues.
**Deployment Fear:**
Every code deployment was a nerve-wracking event. A single merchant management update required deploying the entire application, risking payment processing stability. The team adopted a "deploy on Fridays" policy (ironically, the worst time for troubleshooting), leading to prolonged incident responses.
**Scaling Limitations:**
Vertical scaling hit hard limits. The largest available instance still couldn't handle the combined load of payment processing and reporting. The team explored horizontal scaling but realized the monolith wasn't designed for distributed deployment.
**Data Integrity Risks:**
A bug in the merchant reporting module once corrupted transaction records, requiring manual database repairs. The lack of service isolation meant one module's failure could cascade throughout the entire system.
---
## Goals
Zento Payments approached Webskyne with clear objectives:
1. **Achieve 99.99% uptime** – Eliminating the weekly outages that damaged customer trust
2. **Reduce API latency to under 300ms** – Improving user experience and merchant integration reliability
3. **Enable independent deployments** – Allowing teams to ship features without coordinating across the entire organization
4. **Support 10x growth** – Building infrastructure that could scale beyond current demands without architectural changes
5. **Improve developer velocity** – Reducing cycle time from code commit to production deployment
6. **Enhance security posture** – Implementing better isolation for sensitive payment data
The company also had a strict constraint: maintain backward compatibility with their existing merchant API to avoid forcing customers to rewrite integrations.
---
## Approach
### Phase 1: Architectural Assessment and Strategy
We began with a comprehensive analysis of the existing system. Over three weeks, we:
- Instrumented the monolith to capture detailed performance metrics
- Mapped all service dependencies within the codebase
- Interviewed the engineering team about pain points and domain knowledge
- Analyzed transaction patterns and peak load characteristics
This analysis revealed that the system actually contained five distinct bounded contexts that could be extracted as separate services: Authentication, Payments, Merchant Management, Reporting, and Webhooks.
### Phase 2: Strangler Fig Pattern Migration
Rather than a risky "big bang" rewrite, we implemented the Strangler Fig pattern: gradually replacing specific functionality while keeping the monolith running. This approach minimized risk and allowed continuous business operations.
**Migration Sequence:**
1. **Authentication Service** – First target because it was relatively isolated and had clear boundaries
2. **Merchant Management** – Next, as it had the least dependency on payment processing
3. **Webhooks** – A high-traffic component that benefited from independent scaling
4. **Reporting** – Resource-intensive module moved to asynchronous processing
5. **Payments Core** – The critical path, migrated last with extensive testing
### Phase 3: Infrastructure Modernization
Beyond code migration, we restructured the supporting infrastructure:
- Containerized all services with Docker
- Orchestrated using Kubernetes on AWS EKS
- Implemented AWS Lambda for event-driven workloads
- Deployed Amazon RDS for PostgreSQL with read replicas per service
- Established Prometheus and Grafana for observability
- Created CI/CD pipelines with GitHub Actions
---
## Implementation
### Service Extraction: A Detailed Look
**Authentication Service:**
The authentication module was the first to migrate. We extracted the JWT token generation, refresh logic, and session management into a dedicated NestJS service. Key implementation details:
- Implemented OAuth 2.0 with support for merchant-specific authentication flows
- Deployed Redis for session storage with 24-hour TTL
- Created a JWT validation middleware that all downstream services use
- Built a migration layer that allowed the monolith to trust tokens issued by the new service
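The shared JWT validation logic can be sketched as a small framework-agnostic helper (a minimal sketch assuming HS256 signing with a shared secret; the names `signJwt` and `verifyJwt` are illustrative, and the real service wraps this in a NestJS guard):

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Base64url encoding as used by JWTs (no padding, URL-safe alphabet)
const b64url = (buf: Buffer) =>
  buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

// Illustrative token issuer: HS256-signs a claims object
export function signJwt(claims: Record<string, unknown>, secret: string): string {
  const enc = (obj: unknown) => b64url(Buffer.from(JSON.stringify(obj)));
  const head = enc({ alg: "HS256", typ: "JWT" });
  const body = enc(claims);
  const sig = b64url(createHmac("sha256", secret).update(`${head}.${body}`).digest());
  return `${head}.${body}.${sig}`;
}

// Validation middleware core: returns the claims if the signature and
// expiry check out, otherwise null
export function verifyJwt(token: string, secret: string): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  // Recompute the signature over header.payload and compare in constant time
  const expected = b64url(
    createHmac("sha256", secret).update(`${header}.${payload}`).digest()
  );
  const a = Buffer.from(signature);
  const b = Buffer.from(expected);
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  // Reject expired tokens (exp is a Unix timestamp in seconds)
  if (typeof claims.exp === "number" && claims.exp < Date.now() / 1000) return null;
  return claims;
}
```

Because every downstream service only needs the signing secret (or, in an asymmetric setup, the public key), the monolith could trust tokens issued by the new service without any cross-service calls on the hot path.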
**Payments Core:**
This was the most complex extraction. Payment processing required:
- Event sourcing for transaction state management
- Idempotency keys to prevent duplicate processing
- Two-phase commit for cross-service transactions
- Circuit breakers to handle third-party payment provider failures
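The circuit-breaker requirement can be sketched as a small state machine (an illustrative in-process sketch; a production service would more likely use a library such as opossum, and the threshold and cooldown values here are assumptions):

```typescript
// Minimal circuit breaker for third-party payment provider calls.
// After `threshold` consecutive failures the breaker opens and fails
// fast until `cooldownMs` has elapsed, protecting the caller from
// piling requests onto a struggling provider.
export class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 3, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        // Open state: reject immediately instead of calling the provider
        throw new Error("circuit open: failing fast");
      }
      this.failures = 0; // half-open: allow one trial call through
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the breaker
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```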
We implemented the Saga pattern for payment workflows that span multiple services. When a payment is initiated, it creates a saga that coordinates between the payment service, webhook service, and notification service.
```typescript
// Simplified saga implementation example
async function processPayment(payment: Payment): Promise<void> {
  const saga = new PaymentSaga(payment);
  try {
    await saga.execute({
      step1: () => paymentService.reserveFunds(payment),
      step2: () => paymentService.capture(payment),
      step3: () => webhookService.notifyMerchant(payment),
      step4: () => notificationService.sendReceipt(payment),
    });
  } catch (error) {
    // Roll back completed steps in reverse order
    await saga.compensate();
    throw error;
  }
}
```
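The idempotency-key mechanism mentioned above can be sketched as a keyed result cache (a deliberately minimal in-memory sketch; the `withIdempotency` helper is illustrative, and the production version would persist keys in Redis or a unique-constrained database table so retries survive restarts):

```typescript
// First call with a given key executes the handler and stores its
// result; any retry with the same key replays the stored result
// instead of charging the customer twice.
const results = new Map<string, unknown>();

export async function withIdempotency<T>(
  key: string,
  handler: () => Promise<T>
): Promise<T> {
  if (results.has(key)) {
    return results.get(key) as T; // duplicate request: replay cached result
  }
  const result = await handler();
  results.set(key, result);
  return result;
}
```

Clients pass the same key when retrying a request after a timeout, so a network blip between client and gateway can never produce a double charge.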
### Database Per Service
Each microservice now owns its data store:
- **Payments Service:** PostgreSQL with PgBouncer connection pooling
- **Merchant Service:** PostgreSQL with vertical partitioning for tenant isolation
- **Reporting Service:** ClickHouse for analytical queries (10TB+ data)
- **Webhooks Service:** PostgreSQL with Redis streams for delivery queue
### Observability Implementation
We built a unified observability stack:
- **Distributed Tracing:** AWS X-Ray integrated into all services
- **Logging:** ELK Stack (Elasticsearch, Logstash, Kibana) with structured JSON logs
- **Metrics:** Prometheus exporters in every service with Grafana dashboards
- **Alerting:** PagerDuty integration with severity-based routing
A custom correlation ID middleware ensures every request can be traced across service boundaries. When a merchant support ticket arrives, support staff can enter the transaction ID and see the complete request journey.
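The core of that middleware can be sketched as a header-propagation helper (a framework-agnostic sketch; the header name and `correlate` function are illustrative, and the real middleware also binds the ID into the structured-log context):

```typescript
import { randomUUID } from "crypto";

const HEADER = "x-correlation-id";

// Incoming requests keep their existing correlation ID so one trace
// spans every service hop; requests without one get a fresh UUID.
export function correlate(headers: Record<string, string>): Record<string, string> {
  const id = headers[HEADER] ?? randomUUID();
  // Echo the ID onto the outgoing headers so downstream services
  // and log lines all share it
  return { ...headers, [HEADER]: id };
}
```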
---
## Results
### Performance Improvements
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Average API Latency | 2,100ms | 180ms | 91% reduction |
| P99 Latency | 8,500ms | 450ms | 95% reduction |
| Peak Hour Throughput | 500 TPS | 8,000 TPS | 16x increase |
| Deployment Frequency | Weekly | 15x daily | 75x increase |
| Mean Time to Recovery | 45 minutes | 3 minutes | 93% reduction |
### Reliability Metrics
- **Uptime:** Achieved 99.997% over the 6-month post-migration period
- **Incident Rate:** Reduced from 4-5 per month to 0-1 (non-critical)
- **Deployment Success:** 92% of deployments complete without requiring a rollback
- **False Alert Rate:** Under 5% thanks to improved alerting logic
### Business Impact
- **Merchant Acquisition:** 340% increase in new merchant signups (attributed to improved API reliability)
- **Customer Satisfaction:** NPS improved from 34 to 72
- **Support Tickets:** 67% reduction in technical support tickets
- **Revenue Growth:** Processing volume grew from $180M to $890M annually
### Developer Productivity
- **Cycle Time:** Reduced from 2 weeks to 2 days for feature development
- **Code Review Turnaround:** Improved from 48 hours to 4 hours
- **Onboarding Time:** New engineers became productive in 2 weeks versus 2 months
---
## Metrics Deep Dive
### Cost Analysis
While the migration required significant upfront investment, the cost-to-revenue ratio actually improved:
- **Infrastructure Costs:** Increased 40% (from $12K to $16.8K monthly)
- **Engineering Costs:** Increased 25% (from $48K to $60K monthly)
- **Revenue Processed:** Increased 394% ($180M to $890M annually)
- **Cost per Transaction:** Decreased 72%
The improved scalability meant Zento could process more transactions without proportional infrastructure cost increases. The Kubernetes auto-scaling ensures they only pay for what they use.
### Technical Debt Reduction
We tracked technical debt using a custom scoring system:
- **Initial Debt Score:** 8.2/10 (critical)
- **Post-Migration Score:** 2.1/10 (healthy)
- **Code Coverage:** Increased from 34% to 84%
- **Security Vulnerabilities:** Patching time reduced from weeks to hours
---
## Lessons Learned
### What Worked Well
1. **Incremental Migration:** The Strangler Fig pattern proved essential. We could validate each service in production without risking complete system failure.
2. **Domain-Driven Design:** The bounded context mapping ensured services were correctly sized – neither too granular (chatty) nor too coarse (monolithic in disguise).
3. **Comprehensive Observability:** Investing in tracing and logging upfront paid dividends during debugging. The correlation ID system alone saved hundreds of engineering hours.
4. **Backward Compatibility:** By maintaining API contracts and building adapter layers, we avoided forcing merchants to update their integrations.
### Challenges and Solutions
1. **Data Consistency:** The distributed nature introduced eventual consistency challenges. We implemented idempotency at every API endpoint and built reconciliation jobs for edge cases.
2. **Service Discovery:** Early in the project, we struggled with service discovery. We standardized on AWS Cloud Map with fallback to hardcoded endpoints for critical services.
3. **Testing Complexity:** Testing microservices requires more sophisticated approaches. We built contract testing between services and implemented chaos engineering in staging.
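A consumer-driven contract test can be sketched as a shape assertion the webhook service runs against the payment service's responses (an illustrative sketch; the `PaymentEvent` fields are assumptions, and tools such as Pact formalize this pattern across repositories):

```typescript
// The consumer (webhook service) declares the exact event shape it
// depends on; the contract test fails if the provider's payload drifts.
interface PaymentEvent {
  transactionId: string;
  amount: number;
  currency: string;
  status: "captured" | "failed";
}

// Type guard used as the contract assertion against a stubbed or
// staging provider response
export function satisfiesPaymentContract(body: unknown): body is PaymentEvent {
  const e = body as Partial<PaymentEvent> | null;
  return (
    typeof e?.transactionId === "string" &&
    typeof e?.amount === "number" &&
    typeof e?.currency === "string" &&
    (e?.status === "captured" || e?.status === "failed")
  );
}
```

Running this in the provider's CI pipeline means a breaking payload change fails the build before it ever reaches the consumer in production.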
### Recommendations for Similar Projects
- **Start with monitoring:** Before extracting any service, ensure you can observe it properly
- **Define clear boundaries:** Resist the temptation to extract too finely; aim for 5-10 services initially
- **Plan for failures:** Design for degraded functionality rather than complete failure
- **Invest in CI/CD:** Automated testing and deployment are non-negotiable
- **Budget for unknowns:** Add 30-40% buffer to timeline estimates for unexpected complications
---
## Conclusion
The microservices migration transformed Zento Payments from a startup with scaling problems to a platform ready for IPO-level growth. Beyond the technical achievements, the project demonstrated that architectural decisions have direct business implications: reliability enables trust, and trust drives merchant acquisition.
Zento Payments now processes over $890 million annually and has become the dominant payment gateway in their regional market. The platform handles Black Friday levels of traffic without degradation, a scenario that would have caused complete system failure pre-migration.
The journey isn't over. As Zento approaches 10 million daily transactions, they're already exploring edge computing for payment processing and machine learning for fraud detection. But they now have an architecture that can evolve with their ambitions.
**Key Takeaway:** Technical debt isn't just a developer problem; it's a business risk. Strategic infrastructure investment, when executed with clear goals and proper methodology, delivers compounding returns across performance, reliability, and growth.