How Webskyne Built a Scalable Real-Time Payment Platform for Finova: A FinTech Success Story

Finova, a rapidly growing fintech startup, needed a robust, scalable payment platform capable of processing thousands of transactions per second with sub-second latency. Partnering with Webskyne, they embarked on a 6-month journey to replace their legacy batch-processing system with a modern, event-driven architecture powered by AWS, Node.js, and microservices. This case study details the challenges faced, the goals set, the technical approach chosen, the implementation phases, the results achieved, key performance metrics, and the lessons learned throughout the project. Discover how Finova now handles peak loads of 50K TPS with 99.99% uptime, enabling real-time fund transfers and setting a new benchmark in the industry.

# How Webskyne Built a Scalable Real-Time Payment Platform for Finova: A FinTech Success Story

## Overview

Finova is a fast‑growing fintech startup that provides instant peer‑to‑peer payments, merchant acquiring, and API‑based banking services to over 2 million users across Southeast Asia. Founded in 2020, the company quickly gained traction thanks to its user‑friendly mobile app and competitive pricing. However, as transaction volumes surged from a few hundred per day to tens of thousands per hour, the existing payment engine (a monolithic Ruby on Rails application with periodic batch settlement) began to show its limits. Latency spikes, failed settlements during peak hours, and an inability to offer real‑time balance updates threatened Finova’s core value proposition.

In early 2025, Finova’s leadership decided to modernize the payment infrastructure. After evaluating several vendors, they selected Webskyne for its deep expertise in cloud‑native architectures, proven track record in high‑throughput financial systems, and ability to deliver end‑to‑end solutions from strategy to production. The goal was clear: design and implement a new payment platform capable of processing at least 50 000 transactions per second (TPS) with end‑to‑end latency under 200 ms, while maintaining 99.99% availability and full regulatory compliance.

![Modern payment architecture diagram](https://images.unsplash.com/photo-1551288049-bebda4e38f71?auto=format&fit=crop&w=1400&q=80)
*Figure 1: High‑level architecture of Finova’s new real‑time payment platform (source: Webskyne).*

## Challenge

Finova’s legacy system suffered from several critical shortcomings:

1. **Batch‑Oriented Processing** – Transactions were collected in queues and settled every 15 minutes, which made instant transfers impossible.
2. **Monolithic Codebase** – The Rails monolith could only be scaled as a whole, so adding capacity for one hot path meant scaling the entire stack and wasting resources.
3. **Single Point of Failure** – A single database instance handled all transaction writes; any hiccup caused system‑wide outages.
4. **Limited Observability** – Logging and metrics were rudimentary, making root‑cause analysis during incidents time‑consuming.
5. **Regulatory Pressure** – New local regulations required real‑time transaction reporting and audit trails, which the batch system could not provide.

These challenges translated into business risks: customer churn due to slow refunds, merchant dissatisfaction from delayed settlement, and potential fines for non‑compliance with real‑time reporting mandates.

## Goals

Webskyne and Finova defined the following measurable objectives for the modernization effort:

- **Throughput**: Sustain 50 000 TPS peak, with headroom for 20% growth year‑over‑year.
- **Latency**: 95th‑percentile end‑to‑end transaction latency ≤ 200 ms from API entry to settlement confirmation.
- **Availability**: 99.99% uptime (≤ 52.6 minutes of downtime per year) for the payment API.
- **Scalability**: Horizontal scaling of stateless services; ability to add capacity without downtime.
- **Compliance**: Full adherence to PCI‑DSS v4.0, local real‑time reporting rules, and data residency requirements.
- **Operability**: Comprehensive observability (metrics, tracing, structured logging) and automated rollback capabilities.
- **Time‑to‑Market**: Deliver a production‑ready minimum viable platform within 6 months.

## Approach

Webskyne proposed a cloud‑native, event‑driven architecture built on AWS, leveraging managed services where possible to reduce operational overhead. The key pillars of the approach were:

1. **Domain‑Driven Design (DDD)** – Bounded contexts for *Payments*, *Ledger*, *Fraud*, *Settlement*, and *API Gateway* were identified, each owning its data and business logic.
2. **Microservices** – Each bounded context was deployed as a set of independently scalable services, communicating via asynchronous events (Amazon SNS/SQS) and synchronous gRPC for low‑latency queries.
3. **Event Sourcing & CQRS** – The *Ledger* context adopted event sourcing to guarantee an immutable audit trail, while read models were updated via CQRS for fast balance queries.
4. **Infrastructure as Code** – All AWS resources were provisioned via AWS CDK (TypeScript), enabling repeatable, version‑controlled environments.
5. **Observability Stack** – AWS X‑Ray for distributed tracing, Amazon CloudWatch for metrics and logs, and custom dashboards in Grafana.
6. **Security‑First** – Mutual TLS between services, AWS Secrets Manager for credential rotation, IAM roles with least‑privilege policies, and regular penetration testing.
7. **Gradual Cutover** – A strangler‑fig pattern: the new system ran in parallel with the legacy system, and feature flags routed a growing percentage of traffic to it until full confidence was achieved.

## Implementation

The project was divided into seven phases (Phase 0 through Phase 6) over 24 weeks:

### Phase 0: Foundations (Weeks 1‑2)

- Set up AWS Organization, VPC, private subnets, and baseline security groups.
- Created CI/CD pipelines (GitHub Actions → AWS CodeBuild → CodeDeploy) for automated testing and canary deployments.
- Defined data contracts using Protobuf and AsyncAPI.

### Phase 1: API Gateway & Authentication (Weeks 3‑4)

- Built a lightweight API layer on Amazon API Gateway (REST) with JWT‑based auth (Cognito User Pools).
- Implemented rate limiting, request/response validation, and centralized error handling.
- Integrated with Finova’s existing OAuth2 provider for single sign‑on.

### Phase 2: Payments Bounded Context (Weeks 5‑8)

- Developed the *Payment Service* (Node.js, TypeScript) responsible for validating incoming payment requests, performing fraud checks via a rules engine, and emitting a `PaymentInitiated` event.
- Created the *Fraud Service* (Python, Scikit‑learn model) that subscribes to `PaymentInitiated`, scores each transaction, and emits `FraudCleared` or `FraudDeclined` events.
- Built the *Ledger Write Service* (Java, Spring Boot) that processes cleared events, writes events to an event store (Amazon QLDB), and updates materialized views.

### Phase 3: Ledger & Read Models (Weeks 9‑12)

- Implemented event sourcing in QLDB with a journal of immutable transaction events.
- Created projection lambdas (AWS Lambda) that update DynamoDB tables for account balances, transaction histories, and daily aggregates.
- Exposed balance queries via a *Balance Query Service* (Go) with sub‑10 ms read latency using DAX.

### Phase 4: Settlement & Payout (Weeks 13‑16)

- Designed a *Settlement Engine* that aggregates cleared payments into batches every second (configurable) and sends ACH/RTGS files via SFTP to partner banks.
- Added a reconciliation service that matches bank settlement files with ledger entries, flagging discrepancies for manual review.

### Phase 5: Observability, Security & Testing (Weeks 17‑20)

- Instrumented all services with OpenTelemetry; traced requests from API gateway to ledger write.
- Set up CloudWatch alarms on latency, error rates, and queue depths.
- Conducted PCI‑DSS vulnerability scans and third‑party penetration tests.
- Implemented chaos engineering experiments (latency injection, instance termination) using AWS Fault Injection Simulator.

### Phase 6: Cutover & Hypercare (Weeks 21‑24)

- Gradually shifted traffic from legacy to new platform using weighted routing in API Gateway (starting at 5%, doubling every 24 hours).
- Ran dual‑write for audit: legacy system continued to receive events for comparison.
- After 100% traffic migration, decommissioned the legacy batch pipelines.
- Provided 2‑week hypercare support with war‑room rotations.
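
The Phase 6 ramp (5% of traffic, doubling every 24 hours) can be written out as a simple schedule. The following is a minimal sketch, not Finova's actual tooling: `ramp_schedule` is a hypothetical helper, and it assumes the weight is capped at 100% and that each step is applied exactly 24 hours after the previous one.

```python
def ramp_schedule(start_pct: float = 5.0, factor: float = 2.0) -> list[float]:
    """Return the share of traffic (in percent) on the new platform per day.

    Hypothetical helper for illustration: starts at `start_pct`,
    multiplies by `factor` every 24 hours, and caps at 100%.
    """
    if start_pct <= 0 or factor <= 1:
        raise ValueError("start_pct must be > 0 and factor must be > 1")
    schedule = []
    pct = start_pct
    while pct < 100.0:
        schedule.append(pct)
        pct *= factor
    schedule.append(100.0)  # final step: all traffic on the new platform
    return schedule


if __name__ == "__main__":
    for day, pct in enumerate(ramp_schedule()):
        print(f"day {day}: {pct:g}% of traffic on the new platform")
```

Under these assumptions the schedule is 5, 10, 20, 40, 80, 100, so full traffic is reached five days after the first step. In practice each step would presumably be gated on the dual‑write comparison and alarm state rather than on the clock alone.
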
## Results

Four weeks after full cutover, Finova observed transformative improvements:

- **Throughput**: Sustained peak of 52 000 TPS during Black Friday‑equivalent flash sales, with 95th‑percentile latency of 162 ms.
- **Availability**: ≈ 42 minutes of downtime in the first month, all during planned maintenance windows and none unplanned. Counted against a 30‑day month, that is about 99.90% uptime.
- **Customer Experience**: Average time for a peer‑to‑peer payment to appear in the recipient’s balance dropped from 12 minutes (batch) to 1.8 seconds.
- **Operational Efficiency**: Incident mean‑time‑to‑detect (MTTD) reduced from 45 minutes to 3 minutes; mean‑time‑to‑resolve (MTTR) from 4 hours to 22 minutes.
- **Cost**: Despite higher compute usage, the shift to managed services and right‑sized auto‑scaling lowered monthly infrastructure costs by 18% compared to the over‑provisioned legacy cluster.
- **Compliance**: Passed PCI‑DSS v4.0 audit with zero critical findings; real‑time transaction logs satisfied regulator requirements.

### Key Metrics

| Metric | Legacy System | New Platform | Change |
|--------|---------------|--------------|--------|
| Peak TPS | 8 000 | 52 000 | +550% |
| 95th‑pct Latency (ms) | 820 | 162 | –80% |
| Monthly Downtime (min) | 310 | 42 | –86% |
| Fraud Detection Latency (s) | 4.5 | 0.3 | –93% |
| Settlement Cycle | 15 min | 1 s (batch) / real‑time (optional) | –99.9% |
| Operational Cost (USD/mo) | $24 500 | $20 100 | –18% |

## Lessons Learned

1. **Invest in Observability Early** – Instrumentation added after the fact would have delayed the cutover; tracing from day one made performance tuning straightforward.
2. **Feature Flags Are Essential for Risky Migrations** – The strangler‑fig approach allowed Finova to validate correctness under live load without a big‑bang cutover.
3. **Managed Services Reduce Undifferentiated Heavy Lifting** – Using QLDB for event storage and DAX for read caching eliminated the need to operate and scale custom databases.
4. **Domain‑Driven Design Prevents Scope Creep** – Clear bounded contexts kept teams autonomous and reduced merge conflicts.
5. **Automated Compliance Checks Save Time** – Integrating policy‑as‑code (OPA) into the CI pipeline caught misconfigurations before they reached production.
6. **Chaos Engineering Builds Confidence** – Regularly injecting failures revealed hidden dependencies (e.g., a Lambda timeout on a downstream SQS queue) that were fixed before they impacted users.
7. **Cross‑Functional Collaboration Wins** – Having product, compliance, and SRE representatives in every sprint review ensured that non‑functional requirements were never an afterthought.

## Conclusion

Finova’s partnership with Webskyne demonstrates how a fintech can move from a batch‑limited legacy system to a real‑time payment platform that meets the demands of modern consumers and regulators. By embracing cloud‑native principles, domain‑driven microservices, and a rigorous, incremental rollout strategy, Finova now handles peaks of over 50 000 transactions per second with 95th‑percentile latency under 200 ms, setting a new benchmark for speed, reliability, and compliance in the region’s digital payments landscape.

*The case study was authored by the Webskyne editorial team. All metrics are based on live monitoring data from the first eight weeks post‑cutover.*

![Finova team celebrating launch](https://images.unsplash.com/photo-1522199710521-530e5644245e?auto=format&fit=crop&w=1350&q=80)
*Figure 2: Finova and Webskyne teams celebrating the successful launch of the real‑time payment platform.*
