Webskyne

24 May 2026 · 8 min read

How Webskyne Built a Scalable Real-Time Payment Platform for Finova: A FinTech Success Story

Finova, a rapidly growing fintech startup, needed a robust, scalable payment platform capable of processing thousands of transactions per second with sub-second latency. Partnering with Webskyne, they embarked on a 6-month journey to replace their legacy batch-processing system with a modern, event-driven architecture powered by AWS, Node.js, and microservices. This case study details the challenges faced, the goals set, the technical approach chosen, the implementation phases, the results achieved, key performance metrics, and the lessons learned throughout the project. Discover how Finova now handles peak loads of 50K TPS with 99.99% uptime, enabling real-time fund transfers and setting a new benchmark in the industry.

Case Study, FinTech, Payment Systems, Real-Time Processing, Scalability, AWS, Node.js, Microservices
# How Webskyne Built a Scalable Real-Time Payment Platform for Finova: A FinTech Success Story

## Overview

Finova is a fast‑growing fintech startup that provides instant peer‑to‑peer payments, merchant acquiring, and API‑based banking services to over 2 million users across Southeast Asia. Founded in 2020, the company quickly gained traction thanks to its user‑friendly mobile app and competitive pricing. However, as transaction volumes surged from a few hundred per day to tens of thousands per hour, the existing payment engine, a monolithic Ruby on Rails application with periodic batch settlement, began to show its limits. Latency spikes, failed settlements during peak hours, and an inability to offer real‑time balance updates threatened Finova’s core value proposition.

In early 2025, Finova’s leadership decided to modernize the payment infrastructure. After evaluating several vendors, they selected Webskyne for its deep expertise in cloud‑native architectures, proven track record in high‑throughput financial systems, and ability to deliver end‑to‑end solutions from strategy to production. The goal was clear: design and implement a new payment platform capable of processing at least 50 000 transactions per second (TPS) with end‑to‑end latency under 200 ms, while maintaining 99.99% availability and full regulatory compliance.

![Modern payment architecture diagram](https://images.unsplash.com/photo-1551288049-bebda4e38f71?auto=format&fit=crop&w=1400&q=80)

*Figure 1: High‑level architecture of Finova’s new real‑time payment platform (source: Webskyne).*

## Challenge

Finova’s legacy system suffered from several critical shortcomings:

1. **Batch‑Oriented Processing** – Transactions were collected in queues and settled every 15 minutes, which made instant transfers impossible.
2. **Monolithic Codebase** – The Rails monolith could only be scaled as a whole, so adding capacity for one hot path meant scaling the entire stack and wasting resources.
3. **Single Point of Failure** – A single database instance handled all transaction writes; any hiccup caused system‑wide outages.
4. **Limited Observability** – Logging and metrics were rudimentary, making root‑cause analysis during incidents time‑consuming.
5. **Regulatory Pressure** – New local regulations required real‑time transaction reporting and audit trails, which the batch system could not provide.

These challenges translated into business risks: customer churn due to slow refunds, merchant dissatisfaction from delayed settlement, and potential fines for non‑compliance with real‑time reporting mandates.

## Goals

Webskyne and Finova defined the following measurable objectives for the modernization effort:

- **Throughput**: Sustain 50 000 TPS peak, with headroom for 20% growth year‑over‑year.
- **Latency**: 95th‑percentile end‑to‑end transaction latency ≤ 200 ms from API entry to settlement confirmation.
- **Availability**: 99.99% uptime (≤ 52.6 minutes of downtime per year) for the payment API.
- **Scalability**: Horizontal scaling of stateless services; ability to add capacity without downtime.
- **Compliance**: Full adherence to PCI‑DSS v4.0, local real‑time reporting rules, and data residency requirements.
- **Operability**: Comprehensive observability (metrics, tracing, structured logging) and automated rollback capabilities.
- **Time‑to‑Market**: Deliver a production‑ready minimum viable platform within 6 months.

## Approach

Webskyne proposed a cloud‑native, event‑driven architecture built on AWS, leveraging managed services where possible to reduce operational overhead. The key pillars of the approach were:

1. **Domain‑Driven Design (DDD)** – Bounded contexts for *Payments*, *Ledger*, *Fraud*, *Settlement*, and *API Gateway* were identified, each owning its data and business logic.
2. **Microservices** – Each bounded context was deployed as a set of independently scalable services, communicating via asynchronous events (Amazon SNS/SQS) and synchronous gRPC for low‑latency queries.
3. **Event Sourcing & CQRS** – The *Ledger* context adopted event sourcing to guarantee an immutable audit trail, while read models were updated via CQRS for fast balance queries.
4. **Infrastructure as Code** – All AWS resources were provisioned via AWS CDK (TypeScript), enabling repeatable, version‑controlled environments.
5. **Observability Stack** – AWS X‑Ray for distributed tracing, Amazon CloudWatch for metrics and logs, and custom dashboards in Grafana.
6. **Security‑First** – Mutual TLS between services, AWS Secrets Manager for credential rotation, IAM roles with least‑privilege policies, and regular penetration testing.
7. **Gradual Cutover** – A strangler‑fig pattern: the new system ran in parallel with the legacy system, with feature flags routing a growing percentage of traffic to it until full confidence was achieved.

## Implementation

The project was divided into seven phases (Phase 0 through Phase 6) over 24 weeks:

### Phase 0: Foundations (Weeks 1‑2)

- Set up AWS Organization, VPC, private subnets, and baseline security groups.
- Created CI/CD pipelines (GitHub Actions → AWS CodeBuild → CodeDeploy) for automated testing and canary deployments.
- Defined data contracts using Protobuf and AsyncAPI.

### Phase 1: API Gateway & Authentication (Weeks 3‑4)

- Built a lightweight API layer on Amazon API Gateway (REST) with JWT‑based auth (Cognito User Pools).
- Implemented rate limiting, request/response validation, and centralized error handling.
- Integrated with Finova’s existing OAuth2 provider for single sign‑on.

### Phase 2: Payments Bounded Context (Weeks 5‑8)

- Developed the *Payment Service* (Node.js, TypeScript) responsible for validating incoming payment requests, performing fraud checks via a rules engine, and emitting a `PaymentInitiated` event.
- Created the *Fraud Service* (Python, Scikit‑learn model) that subscribes to `PaymentInitiated`, scores each transaction, and emits `FraudCleared` or `FraudDeclined` events.
- Built the *Ledger Write Service* (Java, Spring Boot) that processes cleared events, writes events to an event store (Amazon QLDB), and updates materialized views.

### Phase 3: Ledger & Read Models (Weeks 9‑12)

- Implemented event sourcing in QLDB with a journal of immutable transaction events.
- Created projection functions (AWS Lambda) that update DynamoDB tables for account balances, transaction histories, and daily aggregates.
- Exposed balance queries via a *Balance Query Service* (Go) with sub‑10 ms read latency using DAX.

### Phase 4: Settlement & Payout (Weeks 13‑16)

- Designed a *Settlement Engine* that aggregates cleared payments into batches every second (configurable), generates ACH/RTGS files, and delivers them to partner banks via SFTP.
- Added a reconciliation service that matches bank settlement files with ledger entries, flagging discrepancies for manual review.

### Phase 5: Observability, Security & Testing (Weeks 17‑20)

- Instrumented all services with OpenTelemetry; traced requests from API gateway to ledger write.
- Set up CloudWatch alarms on latency, error rates, and queue depths.
- Conducted PCI‑DSS vulnerability scans and third‑party penetration tests.
- Implemented chaos engineering experiments (latency injection, instance termination) using AWS Fault Injection Simulator.

### Phase 6: Cutover & Hypercare (Weeks 21‑24)

- Gradually shifted traffic from legacy to new platform using weighted routing in API Gateway (starting at 5%, doubling every 24 hours).
- Ran dual‑write for audit: legacy system continued to receive events for comparison.
- After 100% traffic migration, decommissioned the legacy batch pipelines.
- Provided 2‑week hypercare support with war‑room rotations.
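
The traffic ramp described in Phase 6 (start at 5%, double every 24 hours) implies a short, predictable schedule. The following is a minimal sketch of that arithmetic, assuming the weight is simply capped at 100% on the final step. `cutoverSchedule` is a hypothetical helper introduced for illustration; it is not part of Finova's tooling or of any AWS API.

```typescript
// Hypothetical helper: traffic weights (in percent) for a gradual cutover
// that starts at `startPercent` and multiplies by `factor` at each step
// until the new platform receives 100% of traffic.
function cutoverSchedule(startPercent: number, factor: number = 2): number[] {
  if (startPercent <= 0 || startPercent > 100) {
    throw new RangeError("startPercent must be in the range (0, 100]");
  }
  if (factor <= 1) {
    throw new RangeError("factor must be greater than 1");
  }

  const steps: number[] = [];
  let percent = startPercent;
  while (percent < 100) {
    steps.push(percent);
    percent *= factor;
  }
  steps.push(100); // assumption: the final step is capped at 100%
  return steps;
}

// One entry per 24-hour step, beginning with the initial 5% step.
const schedule = cutoverSchedule(5);
console.log(schedule); // [ 5, 10, 20, 40, 80, 100 ]
```

Under these assumptions the new platform takes all traffic on the sixth step, five days after the initial 5% step, which leaves most of the four-week phase for dual-write comparison and hypercare.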
## Results

Four weeks after full cutover, Finova observed transformative improvements:

- **Throughput**: Sustained peak of 52 000 TPS during Black Friday‑equivalent flash sales, with 95th‑percentile latency of 162 ms.
- **Availability**: 99.992% uptime in the first month (≈ 42 minutes downtime, all due to planned maintenance windows).
- **Customer Experience**: Average time for a peer‑to‑peer payment to appear in the recipient’s balance dropped from 12 minutes (batch) to 1.8 seconds.
- **Operational Efficiency**: Incident mean‑time‑to‑detect (MTTD) reduced from 45 minutes to 3 minutes; mean‑time‑to‑resolve (MTTR) from 4 hours to 22 minutes.
- **Cost**: Despite higher compute usage, the shift to managed services and right‑sized auto‑scaling lowered monthly infrastructure costs by 18% compared to the over‑provisioned legacy cluster.
- **Compliance**: Passed PCI‑DSS v4.0 audit with zero critical findings; real‑time transaction logs satisfied regulator requirements.

### Key Metrics

| Metric | Legacy System | New Platform | Improvement |
|--------|---------------|--------------|-------------|
| Peak TPS | 8 000 | 52 000 | +550% |
| 95th‑pct Latency (ms) | 820 | 162 | –80% |
| Monthly Downtime (min) | 310 | 42 | –86% |
| Fraud Detection Latency (s) | 4.5 | 0.3 | –93% |
| Settlement Cycle | 15 min | 1 s (batch) / real‑time (optional) | –99.9% |
| Operational Cost (USD/mo) | $24 500 | $20 100 | –18% |

## Lessons Learned

1. **Invest in Observability Early** – Instrumentation added after the fact would have delayed the cutover; tracing from day one made performance tuning straightforward.
2. **Feature Flags Are Essential for Risky Migrations** – The strangler‑fig approach allowed Finova to validate correctness under live load without a big‑bang cutover.
3. **Managed Services Reduce Undifferentiated Heavy Lifting** – Using QLDB for event storage and DAX for read caching eliminated the need to operate and scale custom databases.
4. **Domain‑Driven Design Prevents Scope Creep** – Clear bounded contexts kept teams autonomous and reduced merge conflicts.
5. **Automated Compliance Checks Save Time** – Integrating policy‑as‑code (OPA) into the CI pipeline caught misconfigurations before they reached production.
6. **Chaos Engineering Builds Confidence** – Regularly injecting failures revealed hidden dependencies (e.g., a Lambda timeout on a downstream SQS queue) that were fixed before they impacted users.
7. **Cross‑Functional Collaboration Wins** – Having product, compliance, and SRE representatives in every sprint review ensured that non‑functional requirements were never an afterthought.

## Conclusion

Finova’s partnership with Webskyne demonstrates how a fintech can move from a batch‑limited legacy system to a real‑time payment platform that meets the demands of modern consumers and regulators. By embracing cloud‑native principles, domain‑driven microservices, and a rigorous, incremental rollout strategy, Finova now processes over 50 000 transactions per second with sub‑second latency, setting a new benchmark for speed, reliability, and compliance in the region’s digital payments landscape.

*The case study was authored by the Webskyne editorial team. All metrics are based on live monitoring data from the first eight weeks post‑cutover.*

![Finova team celebrating launch](https://images.unsplash.com/photo-1522199710521-530e5644245e?auto=format&fit=crop&w=1350&q=80)

*Figure 2: Finova and Webskyne teams celebrating the successful launch of the real‑time payment platform.*
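
The availability and improvement figures quoted in Goals and Results rest on simple arithmetic that is easy to re-check. The sketch below is illustrative only: the function names are hypothetical, a year is taken as 365.25 days, and a month as 30 days, none of which the case study specifies.

```typescript
// Minutes of downtime allowed by an availability target over a window.
function downtimeBudgetMinutes(targetPercent: number, windowDays: number): number {
  const windowMinutes = windowDays * 24 * 60;
  return windowMinutes * (1 - targetPercent / 100);
}

// Availability (in percent) implied by observed downtime over a window.
function impliedAvailability(downtimeMinutes: number, windowDays: number): number {
  const windowMinutes = windowDays * 24 * 60;
  return 100 * (1 - downtimeMinutes / windowMinutes);
}

// Relative change from a legacy value to a new value, in percent.
function percentChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Assumption: 365.25-day year, 30-day month.
console.log(downtimeBudgetMinutes(99.99, 365.25).toFixed(1)); // "52.6"
console.log(downtimeBudgetMinutes(99.99, 30).toFixed(2));     // "4.32"
console.log(impliedAvailability(42, 30).toFixed(2));          // "99.90"

// Rows from the Key Metrics table.
console.log(percentChange(8000, 52000).toFixed(0));   // "550"  (peak TPS)
console.log(percentChange(820, 162).toFixed(0));      // "-80"  (latency)
console.log(percentChange(15 * 60, 1).toFixed(1));    // "-99.9" (settlement cycle, seconds)
```

The yearly budget matches the 52.6 minutes stated in Goals. By the same arithmetic, 42 minutes of downtime in a 30-day month corresponds to roughly 99.90% availability, so the 99.992% reported in Results only holds if planned maintenance is excluded from the uptime calculation; that reading is an assumption, since the case study does not state how uptime was measured.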
