How FinNova Rebuilt Its Core Banking Platform and Cut Processing Time by 73%
FinNova, a mid-sized digital bank serving 2.4 million customers across Southeast Asia, faced a critical juncture: a monolithic banking core built on aging infrastructure was choking on transaction volumes, costing millions in downtime, and blocking the product velocity the market demanded. This case study details how FinNova's engineering team — partnered with Webskyne's architecture consultants — undertook a full platform rebuild over 18 months, migrating 1.8 years of transaction history, achieving 99.997% uptime in production, and delivering a 73% reduction in average transaction processing time. The project reshaped FinNova's technical foundation, unlocked a 4× faster feature release cadence, and positioned the bank to launch six new products in the 12 months following go-live.
Case Study · Digital Transformation · Microservices · Banking Technology · FinTech · Domain-Driven Design · Cloud Architecture · Event Sourcing · Platform Engineering
## Overview
FinNova is a digital-first bank headquartered in Singapore, serving over 2.4 million retail and SME customers across Singapore, Malaysia, Thailand, and Indonesia. Founded in 2015 as a neo-bank, FinNova grew rapidly by offering fee-free international transfers, competitive FX rates, and a mobile-first savings experience. By early 2023, however, that growth had outpaced the platform's architectural capacity.
The bank's core banking engine — a Java-based monolith running on bare-metal servers in a co-location facility — was showing clear signs of strain. Average transaction processing times had climbed from 280ms in 2021 to over 1,100ms by Q1 2023. System availability sat at 96.3% against an SLA target of 99.95%. Three minor outages in February and March 2023 alone cost an estimated $1.2 million in failed transactions, regulatory penalties, and customer churn.
Beyond availability, the monolith was a product velocity bottleneck. Any new feature — even a simple beneficiary lookup optimization — required a full regression cycle averaging 11 days, branching from a single shared codebase that 14 teams were committing to concurrently. The bank had a roadmap of 22 new product initiatives planned for 2023–2024; at existing release cadences, fewer than 6 would make it to production.
Webskyne's architecture team was engaged in April 2023 for a six-week discovery phase, followed by an 18-month implementation partnership.
## The Challenge
The core problem was architectural debt compounded by organizational friction. Specifically:
**Monolithic coupling.** The banking engine was a single 4.2-million-line Java codebase. Core ledger logic, transaction routing, fraud detection, notification dispatch, and API gateway concerns were all entangled at the class level. A bug in the SWIFT message parser could cascade into missed payment confirmations. Deployments required coordinated freezes across eight teams.
**Scaling inefficiency.** The monolith ran on six vertically scaled physical servers, each provisioned for peak load at roughly twice average utilization. Horizontal scaling was technically possible but required custom load balancer reconfiguration and was never exercised in production. During the ASEAN payment surge around national holidays, the system routinely fell behind on its processing queues, causing batch jobs to slide into the next business day.
**Data siloing.** Customer accounts, transaction records, and compliance logs lived in a single PostgreSQL instance that had grown to 14TB without partitioning. Reporting queries — often run ad hoc by risk and compliance teams — consumed shared connection pool capacity, starving real-time transaction flows during business hours.
**Regulatory exposure.** FinNova operated under MAS (Singapore), BNM (Malaysia), and BOT (Thailand) regulatory frameworks simultaneously. Each jurisdiction had distinct reporting, audit logging, and data residency requirements that the monolith handled through a patchwork of post-hoc scripts and manual exports. A regulatory audit in late 2022 had flagged three deficiencies, including incomplete transaction log retention and inconsistent exchange rate sourcing.
**Team topology.** Fourteen development teams shared the monolith, averaging 8–12 engineers each. Merge conflicts were a daily occurrence. Code review queues stretched to 2–3 days. CI/CD pipelines took 94 minutes on a good day, longer during cross-team release windows. Deployment windows required executive sign-off and occurred biweekly at 2:00 AM SG time.
## Goals
The project had five measurable success criteria:
1. Reduce 95th-percentile transaction processing time from 1,100ms to under 300ms
2. Achieve 99.95% system availability over a 6-month production window
3. Enable single-team, self-contained deployments with a target of 2 deployments per team per day
4. Migrate all customer accounts and 1.8 years of transaction history without data loss or service interruption
5. Reduce the mean time to release new products from 11 days to under 2 days
A secondary goal was establishing a multi-jurisdiction compliance engine that could ingest regulatory changes as configuration rather than code.
## Approach
Webskyne's team proposed a **strangler fig migration** built on a **domain-driven microservices decomposition**. The strategy had three phases:
### Phase 1 — Domain Mapping and Boundary Identification (Weeks 1–6)
Using event-storming workshops with FinNova's product, engineering, and compliance teams, Webskyne mapped the bank's core domain model. Key bounded contexts identified:
- **Accounts** — customer accounts, balances, hold management
- **Payments** — domestic and cross-border transfer initiation, routing, confirmation
- **Ledger** — double-entry bookkeeping, reconciliation, settlement
- **Compliance** — AML/CTF monitoring, regulatory reporting, audit logging
- **Customer** — KYC records, preferences, notifications
- **Instruments** — cards, virtual accounts, beneficiary management
Each context was classified as either a **migration candidate** (replaceable with a new service) or a **strangler** (proxied through a facade while the monolith continues serving traffic). Accounts and Payments were selected as the first two migration targets.
### Phase 2 — New Service Construction and Shadow Traffic (Weeks 7–32)
Two new services — Accounts Service and Payments Service — were built in Go (chosen for its strong concurrency model and operational simplicity), each with its own PostgreSQL database (accounts on a sharded cluster managed via Citus; payments on a single-writer primary with read replicas). Each service exposed a REST/gRPC API surface that was wire-compatible with the monolith's existing interfaces.
A **traffic shadowing** pipeline was established: live production transactions were duplicated and routed to both the monolith and the new service. Responses were compared at a 1% sampling rate, then 5%, then 25%. Divergences were automatically logged, triaged, and resolved before traffic was cut over.
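The divergence check at the heart of the shadowing pipeline can be sketched as a small response comparator. This is a minimal illustration, assuming JSON response bodies; `compareResponses` and the sample payloads are hypothetical, and a real pipeline would also normalize volatile fields (timestamps, correlation IDs) before comparing.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// compareResponses reports whether the legacy and candidate services
// returned semantically equal JSON bodies, ignoring field order.
// Hypothetical helper for illustration only.
func compareResponses(legacy, candidate []byte) (bool, error) {
	var a, b interface{}
	if err := json.Unmarshal(legacy, &a); err != nil {
		return false, err
	}
	if err := json.Unmarshal(candidate, &b); err != nil {
		return false, err
	}
	// Deep comparison of the decoded structures: maps compare by
	// key/value, so JSON field ordering does not matter.
	return reflect.DeepEqual(a, b), nil
}

func main() {
	legacy := []byte(`{"status":"OK","balance":1050}`)
	candidate := []byte(`{"balance":1050,"status":"OK"}`)
	match, err := compareResponses(legacy, candidate)
	if err != nil {
		panic(err)
	}
	fmt.Println("responses match:", match) // prints: responses match: true
}
```

Mismatches found this way would be logged with both payloads attached, which is what allowed divergences to be triaged before any traffic was cut over.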
### Phase 3 — Incremental Cutover and Legacy Sunset (Weeks 33–52)
Using a feature-flag-driven canary deployment pattern, traffic was shifted in 5% increments from monolith to new services. Each increment held for a minimum 72-hour stability window before the next increment was authorized. The monolith was fully retired when traffic reached 0% on the legacy path — approximately week 50.
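The incremental cutover above relies on routing being deterministic per request, so a given customer stays on one code path for the whole 72-hour window. A minimal sketch of hash-based percentage routing; `routeToNewService` is a hypothetical helper, and the actual rollout was driven by feature-flag tooling and the service mesh rather than hand-rolled code.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// routeToNewService decides whether a request goes to the new service.
// Hashing a stable attribute (here a hypothetical request ID; an account
// ID would also work) makes the decision deterministic: the same input
// always lands on the same side of the rollout boundary.
func routeToNewService(requestID string, rolloutPct uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(requestID))
	return h.Sum32()%100 < rolloutPct
}

func main() {
	for _, pct := range []uint32{0, 5, 100} {
		fmt.Printf("rollout %3d%% -> route req-42 to new service: %v\n",
			pct, routeToNewService("req-42", pct))
	}
}
```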
## Implementation
### Accounts Service
The Accounts Service manages 2.4 million customer accounts, their balances, and hold states. Key design decisions:
- **Data model:** Account state is event-sourced, with PostgreSQL as the event store and a materialized snapshot rebuilt nightly for read-heavy queries. Each account mutation generates an immutable `AccountEvent` record (account_id, event_type, amount, timestamp, correlation_id).
- **Consistency:** Strong consistency within an account; eventual consistency across accounts in the same settlement batch. Transfers that span accounts use a Saga pattern with compensating transactions for failure recovery.
- **Scaling:** Citus sharding by account_id distributes load across 16 worker nodes. Each node handles ~150,000 accounts with a median read latency of 12ms (p99: 38ms).
- **Migration:** The monolith held account records in a denormalized flat table. A migration utility decomposed the legacy records into event streams, replaying the historical sequence to reconstruct current state. 1.8 years of history (approximately 890 million account events) was migrated over a 14-day parallel-run window.
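The event-sourced model above can be sketched in Go. The `AccountEvent` fields follow the record shape named in the text; the event type values and the `Replay` helper are illustrative assumptions, showing how an event stream folds into a current balance — the same operation the migration replay performed against the converted legacy records.

```go
package main

import (
	"fmt"
	"time"
)

// AccountEvent mirrors the immutable record described in the text
// (account_id, event_type, amount, timestamp, correlation_id).
// Event type values here are illustrative, not the production enum.
type AccountEvent struct {
	AccountID     string
	EventType     string // e.g. "CREDIT", "DEBIT" (assumed names)
	Amount        int64  // minor units (cents) to avoid float rounding
	Timestamp     time.Time
	CorrelationID string
}

// Replay folds an ordered event stream into the current balance.
// Reconstructing state this way is what makes a full historical
// migration (and audit) possible.
func Replay(events []AccountEvent) int64 {
	var balance int64
	for _, e := range events {
		switch e.EventType {
		case "CREDIT":
			balance += e.Amount
		case "DEBIT":
			balance -= e.Amount
		}
	}
	return balance
}

func main() {
	events := []AccountEvent{
		{AccountID: "acc-1", EventType: "CREDIT", Amount: 150000},
		{AccountID: "acc-1", EventType: "DEBIT", Amount: 42000},
	}
	fmt.Println("balance:", Replay(events)) // prints: balance: 108000
}
```

The nightly materialized snapshot mentioned above is simply this fold precomputed and stored, so read-heavy queries never replay the full stream.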
### Payments Service
The Payments Service handles all payment initiation and routing — domestic SEPA-equivalent transfers, SWIFT cross-border payments, and internal transfers.
- **Routing engine:** A rules-based routing engine (later extended with an ML-based fraud scoring layer) selects the optimal corridor for each payment. Domestic payments route to a local clearing house; cross-border payments use a corridor selector that evaluates rate, fee, and ETA across 11 partner banks.
- **ISO 20022 compliance:** All payment messages were restructured to ISO 20022 standards (pacs.008, camt.053) to satisfy MAS regulatory reporting requirements natively.
- **Idempotency:** Every payment carries a client-provided idempotency key. Duplicate submissions within a 24-hour window are deduplicated at the service layer, eliminating the double-submission incidents that had plagued the legacy system.
- **Latency target:** The p99 processing time (initiation to confirmation) was targeted at 280ms; the service achieved 247ms post-launch.
### Infrastructure
All services were deployed on Kubernetes (EKS on AWS, in accordance with FinNova's data residency requirement for Singapore customer data). The deployment architecture used:
- **Service mesh:** Istio for mTLS between services, traffic shaping, and observability
- **Observability:** OpenTelemetry for distributed tracing; Prometheus + Grafana for metrics; Loki for centralized log aggregation
- **Deployment:** ArgoCD for GitOps-driven progressive delivery with automated rollback on error budget burn
### Compliance Engine
A dedicated Compliance Service was built as a configuration-driven rules engine. Regulatory requirements from MAS, BNM, and BOT were encoded as declarative rules (YAML-based DSL), compiled into runtime checks injected into the payment and account workflows via Istio sidecars. New regulatory requirements can be deployed as configuration changes — no code change required for most reporting rule updates.
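A declarative rule in such a YAML DSL might look like the fragment below. This is purely illustrative: the rule name, field names, message code, and threshold are assumptions, not FinNova's actual schema.

```yaml
# Hypothetical shape of a declarative reporting rule — the real DSL
# schema is internal to FinNova's Compliance Service.
rule: sg-large-transaction-report
jurisdiction: SG
regulator: MAS
trigger:
  event: payment.settled
  condition: amount_sgd >= 20000   # illustrative threshold
action:
  report: regulatory-return        # placeholder report type
  within: 24h
  retain_log: 5y
```

Because a rule like this is data, not code, shipping a regulatory change becomes a reviewed configuration deploy rather than a full release cycle.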
### Observability and Release Engineering
A centralized developer portal built on Backstage aggregated service catalogs, runbooks, SLO dashboards, and deployment pipelines. Teams gained full self-service deployment capability for their bounded context. The CI/CD pipeline for a typical service shrank from 94 minutes to 18 minutes (parallelized test execution, incremental Docker layer caching).
## Results
After a 4-week hypercare period following the full cutover, the numbers were unambiguous:
| Metric | Before | After | Change |
|---|---|---|---|
| Avg. transaction processing time (p95) | 1,100ms | 247ms | **−77.5%** |
| System availability (6-month window) | 96.3% | 99.997% | +3.7 pp |
| Mean time to release (per team) | 11 days | 1.8 days | **−83.6%** |
| Monthly failed transactions (due to system error) | ~4,200 | ~140 | **−96.7%** |
| Cost per transaction (infra) | $0.0042 | $0.0011 | **−73.8%** |
| New products shipped (12 months post-go-live) | 2 (prior 12 mo.) | 8 | **+4×** |
In the first 12 months post-go-live (May 2024 – April 2025), FinNova launched eight new products, including a high-yield SGD savings vault, a multi-currency business account, and a micro-investment feature — all of which had been on the roadmap but blocked by the legacy release bottleneck.
One unplanned but welcome outcome: the ML-based fraud scoring layer, built during Phase 2 but not in the original scope, achieved a 31% reduction in false-positive fraud alerts in its first quarter, directly improving the customer experience for an estimated 8,400 affected users per month.
## Key Lessons
**Strangler fig requires discipline, not just tooling.** The pattern is well-documented; the hard part is maintaining behavioral parity between the legacy and new system over months of parallel operation. FinNova's traffic shadowing pipeline — initially dismissed as overhead — proved essential. Three subtle semantic differences (not caught by unit tests) were surfaced only by production traffic comparison. Build shadowing into the plan from day one.
**Event sourcing pays dividends at migration time.** Migrating a denormalized, flat-state database to an event-sourced model is significant work upfront. It pays back during migration itself: a well-structured event log can be replayed to reconstruct state, enabling a full historical pass that a state-snapshot migration cannot offer. For financial systems, the auditability alone justifies the investment.
**Domain-driven design is a team sport before it is a technical exercise.** The most valuable output of the event-storming workshops was not the bounded context map — it was the shared vocabulary that 14 teams left with. Terms like "hold," "settlement," and "authorization" had been used inconsistently across teams, causing interface drift and missed requirements. The DDD workshops made that vocabulary explicit. The technical boundaries followed naturally.
**Compliance-as-code is a competitive advantage, not a cost center.** FinNova's configuration-driven compliance engine reduced the lead time for regulatory changes from 6–8 weeks (code + test + deploy) to 3–5 days (configuration + review + deploy). In a regulatory environment that is tightening globally, this is a sustainable differentiator.
**Infra cost savings compound.** The shift from vertically scaled bare metal to horizontally scaled Kubernetes on spot instances reduced per-transaction infrastructure cost by 74%. At FinNova's transaction volume (approximately 18 million transactions per month), that compounds to annualized savings of roughly $680,000 — enough to fund the entire rebuild's consulting engagement from operational efficiency alone.
## Conclusion
The FinNova platform rebuild demonstrated that large-scale system migration does not require a big-bang rewrite or a multi-year feature freeze. A disciplined strangler fig approach, grounded in domain-driven design, with robust observability and incremental traffic validation, can deliver a full architectural transformation while maintaining business continuity.
The outcomes — 73% reduction in processing time, 99.997% availability, 4× improvement in release cadence, and a fully migrated customer base — were the product of a partnership model that treated the bank's engineering team as the primary implementers and Webskyne's consultants as architectural guides and technical sounding boards. That distribution of ownership is, arguably, the most transferable lesson from this engagement.