How We Scaled a Fintech Startup’s Mobile Banking Platform from 10K to 500K Users in 8 Months
When a promising fintech startup approached us, they had a working mobile banking MVP but were hitting a wall. After 12 months of organic growth, user counts had stalled at just over 10,000 active accounts. Their infrastructure was buckling under peak loads, API latency had crept past 2,200 milliseconds, and customer support tickets related to app crashes had tripled in a single quarter. The leadership team had made the difficult decision to suspend new onboarding for 45 days while they rebuilt the core architecture. They came to us with a clear mandate: rebuild the mobile banking platform so it could handle 500,000 active users within eight months, maintain sub-200-millisecond API response times at peak load, reduce infrastructure costs by 30 percent, and pass a penetration test and SOC 2 Type I audit before launch. This case study documents our technical approach, the architectural decisions we made, and the results we achieved together.
## Overview
FinEdge, a Series A fintech startup based in Bangalore, had built a promising mobile banking MVP using a monolithic Node.js backend, a PostgreSQL database on a single RDS instance, and a React Native front-end shared across iOS and Android. Twelve months after launch, the platform had 10,000 active users and was generating meaningful revenue through transaction fees. However, growth had slowed to a crawl, and the engineering team had identified three critical failure modes: API response times exceeding two seconds during market open, the database maxing out at 150 concurrent connections, and a 4.2 percent crash rate on the mobile app during the sign-up and payments flows. FinEdge’s CTO reached out to us after a referral from their investor board, wanting a partner who had executed infrastructure overhauls at scale under aggressive timelines.
Our engagement covered eight months, during which we redesigned the core backend, introduced event-driven infrastructure, migrated the data tier, and rebuilt critical mobile app modules. The result was a platform that now serves over 500,000 active users, handles 12,000 transactions per minute during peak hours, and maintains a 99.97 percent uptime SLA.
## Challenge
The technical challenges were interlocked and compounded by business constraints. First, the monolith had grown organically for 18 months. With twelve developers collaborating on a single codebase, deployment cycles had stretched to weekly release windows, and rollback times averaged forty-two minutes when something broke. Second, the database schema had drifted from a clean third-normal-form design into a patchwork of legacy tables, shared indexes, and stored procedures that no single engineer fully understood. Third, the mobile app’s sign-up flow relied on synchronous SMS verification through a third-party gateway that had a 1.8 percent failure rate during peak traffic, leaving thousands of users staring at an infinite spinner. Finally, FinEdge needed to comply with Reserve Bank of India data localization requirements, which meant rethinking their entire data residency strategy before they could onboard enterprise clients who demanded strict audit trails.
These were not simply engineering challenges; they were existential threats to the business. FinEdge’s Series B budget justification showed only 90 days of remaining runway. If the platform could not scale, the next fundraise would fail, and the company would face either acquisition at a massive discount or wind-down.
## Goals
We established six measurable goals to align the entire team, from engineering to the board of directors.
1. **Scale to 500,000 active users** within eight months without a single platform-wide outage.
2. **Reduce API latency** to under 200 milliseconds at the 95th percentile during peak hours.
3. **Cut infrastructure costs by 30 percent** through right-sizing instances, introducing caching layers, and optimizing database queries.
4. **Improve mobile app stability** by achieving a crash-free session rate above 99.5 percent.
5. **Meet RBI data localization requirements** and pass an initial SOC 2 Type I audit before launch.
6. **Reduce deployment risk** by achieving a mean time to recovery of under ten minutes and supporting continuous deployment for core services.
Each goal had a set of guardrails: no downtime during business hours in India, zero data loss during migrations, and no feature freeze longer than two weeks.
## Approach
Rather than treating this as a rewrite or a lift-and-shift project, we adopted an incremental, strangler-fig pattern. We introduced a new service layer that would gradually take over request handling from the monolith, giving the engineering team time to migrate routes without compressing all risk into a single launch day. This approach protected revenue while we worked.
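One common way to implement the gradual takeover is a routing shim in front of the monolith. The following is a minimal sketch under stated assumptions, not FinEdge’s gateway. The `StranglerRouter` class, the route prefixes, the service names, and the percentages are all hypothetical. A migrated path prefix sends a stable, hash-selected share of users to the new service and sends everyone else to the monolith.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StranglerRouter:
    """Hypothetical routing shim: new service for migrated prefixes, monolith otherwise.

    migrated maps a path prefix to (service_name, rollout_percent from 0 to 100).
    """
    migrated: dict[str, tuple[str, int]] = field(default_factory=dict)

    @staticmethod
    def bucket(user_id: str) -> int:
        # Stable hash: a given user always lands in the same bucket (0-99),
        # so they do not flip between old and new code paths across requests.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % 100

    def route(self, path: str, user_id: str) -> str:
        matches = [p for p in self.migrated
                   if path == p or path.startswith(p.rstrip("/") + "/")]
        if not matches:
            return "monolith"  # route not migrated yet
        # Longest prefix wins, so a sub-route can be rolled out on its own schedule.
        service, percent = self.migrated[max(matches, key=len)]
        return service if self.bucket(user_id) < percent else "monolith"


router = StranglerRouter({
    "/v1/identity": ("identity-service", 100),  # fully cut over
    "/v1/transfers": ("ledger-service", 10),    # beta: roughly 10% of users
})
print(router.route("/v1/identity/login", "user-42"))  # identity-service
print(router.route("/v1/statements", "user-42"))      # monolith
```

Raising a prefix’s percentage step by step, with full rollback available by setting it to 0, matches the "subset of traffic first" practice described in the lessons.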
Our methodology had three phases: assessment and design, build and migrate, and stabilize and optimize. In the first phase, we ran a two-week technical audit with root-cause analysis, profiled every endpoint, mapped every database table dependency, and instrumented full-request tracing using OpenTelemetry. In the second phase, we built three new services: an identity and KYC service, a ledger service, and a notifications service. In the third phase, we optimized caching, tuned autoscaling policies, and ran chaos engineering scenarios to validate failover behavior.
One key decision was to adopt PostgreSQL read replicas for reporting workloads rather than building a complex sharding layer, which would have introduced unnecessary operational overhead for a team of fourteen engineers. Another was to move the event bus to Apache Kafka. Its idempotent producers and transactions provide exactly-once processing within Kafka itself, and combined with idempotency keys in the ledger service, they gave us the effectively-once processing that financial transactions require.
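Read replicas only pay off if reporting reads actually reach them. The following is a minimal sketch of the routing rule, assuming each query is tagged by its caller as transactional or reporting. The `Query` tag, the host names, and the round-robin policy are hypothetical. In practice this logic often lives in the ORM or connection-pool configuration rather than in hand-written code.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    sql: str
    workload: str  # hypothetical tag set by the caller: "oltp" or "reporting"


class ReplicaRouter:
    """Send writes and transactional reads to the primary; reporting reads to replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target(self, query: Query) -> str:
        is_read = query.sql.lstrip().lower().startswith("select")
        if is_read and query.workload == "reporting" and self._replicas is not None:
            # Replicas lag slightly behind the primary: fine for reports, not for
            # balance checks that must see the caller's own writes.
            return next(self._replicas)  # round-robin across replicas
        return self.primary


router = ReplicaRouter("pg-primary", ["pg-replica-1", "pg-replica-2"])
print(router.target(Query("SELECT sum(amount) FROM ledger_entries", "reporting")))  # pg-replica-1
print(router.target(Query("UPDATE accounts SET status = 'active'", "oltp")))        # pg-primary
```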
## Implementation
### Service Decomposition
We broke the monolith into three independent services, each with its own repository, deployment pipeline, and database schema. The identity service handled registration, JWT generation, and session management. The ledger service managed double-entry bookkeeping for every debit and credit, ensuring idempotency through per-transaction idempotency keys. The notifications service abstracted SMS and email delivery with exponential backoff and dead-letter queues for failed messages.
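The ledger service’s two guarantees, double-entry postings and idempotent retries, can be illustrated with in-memory structures standing in for its database. This is a hypothetical sketch. `Ledger`, `post_transfer`, the transfer-id format, and the sign convention are illustrative, not the service’s actual schema. Amounts use `Decimal` to avoid binary floating-point rounding on money.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    transfer_id: str
    account: str
    amount: Decimal  # sketch convention: negative = debit, positive = credit


class Ledger:
    """In-memory stand-in for the ledger service's tables (hypothetical design)."""

    def __init__(self) -> None:
        self.entries: list[Entry] = []
        self._by_key: dict[str, str] = {}  # idempotency key -> transfer id

    def post_transfer(self, idempotency_key: str, src: str, dst: str, amount: Decimal) -> str:
        if amount <= 0:
            raise ValueError("amount must be positive")
        # A client retry that reuses the key gets the original transfer back
        # instead of moving the money a second time.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        transfer_id = f"txn-{len(self._by_key) + 1}"
        # Double entry: a debit and an equal credit, recorded together.
        self.entries.extend([Entry(transfer_id, src, -amount), Entry(transfer_id, dst, amount)])
        self._by_key[idempotency_key] = transfer_id
        return transfer_id

    def balance(self, account: str) -> Decimal:
        return sum((e.amount for e in self.entries if e.account == account), Decimal("0"))


ledger = Ledger()
first = ledger.post_transfer("idem-123", "alice", "bob", Decimal("250.00"))
retry = ledger.post_transfer("idem-123", "alice", "bob", Decimal("250.00"))  # network retry
print(first == retry, len(ledger.entries))             # True 2
print(ledger.balance("alice"), ledger.balance("bob"))  # -250.00 250.00
```

In a PostgreSQL-backed service, the equivalent would typically be a unique constraint on the idempotency key plus inserting both entries in one database transaction. A retry then either finds the existing key or writes both rows atomically. A production version would also reject a reused key whose payload differs.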
Each service communicated via Kafka topics. When a user initiated a transfer, the API gateway emitted a `transfer.initiated` event. The ledger service consumed the event and wrote two rows to its own database, then published a `transfer.completed` event. The notifications service subscribed to the completed event and sent a push notification. This decoupling meant the front-end did not care whether the notification gateway was experiencing issues; the transfer still completed successfully.
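The choreography above can be sketched with an in-memory bus standing in for Kafka topics. Everything here is an assumption for illustration: `InMemoryBus`, `with_retries`, the payload fields, and the retry parameters are hypothetical, and none of them is Kafka’s API. The sketch shows the property the paragraph describes. When the push gateway is down, the ledger rows are still written, and the notification consumer retries with exponential backoff before parking the event in a dead-letter list.

```python
import time
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class InMemoryBus:
    """Single-process stand-in for Kafka topics (no partitions, offsets, or persistence)."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.dead_letters: list[tuple[str, Event, str]] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        for handler in self.subscribers[topic]:
            handler(event)


def with_retries(bus: InMemoryBus, topic: str, handler: Handler, max_attempts: int = 4,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> Handler:
    """Retry a consumer with exponential backoff, then dead-letter the event."""
    def wrapped(event: Event) -> None:
        for attempt in range(max_attempts):
            try:
                handler(event)
                return
            except Exception as exc:  # deliberately broad for the sketch
                if attempt == max_attempts - 1:
                    bus.dead_letters.append((topic, event, repr(exc)))
                    return
                sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    return wrapped


bus = InMemoryBus()
ledger_rows: list[tuple[str, str, int]] = []
delays: list[float] = []


def ledger_consumer(event: Event) -> None:
    # Write the debit and credit rows, then announce completion.
    ledger_rows.append((event["transfer_id"], event["from"], -event["amount"]))
    ledger_rows.append((event["transfer_id"], event["to"], event["amount"]))
    bus.publish("transfer.completed", {"transfer_id": event["transfer_id"]})


def push_gateway_down(event: Event) -> None:
    raise ConnectionError("push gateway unavailable")  # simulated outage


bus.subscribe("transfer.initiated", ledger_consumer)
bus.subscribe("transfer.completed",
              with_retries(bus, "transfer.completed", push_gateway_down, sleep=delays.append))
bus.publish("transfer.initiated", {"transfer_id": "t-1", "from": "A", "to": "B", "amount": 500})
print(len(ledger_rows), delays, len(bus.dead_letters))  # 2 [0.5, 1.0, 2.0] 1
```

Injecting `sleep` keeps the demo instant. With real Kafka, a failed consumer would instead rely on committed offsets and a separate dead-letter topic.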
### Data Tier Migration
The original RDS instance held 2 terabytes of data and had no clear backup strategy beyond daily snapshots. We used pglogical for logical replication to migrate data without downtime, then promoted the replicas to primary instances once the schema had been separated along service boundaries. We also introduced Redis Cluster for frequently accessed data: user session tokens, KYC verification status, and exchange rate lookups. This cut database query load by forty percent within the first week.
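Redis in this role typically follows the cache-aside pattern. The application reads from the cache, falls back to PostgreSQL on a miss, and writes the result back with an expiry. The following sketch uses an in-process dictionary as a stand-in for Redis so it runs without a server. `TTLCache`, `cache_aside`, the key format, and the TTL values are assumptions. With a real Redis client, the `set` call corresponds to the Redis command `SET key value EX seconds`.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """In-process stand-in for Redis GET/SET-with-expiry; not a Redis client."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def cache_aside(cache: TTLCache, key: str, ttl_seconds: float, load: Callable[[], Any]) -> Any:
    """Return the cached value, or load it from the database and cache it."""
    value = cache.get(key)
    if value is None:  # miss; note a stored None is indistinguishable from a miss here
        value = load()
        cache.set(key, value, ttl_seconds)
    return value


cache = TTLCache()
rate_loads: list[str] = []


def load_rate() -> float:
    rate_loads.append("db")  # stands in for a PostgreSQL query
    return 83.2              # placeholder value, not real market data


cache_aside(cache, "fx:USD:INR", 60, load_rate)
cache_aside(cache, "fx:USD:INR", 60, load_rate)
print(len(rate_loads))  # 1: the second call is served from the cache
```

Short TTLs suit fast-changing values such as exchange rates. Longer ones may be acceptable for data like KYC status if the service also deletes the key whenever that status changes.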
We rewrote the slowest twenty queries, which accounted for sixty percent of database CPU, by adding composite indexes and eliminating N+1 query patterns. We also introduced connection pooling with PgBouncer. It raised the number of concurrent client connections the platform could hold from 150 to 2,000 by multiplexing them over a smaller pool of server connections, all without increasing instance size.
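The N+1 pattern is easiest to see side by side. This sketch uses Python’s built-in `sqlite3` module rather than PostgreSQL so it runs anywhere. The `accounts`/`transactions` schema, the `amount_paise` column, and the index name are hypothetical. The first function issues one query for a user’s accounts plus one more per account. The second fetches the same rows in a single join. The composite index on `(account_id, created_at)` serves both the filter and the sort.

```python
import sqlite3

SCHEMA = """
CREATE TABLE accounts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES accounts(id),
    created_at TEXT NOT NULL,
    amount_paise INTEGER NOT NULL
);
-- Composite index covering the filter (account_id) and the sort (created_at).
CREATE INDEX idx_txn_account_created ON transactions (account_id, created_at);
"""


def history_n_plus_one(conn: sqlite3.Connection, user_id: int) -> list[tuple]:
    # 1 query for the accounts, then 1 more per account: N+1 round trips.
    accounts = conn.execute(
        "SELECT id FROM accounts WHERE user_id = ? ORDER BY id", (user_id,)).fetchall()
    rows: list[tuple] = []
    for (account_id,) in accounts:
        rows += conn.execute(
            "SELECT account_id, created_at, amount_paise FROM transactions "
            "WHERE account_id = ? ORDER BY created_at", (account_id,)).fetchall()
    return rows


def history_single_query(conn: sqlite3.Connection, user_id: int) -> list[tuple]:
    # The same rows in one round trip via a join.
    return conn.execute(
        "SELECT t.account_id, t.created_at, t.amount_paise "
        "FROM accounts a JOIN transactions t ON t.account_id = a.id "
        "WHERE a.user_id = ? ORDER BY t.account_id, t.created_at", (user_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO accounts (id, user_id) VALUES (?, ?)", [(1, 7), (2, 7), (3, 8)])
conn.executemany(
    "INSERT INTO transactions (account_id, created_at, amount_paise) VALUES (?, ?, ?)",
    [(1, "2025-04-01", 50_000), (2, "2025-04-02", -12_000),
     (1, "2025-04-03", 7_000), (3, "2025-04-01", 900)])
conn.commit()
print(history_n_plus_one(conn, 7) == history_single_query(conn, 7))  # True
```

Against a networked database, each extra query in the loop adds a round trip, which is why N+1 patterns grow costly as a user’s number of accounts rises.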
### Mobile App Refactor
On the client side, we extracted the sign-up, KYC upload, and payments flows into separate modules within React Native so they could be updated independently. We replaced the blocking SMS verification step with an asynchronous two-step flow: the user entered their phone number, received an OTP, and continued into the app immediately. If OTP verification failed, a background job retried up to three times before alerting support. This reduced spinner time by 80 percent and raised the crash-free session rate from 95.8 percent to 99.6 percent.
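The non-blocking sign-up can be sketched as a queue plus a background worker. This is an illustration under stated assumptions, not FinEdge’s implementation. `SignupService`, `OtpJob`, the `attempt_otp` and `alert_support` callables, and the reading of "retried up to three times" as one initial attempt plus three retries are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

MAX_RETRIES = 3  # assumption: one initial attempt plus up to three retries


@dataclass
class OtpJob:
    phone: str
    retries: int = 0


class SignupService:
    def __init__(self, attempt_otp: Callable[[str], bool],
                 alert_support: Callable[[str], None]) -> None:
        self.queue: deque[OtpJob] = deque()
        self.attempt_otp = attempt_otp      # hypothetical OTP step; True on success
        self.alert_support = alert_support  # called once retries are exhausted

    def start_signup(self, phone: str) -> dict:
        # Non-blocking: the request returns at once and the user continues into the app.
        self.queue.append(OtpJob(phone))
        return {"phone": phone, "status": "pending_verification"}

    def run_worker_once(self) -> None:
        """One pass over the queue; failed jobs are re-queued until retries run out."""
        for _ in range(len(self.queue)):
            job = self.queue.popleft()
            if self.attempt_otp(job.phone):
                continue
            if job.retries < MAX_RETRIES:
                job.retries += 1
                self.queue.append(job)
            else:
                self.alert_support(job.phone)


alerts: list[str] = []
svc = SignupService(attempt_otp=lambda phone: False, alert_support=alerts.append)  # gateway down
print(svc.start_signup("+91-0000000000")["status"])  # pending_verification
for _ in range(4):  # initial attempt + 3 retries
    svc.run_worker_once()
print(alerts)  # ['+91-0000000000']
```

The design point is that the gateway’s failure rate no longer sits on the request path. The user sees an immediate response, and failures surface to support instead of as an endless spinner.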
## Results
At the end of eight months, FinEdge had 510,000 active users and was processing an average of 8,500 transactions per minute, with peaks of 12,000 transactions per minute during market open. The platform remained stable throughout Diwali weekend, the highest-traffic period of the year, when user engagement usually triples.
Infrastructure costs dropped from 82,000 Indian rupees per month to 58,000 rupees per month, a 29 percent reduction that increased to 35 percent after another quarter of right-sizing based on CloudWatch and Prometheus metrics. The engineering team moved from weekly release windows to on-demand deployments through ArgoCD, with a mean time to recovery of seven minutes.
The SOC 2 Type I audit passed on the first attempt, and FinEdge closed its Series B at a 3.8x valuation multiple two months later. Two large enterprise clients signed contracts within thirty days of the audit completion, citing compliance as a key purchasing criterion.
## Metrics
The following table summarizes the metrics that mattered most to FinEdge’s leadership team:
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Active users | 10,500 | 510,000 | +4,757% |
| API latency (p95) | 2,200 ms | 142 ms | -93.5% |
| Crash-free sessions | 95.8% | 99.6% | +3.8 pp |
| Infrastructure cost | ₹82,000/mo | ₹58,000/mo | -29% |
| Mean time to recovery | 42 min | 7 min | -83% |
| Deployment frequency | Weekly | On-demand | N/A |
| Uptime SLA | 99.4% | 99.97% | +0.57 pp |
| PCI-DSS compliance | In progress | Passed | Achieved |
| SOC 2 Type I status | Not started | Passed | Achieved |
Not captured in the table but equally important was engineer satisfaction. We conducted quarterly surveys before and after the engagement. On the question "I feel confident that I can ship a change without breaking the platform," the score rose from 2.1 out of 5 to 4.3 out of 5.
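The Change column of the metrics table can be re-derived from its Before and After values. Counts, costs, and times use relative change. Rates that are already percentages use a percentage-point difference. The deployment-frequency and compliance rows are categorical, so they have no numeric change to compute. The helper names below are illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Percent change relative to the starting value."""
    return (after - before) / before * 100


def point_change(before_pct: float, after_pct: float) -> float:
    """Difference between two percentages, in percentage points (pp)."""
    return after_pct - before_pct


relative = {
    "Active users": (10_500, 510_000),
    "API latency (p95), ms": (2_200, 142),
    "Infrastructure cost, ₹/mo": (82_000, 58_000),
    "Mean time to recovery, min": (42, 7),
}
points = {
    "Crash-free sessions": (95.8, 99.6),
    "Uptime": (99.4, 99.97),
}
for name, (before, after) in relative.items():
    print(f"{name}: {relative_change(before, after):+.1f}%")
for name, (before, after) in points.items():
    print(f"{name}: {point_change(before, after):+.2f} pp")
```

Rounded to the precision used in the table, these reproduce its figures: +4,757%, -93.5%, -29%, -83%, +3.8 pp, and +0.57 pp.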
## Lessons Learned
This project taught us lessons that we now apply to every large-scale overhaul.
**1. Strangler Fig Beats Big Bang.** Incremental migration protected revenue and reduced risk. We never put FinEdge in a position where a single launch day determined success or failure. Each service could be beta-tested with a subset of traffic before taking over full responsibility.
**2. Observability is Not Optional.** We traced every request end-to-end from day one. When latency spiked during monsoon season, we could isolate the root cause to a single unindexed query within twenty minutes rather than the two days it would have taken without instrumentation.
**3. Identity and Payments Deserve Separate Services.** They have fundamentally different scaling profiles, security requirements, and failure modes. Lumping them together in the same process or database invites contention.
**4. Event-Driven Architecture Requires Discipline.** Kafka gave us resilience and scalability, but it also introduced eventual consistency as a first-class concern. Front-end developers had to rethink caching and optimistic UI updates. We recommend practicing event sourcing on a small, low-risk domain before applying it to financial transactions.
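On the client, eventual consistency usually means showing an optimistic result and reconciling it when the event arrives. The following is a hypothetical sketch. `BalanceView` keeps the server-confirmed balance separate from pending debits. `transfer.completed` is the event named earlier in this case study, while `transfer.failed` is an assumed event name for the rollback path. Unknown or duplicate events are ignored, because at-least-once delivery can replay them.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class BalanceView:
    """Client-side balance: server-confirmed amount plus optimistic pending debits."""
    confirmed: Decimal
    pending: dict[str, Decimal] = field(default_factory=dict)

    @property
    def displayed(self) -> Decimal:
        # What the UI shows: confirmed balance minus debits still awaiting the ledger.
        return self.confirmed - sum(self.pending.values(), Decimal("0"))

    def submit_transfer(self, transfer_id: str, amount: Decimal) -> None:
        # Optimistic update: reflect the debit before the backend confirms it.
        self.pending[transfer_id] = amount

    def on_event(self, event: dict) -> None:
        amount = self.pending.pop(event["transfer_id"], None)
        if amount is None:
            return  # duplicate or unknown event; at-least-once delivery can replay events
        if event["type"] == "transfer.completed":
            self.confirmed -= amount  # make the optimistic debit permanent
        # Any other outcome (e.g. the hypothetical "transfer.failed") just drops the
        # pending entry, which rolls the optimistic debit back on screen.


view = BalanceView(confirmed=Decimal("1000.00"))
view.submit_transfer("t-1", Decimal("200.00"))
print(view.displayed)  # 800.00, before the ledger has confirmed anything
view.on_event({"type": "transfer.completed", "transfer_id": "t-1"})
view.on_event({"type": "transfer.completed", "transfer_id": "t-1"})  # replayed event: ignored
print(view.displayed)  # 800.00
```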
**5. Compliance Must Be Built In, Not Bolted On.** We engaged our security consultant in the first week, not the last. This meant encryption key rotation, audit logs, and role-based access control were baked into every service from day one. In our experience, retrofitting compliance after the fact costs three to five times more.
**6. Mobile Crashes Are Often Server-Side Problems.** Fixing the slow database queries and reducing backend payload size eliminated the majority of mobile crashes. Clients often assume the bug is in the app when the root cause is a timeout, malformed JSON, or a race condition in the API layer.
## Final Thoughts
Eight months after we started, FinEdge is no longer the team that suspended onboarding. They are on track to cross one million users before the end of the calendar year, and they recently launched an enterprise API product that generates recurring revenue from banking-as-a-service clients. The infrastructure we built handles load that would have taken down the original platform in under thirty seconds. More importantly, the engineering team now operates with confidence. They know their systems, they understand their data flows, and they have the tooling to keep scaling without repeating the mistakes of the past.
For any startup or engineering leader facing a similar scaling crisis, our advice is simple: start with data, make incremental changes, and invest heavily in observability and automation. The gap between where you are and where you need to be is not as large as it feels if you break it into small, testable pieces.
---
*We partnered with FinEdge from March through November 2025. The platform currently serves over 500,000 active users and processes more than 250 million transactions annually.*