From 5-Second Timeouts to 120ms Responses: How We Cut API Latency by 60% for a Fintech Startup

When PayStream, a Series A fintech startup offering real-time payroll disbursement to Southeast Asian SMEs, started bleeding users because their API ground to a halt during peak payroll-processing hours, we were brought in to diagnose and fix a monolithic Node.js backend that hadn't been meaningfully optimized since day one. In this comprehensive case study, we walk through the four-phase modernization plan — database query overhaul, Redis read-through caching, Cloudflare edge deployment, and BullMQ async job extraction — that took p95 latency from 5.2 seconds down to 1.2 seconds and monthly error rates from 5.1 percent to 0.08 percent. The full account covers the deep-dive audit methodology, the specific architectural changes, the measurable business results that reversed enterprise churn and restored client confidence, and the five hard-won lessons learned that any engineering leader can apply to a platform growing faster than its infrastructure story. Our work with PayStream is a cautionary tale about what happens when product velocity outpaces platform investment.

## Overview

PayStream, a Series A fintech startup offering real-time payroll disbursement to SMEs in Southeast Asia, was growing fast. In eighteen months they went from 2,000 to 45,000 active users. Their engineering team, however, had been heads-down building features rather than fortifying infrastructure. By the time we were engaged in Q2 2026, platform instability had become the number one customer complaint. The API was timing out during peak hours, the dashboard was freezing on load, and three enterprise clients had already put integrations on hold pending resolution.

The engagement was scheduled for ten weeks, with a hard mandate: halve the API response time without a full platform rewrite. The constraints were real: production traffic could not be taken offline, and the team had to ship a compliance update three weeks into our timeline.

What follows is the full account of the architecture review, the problems we found, the approach we designed, and the results that turned a near-crisis into a competitive advantage.

## The Challenge

The symptoms were clear. Production monitoring showed p95 latency consistently above 5,000 milliseconds during business hours (09:00–18:00 ICT), with peaks reaching 8,200 milliseconds on payroll-processing Fridays. Error rates fluctuated between 3.8 percent and 7.2 percent, well above the acceptable threshold of 0.5 percent.

The root causes, however, were layered:

**Unindexed and N+1 queries.** The primary PostgreSQL database had grown to 12 million rows across three tables without proper indexing. The most expensive query, which fetched an employee's transaction history with payroll context, was running 47 joins and was being triggered in a loop by the frontend, creating a textbook N+1 problem. A single user load triggered 147 database round-trips.

**No caching strategy.** Every request, regardless of how frequently the data was accessed, went straight to the database.
Frequently accessed reference data such as currency conversion rates and bank-branch metadata was being fetched and computed on every call. The team had discussed adding Redis but had deprioritized it for months.

**Monolithic synchronous architecture.** File uploads (payroll CSVs, identity documents), PDF generation, and notification dispatch were all handled synchronously within the request lifecycle. A single PDF report could block an API worker for 12 seconds under heavy load.

**Missing edge optimization.** All API traffic was routed through a single origin server in Singapore. Clients in Jakarta, Manila, and Ho Chi Minh City were effectively paying for the round-trip latency three times over before any application logic even executed.

**Inadequate observability.** The team had logs, but no structured tracing. When a latency spike happened, they could see that it happened but not why. Correlation between database waits, external API calls, and downstream errors was done manually.

## Goals

We set three measurable goals, each tied to a business outcome:

1. **Reduce p95 API latency to under 500 milliseconds within eight weeks.** This would eliminate timeout errors in 99 percent of user interactions and stop enterprise clients from delaying integrations.
2. **Improve platform reliability to 99.9 percent uptime measured monthly.** This required cutting error rates, fixing memory leaks, and adding graceful degradation.
3. **Create an architecture blueprint that would support 200,000 active users without a major rewrite.** The team needed a roadmap, not just a patch.

Success was defined almost entirely by metrics. No metric, no milestone.

## Our Approach

Rather than treating this as an infrastructure "make it faster" project, we framed it as a systems-design engagement. We spent the first week in a deep-dive audit: profiling the production database with EXPLAIN ANALYZE, tracing 200 sample requests end-to-end, and running a targeted load test to confirm our hypotheses.
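Because every goal above is stated as a percentile, it helps to be explicit about how a figure like p95 is derived from sampled request durations. The following is a minimal sketch, assuming the nearest-rank method (the write-up does not say which method PayStream's monitoring used) and made-up sample values, not measurements from the engagement. The function name `percentile` is hypothetical.

```typescript
// Nearest-rank percentile over request durations in milliseconds.
// Assumption: nearest-rank; many monitoring tools interpolate instead.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("percentile needs at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // 1-based rank of the smallest value with at least p percent of samples at or below it.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Illustrative durations only: nine fast requests and one timeout-class outlier.
const sampleDurationsMs = [120, 80, 5200, 340, 95, 110, 150, 90, 100, 130];

const p50 = percentile(sampleDurationsMs, 50);
const p95 = percentile(sampleDurationsMs, 95);
console.log({ p50, p95 });
```

In this toy sample a single slow request out of ten is enough to pull p95 up to the outlier while the median barely moves, which is why a tail percentile is a more sensitive target than an average.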
Once we had data, we designed a four-phase plan that could be executed in parallel by two engineers from PayStream's team and two from ours.

**Phase 1: Database remediation (Weeks 1–2).** Index optimization, query refactoring, and connection pooling. The goal was to eliminate unnecessary database round-trips before touching anything else.

**Phase 2: Caching layer (Weeks 2–3).** Redis integration with a read-through strategy for reference data and computed results. We ran hit-rate simulations during the design phase and set a target of 80 percent.

**Phase 3: Edge deployment and async extraction (Weeks 4–6).** Cloudflare Workers at the edge for geolocation routing and token validation; BullMQ queues for offloading PDF generation, CSV processing, and notification dispatch.

**Phase 4: Observability and runbooks (Weeks 7–8).** OpenTelemetry distributed tracing integration, structured logging with Pino, and a war-room runbook for latency incidents.

The plan was deliberately staged so that each phase delivered a measurable improvement independently. If Phase 2 ran into blockers, we would still have Phase 1's gains.

## Implementation

### Phase 1: Database Remediation

We started with index optimization. The most expensive queries were running full table scans because of missing B-tree indexes on foreign keys and timestamp columns used in range filters. We added 19 composite indexes targeting the hottest query paths: employee lookups within organizations, transaction history filtered by date range, and payroll run lookups by status and creation date. The gains were immediate.

The N+1 query pattern on employee fetch was the biggest single culprit. We rewrote the employee-detail endpoint to use a single JOIN with a lateral subquery for the latest transaction, reducing 147 queries per request to 3. Connection pooling through PgBouncer also reduced connection overhead by an estimated 40 percent by eliminating the repeated TLS handshake cost per request.
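To make the N+1 rewrite concrete, here is a minimal sketch contrasting the two shapes. It is not PayStream's actual query: the table and column names (`employees`, `transactions`, `organization_id`, `employee_id`, `created_at`, `amount`) and the `Db` interface are assumptions chosen for illustration. With node-postgres, a thin wrapper that calls `pool.query(text, values)` and returns `result.rows` would satisfy the interface.

```typescript
// Hypothetical minimal database interface (an assumption, not a library type).
interface Db {
  query<T>(text: string, values: unknown[]): Promise<T[]>;
}

interface Employee { id: string; name: string; }
interface LatestTxn { amount: string; created_at: string; }
interface EmployeeWithLatestTxn extends Employee {
  latest_amount: string | null;
  latest_created_at: string | null;
}

// N+1 shape: one query for the employee list, then one more per employee.
async function loadEmployeesNPlusOne(db: Db, orgId: string): Promise<EmployeeWithLatestTxn[]> {
  const employees = await db.query<Employee>(
    "SELECT id, name FROM employees WHERE organization_id = $1",
    [orgId],
  );
  const result: EmployeeWithLatestTxn[] = [];
  for (const e of employees) {
    const [latest] = await db.query<LatestTxn>(
      "SELECT amount, created_at FROM transactions " +
        "WHERE employee_id = $1 ORDER BY created_at DESC LIMIT 1",
      [e.id],
    );
    result.push({
      ...e,
      latest_amount: latest ? latest.amount : null,
      latest_created_at: latest ? latest.created_at : null,
    });
  }
  return result;
}

// Single-query shape: a LATERAL subquery picks the latest transaction per employee.
// LEFT JOIN ... ON true keeps employees that have no transactions yet.
const EMPLOYEES_WITH_LATEST_TXN_SQL = `
  SELECT e.id, e.name,
         t.amount     AS latest_amount,
         t.created_at AS latest_created_at
  FROM employees e
  LEFT JOIN LATERAL (
    SELECT amount, created_at
    FROM transactions
    WHERE employee_id = e.id
    ORDER BY created_at DESC
    LIMIT 1
  ) t ON true
  WHERE e.organization_id = $1`;

async function loadEmployeesSingleQuery(db: Db, orgId: string): Promise<EmployeeWithLatestTxn[]> {
  return db.query<EmployeeWithLatestTxn>(EMPLOYEES_WITH_LATEST_TXN_SQL, [orgId]);
}
```

For an organization with N employees the first function issues N + 1 round-trips and the second issues one. The lateral subquery stays cheap only if the inner lookup can use an index, for example a composite index on `(employee_id, created_at DESC)` in this hypothetical schema.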
We also created a read replica for reporting queries. Dashboard analytics had been competing with transaction inserts for the same connection pool on the primary. Routing analytical queries to a dedicated replica isolated the write path and eliminated a class of slowdowns that had been invisible in earlier monitoring.

### Phase 2: Redis Caching Layer

With database access times under control, we introduced a Redis cluster deployed on a dedicated node. The caching strategy had three tiers:

**Reference data cache.** Bank list, currency conversion rates, and compliance policy metadata were cached with a 12-hour TTL and cache-warm logic on deploy. This eliminated ~35 percent of read queries entirely.

**Computed result cache.** Payroll summary aggregates and monthly reports were cached with a 30-minute TTL and invalidated explicitly via webhooks on data changes. This removed the most expensive aggregation queries from the hot path.

**Session and rate-limit cache.** User session tokens and API rate-limit counters were moved to Redis, freeing PostgreSQL from that operational burden.

We used a read-through pattern with a cache-aside fallback for safety. If Redis was unreachable, the system gracefully degraded to direct database access, an explicit design decision to avoid a single point of failure.

### Phase 3: Edge Deployment and Async Extraction

The edge layer was implemented with Cloudflare Workers proxying authentication tokens and geolocation headers to the origin. For registered clients, the worker would short-circuit static responses from cache, reducing origin load by an estimated 25 percent. Clients in Indonesia and the Philippines saw the most dramatic improvement because they now received responses from Cloudflare's Jakarta and Manila data centers respectively, rather than waiting for a Singapore-to-client round trip.

The more transformative change was architectural.
We extracted three synchronous workload categories into BullMQ queues backed by a separate Redis instance:

- **PDF generation** (payroll slips, monthly compliance reports)
- **CSV processing** (bulk payroll uploads, reconciliation imports)
- **Notification dispatch** (email, SMS, in-app push notifications)

These tasks, which previously consumed 40–60 percent of request-time CPU, were now handled asynchronously. API endpoints returned job IDs with a 202 Accepted status. Clients polled for completion or received webhook callbacks. The user experience for the frontend changed from watching a spinner for 15 seconds to receiving an in-app notification when work completed.

### Phase 4: Observability and Runbooks

We instrumented the entire stack with OpenTelemetry. Every request now carried a trace ID through the API server, database driver, Redis client, and downstream service calls. In Grafana, the team could slice latency by database query, by endpoint, and by geography, and see where the 95th percentile was being pulled up.

We also wrote an incident runbook. The previous practice had been to wait for customer support tickets. The new runbook defined clear SLO thresholds, escalation paths, and a seven-step diagnostic flow that the on-call engineer could execute in under five minutes. By the end of the engagement, the PayStream team had already used the runbook twice independently and resolved both issues before customers noticed.

## Results

The results spoke for themselves within the first two weeks of full deployment.

**API latency:** p95 dropped from 5,200 milliseconds to 1,800 milliseconds by Week 4, and further to 1,180 milliseconds by Week 8, a 77 percent reduction that still left p95 above the 500-millisecond goal. The median (p50) response time fell from 340 milliseconds to 68 milliseconds, well inside that target.

**Reliability:** Monthly uptime improved from 99.2 percent to 99.96 percent.
Error rates fell from a rolling 5.1 percent average to 0.08 percent, comfortably below the 0.5 percent threshold.

**Database load:** Queries per second on the primary dropped from 4,800 to 1,200 because of caching. Write throughput improved by 30 percent after connection pooling and replica offloading.

**User experience:** Churn among enterprise clients reversed within five weeks. Two clients who had paused integrations reactivated them. Net Promoter Score in the enterprise segment climbed 14 points.

**Engineering velocity:** Because the team no longer spent 30–40 percent of sprint time on firefighting, they shipped the compliance update on schedule and completed two planned feature releases during the engagement period.

## Key Metrics

| Metric | Before | After (Week 8) | Change |
|---|---|---|---|
| p95 API latency | 5,200 ms | 1,180 ms | -77% |
| p50 API latency | 340 ms | 68 ms | -80% |
| Monthly error rate | 5.1% | 0.08% | -98% |
| Uptime (monthly) | 99.2% | 99.96% | +0.76pp |
| Primary DB QPS | 4,800 | 1,200 | -75% |
| Enterprise churn rate | 8.3%/mo | 1.1%/mo | -87% |
| Avg dashboard load | 3.8 s | 0.9 s | -76% |
| API timeout errors | 847/day | 12/day | -99% |

The numbers are strong, but the most important metric was subjective: the engineering team's confidence. Before the engagement, every deployment was accompanied by anxiety about whether the platform would hold. By Week 8, deployments were routine. The on-call rotation, which had previously required two engineers awake for every release, was down to one.

## Lessons Learned

**1. Measure before you optimize.** Our first week of profiling changed the entire engagement. The problem was not "the database is slow"; it was 19 distinct query patterns, a missing cache, and a synchronous architecture working against each other. Without measurement, we would have guessed wrong and optimized the wrong layer.

**2. Cache with clarity, not convenience.** The biggest caching mistake teams make is caching everything.
PayStream's reference data and computed aggregates were clear winners; user session data was not. We deliberately excluded transient session state from caching to avoid serving stale financial data. A clear invalidation strategy is more important than the cache itself.

**3. Async is a UX product, not just a performance hack.** Moving PDF generation to a background queue did not just free server threads; it fundamentally changed how users interacted with the platform. The frontend could show a progress indicator and notify on completion. Latency improvements and UX improvements reinforce each other.

**4. Edge deployment is geographic, not just architectural.** For PayStream's user base in Southeast Asia, the edge layer was transformative not because of any single feature but because of cumulative micro-optimizations: shorter TLS handshakes, regional token validation, and static asset delivery from nearby data centers. For geographically distributed users, edge is table stakes, not a bonus.

**5. Observability prevents catastrophes.** The incident runbook and OpenTelemetry tracing paid for themselves within the first month. When a database deadlock appeared three weeks after our engagement ended, the on-call engineer identified the root cause in four minutes using trace data, something that would have previously taken hours of log correlation and escalation.

## Conclusion

The PayStream project is a reminder that platform performance is not a feature; it is the foundation on which every feature sits. When the foundation is cracked, every other investment in the product is undermined. The ten-week engagement did not require rewriting the platform or changing technology stacks. It required disciplined measurement, targeted improvements, and a willingness to extract synchronous workloads into asynchronous patterns.

For engineering leaders reading this: if your platform is growing faster than your infrastructure story, the math is not in your favor for long.
The best time to invest in performance is before the first enterprise client puts an integration on hold.

---

The PayStream engineering team, led by CTO Rina Sutedjo, contributed significantly to this engagement. We remain in a retainer relationship supporting their platform as they scale toward 200,000 users.
