From 8-Second Load Times to Sub-400ms: How FinStack Rebuilt Its Real-Time Trading Dashboard at Scale
When FinStack's real-time trading dashboard began buckling under 50,000 concurrent users, with latency spikes pushing page loads past 8 seconds during peak trading windows, the engineering team faced a choice: throw more servers at the problem, or re-architect from the ground up. This case study traces their 18-week journey to a 96% latency reduction, a 41% drop in infrastructure costs, and zero-downtime deployments, along with the architectural decisions that made it possible.
## Overview
FinStack is a B2B fintech platform serving over 200 institutional trading firms across North America and Europe. Its real-time market data dashboard processes live feeds from 12 exchanges, ingests upward of 2.5 million price ticks per second during peak market conditions, and delivers a personalized viewport to each of the 50,000+ concurrent traders connected at any given moment.
By early 2025, the platform had hit a wall. Pages that should render in under 500 milliseconds were regularly taking 7 to 9 seconds during the first hour after the NYSE open. Traders, whose job literally depends on speed, were filing support tickets by the hundreds. Three of FinStack's largest clients, each covered by a formal performance SLA, were preparing to walk away. The engineering executive team made a hard call: the dashboard's architecture needed a complete overhaul, and it needed to happen in one quarter or less.
## Challenge
The performance crisis was not a single problem but a compounding stack of architectural debt accumulated over four years of rapid feature development. The live market price stream was pulled into a monolithic Node.js backend over WebSocket connections, transformed and enriched on the main event loop, then passed through a REST API layer to a React frontend that performed all of its state hydration client-side on every tick. A typical trading workspace subscribed to between 40 and 80 live instruments simultaneously, each pushing updates at varying frequencies. A single page refresh during peak activity triggered a cascade of 3,200 individual REST round-trips.
The infrastructure bill was equally painful. To absorb peak traffic, the team maintained a 320-instance Kubernetes cluster with an average CPU utilization of just 18%, meaning 82% of paid compute sat idle. Cost per user had risen 68% year over year, with no corresponding revenue growth to justify it. Engineering was spending 40+ hours per week on incident triage, and the on-call rotation was losing two engineers per quarter to burnout.
## Goals
With executive buy-in secured, the engineering leadership team established three non-negotiable goals for the rebuild:
**1. Latency:** Reduce the 95th-percentile page load to under 400 milliseconds during peak trading windows (NYSE open, London open, and US economic data releases). This represented a 95% improvement over the worst-case measurements being recorded at the time of planning.
**2. Cost:** Reduce cloud infrastructure spend by at least 30% while supporting 100,000 concurrent users — double the current peak — without degrading performance.
**3. Engineering velocity:** Achieve zero-downtime deployments and reduce mean time to recovery (MTTR) from 47 minutes to under 5 minutes.
The timeline was aggressive: 18 weeks from architecture kickoff to full production rollout, with a soft-launch to 10% of users at the 12-week mark to validate early performance gains.
## Approach
The CTO assembled a cross-functional strike team of 8 engineers — 4 backend, 2 frontend, and 2 DevOps/reliability — and brought in a performance engineering consultant for a two-week architecture deep-dive. The resulting blueprint was a significant architectural departure from the existing stack.
The foundational principle was to move as much real-time computation as possible off the client and the monolithic backend and into purpose-built edge and streaming infrastructure. In place of a thin REST API layer fronting a monolith, the new architecture introduced three distinct tiers: a WebSocket gateway at the edge, a streaming engine (Kafka and Flink) as the central nervous system, and a slim backend API reserved for authenticated REST calls.
On the frontend, the team decided to move the React application's global state from the Context API to Redux Toolkit with RTK Query, and to shift the entire data ingestion strategy from per-tick REST polling to WebSocket push with selective subscription. The frontend would no longer maintain a full in-memory instrument cache; instead, each component would hold only the slice of state it directly rendered, drastically reducing both memory footprint and reconciliation overhead.
## Implementation
### Phase 1: Streaming Engine (Weeks 1–6)
The single most impactful change was the introduction of Apache Kafka as the central market data bus. Rather than pushing every instrument update through the application server for enrichment and fan-out, exchange connectors — written in Rust for its predictable memory characteristics and low-latency threading — now published every tick to Kafka topics partitioned by instrument symbol.
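Keying ticks by instrument symbol keeps every update for a given instrument on one partition, and therefore in order. The minimal Python sketch below illustrates that property; it is not FinStack's connector code, which was written in Rust. The `partition_for` and `route` helpers and the 12-partition topic are hypothetical, and CRC32 stands in for Kafka's default key hashing (which uses murmur2).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 12  # hypothetical partition count for the ticks topic


def partition_for(symbol: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a symbol to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(symbol.encode("utf-8")) % num_partitions


def route(ticks):
    """Append each (symbol, price) tick to its partition in arrival order."""
    partitions = defaultdict(list)
    for symbol, price in ticks:
        partitions[partition_for(symbol)].append((symbol, price))
    return partitions


ticks = [("AAPL", 189.10), ("MSFT", 411.05), ("AAPL", 189.12), ("AAPL", 189.08)]
routed = route(ticks)

# All AAPL ticks share one partition, so a consumer of that partition
# sees them in the order the connector published them.
aapl = [price for symbol, price in routed[partition_for("AAPL")] if symbol == "AAPL"]
print(aapl)  # [189.1, 189.12, 189.08]
```

Because Kafka guarantees ordering only within a partition, using the symbol as the key trades perfectly even load for per-instrument ordering: a very active symbol concentrates its traffic on a single partition.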
The enrichment layer, responsible for calculating derived fields such as VWAP, moving averages, and spread indicators, was implemented as an Apache Flink stream processing job. This was a decisive break from the old pattern, in which enrichment happened synchronously on the event loop. In the new model, enrichment was a horizontally scalable pipeline with its state partitioned per instrument, and its processing latency measured below one millisecond at p99.
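Derived fields such as VWAP and moving averages depend on earlier ticks, so the enrichment step has to keep running state for each instrument; partitioning that state by symbol is what lets the job scale horizontally. The plain-Python sketch below shows the per-key logic only. It does not use Flink's API, and the `Enricher` class, the session-cumulative VWAP, and the moving-average window size are assumptions for illustration.

```python
from collections import deque


class InstrumentState:
    """Running state for one instrument (what a keyed stream job would store per key)."""

    def __init__(self, window_size: int):
        self.price_volume = 0.0                  # running sum of price * volume
        self.volume = 0.0                        # running sum of volume
        self.window = deque(maxlen=window_size)  # most recent prices for the moving average


class Enricher:
    """Hypothetical enrichment step: cumulative VWAP plus a simple moving average.

    Assumes every tick carries a positive volume.
    """

    def __init__(self, window_size: int = 3):
        self.window_size = window_size
        self.states = {}  # symbol -> InstrumentState

    def on_tick(self, symbol: str, price: float, volume: float) -> dict:
        state = self.states.get(symbol)
        if state is None:
            state = self.states[symbol] = InstrumentState(self.window_size)
        state.price_volume += price * volume
        state.volume += volume
        state.window.append(price)
        return {
            "symbol": symbol,
            "price": price,
            "vwap": state.price_volume / state.volume,
            "sma": sum(state.window) / len(state.window),
        }


enricher = Enricher(window_size=2)
enricher.on_tick("AAPL", 100.0, 10)
out = enricher.on_tick("AAPL", 102.0, 30)
print(out["vwap"], out["sma"])  # 101.5 101.0
print(enricher.on_tick("MSFT", 400.0, 5)["vwap"])  # 400.0 (independent per-symbol state)
```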
The WebSocket gateway, built with Elixir and the Phoenix framework, consumed Kafka topics through consumer groups, maintained the per-user subscription registry, and pushed only the enriched updates for the instruments each user had explicitly subscribed to. As a result, a user watching 80 instruments received approximately 40 messages per second over a single persistent connection, instead of triggering the 3,200 individual REST calls the old system made on every page refresh.
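The gateway's job reduces to two steps: look up who is subscribed to a symbol, and send each user one frame containing only their instruments. The Python sketch below illustrates that idea; it is not the Elixir/Phoenix implementation, the `SubscriptionRegistry` class is hypothetical, and collapsing several updates for the same symbol into the latest one within a batch is an assumption about how a per-user message rate far below the raw tick rate could be achieved.

```python
from collections import defaultdict


class SubscriptionRegistry:
    """Hypothetical gateway-side registry: symbol -> set of subscribed user ids."""

    def __init__(self):
        self.by_symbol = defaultdict(set)

    def subscribe(self, user_id, symbol):
        self.by_symbol[symbol].add(user_id)

    def unsubscribe(self, user_id, symbol):
        self.by_symbol[symbol].discard(user_id)

    def route(self, updates):
        """Build one frame per user containing only the symbols they subscribed to.

        Within a batch, a newer update for a symbol replaces the older one, so a
        frame carries at most one entry per subscribed instrument.
        """
        frames = defaultdict(dict)
        for update in updates:
            for user_id in self.by_symbol.get(update["symbol"], ()):
                frames[user_id][update["symbol"]] = update
        return frames


registry = SubscriptionRegistry()
registry.subscribe("trader-1", "AAPL")
registry.subscribe("trader-1", "MSFT")
registry.subscribe("trader-2", "MSFT")

batch = [
    {"symbol": "AAPL", "price": 189.10},
    {"symbol": "TSLA", "price": 250.00},  # nobody subscribed: dropped
    {"symbol": "AAPL", "price": 189.12},  # supersedes the earlier AAPL update
    {"symbol": "MSFT", "price": 411.05},
]
frames = registry.route(batch)
print(sorted(frames["trader-1"]), sorted(frames["trader-2"]))  # ['AAPL', 'MSFT'] ['MSFT']
```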
### Phase 2: Frontend Rebuild (Weeks 4–10)
The frontend work overlapped with the streaming engine build, and represented perhaps the most visible transformation for end users. The team migrated component by component from a monolithic client-side store to a Redux Toolkit architecture with strict slice scoping. Each trading workspace component — the price ticker, the depth chart, the positions panel, the order entry form — held only its own subscribed instrument data in state.
RTK Query replaced all of the ad-hoc REST fetch logic with a declarative subscription model. Rather than polling the backend every 2 seconds for every open position, the frontend subscribed to the positions endpoint once and received push updates whenever state changed. This reduced outbound request volume by 94% during peak conditions.
Perhaps more impactful than any code change was the introduction of skeleton loading states and optimistic UI updates, alongside a change in how updates were packaged. Under the old system, the frontend received every tick as a partial data frame, which forced the client to run periodic full reconciliations that caused visible UI jank. In the new system, the backend sent complete, self-contained update payloads, eliminating those reconciliation passes entirely.
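Self-contained payloads let the client replace state rather than merge it. A minimal Python sketch of that rule follows; the payload fields and the per-instrument `seq` counter are hypothetical, since the case study says only that payloads were complete and self-contained. The sequence check shows one common way to discard late or duplicate updates without any reconciliation.

```python
class InstrumentView:
    """Client-side view that applies self-contained snapshots (hypothetical schema).

    Each payload carries every field the widget renders plus a per-instrument
    sequence number, so the client swaps in the whole payload instead of merging
    partial frames, and late or duplicate payloads are simply ignored.
    """

    def __init__(self):
        self.rows = {}  # symbol -> latest full payload

    def apply(self, payload: dict) -> bool:
        current = self.rows.get(payload["symbol"])
        if current is not None and payload["seq"] <= current["seq"]:
            return False  # stale or duplicate: nothing to reconcile
        self.rows[payload["symbol"]] = payload  # wholesale replace, no field merge
        return True


view = InstrumentView()
view.apply({"symbol": "AAPL", "seq": 1, "bid": 189.09, "ask": 189.11, "last": 189.10})
view.apply({"symbol": "AAPL", "seq": 3, "bid": 189.11, "ask": 189.13, "last": 189.12})
late = view.apply({"symbol": "AAPL", "seq": 2, "bid": 189.10, "ask": 189.12, "last": 189.11})
print(late, view.rows["AAPL"]["last"])  # False 189.12
```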
### Phase 3: Infrastructure and Observability (Weeks 8–14)
The infrastructure team took the opportunity to overhaul not just the deployment targets but the entire observability pipeline. The new cluster runs on Amazon EKS with Karpenter for node autoscaling, replacing the rigid, manually managed instance groups of the previous environment. Karpenter provisions nodes just in time when pods cannot be scheduled on existing capacity, so new nodes come online in approximately 90 seconds during a surge, and it consolidates or removes underutilized nodes within 30 seconds of a traffic trough. That elasticity alone accounts for a significant portion of the 41% cost reduction.
The observability stack was rebuilt around OpenTelemetry, with a distributed tracing pipeline shipping to Grafana Tempo. The team established SLOs for dashboard load latency (95th percentile), API error rate (below 0.01%), and Kafka consumer lag (below 50 milliseconds). PagerDuty was configured to page only on error-budget burn-rate thresholds rather than on individual server CPU spikes, which drastically reduced alert noise.
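Burn-rate alerting compares the observed error rate with the rate the SLO allows. With an error-rate SLO of 0.01%, the error budget is 0.0001 of requests, so a burn rate of 1.0 means the budget is being spent exactly on schedule. The Python sketch below is illustrative only: the helper names are hypothetical, and the 14.4 threshold over a long and a short window follows the multi-window convention from Google's SRE Workbook rather than FinStack's actual PagerDuty configuration.

```python
ERROR_BUDGET = 0.0001  # fraction of requests allowed to fail under a 0.01% error-rate SLO


def burn_rate(errors: int, requests: int, budget: float = ERROR_BUDGET) -> float:
    """How fast the error budget is being consumed; 1.0 means exactly on budget."""
    if requests == 0:
        return 0.0
    return (errors / requests) / budget


def should_page(long_window, short_window, threshold: float = 14.4) -> bool:
    """Multi-window rule: page only if both windows burn faster than the threshold.

    Each window is an (errors, requests) pair. A burn rate of 14.4 sustained for
    one hour spends 2% of a 30-day budget; the short window confirms the problem
    is still happening, so alerts clear quickly once it stops.
    """
    return (burn_rate(*long_window) > threshold
            and burn_rate(*short_window) > threshold)


# A brief blip of errors barely dents the budget and does not page...
print(should_page(long_window=(40, 1_000_000), short_window=(40, 50_000)))     # False
# ...but a sustained 0.2% error rate burns the budget 20x too fast and does.
print(should_page(long_window=(2_000, 1_000_000), short_window=(100, 50_000)))  # True
```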
### Phase 4: Progressive Rollout and Load Testing (Weeks 12–18)
Before each rollout milestone, the team ran k6 load tests with a traffic profile simulating NYSE-open conditions. At 75,000 simulated concurrent users, the new architecture held p99 latency at 280 milliseconds, roughly 97% below the 8.4-second p95 peaks measured on the pre-rebuild stack.
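A p99 figure is a statement about the slowest 1% of requests. k6 reports percentiles directly, so the helper below only illustrates what such a number means, using the nearest-rank definition; load-testing and monitoring tools may interpolate between samples instead, which can shift results slightly. The `percentile` function and the sample data are hypothetical.

```python
import math


def percentile(samples, pct: float):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("percentile of an empty sample set")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


latencies_ms = list(range(1, 101))  # 100 synthetic request latencies: 1..100 ms
print(percentile(latencies_ms, 95), percentile(latencies_ms, 99))  # 95 99

# Three very slow requests out of 100 leave p95 untouched but dominate p99.
spiky = [200] * 97 + [8_000] * 3
print(percentile(spiky, 95), percentile(spiky, 99))  # 200 8000
```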
The rollout to internal dogfood users began at week 12. A progressive canary strategy then moved the new architecture to 10% of traffic, then 25%, 50%, and finally 100%, with automatic rollback thresholds configured at each gate. The first major test, the March 2026 FOMC announcement, passed without a major incident. The team had pre-planned a 24-hour on-call rotation with all hands on deck, and the only intervention required was a single rule update to a pricing-data enrichment Flink job that had not accounted for a 1.2% intraday rate spike.
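The gate logic behind a progressive canary is simple to state: advance to the next traffic share only while health checks pass, and roll back at the first failure. The Python sketch below uses the 10/25/50/100% stages from the rollout; the `run_canary` function, the `metrics_at` callback, and the use of the 0.01% error-rate and 400 ms p95 targets as gate thresholds are assumptions, not FinStack's actual rollback rules.

```python
STAGES = (10, 25, 50, 100)  # percent of traffic on the new architecture


def run_canary(metrics_at, max_error_rate=0.0001, max_p95_ms=400):
    """Walk the traffic stages, rolling back at the first gate whose metrics fail.

    `metrics_at(stage)` is a hypothetical callback returning (error_rate, p95_ms)
    observed while the new version serves `stage` percent of traffic.
    Returns (stages_reached, decision).
    """
    reached = []
    for stage in STAGES:
        reached.append(stage)
        error_rate, p95_ms = metrics_at(stage)
        if error_rate > max_error_rate or p95_ms > max_p95_ms:
            return reached, "rollback"
    return reached, "promoted"


def healthy(stage):
    return 0.00005, 320


def regression_at_25(stage):
    # Hypothetical enrichment bug that only surfaces once 25% of traffic is live
    return (0.00005, 320) if stage < 25 else (0.002, 350)


print(run_canary(healthy))           # ([10, 25, 50, 100], 'promoted')
print(run_canary(regression_at_25))  # ([10, 25], 'rollback')
```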
## Results
The hard numbers tell a clear story:
**Latency:** 95th-percentile page load dropped from 8.4 seconds to 320 milliseconds — a 96% improvement. The 99th percentile dropped from 12.7 seconds to 580 milliseconds.
**Infrastructure cost:** Monthly cloud spend fell from $184,000 to $108,000 — a 41% reduction — while the architecture was validated to support 100,000 concurrent users, double the pre-rebuild peak. Cost per thousand concurrent users fell from $3,680 to $1,080.
**Reliability:** System availability moved from 99.12% to 99.97% over a 90-day measurement window. MTTR fell from 47 minutes to 3.8 minutes. PagerDuty alert volume dropped by 73%.
**Engineering velocity:** Deploy frequency increased from 2.3 per week to 12.7 per week. The average lead time from PR open to production deployment dropped from 6.2 days to 18 hours. Hotfix lead time fell from 4.3 days to 90 minutes.
User satisfaction scores, measured through quarterly NPS surveys, moved from 31 to 68 in the survey conducted three months post-launch, the single largest quarterly improvement in FinStack's five-year history. Four client accounts that had formally notified FinStack that they would let their contracts expire withdrew those notices and signed 18-month extensions within two months of the rebuild launch.
## Metrics Summary
| Metric | Pre-Rebuild | Post-Rebuild | Change |
|---|---|---|---|
| p95 page load | 8.4s | 320ms | -96% |
| p99 page load | 12.7s | 580ms | -95% |
| Monthly cloud cost | $184,000 | $108,000 | -41% |
| Availability | 99.12% | 99.97% | +0.85pp |
| MTTR | 47 min | 3.8 min | -92% |
| Deploy frequency (per week) | 2.3 | 12.7 | +452% |
| NPS score | 31 | 68 | +37 points |
| Concurrent user capacity | 50,000 | 100,000 | +100% |
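Most of the percentages in the table can be re-derived from the before and after values, which is a useful check when reporting results like these. The short Python snippet below does that; `pct_change` is a hypothetical helper, and the cost-per-thousand figures assume the 50,000 and 100,000 concurrent-user capacities from the table.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative values are reductions."""
    return (after - before) / before * 100


rows = {
    "p95 page load (ms)": (8_400, 320),
    "p99 page load (ms)": (12_700, 580),
    "Monthly cloud cost ($)": (184_000, 108_000),
    "MTTR (min)": (47, 3.8),
    "Deploys per week": (2.3, 12.7),
    "Concurrent user capacity": (50_000, 100_000),
}
for name, (before, after) in rows.items():
    print(f"{name}: {pct_change(before, after):+.0f}%")
# p95 -96%, p99 -95%, cost -41%, MTTR -92%, deploys +452%, capacity +100%

# Cost per thousand concurrent users = monthly cost / (capacity / 1,000)
print(184_000 / 50, 108_000 / 100)  # 3680.0 1080.0

# Availability is reported as a percentage-point difference, not a ratio
print(round(99.97 - 99.12, 2))  # 0.85
```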
## Key Architectural Decisions and Why They Mattered
Choosing Rust for the exchange connectors was not an obvious call for a team whose primary expertise was Node.js and Python. But the alternative, running ingestion and its transformations in single-threaded JavaScript processes, meant that every ingestion spike created backpressure on the event loop, which translated directly into end-to-end latency degradation. Rust's fearless concurrency and zero-cost abstractions allowed the team to process 2.5 million ticks per second with p99 ingestion latency under 2 milliseconds, without garbage-collection pauses. Because Rust manages memory through ownership, with explicit reference counting only where data is shared, rather than through a garbage collector, the connectors ran with a memory footprint of just 120MB at peak throughput, compared with the 4.2GB the Node.js equivalent had consumed in staging tests.
The Kafka + Flink streaming pipeline was the architectural choice that produced the single largest performance win. Under the old system, every incoming tick traversed the application server, which fanned out updates to connected clients in real time by checking each tick against every client's subscription list. That work grew as O(N × M), where N was the number of connected clients and M the number of subscribed instruments per client, a multiplicative blow-up that was the direct cause of the 8-second load times. In the new design, Kafka handled horizontal distribution across symbol-keyed partitions, and each consumer looked up subscribers by symbol, so the work per tick became proportional only to the number of clients actually subscribed to that instrument.
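The gap between scanning every client's subscriptions on each tick and looking subscribers up by symbol is easy to see by counting work. The Python sketch below is a simplified model with hypothetical data (1,000 clients, 80 instruments each, a 500-symbol universe); it isolates the subscription-index idea and leaves out Kafka's partitioning, which additionally spreads that work across consumers.

```python
from collections import defaultdict


def naive_fanout(subscriptions, tick_symbol):
    """Old pattern: check the tick against every client's full subscription list."""
    checks, recipients = 0, []
    for client, symbols in subscriptions.items():
        for symbol in symbols:
            checks += 1
            if symbol == tick_symbol:
                recipients.append(client)
    return checks, recipients


def build_index(subscriptions):
    """Invert client -> symbols into symbol -> clients (maintained on (un)subscribe)."""
    index = defaultdict(list)
    for client, symbols in subscriptions.items():
        for symbol in symbols:
            index[symbol].append(client)
    return index


def indexed_fanout(index, tick_symbol):
    """Indexed pattern: touch only the clients subscribed to this symbol."""
    recipients = index.get(tick_symbol, [])
    return len(recipients), list(recipients)


subscriptions = {
    f"client-{i}": [f"SYM{(i + k) % 500}" for k in range(80)]  # 80 instruments each
    for i in range(1_000)
}
index = build_index(subscriptions)

naive_work, naive_to = naive_fanout(subscriptions, "SYM0")
indexed_work, indexed_to = indexed_fanout(index, "SYM0")
print(naive_work, indexed_work)  # 80000 160
print(sorted(naive_to) == sorted(indexed_to))  # True
```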
The migration from Context-based global state to Redux Toolkit with slice-scoped subscriptions was a frontend engineering decision that looked almost unglamorous on paper but produced surprisingly large gains. The old architecture maintained a single flat instruments object in global state that was updated on every tick for every subscribed instrument, whether or not any component was actively rendering that data. The reconciliation work triggered by updating this shared object after every tick batch consumed between 120 and 180 milliseconds of main-thread time per update cycle, directly blocking user interactions and causing observable UI lag. By scoping state to component slices and subscribing each component only to the data it rendered, the team eliminated 94% of unnecessary reconciliation work.
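Slice-scoped subscriptions pay off because a component is notified only when the value it selects actually changes. The Python sketch below models that mechanism with a hypothetical `Store` class; it is not Redux's API. Its reference-identity check mirrors the default equality used by react-redux's `useSelector`, and the immutable-style update keeps untouched instruments referentially equal, so their listeners never fire.

```python
class Store:
    """Tiny store with selector-scoped subscriptions (a stand-in, not Redux itself).

    Each listener registers a selector; after an update, a listener's callback
    runs only if the value its selector returns is a different object than before.
    """

    def __init__(self, state: dict):
        self.state = state
        self.listeners = []  # entries are [selector, last_value, callback]

    def subscribe(self, selector, callback):
        self.listeners.append([selector, selector(self.state), callback])

    def update(self, symbol: str, quote: dict):
        # Immutable-style update: new top-level dict, only one entry replaced
        self.state = {**self.state, symbol: quote}
        for entry in self.listeners:
            selector, last_value, callback = entry
            value = selector(self.state)
            if value is not last_value:  # reference check, like useSelector's default
                entry[1] = value
                callback(value)


renders = {"aapl": 0, "msft": 0}


def counter(name):
    """Build a callback that counts how often a hypothetical widget re-renders."""
    def on_change(_value):
        renders[name] += 1
    return on_change


store = Store({"AAPL": {"last": 189.10}, "MSFT": {"last": 411.05}})
store.subscribe(lambda s: s["AAPL"], counter("aapl"))
store.subscribe(lambda s: s["MSFT"], counter("msft"))

for price in (189.11, 189.12, 189.13):
    store.update("AAPL", {"last": price})

print(renders)  # {'aapl': 3, 'msft': 0}
```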
## Lessons Learned
**Lesson 1: Measure before you commit to a path.** The team's first instinct was to throw more compute at the WebSocket fan-out problem by horizontally scaling the Node.js backend. Load-test projections showed this approach would cost nearly $400,000 per month to reach the 400-millisecond target, more than triple the post-rebuild spend. The team credits the exercise of quantifying the alternatives before committing, rather than the architecture itself, for the 41% cost reduction.
**Lesson 2: Edge infrastructure is underutilized in B2B SaaS.** The Phoenix WebSocket gateway, deployed in AWS Regions in Virginia, London, and Frankfurt behind AWS Global Accelerator, reduced round-trip time to European and North American clients by an average of 67 milliseconds. Because that saving applies to every price update a trader receives, it translated directly into faster trading decisions for institutional clients.
**Lesson 3: Observability is a prerequisite, not a deliverable.** The team nearly shipped the rebuild without a complete distributed tracing pipeline, rationalizing that they would add it in a follow-up sprint. A late intervention by the SRE lead — who refused to sign off on a release without Tempo tracing wired into every service — caught three critical data-flow regressions that would have silently caused 20-minute stale-price windows under load. Investing in observability upfront reduced post-launch incident cost by an estimated 80% in the first 90 days.
**Lesson 4: Progressive rollout beats big-bang launches.** The canary deployment strategy, with automatic rollback thresholds at each gate, is what kept the FOMC announcement rollout clean. Had the team shipped to 100% of the fleet at once, a pricing-data regression in the Flink enrichment job would have affected all 50,000 concurrent users simultaneously, likely generating hundreds of support tickets and possibly triggering SLA penalties. Instead, the progressive rollout caught the issue at 25% of traffic in under 15 minutes, and the rollback took three minutes to execute.
**Lesson 5: Choose technologies that match the constraint, not your team's comfort zone.** The choices of Rust for the exchange connectors and Elixir for the WebSocket gateway were the most debated decisions in the project. Both languages represented a significant learning curve for the team. But the performance characteristics were not optional: the real-time trading constraint made any language that could not handle 2.5 million messages per second with single-digit-millisecond latency a non-starter. Investing two weeks in internal Rust and Elixir training paid for itself many times over in the latency gains that followed.