How CloudScale Reduced Infrastructure Costs by 47% While Processing 3× More Requests

When CloudScale’s engineering team approached us, their monolithic API was buckling under 18,000 requests per minute. In this case study, we walk through the migration from a single-region monolith to a distributed event-driven architecture that cut costs nearly in half, tripled throughput, and restored reliability during their peak holiday window — all within a hard 90-day deadline.

Overview

CloudScale is a SaaS operations platform used by mid-market manufacturing firms to automate supply-chain tracking, inventory reconciliation, and ETL workflows. Founded in 2019, the company had scaled to 180 enterprise customers and 12,000 daily active users by early 2025. What looked like rapid growth on the outside was masking serious infrastructure instability on the inside.

This case study documents the end-to-end engagement between CloudScale and Webskyne over a 90-day period: how we assessed the existing state, redefined the target architecture, led the implementation, and validated results. The result was a 47% reduction in monthly infrastructure spend, a 3× increase in request throughput, and zero downtime during their busiest sales period.

The Challenge

CloudScale’s engineering leadership had a problem that many fast-growing SaaS companies recognize too late: their architecture simply could not keep up with demand. Their monolithic API, running in a single AWS region (us-east-1), was handling roughly 18,000 requests per minute during peak hours. Every quarter, the peaks grew steeper. Worse, the codebase had accumulated years of workarounds, bypass patches, and undocumented integrations, making even small changes risky.

The symptoms were unmistakable. Response latency during peak windows regularly breached the 2-second SLA threshold. The on-call team was managing 4–5 critical incidents per month. Autoscaling was configured reactively, so the platform was heavily over-provisioned most of the time yet still caught flat-footed during traffic spikes. Support tickets about data delays were climbing as batch ETL jobs began stealing resources from the live API.

Management gave the engineering team a 90-day deadline to stabilize the platform before the annual holiday inventory cycle — their highest-traffic window of the year. Any major outage during that period would risk multi-year contracts with three logistics partners.

Goals

We established three primary goals and agreed on measurable success criteria:

Performance: Reduce median API latency from 1.4 seconds to under 400 milliseconds while supporting 50,000 requests per minute.
Reliability: Achieve zero downtime during the 30-day holiday inventory window and reduce monthly P1 incidents to one or fewer.
Cost: Reduce monthly infrastructure spend by at least 30% without sacrificing resilience or data consistency guarantees.

A secondary goal was to bring the engineering team up to speed on modern observability and deployment practices so they would not need to re-engage a consultancy for the next major change.

Approach

Rather than recommending a full rewrite — which would have exceeded the timeline and destabilized the team — we chose an incremental strangler-fig pattern. The monolith would remain in place during the migration, with new services intercepting specific traffic domains and gradually absorbing responsibility.
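
To make the strangler-fig idea concrete, here is a minimal routing sketch. The class name, path prefixes, and service names are all hypothetical and are not taken from CloudScale's codebase; the point is only that ownership moves one prefix at a time and can be handed back.

```python
from dataclasses import dataclass, field


@dataclass
class StranglerRouter:
    """Routes each request path to an extracted service or to the monolith."""

    # Hypothetical mapping of path prefix -> name of the service that owns it.
    extracted: dict[str, str] = field(default_factory=dict)
    fallback: str = "monolith"

    def extract(self, prefix: str, service: str) -> None:
        """Hand ownership of a path prefix to a new service."""
        self.extracted[prefix] = service

    def rollback(self, prefix: str) -> None:
        """Return a prefix to the monolith, which keeps each step reversible."""
        self.extracted.pop(prefix, None)

    def route(self, path: str) -> str:
        # Longest matching prefix wins so nested routes resolve predictably.
        matches = [p for p in self.extracted if path.startswith(p)]
        if not matches:
            return self.fallback
        return self.extracted[max(matches, key=len)]


router = StranglerRouter()
router.extract("/inventory", "inventory-service")
print(router.route("/inventory/items/42"))  # inventory-service
print(router.route("/reports/monthly"))     # monolith
```

Anything not explicitly extracted keeps flowing to the monolith, so an unfinished migration still serves every request.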

We also insisted on an event-driven backbone. CloudScale’s problem was not just compute; it was also a communication bottleneck between the API, the ETL workers, and the customer notification systems. Moving to a publish-subscribe model would let components scale independently and remove the tight coupling that made deployments so risky.
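
The decoupling that publish-subscribe provides can be shown with a small in-memory sketch. This is an illustration only, not the broker used in the engagement; the `EventBus` class and the `inventory.updated` topic name are assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a broker: publishers never call consumers directly."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
etl_inbox: list[dict] = []
notify_inbox: list[dict] = []

# Two independent consumers; the publisher knows nothing about either one.
bus.subscribe("inventory.updated", etl_inbox.append)
bus.subscribe("inventory.updated", notify_inbox.append)

delivered = bus.publish("inventory.updated", {"sku": "A-100", "qty": 12})
```

Adding a third consumer is one more `subscribe` call and requires no change to the publishing code, which is the property that lets components deploy and scale independently.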

The design phase included a capacity model built on 90 days of historical metrics. We used that model to size the new infrastructure and to demonstrate cost savings before any production changes. That early financial justification was critical in getting executive sign-off.
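
A capacity model of this kind can be reduced to a few lines of arithmetic. The sketch below is a simplified assumption of how such a model might be structured; the per-instance throughput, headroom, and hourly rate are illustrative placeholders, not figures from the engagement.

```python
import math


def instances_needed(peak_rpm: float, per_instance_rpm: float, headroom: float = 0.25) -> int:
    """Instances required to serve peak_rpm with spare headroom on top."""
    if per_instance_rpm <= 0:
        raise ValueError("per_instance_rpm must be positive")
    return math.ceil(peak_rpm * (1 + headroom) / per_instance_rpm)


def monthly_cost(instances: int, hourly_rate: float, hours: int = 730) -> float:
    """Approximate monthly cost for always-on instances (730 hours per month)."""
    return instances * hourly_rate * hours


# Illustrative inputs only: 50,000 requests per minute is the stated target,
# while the per-instance capacity and hourly rate are made-up placeholders.
count = instances_needed(peak_rpm=50_000, per_instance_rpm=2_500)
estimate = monthly_cost(count, hourly_rate=0.20)
print(count, round(estimate, 2))
```

In practice the peak input would come from a high percentile of the historical per-minute request counts rather than a single round number.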

Implementation

We ran the work in four distinct phases over 90 days.

Phase 1 — Observability and Baseline (Days 1–14)

The first two weeks were spent instrumenting the monolith. We added structured logging, distributed tracing using OpenTelemetry, and meaningful service-level objectives for every public endpoint. Before any architectural changes, we needed to understand where time was actually spent. Data collection revealed that database connection exhaustion and synchronous ETL calls inside request paths were the top contributors to latency.
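
The kind of timing data that exposes slow dependencies can be sketched with the standard library alone. This is a simplified stand-in for tracing, not the OpenTelemetry API; the `timed_span` helper and its field names are hypothetical.

```python
import json
import time
from contextlib import contextmanager


@contextmanager
def timed_span(name: str, sink: list, **attributes):
    """Record how long a block of work takes as one structured (JSON) log line."""
    start = time.perf_counter()
    status = "ok"
    try:
        yield
    except Exception:
        status = "error"
        raise
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        sink.append(json.dumps(
            {"span": name, "status": status,
             "duration_ms": round(duration_ms, 3), **attributes}
        ))


records: list[str] = []
with timed_span("GET /inventory", records, customer_id="c-1"):
    with timed_span("db.query", records, table="inventory"):
        time.sleep(0.01)  # stand-in for a slow dependency

for line in records:
    print(line)
```

Because the inner span finishes first, it is logged first; comparing its duration with the outer span shows how much of the request time the dependency consumed.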

Phase 2 — Event Backbone and Data Layer (Days 15–35)

Next, we stood up the new event infrastructure. We chose Redis Streams for the primary event backbone because CloudScale was already comfortable with Redis and the operational overhead was low. For durable ordering and replay, we layered on a Kafka topic per bounded context. The ETL workers were migrated to consume from Kafka rather than polling the primary database. That single change reduced read pressure on the transactional database by about 42%.
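
Why a replayable log helps consumers such as ETL workers can be shown with a toy model. The sketch below is an in-memory illustration of ordered events with per-consumer offsets; it does not use the Kafka or Redis client APIs, and the `EventLog` class and group names are assumptions.

```python
class EventLog:
    """Append-only log where each consumer group tracks its own read offset."""

    def __init__(self) -> None:
        self._entries: list[dict] = []
        self._offsets: dict[str, int] = {}

    def append(self, event: dict) -> int:
        """Store the event and return its position in the log."""
        self._entries.append(event)
        return len(self._entries) - 1

    def poll(self, group: str, max_events: int = 10) -> list[dict]:
        """Return the next unread events for a group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._entries[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch

    def rewind(self, group: str, offset: int = 0) -> None:
        """Reset a group's offset so earlier events can be replayed."""
        self._offsets[group] = max(0, min(offset, len(self._entries)))


log = EventLog()
for sku in ("A-100", "B-200", "C-300"):
    log.append({"sku": sku})

first = log.poll("etl", max_events=2)   # first two events
second = log.poll("etl", max_events=2)  # the remaining event
log.rewind("etl")
replayed = log.poll("etl")              # all three again, in order
```

The workers read from the log at their own pace instead of querying the transactional database. A production consumer would normally advance its offset only after a batch has been processed successfully; this sketch advances it on read for brevity.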

We also introduced read replicas for the core PostgreSQL database and split write-heavy inventory operations onto their own connection pool. The database layer refactor alone dropped p95 latency by more than half.
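
The routing rule behind that split can be sketched as follows. The pool names and the `QueryRouter` class are hypothetical, and the strings stand in for real connection pools; this is an illustration of the decision logic, not the code that was deployed.

```python
import itertools


class QueryRouter:
    """Chooses a connection target based on the kind of operation."""

    def __init__(self, primary: str, replicas: list[str], inventory_write_pool: str) -> None:
        self.primary = primary
        self.inventory_write_pool = inventory_write_pool
        # Round-robin across replicas; fall back to the primary if none exist.
        self._replica_cycle = itertools.cycle(replicas or [primary])

    def target_for(self, operation: str, domain: str = "general") -> str:
        if operation == "read":
            return next(self._replica_cycle)
        if operation == "write":
            # Write-heavy inventory traffic gets its own pool so it cannot
            # exhaust the connections used by everything else.
            return self.inventory_write_pool if domain == "inventory" else self.primary
        raise ValueError(f"unknown operation: {operation!r}")


db = QueryRouter("primary", ["replica-1", "replica-2"], "inventory-writes")
reads = [db.target_for("read") for _ in range(3)]
inventory_write = db.target_for("write", domain="inventory")
profile_write = db.target_for("write", domain="profiles")
```

Replicas can lag behind the primary, so a read that must observe a write made moments earlier would still need to go to the primary.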

Phase 3 — Service Extraction (Days 36–60)

In the third phase, we extracted the three most critical domains: customer authentication, inventory lookups, and notification dispatching. Each service was deployed behind its own autoscaling policy and exposed through an API gateway that handled authentication, rate limiting, and request routing. The gateway also gave us a single place to implement circuit breakers and graceful-degradation fallbacks.
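
For readers unfamiliar with circuit breakers, here is a minimal sketch of the pattern. The thresholds, the `CircuitBreaker` class, and the fallback payload are assumptions for illustration and do not describe the gateway's actual configuration.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency and serves a fallback instead."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                return fallback()  # open: skip the dependency entirely
            # Half-open: the cool-down has elapsed, so allow one trial call.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


breaker = CircuitBreaker(failure_threshold=2, reset_after=30.0)


def flaky_lookup():
    raise TimeoutError("inventory service unavailable")


for _ in range(3):
    result = breaker.call(flaky_lookup, fallback=lambda: {"source": "cache"})
```

After the second failure the breaker opens, so the third call returns the fallback without touching the failing service at all. That is what keeps one slow dependency from tying up every request.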

One key decision was to keep customer profile updates synchronous and strongly consistent while moving inventory changes to eventual consistency. That distinction respected the business-criticality of customer data while still letting the high-volume inventory workflow scale horizontally.

Phase 4 — Cutover, Testing, and Runbooks (Days 61–90)

The final month focused on load testing, phased traffic shifting, and operational readiness. We ran synthetic load tests at 1.5× expected holiday peak traffic, injected controlled failures, and tuned autoscaling thresholds. The on-call team practiced incident response scenarios using the new observability dashboards. By Day 85, 82% of production traffic was flowing through the new architecture.
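
Phased traffic shifting is often implemented by hashing a stable key into a bucket, as in the sketch below. This is one common approach and an assumption on our part for illustration; the function name and the choice of a customer identifier as the key are hypothetical.

```python
import hashlib


def routes_to_new_stack(request_key: str, percent_new: int) -> bool:
    """Deterministically assign a key to the new stack for percent_new% of keys."""
    if not 0 <= percent_new <= 100:
        raise ValueError("percent_new must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < percent_new


# Estimate the share of hypothetical customers routed to the new stack at 82%.
keys = [f"customer-{i}" for i in range(10_000)]
share = sum(routes_to_new_stack(key, 82) for key in keys) / len(keys)
print(f"{share:.1%} of sampled keys go to the new architecture")
```

Because the bucket depends only on the key, a customer moved to the new stack at a lower percentage stays there as the percentage rises, which avoids flapping between the old and new systems during the ramp.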

On Day 90, the monolith was fully decommissioned. Total production cutover time: twelve minutes. Zero lost requests.

Results

The results met or exceeded every target set at the start of the engagement.

Latency: Median API latency dropped from 1,400ms to 220ms.
Throughput: The platform sustained 54,000 requests per minute in load tests and remained stable.
Cost: Monthly infrastructure spend fell from roughly $78,000 to $41,000.
Reliability: P1 incidents dropped to zero during the 30-day holiday window.
Team velocity: Deployment frequency increased from roughly once per month to twice per week.

The cost reduction was driven by three factors: right-sized compute, elimination of over-provisioned staging environments running 24/7, and the removal of expensive synchronous fan-out patterns that had inflated message-delivery costs. The performance improvements came from removing synchronous dependencies, caching aggressively at the edge, and giving the database breathing room through read replicas and connection pooling.

Key Metrics

Below is a summary of the most important before-and-after metrics.

Median API latency: 1,400ms → 220ms
P95 API latency: 3,200ms → 480ms
Monthly infrastructure cost: $78,000 → $41,000
Requests per minute supported: 18,000 → 54,000
P1 incidents per month: 4.2 average → 0
Deployment frequency: 1/month → 2/week
Database read pressure: reduced 42% after ETL migration
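
The headline percentages follow directly from the before-and-after figures listed above. The short calculation below shows the arithmetic; the helper names are ours, and the inputs are the rounded values reported in this case study.

```python
def percent_reduction(before: float, after: float) -> float:
    """Reduction from before to after, as a percentage of before."""
    return (before - after) / before * 100


def multiplier(before: float, after: float) -> float:
    """How many times larger after is than before."""
    return after / before


cost_cut = percent_reduction(78_000, 41_000)         # about 47.4%
throughput_gain = multiplier(18_000, 54_000)         # 3.0x
median_latency_cut = percent_reduction(1_400, 220)   # about 84.3%
p95_latency_cut = percent_reduction(3_200, 480)      # about 85%

print(f"cost -{cost_cut:.1f}%, throughput x{throughput_gain:.1f}, "
      f"median latency -{median_latency_cut:.1f}%, p95 latency -{p95_latency_cut:.1f}%")
```

The 47% cost figure is therefore a rounding of roughly 47.4%, based on the approximate monthly spend values.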

Lessons Learned

The project reinforced several lessons that apply broadly to fast-growing SaaS companies.

Right-size infrastructure before optimizing architecture. Many of the cost savings came from discontinuing oversized environments and unused resources, not from writing more efficient code. A thorough resource audit should precede any rewrite.

Event-driven decomposition is not just a scalability pattern; it is a team-velocity pattern. Once bounded contexts had clear publish-subscribe contracts, teams could deploy independently. Change failure rate dropped because blast radii were smaller.

Invest in observability first. Trying to migrate an unknown system is like performing surgery without an MRI. The first two weeks of instrumentation paid for themselves within the first month by eliminating guesswork during incidents.

Strangle, do not rewrite. A full monolith rewrite would have taken six to eight months and risked missing the holiday deadline entirely. The strangler-fig approach let us deliver incremental, reversible value every two weeks.

Executive buy-in requires financial foresight. Presenting a cost model before touching production made it easy to justify the migration budget. By showing expected monthly savings, leadership viewed the project as an investment with a clear payback period rather than a technical luxury.

CloudScale entered the holiday season with a platform that was faster, cheaper, and more reliable than at any point in its history. More importantly, the engineering team emerged with the patterns, tooling, and confidence to continue evolving the system without relying on outside help.