Webskyne

20 May 2026 · 21 min read

Scaling a Construction Management Platform from 200 to 10,000+ Users: A Deep Case Study

When ConstructHub's user base grew from 200 to more than 10,000 accounts, the platform's infrastructure began buckling under the pressure. Administrative work that once took minutes stretched into hours, supervisor dashboards slowed to an unusable crawl, and client confidence started to waver. This is the detailed story of how a small team rebuilt the entire technical foundation over 18 months, from database schema to real-time sync, walking away with 47% faster project delivery, 99.8% uptime, and a blueprint for building platforms that genuinely scale.

![Construction site management team reviewing digital blueprints on tablets](https://images.unsplash.com/photo-1504307651254-35680f356dfd?w=1200&h=630&fit=crop&auto=format)

## Overview

ConstructHub is a cloud-based construction project management platform that helps general contractors, subcontractors, and building owners coordinate workflows across commercial, industrial, and infrastructure projects. Founded in 2018 by three former construction project managers frustrated with spreadsheets and faxed change orders, ConstructHub initially served a small but loyal client base of approximately 200 mid-sized general contracting firms across Southeast Asia.

By 2025, that client base had grown fiftyfold to over 10,000 active accounts, including more than 50 enterprise customers each managing 50+ concurrent projects and deploying over 200 field workers per project. The platform was processing over 2.4 million daily events: progress updates, change orders, safety inspection forms, and material requests.

What looked like a growth triumph turned out to be an infrastructure nightmare. This case study charts the 18-month transformation required to rebuild ConstructHub from a monolithic prototype into a modern, distributed, real-time platform capable of delivering enterprise-grade reliability at scale.

## The Challenge: Success-Induced Complexity

Growth brought a cascade of compounding failures. The original tech stack (a Django backend, PostgreSQL, and a vanilla JavaScript frontend) had been adequate for 200 clients. At 10,000 active accounts, every architectural assumption was breaking.

The most visible problem was a supervisor dashboard that took over 40 seconds to load for enterprise customers managing dozens of concurrent projects. The root cause was a single endpoint performing a 22-table join across seven years of legacy data accumulated without any meaningful data governance plan.
That endpoint was called by every field worker as soon as they opened the app, turning each login into a load test of the entire PostgreSQL cluster.

Change order processing, which should have been a few well-standardized API calls, had grown into a manual workflow requiring 15 separate steps spread across two dashboards and one email confirmation. Mobile had become the primary platform, yet field workers who needed instant access to project drawings on site were getting 404s and could not view drawings without the desktop build, which quickly deterred adoption outside the office.

Client confidence started waning in 2024 as competitors won large enterprise clients by offering real-time project tracking, integrated payment workflows, and mobile-first dashboards. ConstructHub could not build any of these features given its existing architecture and the immediate priority of stabilising the platform.

The engineering team found themselves spending 70 percent of their capacity on firefighting (bug fixes, emergency database query tuning, and remediation patches) rather than on the product roadmap they had promised to enterprise customers. The cost of technical debt was now visible on the balance sheet.

## Strategic Goals

After a five-week discovery phase with external infrastructure consultants, leadership set four non-negotiable goals:

**Reduce administrative overhead per project manager from an average of 12 hours per week to under 4 hours.** The administrative waste was the single biggest contributor to annual client churn. Managers were spending large portions of their week reconciling change orders, chasing approval workflows, and manually running BI reports that should have been automatic.

**Achieve 99% application uptime** as a hard Service Level Agreement floor. The existing environment registered 18 combined hours of downtime in 2024, mostly tied to database lock contention spikes during peak project registration periods.
For enterprise construction clients, unreliable access during project pre-bid windows is not just an inconvenience: it directly leads to contract losses.

**Improve client retention from 78% to 92%** within 18 months. Low retention was traceable almost entirely to reliability failures and feature gaps rather than pricing or core functionality.

**Launch mobile workflow automation** within six months. Field workers needed approval routing, real-time document access, and instant progress capture, all of which were available only on desktop. A truly mobile-first interface was critical to field adoption and, by extension, to enterprise client contracts.

## Approach: Rebuilding from the Foundation Up

The engineering team concluded that adding patches on top of the existing monolith would deliver a fraction of the required improvement at close to double the cost. A foundational rebuild was the only credible path, but the scope had to be managed carefully because enterprise clients were mid-project and could not tolerate a catastrophic disruption. The work fell into five interdependent areas.

### 1. Mission-Critical Data Architecture

The existing PostgreSQL instance held 12 terabytes across 2,847 tables, most of them denormalized and carrying duplicated columns, missing indexes, and undocumented relationships. A thorough forensic data audit found 459 tables that could be eliminated entirely and 1,203 columns that were effectively redundant with newer schema additions.

The primary data redesign was a strict normalization pass that reduced the table count by 47 percent while enforcing referential integrity constraints. What had been 22-table joins collapsed to queries over 4 to 6 tables. Provenance tracking was introduced on every write operation to provide the audit trails needed for enterprise governance requirements.

The migration approach, two months of read-only shadow validation, was chosen deliberately.
Rather than running a single cutover at 3 AM on a Saturday, the new schema ran parallel to production for 60 days, accepting reads and generating report diffs for the engineering team to validate. By the time the actual cutover occurred, no data discrepancy had been found in over 60 days of comparison. The schema normalization alone reduced average query time on the largest join from 40.2 seconds to 0.8 seconds, a roughly 50x improvement that came from the simpler joins and the indexes introduced with the new schema.

### 2. Real-Time Sync Architecture

Real-time collaboration was not a nice-to-have. Construction teams coordinate across dozens of people on a physical job site using devices that are constantly in motion. Information has to flow quickly: when a structural engineer updates a wall detail in an office, the site supervisor needs to see that change almost immediately.

The previous websocket implementation had been bolted onto the old backend and suffered from connection storms during peak periods. Thousands of field workers connecting simultaneously at the start of a workday caused cascading timeout errors, leading the team to fall back to short polling, which multiplied the load.

The new architecture adopted Socket.io as the real-time transport layer with a carefully designed operational transformation layer for conflict resolution. Event subscriptions were scoped by project, meaning a worker subscribed to a project's updates received only the events relevant to their assigned project rather than every event generated by every connected instance. Redis pub/sub was introduced for cross-instance event fanout.

In a three-day synthetic load test at 1.4 times expected peak traffic, 97 percent of messages were delivered within 100 milliseconds and 87 percent within the sub-50ms target set for real-time field collaboration.

A deliberate offline-first design was layered in.
Field worker tablets are frequently in cellular dead zones. Every action captured by those tablets (daily progress logs, safety incident reports, material delivery confirmations) is queued locally using IndexedDB and then synced automatically upon reconnection. The sync engine resolves conflicts by timestamp and then by user role priority, with field-change permission locked at the project foreman level for audit integrity. This design means field workers never lose work, regardless of connectivity state.

### 3. Mobile-First Progressive Web Application

The PWA refresh was driven by mobile user analytics: 78 percent of field user sessions originated from smartphones and tablets, yet the platform's mobile experience was an afterthought rather than a design priority.

The old mobile experience had been a basic responsive adaptation of the desktop dashboard. Touch targets were too small, progress forms required 17 taps to complete, photo attachments overwhelmed slow mobile data connections, and the interface felt like a condensed version of the desktop view rather than a purpose-built field tool.

The new interface was designed around field personas: the site foreman, the material coordinator, the safety inspector, the quality control officer. Each user type received a personalized home screen with only their most relevant workflows visible, reducing the cognitive load of finding the features they needed.

The PWA-first design meant that iOS and Android devices received the same native-like experience. Field workers could install the platform directly from the browser with no app store approval required and no mandatory update process, a significant UX advantage in construction environments where app store governance friction is well known.
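
The conflict rule described for the offline queue (timestamp first, then role priority) can be sketched on the server side as follows. This is a minimal illustration under stated assumptions, not ConstructHub's sync engine: the `QueuedChange` shape, the `ROLE_PRIORITY` ranking, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Hypothetical ranking: the article only says role priority breaks ties.
ROLE_PRIORITY = {"field_worker": 1, "foreman": 2, "project_manager": 3}


@dataclass(frozen=True)
class QueuedChange:
    record_id: str
    field: str
    value: str
    captured_at: float  # client-side capture time, seconds since epoch
    role: str


def pick_winner(current: QueuedChange, incoming: QueuedChange) -> QueuedChange:
    """Later timestamp wins; on a tie, the higher-priority role wins."""
    current_key = (current.captured_at, ROLE_PRIORITY.get(current.role, 0))
    incoming_key = (incoming.captured_at, ROLE_PRIORITY.get(incoming.role, 0))
    return current if current_key >= incoming_key else incoming


def merge_queue(
    server_state: Dict[Tuple[str, str], QueuedChange],
    queued: Iterable[QueuedChange],
) -> Dict[Tuple[str, str], QueuedChange]:
    """Replay a device's offline queue against the server's current state."""
    merged = dict(server_state)
    for change in queued:
        key = (change.record_id, change.field)
        existing = merged.get(key)
        merged[key] = change if existing is None else pick_winner(existing, change)
    return merged
```

Keying on `(record_id, field)` means two people editing different fields of the same log never conflict; only true same-field collisions reach `pick_winner`.
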

Service workers were layered in for intelligent asset caching, and essential assets (schematics, inspection forms, change order pickers) were prioritized so that they remained available even with no connectivity at all. IndexedDB was used to cache every project the worker had access to, so schematics and forms were available in their last-connected state if the worker wandered into a dead zone. This offline-first approach contributed directly to mobile field adoption rising from 32 percent to 81 percent within 10 weeks of the PWA launch.

### 4. Visual Workflow Automation Engine

Approval routing had been ConstructHub's most requested enterprise feature, but it had repeatedly slipped off the delivery roadmap because every change triggered a new round of one-off table migrations and permission decisions.

The new approach separated workflow definitions from business logic entirely. A visual workflow builder allowed product managers to model new approval paths as directed graphs: a change order triggers a cost review, then material review, then site supervisor validation, then project manager approval, with conditional routing if quantities exceed pre-set thresholds.

The workflow engine received a discrete event every time an action was created in the system. It matched the event against all active workflow definitions, consulted the permission graph, and executed the appropriate actions (notifications, table updates, next actions) asynchronously via background jobs.

Twelve workflow templates were available at launch, covering change orders, safety incident escalation, subcontractor onboarding, budget reallocation, claim submission, and digital signature collection.
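
To make the "approval path as a directed graph" idea concrete, here is a small sketch of event matching and conditional routing. It is an assumption-laden illustration rather than the engine itself: the `Workflow` class, the `QUANTITY_THRESHOLD` value, the `executive_review` step, and the event field names are hypothetical, and permission checks and background jobs are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, object]
Edge = Tuple[Callable[[Event], bool], str]  # (condition, next step name)

QUANTITY_THRESHOLD = 500  # hypothetical pre-set threshold


def always(event: Event) -> bool:
    return True


@dataclass
class Workflow:
    trigger: str  # event type that starts this workflow
    start: str
    edges: Dict[str, List[Edge]] = field(default_factory=dict)

    def route(self, event: Event) -> List[str]:
        """Walk the graph, taking the first edge whose condition matches."""
        path: List[str] = []
        current: Optional[str] = self.start
        while current is not None:
            if current in path:
                raise ValueError(f"cycle at step {current!r}")
            path.append(current)
            current = next(
                (nxt for cond, nxt in self.edges.get(current, []) if cond(event)),
                None,
            )
        return path


def matching_workflows(event: Event, workflows: List[Workflow]) -> List[Workflow]:
    """Select the active definitions whose trigger matches the event type."""
    return [w for w in workflows if w.trigger == event["type"]]


change_order = Workflow(
    trigger="change_order.created",
    start="cost_review",
    edges={
        "cost_review": [(always, "material_review")],
        "material_review": [(always, "site_supervisor_validation")],
        "site_supervisor_validation": [
            # Hypothetical extra step for large quantities.
            (lambda e: e["quantity"] > QUANTITY_THRESHOLD, "executive_review"),
            (always, "project_manager_approval"),
        ],
        "executive_review": [(always, "project_manager_approval")],
    },
)
```

Because the definition is plain data plus predicates, it can be validated at design time (for example, the cycle check in `route`) before any event reaches it.
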
The visual designer, unlike a low-code approach that incurs external service costs, ran entirely on the ConstructHub stack: each definition was stored in the database, validated at design time, and rendered against the same permission and rate-limit layer as all other platform features.

### 5. Enterprise Permission Architecture

Enterprise construction clients require granular role-based access control, path-based visibility restrictions, and individual project-level access policies. A traditional RBAC approach is insufficient: a cost engineer does not have the same visibility as a structural engineer, a subcontractor cannot see the main contractor's margin data, and a project owner's role differs substantially in every project context.

The permission system was redesigned around a resource-action-subject model. Every entity in the platform that represents a project, a document, a form, or a process is assigned one or more resource types. Actions are declared once and associated with resource types. Users receive permissions through roles or groups, and evaluation is a fast boolean expression computed at request time, not a batch comparison process.

The RoleDescriptor pattern was layered on top to allow enterprise clients to define their own role hierarchies, with custom role inheritance applied at the individual project or account level. This allowed enterprise clients to reflect their own organizational and subcontractor structures directly in the platform without raising constant support tickets for permission changes.

At launch, the platform shipped with 35 built-in permission templates covering common work-role combinations across construction enterprises. Over 90 individual fine-grained permissions could be combined into a complete role definition managed from the roles interface. The permission enforcement layer was integrated into the API gateway at the middleware level.
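
A resource-action-subject check with role inheritance might look roughly like the sketch below. The class and function names, the sample roles, and the resource types are hypothetical and only illustrate the shape of the model; the article does not describe how RoleDescriptor is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class Permission:
    resource_type: str  # e.g. "drawing", "margin_report"
    action: str         # e.g. "read", "submit"


@dataclass
class Role:
    name: str
    permissions: Set[Permission] = field(default_factory=set)
    parent: Optional["Role"] = None  # custom inheritance defined by the client

    def effective(self) -> Set[Permission]:
        """Own permissions plus everything inherited from parent roles."""
        inherited = self.parent.effective() if self.parent else set()
        return inherited | self.permissions


def is_allowed(role: Role, resource_type: str, action: str) -> bool:
    """Boolean check evaluated at request time."""
    return Permission(resource_type, action) in role.effective()


# Hypothetical hierarchy: both roles inherit read access to drawings.
site_viewer = Role("site_viewer", {Permission("drawing", "read")})
subcontractor = Role("subcontractor", {Permission("form", "submit")}, parent=site_viewer)
cost_engineer = Role("cost_engineer", {Permission("margin_report", "read")}, parent=site_viewer)
```

With this shape, hiding margin data from subcontractors is a matter of never granting `("margin_report", "read")` anywhere in their inheritance chain, rather than special-casing the endpoint.
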
Every request passed through a permission check, a policy evaluation against the request context. Caching the result reduced evaluation to O(1) after the first check in a session, keeping API overhead under 2ms per request. Every permission decision and permission change was logged and preserved as system history, and a new audit feature made those logs quickly available to support teams, compliance teams, and enterprise clients.

## Phased Rollout Strategy

Given the stakes of enterprise client disruption, the team adopted a deliberate phased rollout strategy instead of a feature flag approach. The seven phases were designed to minimize blast radius and maximize the opportunity for learning and iteration as the new platform gradually came online.

**Phase 1: Data Foundation.** The schema normalization and primary key optimization migration ran in read-only shadow mode for 60 days, comparing production read results against the shadow read results to confirm zero data divergence. A read replica was maintained during this period to avoid downtime on the production environment. Production cutover happened at 4 AM during a scheduled maintenance window with less than 90 minutes of degraded write performance. The actual data cutover completed in 47 minutes, 28 minutes ahead of the safety margin. A table-to-table comparison confirmed 100% data accuracy immediately following cutover.

**Phase 2: API Foundation.** The replacement API layer ran side by side with the old API endpoints for 21 consecutive days. Every request to the new API was also relayed to the legacy API, and the responses were diffed continuously using a Pytest-based automated comparison layer that flagged any divergence above 0.01%.
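
The side-by-side comparison in Phase 2 can be approximated with a small helper like the one below. It is a sketch only: the article says the comparison layer was Pytest-based and used a 0.01% threshold, but the function names, the canonical JSON comparison, and the callable-based API stand-ins are assumptions made for illustration.

```python
import json
from typing import Any, Callable, Iterable

DIVERGENCE_THRESHOLD = 0.0001  # 0.01%, the figure quoted above


def canonical(payload: Any) -> str:
    """Serialize with sorted keys so key order alone never counts as a diff."""
    return json.dumps(payload, sort_keys=True, default=str)


def divergence_rate(
    requests: Iterable[Any],
    legacy_api: Callable[[Any], Any],
    new_api: Callable[[Any], Any],
) -> float:
    """Replay each request against both APIs and return the mismatch ratio."""
    total = 0
    mismatched = 0
    for request in requests:
        total += 1
        if canonical(legacy_api(request)) != canonical(new_api(request)):
            mismatched += 1
    return mismatched / total if total else 0.0


def within_threshold(rate: float, threshold: float = DIVERGENCE_THRESHOLD) -> bool:
    """True when the observed divergence is at or below the allowed ceiling."""
    return rate <= threshold
```

In a Pytest suite this would typically sit behind a single assertion such as `assert within_threshold(divergence_rate(sample, legacy, new))`, with the mismatching requests logged for triage.
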
The team had to fix every divergence before Phase 2 could proceed, which forced 87 changes to the sync layer. The final go-live decision was then verified over three weeks in a staging environment with a subset of 50 real-world users.

**Phase 3: Progressive Web Application.** Five enterprise volunteer clients participated in a 6-week closed beta of the new mobile PWA, giving feedback that led to 23 significant UX adjustments, including cutting the progress update form from 17 steps to 4. The beta results gave the product team the confidence to proceed to general availability.

**Phase 4: Scheduler and Workflow Engine.** The scheduler integration and visual workflow engine were delivered as a fully independent backend service. Enterprise clients gained the ability to manage approval chains visually and to configure their own project-specific workflows without engineering or support involvement, dramatically reducing customer support load across enterprise accounts.

**Phase 5: Advanced Permission Tuning.** This sprint was dedicated entirely to edge cases in permission coverage. An audit using the engineering team's own review tools found 12 permission gaps, including subcontractor visibility on owner-only documents and project manager access to confidential audit events. All were addressed before the rollout reached Phase 6, and each fix was verified with a custom integration test.

**Phase 6: Total System Load.** A three-day load test pushed the full Provisioner staging environment to 2.5x expected peak traffic. Issues identified included five websocket connection pooling limits, three materialized view refreshes that held locks for too long, and one performance bottleneck in mobile sync conflict resolution.
All nine issues were addressed before final cutover, and the team estimated that the cost of the load-testing phase paid for itself 12 times over across the engagement.

**Final Phase: Live Cutover.** The final cutover completed on April 3, 2026. All seven phases were completed within 18 months (78 weeks) of kickoff, meeting the target timeline for an enterprise-level infrastructure overhaul of a construction management platform. The alternative path, incremental patching, would have cost substantially more and delivered substantially less. When the full rebuild was finally measured against the state of the platform before the program, it showed a net improvement across every key dimension.

## Results

### Technical Metrics

![Engineering team reviewing performance metrics on dashboard displays](https://images.unsplash.com/photo-1551434678-e076c223a692?w=1200&h=630&fit=crop&auto=format)

The measurement program spanning the six-month post-launch window produced the following headline results.

**Dashboard load time:** Reduced from an average of 40.2 seconds for enterprise clients to 1.2 seconds, a 97% improvement. The p99 for enterprise project list loading moved from 58.7 seconds to 2.1 seconds. This single change drove the largest measurable improvement in user satisfaction scores across all ConstructHub account tiers.

**API response time:** The average API response time dropped from 3.8 seconds to 142 milliseconds, a 96% reduction, while the p99 response time dropped from 12.4 seconds to 890 milliseconds. Three-second API timeouts became a thing of the past for every operation.

**Real-time synchronization:** WebSocket event delivery latency measured 87 milliseconds at p50 and 162 milliseconds at p99 under load.
The failed-message retry rate fell below 0.001%, with zero sync failures reported across more than 2.4 million daily events in the six-week post-launch window.

**Offline capability:** Using IndexedDB local persistence with a field worker simulation test harness, every progress capture action was successfully stored and re-synchronized without data loss upon simulated reconnection: 100 percent data integrity across a 10,000-message test suite simulating real-world offline behavior.

**Uptime and availability:** Post-overhaul uptime measured 99.8 percent across the six-month post-launch window, exceeding the 99 percent contract SLA that new enterprise clients demanded. Downtime attributed to infrastructure failure was zero minutes for the 90-day window from May through July 2026.

**Mobile adoption:** Mobile-first PWA usage, measured as users with an interactive session on a device with viewport width under 768 pixels, reached 81 percent of all active users within 10 weeks of launch, up from 32 percent on the prior responsive web experience. This improvement was the single largest driver of field-driven enterprise contract renewals in 2026.

**Bundle optimization:** Gzipped bundle transfer size measured 32 KB for the critical path and 87 KB in total, with first paint completing in under 1.4 seconds on 3G connections. This contributed heavily to reducing the mobile-segment bounce rate from 48 percent to 15 percent within eight weeks of the PWA launch.

### Business Metrics

**Project throughput:** Average project manager throughput, measured as the number of active projects per PM, increased from 12 to 28 during the six-month post-launch window.
This improvement was almost entirely driven by the administrative overhead reduction: change order management, progress reconciliation, and report generation, all automatable workflows, were now executing in 75 percent less time per project.

**Change order processing:** The automated change order workflow reduced the mean cycle time from 5.3 days to 26 hours, a 79.6 percent improvement. Finance teams reported hitting their end-of-quarter revenue recognition milestones for the first time in two years, and credited mainly the drop in stale change orders awaiting approval at quarter-end close.

**Annual client churn:** Annual client churn fell from 12.6% in 2024 to 4.1% in Q1 2026, and the trajectory through Q2 2026 suggested a sub-3% churn rate would be achieved by end of year. Enterprise account retention improved more dramatically: the 2024 enterprise loss rate was 19 percent; through April 2026 it was 2 percent.

**Net Revenue Retention:** Net Revenue Retention over the trailing 12-month period stood at 127 percent, representing the combined effect of logo retention, contract upsell enabled by new features, and organic expansion into additional departments within existing enterprise accounts, facilitated by the improved mobile and workflow automation features.

**Support volume reduction:** Tier-2 engineering escalations fell by 58 percent in the first six months following launch. The most frequent offender, the slow-listing-sensors query, was eliminated entirely from the support load, freeing the engineering team to redirect 6 percent of engineering time back to product work. Customer satisfaction with the support experience, measured by CSAT, improved from 3.1 to 4.5.

**Engineering velocity:** Support time allocation fell from approximately 70 percent to approximately 36 percent, freeing up capacity for roadmap execution.
Without reducing engineering headcount, three major new feature areas (billing and invoice automation, advanced reporting, and an API developer portal for subcontractor integrations) each reached general availability within the same 12-month period in which engineering velocity was measured.

## Key Lessons and Takeaways

![Project team collaboration meeting around table and digital project board](https://images.unsplash.com/photo-1531403009284-440f080d1e12?w=1200&h=630&fit=crop&auto=format)

For technical leaders at rapidly scaling SaaS companies, whether serving construction or any other industry with real-time operational demands, these six lessons stand out as particularly actionable.

**Lesson 1: Normalize your data before anything else.** The single highest-return initiative in this entire project was the normalization pass on 2,847 legacy tables: not a new feature, not a UI refresh, but data governance. Table joins that took 40 seconds dropped to sub-second execution before new features were even built. Any team facing similar data complexity should make normalization the first engineering investment, not the last.

**Lesson 2: Real-time architecture must be designed in from day one.** The previous websocket implementation was bolted on and reactive, which produced the exact failure mode any experienced systems architect would predict: a layer designed during weekend hackathons with no attention to real-time at scale inevitably collapses under enterprise concurrency. Real-time behavior must be architected into the data layer, the API layer, and the auth layer at the same time, planned by the same architecture team that designed the underlying system.
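
As one concrete reading of Lesson 2, the project-scoped subscriptions from the real-time sync section can be modeled with a tiny in-memory hub. This is a stand-in for illustration only: in the platform described here, Socket.io handles transport and Redis pub/sub handles cross-instance fanout, and the `ProjectFanout` class and its method names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


class ProjectFanout:
    """In-memory model of project-scoped event delivery."""

    def __init__(self) -> None:
        # project_id -> connection ids subscribed to that project
        self._rooms: DefaultDict[str, Set[str]] = defaultdict(set)
        # connection id -> events delivered to it (stands in for a socket)
        self.delivered: DefaultDict[str, List[Dict[str, str]]] = defaultdict(list)

    def subscribe(self, connection_id: str, project_id: str) -> None:
        self._rooms[project_id].add(connection_id)

    def disconnect(self, connection_id: str) -> None:
        for members in self._rooms.values():
            members.discard(connection_id)

    def publish(self, project_id: str, event: Dict[str, str]) -> int:
        """Deliver only to that project's subscribers; return the recipient count."""
        members = self._rooms.get(project_id, set())
        for connection_id in members:
            self.delivered[connection_id].append(event)
        return len(members)
```

The design point is that fanout cost scales with the number of people on one project, not with the number of connected devices across the whole platform.
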

**Lesson 3: Offline-first is not optional for field applications.** The IndexedDB offline queue had a downstream impact that surprised even the team: it reduced perceived load latency by 83 percent, because workers were no longer left staring at a stalled screen while a request failed in the background without their knowledge. The offline queue lost no data. Field workers did not perceive it as a "feature"; the software simply seemed responsive regardless of connectivity, which says as much about offline-first design as it does about how people experience tools on the job.

**Lesson 4: Phased rollout saves more time than it costs.** Running in read-only shadow mode cost six weeks of engineering effort that a direct cutover would not have required. That cost was small compared to the cost of a production data incident under the former manual validation approach, which would have cost the project an estimated $200,000 in recovery hours alone. The shadow mode validation cost approximately $45,000 in engineering time in total, a return of roughly 4.4 times.

**Lesson 5: Engineering capacity freed by reliability work compounds fast.** When firefighting dropped from 70% of engineering capacity to 36%, the team delivered three new revenue-generating products in under one year. The return on infrastructure quality extends far beyond a single team or a single quarter: it compounds, improving delivery velocity every quarter for as long as the investment is maintained. This infrastructure investment paid for itself 18 months before the end of the period this case study covers.

**Lesson 6: Delete before you build.** The 459 dispensable tables and 1,203 dispensable columns identified during the forensic audit were not harmless: they were costing real time and money in storage, indexing overhead, query optimization complexity, and application logic complexity. Aggressive data pruning (delete before you build) produced better outcomes at lower cost than a feature-first approach would have. The 47% reduction in table count was a primary driver: the performance gains followed largely from having fewer tables to operate on.

## Conclusion

ConstructHub's transformation from a prototype management tool for 200 clients to a construction platform serving more than 10,000 accounts, including 50+ enterprise clients, represents a software engineering transition that is rarely written about honestly: the kind of technical transformation that sits behind most decade-scale SaaS platforms at the point they move from post-launch stabilization to genuine market category leadership.

The most important structural decision the team made was not the choice of technology per se, but the choice to treat technology decisions as responses to the customer problem rather than as matters of engineering dogma. The normalization pass, the offline-first design, the mobile-first interface: none of these were selected because they were trendy. They were selected because the actual customer context demanded them.

For any construction software business, a project manager's actual job is to keep a building project upright and moving, and that is the bar any technology decision intended to support that work must clear. The architecture that powers system reliability in day-to-day operation is not an architectural convention; it is a social and economic contract with the people who trusted the platform with their work.
That is the standard a company in this position must meet: to build a better future, the past work embodied in the stack must also be reviewed regularly.

---

*This case study is based on an 18-month infrastructure modernization program at ConstructHub, completed in April 2026. All performance figures are verified by platform analytics spanning the six-month post-launch window from April through September 2026. The platform serves over 10,000 active construction management accounts across commercial, industrial, and infrastructure project categories.*
