# How We Built a Real-Time Fleet Management Platform for a National Logistics Leader
When one of India's largest logistics providers needed to track 12,000+ vehicles in real time, we designed and delivered a scalable fleet management platform that cut operational costs by 28%, reduced fuel theft by 35%, and improved first-attempt delivery success from 62% to 94% within the first year of deployment. This case study walks through the full product journey, from stakeholder workshops to a production system handling 50,000+ GPS events per minute across AWS and Azure.
## Overview
In late 2024, WebSkyne was engaged by a top-tier Indian logistics firm operating across 18 states with a fleet exceeding 12,000 commercial vehicles. Their existing tracking system relied on a legacy patchwork of on-premise servers, stale GPS dumps, and manual dispatch callbacks. Management knew they were losing revenue to fuel theft, route inefficiencies, and delayed deliveries, but they lacked the data infrastructure to measure the damage.
Our mandate was clear: design and deliver a unified, real-time fleet management platform that could ingest high-frequency telemetry, surface actionable intelligence to dispatchers, and scale as the company grew. The engagement ran for nine months, spanning discovery, architecture, full-stack development, IoT integration, and production cloud deployment.
The project involved a cross-functional team of six engineers, one product designer, and one solutions architect from WebSkyne, alongside the client's operations, IT, and finance teams.
---
## Challenge
The logistics industry in India is notoriously fragmented. Vehicles often operate across regions with spotty cellular coverage, making reliable data transmission a constant battle. Our client faced a constellation of interrelated pain points:
- **Data latency:** GPS pings were batched into 15-minute windows, making real-time visibility impossible. Dispatchers routinely sent vehicles to the wrong loading bay because outdated positions looked current.
- **Fuel fraud:** siphoned diesel accounted for an estimated 4.2% of total fuel spend. The company had no mechanism to correlate fuel-card transactions with vehicle movement.
- **Route inefficiency:** 38% of routes exceeded their optimal distance by more than 15%, driven by ad-hoc driver decisions and lack of live traffic awareness.
- **ETA volatility:** customers received delivery windows of 6–8 hours, leading to failed first-attempt deliveries that added 20–25 minutes per failed stop.
- **Maintenance blackouts:** vehicle breakdowns were reactive. There was no predictive signal, so trucks often failed mid-route, stranding cargo and burning emergency towing budgets.
The legacy stack compounded the problem. A Windows Server 2012 monolith hosted the portal, Excel macros handled route planning, and the GPS device vendor pushed raw text files nightly via FTP. There was no API, no analytics layer, and no single pane of glass for operations.
---
## Goals
We translated operational frustration into a concrete product brief with measurable success criteria:
1. **Live telemetry:** ingest at least one GPS event per vehicle every 30 seconds with end-to-end visibility under 5 seconds.
2. **Geofencing and alerts:** define virtual perimeters around depots, customer sites, and fuel stations; trigger automated alerts on entry, exit, or dwell-time violations.
3. **Route optimization:** integrate live traffic and road-closure data to propose optimal routes dynamically, saving at least 10% in distance.
4. **Fuel-theft detection:** correlate fuel-card swipes with geolocation and odometer data to flag anomalies in real time.
5. **Predictive maintenance:** aggregate engine diagnostics (RPM, temperature, DTC codes) to surface maintenance needs 72 hours before failure.
6. **Customer-facing ETAs:** expose an API that customer service could embed into their consumer app, tightening delivery windows from 6 hours to under 90 minutes.
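As a rough sizing check, the live-telemetry goal alone implies a floor on GPS traffic. The sketch below uses only the fleet size and reporting interval stated above; it counts position events only, so diagnostics, fuel-card, and alert events would come on top of this figure.

```python
# Back-of-the-envelope sizing for goal 1 (live telemetry).
FLEET_SIZE = 12_000        # vehicles, as stated in the overview
REPORT_INTERVAL_S = 30     # one GPS event per vehicle every 30 seconds

events_per_second = FLEET_SIZE / REPORT_INTERVAL_S
events_per_minute = events_per_second * 60

print(f"{events_per_second:.0f} events/s, {events_per_minute:,.0f} events/min")
# prints "400 events/s, 24,000 events/min"
```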
We committed to a 12-month target to reach 90% of these KPIs, with quarterly check-ins and a hard go/no-go release schedule.
---
## Approach
### Discovery and Alignment
The first six weeks were non-negotiable. We ran 18 stakeholder interviews across operations, finance, safety, and IT, and shadowed dispatchers during two night shifts to map the actual workflow, not the documented one. A critical finding was that dispatchers ignored the aging portal entirely, preferring a mosaic of WhatsApp groups, paper logs, and phone trees. Any new platform had to reduce, not increase, cognitive load.
### Architecture Decisions
We selected a cloud-native stack split between AWS and Azure based on each provider's strengths:
- **AWS IoT Core** for device onboarding, MQTT ingestion, and bidirectional command dispatch to 12,000+ field units.
- **AWS Kinesis Data Streams** for buffering bursts of telemetry (hundreds of events per second at steady state), with auto-scaling shards.
- **AWS Lambda + Node.js** for stateless event processing, geofence evaluation, and alert fan-out.
- **AWS RDS (PostgreSQL)** with PostGIS extensions for persistent geospatial queries.
- **Azure Maps** and **Azure Maps Routing API** for traffic-aware route calculation, leveraging Indian road-network detail.
- **Next.js front-end** for the dispatcher dashboard, with server-side rendering for fast initial loads on modest branch-office bandwidth.
- **Flutter mobile app** for drivers to acknowledge alerts, upload delivery photos, and flag incidents.
This hybrid AWS/Azure design gave us the best-of-breed telemetry pipeline on AWS while tapping Azure's superior Indian road coverage for routing, a pragmatic call over ideological cloud loyalty.
### Design Philosophy
The UI research was humbling. Dispatchers worked 12-hour shifts with 1920×1080 monitors at least five years old. We built a dark-mode-first interface with high-contrast cards, keyboard-driven workflows, and a danger-red alert queue that sat permanently at the top of the screen. Font sizes started at 14px.
---
## Implementation
### Phase 1: Telemetry Pipeline (Weeks 1-8)
We provisioned AWS IoT Core with custom shadow definitions per vehicle. Each GPS device was configured to publish a compact JSON payload every 30 seconds containing latitude, longitude, speed, heading, ignition state, and battery level. We implemented certificate-based X.509 authentication so only authorized hardware could publish.
On the processing side, Kinesis Data Streams acted as the shock absorber. Lambda functions triggered on each shard record applied simple validation, then wrote to two sinks: a hot Redis cache for the last-known-position service, and a persistent RDS store for historical analysis. For security, we encrypted data at rest with AWS KMS and in transit with TLS 1.3.
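The "simple validation" step can be pictured roughly as follows. This is a minimal sketch in Python for readability (the production Lambdas ran on Node.js), and the field names (`vehicle_id`, `lat`, `lon`, `speed`, `heading`, `ignition`, `battery`) are hypothetical; the actual payload schema is not documented here.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class GpsEvent:
    vehicle_id: str
    lat: float
    lon: float
    speed_kmh: float
    heading_deg: float
    ignition_on: bool
    battery_pct: float


def parse_gps_event(raw: Dict[str, Any]) -> GpsEvent:
    """Turn a decoded JSON payload into a typed event, or raise ValueError."""
    try:
        event = GpsEvent(
            vehicle_id=str(raw["vehicle_id"]),   # hypothetical field names
            lat=float(raw["lat"]),
            lon=float(raw["lon"]),
            speed_kmh=float(raw["speed"]),
            heading_deg=float(raw["heading"]),
            ignition_on=bool(raw["ignition"]),
            battery_pct=float(raw["battery"]),
        )
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"malformed payload: {exc!r}") from exc

    # Range checks reject obviously corrupt readings before they reach a sink.
    if not -90.0 <= event.lat <= 90.0:
        raise ValueError("latitude out of range")
    if not -180.0 <= event.lon <= 180.0:
        raise ValueError("longitude out of range")
    if event.speed_kmh < 0:
        raise ValueError("negative speed")
    if not 0.0 <= event.heading_deg < 360.0:
        raise ValueError("heading out of range")
    return event
```

Rejecting bad records at this point keeps both the cache and the historical store free of positions that would later distort geofence and ETA calculations.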
A critical bottleneck emerged during load testing: the geofence evaluator was querying RDS directly, creating contention. We resolved this by loading active geofence definitions into Redis at cold start, then evaluating in-memory before writing matches back to the database. Latency dropped from 1200ms to 220ms at p95.
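To illustrate what in-memory evaluation looks like, here is a minimal sketch that assumes circular geofences and a haversine distance check. Both are simplifications for illustration: the production fences lived in PostGIS and may well have been polygons, and the `CircularGeofence` type is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List

EARTH_RADIUS_M = 6_371_000  # mean Earth radius


@dataclass(frozen=True)
class CircularGeofence:
    fence_id: str
    lat: float
    lon: float
    radius_m: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def matching_fences(lat: float, lon: float, fences: Iterable[CircularGeofence]) -> List[str]:
    """Return the IDs of every cached fence that contains the position."""
    return [
        f.fence_id
        for f in fences
        if haversine_m(lat, lon, f.lat, f.lon) <= f.radius_m
    ]
```

Because the fence list is already in memory, each incoming position costs a handful of trigonometric operations per fence rather than a database round trip.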
### Phase 2: Route Optimization and ETA Intelligence (Weeks 6-12)
Azure Maps Routing API was wrapped in a lightweight proxy to handle request throttling and caching. For waypoint sequencing within dispatch clusters, we avoided the O(n!) cost of solving the traveling-salesperson problem exactly and instead used a greedy nearest-neighbor pass refined by 2-opt local search, a heuristic that gives up guaranteed optimality in exchange for tractable run times. On a test set of 500 daily routes, we achieved a 12% reduction in total kilometers, beating our 10% target.
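The nearest-neighbor plus 2-opt idea can be sketched in a few lines. This version assumes straight-line distances on a plane and an open route that starts at the depot; a real deployment would substitute road distances or travel times from the routing API, and the function names are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def route_length(route: List[Point]) -> float:
    """Total length of an open route visiting the points in order."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))


def nearest_neighbor(depot: Point, stops: List[Point]) -> List[Point]:
    """Greedy construction: always drive to the closest unvisited stop."""
    route = [depot]
    remaining = list(stops)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(route[-1], p))
        remaining.remove(nxt)
        route.append(nxt)
    return route


def two_opt(route: List[Point]) -> List[Point]:
    """Local search: reverse a segment whenever doing so shortens the route."""
    best = list(route)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):        # index 0 (the depot) stays fixed
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if route_length(candidate) < route_length(best) - 1e-9:
                    best = candidate
                    improved = True
    return best
```

A call such as `two_opt(nearest_neighbor(depot, stops))` returns a route that is never longer than the greedy one, since the local search only accepts strict improvements.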
The consumer-facing ETA service was built as a separate bounded context, communicating via async Apache Kafka events rather than synchronous REST calls. This gave us resilience under load: if the ETA service lagged, fleet operations remained unaffected.
### Phase 3: Fuel Analytics and Theft Detection (Weeks 10-16)
Fuel fraud required correlating three asynchronous data streams: fuel-card swipe events from a third-party payment processor, odometer readings from the CAN bus, and geofence boundaries around approved stations. We implemented an entity-resolution layer that matched vehicle, card, and location IDs across streams with a 2-minute event-time window.
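One way to picture the windowed match is a nearest-timestamp lookup. The sketch below assumes the 2-minute window means "within 120 seconds either side of the swipe" and that odometer readings for a vehicle are already sorted by event time; both are assumptions, and `OdometerReading` is a hypothetical type.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional

WINDOW_S = 120  # assumed interpretation of the 2-minute event-time window


@dataclass(frozen=True)
class OdometerReading:
    ts: float   # event time, epoch seconds
    km: float   # odometer value


def nearest_reading(readings: List[OdometerReading], swipe_ts: float) -> Optional[OdometerReading]:
    """Return the reading closest in time to the swipe, if it falls inside the window.

    `readings` must be sorted by `ts`.
    """
    times = [r.ts for r in readings]
    idx = bisect_left(times, swipe_ts)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = readings[max(idx - 1, 0): idx + 1]
    in_window = [r for r in candidates if abs(r.ts - swipe_ts) <= WINDOW_S]
    return min(in_window, key=lambda r: abs(r.ts - swipe_ts), default=None)
```

Swipes with no reading inside the window come back as `None`, which lets downstream rules treat "no matching telemetry" as its own case instead of silently pairing with stale data.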
The rule engine flagged any fuel-card transaction occurring:
- outside a 500-meter geofence of an approved station,
- with an odometer delta lower than 0.5 km from the previous fill, or
- with fuel dispensed per km travelled at more than twice the vehicle's historical average.
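The three rules above can be sketched as a small pure function. The thresholds come from the list; the `FuelTransaction` fields are hypothetical, and reading the third rule as "litres per km since the previous fill" is an assumption about how the historical average was defined.

```python
from dataclasses import dataclass
from typing import List

GEOFENCE_RADIUS_M = 500
MIN_ODOMETER_DELTA_KM = 0.5
CONSUMPTION_MULTIPLIER = 2.0


@dataclass(frozen=True)
class FuelTransaction:
    distance_to_station_m: float      # distance to the nearest approved station
    km_since_last_fill: float         # odometer delta from the previous fill
    litres: float                     # volume on the fuel-card swipe
    historical_litres_per_km: float   # vehicle's long-run average (assumed unit)


def fraud_flags(tx: FuelTransaction) -> List[str]:
    """Return the name of every rule the transaction trips."""
    flags = []
    if tx.distance_to_station_m > GEOFENCE_RADIUS_M:
        flags.append("outside_station_geofence")
    if tx.km_since_last_fill < MIN_ODOMETER_DELTA_KM:
        flags.append("low_odometer_delta")
    # Guard against division by zero when the vehicle has not moved at all.
    if tx.km_since_last_fill > 0:
        litres_per_km = tx.litres / tx.km_since_last_fill
        if litres_per_km > CONSUMPTION_MULTIPLIER * tx.historical_litres_per_km:
            flags.append("excess_consumption")
    return flags
```

Returning rule names rather than a single boolean makes it easier to tune each threshold separately when working down a false-positive rate.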
The operations team trained on the alert format over a two-week sandbox period. Initial false-positive rates were 18%, mainly due to clerical card swaps and geofence boundary inaccuracies; by the third month, tuning reduced this to under 4%.
### Phase 4: Predictive Maintenance (Weeks 14-20)
Engine diagnostics came through the CAN bus as J1939 parameter group numbers. We streamed these into **Amazon Timestream** as the time-series store, then built gradient-boosted regression models (XGBoost) in SageMaker to predict coolant temperature excursions and bearing-failure codes 72 hours in advance. Model retraining ran weekly on the previous seven days of labeled data.
Results were encouraging but not perfect: we caught 82% of imminent failures, with a false-alarm rate of 11%. Maintenance planners used the tool as a prioritization layer rather than an automatic work-order generator, which matched the culture and risk appetite of the client's mechanical teams.
### Phase 5: Mobile and Dashboard Rollout (Weeks 18-24)
The Flutter driver app was distributed via Firebase App Distribution to an initial cohort of 500 drivers. We iterated based on weekly feedback: reducing required taps from six to two for incident reporting, enlarging buttons for gloved hands, and adding offline caching so the app remained usable in coverage dead zones like rural Himachal Pradesh.
The dispatcher dashboard shipped to 60 concurrent users on day one without incident. We used Next.js edge caching to serve initial HTML within 1200ms even from regional branch offices with limited backhaul.
---
## Results
Metrics were tracked over a nine-month production window, comparing the same seasonal period year-over-year.
| KPI | Baseline | Month 9 | Improvement |
|-----|----------|---------|-------------|
| Operational cost per km | ₹14.20 | ₹10.22 | -28% |
| Fuel theft incidents | 147/month | 95/month | -35% |
| First-attempt delivery success | 62% | 94% | +32 pp |
| Average ETA accuracy (±15 min) | 58% | 91% | +33 pp |
| Route deviation from optimal | 17.4% | 8.1% | -53% |
| Breakdowns per 100k km | 3.8 | 2.1 | -45% |
| Driver app adoption (daily active) | n/a | 78% | n/a |
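For readers checking the Improvement column: rows with absolute baselines use relative change, while rows that are already percentages use the difference in percentage points (pp). A short sketch of both calculations, using the table's own figures:

```python
def relative_change_pct(baseline: float, current: float) -> float:
    """Relative change from baseline, in percent."""
    return (current - baseline) / baseline * 100


def point_change(baseline_pct: float, current_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return current_pct - baseline_pct


print(round(relative_change_pct(14.20, 10.22)))  # cost per km      -> -28
print(round(relative_change_pct(147, 95)))       # theft incidents  -> -35
print(round(relative_change_pct(17.4, 8.1)))     # route deviation  -> -53
print(round(relative_change_pct(3.8, 2.1)))      # breakdowns       -> -45
print(point_change(62, 94))                      # first-attempt    -> 32
print(point_change(58, 91))                      # ETA accuracy     -> 33
```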
Beyond the numbers, qualitative shifts mattered. Dispatchers reported a 40-minute reduction in daily planning time. Customer complaints about missed windows dropped from 180 per week to fewer than 40. Finance was able to close the monthly fuel-reconciliation cycle in three days instead of twelve.
The platform handled its festive-season peak without incident, processing 58,000 events per minute across 14,500 active vehicles with no dropped events or missed alerts.
---
## Metrics at a Glance
- 12,000+ vehicles onboarded over 9 months
- 50,000+ GPS events processed per minute at steady state; 58,000 at peak
- 94% first-attempt delivery success (up from 62%)
- 28% reduction in cost per kilometer
- 35% reduction in fuel-theft incidents
- 91% ETA accuracy within ±15 minutes
- 71% driver adoption of the mobile companion app within 30 days
- 82% of imminent breakdowns flagged before failure
- 99.97% platform uptime across the production window
---
## Lessons Learned
1. **Shadow the work, don't read the handbook.** The official process for route planning bore almost no resemblance to what dispatchers actually did at 11pm. If we had built strictly to the requirements document, the adoption curve would have been flat. Empathy mapping is not a nice-to-have; it's the cheapest insurance against building the wrong thing.
2. **Latency is a feature.** The 15-minute batch window felt like a constraint of reality, not of legacy design. Breaking that assumption and bringing end-to-end visibility down to under five seconds unlocked entirely new use cases like geofence enforcement and real-time ETA propagation. Architecture decisions around Kinesis, Lambda, and Redis paid for themselves within the first quarter.
3. **Proximity to the edge matters.** Choosing Azure Maps for Indian road detail over sticking purely with an AWS-native maps service saved weeks of custom geometry work and produced measurably better routes. Pragmatic multi-cloud beats ideological purity.
4. **Alert fatigue kills analytics products.** The first version of the fuel-theft detector generated 300+ alerts per day. Operations started ignoring the queue within 48 hours. We invested heavily in threshold tuning, layered confirmation workflows, and Friday summary digests rather than real-time only. Adoption followed design.
5. **Ship in slices, not layers.** Instead of building the entire data platform before touching the UI, we released a minimal viable telemetry view after eight weeks. That early feedback loop surfaced 17 UX issues we would never have caught in internal QA, including the revelation that dispatchers needed dark mode for night-shift readability.
---
## What's Next
Based on the platform's success, the client has greenlit Phase 2: a machine-learning layer for demand forecasting and dynamic load balancing. WebSkyne is now evaluating the integration of computer-vision dashcam footage for automated incident detection and driver-safety scoring.
The project stands as a benchmark in our portfolio: complex, messy, deeply human, and unambiguously successful. It reinforces what we believe at WebSkyne: that great engineering is only half the battle. The other half is the patience to understand the problem before solving it.
*Published by WebSkyne editorial.*