Real-Time Inventory Tracking for E-Commerce Using Kafka & Databricks

Summary

I worked with an e-commerce company that was struggling with overselling, especially during big events like Black Friday. Inventory was synced in batch, and order cancellations from stale stock counts were costing them hundreds of thousands of dollars annually. I helped them build a real-time inventory tracking system using Kafka and Spark Structured Streaming on Databricks.

We ingested clickstream events like 'add to cart' and 'purchase' via Kafka, synced with vendor stock updates from APIs, and maintained a live view of inventory using Delta Lake. Spark jobs kept SKU availability up-to-date and ensured only valid orders passed downstream.

The result was a reduction in overselling from 10% to under 1%. It was a great example of combining streaming with real-world operations to solve a business-critical issue in a short time frame.

Business Challenge

Previously, the e-commerce company relied on hourly batch jobs to sync inventory from PostgreSQL to vendor APIs. During high-demand events (e.g., Black Friday), this led to:

  • Stock counts that lagged real sales by up to an hour.
  • Overselling on roughly 10% of orders during peaks.
  • Order cancellations costing hundreds of thousands of dollars annually.

The goal was to design a streaming-based solution that reacts in real time to purchases, cart updates, and vendor stock changes.

Solution Overview

We architected and deployed a real-time inventory tracking system leveraging:

  • Kafka for ingesting clickstream and vendor stock events.
  • Debezium CDC to stream inventory changes out of PostgreSQL.
  • Spark Structured Streaming on Databricks for real-time processing.
  • Delta Lake to maintain a live view of SKU availability.

Processing Architecture: Kafka & Databricks Streaming Pipeline

Data Sources & Schemas

Clickstream Events (Kafka Topics)

{
  "event_type": "add_to_cart",
  "user_id": "12345",
  "product_id": "SKU-9981",
  "timestamp": "2025-03-29T12:01:02Z",
  "quantity": 2
}
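Before any inventory math happens, events like the one above have to be deserialized and validated. A minimal sketch of that step in plain Python (the field list follows the sample event; the validation rules and the set of accepted event types are illustrative assumptions, not the project's actual contract):

```python
import json

# Fields every event on the user_activity topic must carry (per the sample above).
REQUIRED_FIELDS = {"event_type", "user_id", "product_id", "timestamp", "quantity"}
# Accepted event types -- an assumption based on the events named in this write-up.
VALID_EVENT_TYPES = {"add_to_cart", "purchase"}

def parse_event(raw: bytes) -> dict:
    """Deserialize one Kafka message value and reject malformed events."""
    event = json.loads(raw)
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"event missing fields: {sorted(missing)}")
    if event["event_type"] not in VALID_EVENT_TYPES:
        raise ValueError(f"unknown event_type: {event['event_type']}")
    if not isinstance(event["quantity"], int) or event["quantity"] < 1:
        raise ValueError("quantity must be a positive integer")
    return event

raw = b'{"event_type": "add_to_cart", "user_id": "12345", "product_id": "SKU-9981", "timestamp": "2025-03-29T12:01:02Z", "quantity": 2}'
event = parse_event(raw)
```

Rejecting bad events at the edge keeps downstream Delta merges from silently corrupting stock counts.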

Inventory DB (PostgreSQL)

product_id VARCHAR,
warehouse_id INT,
quantity_available INT,
updated_at TIMESTAMP
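With Debezium CDC (configured later in the implementation), each change to this table arrives as a before/after change event. A simplified sketch of folding one such event into an in-memory view (the envelope is trimmed and the values are illustrative; real Debezium events also carry schema and source metadata):

```python
# A trimmed-down Debezium-style change event for the inventory table.
change_event = {
    "op": "u",  # c = insert, u = update, d = delete
    "before": {"product_id": "SKU-9981", "warehouse_id": 3,
               "quantity_available": 150},
    "after":  {"product_id": "SKU-9981", "warehouse_id": 3,
               "quantity_available": 148},
}

def apply_change(inventory: dict, event: dict) -> None:
    """Apply one CDC event to a live view keyed by (product_id, warehouse_id)."""
    row = event["before"] if event["op"] == "d" else event["after"]
    key = (row["product_id"], row["warehouse_id"])
    if event["op"] == "d":
        inventory.pop(key, None)
    else:
        inventory[key] = row["quantity_available"]

inventory = {}
apply_change(inventory, change_event)
```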

Vendor Inventory APIs

Stock levels are pulled every 5 minutes and normalized into events by a microservice.
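The normalization step can be sketched as a pure function from one vendor response to inventory_changes events. The vendor payload shape and the field names below are hypothetical; the point is that every vendor's format gets mapped into one common event schema before being produced to Kafka:

```python
from datetime import datetime, timezone

def normalize_vendor_payload(vendor_id: str, payload: list) -> list:
    """Map one vendor API response into common inventory_changes events.

    Assumes a payload of {"sku": ..., "available": ...} items; each real
    vendor would get its own mapping into this shared event form.
    """
    polled_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "event_type": "vendor_stock_update",
            "vendor_id": vendor_id,
            "product_id": item["sku"],
            "quantity_available": item["available"],
            "timestamp": polled_at,
        }
        for item in payload
    ]

events = normalize_vendor_payload("vendor-42", [{"sku": "SKU-9981", "available": 150}])
```

Keeping the mapping in one pure function makes each vendor integration testable without touching Kafka or HTTP.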

Implementation Steps

  1. Requirements & Architecture Design: Workshops and sprint planning to identify failure points and define a streaming-first approach.
  2. Kafka Setup:
    • Defined topics: user_activity, vendor_updates, inventory_changes.
    • Configured Debezium CDC for PostgreSQL and a REST-to-Kafka microservice for vendor APIs.
  3. Spark Streaming & Delta Lake:
    • Real-time inventory decrement on clickstream events.
    • Merge logic for vendor API updates every 5 minutes.
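The core streaming logic in step 3 comes down to two update rules: decrement on purchase events, and merge the 5-minute vendor snapshot. A plain-Python sketch of those rules (in production this runs as Spark Structured Streaming jobs writing to Delta Lake; the function names and in-memory dict are illustrative):

```python
def apply_purchase(inventory: dict, product_id: str, quantity: int) -> bool:
    """Decrement stock for a purchase; return False if it would oversell."""
    available = inventory.get(product_id, 0)
    if quantity > available:
        return False  # reject: only valid orders pass downstream
    inventory[product_id] = available - quantity
    return True

def merge_vendor_update(inventory: dict, updates: dict) -> None:
    """MERGE semantics for the vendor snapshot: matched SKUs are updated,
    unmatched SKUs inserted (the vendor count is the source of truth)."""
    for product_id, quantity in updates.items():
        inventory[product_id] = quantity

inventory = {"SKU-9981": 3}
assert apply_purchase(inventory, "SKU-9981", 2)      # ok: 3 -> 1
assert not apply_purchase(inventory, "SKU-9981", 2)  # would oversell, rejected
merge_vendor_update(inventory, {"SKU-9981": 10, "SKU-1000": 5})
```

Guarding the decrement (rather than letting counts go negative and reconciling later) is what keeps invalid orders from reaching fulfillment.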

Impact & Results

  • Overselling dropped from roughly 10% of peak-period orders to under 1%.
  • Eliminated the hundreds of thousands of dollars in annual losses tied to order cancellations.
  • Inventory now reflects purchases and vendor updates in near real time instead of lagging by up to an hour.

Thanks for reading! 🧑‍💻💕