How We Achieved Sub-100ms Load Times on Ukweli Code Solutions

May 9, 2026 5 min read

At Ukweli Code Solutions we pride ourselves on turning complex business requirements into high‑performance products. Our most recent milestone was an end‑to‑end optimization of our flagship internal management platform that brought average initial render times below 100 milliseconds. This post breaks down the engineering steps we followed, the trade‑offs we made, and the value delivered to customers and investors.

Initial Baseline and Business Context

When the project began, the system served a global audience of roughly 10,000 concurrent users. At peak, the average page load time measured 450 ms on a 4‑core, 16 GB instance. While that figure met existing contractual obligations, stakeholders flagged a noticeable degradation in user experience during high‑traffic events such as price roll‑outs and holiday promotions. Revenue loss under these conditions was estimated at 5 USD per millisecond of added latency, a clear motivation to reduce it.

The architecture originally comprised a monolithic Ruby on Rails application, a set of queues driven by Sidekiq, and a PostgreSQL database with a single primary‑replica pair. Load balancers were native to the hosting provider, and all assets lived on the same volume. From a performance standpoint, the old stack suffered from three critical bottlenecks: 1) blocking I/O during ORM queries, 2) over‑fetching of non‑essential data, and 3) high serialisation overhead for nested HTML templates.

Redesigning the Data Path

Because the user interface was heavily data‑centric, the first order of business was to cut latency from the database round‑trip. We rewrote all high‑frequency endpoints to use optimistic caching backed by Redis. Storing sensitive string data in LZ4‑compressed form reduced object size by 12 %, which translated directly into less time on the network.
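The read path described above can be sketched roughly as follows. This is a minimal illustration, not our production code: it assumes a cache-aside pattern, a plain dict stands in for Redis, `zlib` stands in for LZ4 (only because it ships with Python), and `get_or_load` is a hypothetical helper name.

```python
import json
import zlib
from typing import Callable, Dict

# Assumption: a plain dict stands in for Redis (key -> compressed bytes).
cache: Dict[str, bytes] = {}


def compress(value: dict) -> bytes:
    # Assumption: zlib is a stand-in for LZ4, which is not in the standard library.
    return zlib.compress(json.dumps(value, sort_keys=True).encode("utf-8"))


def decompress(blob: bytes) -> dict:
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def get_or_load(key: str, loader: Callable[[], dict]) -> dict:
    """Return the cached value for key, calling loader only on a cache miss."""
    blob = cache.get(key)
    if blob is not None:
        return decompress(blob)
    value = loader()  # the slow path, e.g. a database query
    cache[key] = compress(value)
    return value
```

The second request for the same key never reaches the loader, and the stored blob is smaller than the raw JSON whenever the payload is compressible.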

To reduce blocking I/O, we introduced an asynchronous query layer. Instead of waiting for all records to materialise, we issued batched queries that returned cursors. The cursor handling logic fetched rows in batches from PostgreSQL server‑side cursors (`DECLARE ... CURSOR` with `FETCH`) and streamed them directly into Redis using a pipelined Lua script. The effect: the 80 ms previously spent awaiting database completion dropped to roughly 10 ms.
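The batched, cursor-based read described above can be sketched as follows. This is an illustration under stated assumptions: SQLite stands in for PostgreSQL, a dict stands in for Redis, and the `products` table, `fetch_in_batches`, `warm_cache`, and `demo_connection` are hypothetical names.

```python
import sqlite3
from typing import Dict, Iterator, List, Tuple

Row = Tuple[int, str]


def fetch_in_batches(conn: sqlite3.Connection, batch_size: int) -> Iterator[List[Row]]:
    """Yield rows in fixed-size batches instead of materialising them all."""
    cursor = conn.execute("SELECT id, name FROM products ORDER BY id")
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


def warm_cache(conn: sqlite3.Connection, cache: Dict[str, str], batch_size: int = 2) -> int:
    """Copy rows into the cache batch by batch; return the number of batches."""
    batches = 0
    for batch in fetch_in_batches(conn, batch_size):
        # Assumption: with Redis, each batch would be sent as one pipelined round-trip.
        for product_id, name in batch:
            cache[f"product:{product_id}"] = name
        batches += 1
    return batches


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory table for the example (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO products (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, 6)],
    )
    return conn
```

Because each batch is written as soon as it arrives, the caller never holds the full result set in memory.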

We also used `EXPLAIN (ANALYZE, BUFFERS)` output to tighten index coverage. By selecting only the columns each view needed, we reduced the total keyspace by 28 %, which changed the execution plans of our queries. This eliminated needless disk reads, which had become a limiting factor under concurrency on a 16‑core engine.
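One way to see why fetching only the needed columns matters is to compare query plans. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for PostgreSQL's `EXPLAIN (ANALYZE, BUFFERS)`; the schema, index name, and queries are hypothetical.

```python
import sqlite3
from typing import List


def query_plan(conn: sqlite3.Connection, sql: str) -> List[str]:
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical products table with an index on (category, name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "id INTEGER PRIMARY KEY, category TEXT, name TEXT, description TEXT)"
    )
    conn.execute(
        "CREATE INDEX idx_products_category_name ON products (category, name)"
    )
    return conn


# Only columns present in the index: answerable from the index alone.
NARROW = "SELECT name FROM products WHERE category = 'tools'"
# Every column, including description: the table itself must also be read.
WIDE = "SELECT * FROM products WHERE category = 'tools'"
```

Calling `query_plan(conn, NARROW)` reports a covering index, while `query_plan(conn, WIDE)` does not, which is the same kind of difference an index-only scan makes in PostgreSQL.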

Micro‑services & Sharded Workers

The Rails monolith gave way to a service‑oriented design built around core business logic and rendering layers. By splitting the e‑commerce calculation engine into a stateless micro‑service written in Go, we reduced execution cost for CPU‑bound operations by 1.7x. The decision was data‑driven: benchmarks of the same logic measured 8 ms per calculation in Ruby and 4 ms in Go across identical datasets, a 2x difference in isolation.

We introduced worker sharding to keep batch jobs from saturating the application tier. Workers were spun up based on real‑time queue depth, which let us raise maximum concurrency from 10 to 32 for specific tasks such as image generation and email push. The added context‑switching overhead was outweighed by the shorter total execution time.
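Scaling workers from queue depth can be sketched as a simple clamped calculation. The ceiling of 32 comes from the figures above; the floor of 1, the `JOBS_PER_WORKER` target, and the function name are assumptions made for illustration.

```python
import math

MIN_WORKERS = 1       # assumed floor, not stated in the post
MAX_WORKERS = 32      # concurrency ceiling mentioned in the post
JOBS_PER_WORKER = 25  # hypothetical backlog each worker is expected to absorb


def desired_workers(
    queue_depth: int,
    jobs_per_worker: int = JOBS_PER_WORKER,
    min_workers: int = MIN_WORKERS,
    max_workers: int = MAX_WORKERS,
) -> int:
    """Return how many workers to run for the current queue depth."""
    if queue_depth < 0:
        raise ValueError("queue_depth must be non-negative")
    needed = math.ceil(queue_depth / jobs_per_worker)
    # Clamp so an empty queue keeps a worker alive and a spike cannot exceed the cap.
    return max(min_workers, min(max_workers, needed))
```

For example, `desired_workers(250)` returns 10, and a backlog of 10,000 jobs is capped at 32.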

Front‑End Engineering for 100 ms

With backend latency addressed, the focus shifted to the front‑end. In the request‑render‑response cycle, the client receives a rendering instruction from the micro‑service, validates the response payload, and then renders it. We split the JavaScript bundle into three segments (core, product, and auxiliary), loaded through dynamic imports driven by user interaction. The first segment loaded in under 30 ms on a 2.5 GHz device with a 3 Gbit/s connection.

Static assets benefited from incremental bundle hashing: only changed files were re‑downloaded. The new Cache-Control policy used `must-revalidate` with a versioned query string, allowing edge caches to stay fresh without leaking stale content. The build pipeline added a custom `gzip+brotli` compression step, cutting payload size to 48 kB for the critical UI.
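A content-derived version string is one way to get this behaviour. The sketch below assumes SHA-256 truncated to 12 hex characters, a `v` query parameter, and a one-hour `max-age`; none of these specifics are taken from our build pipeline.

```python
import hashlib
from typing import Dict


def content_version(content: bytes, length: int = 12) -> str:
    """Derive a short version string from the file's bytes."""
    return hashlib.sha256(content).hexdigest()[:length]


def versioned_url(path: str, content: bytes) -> str:
    """Append the content version so the URL changes only when the file does."""
    return f"{path}?v={content_version(content)}"


def cache_headers(max_age_seconds: int = 3600) -> Dict[str, str]:
    # Assumption: max-age is illustrative; must-revalidate matches the policy above.
    return {"Cache-Control": f"public, max-age={max_age_seconds}, must-revalidate"}
```

Unchanged files keep the same URL across builds and stay cached, while any edit produces a new URL that bypasses the old cache entry.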

HTML templates were restructured to use partial rendering. For example, the product listing section was rendered on the server into an HTML fragment. The client inserted the fragment into the DOM immediately, avoiding a full parse of a monolithic template. This reduced the time from first paint to an interactive state to 90 ms on baseline hardware.
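Server-side fragment rendering for the product listing might look like the following sketch. The markup, the `product-list` class, and the `render_product_list` helper are hypothetical; the point is that the server returns a small, escaped fragment rather than a full page.

```python
from html import escape
from typing import Iterable, Mapping


def render_product_list(products: Iterable[Mapping[str, str]]) -> str:
    """Render only the product listing as an HTML fragment."""
    items = "".join(
        # escape() guards against markup injection in ids and names.
        f'<li data-id="{escape(str(p["id"]))}">{escape(p["name"])}</li>'
        for p in products
    )
    return f'<ul class="product-list">{items}</ul>'
```

The client can insert the returned string into the existing page without re-parsing the rest of the template.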

Network and Edge Optimisations

A Content Delivery Network (CDN) was essential to keep latency low worldwide. Our edge layer, a multi‑regional setup, served pre‑hashed resources through a custom API route that returned a pre‑validated cache key. The CDN's edge workers also performed a simple fingerprint check and refused to serve stale content once a new hash appeared.
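The fingerprint check at the edge reduces to a comparison between the cached hash and the current one. The sketch below is a simplified model; `CachedAsset` and `serve_from_edge` are hypothetical names, and a real edge worker would run in the CDN's own runtime.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachedAsset:
    fingerprint: str  # hash the asset was stored under
    body: bytes


def serve_from_edge(
    cached: Optional[CachedAsset], current_fingerprint: str
) -> Optional[bytes]:
    """Return the cached body only if its fingerprint is still current."""
    if cached is None or cached.fingerprint != current_fingerprint:
        return None  # treat as a miss; the caller should fetch from origin
    return cached.body
```

Returning `None` on a mismatch means a stale entry is never served, at the cost of one origin fetch after each deploy.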

We configured HTTP/2 with push promises for critical CSS and JavaScript, so that the browser had everything it needed before executing rendering logic. Because pushed resources arrive without the browser having to request them, HTTP/2 Server Push eliminated three round‑trips for key resources.
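Many servers and CDNs decide what to push from `Link: rel=preload` response headers. Assuming that mechanism (the post does not say how push was configured), the header value can be built as in this sketch; the asset paths in the usage note are hypothetical.

```python
from typing import Iterable, Tuple


def preload_link_header(resources: Iterable[Tuple[str, str]]) -> str:
    """Build a Link header value from (path, kind) pairs, e.g. ("/a.css", "style")."""
    return ", ".join(
        f"<{path}>; rel=preload; as={kind}" for path, kind in resources
    )
```

For example, `preload_link_header([("/assets/critical.css", "style"), ("/assets/core.js", "script")])` yields one header value listing both resources.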

Profil
