In the bustling data center of the e-commerce platform, there lived a tired but loyal piece of infrastructure: a PostgreSQL database named KQR (Key-Query-Resolver).

KQR had a job: cache frequently accessed rows so the main disk could rest. For years, this worked beautifully. Until one morning.

At 9:00:00 AM, a surge of traffic hit. Every user, in every time zone, suddenly demanded the same piece of data: the flash sale metadata for item ID #42.

```python
def get(key):
    if key in cache:
        return cache[key]
    else:
        value = db.query("SELECT * FROM items WHERE id = ?", key)  # slow
        cache[key] = value
        return value
```

Because the cache was empty, all 10,000 threads saw a miss at the exact same moment. They all rushed to the database.

But they didn't just rush to the database. They collided at the lock. You see, KQR's cache was protected by a single, global synchronized block for writes.

Luckily, KQR had a little-known diagnostic command:

```
kqr row cache contention check gets
```

```
CACHE GETS (total):       10,000
CACHE HITS:               0
CACHE MISSES:             10,000
MISSES WHILE LOCK HELD:   10,000
CONTENTION RATIO:         1.00
TOP CONTENDED ROW:        item:42
WAITING THREADS:          9,999
LOCK HOLD TIME (avg):     487ms
```

This was a contention storm. The first thread to acquire the cache lock went to the database (487ms). The other 9,999 threads didn't just wait; they spun, retried, and choked the CPU.
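A standard remedy for this pattern is per-key locking with a double-checked read: only the first thread to miss on a given key runs the slow query, while the other threads block briefly and then read the freshly cached value. The sketch below is illustrative, not KQR's actual code; `loader` stands in for the slow database query.

```python
import threading

cache = {}
key_locks = {}
meta_lock = threading.Lock()  # guards key_locks only, held very briefly


def _lock_for(key):
    # Get or lazily create the per-key lock. Misses on different keys
    # no longer serialize behind one global lock.
    with meta_lock:
        if key not in key_locks:
            key_locks[key] = threading.Lock()
        return key_locks[key]


def get(key, loader):
    # Fast path: a hit never touches the per-key lock.
    if key in cache:
        return cache[key]
    with _lock_for(key):
        # Double-check: another thread may have filled the cache
        # while we were waiting for the lock.
        if key in cache:
            return cache[key]
        value = loader(key)   # only one thread per key runs the slow query
        cache[key] = value
        return value
```

With this shape, 10,000 concurrent misses on `item:42` produce exactly one database query; the remaining 9,999 threads wait on the per-key lock and return the cached result instead of spinning.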