Race Condition in Async Cache Refresh
Summary
Good instinct on the fix — the original refresh() method had a thundering herd problem where concurrent callers each triggered their own DB fetch for the same user. The inFlight map deduplicates concurrent calls for the same userId, which solves the wasted DB round-trips. However, the current implementation has a subtle race condition in the cache-write path that can cause a stale value to overwrite a newer one. The bug manifests when two refresh calls for the same user arrive with a small timing offset — one DB fetch completes before the other, and the older result overwrites the newer one in Redis.
Findings
Here's the exact failure scenario:
- User's session is updated in the DB at T=0. The cache has the old session.
- Two concurrent requests call
refresh(userId)at T=1. Both see the stale cache, both queue up a DB fetch. - Request A's DB fetch completes at T=5. It writes the stale data (no new data in DB yet) to Redis.
- Request B's DB fetch completes at T=6. It also writes the stale data — same result, so no visible issue yet.
- The real issue: The DB update happens at T=7 (after both fetches started). Request A's write at T=5 overwrites the cache with stale data. Request B's write at T=6 overwrites it with the same stale data. The cache will not reflect the update at T=7 until it expires naturally.
The fix requires either:
- Redis WATCH/MULTI for optimistic locking — check the cache hasn't changed before writing, or retry if it has.
- Version numbers — store a version/timestamp with each cache entry and reject writes with stale timestamps.
- Write-through invalidation — when the session is updated in the DB, explicitly delete the cache key rather than waiting for it to expire or get overwritten.
The inFlight deduplication is good. It's just the write side that needs the safeguard.
The inFlight map deletes the key inside the async IIFE's success path. If fetchFromDb() throws — e.g., the user was deleted from the DB between the cache check and the DB fetch — the promise rejects but this.inFlight.delete(userId) is never called. The entry stays in the map forever, and any subsequent call for that userId will receive the rejected promise from the stale entry.
Add a .catch() that also calls this.inFlight.delete(userId), or use a try/finally inside the IIFE.
Each unique userId that goes through a cache miss adds one entry to inFlight. If user IDs are high-cardinality (many users, few repeats), this map grows indefinitely over the lifetime of the process. For a long-running service, this is a memory leak. Consider using a WeakMap (though that won't work for string keys), or periodically prune entries for user IDs that no longer have in-flight requests. A simple approach: prune on every successful cache hit.
Recommendation
The inFlight deduplication is a meaningful improvement over the original. Merge it — but address Finding #2 (error cleanup) before merging. Finding #1 is the real correctness issue, but it's nuanced enough that you may choose to track it as a follow-up with a separate PR rather than block this one. Finding #3 is a low-priority operational concern.
For the cache staleness issue: the most robust long-term fix is write-through invalidation — any code path that modifies session data in the DB should also call this.redis.del(`session:${userId}`). This eliminates the entire race condition class by making the cache always reflect the DB state. If you go this route, add it to the session service's architectural documentation so new contributors know the invariant.
Suggested Fix
import { Redis } from 'ioredis'; import { SessionCache } from './session-cache'; // Write-through: when session data changes in the DB, // explicitly evict the cache entry to prevent stale reads. export async function updateSession(userId: string, data: SessionData): Promise<void> { await db.sessions.upsert(userId, data); await redis.del(`session:${userId}`); // Cache will be repopulated on the next refresh() call — no race window. }
Why Codio flagged this
Concurrent cache operations are one of the hardest classes of bugs to catch in code review — the failure mode requires specific timing to manifest and might not show up in unit tests. Codio flags this by analyzing the data flow in async methods: when a function reads from an external store (Redis), then writes to it, then calls an async function that may return at unpredictable times, Codio identifies the potential for out-of-order writes. The inFlight deduplication was a good fix; the write-through invalidation completes it.
Get Codio reviewing your PRs
Ship reviews like this on every pull request. No reminders. No missed PRs.