TypeScript High

Race Condition in Async Cache Refresh

Concurrent cache invalidation calls can overwrite each other, serving stale data.

Repository auth-service

PR Title Add redis cache layer for session lookups

Author @dkim

Files changed 2

Diff size +45 / -12 lines

Changed file

src/session-cache.ts

import { Redis } from 'ioredis';
const SESSION_TTL = 3600; // 1 hour
export class SessionCache {
  async refresh(userId: string): Promise<SessionData | null> {
    const cached = await this.redis.get(`session:${userId}`);
    if (cached) return JSON.parse(cached) as SessionData;
    const fresh = await this.fetchFromDb(userId);
    await this.redis.setex(`session:${userId}`, SESSION_TTL, JSON.stringify(fresh));
    return fresh;
  }
  private inFlight = new Map<string, Promise<SessionData | null>>();
  async refresh(userId: string): Promise<SessionData | null> {
    const cached = await this.redis.get(`session:${userId}`);
    if (cached) return JSON.parse(cached) as SessionData;
    const existing = this.inFlight.get(userId);
    if (existing) return existing;
    const promise = (async () => {
      const fresh = await this.fetchFromDb(userId);
      await this.redis.setex(`session:${userId}`, SESSION_TTL, JSON.stringify(fresh));
      this.inFlight.delete(userId);
      return fresh;
    })();
    this.inFlight.set(userId, promise);
    return promise;
  }

Codio Review

Summary

Good instinct on the fix — the original refresh() method had a thundering herd problem where concurrent callers each triggered their own DB fetch for the same user. The inFlight map deduplicates concurrent calls for the same userId, which solves the wasted DB round-trips. However, the current implementation has a subtle race condition in the cache-write path that can cause a stale value to overwrite a newer one. The bug manifests when two refresh calls for the same user arrive with a small timing offset — one DB fetch completes before the other, and the older result overwrites the newer one in Redis.

Findings

Finding #1 — Stale Data Overwrite on Concurrent Refreshes High

Here's the exact failure scenario:

User's session is updated in the DB at T=0. The cache has the old session.
Two concurrent requests call refresh(userId) at T=1. Both see the stale cache, both queue up a DB fetch.
Request A's DB fetch completes at T=5. It writes the stale data (no new data in DB yet) to Redis.
Request B's DB fetch completes at T=6. It also writes the stale data — same result, so no visible issue yet.
The real issue: The DB update happens at T=7 (after both fetches started). Request A's write at T=5 overwrites the cache with stale data. Request B's write at T=6 overwrites it with the same stale data. The cache will not reflect the update at T=7 until it expires naturally.

The fix requires either:

Redis WATCH/MULTI for optimistic locking — check the cache hasn't changed before writing, or retry if it has.
Version numbers — store a version/timestamp with each cache entry and reject writes with stale timestamps.
Write-through invalidation — when the session is updated in the DB, explicitly delete the cache key rather than waiting for it to expire or get overwritten.

The inFlight deduplication is good. It's just the write side that needs the safeguard.

Finding #2 — inFlight Map Never Cleared on Error Medium

The inFlight map deletes the key inside the async IIFE's success path. If fetchFromDb() throws — e.g., the user was deleted from the DB between the cache check and the DB fetch — the promise rejects but this.inFlight.delete(userId) is never called. The entry stays in the map forever, and any subsequent call for that userId will receive the rejected promise from the stale entry.

Add a .catch() that also calls this.inFlight.delete(userId), or use a try/finally inside the IIFE.

Finding #3 — Memory Leak: inFlight Map Grows Unbounded Low

Each unique userId that goes through a cache miss adds one entry to inFlight. If user IDs are high-cardinality (many users, few repeats), this map grows indefinitely over the lifetime of the process. For a long-running service, this is a memory leak. Consider using a WeakMap (though that won't work for string keys), or periodically prune entries for user IDs that no longer have in-flight requests. A simple approach: prune on every successful cache hit.

Recommendation

The inFlight deduplication is a meaningful improvement over the original. Merge it — but address Finding #2 (error cleanup) before merging. Finding #1 is the real correctness issue, but it's nuanced enough that you may choose to track it as a follow-up with a separate PR rather than block this one. Finding #3 is a low-priority operational concern.

For the cache staleness issue: the most robust long-term fix is write-through invalidation — any code path that modifies session data in the DB should also call this.redis.del(`session:${userId}`). This eliminates the entire race condition class by making the cache always reflect the DB state. If you go this route, add it to the session service's architectural documentation so new contributors know the invariant.

Suggested Fix

src/session-cache.ts

import { Redis } from 'ioredis';
import { SessionCache } from './session-cache';

// Write-through: when session data changes in the DB,
// explicitly evict the cache entry to prevent stale reads.
export async function updateSession(userId: string, data: SessionData): Promise<void> {
  await db.sessions.upsert(userId, data);
  await redis.del(`session:${userId}`);
  // Cache will be repopulated on the next refresh() call — no race window.
}

Why Codio flagged this

Concurrent cache operations are one of the hardest classes of bugs to catch in code review — the failure mode requires specific timing to manifest and might not show up in unit tests. Codio flags this by analyzing the data flow in async methods: when a function reads from an external store (Redis), then writes to it, then calls an async function that may return at unpredictable times, Codio identifies the potential for out-of-order writes. The inFlight deduplication was a good fix; the write-through invalidation completes it.

Get Codio reviewing your PRs

Ship reviews like this on every pull request. No reminders. No missed PRs.

← Previous Missing Test Coverage for Payment Edge Case Next → Unhandled Error in HTTP Client Retry Logic