N+1 Query in ORM Lookup Loop
Summary
The original function had a textbook N+1 query problem — for each user ID in user_ids, it fires two separate ORM queries (Session.query.filter_by(user_id=uid) and Payment.query.filter_by(user_id=uid)). With 100 user IDs, that's 200 database round-trips before the first result is returned. For a reporting endpoint, this is a significant performance regression waiting to happen. The fixed version correctly batches the fetches using .in_() clauses, reducing the query count from 2N to 2 regardless of input size. However, the Python-side grouping logic in the fixed version has a correctness issue that makes it O(N²) in the wrong place.
Findings
The SQL fix is correct — .in_(user_ids) fetches all sessions and payments in two queries. But the Python grouping logic is inefficient:
sessions_by_user is a dict keyed by user_id, so the lookup should be O(1): sessions_by_user[uid] (or handling the list case if multiple sessions exist). Instead, the code iterates over .values() for every user, which is O(total_sessions) per user. With 10,000 sessions and 100 users, that's 1,000,000 iterations instead of 100 direct dict lookups.
If each user can have multiple sessions, the correct structure is defaultdict(list) keyed by user_id, then sessions_by_user[uid] is O(1). If each user has exactly one session, sessions_by_user[uid] is already correct.
If any Payment.amount column is NULL in the database, sum(...) silently skips the NULL (SQL aggregate functions ignore NULLs), so this won't crash — but it may undercount revenue if NULL payments exist in production. More importantly, if the ORM maps Payment.amount to a Python None and the model field is typed as Decimal, the sum() will raise a TypeError at runtime when it tries to add Decimal(50) + None. Verify the column constraint — if NULL payments can exist, handle them explicitly with filter()` or coalesce in the SQL query.
SQLite and PostgreSQL have limits on the number of values you can pass to IN (...) clauses — PostgreSQL's default is around 32KB of query text, which translates to roughly 1,000–2,000 integer IDs per query. If user_ids can be large (e.g., a bulk export for all users), the .in_() query will fail with a "too many arguments" error. Consider chunking the IDs into batches of 1,000 and running multiple queries, then merging the results.
Recommendation
The SQL-side fix is correct and solves the original N+1. Addressing Finding #1 first — replace the Python iteration with proper dict lookups. This makes the function O(2 queries + N dict lookups) instead of O(2 queries + N×M Python iterations). Finding #2 is a data integrity concern worth a quick check of the Payment model schema. Finding #3 is an edge case for large batch exports — worth noting but not blocking for normal usage.
Suggested Fix
from collections import defaultdict from sqlalchemy import and_ from models import User, Session, Payment def get_user_analytics(user_ids: list[int]) -> dict[int, dict]: if not user_ids: return {} BATCH_SIZE = 1000 sessions_by_user = defaultdict(list) payments_by_user = defaultdict(list) # Batch the IN queries to avoid PostgreSQL argument limits for batch in [user_ids[i:i+BATCH_SIZE] for i in range(0, len(user_ids), BATCH_SIZE)]: sessions = Session.query.filter(Session.user_id.in_(batch)).all() payments = Payment.query.filter(Payment.user_id.in_(batch)).all() for s in sessions: sessions_by_user[s.user_id].append(s) for p in payments: payments_by_user[p.user_id].append(p) result = {} for uid in user_ids: user_payments = payments_by_user.get(uid, []) total = sum( (p.amount or 0) for p in user_payments ) result[uid] = { "session_count": len(sessions_by_user.get(uid, [])), "total_revenue": total, } return result
Why Codio flagged this
Codio identifies N+1 queries by analyzing ORM patterns inside loops — when a query is constructed inside a for loop over a collection derived from user input, it's a performance issue waiting to explode under load. The original N+1 is obvious. The O(N²) Python grouping in the "fixed" version is the subtler issue Codio catches — the author solved the SQL problem but introduced an iteration-in-iteration that will slow the function proportionally to the number of rows returned. Codio flags both.
Get Codio reviewing your PRs
Ship reviews like this on every pull request. No reminders. No missed PRs.