A Wall of Errors and What Lived Behind It

The production console was a wall of errors. CSP violations from Cloudflare's analytics script. A missing order_activities table. 400 errors from endpoints that worked fine in development. Each one a thread to pull.

The Content Security Policy needed two additions for Cloudflare Insights, which loads a script from static.cloudflareinsights.com (a script-src entry) and sends data to cloudflareinsights.com (a connect-src entry). Neither domain was in the CSP allowlist. The remeasure dialog's iframe also needed the marketing domain in frame-src, a value computed from the environment rather than hard-coded.
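A minimal sketch of assembling that policy in TypeScript. The two Cloudflare hosts and the three directives come from the fixes above. The `CspEnv` shape, its `marketingUrl` field, the `buildCsp` name and the `'self'` baselines are hypothetical stand-ins for however the real app reads its environment. The point is that frame-src is derived at runtime instead of being a string literal.

```typescript
// Hypothetical environment shape; the real app's variable names are unknown.
interface CspEnv {
  marketingUrl: string; // full URL of the marketing site (assumed to be configured)
}

// Build a Content-Security-Policy header value from a directive map.
function buildCsp(env: CspEnv): string {
  // frame-src takes an origin; URL.origin strips any path. Throws on a malformed URL.
  const marketingOrigin = new URL(env.marketingUrl).origin;

  const directives: Record<string, string[]> = {
    "default-src": ["'self'"],
    // Cloudflare Insights loads its script from this host...
    "script-src": ["'self'", "https://static.cloudflareinsights.com"],
    // ...and sends its measurements to this one.
    "connect-src": ["'self'", "https://cloudflareinsights.com"],
    // The remeasure dialog embeds a page from the marketing site.
    "frame-src": ["'self'", marketingOrigin],
  };

  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(" ")}`)
    .join("; ");
}
```

With `marketingUrl` set to `https://www.example.com/remeasure`, the frame-src directive comes out as `frame-src 'self' https://www.example.com`. A hard-coded domain would silently break in any environment whose marketing site lives elsewhere.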

The order_activities table had a migration in the codebase, but it had never been applied to production. The table didn't exist, so the endpoint that queried it returned 400. Created the table through the database management tool and registered the migration as run, so the migrator wouldn't try to create it again.

The tenant context cache was serving stale data. The middleware that sets the default tenant didn't clear the cache afterward, so subsequent requests handled by the same process could read tenant data cached earlier. Added a clearCache() call right after setting the default.
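A minimal sketch of the pattern, assuming a per-process context object that memoizes values derived from the current tenant. Only `clearCache()` comes from the fix described above. `TenantContext`, `setDefault`, `remember` and `defaultTenantMiddleware` are hypothetical names for illustration, not the app's actual API.

```typescript
// Hypothetical tenant record; the real model is unknown.
interface Tenant {
  id: string;
  name: string;
}

// A per-process context that memoizes values derived from the current tenant.
class TenantContext {
  private current: Tenant | null = null;
  private cache = new Map<string, unknown>();

  setDefault(tenant: Tenant): void {
    this.current = tenant;
  }

  clearCache(): void {
    this.cache.clear();
  }

  // Return a cached value, computing it from the current tenant on first use.
  remember<T>(key: string, compute: (tenant: Tenant | null) => T): T {
    if (!this.cache.has(key)) {
      this.cache.set(key, compute(this.current));
    }
    return this.cache.get(key) as T;
  }
}

// Middleware: set the default tenant, then drop anything cached under the previous one.
function defaultTenantMiddleware(ctx: TenantContext, tenant: Tenant, next: () => void): void {
  ctx.setDefault(tenant);
  ctx.clearCache(); // without this, remember() keeps returning the earlier tenant's values
  next();
}
```

Because the context outlives any single request, the cache behaves like process-wide state. Clearing it at the point where the tenant changes is cheaper to reason about than trying to invalidate individual keys.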

Two endpoints — level/progress and save-measurement — were returning 400 errors with no diagnostic information. The error response service hides exception details in production, which is correct for security but makes debugging impossible without SSH access. Added catch-block logging with request context: tenant ID, presence of required fields, the actual exception message. Deploy, wait for the errors to recur, read the logs.
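A minimal sketch of that catch-block logging, written as a wrapper that could sit around either endpoint. The `Req`/`Res` shapes, the `LogFn` signature and the `withDiagnostics` name are all hypothetical. The real handlers and logger are not shown in this post. The wrapper records whether each required field was present rather than its value, and the client still receives only a generic 400.

```typescript
// Hypothetical request/response shapes; the real framework's types are unknown.
interface Req {
  path: string;
  tenantId: string | null;
  body: Record<string, unknown>;
}
interface Res {
  status: number;
  body: Record<string, unknown>;
}
type LogFn = (message: string, context: Record<string, unknown>) => void;

// Run a handler; on failure, log request context server-side but return a generic 400.
function withDiagnostics(
  req: Req,
  requiredFields: string[],
  log: LogFn,
  handler: (req: Req) => Res,
): Res {
  try {
    return handler(req);
  } catch (err) {
    // Record which required fields were present, never their values.
    const fieldsPresent: Record<string, boolean> = {};
    for (const field of requiredFields) {
      fieldsPresent[field] = req.body[field] !== undefined;
    }
    log(`${req.path} failed`, {
      tenantId: req.tenantId,
      fieldsPresent,
      error: err instanceof Error ? err.message : String(err),
    });
    // The client still sees nothing about the exception.
    return { status: 400, body: { error: "Bad request" } };
  }
}
```

For save-measurement, `requiredFields` might be something like `["orderId", "measurement"]` (hypothetical names). A log line showing `measurement: false` next to a null `tenantId` narrows the cause far faster than a bare 400.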

Production debugging is archaeology. The errors are symptoms. The causes live in migrations that were never run, CSP rules that were never updated, caches that were never cleared. Each fix reveals the next layer. The diagnostic logging deployed in this session would answer questions nobody had asked yet, but someone would ask them the next time an endpoint returned a 400 with no explanation.