Four Problems Stacked on Top of Each Other

Four problems stacked on top of each other, each one invisible until the one above it was fixed.

First: the queue was running synchronously. QUEUE_CONNECTION=sync on both beta and production meant every queued job — email sending, data sync, notification dispatch — executed inside the HTTP request that triggered it. A customer submitting a quote would wait for the confirmation email to actually send before seeing a response. Switch to Redis.
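
To make the difference concrete, here is a minimal Python sketch of the two dispatch modes. It is an illustration only, not the application's code: the function names and the in-process `queue.Queue` are stand-ins for the real framework and for Redis. In a Laravel-style `.env`, the actual change would presumably be a single line, `QUEUE_CONNECTION=redis`, plus a running worker.

```python
import queue
import threading


def handle_request_sync(send_email, log):
    """Sync driver: the job runs inside the request, so the response waits on it."""
    send_email()
    log.append("response sent")


def handle_request_queued(jobs, send_email, log):
    """Queued driver: the request only enqueues the job, then responds right away."""
    jobs.put(send_email)
    log.append("response sent")


def run_worker(jobs):
    """Background worker: runs jobs from the queue until it sees the None sentinel."""
    while True:
        job = jobs.get()
        if job is None:
            break
        job()
```

With the sync handler, "email sent" always lands in the log before "response sent". With the queued handler the order flips: the customer gets a response first, and the email goes out whenever a worker picks the job up.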

Second: emails weren't sending anyway. The server firewall blocks port 587 — the standard SMTP submission port. SendGrid offers 2525 as an alternate. Switch the port.
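
A blocked port is quick to confirm from the server itself. The sketch below is one way to do it with Python's standard library; the helper names are made up for this example, and `smtp.sendgrid.net` is assumed to be the SMTP host in use.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False


def check_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Map each candidate port to whether it is reachable from this machine."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Running `check_ports("smtp.sendgrid.net", [587, 2525])` on the affected server should show which of the two ports the firewall lets through. A False result only says the connection failed from that machine, not why.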

Third: the sender address was unverified. SendGrid requires domain verification, which requires correct MX records. The MX records pointed to smtp.google.com — wrong. Google Workspace uses a specific set of five MX records. Fix the DNS through the Cloudflare API: five MX records, an SPF record including both Google and SendGrid, a DMARC record.
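
For reference, the record set can be sketched as data. The Python below only builds the payloads and sends nothing. Each dict follows the general shape of a Cloudflare "create DNS record" request body (`type`, `name`, `content`, `ttl`, and `priority` for MX). Several things here are assumptions rather than details from the actual fix: that the five records are the long-standing `aspmx.l.google.com` set, the TTL, the `~all` SPF qualifier, the `p=none` DMARC policy, and the `example.com` names in the usage note.

```python
# Assumed: the long-standing Google Workspace MX set as (priority, host) pairs.
GOOGLE_MX = [
    (1, "aspmx.l.google.com"),
    (5, "alt1.aspmx.l.google.com"),
    (5, "alt2.aspmx.l.google.com"),
    (10, "alt3.aspmx.l.google.com"),
    (10, "alt4.aspmx.l.google.com"),
]


def build_dns_records(domain: str, dmarc_report_to: str, ttl: int = 3600) -> list:
    """Build five MX records, one SPF record, and one DMARC record as dicts."""
    records = [
        {"type": "MX", "name": domain, "content": host, "priority": prio, "ttl": ttl}
        for prio, host in GOOGLE_MX
    ]
    # SPF authorizes both Google and SendGrid to send mail for the domain.
    records.append({
        "type": "TXT",
        "name": domain,
        "content": "v=spf1 include:_spf.google.com include:sendgrid.net ~all",
        "ttl": ttl,
    })
    # DMARC lives on the _dmarc subdomain; p=none (monitor only) is an assumption.
    records.append({
        "type": "TXT",
        "name": f"_dmarc.{domain}",
        "content": f"v=DMARC1; p=none; rua=mailto:{dmarc_report_to}",
        "ttl": ttl,
    })
    return records
```

Calling `build_dns_records("example.com", "dmarc@example.com")` returns seven dicts, each of which could be posted to the zone's DNS records endpoint with an API token. Any stale MX records would need deleting first, since adding records does not replace existing ones.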

Fourth: beta and production shared the same Redis queue name. Both environments ran queue workers. Beta's worker was stealing production's jobs. A deploy on beta would restart the worker mid-send, killing an email that production had dispatched. Intermittent failures with no pattern — the kind that makes you doubt your logging. Fix: separate queue names, separate worker scripts.
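
The job stealing is easy to reproduce in miniature. The Python sketch below uses an in-memory stand-in for Redis lists, and the key names (`default`, `beta-default`, `production-default`) are hypothetical. It illustrates the failure mode and is not the real worker setup.

```python
from collections import defaultdict, deque


class FakeRedis:
    """In-memory stand-in for Redis lists, just enough for this illustration."""

    def __init__(self):
        self.lists = defaultdict(deque)

    def rpush(self, key, value):
        self.lists[key].append(value)

    def lpop(self, key):
        return self.lists[key].popleft() if self.lists[key] else None


def queue_key(env, shared):
    """Shared setup: every environment uses 'default'. Fixed setup: one key per env."""
    return "default" if shared else f"{env}-default"


def simulate(shared):
    """Production dispatches one job and beta's worker polls first.

    Returns the name of the environment whose worker ends up running the job.
    """
    redis = FakeRedis()
    redis.rpush(queue_key("production", shared), "send-quote-email")
    for env in ("beta", "production"):
        job = redis.lpop(queue_key(env, shared))
        if job is not None:
            return env
    return None
```

`simulate(shared=True)` returns `"beta"`: beta's worker pops production's job, and a beta deploy can then kill it mid-send. `simulate(shared=False)` returns `"production"`, because beta's worker is polling a key that production never writes to.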

Each fix revealed the next. The queue fix revealed the port block. The port fix revealed the sender verification failure. The verification failure traced back to the MX records. And the whole stack was sitting on a shared queue where one environment's deploys could kill the other's jobs.

No code committed. Just environment variables and DNS records. The most impactful kind of fix — zero lines changed, everything works now.