The question that started this session was simple: where do Patton Plumbing's customers come from?
Not in the abstract — in dollars. Which channels generate revenue? Is Google Ads worth the spend? Are repeat customers the real engine? Is that AI phone system actually converting calls into jobs? The answers existed, scattered across 23,676 jobs in HousecallPro's API, each with a free-text lead_source field that said things like "Google", "Online Booking", "Facebook", "CSR AI", "existing customer", or an employee's first name.
The session built the infrastructure to ask that question for any client, for any CRM, forever.
The architecture starts with a registry. A site_integrations table with a provider column, encrypted credentials in JSON, and settings. Not hcp_api_key bolted onto the sites table — a pattern that dies the moment you add a second CRM. When Jobber arrives, or ServiceTitan, or RealGreen, each one needs a service class and data tables. The connect flow, the card UI, the sync infrastructure are shared.
HousecallPro was the first provider to move in. Four data tables — jobs, customers, estimates, and lead source columns. A 470-line service class that handles API pagination, incremental sync via last_synced_at, and dashboard metrics computed from local Postgres, not live API calls. The dashboard loads in milliseconds because it's querying its own database, not waiting for HCP's servers. The trade-off is two-hour data staleness, which is fine for reporting that nobody checks more than twice a day.
The lead source normalization is where the value crystallizes. Free-text fields get categorized at sync time into eight buckets: website, Google, social, repeat customer, AI phone, referral, direct, unknown. Stored in a lead_source_category column for fast GROUP BY queries. The 30-day snapshot for Patton Plumbing told an immediate story: repeat customers drove $51,905 across 57 jobs. Referrals brought $37,187 across 44. The AI phone system — the one they weren't sure was worth keeping — generated $8,416 across 18 jobs. Google, despite being the channel they worry about most, contributed $1,375 across 4 jobs.
Numbers like these change decisions. Not because the data is surprising — the team probably intuited that repeat customers were their backbone — but because intuition doesn't survive a budget meeting. Numbers do.
The gotchas were educational. HCP's work_status field returns compound strings — "complete rated", "complete unrated", "pro canceled" — not clean enums. The initial query matched on exactly "complete" and missed 19,134 completed jobs. Fixed with str_starts_with. The completed_at timestamp is NULL for most jobs even when they're finished — the reliable date requires a COALESCE across completed, scheduled, and created timestamps. All monetary values come back in cents. The build manifest was both tracked and gitignored, causing merge conflicts on rebase.
Eight Vue components render the dashboard: setup flow, connection cards, metrics panels, source breakdown charts, job and customer lists. A Pinia store with fifteen getters and nine actions manages the state. Elite middleware gates the feature — this is premium functionality.
The initial sync pulled 23,676 jobs, 21,829 customers, and 8,174 estimates in about four minutes. The sync job reruns every two hours, pulling only what's changed. Connect a new CRM and the first sync backfills everything. After that, it just keeps pace.
The biggest lesson: never trust API field names at face value. "completed_at" sounds definitive. It isn't. "work_status" sounds like an enum. It's a sentence. Always sample real data before building queries. The defensive patterns that emerged — COALESCE date resolution, starts-with status matching, cents-to-dollars conversion — are reusable for every CRM integration that follows.