Ten samples, ten epochs, thirty-five seconds, and a loss curve that barely moved. That was the first training run of the SAM3 fine-tuning pipeline — a smoke test, not a breakthrough. But the pipeline that made it possible is the real story.
The loop works like this: a human reviews an AI detection in the review panel. They adjust a polygon, relabel a region, delete a false positive. That correction gets exported from Laravel — the satellite image paired with the human-edited polygon mask — and pushed to S3. On the other end, a RunPod handler downloads the pair, rasterizes the mask to a binary PNG, and fine-tunes the model's mask decoder. The trick is freezing the encoder. SAM3's image encoder is ten gigabytes of pretrained knowledge about what things look like. The decoder — the part that draws the actual polygons — is 4.4 megabytes. You train the small part. You leave the big part alone. When training finishes, the new weights go back to S3, and the handler auto-loads them on the next cold start.
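The freeze-the-encoder pattern can be sketched in a few lines of PyTorch. The `TinySAM` module below is a stand-in — the real SAM3 classes and attribute names are assumptions here — but the mechanics are the same: zero out `requires_grad` on the encoder, hand the optimizer only the decoder's parameters, and train against the rasterized binary mask.

```python
import torch
import torch.nn as nn

# Stand-in for SAM3: the real encoder/decoder classes are assumptions here.
class TinySAM(nn.Module):
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.mask_decoder = nn.Conv2d(8, 1, 1)  # predicts a 1-channel mask logit

    def forward(self, x):
        return self.mask_decoder(self.image_encoder(x))

model = TinySAM()

# Freeze the (large, pretrained) encoder; only the decoder gets gradients.
for p in model.image_encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
loss_fn = nn.BCEWithLogitsLoss()

image = torch.rand(1, 3, 32, 32)                    # satellite tile
target = (torch.rand(1, 1, 32, 32) > 0.5).float()   # rasterized binary mask

logits = model(image)
loss = loss_fn(logits, target)
loss.backward()   # gradients flow only into the decoder
optimizer.step()
```

Because the frozen parameters never accumulate gradients, only the few-megabyte decoder changes — which is also why the updated weights pushed back to S3 stay small.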
Loss went from 0.54 to 0.52 on ten samples. Meaningless as a metric, meaningful as proof of life. The architecture works. The data flows. The weights update. Now it needs fifty to a hundred diverse corrections before the model starts to generalize — before it starts seeing the difference between a freshly mowed lawn in Memphis and a dormant bermuda yard in January.
The same session built a parcel boundary service that changes how properties get measured. Before, every detection used a default buffer radius — seventy-five meters from the address pin — which meant the AI was analyzing the neighbor's yard, the street, and half a parking lot. The new ParcelBoundaryService queries county ArcGIS endpoints for actual parcel polygons. Shelby County, Tennessee, was the first entry in the registry. For an address on Butterworth Road that RapidAPI had no data for, ArcGIS returned the exact lot boundary on the first try. Adding new counties is a config entry — a URL and a layer ID.
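A minimal sketch of that registry-plus-query shape, in Python rather than the Laravel original: the registry maps a county key to an ArcGIS endpoint URL and layer ID, and the query asks the standard ArcGIS REST `query` operation for the parcel polygon intersecting the address pin. The county key, URL, and function names are illustrative assumptions, not the production values.

```python
import requests

# Hypothetical registry — adding a county is one entry: an ArcGIS endpoint
# URL and the layer ID that holds parcel polygons. The URL is a placeholder.
COUNTY_REGISTRY = {
    "shelby_tn": {
        "url": "https://gis.example.gov/arcgis/rest/services/Parcels/MapServer",
        "layer_id": 0,
    },
}

def build_parcel_query(county: str, lon: float, lat: float):
    """Build the ArcGIS REST query for the parcel containing a point."""
    entry = COUNTY_REGISTRY[county]
    url = f"{entry['url']}/{entry['layer_id']}/query"
    params = {
        "geometry": f"{lon},{lat}",            # the geocoded address pin
        "geometryType": "esriGeometryPoint",
        "inSR": 4326,                          # WGS84 lon/lat
        "spatialRel": "esriSpatialRelIntersects",
        "outFields": "*",
        "returnGeometry": "true",
        "f": "geojson",
    }
    return url, params

def fetch_parcel(county: str, lon: float, lat: float):
    """Return the parcel polygon geometry, or None if the county has no hit."""
    url, params = build_parcel_query(county, lon, lat)
    resp = requests.get(url, params=params, timeout=10)
    resp.raise_for_status()
    features = resp.json().get("features", [])
    return features[0]["geometry"] if features else None
```

The design point is that everything county-specific lives in the registry dict; the query logic never changes when a new county comes online.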
Batch detection had been clipping entire neighborhoods because the processing queue wasn't fetching parcel boundaries before running SAM. It would detect on the full satellite tile, find every lawn-like surface in frame, and attribute them all to one property. The fix: parcel query first, then detection within the boundary. The default fallback radius dropped from seventy-five meters to thirty.
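The fixed ordering — parcel first, detection second, tight buffer only as a fallback — reduces to a small decision function. This is a sketch under assumptions (the helper names are invented), but it captures the control flow and the 75 m → 30 m change:

```python
import math

DEFAULT_FALLBACK_RADIUS_M = 30  # dropped from 75 m in the fix

def buffer_circle(pin, radius_m, n=32):
    """Approximate a circular buffer around a (lon, lat) pin as a polygon.
    Crude meters-to-degrees conversion — fine for a sketch, not production."""
    lon, lat = pin
    dlat = radius_m / 111_320.0
    dlon = radius_m / (111_320.0 * math.cos(math.radians(lat)))
    return [
        (lon + dlon * math.cos(2 * math.pi * k / n),
         lat + dlat * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]

def detection_region(parcel_polygon, pin, fallback_radius_m=DEFAULT_FALLBACK_RADIUS_M):
    """Pick the geometry detection runs within: parcel boundary first,
    a tight buffer around the pin only when no parcel data exists."""
    if parcel_polygon is not None:
        return parcel_polygon  # exact lot boundary from the county GIS
    return buffer_circle(pin, fallback_radius_m)
```

Running detection only inside `detection_region(...)` is what stops a full satellite tile's worth of lawn-like surfaces from being attributed to one property.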
Rejected detections had been cycling back to "unprocessed" instead of staying rejected — a soft-delete that wasn't soft enough. Fixed. Save-measurement calls were throwing foreign key violations because they referenced detection records that hadn't been committed yet. Fixed. PostgreSQL boolean type mismatches — the eternal gotcha when Eloquent's PHP booleans meet Postgres's strict typing — surfaced again and got squashed again.
The RunPod deployment had its own drama. A circular import crashed the handler on cold start — importing the model at module level re-triggered the serverless entry point. Docker Hub had IPv6 connectivity issues with the worker nodes. AWS credentials weren't propagating to the container environment. Each one a different kind of broken, each one invisible until you're reading stack traces at two in the morning.
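The circular-import crash has a standard cure: defer the model import and load to first invocation instead of module import time, so loading `handler.py` never re-enters the serverless entry point. A minimal sketch of that pattern — the module and function names are assumptions, and a placeholder stands in for the real model load:

```python
# handler.py — lazy-load pattern that avoids the cold-start circular import.
# Importing the model at module level re-entered this module through the
# model package's own imports; deferring to first call breaks the cycle.

_model = None

def get_model():
    """Load the model once, on the first request, not at import time."""
    global _model
    if _model is None:
        # Production would do the heavy import and S3 weight download here,
        # e.g. `from sam3_pipeline import load_model` (name assumed).
        _model = object()  # placeholder for the loaded model
    return _model

def handler(event):
    model = get_model()   # cached after the first cold-start invocation
    # ...run detection or a training job with `model`...
    return {"status": "ok"}
```

The same pattern also keeps module import cheap, which matters on serverless platforms where import time is billed as part of the cold start.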
The value of this session isn't the loss curve. It's the feedback loop. Every correction a human makes in the review panel becomes training data. Every training run produces weights that make the next detection slightly better. Every slightly better detection means the human has less to correct. The loop tightens. Not overnight — this is a months-long curve. But the infrastructure to ride that curve now exists, and that's the part you can't rush.