
Pre-baking every image the site will ever need

The Cloudflare Image Transformations bill was about to become a problem.

Every grid thumbnail, every actor avatar, every card was being served through a cdn-cgi/image/width=400,quality=75,format=webp/… URL: one billable transform per unique (image, width) pair per month. At the end of the SODVR push we were at ~5k uniques; by the end of the long-tail bulk import we were projected to hit ~150k. Cloudflare's pro tier covers 100k transforms/month. Past that, transforms run ~$0.50 per 1,000, and unlike bandwidth or storage there's no cache you can add to dodge it, because the URL pattern itself is the billable primitive. Two paths: pay the bill indefinitely, or stop asking for transforms.

We stopped asking for transforms.

Pre-baked WebP derivatives, written once, served forever. Migration 019 extended the image pipeline so that every image the processor uploads now also gets a sibling WebP at {key-without-extension}-{width}.webp: 480w for video posters, covers, gallery images, and series; 240w for actor avatars. The frontend's getGridImageUrl(r2Key) returns the plain R2 URL: no cdn-cgi/image prefix, zero billable transforms. Only the detail-page hero and the lightbox (both 1200w) still transform. That's a conscious concession: those surfaces each need one particular width, and a third baked derivative just for them would double the R2 put cost for marginal benefit. Everything grid-shaped, which is ~95% of the image traffic, is now served as static assets.
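The post doesn't show the code, but the naming scheme above is mechanical enough to sketch. A minimal version, assuming the derivativeKey helper name and the R2_PUBLIC_BASE constant (only getGridImageUrl is named in the post):

```typescript
// Hypothetical public hostname for the R2 bucket; an assumption for the sketch.
const R2_PUBLIC_BASE = "https://images.example.com";

// "{key-without-extension}-{width}.webp": 480w for posters/covers/gallery/series,
// 240w for actor avatars.
function derivativeKey(r2Key: string, width: 480 | 240): string {
  const withoutExt = r2Key.replace(/\.[^./]+$/, ""); // strip final extension only
  return `${withoutExt}-${width}.webp`;
}

// Plain R2 URL: no cdn-cgi/image prefix, so no billable transform.
function getGridImageUrl(r2Key: string, width: 480 | 240 = 480): string {
  return `${R2_PUBLIC_BASE}/${derivativeKey(r2Key, width)}`;
}
```

The point of the shape: the browser asks for a static object that already exists, so the CDN never sees a transform request at all.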

The lovely side effect: getGridImageUrl returning a plain URL means VideoCard can do a Netflix-style hover preview. Hover over a poster on the home page and up to eight evenly spaced gallery frames cycle through at ~600 ms each. No new transforms: it reuses the baked 480w derivatives that were already there for the static thumbnail. Gated on matchMedia('(hover: hover)') so touch devices don't fire it on tap; a minimum of three frames so single-gallery videos don't flicker; preloading on first hover only, so the home page doesn't kick off sixteen parallel fetches at page load. The preview keys come from a denormalised videos.preview_r2_keys TEXT[] column; an NTILE(8) + DISTINCT ON query over video_gallery_images gives eight evenly spaced frames across the scene, which reads much better than the first eight (which are always the boring setup shots).
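The even-spacing selection can be sketched in memory as well as in SQL. This is an approximation of what NTILE(8) + DISTINCT ON does per video (first row of each of n evenly sized buckets); the function name and the MAX_FRAMES/MIN_FRAMES constants are assumptions matching the prose:

```typescript
const MAX_FRAMES = 8; // cap on frames per preview
const MIN_FRAMES = 3; // below this, skip the preview entirely (flicker guard)

// Pick up to MAX_FRAMES evenly spaced keys across the gallery, in order:
// roughly the first row of each NTILE(n) bucket.
function pickPreviewFrames(galleryKeys: string[]): string[] {
  if (galleryKeys.length < MIN_FRAMES) return [];
  const n = Math.min(MAX_FRAMES, galleryKeys.length);
  return Array.from({ length: n }, (_, i) =>
    galleryKeys[Math.floor((i * galleryKeys.length) / n)]
  );
}
```

With a 16-frame gallery this picks every other frame; with exactly eight it picks all of them; with two it returns nothing, so the card stays a static thumbnail.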

Shipping that on localhost was the easy part. Applying it across 20,000 already-completed images on a 2 GB VPS was the actual engineering.

The first lesson: CONCURRENCY=2 is not safe. The VPS is 2 GB RAM shared with Postgres, Express, PM2, the scheduler. Each sharp encode peaks around 50 MB resident. Running the backfill at concurrency=2 (the obvious first default) reliably OOMed within 30-90 minutes of sustained work, no swap, no trace, the process just vanished and systemd brought PM2 back up on the next healthcheck. The cure is CONCURRENCY=1. Resident stays ~250 MB, throughput drops to ~10 HEAD-skips/sec or ~1-2 full encodes/sec. That's fast enough for 100k rows, and it's the only setting that doesn't kill the box.
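The backfill's worker loop isn't shown in the post; a minimal sketch of a concurrency-limited mapper (the helper name is an assumption) makes the setting concrete. With limit = 1 it degenerates to a plain sequential loop, which is exactly what keeps peak resident memory at one encode's worth:

```typescript
// Minimal concurrency-limited async mapper. Workers synchronously claim the
// next index before awaiting, so single-threaded JS needs no locking.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;               // claim an index, then do the slow work
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}
```

Running the encode step through this with limit = 1 means at most one sharp pipeline is resident at a time; limit = 2 doubles peak memory, which is the difference the OOMs were measuring.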

The second lesson: transient ts-node deaths were killing multi-hour runs. R2 hiccups, DB connection churn, random Node exits: none of them fatal individually, all of them fatal to a 3-hour backfill run with nobody watching. Wrapped the script in a bash supervisor loop:

for i in 1 2 3 4 5 6 7 8 9 10; do
  CONCURRENCY=1 npx ts-node scripts/backfill.ts >> LOGFILE 2>&1
  [ $? -eq 0 ] && exit 0   # clean exit: backfill finished
  sleep 5                  # transient death: back off, then retry
done
exit 1                     # ten failures in a row: give up loudly

The checkpoint file on disk survives restarts, so attempt N+1 picks up where N died. Ten attempts is overkill in steady state and exactly right during a hiccup.
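The checkpoint mechanics are simple enough to sketch. The file path, the JSON shape, and the function names here are all assumptions (the post doesn't show the script's format); the load-on-start / save-per-batch rhythm is the part that matters:

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical checkpoint location; the real script's path isn't shown.
const CHECKPOINT = "/tmp/backfill-checkpoint.json";

// On startup: resume from the last committed cursor, or start fresh.
function loadCursor(): string | null {
  if (!existsSync(CHECKPOINT)) return null;
  return JSON.parse(readFileSync(CHECKPOINT, "utf8")).cursor;
}

// After every committed batch: persist the cursor, so a crash between
// batches loses at most one batch of work.
function saveCursor(cursor: string): void {
  writeFileSync(CHECKPOINT, JSON.stringify({ cursor }));
}
```

The supervisor loop and this file together are the whole resumability story: the loop restarts the process, the file tells the new process where the old one stopped.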

The third lesson: cursor pagination races with any concurrent writer. The backfill paginated WHERE id > $cursor ORDER BY id. The hourly image processor kept completing new images at random v4 UUIDs while the backfill ran. Some of those UUIDs landed behind the advancing cursor and were silently missed. You only discover this when you HEAD-sample 50 completed rows near the cursor and find 5 missing derivatives. Two fixes: either run the backfill 2-3 times with a cleared cursor (HEAD-skip makes re-runs cheap because it skips anything that already has a derivative), or paginate ORDER BY id DESC so new inserts land behind the receding cursor. I did the first.
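The reason cleared-cursor re-runs are cheap is the HEAD-skip. A sketch of the per-row decision, with headObject as a stand-in for an R2/S3 HeadObject call and the function name an assumption:

```typescript
// Skip-or-encode decision for one completed image. headObject and encode are
// injected here so the sketch stays self-contained; the real script would use
// an R2 client and sharp respectively.
async function processRow(
  r2Key: string,
  headObject: (key: string) => Promise<boolean>,
  encode: (key: string) => Promise<void>
): Promise<"skipped" | "encoded"> {
  const derivative = r2Key.replace(/\.[^./]+$/, "") + "-480.webp";
  if (await headObject(derivative)) return "skipped"; // already baked: ~free
  await encode(r2Key);                                // full sharp encode
  return "encoded";
}
```

At ~10 HEAD-skips/sec, a full re-pass over rows that already have derivatives costs minutes, not hours, which is what makes "just run it again with a cleared cursor" a legitimate fix for the pagination race.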

The fourth lesson: denormalised side effects must be expressed as SQL, not as Set<string>. The first cut of preview_r2_keys population tracked touched video IDs in an in-memory Set and ran one UPDATE per video at end-of-run. First supervisor restart wiped the Set; every video processed in the previous attempt lost its preview keys. Rewrote the end-of-run step as a single partitioned bulk UPDATE keyed on a window function over video_gallery_images, idempotent, resumable, no accumulator state. The script can die halfway through and the next run picks up exactly where the last one stopped and reconstructs any cross-cutting aggregate correctly.
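The pattern, stripped of the SQL, is: derive the aggregate purely from source-of-truth rows instead of from run-local state. An in-memory sketch of that property (the row shape and function name are assumptions; the real version is a bulk UPDATE with a window function):

```typescript
interface GalleryRow {
  videoId: string;
  position: number;
  r2Key: string;
}

// Pure function of the gallery rows: running it twice, or after a crash and
// restart, reconstructs the same answer. There is no Set<string> to lose.
function recomputePreviewKeys(rows: GalleryRow[]): Map<string, string[]> {
  const byVideo = new Map<string, string[]>();
  for (const row of [...rows].sort((a, b) => a.position - b.position)) {
    const keys = byVideo.get(row.videoId) ?? [];
    keys.push(row.r2Key);
    byVideo.set(row.videoId, keys);
  }
  return byVideo; // caller writes this out in one idempotent bulk UPDATE
}
```

The accumulator version computed the same data but tied its correctness to the process surviving to end-of-run; this version ties it only to the database, which is the thing that actually survives.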

Full operational playbook went into MAM1/PornBoxd/Image Pipeline.md under "Backfill operational playbook" and a terse version into the project CLAUDE.md. It'll be the reference the next time a >10k-row backfill lands on this VPS.