
Thirty JAV studios in one afternoon

Yesterday the JAV catalogue had one studio in it. By the end of today it had thirty-one, and with them 15,120 videos, 3,659 actresses, 260 directors, and 1,151 series. The afternoon felt like watching a warehouse fill up in time-lapse.

The mechanics are the same as the SODVR launch: one admin row per label, then npx ts-node src/importers/r18dev.ts --studio=<slug> --label-id=<N>. The writer joins the R18.dev staging schema against our canonical tables and upserts through the shared sink. What made this an afternoon's work and not a fortnight was two fixes to the importer and one fix to the image processor, each of them obvious in hindsight, each of them discovered by measuring how long things actually took.
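The writer's upsert step is roughly this shape — a sketch assuming node-postgres, a unique index on the DVD-ID, and an illustrative staging table name; none of this is the real schema:

```ts
import { Pool } from "pg";

// Sketch only: joins the R18.dev staging rows to the label passed in via
// --label-id, then upserts on the natural key so a clean re-import
// updates existing rows instead of duplicating them.
export async function upsertLabel(db: Pool, labelId: number) {
  await db.query(
    `INSERT INTO videos (dvd_id, title, label_id, release_date)
     SELECT s.dvd_id, s.title, $1, s.release_date
     FROM r18dev_staging s
     ON CONFLICT (dvd_id) DO UPDATE
       SET title = EXCLUDED.title,
           release_date = EXCLUDED.release_date`,
    [labelId],
  );
}
```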

The HEAD probe was the whole wall clock. Migration 019 added a pre-flight HEAD request against DMM's CDN for each row: if the first gallery frame 302s to their now_printing.jpg placeholder, skip the entire range. Correct, cheap per request (~250 ms), catastrophic in a serial loop. KMPVR has 3,116 rows; 3,116 × 250 ms ≈ 780 s of wall clock before a single database write. Swapped the serial for-await loop for a 20-wide Promise.all pool, and the same label's clean re-import dropped from 790 s to 128 s. That's not a micro-optimisation; that's the difference between "I can import another label after dinner" and "I'll do it tomorrow."
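A minimal sketch of the pooled probe, assuming Node 18+'s global fetch; the row shape, URL field, and placeholder check are illustrative, not the importer's actual code:

```ts
type ProbeRow = { dvdId: string; firstFrameUrl: string };

// `width` workers share one cursor into `rows`, so at most `width`
// HEAD requests are in flight at any moment.
async function probePlaceholders(rows: ProbeRow[], width = 20): Promise<Set<string>> {
  const skip = new Set<string>(); // DVD-IDs whose gallery range we skip
  let next = 0;

  const worker = async () => {
    while (next < rows.length) {
      const row = rows[next++]; // synchronous claim: no two workers get the same row
      const res = await fetch(row.firstFrameUrl, {
        method: "HEAD",
        redirect: "manual", // we want to see the 302 itself, not follow it
      });
      const location = res.headers.get("location") ?? "";
      if (res.status === 302 && location.includes("now_printing")) {
        skip.add(row.dvdId);
      }
    }
  };

  await Promise.all(Array.from({ length: width }, worker));
  return skip;
}
```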

The image processor was about to drain its queue in the wrong order. Post-bulk, the pending queue held ~6,400 videos + ~8,700 covers + ~55,000 gallery stills + ~29,000 actor avatars. The old fetchPendingImages fired five parallel SELECT … LIMIT 50 queries and let each type claim 20% of a 250-item batch. Videos (the thing a visitor actually sees on the home page) would have taken eight hours to drain while gallery (lightbox-only, hidden behind a click) got an equal share. Refactored to strict priority tiers: videos → covers → actors → series → gallery, each tier filling the whole 50-item batch until it empties. Per-batch throughput drops while upper tiers have work, but posters drain ~5× faster, which is the right product decision.
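The refactor is essentially a priority walk: the first tier with pending work takes the whole batch. A sketch with node-postgres and an invented pending_images table; table and column names are illustrative, not the real schema:

```ts
import { Pool } from "pg";

const TIERS = ["video", "cover", "actor", "series", "gallery"] as const;
const BATCH = 50;

// Strict priority: the first non-empty tier monopolises the batch,
// so lower tiers only run once everything above them has drained.
async function fetchPendingImages(db: Pool) {
  for (const type of TIERS) {
    const { rows } = await db.query(
      "SELECT * FROM pending_images WHERE type = $1 ORDER BY created_at LIMIT $2",
      [type, BATCH],
    );
    if (rows.length > 0) return rows;
  }
  return []; // every tier is empty
}
```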

One migration tripped a constraint in production. Migration 016 authored a 3-value CHECK for videos.cover_image_status. Migration 018 widened it to 4 (adding missing_source for DMM rows that serve placeholder art). The migrate script re-runs every .sql on every deploy, so 016 kept dropping the wider constraint and re-adding the narrower one. Harmless in net effect, since 018 widened it again moments later, until the cover processor started writing missing_source in prod, at which point 016's re-add of the narrower constraint failed against the live data and the deploy broke. Fix: edited 016 in place to author the full 4-value enum up front; 018's re-add is now a genuine no-op. Lesson: when a later migration widens a CHECK that an earlier one re-asserts every run, the earlier one is the source of truth. Migrations are additive, but CHECK constraints on existing columns are not.
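What the repaired 016 asserts on every run is roughly this, assuming Postgres; the constraint name and the three original status values are invented for illustration (only missing_source is named in this post):

```sql
-- Idempotent re-add: drop whatever version exists, then assert the full enum.
ALTER TABLE videos
  DROP CONSTRAINT IF EXISTS videos_cover_image_status_check;
ALTER TABLE videos
  ADD CONSTRAINT videos_cover_image_status_check
  CHECK (cover_image_status IN ('pending', 'complete', 'failed', 'missing_source'));
```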

There were smaller things too: a VideoCard title overflow on mobile, where long JAV titles with unbreakable DVD-ID tokens (13DSVR-1972) blew past the viewport because CSS grid's default min-width: auto sizes 1fr columns around intrinsic min-content (the usual fix is sketched at the end of this post); a missing_source image state to distinguish "permanent placeholder" from "transient failure, retry later"; a 1.3% slug-collision rate in the writer's DVD-ID disambiguation that'll need attention when the next big label lands. None of those were the story. The story was that scale multiplies your mistakes: a mis-ordered fetch that's invisible at one studio is an eight-hour UX regression at thirty.
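For the record, that grid fix is the standard two-liner; the selectors here are illustrative, not the real component's class names:

```css
/* min-width: 0 overrides the grid item's default min-width: auto, letting
   the 1fr column shrink below the title's min-content width. */
.video-card { min-width: 0; }

/* And if the title still overflows, allow a break inside otherwise
   unbreakable tokens like 13DSVR-1972. */
.video-card .title { overflow-wrap: anywhere; }
```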