Devlog

NaughtyAPI hides 61 VR scenes in two endpoints nobody documents

By Mortimer Cockburn, April 22, 2026 (2 min read)

LTI Digital, the parent company of Naughty America and a handful of other networks, has a public JSON API at api.naughtyapi.com. It is one of the saner integrations we have. You can ask ?type=vr and you get back a clean list of around 1,350 VR scenes with thumbnails, runtimes, actor lists, and tags. We had this importer working on day one of the integration push.

Then a few weeks in, an actor's page on PornBoxd was missing two scenes that her IAFD entry clearly listed under a NaughtyAmerica VR title. I went looking. It turns out NaughtyAPI also exposes ?type=rtvr (24 scenes) and ?type=rsvr (37 scenes), and these lists do not overlap with ?type=vr at all. They are separate streams, hidden behind acronyms the public documentation never explains. As far as I can tell, "rt" stands for "real time" and "rs" for "rooms", but nobody at LTI Digital felt the need to write that down anywhere.

61 missing scenes across the two streams. A drop in the bucket against 1,350, but every absent scene is a hole in the catalog that someone arriving from a Google search will notice. Fixed by importing all three types, then routing them all to the same studio row at the writer level. No dedup needed because, as established, the streams do not overlap.
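The three-stream import can be sketched roughly as follows. This is a minimal illustration, not the actual importer: `fetch_scenes` is a hypothetical caller-supplied function that returns the scene list for one `?type=` value, the `id` field on each scene is an assumption about the response shape, and `studio_slug` is a made-up key for the shared studio row.

```python
from typing import Callable, Iterable

# The three ?type= values discussed above.
VR_TYPES = ("vr", "rtvr", "rsvr")


def collect_vr_scenes(
    fetch_scenes: Callable[[str], Iterable[dict]],  # hypothetical fetcher
    studio_slug: str = "naughty-america-vr",        # hypothetical studio key
) -> list[dict]:
    """Merge all three streams and tag every scene with the same studio."""
    seen_ids: set = set()
    merged: list[dict] = []
    for scene_type in VR_TYPES:
        for scene in fetch_scenes(scene_type):
            # Assumes each scene dict has a unique "id". The streams were
            # observed not to overlap, so this is a tripwire, not dedup.
            if scene["id"] in seen_ids:
                raise ValueError(
                    f"scene {scene['id']} appears in more than one stream"
                )
            seen_ids.add(scene["id"])
            merged.append(
                {**scene, "source_type": scene_type, "studio": studio_slug}
            )
    return merged
```

Raising on an overlap instead of silently skipping it is a design choice in this sketch: if the "streams do not overlap" observation ever stops being true, the import fails loudly rather than quietly inserting duplicates.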

The funnier story from the same batch was the Tier 2.5 alias guard.

When the writer (the part of the importer that actually inserts rows into Postgres) sees an actor name on a new scene, it tries to match against the existing actor catalog before creating a new row. Tier 1 is exact slug match. Tier 2 is content-hash match using studio + lowercased name, which catches re-scrapes of the same studio. Tier 2.5, added a couple of weeks ago, was an alias overlap match: if the incoming name is in the canonical actor's aliases array, treat it as the same person.
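As a rough sketch, the tier cascade described above, before any guard was added, might look like this. Everything here is illustrative: the `Actor` shape, `slugify`, `content_hash`, and `match_actor` are hypothetical names, the SHA-1 hash is a stand-in for whatever the real content hash is, and alias comparison is assumed to be exact and case-insensitive.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import Optional


def slugify(name: str) -> str:
    # Hypothetical slug rule: lowercase, non-alphanumerics collapsed to "-".
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def content_hash(studio: str, name: str) -> str:
    # Stand-in for the real hash: studio + lowercased name.
    return hashlib.sha1(f"{studio}|{name.lower()}".encode()).hexdigest()


@dataclass
class Actor:
    name: str
    slug: str
    hashes: set[str] = field(default_factory=set)
    aliases: list[str] = field(default_factory=list)


def match_actor(
    incoming: str, studio: str, catalog: list[Actor]
) -> Optional[tuple[Actor, str]]:
    """Return (actor, tier) for the first tier that matches, else None."""
    slug = slugify(incoming)
    for actor in catalog:  # Tier 1: exact slug match
        if actor.slug == slug:
            return actor, "tier1"
    digest = content_hash(studio, incoming)
    for actor in catalog:  # Tier 2: content-hash match
        if digest in actor.hashes:
            return actor, "tier2"
    lowered = incoming.lower()
    for actor in catalog:  # Tier 2.5: unguarded alias membership
        if lowered in (alias.lower() for alias in actor.aliases):
            return actor, "tier2.5"
    return None
```

With a catalog entry such as `Actor("Maria Lopez", "maria-lopez", aliases=["Maria"])`, an incoming name of just "Maria" falls through Tier 1 and Tier 2 and lands on that actor at Tier 2.5, whoever the incoming "Maria" actually is.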

This sounds reasonable until you remember that some actor pages have terrible aliases attached from earlier scrapes. A single token like "Maria", a misspelling, or worse, a tag that got captured as a name. When a new scene came in for a different "Maria", Tier 2.5 would happily route it onto the wrong actor. We had 67 such collisions waiting to fire.

The fix is two guards stacked on top of the alias match: the incoming name must be at least two tokens, and there must be at least one shared token between the incoming name and the canonical's main name. So an alias of just "Maria" stops matching anything that is not already a "Maria Something". An alias of "Maria Cherry" stops matching scenes for "Cherry Wood". That defangs all 67 collisions without us having to actually clean the bad aliases. Some of those aliases are still wrong, but they are harmless now.
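The two guards can be sketched as a single predicate. This is an assumption-laden illustration rather than the real writer code: `alias_match_allowed` and `name_tokens` are hypothetical names, tokens are assumed to be whitespace-separated and compared case-insensitively, and the underlying alias check is assumed to be exact membership in the aliases array.

```python
def name_tokens(name: str) -> set[str]:
    # Assumption: a "token" is a whitespace-separated, lowercased word.
    return set(name.lower().split())


def alias_match_allowed(
    incoming: str, canonical_name: str, aliases: list[str]
) -> bool:
    """Tier 2.5 with both guards applied."""
    # Base rule: the incoming name must appear in the aliases array.
    if incoming.lower() not in {alias.lower() for alias in aliases}:
        return False
    # Guard 1: the incoming name must have at least two tokens.
    if len(incoming.split()) < 2:
        return False
    # Guard 2: at least one token shared with the canonical main name.
    return bool(name_tokens(incoming) & name_tokens(canonical_name))


# A lone "Maria" alias can no longer capture a single-token incoming name.
print(alias_match_allowed("Maria", "Maria Lopez", ["Maria"]))
# A two-token alias that shares "maria" with the main name still matches.
print(alias_match_allowed("Maria L", "Maria Lopez", ["Maria L"]))
# A stray alias with no token in common with the main name is rejected.
print(alias_match_allowed("Cherry Wood", "Maria Lopez", ["Cherry Wood"]))
```

The bad aliases stay in the array in this sketch; the predicate only decides whether a given alias hit is trusted enough to route a scene.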

I do not know why I keep finding these. The sensible reading is that the catalog is just large enough that bad legacy data has surface area now, and every operation at the seam between two source-of-truth systems will eventually trip it. The less sensible reading is that I am paying down debt I added in February when I was too excited about making the alias system work at all.