
SnoopJ
@SnoopJ@hachyderm.io

does present me with an interesting conundrum of what I will do when I aggregate the info from all of these into a single stream that other people can consume… not sure if people will want info about non-movies from a project mostly focusing on movies, but OTOH I'm already doing the necessary scraping work for them…


SnoopJ
@SnoopJ@hachyderm.io

The API backing Apple Cinemas has several eyebrow-raisers:

* TLS fingerprinting by CloudFlare is enabled
* Typos in the API (Location vs. Loction)
* Redundant/ignored query parameters (movieID vs actualMovidId, end datetime for a range query does not matter, does not even need to be after start datetime)

This one is, I think, going to require me to send 1 + Nmovies*Ndays requests
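
A rough sketch of where that count comes from, assuming one listing request to get the movies and then one showtime query per movie per day. The function name, movie IDs, and dates here are made up for illustration and are not taken from the real API:

```python
from datetime import date, timedelta


def plan_requests(movie_ids, start, n_days):
    """Return one (movie_id, day) pair per showtime query that would be sent."""
    days = [start + timedelta(days=i) for i in range(n_days)]
    return [(movie_id, day) for movie_id in movie_ids for day in days]


# Hypothetical inputs: in practice the IDs would come from the single listing request.
movies = ["m1", "m2", "m3"]
queries = plan_requests(movies, date(2025, 8, 20), 5)

# 1 listing request + Nmovies * Ndays showtime queries
total_requests = 1 + len(queries)
```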

SnoopJ
@SnoopJ@hachyderm.io

Writing up my provider code for Apple Cinemas and find myself writing the following in a comment explaining that the query route ignores the end date:

# We'll set this parameter "right" anyway, as a prayer for that messy API's soul.

SnoopJ
@SnoopJ@hachyderm.io

caching N*M web requests is mildly annoying but the alternative is to cache after I've started to munge things and I kinda hate doing that in an application like this.

it's easy enough to cache by filename with
{actualMovieID}_{query_start_date}_{query_performed_date}.json, just annoying
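
Something like the following would build that filename. This is only a sketch: the helper name and the choice of ISO-formatted dates are assumptions, since the posts only give the overall pattern:

```python
from datetime import date
from pathlib import Path


def cache_path(cache_dir, actual_movie_id, query_start, query_performed):
    """Build {actualMovieID}_{query_start_date}_{query_performed_date}.json."""
    # Assumption: dates are written in ISO format (YYYY-MM-DD).
    name = (
        f"{actual_movie_id}_"
        f"{query_start.isoformat()}_"
        f"{query_performed.isoformat()}.json"
    )
    return Path(cache_dir) / name
```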

SnoopJ
@SnoopJ@hachyderm.io

I probably need a caching implementation that I can re-use between providers. I keep re-writing the simple parts of that right where it's needed.

SnoopJ
@SnoopJ@hachyderm.io

that N*M is 155, by the way. not really enough to justify being annoyed about it, but enough to be annoying

SnoopJ
@SnoopJ@hachyderm.io

but spewing 155 files to disk every day, on the other hand, that's another level of obnoxious. It's only 3 MB of data (on the other hand, it's 3 MB of data!) but yeah, maybe I should collate these requests and serialize the output of the function with the loop that creates them.

SnoopJ
@SnoopJ@hachyderm.io

In which I forget to de-dupe

SnoopJ
@SnoopJ@hachyderm.io

enhance

SnoopJ
@SnoopJ@hachyderm.io

In addition to their API being… idiosyncratic… the South Asian showings at Apple Cinemas are also an interesting edge case. These are separate showings and would not fit into the false dichotomy of dub/sub that one might be tempted to adopt for this domain:

Coolie (Tamil)
Coolie (Telugu)
Coolie (Hindi)

I could see the case for collapsing those to a single listing (especially if the languages could be combined) but I don't think I will bother for now; it's not as big a problem for the reader as mass-market movies are

SnoopJ
@SnoopJ@hachyderm.io

okay, I have done the needful and now have a re-usable cache mechanism, so I can stop writing that half-a-dozen lines again and again

I settled on being okay with caching the "showings by date" internal structure that is the output of each provider before they go to the global gather. It's discarding a lot of data from the cache, but that's… fine.

The cache is really just there so we don't hammer the upstreams, and if I really want HTTP-layer caching, I can go get someone else's solution for it and plug that in.
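
A minimal sketch of what a re-usable, per-provider cache along these lines could look like. It assumes the "showings by date" structure is a JSON-friendly dict keyed by ISO date and that one cache file per provider per day is enough. All names here are hypothetical, not the actual implementation:

```python
import json
from datetime import date
from pathlib import Path
from typing import Callable

# Assumed shape: ISO date string -> list of showing records
Showings = dict[str, list[dict]]


def cached_showings(
    cache_dir: Path,
    provider: str,
    today: date,
    fetch: Callable[[], Showings],
) -> Showings:
    """Return today's showings for a provider, hitting upstream at most once per day."""
    path = Path(cache_dir) / f"{provider}_{today.isoformat()}.json"
    if path.exists():
        # Cache hit: no web requests at all.
        return json.loads(path.read_text(encoding="utf-8"))
    # Cache miss: do the real scraping, then store the already-munged result.
    showings = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(showings), encoding="utf-8")
    return showings
```

Each provider would only need to hand over a zero-argument `fetch` callable, which keeps the half-a-dozen lines of cache bookkeeping in one place.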

SnoopJ
@SnoopJ@hachyderm.io

several seconds later: ah, crap, I just realized that I'm missing the serialization of one of those types

SnoopJ
@SnoopJ@hachyderm.io

well, fixing the caching took a while, but now that's sorted… I hope…

In the process I also confirmed my theory from yesterday that Regent Theatre's API nonce changes on a daily basis. Thankfully, it's in the HTML served by the schedule root page, so it's just another web request and a pattern-match.

There is I guess potential for TOCTOU with that nonce if the program execution crosses the day boundary, but that's easy: I will "just" put this program on a timer that avoids that problem :D
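
The "web request and a pattern-match" part might look roughly like this. The regex and the shape of the embedded nonce are guesses for illustration only, since the posts do not show how the schedule page actually embeds it:

```python
import re

# Assumption: the page embeds something like {"nonce": "a1b2c3"} in a script tag.
NONCE_PATTERN = re.compile(r'"nonce"\s*:\s*"([0-9a-f]+)"')


def extract_nonce(html: str) -> str:
    """Pull the (hypothetically formatted) nonce out of the schedule page HTML."""
    match = NONCE_PATTERN.search(html)
    if match is None:
        raise ValueError("no nonce found in schedule page")
    return match.group(1)
```

Fetching the page and extracting the nonce at the start of every run, then using it immediately, keeps the window for a stale nonce small.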

SnoopJ
@SnoopJ@hachyderm.io

Coolidge Corner's film pages are taking a snooze a lot. I guess this is the point where I am willing to send them an email so some computer-toucher can have a look at what's going on there

SnoopJ
@SnoopJ@hachyderm.io

hmm, or maybe it's only some of the pages for the Kurosawa series?

what the heck, I'll see if the issue keeps happening once the series is open, and then I'll reach out if it still is

SnoopJ
@SnoopJ@hachyderm.io

yet another example of this calendar doing EXACTLY what I want it to do:

cool, there's a showing of ASHES OF TIME on Sunday. CHUNGKING EXPRESS is the only work of Wong Kar-wai that I've seen (it's also showing, on Saturday) but I would love to experience more of his work and scratch the martial-arts movie itch

SnoopJ
@SnoopJ@hachyderm.io

it's 35mm projection too :D

SnoopJ
@SnoopJ@hachyderm.io

hmm, uh oh, looks like I'm missing some showings for Somerville Theatre though

well, that's a problem for tomorrow-Snoop, I guess

SnoopJ
@SnoopJ@hachyderm.io

OR MAYBE NOW-SNOOP

I think I figured it out

SnoopJ
@SnoopJ@hachyderm.io

yep, I figured it out, I was discarding shows that had already appeared on other days rather than just consolidating repeated showings of the same movie on the same day

easy enough fix: change the 'seen' key to a tuple of (date, film_id) instead of just film_id
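
A sketch of the fixed consolidation, assuming each raw showing is a dict with `date`, `film_id`, and `time` fields (those field names are hypothetical):

```python
def consolidate(showings):
    """Merge repeated showings of the same film on the same day into one listing."""
    seen = {}
    for showing in showings:
        # Keying on film_id alone would drop a film's showings on later days;
        # (date, film_id) only merges repeats within a single day.
        key = (showing["date"], showing["film_id"])
        if key not in seen:
            seen[key] = {
                "date": showing["date"],
                "film_id": showing["film_id"],
                "times": [],
            }
        seen[key]["times"].append(showing["time"])
    return list(seen.values())
```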

SnoopJ
@SnoopJ@hachyderm.io

Another bug: looks like Alamo lists private events in their JSON API, so I'll have to put in a special rejection filter for those
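
A rejection filter for that could be as small as the sketch below. How Alamo's JSON actually marks a private event is not stated here, so the `title` field and the "private event" marker are placeholders, not the real schema:

```python
def is_private(event: dict) -> bool:
    """Guess whether a listing is a private event (placeholder heuristic)."""
    # Assumption: private events can be spotted by their title text.
    return "private event" in event.get("title", "").lower()


def public_events(events):
    """Drop listings that look like private events."""
    return [event for event in events if not is_private(event)]
```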

SnoopJ
@SnoopJ@hachyderm.io

I wonder what happens if you buy a ticket for a listed private event (it does seem to let you do this)

SnoopJ
@SnoopJ@hachyderm.io

I mean, I'm sure they would turn you away if you tried to just show up, but like… would someone notice? would it error during transaction?

just stepped ankle-deep into someone's edge case there