A more immediate problem that wants solving is looking out further than a week. Many of these data streams have data that far out, although the listings for many of the more interesting cinemas fall off faster.
wild to me how much javascript is on these sites to do so very much nothing
single script out of a dozen on one page at 4000 lines when prettified, like… buddy, my thing is currently 624 lines total and I cannot imagine it would take more than 1000 lines of JS to make it REAL schmick
I know that comparison on LoC is vague at best and I know that webtech exists to serve ads and everything else is a side effect
but goddamn
wow, Regent Theatre's use of a commercial offering called EventON to store their event data really made writing a provider for their events quite a nuisance
features include:
* shocking number of form fields (may be PHP/WP data?)
* date range fields that are ignored when satisfying the request
* multiple nonces re-used for every request
* serving HTML over JSON
* putting more information in that HTML than the sibling JSON metadata
* inability to link to a particular month in the on-page calendar schedule
* no event pages to link to :(
most of those are the software's fault, although the last one feels like the theatre gets some credit. oh well, I'll link to the main schedule page and the user can figure it out, the calendar does tell them what day the event is on
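Since the endpoint ignores the date range fields, one option is to ask for whatever it returns and trim to the wanted window on my side. A minimal sketch, assuming the HTML has already been parsed into hypothetical dicts like `{"title": ..., "start": "YYYY-MM-DD"}` (that shape is mine, not EventON's):

```python
from datetime import date


def within_window(events, start, end):
    """Keep only events whose start date falls in [start, end].

    The server ignores the date range in the request, so the filtering has to
    happen client-side. Assumes each event is a dict with an ISO "start" date
    (a hypothetical shape produced by an earlier HTML-parsing step).
    """
    kept = []
    for event in events:
        day = date.fromisoformat(event["start"])
        if start <= day <= end:
            kept.append(event)
    return kept
```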
but in the end, I won, their movies now appear on the calendar
does present me with an interesting conundrum of what I will do when I aggregate the info from all of these into a single stream that other people can consume… not sure if people will want info about non-movies from a project mostly focusing on movies, but OTOH I'm already doing the necessary scraping work for them…
The API backing Apple Cinemas has several eyebrow-raisers:
* TLS fingerprinting by CloudFlare is enabled
* Typos in the API (Location vs. Loction)
* Redundant/ignored query parameters (movieID vs. actualMovidId; the end datetime for a range query does not matter and does not even need to be after the start datetime)
I think this one is going to require me to send 1 + N_movies * N_days requests
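Roughly where that count comes from, as a sketch: one request for the movie list, then one query per (movie, day) pair because the range query can't be trusted to span more than its start date. The function names are hypothetical:

```python
from datetime import date, timedelta
from itertools import product


def showtime_queries(movie_ids, first_day, n_days):
    """One query per (movie, day) pair, since the range query ignores its end date."""
    days = [first_day + timedelta(days=i) for i in range(n_days)]
    return list(product(movie_ids, days))


def total_requests(movie_ids, n_days):
    # 1 request to fetch the movie list, plus N_movies * N_days showtime queries.
    return 1 + len(movie_ids) * n_days
```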
Writing up my provider code for Apple Cinemas, I find myself writing the following in a comment explaining that the query route ignores the end date:
# We'll set this parameter "right" anyway, as a prayer for that messy API's soul.
caching N*M web requests is mildly annoying but the alternative is to cache after I've started to munge things and I kinda hate doing that in an application like this.
it's easy enough to cache by filename with {actualMovieID}_{query_start_date}_{query_performed_date}.json, just annoying
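A minimal sketch of that filename scheme plus a check-then-fetch wrapper. `fetch` is a hypothetical stand-in for whatever actually hits the API:

```python
import json
from datetime import date
from pathlib import Path


def cache_path(cache_dir, actual_movie_id, query_start, performed):
    """Build {actualMovieID}_{query_start_date}_{query_performed_date}.json."""
    name = f"{actual_movie_id}_{query_start.isoformat()}_{performed.isoformat()}.json"
    return Path(cache_dir) / name


def cached_fetch(cache_dir, actual_movie_id, query_start, performed, fetch):
    """Return the cached response if present; otherwise call fetch() and store it."""
    path = cache_path(cache_dir, actual_movie_id, query_start, performed)
    if path.exists():
        return json.loads(path.read_text())
    data = fetch(actual_movie_id, query_start)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data
```

Putting the query-performed date in the name means each day's run gets a fresh set of files, while re-runs on the same day hit the cache.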
I probably need a caching implementation that I can re-use between providers. I keep re-writing the simple parts of that right where it's needed.
that N*M is 155, by the way. not really enough to justify being annoyed about it, but enough to be annoying
but spewing 155 files to disk every day, on the other hand, is another level of obnoxious. It's only 3 MB of data (on the other hand, it's 3 MB of data!) but yeah, maybe I should collate these requests and serialize them all at once from the function whose loop creates them.
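A sketch of the collated version: run every (movie, day) query inside one loop and write a single file per run date. The `apple_` filename prefix and the `fetch` callback are hypothetical:

```python
import json
from datetime import date
from pathlib import Path


def fetch_all_collated(cache_dir, queries, performed, fetch):
    """Run every (movie_id, day) query and store all results in one file per run date.

    queries: iterable of (movie_id, date) pairs.
    fetch: hypothetical callable (movie_id, date) -> JSON-serializable response.
    """
    path = Path(cache_dir) / f"apple_{performed.isoformat()}.json"
    if path.exists():
        return json.loads(path.read_text())
    collated = {}
    for movie_id, day in queries:
        # String keys so the whole dict round-trips through JSON unchanged.
        collated[f"{movie_id}_{day.isoformat()}"] = fetch(movie_id, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(collated))
    return collated
```

The trade-off: one file per day instead of 155, but a failure partway through the loop means nothing gets cached for that run.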
In which I forget to de-dupe
enhance
In addition to their API being… idiosyncratic… the South Asian showings at Apple Cinemas are also an interesting edge case. These are separate showings and would not fit into the false dichotomy of dub/sub that one might be tempted to adopt for this domain:
Coolie (Tamil)
Coolie (Telugu)
Coolie (Hindi)
I could see the case for collapsing those to a single listing (especially if the languages could be combined) but I don't think I will bother for now, it's not as big a problem for the reader as mass-market movies are
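One way to dodge the dub/sub trap is to treat spoken language as its own optional field. A sketch, assuming titles carry the language as a parenthesized suffix; the allow-list is hypothetical and exists so that suffixes like "(35mm)" aren't mistaken for languages:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-list; a suffix not in here stays part of the title.
KNOWN_LANGUAGES = {"Tamil", "Telugu", "Hindi"}
SUFFIX = re.compile(r"^(?P<title>.*?)\s*\((?P<tag>[^()]+)\)$")


@dataclass(frozen=True)
class Showing:
    title: str
    language: Optional[str] = None  # spoken language, if the listing states one


def parse_listing(raw):
    """Split "Coolie (Tamil)" into title and language; leave other titles alone."""
    m = SUFFIX.match(raw.strip())
    if m and m.group("tag") in KNOWN_LANGUAGES:
        return Showing(m.group("title"), m.group("tag"))
    return Showing(raw.strip())
```

Keeping the language separate leaves the three Coolie showings distinct for now, while still making it easy to group them by title later if collapsing ever seems worth it.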
okay, I have done the needful and now have a re-usable cache mechanism, so I can stop writing that half-a-dozen lines again and again
I settled on being okay with caching the "showings by date" internal structure that is the output of each provider before they go to the global gather. It's discarding a lot of data from the cache, but that's… fine.
The cache is really just there so we don't hammer the upstreams, and if I really want HTTP-layer caching, I can go get someone else's solution for it and plug that in.
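A minimal sketch of what a re-usable provider-output cache could look like: a decorator that stores a provider's return value as JSON, at most one upstream run per day. The decorator name, the file layout, and the assumption that the output is already JSON-friendly are all mine; note that it ignores the wrapped function's arguments when building the key:

```python
import functools
import json
from datetime import date
from pathlib import Path


def daily_cache(cache_dir, name, today=date.today):
    """Cache a provider's (JSON-serializable) output once per calendar day.

    The cache key is only (name, today's date); arguments to the wrapped
    function are NOT part of the key. `today` is injectable for testing.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            path = Path(cache_dir) / f"{name}_{today().isoformat()}.json"
            if path.exists():
                return json.loads(path.read_text())
            result = fn(*args, **kwargs)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(json.dumps(result))
            return result
        return wrapper
    return decorator
```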
several seconds later: ah, crap, I just realized that I'm missing the serialization of one of those types
well, fixing the caching took a while, but now that's sorted… I hope…
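A common shape for that fix in Python, sketched here rather than taken from my code: a `default=` hook for `json.dumps` that handles the non-JSON types and raises loudly on anything it doesn't recognize, so a missing type shows up at write time instead of as a weird cache file:

```python
import dataclasses
import json
from datetime import date, datetime


def to_jsonable(obj):
    """json.dumps default= hook for types the json module can't encode natively."""
    if isinstance(obj, (date, datetime)):  # datetime is a subclass of date
        return obj.isoformat()
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        # asdict recurses into nested dataclasses; any dates inside come back
        # through this hook when json encodes the resulting dict.
        return dataclasses.asdict(obj)
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)
    raise TypeError(f"don't know how to serialize {type(obj).__name__}")
```

One catch for a "showings by date" structure: `default=` is never consulted for dict keys, so `date` keys still raise `TypeError` and need converting first, e.g. `json.dumps({d.isoformat(): v for d, v in showings.items()}, default=to_jsonable)`.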
In the process I also confirmed my theory from yesterday that Regent Theatre's API nonce changes on a daily basis. Thankfully, it's in the HTML served by the schedule root page, so it's just another web request and a pattern-match.
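A sketch of the "another web request and a pattern-match" step. The key name and the assumption that the nonce sits in inline JS as `"key":"value"` are both hypothetical; the real markup on the schedule page may differ:

```python
import re
import urllib.request


def extract_nonce(html, key="nonce"):
    """Pull a nonce out of inline JS like {"nonce":"a1b2c3d4e5"}.

    The key name and quoting style are assumptions; check the real page source.
    """
    pattern = re.compile(r'"' + re.escape(key) + r'"\s*:\s*"([0-9A-Za-z]+)"')
    m = pattern.search(html)
    if m is None:
        raise ValueError(f"no {key!r} found; has the page markup changed?")
    return m.group(1)


def fetch_nonce(schedule_url, key="nonce"):
    # One extra request per run: grab the schedule root page, then pattern-match it.
    with urllib.request.urlopen(schedule_url, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_nonce(html, key)
```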
There is I guess potential for TOCTOU with that nonce if the program execution crosses the day boundary, but that's easy: I will "just" put this program on a timer that avoids that problem :D
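Short of the timer trick, one cheap mitigation is to remember which day a nonce was fetched on and refetch once the date changes. A sketch, assuming the nonce rotates on the local calendar date (the actual rotation time and timezone are unknown); it narrows the window but doesn't fully close it, since the date can still roll over between `get()` and the request that uses it:

```python
from datetime import date


class DailyNonce:
    """Hold a nonce plus the date it was fetched; refetch when the date changes.

    fetch: zero-argument callable returning a fresh nonce (hypothetical).
    today: injectable clock, defaults to date.today.
    """

    def __init__(self, fetch, today=date.today):
        self._fetch = fetch
        self._today = today
        self._day = None
        self._nonce = None

    def get(self):
        day = self._today()
        if day != self._day:
            self._nonce = self._fetch()
            self._day = day
        return self._nonce
```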
Coolidge Corner's film pages are taking a snooze a lot. I guess this is the point where I am willing to send them an email so some computer-toucher can have a look at what's going on there
hmm, or maybe it's only some of the pages for the Kurosawa series?
what the heck, I'll see if the issue keeps happening once the series is open, and then I'll reach out if it still is
yet another example of this calendar doing EXACTLY what I want it to do:
cool, there's a showing of ASHES OF TIME on Sunday. CHUNGKING EXPRESS is the only work of Wong Kar-wai that I've seen (it's also showing, on Saturday) but I would love to experience more of his work and scratch the martial-arts movie itch
it's 35mm projection too :D
hmm, uh oh, looks like I'm missing some showings for Somerville Theatre though
well, that's a problem for tomorrow-Snoop, I guess
OR MAYBE NOW-SNOOP
I think I figured it out
yep, I figured it out, I was discarding shows that had already appeared on other days rather than just consolidating repeated showings of the same movie on the same day
easy enough fix, change the 'seen' key to a tuple of (date, film_id) instead of just film_id
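Roughly what the fixed consolidation looks like, using hypothetical showing dicts with `date`, `film_id` and `time` keys: repeats on the same day merge into one entry, while the same film on another day keeps its own entry instead of being dropped:

```python
def consolidate(showings):
    """Group showings by (date, film_id), collecting their times.

    Keying on film_id alone would drop every day after the first one a film
    appears on; keying on the (date, film_id) tuple only merges same-day repeats.
    """
    grouped = {}  # dicts keep insertion order, so output follows input order
    for s in showings:
        key = (s["date"], s["film_id"])
        if key not in grouped:
            grouped[key] = {"date": s["date"], "film_id": s["film_id"], "times": []}
        grouped[key]["times"].append(s["time"])
    return list(grouped.values())
```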
Another bug: looks like Alamo lists private events in their JSON API, so I'll have to put in a special rejection filter for those
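A sketch of that rejection filter. Both checks are guesses about the feed's shape, not Alamo's documented schema: a boolean `isPrivate` flag if one exists, otherwise a title-based fallback:

```python
def is_private(event):
    # Hypothetical field name; swap in whatever the JSON actually marks private events with.
    if event.get("isPrivate") is True:
        return True
    # Fallback guess: private bookings tend to be titled as such.
    return "private event" in event.get("title", "").lower()


def drop_private(events):
    """Return only the events that don't look private."""
    return [e for e in events if not is_private(e)]
```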
I wonder what happens if you buy a ticket for a listed private event (it does seem to let you do this)
I mean, I'm sure they would turn you away if you tried to just show up, but like… would someone notice? would it error during transaction?
just stepped ankle-deep into someone's edge case there