ah, right, Alamo's data extends fairly far out, and my logic is not really prepared to deal with data that repeats a day-of-month
I'm settling in pretty strongly to the week-oriented format, so it's not a problem to do something crude and just discard data from more than a week out
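A minimal sketch of that crude cutoff, assuming each showing is a hypothetical `(title, date)` pair with a `datetime.date`. Anything on or after `today + 7 days` is dropped, so no two kept showings can share a day-of-month.

```python
from datetime import date, timedelta

# Hypothetical shape: (title, day of showing)
Showing = tuple[str, date]

def within_week(showings: list[Showing], today: date, days: int = 7) -> list[Showing]:
    """Drop showings on or after today + `days`; earlier ones are kept untouched."""
    horizon = today + timedelta(days=days)
    return [s for s in showings if s[1] < horizon]
```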
looks right now
JavaScript has entered the chat
the JavaScript crimes have inspired more JavaScript crimes and now there is a dumb per-title filter
those probably belong in a modal but I do not want to think about a UI that actually looks nice more than I already have here
I just want it to do the thing it's built to do as rapidly as possible, and I know the next thing I need aside from more data hookups is the ability to filter away things like F1® THE MOVIE which are strictly noise for my purposes
Data provider for Somerville Theatre added. Starting to feel more and more like a real tool.
And now another provider for Landmark Kendall Square Cinema, which proved to expose an edge case when we don't have any data for a cinema on a given day.
Their B2B vendor's data is very thorough but also a pain in the ass to retrieve.
I have wired up yet another data provider, this time for The Brattle.
I have never actually been there, but in a continuation of the theme where this tool is helping me, I have learned about some neat showings they are doing Soon™
(namely, DR. STRANGELOVE and THE THING, both movies I'd love to see again in a theatre)
I feel like this is a good point in this project to ask:
What other #Boston-area theatres should I know about that do appreciable numbers of special film screenings?
So far, I'm keeping track of:
Coolidge Corner Theatre
Somerville Theatre
The Brattle
Alamo Drafthouse
Landmark Kendall Square Cinema
But what else should I know about? I am probably going to add The Capitol and Apple Cinemas to this list.
Anything reachable by the T/commuter rail or bike from downtown is of interest.
#SomervilleMA #CambridgeMA
Still mulling the problem of how to make it easy to dismiss mass-market stuff when the reader of the calendar considers that noise.
Maybe it makes sense to rank titles by their frequency in the filter checkbox? But then it's harder to scan for the one you want. Would a search bar alleviate that problem?
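One way to try the frequency ranking, sketched in Python and assuming the ordering is computed wherever the calendar data gets generated: count showings per title and sort by that count, breaking ties alphabetically so equal-frequency titles stay scannable. `titles_by_frequency` is a hypothetical helper, and the input is assumed to be one title string per showing.

```python
from collections import Counter

def titles_by_frequency(titles: list[str]) -> list[str]:
    """Distinct titles, most showings first; ties are broken alphabetically."""
    counts = Counter(titles)
    return sorted(counts, key=lambda t: (-counts[t], t))
```

With this ordering the likely-noise titles cluster at the top of the checkbox list, which is the part you'd want to untick anyway; the alphabetical tiebreak is what keeps the long tail of one-off screenings findable.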
My dream here is a "hide new megacorp releases" button but it's a tough nut to crack.
Especially since I'm not opposed to showing new releases on this calendar, it's just that they aren't really the goal.
Like, I wanna see WEAPONS, so this will probably be useful to select a showing of WEAPONS to attend. But I'm not primarily going to be looking at this calendar to find showings like that.
But it does seem reasonable to assume that anything sufficiently in thrall to Money Bastards will appear multiple times in the data.
The main problem is that I would need to retain a window into the past because I don't want F1® to show up as not-a-new-mass-market-release on its last day of screening because of window effects.
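A sketch of that heuristic with a retained look-back, assuming showings are hypothetical `(title, date)` pairs and that "mass market" means "screened on at least `min_days` distinct days". Both thresholds are made-up placeholders. Counting from `lookback_days` ago onward (including upcoming days) means a title's final day of its run still carries its earlier history.

```python
from collections import defaultdict
from datetime import date, timedelta

def likely_mass_market(
    showings: list[tuple[str, date]],
    today: date,
    lookback_days: int = 14,  # placeholder: how much past data is retained
    min_days: int = 5,        # placeholder: distinct screening days to count as "everywhere"
) -> set[str]:
    """Titles screened on >= min_days distinct days from (today - lookback_days) onward."""
    start = today - timedelta(days=lookback_days)
    days_seen: dict[str, set[date]] = defaultdict(set)
    for title, day in showings:
        if day >= start:
            days_seen[title].add(day)
    return {title for title, days in days_seen.items() if len(days) >= min_days}
```

Setting `lookback_days=0` reproduces the window effect: on its last day, a long-running release looks like a one-off.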
Feel like I'm overthinking this but I haven't gotten to the "oh, right, duh" part yet.
A more immediate problem that wants solving is looking out further than a week. Many of these data streams go that far out, although the schedules for many of the more interesting cinemas run out sooner.
wild to me how much javascript is on these sites to do so very much nothing
a single script out of a dozen on one page is 4000 lines when prettified, like… buddy, my thing is currently 624 lines total and I cannot imagine it would take more than 1000 lines of JS to make it REAL schmick
I know that comparison on LoC is vague at best and I know that webtech exists to serve ads and everything else is a side effect
but goddamn
wow, Regent Theatre's use of a commercial offering called EventON to store their event data really made writing a provider for their events quite a nuisance
features include:
* shocking number of form fields (may be PHP/WP data?)
* date range fields that are ignored when satisfying the request
* multiple nonces re-used for every request
* serving HTML over JSON
* putting more information in that HTML than the sibling JSON metadata
* inability to link to a particular month in the on-page calendar schedule
* no event pages to link to :(
most of those are the software's fault, although the last one feels like the theatre gets some credit. oh well, I'll link to the main schedule page and the user can figure it out, the calendar does tell them what day the event is on
but in the end, I won, their movies now appear on the calendar
does present me with an interesting conundrum of what I will do when I aggregate the info from all of these into a single stream that other people can consume… not sure if people will want info about non-movies from a project mostly focusing on movies, but OTOH I'm already doing the necessary scraping work for them…
The API backing Apple Cinemas has several eyebrow-raisers:
* TLS fingerprinting by CloudFlare is enabled
* Typos in the API (Location vs. Loction)
* Redundant/ignored query parameters (movieID vs actualMovidId, end datetime for a range query does not matter, does not even need to be after start datetime)
This one is I think going to require me to send 1 + N_movies * N_days requests
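A sketch of what that request plan looks like, with hypothetical operation names rather than the real Apple Cinemas routes: one listing call, whose result supplies `movie_ids`, then one showtimes call per (movie, day). The end of each range is set to the next day even though, per the thread, the upstream ignores it.

```python
from datetime import date, timedelta

def plan_requests(movie_ids: list[str], start: date, ndays: int) -> list[tuple]:
    """Hypothetical plan: 1 listing request + len(movie_ids) * ndays showtimes requests."""
    plan: list[tuple] = [("list_movies",)]  # movie_ids come from this call's response
    for movie_id in movie_ids:
        for offset in range(ndays):
            day = start + timedelta(days=offset)
            # Upstream ignores the range end, but set it sensibly anyway.
            plan.append(("showtimes", movie_id, day, day + timedelta(days=1)))
    return plan
```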
Writing up my provider code for Apple Cinemas and find myself writing the following in a comment explaining that the query route ignores the end date:
# We'll set this parameter "right" anyway, as a prayer for that messy API's soul.
caching N*M web requests is mildly annoying but the alternative is to cache after I've started to munge things and I kinda hate doing that in an application like this.
it's easy enough to cache by filename with {actualMovieID}_{query_start_date}_{query_performed_date}.json, just annoying
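The filename scheme above as a tiny helper, assuming ISO-formatted dates in the name (an assumption; the post doesn't say how dates are rendered). `cache_path` and `cache_dir` are hypothetical names.

```python
from datetime import date
from pathlib import Path

def cache_path(cache_dir: Path, movie_id: str, query_start: date, performed: date) -> Path:
    """{actualMovieID}_{query_start_date}_{query_performed_date}.json under cache_dir."""
    return cache_dir / f"{movie_id}_{query_start.isoformat()}_{performed.isoformat()}.json"
```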
I probably need a caching implementation that I can re-use between providers. I keep re-writing the simple parts of that right where it's needed.
that N*M is 155, by the way. not really enough to justify being annoyed about it, but enough to be annoying
but spewing 155 files to disk every day, on the other hand, that's another level of obnoxious. It's only 3 MB of data (on the other hand, it's 3 MB of data!) but yea, maybe I should collate these requests and serialize the function with the loop that creates them.
In which I forget to de-dupe
enhance
In addition to their API being… idiosyncratic… the South Asian showings at Apple Cinema are also an interesting edge case. These are separate showings and would not fit into the false dichotomy of dub/sub that one might be tempted to adopt for this domain:
Coolie (Tamil)
Coolie (Telugu)
Coolie (Hindi)
I could see the case for collapsing those to a single listing (especially if the languages could be combined) but I don't think I will bother for now, it's not as big a problem for the reader as mass-market movies are
okay, I have done the needful and now have a re-usable cache mechanism, so I can stop writing that half-a-dozen lines again and again
I settled on being okay with caching the "showings by date" internal structure that is the output of each provider before they go to the global gather. It's discarding a lot of data from the cache, but that's… fine.
The cache is really just there so we don't hammer the upstreams, and if I really want HTTP-layer caching, I can go get someone else's solution for it and plug that in.
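A minimal sketch of a re-usable, provider-level cache along those lines, assuming one JSON file per provider per day and that each provider's output is JSON-serializable. `cached_daily` and its parameters are hypothetical names, not the author's implementation.

```python
import json
from datetime import date
from pathlib import Path
from typing import Any, Callable

def cached_daily(cache_dir: Path, name: str, fetch: Callable[[], Any], today: date) -> Any:
    """Return today's cached output for provider `name`, calling `fetch` only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{name}_{today.isoformat()}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = fetch()  # hits the upstream at most once per provider per day
    path.write_text(json.dumps(result))
    return result
```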
several seconds later: ah, crap, I just realized that I'm missing the serialization of one of those types
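If the missing type is a `date` or `datetime` (an assumption; the post doesn't say which type it was), one common fix is a `default=` hook for `json.dumps` paired with an `object_hook` for `json.loads`, tagging values so they round-trip. The `__date__`/`__datetime__` tags are arbitrary choices for this sketch.

```python
import json
from datetime import date, datetime

def _encode(obj):
    """json.dumps fallback; check datetime first because it subclasses date."""
    if isinstance(obj, datetime):
        return {"__datetime__": obj.isoformat()}
    if isinstance(obj, date):
        return {"__date__": obj.isoformat()}
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")

def _decode(d: dict):
    if "__datetime__" in d:
        return datetime.fromisoformat(d["__datetime__"])
    if "__date__" in d:
        return date.fromisoformat(d["__date__"])
    return d

def dumps(obj) -> str:
    return json.dumps(obj, default=_encode)

def loads(s: str):
    return json.loads(s, object_hook=_decode)
```

Dict keys are a separate trap: `json.dumps` only accepts str, int, float, bool or None keys and never consults `default` for them, so a structure keyed by `date` needs its keys converted (e.g. with `isoformat()`) before dumping.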
well, fixing the caching took a while, but now that's sorted… I hope…
In the process I also confirmed my theory from yesterday that Regent Theatre's API nonce changes on a daily basis. Thankfully, it's in the HTML served by the schedule root page, so it's just another web request and a pattern-match.
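A sketch of the pattern-match step. The surrounding markup is a guess: it assumes the nonce shows up in an inline script as `"nonce":"<hex>"`, which may not match what the Regent Theatre page actually serves, so treat `NONCE_RE` as a placeholder.

```python
import re
from typing import Optional

# Placeholder pattern: assumes an inline script containing "nonce":"<hex digits>"
NONCE_RE = re.compile(r'"nonce"\s*:\s*"([0-9a-f]+)"')

def extract_nonce(html: str) -> Optional[str]:
    """Return the first nonce-looking value in the page, or None if absent."""
    match = NONCE_RE.search(html)
    return match.group(1) if match else None
```

Returning `None` rather than raising lets the provider log a clear "schedule page layout changed" error instead of failing later with a confusing API response.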
There is I guess potential for TOCTOU with that nonce if the program execution crosses the day boundary, but that's easy: I will "just" put this program on a timer that avoids that problem :D
Coolidge Corner's film pages are taking a snooze a lot. I guess this is the point where I am willing to send them an email so some computer-toucher can have a look at what's going on there
hmm, or maybe it's only some of the pages for the Kurosawa series?
what the heck, I'll see if the issue keeps happening once the series is open, and then I'll reach out if it still is
yet another example of this calendar doing EXACTLY what I want it to do:
cool, there's a showing of ASHES OF TIME on Sunday. CHUNGKING EXPRESS is the only work of Wong Kar-wai that I've seen (it's also showing, on Saturday) but I would love to experience more of his work and scratch the martial-arts movie itch
it's 35mm projection too :D
hmm, uh oh, looks like I'm missing some showings for Somerville Theatre though
well, that's a problem for tomorrow-Snoop, I guess
OR MAYBE NOW-SNOOP
I think I figured it out
yep, I figured it out, I was discarding shows that had already appeared on other days rather than just consolidating repeated showings of the same movie on the same day
easy enough fix, change the 'seen' key to a tuple of (date, film_id) instead of just film_id
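The same keying idea as a consolidation step, assuming hypothetical showing dicts with `film_id`, `date` and `time` fields: repeated showtimes of a film on one day merge into a single entry, while the same film on another day gets its own entry instead of being discarded as already seen.

```python
from datetime import date

def consolidate(showings: list[dict]) -> dict[tuple[date, str], list[str]]:
    """Merge showtimes per (date, film_id); keying on film_id alone would drop later days."""
    merged: dict[tuple[date, str], list[str]] = {}
    for s in showings:
        merged.setdefault((s["date"], s["film_id"]), []).append(s["time"])
    return merged
```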
Another bug: looks like Alamo lists private events in their JSON API, so I'll have to put in a special rejection filter for those
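A sketch of that rejection filter. How Alamo's JSON actually marks an event as private isn't stated, so the `"private"` key below is a placeholder; events without the key are kept.

```python
def drop_private_events(events: list[dict]) -> list[dict]:
    """Reject events flagged private; "private" is a placeholder field name."""
    return [e for e in events if not e.get("private", False)]
```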
I wonder what happens if you buy a ticket for a listed private event (it does seem to let you do this)
I mean, I'm sure they would turn you away if you tried to just show up, but like… would someone notice? would it error during transaction?
just stepped ankle-deep into someone's edge case there