Busy day on the national mall with the annual colorectal cancer awareness toilet seat data visualizations and new epstein titanic sculpture
A lot of great posts recently on this topic of "vibe coding as enabler," including the two linked here.
For me it's manifested as a way to scratch a digital project itch I've had for a decade, a data explorer for a set of Brewery directories from 1899 - 1918:
hadro.github.io/brewery-guid...
oh shit is "mario day"? let me fire up the emulator...
"Super Mario Sister (Asia) (En) (Pirate).nes"
We presented on our tool for enriching and clustering book data at Code4Lib today. Check it out, and let us know what you think!
data.post45.org/our-tools.html
Huge thanks to @thisismattmiller.com for leading development on this project.
#code4lib #c4l26
Roy Lichtenstein Catalogue Raisonné site got a serious terms and conditions, complete with auto scroll button before you can use it. Though at least it's online + free
I wrote a little Chrome extension to make animated WebP (“weppy”) files from a region of a webpage: chromewebstore.google.com/detail/weppy...
I use it when writing documentation and I want to show a short animation (in Github README for example). Simpler than a WebM video and more modern than GIF.
...state of the union? look at this bluesky quote post network explorer I just made. I added 6 networks so far:
thisismattmiller.github.io/bsky-quote-m...
Title page of book: FREAK TREES OF THE STATE OF NEW YORK The New York State College of Forestry Syracuse University NELSON C. BROWN Acting Dean 1930 Second Edition New York (State) College of Forestry, Syracuse.
Shows a strange tree: PRIZE WINNERS First Prize. G. W. Gotham, 89 River Street, Cortland, N. Y. Two elms, the larger tree appears to have absorbed the growth of the smaller tree. Trunk of large tree is bigger above the graft. 3
Shows two strange trees: PRIZE WINNERS Second Prize. C. B. Cox, Adams Center, N. Y. Elm, trunk runs along surface of earth in half circle 45 feet near Adams Center on North Harbor Road. Third Prize. A. Wilson Insley, 30 Eagle Street, Mt. Morris, N. Y. Elm, one mile south of Conesus Lake. 4
Shows two strange trees: PRIZE WINNERS Fourth Prize. George J. Wiedmaier, 222 King Street, Dunkirk, N. Y. Maple, 14 inches in diameter arched 7 feet and anchored in birch tree near Arkwright, N. Y. Fifth Prize. H. L. Tayntor, McGraw, N. Y. Double beech, near Homer, N. Y. Graft 18 feet in length and 8 inches in diameter. 5
Building my HathiTrust 1930 public domain survey and coming across interesting volumes... like the tree shaming
"Freak trees of the State of New York."
babel.hathitrust.org/cgi/pt?id=co...
Also a lot of photocopies of physical media
Image from Epstein files: yellow postit note on green background, black redaction bars
Grid paper with black redaction bars
Pink postit note with green background black redaction bars
Green postit note on white background black redaction bars
Some of these Epstein file redactions are very aesthetic. Reminds me of updates.timsherratt.org/2021/04/21/s...
Theoretically yes, the HathiTrust builds a local database. But the tool would need to be updated to know how to work with it, a new service would need to be added, it wouldn't work out of the box.
Diagram illustrating the BookReconciler workflow. On the left, a book cover of The Book of Salt by Monique Truong appears alongside “Minimal Metadata,” listing Author: Truong, Monique and Title: The Book of Salt. An arrow points to a box labeled “BookReconciler” with book and diamond icons. A downward arrow leads to “Enriched + Clustered Metadata,” showing multiple editions of the book cover and expanded metadata, including several ISBNs, subject headings (e.g., Vietnamese–France fiction, women authors, household employees, gay men, cooking), and an author VIAF identifier.
Very happy to introduce a new tool, BookReconciler!
You can take spreadsheets with book data and add subject headings, descriptions, ISBNs, HathiTrust IDs, & more. You can also cluster editions & variations of the same "Work."
Led by @thisismattmiller.com and supported by @post45data.bsky.social.
A hard problem with literary data is navigating btwn editions of books and the "work," or the theoretical text that unites all editions. I've been lucky to work with @thisismattmiller.com and @mellymeldubs.bsky.social, who built a tool to address this + do much more
arxiv.org/abs/2512.10165
www.google.com/maps/@47.232...
Example and analysis of how AI web scrapers are breaking small and medium cultural heritage sites.
A screen shot of the viz showing clustered email graphed across time by contact
Blog: Visualizing 14,000 Released Epstein Emails.
I built a viz of the emails released as part of the 20K House Oversight Committee docs.
thisismattmiller.com/post/email-v...
- A clustered high level view of the emails by contact across time
- Zoom into individual emails and open the sources
Thanks for checking it out!
LCNAF & Trie – Storing +11M unique names in 50MB data structure in the browser
thisismattmiller.com/post/lcnaf-t...
- Optimizing LCNAF authorized headings into a trie data structure
- In browser MARC file name reconciliation + search tool
- OpenRefine / Command line tools for reconciliation
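For anyone unfamiliar with the trie idea mentioned in the bullets above, here is a minimal sketch in Python. It is not the code from the blog post: the class name `NameTrie`, its method names, and the `example-id-*` identifiers are all hypothetical, and it skips the compression that would be needed to fit millions of names in a browser. It only shows the basic property that names with a common prefix share nodes.

```python
class NameTrie:
    """Minimal prefix tree; each node is a dict keyed by single characters."""

    # Multi-character sentinel key, so it can never collide with a character key.
    _END = "$end"

    def __init__(self):
        self.root = {}

    def insert(self, name, record_id):
        """Store a name, attaching an identifier at its final node."""
        node = self.root
        for ch in name:
            node = node.setdefault(ch, {})
        node[self._END] = record_id

    def lookup(self, name):
        """Return the identifier for an exact name, or None if absent."""
        node = self.root
        for ch in name:
            node = node.get(ch)
            if node is None:
                return None
        return node.get(self._END)

    def with_prefix(self, prefix):
        """Return every stored name that starts with the given prefix, sorted."""
        node = self.root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return []
        results = []
        stack = [(prefix, node)]
        while stack:
            text, current = stack.pop()
            for key, child in current.items():
                if key == self._END:
                    results.append(text)
                else:
                    stack.append((text + key, child))
        return sorted(results)


# Hypothetical sample data; the identifiers are placeholders, not real LCNAF IDs.
trie = NameTrie()
trie.insert("Truong, Monique", "example-id-1")
trie.insert("Truong, Nhu Tang", "example-id-2")
trie.insert("Twain, Mark", "example-id-3")
```

Exact-match reconciliation is then `trie.lookup("Truong, Monique")`, and search-as-you-type is `trie.with_prefix("Truong")`.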
Halloween blog post: Italian Giallo Horror Films
thisismattmiller.com/post/giallo/
- Using vision language model to analyze a 70 film corpus / 80,000 frames
- Build and plot “trope clusters” across movies
Probably the longest eye acting supercut you've seen: youtu.be/cGrmkOwut6k
Shows a county map of the United States; the counties with school districts with banned books are highlighted red.
A screenshot of the banned book browser interface showing rows of book covers.
New Post: PEN America Banned Books 2025 dataset
thisismattmiller.com/post/book-ba...
Looking at school district book bans
- Interactive Map interface to the books banned in 2024-2025
- A faceted browse interface to the 3700 books
- Subject heading analysis
New Blog Post.
Library of Congress & Flickr Commons: Analysis of user interactions on 40,000 images
thisismattmiller.com/post/lc-flic...
- Organizing 95K photo comments.
- Viewer to explore user georectified images
- Folksonomy tagging vs LCSH Vocabulary
- Placing into the Wiki* knowledge graph
One output, 1 hour 40mins of Siskel and Ebert summaries:
www.youtube.com/watch?v=hFLM...
Trying out workflows that use multimodal LLMs for validating and QA.
In this blog I walk through a test using 1000 Siskel and Ebert videos to extract key video frames and other data.
thisismattmiller.com/post/buildin...
A woodcut mashup image titled: maintenance
maintenance
New dataset on bestsellers from 40+ countries, with consistent coverage for France, Germany, Spain, Italy, and the U.S.
Congrats to the authors @sdileonardi.bsky.social, @beccacohen.bsky.social, and @dan-sinnamon.bsky.social on this major contribution!
doi.org/10.18737/386...
A screen capture from the Siskel and Ebert Show reviewing the movie Gremlins 2: The New Batch. Ebert gave it a thumbs down. Siskel gave it a thumbs down.
Gremlins 2: The New Batch (1990)
Director: Joe Dante
Cast: Phoebe Cates-Kline, Sylvester Stallone, Hulk Hogan, Zach Galligan, Christopher Lee
Watch Review
wp / wd
thisismattmiller.com/post/glitch/
New blog post about @glitch.com shutdown, how I migrated my apps, and how I used glitch for teaching and creative projects.
The Library of Congress BIBFRAME Update is online today at 1PM EDT.
Talks about:
- Hubs (BF ontology)
- BF Cataloging at Penn Libraries
- BF Validation Tooling
listserv.loc.gov/cgi-bin/wa?A...
Need a robots.txt directive indicating bulk download is available, not that they would abide by robots.txt
Yeah we have bots endlessly flooding id.loc.gov stressing servers to the limit trying to scrape millions of html pages even though we offer pretty much all of it as bulk downloads: id.loc.gov/download/