🤪
AI is autotune (and Beat Detective) for culture writ large. www.nytimes.com/2025/12/03/m...
Even if you're not a partner library, you might be curious about what it's like to work with GRIN. Our technical report has a wealth of details. arxiv.org/abs/2511.11447
We're also sharing the pipeline we developed for Institutional Books that seamlessly dedupes, classifies, and enhances the data once GRIN Transfer brings it down. www.institutional.org/tools
That's why we built GRIN Transfer: a tool for downloading collections, big or small. GRIN Transfer handles request batching, failure recovery, and data aggregation so that libraries can focus on using the data rather than simply gaining access to it. www.institutional.org/posts/grin-t...
We learned this lesson over the months it took to download 1M of Harvard Library's books for our Institutional Books release. As a result, many libraries have yet to take full advantage of the wonderful resources GRIN provides.
When libraries join Google Books, Google not only scans their books, it also makes a wealth of image, OCR, & metadata available to them via the Google Return Interface (GRIN). But working with GRIN can be challenging, so we're releasing a tool to make it easier. www.institutional.org/posts/grin-t...
Data is everything.
Amanda Watson, @institutionaldatainitiative.org's Library Chair and leader of Harvard Law School Library, spoke about the importance of publishing library collections as data to guide the future of AI. hls.harvard.edu/today/food-f...
With @yh-huang.bsky.social, I'm excited to share our Digital Collections Explorer, an open-source, multimodal viewer for digital collections! Users can search with both natural language inputs and reverse image search.
Paper: arxiv.org/abs/2507.00961
Public demo: digital-collections-explorer.com
This starts in an hour.
June 23rd at 12:45pm ET. RSVP here: harvard.zoom.us/meeting/regi...
This Monday, @institutionaldatainitiative.org will host Petr Knoth to share his experience leading CORE ("The world's largest collection of open access research papers") as the rise of AI brings new meaning, and challenges, to stewarding knowledge repositories. Join us virtually via the link below.
Cohosted by @institutionaldatainitiative.org and The Berkman Klein Center. harvard.zoom.us/webinar/regi...
Tomorrow, it's our pleasure to host @ayahbdeir.bsky.social to talk about the power of data in building an AI ecosystem that's open, transparent, and fair. 11am ET on June 17th. Register at the link below to attend virtually.
The @institutionaldatainitiative.org is proud to support The New Commons challenge. $100k grants along with mentorship. Let's get impactful data into the AI ecosystem.
To start the weekend, we've got a brand new experience for case law on CourtListener. It has better typography, more features and metadata, five million scanned decisions from @harvardlil.bsky.social, and a lot more. Read all about it and let us know what you think: free.law/2025/03/21/c...
The @institutionaldatainitiative.org at Harvard works with knowledge institutions to increase the availability, diversity, and responsible use of training data for AI. Reach out and join us.
Our goal is to develop methods and tools that can support expert staff at libraries everywhere, increasing the breadth of materials that can be digitized and the speed at which they're made accessible to the public. Learn more at BPL: www.bpl.org/news/boston-...
Together, we'll research opportunities to generate machine-readable representations of items, add searchable metadata, and begin the structuring of entire collections, all at the moment each item leaves the imaging station.
IDI and BPL are working to change this by collaborating at the outset of a large digitization project, exploring how AI might complement human expertise and strengthen the process in its earliest stages.
BPL is embarking on a new initiative to digitize hundreds of thousands of historic items. Conventional approaches to this scale lead to an impossible choice: sacrifice depth for breadth or drastically limit what gets digitized. AI tools can help, but they're relegated to the end of the process.
As the @institutionaldatainitiative.org expands its mission, we're announcing a collaboration with @bpl.boston.gov to develop AI-driven tools capable of accelerating new digitization at libraries across the world, starting at the Boston Public Library. institutionaldatainitiative.org/posts/using-...
With our digitization at Harvard Law School Library, we'll work to increase access to unique collections, such as the Supreme Court Records and Briefs that are critical to understanding decision-making at the highest U.S. court yet remain largely inaccessible.
If you're part of a library, university, or other knowledge institution and interested in working with a team of data scientists to refine and publish your data, we'd love to chat. And if you're a data scientist or community builder interested in working with institutions, we're hiring.
IDI is building a collection of large, impactful, and widely available datasets to increase AI's accessibility and diversity while reaffirming institutions as stewards of knowledge.
I'm pleased to announce we're expanding our mission at the @institutionaldatainitiative.org with an open call for institutional collaborators, new digitization at Harvard Law School Library, and additional support to advance this work. institutionaldatainitiative.org/posts/open-c...
Great op-ed from @shaynelongpre.bsky.social on the effects AI, as a technology and as a market, is having on the web. www.technologyreview.com/2025/02/11/1...
In 15 mins (1pm ET), I'll be giving a talk about @institutionaldatainitiative.org and our quest to build the most boring dataset in the world. Tune in here: lu.ma/iqkqvcus
The key is involving the institutions themselves in the conversation. Their missions are as much a reflection of the cultures they help to preserve as the data itself, and integrating them is critical as we look for new models to foster and interact with knowledge.