Greg Leppert

@leppert.me

Working on AI and access to knowledge at Harvard. Executive Director of the Institutional Data Initiative; Chief Technologist of the Berkman Klein Center.

1,146 Followers
141 Following
38 Posts
Joined 28.04.2023

Latest posts by Greg Leppert @leppert.me

🤪

07.12.2025 19:30 👍 1 🔁 0 💬 0 📌 0
Preview
Why Does A.I. Write Like … That?

AI is autotune (and Beat Detective) for culture writ large. www.nytimes.com/2025/12/03/m...

07.12.2025 15:26 👍 1 🔁 0 💬 1 📌 0
Preview
GRIN Transfer: A production-ready tool for libraries to retrieve digital copies from Google Books Publicly launched in 2004, the Google Books project has scanned tens of millions of items in partnership with libraries around the world. As part of this project, Google created the Google Return Inte...

Even if you're not a partner library, you might be curious about what it's like to work with GRIN. Our technical report has a wealth of details. arxiv.org/abs/2511.11447

20.11.2025 16:42 👍 0 🔁 1 💬 0 📌 0
Preview
Institutional Books | Institutional Data Initiative Institutional Books 1.0 is our first release of public domain books. This set was originally digitized through Harvard Library’s participation in the Google Books project.

We're also sharing the pipeline we developed for Institutional Books that seamlessly dedupes, classifies, and enhances the data once GRIN Transfer brings it down. www.institutional.org/tools

20.11.2025 16:42 👍 0 🔁 1 💬 1 📌 0
Preview
Announcing the release of GRIN Transfer GRIN Transfer, an open source tool that allows Google Books partner libraries to more easily access their Google Books collection.

That's why we built GRIN Transfer: a tool for downloading collections, big or small. GRIN Transfer handles request batching, failure recovery, and data aggregation so that libraries can focus on using the data rather than simply gaining access to it. www.institutional.org/posts/grin-t...

20.11.2025 16:42 👍 0 🔁 0 💬 1 📌 0

We learned this lesson over the months it took to download 1M of Harvard Library's books for our Institutional Books release. As a result, many libraries have yet to take full advantage of the wonderful resources GRIN provides.

20.11.2025 16:42 👍 0 🔁 0 💬 1 📌 0
Preview
Announcing the release of GRIN Transfer GRIN Transfer, an open source tool that allows Google Books partner libraries to more easily access their Google Books collection.

When libraries join Google Books, Google not only scans their books, it also makes a wealth of image, OCR, & metadata available to them via the Google Return Interface (GRIN). But working with GRIN can be challenging, so we're releasing a tool to make it easier. www.institutional.org/posts/grin-t...

20.11.2025 16:42 👍 5 🔁 3 💬 1 📌 0

Data is everything.

25.07.2025 20:43 👍 1 🔁 0 💬 0 📌 0
Preview
Food for (AI) thought and the library initiative improving AI’s digital diet - Harvard Law School Amanda Watson of the Harvard Law School Library says the release of Harvard’s digitized collection is only the beginning of collaborations between libraries and tech firms.

Amanda Watson, @institutionaldatainitiative.org's Library Chair and leader of Harvard Law School Library, spoke about the importance of publishing library collections as data to guide the future of AI. hls.harvard.edu/today/food-f...

24.07.2025 19:34 👍 2 🔁 0 💬 0 📌 0
Preview
Digital Collections Explorer: An Open-Source, Multimodal Viewer for Searching Digital Collections We present Digital Collections Explorer, a web-based, open-source exploratory search platform that leverages CLIP (Contrastive Language-Image Pre-training) for enhanced visual discovery of digital col...

With @yh-huang.bsky.social, I'm excited to share our Digital Collections Explorer, an open-source, multimodal viewer for digital collections! Users can search with both natural language inputs and reverse image search.

Paper: arxiv.org/abs/2507.00961
Public demo: digital-collections-explorer.com

02.07.2025 20:56 👍 76 🔁 26 💬 2 📌 3

This starts in an hour.

23.06.2025 15:41 👍 0 🔁 0 💬 0 📌 0
Preview
Welcome! You are invited to join a meeting: IDI Talk with Petr Knoth (CORE). After registering, you will receive a confirmation email about joining the meeting.

June 23rd at 12:45pm ET. RSVP here: harvard.zoom.us/meeting/regi...

20.06.2025 17:43 👍 1 🔁 0 💬 0 📌 0

This Monday, @institutionaldatainitiative.org will host Petr Knoth to share his experience leading CORE ("The world’s largest collection of open access research papers") as the rise of AI brings new meaning, and challenges, to stewarding knowledge repositories. Join us virtually via the link below.

20.06.2025 17:43 👍 2 🔁 2 💬 2 📌 1
Preview
Welcome! You are invited to join a webinar: Open AI Development. After registering, you will receive a confirmation email about joining the webinar. For AI to truly benefit society, it must be built on foundations of transparency, fairness, and accountability, starting with the most foundational building block that powers it: data. Not long ago, ...

Cohosted by @institutionaldatainitiative.org and The Berkman Klein Center. harvard.zoom.us/webinar/regi...

16.06.2025 19:48 👍 0 🔁 1 💬 0 📌 0

Tomorrow, it's our pleasure to host @ayahbdeir.bsky.social to talk about the power of data in building an AI ecosystem that's open, transparent, and fair. 11am ET on June 17th. Register at the link below to attend virtually.

16.06.2025 19:48 👍 1 🔁 0 💬 1 📌 0

The @institutionaldatainitiative.org is proud to support The New Commons challenge. $100k grants along with mentorship. Let's get impactful data into the AI ecosystem.

14.04.2025 15:46 👍 7 🔁 5 💬 0 📌 0
Preview
A Faster, Smarter, Unified Case Law Experience A redesigned case law modernizes the reading experience with enhanced layout and typography, more advanced features, better speed, and more.

To start the weekend, we've got a brand new experience for case law on CourtListener. It has better typography, more features and metadata, five million scanned decisions from @harvardlil.bsky.social, and a lot more. Read all about it and let us know what you think: free.law/2025/03/21/c...

21.03.2025 22:53 👍 27 🔁 9 💬 1 📌 2

The @institutionaldatainitiative.org at Harvard works with knowledge institutions to increase the availability, diversity, and responsible use of training data for AI. Reach out and join us.

12.03.2025 13:23 👍 1 🔁 0 💬 0 📌 0
Preview
Boston Public Library Expands Access to Collections Through AI-Enhanced Digitization BOSTON, MA – March 12, 2025 - The Boston Public Library (BPL) is launching a large-scale digitization project to unlock hundreds of thousands…

Our goal is to develop methods and tools that can support expert staff at libraries everywhere, increasing the breadth of materials that can be digitized and the speed at which they’re made accessible to the public. Learn more at BPL: www.bpl.org/news/boston-...

12.03.2025 13:23 👍 2 🔁 1 💬 1 📌 0

Together, we’ll research opportunities to generate machine-readable representations of items, add searchable metadata, and begin the structuring of entire collections, all at the moment each item leaves the imaging station.

12.03.2025 13:23 👍 0 🔁 0 💬 1 📌 0

IDI and BPL are working to change this by collaborating at the outset of a large digitization project, exploring how AI might complement human expertise and strengthen the process in its earliest stages.

12.03.2025 13:23 👍 0 🔁 0 💬 1 📌 0

BPL is embarking on a new initiative to digitize hundreds of thousands of historic items. Conventional approaches to this scale lead to an impossible choice: sacrifice depth for breadth or drastically limit what gets digitized. AI tools can help, but they’re relegated to the end of the process.

12.03.2025 13:23 👍 1 🔁 0 💬 1 📌 0
Preview
Using AI to Accelerate Digitization at Boston Public Library Today, as part of our mission expansion, we’re announcing a collaboration with BPL to develop AI-driven tools capable of accelerating new digitization of large collections at libraries across the worl...

As the @institutionaldatainitiative.org expands its mission, we’re announcing a collaboration with @bpl.boston.gov to develop AI-driven tools capable of accelerating new digitization at libraries across the world, starting at the Boston Public Library. institutionaldatainitiative.org/posts/using-...

12.03.2025 13:23 👍 18 🔁 10 💬 1 📌 1

With our digitization at Harvard Law School Library, we'll work to increase access to unique collections, such as the Supreme Court Records and Briefs that are critical to understanding decision-making at the highest U.S. court yet remain largely inaccessible.

05.03.2025 15:36 👍 0 🔁 0 💬 0 📌 0

If you're part of a library, university, or other knowledge institution and interested in working with a team of data scientists to refine and publish your data, we'd love to chat. And if you're a data scientist or community builder interested in working with institutions, we're hiring.

05.03.2025 15:36 👍 0 🔁 0 💬 1 📌 0

IDI is building a collection of large, impactful, and widely available datasets to increase AI’s accessibility and diversity while reaffirming institutions as stewards of knowledge.

05.03.2025 15:36 👍 0 🔁 0 💬 1 📌 0
Preview
Expanding Our Mission: An Open Call for Collaborators Today, we’re pleased to announce an open call for institutional collaborators as new support expands the research capacity of the Institutional Data Initiative.

I'm pleased to announce we're expanding our mission at the @institutionaldatainitiative.org with an open call for institutional collaborators, new digitization at Harvard Law School Library, and additional support to advance this work. institutionaldatainitiative.org/posts/open-c...

05.03.2025 15:36 👍 11 🔁 9 💬 1 📌 0
Preview
AI crawler wars threaten to make the web more closed for everyone There’s an accelerating cat-and-mouse game between web publishers and AI crawlers, and we all stand to lose.

Great op-ed from @shaynelongpre.bsky.social on the effects AI, as a technology and as a market, is having on the web. www.technologyreview.com/2025/02/11/1...

12.02.2025 19:27 👍 3 🔁 2 💬 0 📌 0
Preview
Greg Leppert, Harvard | The Most Boring Dataset in the World · Luma Foresight Institute’s Intelligent Cooperation Group The Most Boring Dataset in the World Abstract: Data is a critical raw material in the construction of AI.…

In 15mins (1pm ET), I'll be giving a talk about @institutionaldatainitiative.org and our quest to build the most boring dataset in the world. Tune in here: lu.ma/iqkqvcus

29.01.2025 17:47 👍 4 🔁 0 💬 0 📌 0

The key is involving the institutions themselves in the conversation. Their missions are as much a reflection of the cultures they help to preserve as the data itself, and integrating them is critical as we look for new models to foster and interact with knowledge.

21.12.2024 21:26 👍 9 🔁 0 💬 0 📌 0