
@cojennin

Member of Technical Staff @AnthropicAI / prev @ MosaicML

70 Followers · 217 Following · 7 Posts · Joined 04.07.2023

Latest posts by @cojennin

And one bathroom! With all the programming I had as a child to ask about water pressure, whenever I deal with a plumber it's like The Manchurian Candidate.

08.04.2025 13:15 👍 8 🔁 0 💬 0 📌 0
Characterizing Datasets and Building Better Models with Continued Pre-Training

What’s the most effective way to add new domain knowledge into an open LLM? A new blog post from my team covers experiments we did at the beginning of the year to start answering this question. It starts, unsurprisingly, with sweeping your learning rate… www.databricks.com/blog/charact...

25.11.2024 23:28 👍 22 🔁 8 💬 0 📌 1

When you fail to parse your data that’s a jsonl

22.11.2024 00:47 👍 7 🔁 3 💬 1 📌 0
A plot showing that reranking improves recall as the number of reranked docs increases, but with more docs we see diminishing returns and eventually a performance dip.


Mat is not on 🦋; posting on his behalf!

It's time to revisit common assumptions in IR! Embeddings have improved drastically, but mainstream IR evals have stagnated since MSMARCO + BEIR.

We ask: on private or tricky IR tasks, are rerankers better? Surely, reranking many docs is best?

20.11.2024 19:44 👍 81 🔁 23 💬 4 📌 5

How many documents should you retrieve when using a reranker? The answer might surprise you!

Check out the excellent work from our intern Mathew on this important retrieval question. 👍

20.11.2024 20:07 👍 11 🔁 3 💬 0 📌 0
Eyes on the Street: A Safer Walk to Transit By Sheepshead Bay Road - Streetsblog New York City

A DOT safety project in Sheepshead Bay is in effect after a rocky path to implementation.

E 15th! nyc.streetsblog.org/2016/12/13/e...

14.11.2024 11:30 👍 0 🔁 0 💬 0 📌 0

Walter Miller, A Canticle for Leibowitz

10.11.2024 12:43 👍 1 🔁 0 💬 0 📌 0

I love the smell of providing executives with actionable insights in the morning

08.11.2024 18:50 👍 10 🔁 2 💬 0 📌 0

"Son, we live in a world that has dashboards, and those dashboards have to be guarded by data engineers with Spark."

29.10.2024 20:11 👍 1 🔁 0 💬 0 📌 0

Time for some shitposting about dashboards, since that’s in the zeitgeist:

I recall, many years ago, an eager IT manager telling me we should deprecate all of our traditional SSIS reports, because all that info could be revealed in a dashboard. And dashboards allow "insight discovery", which is 🔥 …

28.10.2024 12:31 👍 50 🔁 6 💬 6 📌 6

Any data people in New York want to grab bagels and talk about AI? We have a group that meets every other Thursday morning. I'd be happy to add people to the Google group.

28.10.2024 11:04 👍 6 🔁 1 💬 0 📌 2

text2sql is coming, look busy

29.10.2024 18:12 👍 0 🔁 0 💬 0 📌 0

you either die making good model, or live long enough to see yourself make bad model go fast

29.10.2024 11:47 👍 1 🔁 0 💬 0 📌 0

It is surprisingly easy to accidentally dress as Shaggy from Scooby-Doo

28.10.2024 12:56 👍 1 🔁 1 💬 0 📌 1

I really miss the days of showing someone ✨entity extraction✨and watching their eyes glaze over

14.07.2024 23:52 👍 1 🔁 0 💬 0 📌 0