Griffon (@ryancallihan)

No need for a substack when the memes are good

16.04.2025 07:38 👍 1 🔁 0 💬 0 📌 0

I would say a confusing score of 7.5 :D

27.02.2025 08:11 👍 0 🔁 0 💬 0 📌 0

I used to feel the same, but I experienced JFK for the first time this year, so my opinion has changed.

15.02.2025 16:05 👍 0 🔁 0 💬 1 📌 0

It’s amazing how many times one must say: increased efficiency == increased usage 😂

04.02.2025 07:18 👍 1 🔁 0 💬 0 📌 0

Source: I have eyes and have lived as an immigrant now for over a decade in 4 different countries (and am currently in process for a second citizenship).

03.02.2025 12:17 👍 1 🔁 0 💬 0 📌 0

There is free movement, but only if you're rich.

The system in place lets the owning class move freely while the workers are bound by national borders. It ensures that they can keep what they own.

03.02.2025 12:03 👍 9 🔁 2 💬 1 📌 0

I think you know what they mean. _National_ borders are arbitrary and created by humans. Pedantry isn't really useful.

03.02.2025 12:00 👍 5 🔁 0 💬 1 📌 0

All borders are arbitrary and created by humans

03.02.2025 10:48 👍 45 🔁 10 💬 2 📌 0

deepseek-r1:32b_tiananmen_test. GitHub Gist: instantly share code, notes, and snippets.

Interesting. I tried again with no luck. Tried some basic prompt injection, also with no luck. Then tried to recreate the conversation history I'd had before, and voila! The answer it gives is at the bottom. I just copypastaed the relevant bits.

gist.github.com/ryancallihan...

03.02.2025 10:15 👍 4 🔁 0 💬 2 📌 0

I used DeepSeek-R1-Distill-Qwen-32B, distilled from qwen2 and llama. I should have screen capped. I’ll try it again later!

03.02.2025 08:43 👍 1 🔁 0 💬 1 📌 0

🙏 This is exactly what I’ve been saying for the past couple weeks. Yes, the not-see salute is bad, but hot damn has anyone seen the stuff that will really make an impact?

03.02.2025 08:30 👍 1 🔁 0 💬 0 📌 0

Large Language Models Reflect the Ideology of their Creators
Large language models (LLMs) are trained on vast amounts of data to generate natural language, enabling them to perform tasks like text summarization and question answering. These models have become p...

Not sure about the app, but when running the model locally, it happily told me all about Tiananmen Square :D.

Read a really nice paper on this last year: arxiv.org/abs/2410.18417

03.02.2025 08:23 👍 2 🔁 0 💬 1 📌 0

It’s almost 2025. It’s pretty normal now

24.12.2024 12:37 👍 0 🔁 0 💬 0 📌 0

Bill Murray won’t age well in general. 🙃

24.12.2024 12:35 👍 1 🔁 0 💬 0 📌 0

It links directly to the substack. No need to be passive aggressive.

17.12.2024 10:20 👍 1 🔁 0 💬 0 📌 0

This resonated with me in a big way. Had a long conversation yesterday with my partner about just this. Do we struggle against the collapse, simply prepare for the new reality, or indulge in a sort of leftist hedonism? It’s a weird thing to grapple with.

17.12.2024 10:19 👍 1 🔁 0 💬 1 📌 0

It’s a rough job market out there. It took me a year to get an offer for a senior role. I was just looking for a change, so it wasn’t urgent.
I absolutely do not envy juniors. It’s really up to seniors to push for mentorship and for taking a chance on them.

17.12.2024 10:09 👍 0 🔁 0 💬 0 📌 0

Drowning in Documents: Consequences of Scaling Reranker Inference
Rerankers, typically cross-encoders, are often used to re-score the documents retrieved by cheaper initial IR systems. This is because, though expensive, rerankers are assumed to be more effective. We...

Side note: It would have been nice to see precision reported in this study so as to best understand the quality of reranking.

arxiv.org/abs/2411.11767

09.12.2024 10:30 👍 0 🔁 0 💬 0 📌 0

Practically, this means that we either need to really make sure that our initial retrieval is as good as it can be or that the number of documents we retrieve needs to be controlled to make the best use of rerankers.

09.12.2024 10:30 👍 0 🔁 0 💬 1 📌 0
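A minimal sketch of the fetch-K-then-rerank workflow described in this thread, with K exposed as an explicit parameter so it can be kept under control. Everything here is an assumption for illustration: `first_stage_score` and `rerank_score` are toy term-overlap stand-ins (a real setup would likely use BM25 or an embedding index, and a cross-encoder), and the corpus is made up. It is not the setup used in the paper.

```python
def first_stage_score(query: str, doc: str) -> int:
    """Cheap retrieval score (toy stand-in): number of shared terms."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def rerank_score(query: str, doc: str) -> float:
    """Toy stand-in for an expensive reranker: share of doc terms found in the query."""
    query_terms = set(query.lower().split())
    doc_terms = doc.lower().split()
    if not doc_terms:
        return 0.0
    return sum(term in query_terms for term in doc_terms) / len(doc_terms)


def retrieve_then_rerank(query: str, corpus: list[str], k: int, top_n: int) -> list[str]:
    """Fetch the top-k docs with the cheap scorer, then rerank only those k."""
    candidates = sorted(
        corpus, key=lambda doc: first_stage_score(query, doc), reverse=True
    )[:k]
    reranked = sorted(
        candidates, key=lambda doc: rerank_score(query, doc), reverse=True
    )
    return reranked[:top_n]


# Hypothetical corpus and query, for illustration only.
corpus = [
    "rerankers rescore retrieved documents",
    "retrieved documents can be noisy and long and off topic",
    "documents about something else entirely here",
    "unrelated text",
]
query = "rerankers for retrieved documents"

# Keeping k small limits how many documents the expensive step ever sees.
best = retrieve_then_rerank(query, corpus, k=2, top_n=1)
```

In practice one would sweep `k` on a held-out set and pick the smallest value beyond which the final ranking quality stops improving.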

A very common workflow is to fetch K documents and then rerank them as a post-processing step. What this test finds is that the larger K is, the more the returns diminish.

09.12.2024 10:30 👍 0 🔁 0 💬 1 📌 0

Drowning in Documents: Consequences of Scaling Reranker Inference

This paper conducts a simple test of the effectiveness of rerankers on large amounts of documents. It's really important to think about if you are using RAG a lot.

09.12.2024 10:30 👍 0 🔁 0 💬 1 📌 0

Is your issue with multi-agent systems:

* Complexity
* Ineffectiveness
* Scale/cost
* Something else?

03.12.2024 16:26 👍 2 🔁 0 💬 0 📌 0

It is, without a doubt, the best beer city in Germany.

03.12.2024 14:38 👍 1 🔁 0 💬 1 📌 0

I hope I am not late to the party (was away post-quals chilling) but here are some thoughts on why this is bad IMO:

First, a disclaimer that I am writing this as an African who is a speaker of multiple African languages, NLP researcher of African languages, and HCI researcher focusing broadly on..

02.12.2024 23:43 👍 125 🔁 60 💬 9 📌 8

Love this. Not to mention that whatever is SOTA for English, and for languages sharing similar properties to English, is not necessarily the best way to work with other languages and language families.

03.12.2024 10:22 👍 3 🔁 0 💬 0 📌 0

Anyone saying The Left must stay on Twitter to save democracy doesn’t understand how Twitter affects our psychology. Twitter makes money by disconnecting us from social reality and making us feel shitty about ourselves and each other.

01.12.2024 22:48 👍 568 🔁 110 💬 17 📌 4

Is it Bad to leave Twitter? No. Here are 7+ years of insights from my lab’s research that explain why.

Featuring work w/ @williambrady.bsky.social @killianmcloughlin.bsky.social

🧵

01.12.2024 22:48 👍 1445 🔁 544 💬 68 📌 105

Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
Evaluations are critical for understanding the capabilities of large language models (LLMs). Fundamentally, evaluations are experiments; but the literature on evaluations has largely ignored the liter...

It reduces noise, reduces and measures chance, and doesn’t treat eval tasks as a whole but separates them so that they can be better measured. If this trend takes off, I will definitely reverse my grumpiness around evaluation.

arxiv.org/abs/2411.00640

03.12.2024 10:03 👍 0 🔁 0 💬 0 📌 0

This paper from Anthropic very sensibly suggests that ML papers use very basic and standard statistical measures of impact, variance and difference when evaluating models and strategies.

03.12.2024 10:03 👍 0 🔁 0 💬 1 📌 0
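As a rough illustration of the kind of basic statistics meant here, the sketch below puts a standard error and an approximate 95% interval on an eval score, and compares two models on the same questions via their per-question differences. The scores are hypothetical, the function names are mine, and the normal approximation (z = 1.96) is an assumption that is shaky for a sample this small. It is a sketch in the spirit of the paper, not a reproduction of its method.

```python
import math
from statistics import mean, stdev


def mean_with_ci(scores, z=1.96):
    """Return (mean, standard error, (low, high)) for per-question scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate a standard error")
    m = mean(scores)
    sem = stdev(scores) / math.sqrt(n)  # sample std dev over sqrt(n)
    return m, sem, (m - z * sem, m + z * sem)


def paired_difference(scores_a, scores_b, z=1.96):
    """Mean per-question difference (A minus B) with its standard error and interval."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired comparison needs the same questions for both models")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean_with_ci(diffs, z)


# Hypothetical per-question correctness (1 = right, 0 = wrong) on the same 10 questions.
model_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
model_b = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]

acc_a, sem_a, ci_a = mean_with_ci(model_a)
diff, sem_diff, ci_diff = paired_difference(model_a, model_b)

print(f"model A accuracy: {acc_a:.2f} +/- {sem_a:.2f} (SE)")
print(f"A - B: {diff:.2f}, approx. 95% interval ({ci_diff[0]:.2f}, {ci_diff[1]:.2f})")
```

If the interval on the difference includes zero, as it does for this toy data, a gap in a headline metrics table is not much evidence that one model is actually better.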

There's nothing less interesting to me than a new or fine-tuned model and its generic table of metric comparisons to other open- and closed-source models. When it comes down to it, most eval metrics don't really tell you a lot, and a lot of it is left to chance.

03.12.2024 10:03 👍 0 🔁 0 💬 1 📌 0
