Can you train a performant language model using only openly licensed text?
We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 & 2
06.06.2025 19:18
Damn, where are these parties I'm missing
09.03.2025 02:01
Technically, we do, but a lot of that goes toward paying tuition. Not unlike the 20k for these agents going towards GPU compute
07.03.2025 03:58
Maybe he thought it was "locker room talk"
17.02.2025 23:53
They're future-proofing the design
02.12.2024 05:58
The `decision model n` is being directed by mission control and then forwards a signal to `big data`?? I guess no decision was ever made
02.12.2024 05:23
Maybe. But probably more likely, they're using QwQ or Deepseek.
02.12.2024 04:21
π 4
π 0
π¬ 1
π 0
Transformers demonstrated how to attend over an entire sequence at once, which at the time was different from approaches like LSTMs that processed tokens sequentially. The attention span across the whole sequence does parallel the aliens from Arrival.
01.12.2024 16:13
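The contrast in the post above can be sketched in a few lines of numpy: self-attention scores all pairs of positions in one matrix product, while an RNN-style model must carry a hidden state through the sequence step by step. All names, shapes, and weights here are illustrative, not from any specific model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Every position attends to every other position in one shot.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T): all pairs at once
    return softmax(scores) @ V               # (T, d): parallel over positions

def rnn_like(X, Wh, Wx):
    # An LSTM-style model walks the sequence one token at a time:
    # step t cannot start until step t-1 has finished.
    h = np.zeros(Wh.shape[0])
    for x_t in X:
        h = np.tanh(Wh @ h + Wx @ x_t)
    return h

rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one output per position, computed in parallel
```

The whole-sequence score matrix is what lets attention run in parallel on GPUs, whereas the loop in `rnn_like` is inherently sequential.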
Attended 2 different lectures (1 class and 1 invited guest lecture) on the same topic of inference-time scaling. Maybe the matrix is trying to tell me something.
22.11.2024 02:14
I see lectures in #nlp that use Taylor Swift to illustrate concepts.
21.11.2024 20:44
@eleutherai.bsky.social is our official account. Will be posting here and on Twitter from now on.
20.11.2024 14:18
LTI PhDs seeking refuge in Bluesky
go.bsky.app/NhTwCVb
07.11.2024 16:46
Hi, I would also like to be included in this list!
07.11.2024 16:45