
Raphael Schumann

@schumann

Natural Language Processing PhD Student @ Heidelberg University. https://schumann.pub #NLP #NLProc #ML #AI

1,802
Followers
868
Following
12
Posts
13.09.2023
Joined

Latest posts by Raphael Schumann @schumann

Same boat as your AC

02.03.2025 11:13 👍 2 🔁 0 💬 1 📌 0

Could you add me please?

14.01.2025 18:31 👍 5 🔁 0 💬 0 📌 0

CBOW vs. Skip-gram
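
As a quick refresher on the contrast, a minimal gensim sketch (toy corpus and hyperparameters are illustrative; the sg flag switches between the two architectures):

from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "lay", "on", "the", "rug"]]

# CBOW (sg=0): predict the center word from its averaged context words.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# Skip-gram (sg=1): predict each context word from the center word.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)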

20.12.2024 11:59 👍 6 🔁 0 💬 0 📌 0

Great work! Are you going to release the models?

14.12.2024 11:16 👍 6 🔁 0 💬 0 📌 0

A starter pack for #NLP #NLProc researchers! 🎉

go.bsky.app/SngwGeS

04.11.2024 10:01 👍 251 🔁 99 💬 45 📌 13

#EMNLP has a nice set of tokenization/subword modeling papers this year.

It's a good mix of tokenization algorithms, tokenization evaluation, tokenization-free methods, and subword embedding probing. Lmk if I missed some!

Here is a list with links + presentation time (in chronological order).

11.11.2024 22:38 👍 47 🔁 16 💬 5 📌 2

First time ML/NLP Bluesky feels alive.

07.11.2024 21:39 👍 3 🔁 0 💬 0 📌 0

This helped a lot!

07.11.2024 21:27 👍 1 🔁 0 💬 0 📌 0

I make sure to delete even the paths containing my username from code in supplementary material

05.01.2024 15:49 👍 1 🔁 0 💬 0 📌 0
State of the art - ACL Wiki

TIL that the ACL Wiki has/had a state-of-the-art overview:

aclweb.org/aclwiki/Stat...

27.11.2023 09:12 👍 1 🔁 0 💬 0 📌 0

It also works with Flash Attention 2, although I don't see additional speedups. I don't think FA is optimized for generation.
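
For reference, a minimal sketch of enabling it in recent transformers versions (the model name is just an example; FA2 also needs the flash-attn package and a supported GPU):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",       # example; any FA2-capable model works
    torch_dtype=torch.float16,        # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",
)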

13.10.2023 11:35 👍 0 🔁 0 💬 0 📌 0
Using padding and prefill during inference in huggingface transformers - run_padding_prefill.py

Conceptually it is clear that this works, but I wasn't aware that huggingface passes this through correctly.
GitHub Gist to reproduce:
gist.github.com/raphael-sch/...

13.10.2023 11:35 👍 0 🔁 0 💬 1 📌 0

You have to place the padding tokens between the prefill and input tokens (example with 3 prefilled tokens, 2 padding tokens, and 4 input tokens):
input_ids: [0, 0, X, X, X, X] (padding + input; the prefill is already in the KV cache)
position_ids: [0, 0, 3, 4, 5, 6] (padding at position 0; input continues after the prefill positions)
attn_mask: [1, 1, 1, 0, 0, 1, 1, 1, 1] (prefill + padding + input)
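
A minimal sketch of this layout in Python (model, prompts, and padding length are illustrative; the gist linked above is the author's actual reproduction script):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Prefill the shared prefix once and keep its KV cache.
prefill_ids = tokenizer("My shared system prompt:", return_tensors="pt").input_ids
with torch.no_grad():
    cache = model(prefill_ids, use_cache=True).past_key_values
n_prefill = prefill_ids.shape[1]

# New tokens for this instance, preceded by two padding tokens.
pad_id = tokenizer.eos_token_id  # gpt2 has no pad token, so reuse EOS
new_ids = tokenizer(" Hello, how are you?", return_tensors="pt").input_ids
n_pad, n_new = 2, new_ids.shape[1]
input_ids = torch.cat([torch.full((1, n_pad), pad_id, dtype=torch.long), new_ids], dim=1)

# Padding sits at position 0 and is masked out; real tokens continue
# the prefill positions. The mask covers prefill + padding + new tokens.
position_ids = torch.cat([
    torch.zeros(1, n_pad, dtype=torch.long),
    torch.arange(n_prefill, n_prefill + n_new).unsqueeze(0),
], dim=1)
attention_mask = torch.cat([
    torch.ones(1, n_prefill, dtype=torch.long),
    torch.zeros(1, n_pad, dtype=torch.long),
    torch.ones(1, n_new, dtype=torch.long),
], dim=1)

with torch.no_grad():
    out = model(input_ids, attention_mask=attention_mask,
                position_ids=position_ids, past_key_values=cache)
# out.logits[:, -1] should match a padding-free forward pass.

In the batched setting you would additionally expand the cached prefill along the batch dimension before decoding.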

13.10.2023 11:35 👍 0 🔁 0 💬 1 📌 0

Turns out that with the right attention_mask and position_ids you can prefill tokens AND pad batches in huggingface transformers. This speeds up inference, especially if each instance has the same system prompt prepended. Code below ↓

13.10.2023 11:34 👍 4 🔁 0 💬 1 📌 1