
Lisa Bylinina

@bylinina

linguist bylinina.github.io

236 Followers · 189 Following · 25 Posts · Joined 19.11.2024

Latest posts by Lisa Bylinina @bylinina

there is this guy on my flight to shanghai sitting next to me checking out the emnlp program and a bunch of papers and his bsky feed is 90% emnlp stuff and on the one hand it would be nice to chat but on the other hand it's a 12-hour flight so maybe it's better if i focus on my netflix downloads..

05.11.2025 13:14 👍 4 🔁 0 💬 0 📌 0

I will be attending EMNLP in China to present our paper with @bylinina.bsky.social (who will be in China, too) and Jakub Dotlacil in the BabyLM workshop! Looking forward to meeting people there! ✨ 😊 #EMNLP2025 @emnlpmeeting.bsky.social

lnkd.in/e-Bzz6De

01.11.2025 15:52 👍 12 🔁 3 💬 1 📌 0

oh super-interesting

28.09.2025 10:20 👍 1 🔁 0 💬 0 📌 0

who'll be at emnlp?

23.09.2025 12:41 👍 0 🔁 0 💬 0 📌 0

got a tiny (approx 50k) grant from NWO to do something about whether (instruction-tuned) lms are an 'agent', a superposition of agents, what's going on there epistemically and also how people interact with these 'personae' -- we'll seeeeee www.nwo.nl/en/researchp...

12.05.2025 14:30 👍 7 🔁 1 💬 0 📌 0
NSF Grant Termination Information Collection Form

Please use this form to submit information identifying specific NSF grants that have been cancelled for any reason after January 20, 2025.


We are tracking these grants to increase transparency, organize affected PIs, and facilitate responses, including via litigation. Please share the form as widely as possible with your networks. 


We are actively building a pipeline to organize these terminations and will soon have a tracker akin to our NIH grant tracker at https://airtable.com/appjhyo9NTvJLocRy/shrNto1NNp9eJlgpA


WE WILL NOT DISCLOSE THE IDENTITY OF ANYONE WHO USES THIS FORM TO PROVIDE INFORMATION. We will keep your identity confidential.


These resources are maintained by Noam Ross of rOpenSci and Scott Delaney of the Harvard T.H. Chan School of Public Health, with input and support from additional volunteers. For any questions, please contact Scott Delaney on Signal (sdelaney.84).


THANK YOU FOR YOUR ASSISTANCE!


🚨Report your NSF grant terminations! 🚨

We are starting to collect information on NSF grant terminations to create a shared resource as we have for NIH. The more information we collect, the more we can organize, advocate, and fight back! Please share widely!

airtable.com/appGKlSVeXni...

19.04.2025 00:11 👍 640 🔁 664 💬 7 📌 50
Cutting international bachelor programs threatens psychological science » Eiko Fried Two days ago, four Dutch universities announced discontinuing their English-speaking psychology bachelor programs (1, 2). I will briefly explain (1) how this decision came to be, (2) why this is such ...

Four large Dutch universities, including Leiden University where I work, have decided to throw international psychology bachelor programs under the bus in an effort to appease the rightwing government.

Here's my blog why this is a terrible idea.

eiko-fried.com/cutting-inte...

17.04.2025 22:44 👍 229 🔁 102 💬 10 📌 11

i just need students to see the difference between base and instruction-tuned models trying out different types of prefixes, without them needing to write any code or send their info anywhere

16.04.2025 16:48 👍 0 🔁 0 💬 0 📌 0

do we know a pair of base vs. instruct models that are both deployed by an inference provider on hf (or maybe a hf space but less preferable..) AND that don't require students sending their info for the license agreement?

16.04.2025 16:48 👍 0 🔁 0 💬 2 📌 0
Going beyond open data – increasing transparency and trust in language models with OLMoTrace | Ai2 OLMoTrace lets you trace the outputs of language models back to their full, multi-trillion-token training data in real time.

oh wow ok allenai.org/blog/olmotrace

10.04.2025 09:15 👍 7 🔁 1 💬 1 📌 0

i mean i'd be really surprised if what lms generate as 'reasoning' text faithfully reflected the ways they come up with the answer. like, what would guarantee that

05.04.2025 13:50 👍 6 🔁 0 💬 1 📌 0

nice!!

05.04.2025 11:16 👍 1 🔁 0 💬 0 📌 0
from minicons import scorer
from nltk.tokenize import TweetTokenizer

lm = scorer.IncrementalLMScorer("gpt2")

# your own tokenizer function that returns a list of words
# given some sentence input
word_tokenizer = TweetTokenizer().tokenize

# word scoring
lm.word_score_tokenized(
    ["I was a matron in France", "I was a mat in France"], 
    bos_token=True, # needed for GPT-2/Pythia and NOT needed for others
    tokenize_function=word_tokenizer,
    bow_correction=True, # Oh and Schuler correction
    surprisal=True,
    base_two=True
)

'''
First word = -log_2 P(word | <beginning of text>)

[[('I', 6.1522440910339355),
  ('was', 4.033324718475342),
  ('a', 4.879510402679443),
  ('matron', 17.611848831176758),
  ('in', 2.5804288387298584),
  ('France', 9.036953926086426)],
 [('I', 6.1522440910339355),
  ('was', 4.033324718475342),
  ('a', 4.879510402679443),
  ('mat', 19.385351181030273),
  ('in', 6.76780366897583),
  ('France', 10.574726104736328)]]
'''


another day another minicons update (potentially a significant one for psycholinguists?)

"Word" scoring is now a thing! You just have to supply your own splitting function!

pip install -U minicons for merriment

02.04.2025 03:35 👍 21 🔁 7 💬 3 📌 0
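For readers new to surprisal: the values in the code output above are -log_2 of each word's conditional probability given the preceding context, so larger numbers mean a less expected word (e.g. "mat" scores higher than "matron" in the example). A minimal sketch of that conversion, using hypothetical probabilities rather than actual model outputs:

```python
import math

def surprisal_bits(p: float) -> float:
    """Surprisal in bits: -log2 of a word's conditional probability."""
    return -math.log2(p)

# A likely continuation carries little surprisal, a rare one a lot.
# (illustrative probabilities, not taken from GPT-2)
print(surprisal_bits(0.5))       # 1.0 bit
print(surprisal_bits(1 / 1024))  # 10.0 bits
```

This is the same quantity minicons returns per word when called with `surprisal=True` and `base_two=True`, just computed here from a raw probability for illustration.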

ah that's great, makes a lot of things much faster to try out!

02.04.2025 08:55 👍 1 🔁 0 💬 0 📌 0

or what happened!

28.03.2025 11:23 👍 0 🔁 0 💬 0 📌 0

you have to tell me which starter pack i apparently suddenly ended up in

28.03.2025 11:18 👍 1 🔁 0 💬 1 📌 0

Sounds familiar

20.03.2025 14:08 👍 1 🔁 0 💬 0 📌 0

the tiny books have arrived

21.01.2025 12:33 👍 27 🔁 0 💬 1 📌 0

Forthcoming titles in Elements in Semantics: 1. Abzianidze, @bylinina.bsky.social, Paperno, Deep Learning and Semantics. 2. K. Davidson, Semantics of Depiction. 3. Chatzikyriakidis, Cooper, Gregoromichelaki, Sutton, Types and the structure of meaning: Issues in compositional and lexical semantics
(2/3)

15.12.2024 22:15 👍 3 🔁 2 💬 0 📌 0

waluigi!!

13.12.2024 09:44 👍 1 🔁 0 💬 0 📌 0

all invitations i find in my inbox are actually invitations to work a bit more

04.12.2024 08:54 👍 4 🔁 0 💬 0 📌 0

yeah it's super-interesting to me somehow suddenly which i didn't expect and i don't know what to do with it but i'll just be curious about it i guess

02.12.2024 10:53 👍 1 🔁 0 💬 1 📌 0

this is so cool - it's the 2nd time i see this thread and again i think how cool it is. you know why? well for obv reasons but also bc i've been thinking recently about how the linguistic will of one person or group of people (prescriptive organizations but not necessarily) can do things to language

02.12.2024 10:46 👍 1 🔁 0 💬 1 📌 0

... buying out research time with grant budgets -- most likely gone. maybe that's just the reality of an assistant prof position (and up), maybe also amplified by budget cuts -- but is that it? am i just going to be talking most of the time rather than doing anything? depressing really

30.11.2024 10:00 👍 2 🔁 0 💬 0 📌 0

in order to actually do smth in research directions i'm interested in i need some bandwidth: research time, phd students to work with, experiment budgets. in nl it's getting more and more complicated (for obv reasons): some ways to get phd students are frozen, some grants not announced anymore..

30.11.2024 09:59 👍 3 🔁 0 💬 1 📌 0
Semantics and Deep Learning Cambridge Core - Philosophy of Mind and Language - Semantics and Deep Learning

this thing is coming out soon btw! www.cambridge.org/core/element...

26.11.2024 15:42 👍 3 🔁 0 💬 0 📌 0

meeee!

26.11.2024 15:40 👍 2 🔁 0 💬 0 📌 0

nah i wasn’t serious

19.11.2024 18:42 👍 0 🔁 0 💬 0 📌 0

thx!! now i’m annoyed i’m not in it

19.11.2024 17:06 👍 0 🔁 0 💬 1 📌 0

linguists? computational linguists? nlp people? semantics people? anybody?

19.11.2024 15:28 👍 8 🔁 1 💬 3 📌 0