Excited to present two papers at #ACL2025!
30 July, 11 AM: 𝛿-Stance: A Large-Scale Real World Dataset of Stances in Legal Argumentation. w/ Douglas Rice and @brenocon.bsky.social
At Hall 4/5. 🧵
A poster for "Culture is not Trivia: sociocultural theory for cultural NLP" which takes the form of a flow-chart. The central question, and the starting point of the flow chart, is "What is culture in cultural NLP?" An arrow is labeled "wait, so what's cultural NLP?" This leads to a block explaining that the goals of cultural NLP are described in section 2 of the paper. They include inclusivity, depth, discerning, and adaptiveness. That leads to an arrow that says "that sounds great!". But there are recurring challenges in this kind of work! Section 3 surveys some of these: a discomfort around the proxies being chosen, a lack of coverage, and a lack of dynamicity. That in turn leads to an arrow labeled "Hm, sounds like we need to figure out..." and it leads back to the main question: "What is culture in cultural NLP?" A final arrow extends below this block: "Well, who's to say, really?" This points to sociocultural linguistics. Section 4 explores how other disciplines, like sociolinguistics, linguistic anthropology, and discourse analysis have faced similar challenges in the past. Section 4.2 gives an overview of sociocultural linguistics, which is a set of principles tying together some convergent themes: emergence, positionality, indexicality, relationality, and partialness. One arrow extends from this asking, "what's that have to do with cultural NLP?" Section 5 gives a case study of how indexicality clarifies how to think about stereotypes in the context of mining cultural knowledge from the web. Another arrow says "How can I build safe NLP systems?" Section 6.2 explores how localization can serve as a useful model for building culturally aware technologies because it forces developers to define culture explicitly and tractably. Finally, an arrow asks "how can I study culture with NLP methods?" Section 6.1 lays out theoretically motivated directions for future empirical and theoretical work in computationally modeling culture.
I'm thrilled to be doing an oral presentation on "Culture is not Trivia" at #ACL2025 next Wednesday 7/30, as well as participating in the human-centered NLP panel afterwards!
(thanks also @lauraknelson.bsky.social for the shoutout in her #ic2s2 keynote today!)
aclanthology.org/2025.acl-lon...
Research Borderlands is now on ACL anthology.
aclanthology.org/2025.acl-lon...
Come hear me talk about it at #IC2S2 in the plenary talks tomorrow, 24 July, after the morning keynote and in the poster session after lunch.
I will also be at #acl2025, presenting the poster at 11 AM on Wed, 30th July.
If you are interested in YouTube data, I'm giving a talk at 2:30pm today with the Online Platforms and Algorithms II session in Vingen 3+4.
#ic2s2
“which came first, the pun, or the research project?”
*academics, head hung low, whispering*
“the… the pun”
How can we generate synthetic data for a task that requires global reasoning over a long context (e.g., verifying claims about a book)? LLMs aren't good at *solving* such tasks, let alone generating data for them. Check out our paper for a compression-based solution!
Culture is not trivia: sociocultural theory for cultural NLP. By Naitian Zhou and David Bamman from the Berkeley School of Information and Isaac L. Bleaman from Berkeley Linguistics.
There's been a lot of work on "culture" in NLP, but not much agreement on what it is.
A position paper by me, @dbamman.bsky.social, and @ibleaman.bsky.social on cultural NLP: what we want, what we have, and how sociocultural linguistics can clarify things.
Website: naitian.org/culture-not-...
1/n
Bill Labov died this morning. I'm not coherent enough to talk about how important and influential and brilliant he was. I am very sad.
I was so lucky to know him, and I am grateful every day that he (and Gillian, and Walt, etc) built an academic field where kindness is expected.
Our paper was accepted to #COLING! If you work on low-resource MT and have ever found yourself limited to bible data, you might find this interesting.
We're hiring new #nlp faculty this year!
Asst or Assoc Professors in NLP at UMass CICS --
careers.umass.edu/amherst/en-u...
If it's okay / not full, mind adding me to the list?
Screenshot of the abstract for the paper, "Once More, With Feeling: Measuring Emotion of Acting Performances in Contemporary American Film". The abstract reads: "Narrative film is a composition of writing, cinematography, editing, and performance. While much computational work has focused on the writing or visual style in film, we conduct in this paper a computational exploration of acting performance. Applying speech emotion recognition models and a variationist sociolinguistic analytical framework to a corpus of popular, contemporary American film, we find narrative structure, diachronic shifts, and genre- and dialogue-based constraints located in spoken performances."
🎬 Coming soon to a theater near you! 🍿
Film is a semiotically rich medium: meaning is conveyed through the music, visuals, language, and more. A new paper from me and @dbamman.bsky.social explores what it means to computationally study performance in film.
Website: naitian.org/once-more-wi...
1/n
A photo of Boulder, Colorado, shot from above the university campus and looking toward the Flatirons.
I'm recruiting 1-2 PhD students to work with me at the University of Colorado Boulder! Looking for creative students with interests in #NLP and #CulturalAnalytics.
Boulder is a lovely college town 30 minutes from Denver and 1 hour from Rocky Mountain National Park.
Apply by December 15th!
If you're still at #EMNLP2024, check out this work by first-year PhD student @rohan-das.bsky.social. Our main idea is to look at media framing through the chains of events they choose to highlight.
Update: seems like the Underline schedules are still up for most events, and they tend to sort in-person presentations by track, so that just leaves out the Virtual Posters / Findings not on OpenReview and TACL.
Does anyone know a good way to filter ACL/EMNLP papers by track? I wanted to make a feed for CSS&CA papers, but the only places I can find the tracks are ARR OpenReview and the pdf handbook for EMNLP.
#EMNLP has a nice set of tokenization/subword modeling papers this year.
It's a good mix of tokenization algorithms, tokenization evaluation, tokenization-free methods, and subword embedding probing. Lmk if I missed some!
Here is a list with links + presentation time (in chronological order).
📢 Check out the lineup of papers our students will be showcasing at #EMNLP2024 in Miami next week! 🌴 We'll be presenting new work on morphology, Q&A, and narratives.