4. #BabyLM challenge description paper, co-authored by Lucas Georges Gabriel Charpentier
babylm.github.io
3. "EdinHelsOW WMT 2025 CreoleMT System Description: Improving Lusophone Creole Translation through Data Augmentation, Model Merging and LLM Post-editing" by Jacqueline Rowe, Ona de Gibert, Mateusz Klimaszewski, Coleman Haley, Alexandra Birch and Yves Scherrer
(proc. of WMT)
www2.statmt.org/wmt25/
2. "Improved Norwegian Bokmål Translations for FLORES" by Petter Mæhlum, Anders Næss Evensen and Yves Scherrer
(in proceedings of the WMT 2025 workshop)
www2.statmt.org/wmt25/
1. "Explaining novel senses using definition generation with open language models" by Mariia Fedorova, Andrey Kutuzov, Francesco Periti, Yves Scherrer
(in EMNLP Findings)
arxiv.org/abs/2509.26181
The #EMNLP2025 conference is starting in two weeks in Suzhou, China. @emnlpmeeting.bsky.social @ltgoslo.bsky.social
The Oslo Language Technology Group will be there with at least four papers, see the thread 🧵:
We're hiring! A postdoc-level researcher position in NLP, focusing on generative approaches to event extraction, is open at the University of Oslo. The contract is for 30 months. Closing date 11 Aug. Come join us! www.jobbnorge.no/en/available...
5. "Systematic Generalization in Language Models Scales with Information Entropy" by Sondre Wold, Lucas Charpentier, Étienne Simon arxiv.org/abs/2505.13089 (ACL Findings)
See you in Vienna!
(end of 🧵)
4. "NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark" by Vladislav Mikhailov, Tita Enstad, David Samuel, Hans Christian Farsethås, Andrey Kutuzov, Erik Velldal, and Lilja Øvrelid
arxiv.org/abs/2504.07749 (ACL Findings)
3. "Re-identification of De-identified Documents with Autoregressive Infilling" by Lucas Charpentier and Pierre Lison
arxiv.org/abs/2505.12859 (main ACL)
2. "Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models" by Philipp Mondorf, Sondre Wold (LTG), and Barbara Plank
arxiv.org/abs/2410.01434 (main ACL)
1. "An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)". LTG co-authors: Nikolay Arefyev, Mariia Fedorova, Andrey Kutuzov, Petter Mæhlum, Vladislav Mikhailov, Stephan Oepen, David Samuel and many others from hplt-project.org
arxiv.org/abs/2503.10267 (main ACL)
LTG, the Oslo Language Technology Group, will be presenting five papers at the #ACL2025NLP conference of @aclmeeting.bsky.social this summer in #Vienna, see paper descriptions below 🧵
You can find all our papers in the @nodalida.bsky.social proceedings:
dspace.ut.ee/items/5b6a0e...
Multi-label Scandinavian Language Identification (SLIDE), by Fedorova et al.
Interactive maps for corpus-based dialectology, by Scherrer et al.
NorEventGen: generative event extraction from Norwegian news, by You et al.
Mixed Feelings: Cross-Domain Sentiment Classification of Patient Feedback, by Rønningstad et al.
Large Language Models for Small Languages: A Study of Continual Pretraining on Languages of Norway, by Samuel et al.
Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles, by Touileb et al.
A Collection of Question Answering Datasets for Norwegian, by Mikhailov et al.
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective, by de la Rosa et al.
Over the coming days, LTG will be presenting 8 fresh papers at the NoDaLiDa/Baltic-HLT conference in Tallinn.
Several of these represent collaborations with colleagues from UiB, NTNU, and the National Library of Norway.
Come see us if you're at #NoDaLiDa
See list of papers in the 🧵 below: