
@ltgoslo

The Language Technology Group (LTG) at the University of Oslo, Norway, does research on a range of topics in Natural Language Processing (NLP), including language modeling for Norwegian and other languages.

56
Followers
31
Following
18
Posts
03.03.2025
Joined

Latest posts by @ltgoslo

4. #BabyLM challenge description paper, co-authored by Lucas Georges Gabriel Charpentier

babylm.github.io

21.10.2025 15:28 👍 0 🔁 0 💬 0 📌 0
WMT 2025

3. "EdinHelsOW WMT 2025 CreoleMT System Description: Improving Lusophone Creole Translation through Data Augmentation, Model Merging and LLM Post-editing" by Jacqueline Rowe, Ona de Gibert, Mateusz Klimaszewski, Coleman Haley, Alexandra Birch and Yves Scherrer
(proc. of WMT)
www2.statmt.org/wmt25/

21.10.2025 15:28 👍 0 🔁 0 💬 0 📌 0
WMT 2025

2. "Improved Norwegian Bokmål Translations for FLORES" by Petter Mæhlum, Anders Næss Evensen and Yves Scherrer
(in proceedings of the WMT 2025 workshop)
www2.statmt.org/wmt25/

21.10.2025 15:27 👍 0 🔁 0 💬 0 📌 0
Explaining novel senses using definition generation with open language models We apply definition generators based on open-weights large language models to the task of creating explanations of novel senses, taking target word usages as an input. To this end, we employ the datas...

1. "Explaining novel senses using definition generation with open language models" by Mariia Fedorova, Andrey Kutuzov, Francesco Periti, Yves Scherrer
(in EMNLP Findings)
arxiv.org/abs/2509.26181

21.10.2025 15:26 👍 0 🔁 0 💬 0 📌 0

The #EMNLP2025 conference is starting in two weeks in Suzhou, China. @emnlpmeeting.bsky.social @ltgoslo.bsky.social

The Oslo Language Technology Group will be there with at least four papers, see the thread🧡:

21.10.2025 15:25 👍 1 🔁 1 💬 4 📌 0
Researcher in Natural Language Processing (283057) | University of Oslo Job title: Researcher in Natural Language Processing (283057), Employer: University of Oslo, Deadline: Monday, August 11, 2025

We're hiring! A postdoc-level researcher position in NLP, focusing on generative approaches to event extraction, is open at the University of Oslo. The contract is for 30 months. Closing date 11 Aug. Come join us! www.jobbnorge.no/en/available...

30.06.2025 12:00 👍 1 🔁 1 💬 0 📌 0
Systematic Generalization in Language Models Scales with Information Entropy Systematic generalization remains challenging for current language models, which are known to be both sensitive to semantically similar permutations of the input and to struggle with known concepts pr...

5. "Systematic Generalization in Language Models Scales with Information Entropy" by Sondre Wold, Lucas Charpentier, Étienne Simon arxiv.org/abs/2505.13089 (ACL Findings)

See you in Vienna!
(end of 🧡)

10.06.2025 08:26 👍 0 🔁 0 💬 0 📌 0
NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark This paper introduces NorEval, a new and comprehensive evaluation suite for large-scale standardized benchmarking of Norwegian generative language models (LMs). NorEval consists of 24 high-quality hum...

4. "NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark" by Vladislav Mikhailov, Tita Enstad, David Samuel, Hans Christian Farsethås, Andrey Kutuzov, Erik Velldal, and Lilja Øvrelid
arxiv.org/abs/2504.07749 (ACL Findings)

10.06.2025 08:25 👍 1 🔁 0 💬 0 📌 0
Re-identification of De-identified Documents with Autoregressive Infilling Documents revealing sensitive information about individuals must typically be de-identified. This de-identification is often done by masking all mentions of personally identifiable information (PII), ...

3. "Re-identification of De-identified Documents with Autoregressive Infilling" by Lucas Charpentier and Pierre Lison
arxiv.org/abs/2505.12859 (main ACL)

10.06.2025 08:23 👍 0 🔁 0 💬 0 📌 0
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions through subnetworks that can be composed to perform mo...

2. "Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models" by Philipp Mondorf, Sondre Wold (LTG), and Barbara Plank
arxiv.org/abs/2410.01434 (main ACL)

10.06.2025 08:22 👍 0 🔁 0 💬 0 📌 0
HPLT - High Performance Language Technologies A space that combines petabytes of natural language data with large-scale model training

1. "An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)". LTG co-authors: Nikolay Arefyev, Mariia Fedorova, Andrey Kutuzov, Petter Mæhlum, Vladislav Mikhailov, Stephan Oepen, David Samuel and many others from hplt-project.org
arxiv.org/abs/2503.10267 (main ACL)

10.06.2025 08:20 👍 2 🔁 0 💬 0 📌 0

LTG – the Oslo Language Technology Group – will be presenting five papers at the #ACL2025NLP conference of @aclmeeting.bsky.social this summer in #Vienna, see paper descriptions below 🧡

10.06.2025 08:18 👍 4 🔁 3 💬 5 📌 0

You can find all our papers in the @nodalida.bsky.social proceedings:
dspace.ut.ee/items/5b6a0e...

03.03.2025 11:38 👍 0 🔁 0 💬 0 📌 0

📄 Multi-label Scandinavian Language Identification (SLIDE), by Fedorova et al.

📄 Interactive maps for corpus-based dialectology, by Scherrer et al.

03.03.2025 11:32 👍 0 🔁 0 💬 0 📌 0

📄 NorEventGen: generative event extraction from Norwegian news, by You et al.

📄 Mixed Feelings: Cross-Domain Sentiment Classification of Patient Feedback, by Rønningstad et al.

03.03.2025 11:31 👍 0 🔁 0 💬 0 📌 0

📄 Large Language Models for Small Languages: A Study of Continual Pretraining on Languages of Norway, by Samuel et al.

📄 Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles, by Touileb et al.

03.03.2025 11:31 👍 1 🔁 0 💬 0 📌 0

📄 A Collection of Question Answering Datasets for Norwegian, by Mikhailov et al.

📄 The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective, by de la Rosa et al.

03.03.2025 11:30 👍 0 🔁 0 💬 0 📌 0

Over the coming days, LTG will be presenting 8 fresh papers at the NoDaLiDa/Baltic-HLT conference in Tallinn 🔥

Several of these represent collaborations with colleagues from UiB, NTNU, and the National Library of Norway. 🤝

Come see us if you're at #NoDaLiDa

See list of papers in the 🧡 below:

03.03.2025 11:29 👍 5 🔁 2 💬 5 📌 0