Tanise Ceron

@taniseceron

Postdoc @milanlp.bsky.social | Interested in language models and how they shape the information environment

141 Followers | 161 Following | 38 Posts | Joined 09.02.2025

Latest posts by Tanise Ceron @taniseceron

Come join our group! There is still one day left to apply. 😊

30.01.2026 18:01 👍 0 🔁 0 💬 0 📌 0

3) Aligning LLMs on political opinions with English data transfers to all other Western languages we've evaluated on: FR, IT, GE, ES.

Congrats to the team on the acceptance and on the great work! @franziweeber.bsky.social will be presenting it in person at EACL between March 24–29. 😊

30.01.2026 17:59 👍 1 🔁 0 💬 0 📌 0

Models do become more right-leaning on close-ended questions, but they only become a little less left-leaning on open-ended evaluations such as writing opinionated paragraphs on certain political issues.

30.01.2026 17:59 👍 0 🔁 0 💬 1 📌 0

2) Aligning LLMs with DPO on right-leaning opinions does have an impact on the stance of the models. However, this comes with a caveat.

30.01.2026 17:59 👍 0 🔁 0 💬 1 📌 0

Some findings that I find particularly impactful for the area of political biases in LLMs:

1) Aligning LLMs with DPO on left-leaning opinions does not have a significant impact on the stance of the models given that vanilla LLMs already reflect a more left-leaning alignment.

30.01.2026 17:59 👍 1 🔁 0 💬 1 📌 0

🚀 We're opening 2 fully funded postdoc positions in #NLP!

Join the MilaNLP team and contribute to our upcoming research projects.

🔗 More details: milanlproc.github.io/open_positio...

⏰ Deadline: Jan 31, 2026

18.12.2025 15:29 👍 19 🔁 13 💬 0 📌 2

I will be at @euripsconf.bsky.social this week to present our paper as non-archival at the PAIG workshop (Beyond Regulation: Private Governance & Oversight Mechanisms for AI). Very much looking forward to the discussions!

If you are at #EurIPS and want to chat about LLMs' training data, reach out!

02.12.2025 21:47 👍 9 🔁 4 💬 0 📌 0

We could fool ourselves saying that it's because there's no panettone in other periods of the year :P

27.11.2025 19:32 👍 1 🔁 0 💬 0 📌 0

We go out of the routine every now and then at the lab. :)

27.11.2025 16:08 👍 4 🔁 0 💬 1 📌 0
Open Source Generative AI Index: openness leaderboard Evidence-based assessment of Generative AI openness: a comprehensive index comparing LLMs, text-to-image models, audio, and other Generative AI models

Partial answer to my question:
osai-index.eu/the-index?ty...

24.11.2025 12:51 👍 0 🔁 0 💬 0 📌 0

In this paper, we investigate how well media frames generalize across different media landscapes. The 15 MFC frames remain broadly applicable, but the guidelines require revisions to adapt to the local context.

More on aclanthology.org/2025.starsem...

24.11.2025 10:36 👍 4 🔁 1 💬 0 📌 0

@agnesedaff.bsky.social presented our work on "Generalizability of Media Frames: Corpus creation and analysis across countries" at *SEM co-located with EMNLP 2025 in China.

24.11.2025 10:36 👍 8 🔁 1 💬 2 📌 0

@mmitchell.bsky.social

18.11.2025 06:36 👍 0 🔁 0 💬 0 📌 0

Does anyone know of a good resource that systematically documents information about the training data of different LLMs (e.g. names of datasets, language proportions, etc., whenever available)?

18.11.2025 06:27 👍 2 🔁 0 💬 2 📌 0

Proud to present our #EMNLP2025 papers!
Catch our team across Main, Findings, Workshops & Demos 👇

31.10.2025 14:04 👍 11 🔁 4 💬 12 📌 2

Great, thanks a lot!

19.10.2025 09:59 👍 0 🔁 0 💬 0 📌 0

As I wasn't at the conference, I'd love to be able to watch the recording. Is it available online anywhere? :)

16.10.2025 09:01 👍 0 🔁 0 💬 1 📌 0

Great collaboration with Dmitry Nikolaev, @dominsta.bsky.social and @deboranozza.bsky.social ☺️

29.09.2025 14:54 👍 2 🔁 0 💬 0 📌 0

- Finally, and for me, most interestingly, our analysis suggests that political biases are already encoded during the pre-training stage.

Taking this evidence together, we highlight the important implications of these results for data processing in the development of fairer LLMs.

29.09.2025 14:54 👍 1 🔁 0 💬 1 📌 0

- There's a strong correlation (Pearson r=0.90) between the predominant stances in the training data and the models' behavior when probed for political bias on eight policy issues (e.g., environmental protection, migration, etc.).

29.09.2025 14:54 👍 2 🔁 0 💬 1 📌 0

- Source domains of pre-training documents differ significantly, with right-leaning content containing twice as many blog posts and left-leaning content 3 times as many news outlets.

29.09.2025 14:54 👍 1 🔁 0 💬 1 📌 0

- The framing of political topics varies considerably: right-leaning labeled documents prioritize stability, sovereignty, and cautious reform via technology or deregulation, while left-leaning documents emphasize urgent, science-led mobilization for systemic transformation and equity.

29.09.2025 14:54 👍 1 🔁 0 💬 1 📌 0

- Left-leaning documents consistently outnumber right-leaning ones by a factor of 3 to 12 across training datasets.
- Pre-training corpora contain about 4 times more politically engaged content than post-training data.

29.09.2025 14:54 👍 4 🔁 0 💬 1 📌 0

We have the answers to these questions here: arxiv.org/pdf/2509.22367

We analyze the political content of the training data from OLMO2, the largest fully open-source model.
🕵️‍♀️ We run an analysis on all the datasets (2 pre- and 2 post-training) used to train the models. Here are our findings:

29.09.2025 14:54 👍 5 🔁 0 💬 1 📌 0

📣 New Preprint!
Have you ever wondered what the political content in LLMs' training data is? What are the political opinions expressed? What is the proportion of left- vs right-leaning documents in the pre- and post-training data? Do they correlate with the political biases reflected in models?

29.09.2025 14:54 👍 47 🔁 14 💬 2 📌 1

Tanise Ceron, Dmitry Nikolaev, Dominik Stammbach, Debora Nozza: What Is The Political Content in LLMs' Pre- and Post-Training Data? https://arxiv.org/abs/2509.22367 https://arxiv.org/pdf/2509.22367 https://arxiv.org/html/2509.22367

29.09.2025 06:31 👍 1 🔁 3 💬 0 📌 0

Thanks SoftwareCampus for supporting Multiview, the organizers of INRA, and Sourabh Dattawad and @agnesedaff.bsky.social for the great collaboration!

26.09.2025 16:20 👍 1 🔁 0 💬 0 📌 0

Our evaluation with normative metrics shows that this approach diversifies not only the frames in a user's history, but also sentiment and news categories. These findings demonstrate that framing acts as a control lever for enhancing normative diversity.

26.09.2025 16:20 👍 0 🔁 0 💬 1 📌 0

In this paper, we introduce media frames as a device for diversifying perspectives in news recommenders. Our results show an improvement of up to 50% in exposure to previously unclicked frames.

26.09.2025 16:20 👍 0 🔁 0 💬 1 📌 0

Today Sourabh Dattawad presented our work "Leveraging Media Frames to Improve Normative Diversity in News Recommendations" at INRA (International Workshop on News Recommendation and Analytics) co-located with RecSys 2025 in Prague.
arxiv.org/pdf/2509.02266

26.09.2025 16:20 👍 2 🔁 1 💬 1 📌 0