
Amir Hossein Kargaran

@kargaranamir

PhD Student at @cislmu.bsky.social · Multilingual NLP and LLMs · Twitter: https://x.com/amir_nlp · Homepage: https://kargaranamir.github.io

35 Followers · 91 Following · 6 Posts · Joined 25.11.2024

Latest posts by Amir Hossein Kargaran @kargaranamir

Are you working on multilingual, multicultural #LLM? Interested in diverse & inclusive language modeling?

😎 Stay tuned at our MELT workshop at #COLM2025

πŸ”— melt-workshop.github.io

We welcome 2-page (extended abstract), 4-page (short), and 8-page (long) papers, as well as talented reviewers:

πŸ”— forms.gle/MYcXED7RLJDS...

05.06.2025 08:39 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
GitHub - cisnlp/code-specific-neurons: πŸ’»πŸ” How Programming Concepts and Neurons Are Shared in Code Language Models

This work has been accepted as a Findings paper at ACL 2025 (@aclmeeting), in collaboration with Yihong Liu, @yvofr.bsky.social, and Hinrich SchΓΌtze. Code available at: github.com/cisnlp/code-specific-neurons

03.06.2025 17:50 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

We observe both similarities and differences in how LLMs represent natural languages versus programming languages.

03.06.2025 17:50 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

New paper: How does pretraining on programming languages + English shape LLMs' concept space?
πŸ” Do LLMs use English or a programming language as a kind of pivot language?
🧠 Are neurons language-specific or shared across programming languages and English?
πŸ”— arxiv.org/abs/2506.01074

03.06.2025 17:22 πŸ‘ 6 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
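A minimal sketch of how one might probe the second question in the post above: record each neuron's activation rate per language and label neurons as language-specific or shared. The arrays, function names, and thresholds below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

# Illustrative sketch, not the paper's method: acts_by_lang[lang] is a
# hypothetical (num_tokens, num_neurons) array of activations recorded
# while the model reads text in `lang` (e.g. English or Python code).

def activation_rates(acts_by_lang):
    """Fraction of tokens in each language on which each neuron fires (> 0)."""
    return {lang: (a > 0).mean(axis=0) for lang, a in acts_by_lang.items()}

def classify_neurons(rates, active_thresh=0.2):
    """Label each neuron 'specific:<lang>', 'shared', or 'inactive'.

    A neuron is specific if only one language activates it often,
    shared if several do. The threshold is an arbitrary assumption.
    """
    langs = list(rates)
    mat = np.stack([rates[lang] for lang in langs])  # (num_langs, num_neurons)
    labels = []
    for j in range(mat.shape[1]):
        active = [langs[i] for i in range(len(langs)) if mat[i, j] >= active_thresh]
        if len(active) == 1:
            labels.append(f"specific:{active[0]}")
        elif active:
            labels.append("shared")
        else:
            labels.append("inactive")
    return labels

# Toy usage with random stand-in data:
rng = np.random.default_rng(0)
acts = {"english": rng.standard_normal((500, 32)),
        "python": rng.standard_normal((500, 32))}
print(classify_neurons(activation_rates(acts))[:5])
```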

Thanks to everyone who stopped by to see our work! I’ll be at the conference until the closing night and would love to meet and connect with more people. Feel free to DM me here or on the Whova app.

13.12.2024 04:18 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages The need for large text corpora has increased with the advent of pretrained language models and, in particular, the discovery of scaling laws for these models. Most available corpora have sufficient d...

πŸ‡¨πŸ‡¦ I'll be in Montreal December 4–8, then Vancouver for NeurIPS to present our work on pretraining data for minority languages (arxiv.org/abs/2410.23825). Looking forward to reconnecting and meeting new people. DM me if you want to meet in the upcoming days! :)

01.12.2024 21:18 πŸ‘ 0 πŸ” 1 πŸ’¬ 0 πŸ“Œ 1