Morality in AI is often oversimplified. @davidjurgens.bsky.social and @shivanikumar.bsky.social kick off the "Human-Centred NLP" orals #ACL2025NLP with UniMoral, a huge dataset of moral scenario ratings in 6 languages! They find LLMs fail to simulate human moral decisions. bsky.app/profile/shiv...
30.07.2025 07:14
Work done at #UMSI with the amazing @davidjurgens.bsky.social! Read more in our preprint!
Paper: arxiv.org/abs/2502.14083
Dataset: huggingface.co/datasets/shi...
@umichresearch.bsky.social #umichresearch #umich
(n/n)
01.03.2025 00:56
Final verdict? Across languages & contexts, models struggle to exceed chance in moral reasoning, highlighting gaps, especially in data-scarce languages.
UniMoral supports studies on cross-cultural moral generalization, bias detection, & value quantification to enhance ethics in AI! (8/n)
01.03.2025 00:56
Are models better at psychological vs. real-world dilemmas?
Yes, models perform better on psychological scenarios than Reddit dilemmas.
The gap is larger in predicting ethics & decision factors.
Why? Structured scenarios align with values, while Reddit dilemmas add noise and ambiguity. (7/n)
01.03.2025 00:56
Do the responder's values improve predictions?
Yes, context matters!
Values aid action prediction, but models rely on surface patterns. Surprisingly, a short self-authored persona works as well as values in personalizing predictions. Examples also help in identifying decision factors. (6/n)
01.03.2025 00:56
Can models reason equally well in different languages?
No! Moral reasoning varies.
Models perform best in English, Spanish & Russian; Arabic & Hindi show lower confidence, likely due to limited data & complex morphology.
Identifying decision factors lags behind action prediction. (5/n)
01.03.2025 00:56
Can AI reason morally?
We tested LLMs with UniMoral to:
- Make action choices
- Identify ethical preferences
- Recognize decision influences
- Predict consequences
Insights: LLMs excel at action & consequence but lag in ethics & factors. But how well do they generalize across languages and contexts? (4/n)
01.03.2025 00:56
What's inside?
- Multilingual hypothetical + Reddit-based dilemmas
- Action choices from people across 46 countries!
- Ethical-principle preferences
- Cultural & moral profiles of annotators
- Consequence modeling
Think of it as a "CT scan" of human moral judgment! (3/n)
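To make the ingredients above concrete, here is a minimal sketch of what one UniMoral-style record could look like. All field names and values here are hypothetical illustrations, not the dataset's actual schema; see the Hugging Face dataset card for the real layout.

```python
# Hypothetical sketch of a single UniMoral-style record.
# Field names are illustrative only, not the real schema.
record = {
    "scenario": "A friend asks you to lie to cover for them at work.",
    "language": "en",
    "source": "psychological",  # dilemmas come from psychology scenarios or Reddit
    "possible_actions": ["lie for them", "refuse and tell the truth"],
    "chosen_action": 1,  # index into possible_actions
    "ethical_principles": ["care", "honesty"],  # annotator's stated preferences
    "annotator_profile": {"country": "IN", "moral_values": ["loyalty", "fairness"]},
    "consequence": "Your friend is upset, but your manager keeps trusting you.",
}

# The chosen action should index one of the listed options.
assert 0 <= record["chosen_action"] < len(record["possible_actions"])
print(record["possible_actions"][record["chosen_action"]])
```

A record like this bundles the action, the ethical preference, the annotator's cultural profile, and the consequence in one place, which is what lets the "full cycle" of a moral decision be studied together.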
01.03.2025 00:56
Why care?
AI thrives on decision-making, yet most NLP research on moral reasoning relies on fragmented, Western-centric data. What's missing? A dataset capturing the full cycle: actions, ethics, consequences, and cultural nuance.
That's where UniMoral comes in. (2/n)
01.03.2025 00:56
Can AI grasp how humans across cultures reason through moral dilemmas?
Meet UniMoral, a unique multilingual dataset merging psychology & NLP to model moral reasoning as a pipeline. It enables LLMs to reason about decisions and their ethical implications across languages.
Thread (1/n)
01.03.2025 00:56