Morality in AI is often oversimplified. @davidjurgens.bsky.social and @shivanikumar.bsky.social kick off the "Human-Centred NLP" orals #ACL2025NLP with UniMoral, a huge dataset of moral scenario ratings in 6 languages! They find LLMs fail to simulate human moral decisions. bsky.app/profile/shiv...
30.07.2025 07:14
Work done at #UMSI with the amazing @davidjurgens.bsky.social! Read more in our preprint!
Paper: arxiv.org/abs/2502.14083
Dataset: huggingface.co/datasets/shi...
@umichresearch.bsky.social #umichresearch #umich
(n/n)
01.03.2025 00:56
Final verdict? Across languages & contexts, models struggle to exceed chance in moral reasoning, highlighting gaps, especially in data-scarce languages.
UniMoral supports studies on cross-cultural moral generalization, bias detection, & value quantification to enhance ethics in AI! (8/n)
01.03.2025 00:56
Are models better at psychological vs. real-world dilemmas?
Yes, models perform better on psychological scenarios than Reddit dilemmas.
The gap is larger in predicting ethics & decision factors.
Why? Structured scenarios align with values, while Reddit dilemmas add noise and ambiguity. (7/n)
01.03.2025 00:56
Do the responder's values improve predictions?
Yes, context matters!
Values aid action prediction, but models rely on surface patterns. Surprisingly, a short self-authored persona works as well as values in personalizing predictions. Examples also help in identifying decision factors. (6/n)
01.03.2025 00:56
Can models reason equally well in different languages?
No! Moral reasoning varies.
Models perform best in English, Spanish & Russian; Arabic & Hindi show lower confidence, likely due to limited data & complex morphology.
Identifying decision factors lags behind action prediction. (5/n)
01.03.2025 00:56
Can AI reason morally?
We tested LLMs with UniMoral to:
- Make action choices
- Identify ethical preferences
- Recognize decision influences
- Predict consequences
Insights: LLMs excel at action & consequence but lag in ethics & factors. But how well do they generalize across languages and contexts? (4/n)
01.03.2025 00:56
What's inside?
- Multilingual hypothetical + Reddit-based dilemmas
- Action choices from people across 46 countries!
- Ethical-principle preferences
- Cultural & moral profiles of annotators
- Consequence modeling
Think of it as a "CT scan" of human moral judgment! (3/n)
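To make the ingredients above concrete, here is a minimal sketch of what one UniMoral-style record could look like. All field names and values here are hypothetical illustrations, not the dataset's actual schema; see the Hugging Face dataset card for the real layout.

```python
# Hypothetical sketch of a single UniMoral-style record.
# Field names are illustrative only, not the real schema.
record = {
    "scenario": "A friend asks you to lie to cover for them at work.",
    "language": "en",
    "source": "psychological",  # dilemmas come from psychology scenarios or Reddit
    "possible_actions": ["lie for them", "refuse and tell the truth"],
    "chosen_action": 1,  # index into possible_actions
    "ethical_principles": ["care", "honesty"],  # annotator's stated preferences
    "annotator_profile": {"country": "IN", "moral_values": ["loyalty", "fairness"]},
    "consequence": "Your friend is upset, but your manager keeps trusting you.",
}

# The chosen action should index one of the listed options.
assert 0 <= record["chosen_action"] < len(record["possible_actions"])
print(record["possible_actions"][record["chosen_action"]])
```

A record like this bundles the action, the ethical preference, the annotator's cultural profile, and the consequence in one place, which is what lets the "full cycle" of a moral decision be studied together.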
01.03.2025 00:56
Why care?
AI thrives on decision-making, yet most NLP research on moral reasoning relies on fragmented, Western-centric data. What's missing? A dataset capturing the full cycle: actions, ethics, consequences, and cultural nuance.
That's where UniMoral comes in. (2/n)
01.03.2025 00:56
Can AI grasp how humans across cultures reason through moral dilemmas?
Meet UniMoral, a unique multilingual dataset merging psychology & NLP to model moral reasoning as a pipeline. It enables LLMs to reason about decisions and their ethical implications across languages.
Thread (1/n)
01.03.2025 00:56