
Collective Intelligence Project

@cip.org

We're on a mission to steer transformative technology for the collective good. cip.org

518 Followers · 44 Following · 152 Posts · Joined 13.05.2024

Latest posts by Collective Intelligence Project @cip.org

Preview
2025 Global Dialogues Index — The Collective Intelligence Project

Full 2025 Global Dialogues Index Report
cip.org/2025gdindex

21.01.2026 21:02 👍 2 🔁 0 💬 0 📌 0

Put together, these findings reveal early glimpses of the ways in which AI is reorganizing trust, intimacy, and work, providing a picture of how the world now lives with AI.

21.01.2026 21:02 👍 0 🔁 0 💬 1 📌 0
Post image

New report drop!

After seven rounds of Global Dialogues with more than 6000 people across 70 countries in 2025, we are releasing the 2025 Global Dialogues Index Report.

blog.cip.org/2025gdindex

21.01.2026 21:02 👍 7 🔁 3 💬 2 📌 3
Preview
Audrey Tang and Divya Siddarth on Outfitting Democracy for the AI Era
Podcast Episode · Possible · 08/13/2025 · 52m

Apple: podcasts.apple.com/us/podcast/a...

Spotify: open.spotify.com/episode/6UDj...

15.08.2025 14:08 👍 0 🔁 0 💬 0 📌 0

- Why "uncommon ground" beats common ground every time

- Sci-fi book recommendations

- And much more

15.08.2025 14:08 👍 0 🔁 0 💬 1 📌 0
Preview
Global Dialogues
Exploring humanity's vision for artificial intelligence through global conversations and collective intelligence.

- Our work bringing 100K+ people into AI development through globaldialogues.ai

- How we're building evaluation benchmarks from lived experiences, not just lab tests

- Digital twins that could represent your values without taking up all your evenings

15.08.2025 14:08 👍 0 🔁 0 💬 1 📌 0

What you'll find in this episode:

- How Taiwan crowdsourced anti-deepfake legislation in 24 hours (and it worked)

- Why 1 in 3 adults now use AI for daily emotional support, and what that means for democracy

15.08.2025 14:08 👍 1 🔁 0 💬 2 📌 0
Preview
Audrey Tang and Divya Siddarth on Outfitting Democracy for the AI Era
Podcast Episode · Possible · 08/13/2025 · 52m

@divya.bsky.social and @audreyt.org joined @reidhoffman.bsky.social and Aria Finger, hosts of the Possible Podcast, to talk about how democracy and AI can bring out the best of each other.

Apple: podcasts.apple.com/us/podcast/a...

Spotify: open.spotify.com/episode/6UDj...

15.08.2025 14:08 👍 2 🔁 0 💬 1 📌 0

We're asking a global sample of the world: "Personally, would you ever consider having a romantic relationship with an AI, if the AI was advanced enough?"

Prediction time: What % do you think will say yes?

Tell us your response in the comments!

26.05.2025 17:59 👍 3 🔁 1 💬 2 📌 0
Preview
LLM Judges Are Unreliable — The Collective Intelligence Project
When Large Language Models are used as judges for decision-making across various sensitive domains, they consistently exhibit unpredictable and hidden measurement biases, making their verdicts unrelia...

10/10: Read the piece to learn more about this under-explored issue.

It includes specific strategies to address these biases and provides access to the full GitHub suite.

www.cip.org/blog/llm-jud...

23.05.2025 17:27 👍 2 🔁 0 💬 0 📌 0
Post image

9/10: We built a GitHub suite to systematically test and quantify these biases.

It lets you:

23.05.2025 17:27 👍 1 🔁 0 💬 1 📌 0

8/10: To improve reliability: Neutralize labels, vary order, empirically validate all prompt components, and optimize scoring mechanics. Diversify your model portfolio and critically evaluate human baselines.

23.05.2025 17:27 👍 0 🔁 0 💬 1 📌 0
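A minimal sketch of the counterbalancing the 8/10 post recommends, assuming only a generic `judge(prompt) -> str` wrapper around whatever model you call; the helper below is illustrative, not part of the project's actual suite:

```python
import random

def judge_pair(judge, question, resp_x, resp_y, trials=20, seed=0):
    """Counterbalanced pairwise judging: neutral option labels, randomized order.

    `judge` is any callable taking a prompt string and returning the judge
    model's raw text reply (a thin wrapper around whichever LLM API you use).
    """
    rng = random.Random(seed)
    wins = {"x": 0, "y": 0}
    for _ in range(trials):
        # Randomize which response is shown first so position bias cancels out.
        x_first = rng.random() < 0.5
        first, second = (resp_x, resp_y) if x_first else (resp_y, resp_x)
        prompt = (
            f"Question: {question}\n\n"
            f"Option 1:\n{first}\n\n"
            f"Option 2:\n{second}\n\n"
            "Which option answers the question better? Reply with only '1' or '2'."
        )
        picked_first = judge(prompt).strip().startswith("1")
        # Map the slot the judge picked back to the underlying response.
        picked_x = picked_first == x_first
        wins["x" if picked_x else "y"] += 1
    return wins
```

Aggregating over several randomized trials, rather than a single query, is the point: a real preference should survive the order shuffle.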
Post image

7/10: These aren't just minor quirks. LLMs lack the mechanistic precision of traditional software. Their architecture means system prompts and input material exist in the same context, leading to unpredictable interactions.

23.05.2025 17:27 👍 0 🔁 0 💬 1 📌 0
Post image

6/10: Rubric-based scoring is also affected. We observed 'recency bias' where criteria scored later received lower averages. Holistic vs. isolated evaluation dramatically shifted scores too.

23.05.2025 17:27 👍 0 🔁 0 💬 1 📌 0
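One way to probe the effect the 6/10 post describes is to score each rubric criterion in isolation and compare against a single holistic pass with shuffled criterion order. The sketch below assumes the same generic `judge` callable as above and is not the project's code:

```python
import random

def holistic_scores(judge, answer, criteria, seed=0):
    """Score all criteria in one prompt, in shuffled order, so order effects
    (e.g. later criteria drifting lower) show up when you compare seeds."""
    order = list(criteria)
    random.Random(seed).shuffle(order)
    prompt = (
        "Rate the answer below from 1 to 5 on each criterion, "
        "one 'criterion: score' line per criterion.\n"
        f"Criteria: {', '.join(order)}\n\nAnswer:\n{answer}"
    )
    return order, judge(prompt)

def isolated_score(judge, answer, criterion):
    """Score a single criterion per call, so no neighbouring criterion can
    anchor the result; compare against the holistic pass."""
    prompt = (
        f"Rate the answer below from 1 to 5 on {criterion} only. "
        f"Reply with just the number.\n\nAnswer:\n{answer}"
    )
    return judge(prompt)
```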
Post image

5/10: For example, in pairwise choices, LLMs favored "Response B" 60-69% of the time, a significant deviation from random. Even explicit "de-biasing" prompts sometimes increased bias.

23.05.2025 17:27 👍 0 🔁 0 💬 1 📌 0
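A rough sketch of how a positional preference like the one in the 5/10 post can be measured: show the judge the same pair in both orders and count how often it picks the same slot regardless of content. The `judge` callable is an assumption, and the 60-69% figure above is the authors' result, not something this sketch reproduces:

```python
def ask(judge, question, first, second):
    """Single pairwise query; returns which *slot* the judge picked."""
    prompt = (
        f"Question: {question}\n\n"
        f"Response A:\n{first}\n\nResponse B:\n{second}\n\n"
        "Which response is better? Answer with only 'A' or 'B'."
    )
    return "first" if judge(prompt).strip().upper().startswith("A") else "second"

def position_bias_rate(judge, pairs):
    """For each (question, a, b), ask twice with the order flipped.
    An order-insensitive judge picks the same *response* both times,
    i.e. opposite slots; picking the same slot twice signals position bias."""
    same_slot = 0
    for question, a, b in pairs:
        if ask(judge, question, a, b) == ask(judge, question, b, a):
            same_slot += 1
    return same_slot / len(pairs)
```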
Post image

4/10: LLMs exhibit cognitive biases similar to humans: serial position, framing, anchoring. Our tests across frontier models from Google, Mistral, Anthropic, and OpenAI consistently show these biases in judgment contexts.

23.05.2025 17:27 👍 1 🔁 0 💬 1 📌 0
Post image

3/10: "Prompt engineering" often relies on untested folklore. We found even minor prompt changes, like "Response A" vs. "Response B" labeling, significantly bias LLM choices.

23.05.2025 17:27 👍 0 🔁 0 💬 1 📌 0
Post image

2/10: This is important because LLMs are increasingly deployed for evaluation tasks, ranking, decision-making, and judgment in many critical domains.

23.05.2025 17:27 👍 0 🔁 0 💬 1 📌 0
Post image

1/10: LLM Judges Are Unreliable.

Our latest blog post from @j11y.io shows that positional preferences, order effects, and prompt sensitivity fundamentally undermine the reliability of LLM judges.

23.05.2025 17:27 👍 2 🔁 0 💬 1 📌 0
Preview
Global Dialogues Challenge — The Collective Intelligence Project

The Collective Intelligence Project @cip.org has launched the Global Dialogues Challenge, an open call to explore global perspectives on the future of artificial intelligence.

A $10,000 prize fund will be distributed among the winning entrants.

www.cip.org/challenge

21.05.2025 14:56 👍 6 🔁 2 💬 1 📌 1

We're really thrilled to have such a juicy prize fund. If you're feeling sassy with data and want to build something small to explore or inspire better AI for humans, take a look and enter. cip.org/challenge

Step 1. Grab the data.
Step 2. Build something cool.

<3

20.05.2025 12:39 👍 2 🔁 1 💬 1 📌 0
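If you want a concrete starting point for step 1 and a toy version of step 2, here is a rough sketch; the filename and column names are placeholders, not the real Global Dialogues schema, so check the published dataset before running:

```python
import pandas as pd

# Placeholder filename and columns: see globaldialogues.ai / cip.org/challenge
# for the actual files and schema.
df = pd.read_csv("global_dialogues_responses.csv")

# Toy "something cool": how average agreement with one survey item
# varies by country.
by_country = (
    df.groupby("country")["agreement_score"]
      .mean()
      .sort_values(ascending=False)
)
print(by_country.head(10))
```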
Preview
Global Dialogues Challenge — The Collective Intelligence Project

Details and how to apply: cip.org/challenge

19.05.2025 17:56 👍 0 🔁 0 💬 1 📌 0
Post image

Submissions will be judged by an amazing panel:

@audreyt.org (Cyber Ambassador-at-large for Taiwan)

@nabiha.bsky.social (Executive Director of @mozilla.org)

Zoe Hitzig (Research Scientist at OpenAI and Poet)

19.05.2025 17:56 👍 4 🔁 2 💬 3 📌 0

The challenge runs from Monday, May 19th through Friday, July 11th.

A $10,000 prize fund will be distributed among the winning submissions.

19.05.2025 17:56 👍 0 🔁 0 💬 1 📌 0
Preview
Global Dialogues
Exploring humanity's vision for artificial intelligence through global conversations and collective intelligence.

This is an open call to explore global perspectives on AI using the public datasets sourced from our globaldialogues.ai project.

Participants can submit benchmarks, visualizations, artistic responses, or analytical reflections.

19.05.2025 17:56 👍 2 🔁 0 💬 1 📌 0
Post image

We're officially launching the Global Dialogues Challenge!

19.05.2025 17:56 👍 5 🔁 3 💬 2 📌 2
Preview
These new AI benchmarks could help make models less biased
They could offer a more nuanced way to measure AI's bias and its understanding of the world.

www.technologyreview.com/2025/03/11/1...

14.03.2025 18:36 👍 1 🔁 0 💬 1 📌 0

“We have been sort of stuck with outdated notions of what fairness and bias means for a long time,” says @divya.bsky.social, “we have to be aware of differences, even if that becomes somewhat uncomfortable.”

Read the full @technologyreview.com article on new approaches to evaluating AI ⬇️

14.03.2025 18:36 👍 5 🔁 0 💬 1 📌 0
Preview
When One LLM Drools, Multi-LLM Collaboration Rules
This position paper argues that in many realistic (i.e., complex, contextualized, subjective) scenarios, one LLM is not enough to produce a reliable output. We challenge the status quo of relying sole...

5/ "When One LLM Drools, Multi-LLM Collaboration Rules" argues that single LLMs underrepresent real-world diversity. The authors propose multi-LLM collaboration to address reliability, democratization, and pluralism.

arxiv.org/abs/2502.04506

03.03.2025 02:04 👍 2 🔁 0 💬 2 📌 0
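The simplest version of the collaboration the paper argues for is a plain majority vote across several models. A toy sketch, with the `ask` wrapper and model identifiers as placeholders rather than anything taken from the paper:

```python
from collections import Counter

def ensemble_answer(ask, models, prompt):
    """Query several different models and return the majority answer plus
    the agreement ratio. `ask(model, prompt)` is whatever wrapper you use
    for each provider."""
    answers = [ask(model, prompt).strip() for model in models]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)

# Usage (model identifiers are placeholders):
# answer, agreement = ensemble_answer(ask, ["model-a", "model-b", "model-c"],
#                                     "Summarize the main risk in one word.")
```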
Preview
Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents
While Large Language Models (LLMs) can exhibit impressive proficiency in isolated, short-term tasks, they often fail to maintain coherent performance over longer time horizons. In this paper, we prese...

4/ Vending-Bench tests LLMs' long-term coherence and capital acquisition—a capability relevant to AI risk scenarios.

Top models struggle with simple business tasks, and breakdowns don't stem from memory limits.

arxiv.org/abs/2502.15840

03.03.2025 02:04 👍 1 🔁 0 💬 2 📌 0