🚨 Reminder: Submissions for the ORIGen workshop at COLM are due today!!! 🚨
CfP: origen-workshop.github.io/submissions/
OpenReview submission page: openreview.net/group?id=col...
I'm trying to make "bleet" a thing
Jesse Thomason and Jesse Zhang in their respective PhD robes.
This month, @jessezhang.bsky.social completed his PhD defense and signed to start a postdoc with @abhishekunique7.bsky.social at UW! Keep an eye on his journey :) www.jessezhang.net
I'm sad to lose one of my sinistral students but glad to produce another Dr. Jesse!
The only silver lining of my ACL rejection is that I have something to submit to EMNLP
LLMs are all around us, but how can we foster reliable and accountable interactions with them??
To discuss these problems, we will host the first ORIGen workshop at @colmweb.org! Submissions welcome from NLP, HCI, CogSci, and anything human-centered, due June 20 :)
origen-workshop.github.io
This! So much this!!!
Nothing says “I love you” like outsourcing your parents’ phone calls to a chatbot. Social isolation in aging is real. Connection isn’t something you can automate.
Why does everyone think we can just throw a chatbot at every problem? </rhetorical>
www.404media.co/i-tested-the...
Ty for the plug!
Model confidence is a good decision aid (arxiv.org/pdf/2001.02114), while explanations are less useful and can cause over-reliance (arxiv.org/abs/2310.12558, arxiv.org/pdf/2406.19170). Other interaction cues like AI warmth can also make a difference (arxiv.org/abs/2407.07950).
Arresting and threatening to deport students because of their participation in political protest is the kind of action one ordinarily associates with the world’s most repressive regimes. It’s genuinely shocking that this appears to be what’s going on right here. 1/
What do you mean by core capabilities, for VLMs? IMO core capabilities should be determined by the applications we care about, and I'd argue medical use cases are at least as important as MSCOCO-style images/scenes.
I worry that concerns with "superintelligence" are being blurred with concerns around *ceding human control*.
A "SuperDumb" system can create mutually assured destruction. All it takes is allowing AI systems to execute code autonomously in military operations.
"The first guest on Gavin Newsom's podcast was Charlie Kirk" is more than enough for me to say "absolutely not" to any suggestion Newsom play any role in the future of the Democratic Party. People like him are the past, the failures, the ones who got us here.
What are you using o1pro for? And in what aspects do you think it's better than other LLMs?
Is this advice you reserve for a particular class of problems, or is it just generally applicable because we still don't know the full breadth of LLM capabilities?
I'm always three days away from being three days away
We hope our work inspires the community to more closely consider how user characteristics, including but not limited to trust, affect how people rely on AI assistance.
Work done with the always-awesome @thomason.bsky.social!
Improving AI reliability is more important than ever as AI systems are increasingly deployed in real-world settings with high stakes. We believe it is important for AI researchers to think about the user-AI dyad, rather than just the AI in a vacuum.
These findings show that being able to estimate users’ trust levels can enhance human-AI collaboration, but we also find that modeling user trust is very challenging! Our work reveals promising new directions for user modeling that extend beyond merely learning user preferences.
We show that adapting AI behavior to user trust levels, by showing AI explanations during moments of low trust and counter-explanations during high trust, effectively mitigates inappropriate reliance and improves decision accuracy! These improvements are also seen with other intervention strategies.
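The trust-adaptive intervention described above can be sketched as a simple policy function. This is a minimal illustration of the idea only; the threshold values, the trust scale, and the behavior labels are my assumptions for the sketch, not values from the paper:

```python
def choose_intervention(estimated_trust: float,
                        low: float = 0.3, high: float = 0.7) -> str:
    """Pick an assistant behavior from an estimated user trust score in [0, 1].

    Hypothetical thresholds: `low` and `high` are illustrative cutoffs,
    not values taken from the paper.
    """
    if estimated_trust < low:
        # Low trust risks under-reliance: show an explanation
        # supporting the AI's recommendation.
        return "explanation"
    if estimated_trust > high:
        # High trust risks over-reliance: show a counter-explanation
        # that argues against the recommendation.
        return "counter-explanation"
    # Trust in a moderate range: present the recommendation as-is.
    return "recommendation-only"
```

The key design choice is that the intervention targets the *direction* of likely miscalibration: supportive evidence when users trust too little, and a devil's-advocate view when they trust too much.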
In two decision-making tasks, we find that low and high user trust levels worsen under-reliance and over-reliance on AI recommendations, respectively.
Can the AI assistant do something differently when user trust is low/high to prevent such inappropriate reliance? Yes!
People are increasingly relying on AI assistance, but *how* they use AI advice is influenced by their trust in the AI, which the AI is typically blind to. What if they weren’t?
We show that adapting AI assistants' behavior to user trust mitigates under- and over-reliance!
arxiv.org/abs/2502.13321
What’s a technology that you think is overhyped?

I’m going to give a sideways answer to this, which is that the venture capital business model needs to be understood as requiring hype. You can go back to the Netscape IPO, and that was the proof point that made venture capital the financial lifeblood of the tech industry. Venture capital looks at valuations and growth, not necessarily at profit or revenue. So you don’t actually have to invest in technology that works, or that even makes a profit, you simply have to have a narrative that is compelling enough to float those valuations.

So you see this repetitive and exhausting hype cycle as a feature in this industry. A couple of years ago, you would have been asking me about the metaverse, then last year, you would have asked me about Web3 and crypto, and for each of these inflection points there’s an Andreessen Horowitz manifesto. It’s not simply that one piece of technology is overhyped, it’s that hype is a necessary ingredient of the current business ecosystem of the tech industry.

We should examine how often the financial incentive for hype is rewarded without any real social returns, without any meaningful progress in technology, without these tools and services and worlds ever actually manifesting. That’s key to understanding the growing chasm between the narrative of techno-optimists and the reality of our tech-encumbered world.
Stand by this: www.politico.com/newsletters/...
Does each of these correspond to a particular conf deadline? I'm guessing:
May: EMNLP
July: AACL?
Oct: EACL/NAACL
Feb: ACL
‼️ Ever wish LLMs would just... slow down for a second?
In our latest work, "Better Slow than Sorry: Introducing Positive Friction for Reliable Dialogue Systems", we delve into how strategic delays can enhance dialogue systems.
Paper Website: merterm.github.io/positive-fri...
“Toward the end of the November dinner, Trump raised the matter of the lawsuit, the people said. The president signaled that the litigation had to be resolved before Zuckerberg could be ‘brought into the tent,’ one of the people said.”
They’re in the tent now. Cowards.
Hi Marc! Could I get added?
Ooh what agent? Any pointers to how I can set this up?
EveryPhD EveryLab all at once
As long as the last time you saw/spoke to them was last year -- I wish my dentist Happy New Year in August.
You forgot about mid-training (which incidentally is also what I call my training runs).