William Jurayj

@williamjurayj

PhD student at Johns Hopkins CLSP (@jhuclsp.bsky.social). Researching natural and formal language processing. williamjurayj.com

419 Followers · 217 Following · 15 Posts · Joined 11.11.2024

Latest posts by William Jurayj @williamjurayj

Super cool work! I hope this insight can help policymakers think more clearly about children's safety on tech / AI platforms.

23.02.2026 03:03 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

'Corporate Childrearing' is forthcoming in Duke Law Journal (26-27). Corporations play a huge role in children's identity formation. The piece reshapes the family law triangle into a square to make their influence and their intrusion on family relationships explicit.
papers.ssrn.com/sol3/papers....

22.02.2026 22:56 πŸ‘ 6 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0
Teaching AI to admit uncertainty: Johns Hopkins researchers show how different "odds" can teach AI models to admit when they're not confident enough in an answer

JHU computer scientists including @williamjurayj.bsky.social propose a method that allows #AI models to spend more time thinking through problems & uses a confidence score to determine when the AI should say "I don't know" rather than risking a wrong answer, which is crucial for high-stakes domains.

02.07.2025 18:59 πŸ‘ 1 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
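As a rough sketch of the idea in that article (my own illustration, not the authors' implementation), selective answering can be as simple as comparing a confidence score against a threshold; the function name and threshold value below are hypothetical:

```python
# Hypothetical sketch of confidence-thresholded answering: the model
# returns its answer only when its confidence clears a preset bar,
# otherwise it says "I don't know". Not the paper's actual code.

def selective_answer(answer: str, confidence: float, threshold: float = 0.8) -> str:
    """Return the model's answer only if it is confident enough."""
    # `confidence` is assumed to be a probability in [0, 1], e.g. derived
    # from the model's likelihood of its own answer tokens.
    if confidence >= threshold:
        return answer
    return "I don't know"

print(selective_answer("Paris", confidence=0.95))  # -> "Paris"
print(selective_answer("Lyon", confidence=0.40))   # -> "I don't know"
```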
a 3D graph with the X axis of compute budget, Y axis of accuracy, and Z axis of confidence threshold. The chart shows that accuracy increases with higher compute and confidence thresholds, though the trade-off tends to be fewer questions answered overall.

You can't just be right, you have to know you're right. Good advice for LLMs, according to new Johns Hopkins research. Sometimes no answer is better than a wrong one - life or death choices in medicine, for example, or big financial decisions. 🧡

19.03.2025 17:27 πŸ‘ 12 πŸ” 5 πŸ’¬ 1 πŸ“Œ 0

and here I was thinking you were out at the Opera 🀯

01.03.2025 23:38 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

It's been a joy working with @jeff-cheng.bsky.social & Ben Van Durme on this project. And huge thanks to @alexmartin314.bsky.social, @miriamsw.bsky.social, @marcmarone.com, @orionweller.bsky.social, and everyone else who gave very helpful feedback over the past weeks.

20.02.2025 15:14 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

To our knowledge, this is the first work to raise this point in the new area of LLM test-time scaling, though the broader community has long been aware of it. E.g., the Watson effort on Jeopardy, and a push by Jordan Boyd-Graber to reward systems that hold back dubious answers.

20.02.2025 15:14 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We propose the standard evaluation format of β€œJeopardy odds”: win a point when you’re right, lose a point when you’re wrong. Here we see compute scaling distinctions that were hidden when evaluating under a zero-risk setting. Selection functions matter!

20.02.2025 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
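For the curious, here is a minimal Python sketch of scoring under these odds, based only on the rule stated above (+1 when right, -1 when wrong, abstentions score nothing); the helper name and toy numbers are mine, not the paper's:

```python
# A minimal sketch of "Jeopardy odds" scoring: +1 for a correct answer,
# -1 for an incorrect one, 0 when the model abstains below threshold.

def jeopardy_score(predictions, labels, confidences, threshold):
    """Average reward when the model only answers above `threshold`."""
    total = 0
    for pred, gold, conf in zip(predictions, labels, confidences):
        if conf < threshold:
            continue  # abstain: neither gain nor lose a point
        total += 1 if pred == gold else -1
    return total / len(labels)

# Toy illustration: a cautious threshold can beat always answering.
preds = ["a", "b", "c", "d"]
golds = ["a", "b", "x", "y"]
confs = [0.9, 0.8, 0.3, 0.2]
print(jeopardy_score(preds, golds, confs, threshold=0.0))  # 0.0  (2 right, 2 wrong)
print(jeopardy_score(preds, golds, confs, threshold=0.5))  # 0.5  (answers only when confident)
```

Under zero-risk scoring the two thresholds would look identical on answered accuracy, which is exactly the distinction the post says selection functions expose.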

We test DeepSeek-R1 and find that scaling test-time compute can substantially increase a model’s confidence in correct answers, drawing a wider gap between correct and incorrect answers.

20.02.2025 15:14 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
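A hedged sketch of how one might quantify that widening gap (a toy illustration of my own, not the paper's measurement code): subtract the mean confidence on incorrect answers from the mean confidence on correct ones.

```python
# Hypothetical confidence-gap measurement: mean confidence on correct
# answers minus mean confidence on incorrect ones. The post reports
# this gap growing as test-time compute scales.

from statistics import mean

def confidence_gap(confidences, is_correct):
    """Mean confidence on correct answers minus mean on incorrect ones."""
    right = [c for c, ok in zip(confidences, is_correct) if ok]
    wrong = [c for c, ok in zip(confidences, is_correct) if not ok]
    if not right or not wrong:
        return float("nan")  # gap is undefined without both groups
    return mean(right) - mean(wrong)

# Made-up numbers standing in for a small vs. large compute budget:
print(confidence_gap([0.7, 0.6, 0.5, 0.4], [True, True, False, False]))   # 0.2 (narrow)
print(confidence_gap([0.95, 0.9, 0.3, 0.2], [True, True, False, False]))  # 0.675 (wider)
```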

🚨 You are only evaluating a slice of your test-time scaling model's performance! 🚨

πŸ“ˆ We consider how models’ confidence in their answers changes as test-time compute increases. Reasoning longer helps models answer more confidently!

πŸ“: arxiv.org/abs/2502.13962

20.02.2025 15:14 πŸ‘ 14 πŸ” 10 πŸ’¬ 1 πŸ“Œ 1
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation. Transformers have revolutionized vision and natural language processing with their ability to scale with large datasets. But in robotic manipulation, data is both limited and expensive. Can manipulati...

You might look into behavior cloning agents, which is a pretty robust space (e.g. arxiv.org/abs/2209.05451)

I could be misunderstanding what you're looking for though, since this feels very different from the CogAI/SOAR items you point to.

31.12.2024 00:54 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I’d say a key factor is whether a person’s put in a good-faith effort to be right for the right reasons. But I’m open to other explanations!

06.12.2024 20:10 πŸ‘ 5 πŸ” 0 πŸ’¬ 0 πŸ“Œ 1

In many ways, the Vision Pro hits on both categories.

27.11.2024 18:00 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

At this point, I would probably buy a cellular phone that they made

27.11.2024 17:54 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I think the 17th-century English were more likely to be enjoying tea than coffee

27.11.2024 17:52 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

πŸ‘‹

25.11.2024 16:24 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Did you recently visit an Apple store?

25.11.2024 04:21 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I saw this happen live, it was tragic

25.11.2024 04:10 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I noticed a lot of starter packs skewed towards faculty/industry, so I made one of just NLP & ML students: go.bsky.app/vju2ux

Students do different research, go on the job market, and recruit other students. Ping me and I'll add you!

23.11.2024 19:54 πŸ‘ 176 πŸ” 54 πŸ’¬ 101 πŸ“Œ 4