2. A proposal for evaluation of "superhuman" systems in healthcare: ai.nejm.org/doi/full/10....
Check out our related work:
1. Gaps in ability of models to adjust response and differential diagnosis for cancer patients: www.thelancet.com/journals/lan...
@shan23chen.bsky.social
Our research has found that even when chatbots are given specific patient context, they often drift back toward generic, "average patient" responses. They see the data, but they don't always weigh it like a physician would.
As I shared in the NYT, models often see the data but fail to weigh it like a physician, drifting toward generic "average patient" responses. Context window ≠ Clinical reasoning.
www.nytimes.com/2025/12/03/w...
"'Just because you're providing all of this information to language models,' @daniellebitterman.bsky.social says, 'doesn't mean they're effectively using that info in the same way that a physician would'.
And once people upload this kind of data, they have limited control over how it is used."🧪
Check out our editorial on Zazzetti et al. (2025)'s paper on synthetic data generation for breast cancer, in JCO CCI! Synthetic data could help with many gaps in clinical AI research, but challenges remain, especially (IMO) issues with out-of-domain generalization. @shan23chen.bsky.social
Super proud of @shan23chen.bsky.social for his podium presentation on his research into LLM sycophancy in the face of illogical medical queries at #AMIA25!
Full paper: www.nature.com/articles/s41...
Also cited yesterday in the NYT! www.nytimes.com/2025/11/16/w...
LLMs tend to prioritize helpfulness > reason. We show that safety-aware, compute-efficient fine-tuning helps models reason more critically in healthcare domain, and generalizes to improved safety alignment across other domains.
www.nature.com/articles/s41... @shan23chen.bsky.social
An overemphasis on helpfulness makes LLMs vulnerable.
Research shows models will comply with illogical medical requests, generating false information. This sycophantic tendency can be corrected with specific prompting and fine-tuning. #MedSky #MedAI #MLSky
Mass General physician-scientist @daniellebitterman.bsky.social discusses how AI assists the clinical data pipeline, leading to better treatments for patients. Listen to unNatural Selection & register for #WMIF2025 at the link in bio to hear more: www.unnaturalselection.net/podcast/s1e19
#MedTech
Our paper on multilingual reasoning is accepted to Findings of #EMNLP2025! (OA: 3/3/3.5/4)
We show SOTA LMs struggle with reasoning in non-English languages; prompt-hack & post-training improve alignment but trade off accuracy.
arxiv.org/abs/2505.22888
See you in Suzhou! #EMNLP
Are you driven to use AI to transform patient outcomes in oncology? My lab in the AI in Medicine Program (Mass General Brigham, Harvard Medical School) is seeking Postdoctoral Fellows to pioneer applications of AI, especially LLMs, in cancer care. More here: www.linkedin.com/posts/daniel...
Reliability of Large Language Model Knowledge Across Brand and Generic Cancer Drug Names | JCO Clinical Cancer Informatics ascopubs.org/doi/abs/10.1... #JCOCCI @daniellebitterman.bsky.social
Does your LRM reason in your language? Check out the new preprint led by ✨ @jiruiqi.bsky.social & @shan23chen.bsky.social. Implications for safety/human oversight & accuracy!
Led by @shan23chen.bsky.social!
Agents are all the rage and we need to track their abilities in the medical domain. Enter MedBrowseComp, the 1st benchmark to assess agents' abilities to reason, navigate the web, and search for verifiable med info!
Preprint: arxiv.org/abs/2505.14963
Site: moreirap12.github.io/mbc-browse-a...
"I think we have massive opportunity in cancer care to get patients to the right care, the most advanced care earlier, by taking those workforce shortages and using AI to get to solutions."
#STATBreakthrough
"The other thing I'm scared of, it's a patient's voice is going to become lost in the conversation on what type of AI is developed and how we implement it," Danielle
#STATBreakthrough
I'm thrilled to be in San Francisco for @statnews.com's Breakthrough West Summit! I'll be bringing my firsthand perspective as a physician-scientist to speak about how AI is transforming cancer care, alongside leaders in the field.
Let's connect if you're here!
#STATBreakthroughSummitWest
A social card that reads Featured Session: AI in Cancer Care. Then underneath are four headshots and titles. They read: Danielle Bitterman, M.D., Clifford A. Hudis, M.D., Karen Knudsen, Ph.D., and STAT's Angus Chen.
AI in Cancer Care
Artificial intelligence has the potential to upend oncology, changing everything from diagnosis to treatment options. Get a wide-ranging view of how the use of technology could play out over the next few years.
Moderated by @angusrohan.bsky.social
#STATBreakthrough
Exciting news: we are organizing a shared task, the 2nd edition of Chemotherapy Treatment Timelines Extraction from the Clinical Narrative (a text mining task), co-located with the Clinical NLP Workshop. Do LLMs solve the task? Check out bit.ly/ChemoTimelin...
Graph of NIH basis for new drugs
A pie graph worth keeping in mind as the NIH budget plummets jamanetwork.com/journals/jam... for 356 new FDA drugs approved
Conference and professional societies: PLEASE make hybrid options available for attendees and presenters at your conferences so that scientists from HHS-funded agencies can attend. These are unmissable opportunities to promote all the great intramural science and scientists from our government.
My Perspective in @NEJM_AI. AI could distort clinical decision-making in ways that prioritize profit over patient care. Oversight & regulation must go beyond performance metrics alone to address hidden commercial forces that could shape decision support. ai.nejm.org/doi/full/10....
My opinion as an actual NIH-funded researcher (unlike Vinay) at UCSF: his lies about how NIH dollars are used reflect a complete lack of understanding of how research is performed and a lack of respect for research, and they are harmful to the entire biomedical research enterprise #grifter
Budgeting for the next year of my grants and they will all need to be rescoped, even before the 15% IDC rate. NCI funding at 83% for new awards and another 10% reduction for renewals (current state). Essentially, we are getting 50% of what we asked for...how is this sustainable? @carlbergstrom.com
As a cancer doctor I see every day how NIH-funded clinical trials save lives and have made the U.S. a leader in medical innovation. Here's one example: In the 1970s, childhood cancer survival was only 58%. Today it is 85%, largely thanks to NIH/NCI funding of Children's Oncology Group trials.
Congressional delegation outside USAID now: "We are here to shed a light on a crime unfolding before our eyes."
Senator Andy Kim just went to the USAID building, talked to the security guard there to confirm employees are being barred entry, and then did a press gaggle right there in front to call it out.
This is doing something. This is making an effort on messaging. Other Democratic lawmakers: take notes.