Find me at my poster in Hall C3 and/or find out more here: jedi.nicpopovic.com
Fact decomposition for interpretable and robust NLI without the need for an LLM?
Let me tell you how!
At 2pm today, I will be presenting “Extractive Fact Decomposition for Interpretable Natural Language Inference in One Forward Pass” at #EMNLP2025!
🚀 We are excited to introduce Kaleidoscope, the largest culturally-authentic exam benchmark.
📌 Most VLM benchmarks are English-centric or rely on translations—missing linguistic & cultural nuance. Kaleidoscope expands in-language multilingual 🌎 & multimodal 👀 evaluation of VLMs
A bit of a mess around the conflict of COLM with the ARR (and to a lesser degree ICML) reviews release. We feel this is creating a lot of pressure and uncertainty. So, we are pushing our deadlines:
Abstracts due March 22 AoE (+48hr)
Full papers due March 28 AoE (+24hr)
Plz RT 🙏
fyi: links seem to be broken
While you're not wrong, it felt like the paper in question involved manual work. For example, the figures did not look generated or just modified. They were remade and unfortunately less clear than the original. Also, the paraphrasing wasn't great, I think ChatGPT would've done a better job 😅
That's a good guess :)
I reviewed a paper last year where the approach section turned out to be a sentence-by-sentence copy (+ simple paraphrasing) of another paper. Figures were (poorly) redrawn, too. Of course the evaluation section had new results, beating SOTA by miles... Still amazes me that someone would try that.
thanks for the insight! the whole process is such a black box. nice to hear that applications are still being read by humans, even if the workload is so high :)
Just wondering how many applications even make it through the initial filter and to your desk..
"Sora is a data-driven physics engine."
x.com/chrisoffner3...
The best part about beating sota is that you'll finally get to find the bug in your eval code.
Nice, would be interesting to see how it performs on various tasks depending on the language used for CoT...
Anybody else notice Qwen models occasionally (really not very often) switching to Chinese mid-sentence?
The answer to life's problems is simple:
sudo reboot
Depending on how funny you think it is that you just typed "meat-llama" instead of "meta-llama", it might be time for a break 🥩🦙
Cool, thanks for the link!
I wonder how much the results would be affected if the user messages include telling the assistant that it did a great job 😅
kind of like the whole “gpt will do better if you offer a tip” thing
For ICL, is it better to put examples in the system prompt or as user/assistant messages?
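For context, a minimal sketch of the two placements being compared, assuming an OpenAI-style chat message list (the role names and the sentiment task are illustrative conventions, not from the post):

```python
# Option A: few-shot examples embedded directly in the system prompt
messages_system = [
    {
        "role": "system",
        "content": (
            "Classify sentiment as positive or negative.\n"
            "Example: 'I loved it' -> positive\n"
            "Example: 'Terrible movie' -> negative"
        ),
    },
    {"role": "user", "content": "What a waste of time"},
]

# Option B: the same examples as alternating user/assistant turns,
# so the model sees them as prior conversation history
messages_turns = [
    {"role": "system", "content": "Classify sentiment as positive or negative."},
    {"role": "user", "content": "I loved it"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Terrible movie"},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "What a waste of time"},
]
```

Either list would be passed as the `messages` argument of a chat-completion call; which one conditions the model better is exactly the open question here.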
Now we just need to know which one to set to one for AGI :D
👋
Alibaba has their own version of GPT-o1. This might be the best description of “o1-type” systems so far arxiv.org/abs/2411.14405
👋
👀
Has GPT-4o always done this kind of thing, or is it trained on "self-correction" chains from o1 now?
TIL. And I thought I was being silly by suggesting it's probably "all clear" in a strong German accent...
Thank you :)
Huh, so do gifs not work here...?
First post, just to fill the void on my profile :)
Check out my recent EMNLP paper on how to use probing classifiers for streaming named entity recognition!
Link to paper and demo (please try the demo, I'm really proud of it 😅): ember.nicpopovic.com