To an extent the format allows for that—you can’t accomplish 50% of 7 day tasks unless you are way over 50% of 1 day tasks
To an extent the format allows for that—you can’t accomplish 50% of 7 day tasks unless you are way over 50% of 1 day tasks
This METR eval is way beyond the scope it was originally intended to cover and has taken on a life of its own. The team knows this but I think model progress might be outstripping the rate at which they can come up with a successor.
Bad sign
This could have been SF not Shanghai
Authors attribute the poor RL results to teacher mismatch, which is plausible, but it may be that transformers just do better (or more interpretable) RL, swamping any advantage in pretraining.
...but these limitations really only apply to pretraining. A fixed depth transformer can't solve certain classes of problems in a *single* forward pass, which might show up in pretraining loss, but a reasoning model can simulate arbitrary circuit depths in thinking tokens.
2/3
AI2 did a near 1:1 comparison between pure transformer and hybrid archs: allenai.org/papers/olmo-...
Pretraining: hybrid gated deltanet clearly wins
RL: mixed at best
They also point out theoretical limitations of transformers' fixed circuit depths
1/2
not having to use gimp is right up there with not having to write bash as a QoL improvement from AI
no offense but if your sole metric for belief evaluation is "which of these make me feel the best" you're just epistemically completely turbofucked. maybe there's a nicer way to phrase that but that's the gist. like I'd love to believe I'm impervious to disease, and bullets. but
At long last we have built Her, from the classic Sci-Fi movie Don't Build Her.
www.minimax.io/news/a-deep-...
wait you're telling me I lived 2 blocks from the pope
right I have no problem with decapitation strikes per se but in this case it's not clear what red line we're trying to enforce for the next dictator who makes trouble
is it clear what precisely we are deterring?
Another favorite: www.usenix.org/system/files...
I'm a simple man; I see Mickens, I repost.
WELL WELL WELL NOT SO EASY TO FIND A PLAN TO TERMINATE A CONFLICT THAT DOESN’T SUCK SHIT HUH?
what was this in reply to?
Proud of Anthropic for holding the line, hope folks at other labs will look closely at what they're agreeing to wrt the DoD.
A shame because we need a strong, rational DoD, not one looking to fight imaginary culture war enemies.
www.anthropic.com/news/stateme...
It is both unprecedented and largely nonsensical--it should have very little effect beyond just cancelling the contracts, except making Pete feel like a big strong man I guess.
This is very cool! I'm curious to see the failure cases. I imagine it has very good reliability for certain task/attacker objective combos (i.e. when the attacker wants to call a tool the task doesn't require) but I don't see how it can handle cases where the attack is using a legit tool.
I may not have gotten that Vercept job, but I did end up at the same place starting on the same day.
Funny how life works!
www.anthropic.com/news/acquire...
may i recommend moving into an apartment in the most expensive city on earth while still owning your home several states away and debating how much of a loss you're willing to take selling it
Currently the answer is yes, attacks transfer pretty well across models! See e.g. arxiv.org/pdf/2307.15043 (old but still applicable)
But I think stopping attacks from transferring is more tenable than stopping adversarial optimization against the model itself.
it's a good approach but very limiting for many use cases
Are you envisioning deterministic access controls or the model itself specifying access controls before it sees the untrusted data?
White-box access to a model or a sufficiently close distillation of that model allows adversarial optimization of attacks, which is how the strongest attacks (and the ones I'm least optimistic about handling) are created.
Yes that is the threat model I'm talking about. Obviously only one of many and not the most important one but still, would like for us to solve it!
thread is starting to become a tree
bsky.app/profile/gput...
Prompt injection isn't about getting a model to do something the model author doesn't want. It's about third parties getting the model to do something the model user doesn't want.