i do empathize with reactions against "AI as art creator" types of applications, though. there are certain things that need human involvement to be considered meaningful, even if at the surface they are hard to distinguish
yeah there is a real risk of falling very behind if you reflexively decide against using LLMs these days (at least if your work involves writing lots of code)
Thanks! If I ever follow up on the blog I'll def take a look
tbf my earliest blog trilogy is on modeling multi-armed bandit tasks like the IGT, e.g. see below along with the next 2 in the sequence:
haines-lab.com/post/2017-04...
Ha maybe that will be the next one! Given I just got the IAT one out, and my pacing of one blog every 1-2 years, might be a while 😅
> that's for the community to pursue
I think this is the crux of the issue—there's not a single "community" using the IGT. Math psych/cognitive science folks are operating under a distinct set of standards + measurement practices, which I believe is obscured by treating the lit monolithically
Yes totally makes sense given the goal! My point in general is that people often discount an entire literature based on results like this, when in fact there are bodies of work within the literature that have given quite a lot of attention to measurement. The IGT is one such case
Not denying the mess 😁
bsky.app/profile/nate...
Not to discount the fact that measurement and use of the IGT is generally a mess in the literature! Because it certainly is. But I think these meta reviews/analyses often miss sound areas of research when looking at the forest. Models are the best measurement tools we have for these tasks
Great post! One thing the IGT paper misses is that, despite what a bunch of folks are doing with sum scores, there is a rich history of building principled models of the IGT, dating back to Busemeyer & Stout (2002). The paper leaves you thinking the whole literature is a mess, which is not true IMO
that's not a bad idea 🤓
appreciate it! and this is never going to be a paper.. im blogging for the love of the game, the publishing process would kill the fun
An aside—LLMs have basically solved data viz for me. What used to take a good bit of tinkering in ggplot or plotly now can be one-shotted with a data snippet/schema as context and a plain English description of what I want. The bar for data viz is now higher 🤓
ah so an actual paper and not a blog project?? lol
Ooo very nice, will be curious to read more on that!
Thank you!
And yes you definitely could, although I think drift rate is the most plausible mechanism in this case, e.g. the conflicting stimuli produce trade-offs during evidence accumulation that dampen the accumulation process.
That said, task manipulations (e.g. priming) may influence other parameters
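The drift-rate mechanism above can be illustrated with a toy simulation (this is a bare-bones sketch, not the fitted model from the blog): dampening the drift rate of a simple drift diffusion process produces slower and less accurate responses, which is exactly the behavioral signature of conflict trials.

```python
import numpy as np

def simulate_ddm(drift, boundary=1.0, dt=0.005, max_t=5.0, n_trials=1000, seed=0):
    """Simulate a bare-bones drift diffusion model.

    Evidence accumulates toward +/- boundary with Gaussian noise; returns
    the fraction of upper-boundary (correct) responses and the mean RT.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(max_t / dt)
    steps = drift * dt + np.sqrt(dt) * rng.standard_normal((n_trials, n_steps))
    paths = np.cumsum(steps, axis=1)
    hit = np.abs(paths) >= boundary
    crossed = hit.any(axis=1)                        # trials that reached a boundary
    first = np.argmax(hit, axis=1)                   # index of first crossing
    rt = (first + 1) * dt
    correct = paths[np.arange(n_trials), first] > 0  # upper boundary = correct
    return correct[crossed].mean(), rt[crossed].mean()

# Conflict is assumed here to dampen the drift rate; compare strong vs dampened:
acc_hi, rt_hi = simulate_ddm(drift=2.0)  # low-conflict trial: strong accumulation
acc_lo, rt_lo = simulate_ddm(drift=0.5)  # high-conflict trial: dampened accumulation
print(acc_hi > acc_lo, rt_hi < rt_lo)    # dampened drift -> less accurate, slower
```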
In case you missed it 🤓
9/9 The interpretation depends on how you evaluate discriminant validity. Implicit and explicit attitudes share ~ 67% of variance, but that still leaves 33% unique variance. Regardless, the IAT is a very noisy measure, and I wouldn't trust results that don't account for measurement error 🤓
8/9 Low loadings mean low reliability, and low reliability means summary measure correlations are attenuated by measurement error. But! Our models account for this, and model 2 reveals a latent correlation of r = .82. Model comparison adds to the story, altogether ruling out r = 0 and r = 1
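The attenuation logic in 8/9 follows from the classical correction-for-attenuation formula. A quick back-of-envelope check (the reliabilities below are made-up illustrative values, not the fitted ones) shows how a latent r of .82 can shrink into the meta-analytic .2–.3 range once both measures are noisy:

```python
import math

def attenuated_r(r_latent, rel_x, rel_y):
    """Classical attenuation: observed r = latent r * sqrt(rel_x * rel_y)."""
    return r_latent * math.sqrt(rel_x * rel_y)

# Illustrative (made-up) reliabilities for the IAT and self-report measures:
r_obs = attenuated_r(0.82, 0.35, 0.30)
print(round(r_obs, 2))  # a latent .82 shows up as an observed ~.27
```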
7/9 After fitting our models, we can derive standardized loadings that tell us the correlation between the latent construct(s) and the indicators (drift rates for the IAT; items for self-reports). The IAT loading is generally quite low, although so are some of the survey items:
6/9 So, these models better account for task- and item-specific variance when estimating the latent constructs, but we also need to consider measurement error. To do so, we model all data simultaneously, making different assumptions about the latent implicit-explicit correlation:
5/9 For self-report items, depending on whether the item requires a Likert vs continuous thermometer response, we can use categorical vs continuous models inspired by Item Response Theory (IRT). The left and right plots here illustrate how IRT models can capture either type of item:
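For the categorical (Likert) case in 5/9, one standard IRT choice is the graded response model; here is a minimal sketch of its category probabilities (an assumption for illustration — the blog's categorical model may be parameterized differently):

```python
import numpy as np

def graded_response_probs(theta, discrimination, thresholds):
    """Graded response model: P(response = k | theta) for ordered categories.

    P(Y >= k) = logistic(a * (theta - b_k)); category probabilities are the
    differences between adjacent cumulative probabilities.
    """
    thresholds = np.asarray(thresholds, dtype=float)  # must be increasing
    p_ge = 1.0 / (1.0 + np.exp(-discrimination * (theta - thresholds)))
    p_ge = np.concatenate(([1.0], p_ge, [0.0]))  # P(Y>=lowest)=1, P(Y>highest)=0
    return -np.diff(p_ge)                        # one probability per category

# 3 thresholds -> 4 ordered response categories for a hypothetical Likert item:
probs = graded_response_probs(theta=0.5, discrimination=1.5, thresholds=[-1, 0, 1])
print(probs.round(3))  # category probabilities, summing to 1
```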
4/9 Of course, summary measures like the IAT D-Score or summed scores from surveys both: (1) conflate the underlying construct of interest with task- or item-specific variation; and (2) ignore measurement error. We can do better with generative models, starting with a conflict model of the IAT:
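To make the conflation point in 4/9 concrete, here is a simplified D-score sketch (it omits the error penalties and trial trimming of the full Greenwald et al. 2003 algorithm; the latencies are made up): everything driving response times — caution, conflict, non-decision time — gets compressed into one number.

```python
import numpy as np

def iat_d_score(rt_compatible, rt_incompatible):
    """Simplified IAT D-score: mean latency difference between incompatible
    and compatible blocks, scaled by the pooled SD of all latencies.
    """
    rt_c = np.asarray(rt_compatible, dtype=float)
    rt_i = np.asarray(rt_incompatible, dtype=float)
    pooled_sd = np.concatenate([rt_c, rt_i]).std(ddof=1)
    return (rt_i.mean() - rt_c.mean()) / pooled_sd

# Toy latencies in ms for one hypothetical participant:
d = iat_d_score([650, 700, 675, 640], [820, 790, 860, 805])
print(round(d, 2))  # one summary number, conflating every underlying process
```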
3/9 The reason it gets complicated—meta-analyses consistently reveal correlations around r = .2-.3 between IAT D-scores and self-report measures, which IAT proponents use to claim discriminant validity (implicit constructs must be real and different from explicit). Our data follow this pattern:
2/9 The basic question we are trying to resolve is deceptively simple: "What is the relationship between implicit attitudes (as measured by the IAT) and explicit attitudes (as measured by self-report)?". This question has created a ton of controversy, both in and outside of academia
1/9 New blog is live! This is part 2 of a series—last time we looked at the Dunning-Kruger effect, now we are digging into Implicit vs Explicit attitudes and the Implicit Association Test. To start, of course we need a good meme...
haines-lab.com/post/part-2-...
the hardware angle is truly bizarre, seems desperate tbh
i mean look at rabbit r1 lol
and a fun look into the data generating process for the implicit association test 🧐
coming soon! part 2 of the series (finally..), this time looking at implicit attitudes instead of the Dunning-Kruger effect—here is a sneak peek 🤓