We're hiring 5 T/TT faculty in Neuroscience, including Computational Neuroscience, at U Notre Dame
We'll start reviewing applications very soon, so if you're thinking about applying, please apply now/soon!
apply.interfolio.com/173031
I'm posting this again for anyone who might have missed it last time:
Notre Dame is hiring 5 tenure or tenure-track professors in Neuroscience, including Computational Neuroscience, across 4 departments.
Feel free to reach out with any questions.
And please share!
apply.interfolio.com/173031
The University of Notre Dame is hiring 5 tenure or tenure-track professors in Neuroscience, including Computational Neuroscience, across 4 departments.
Come join me at ND! Feel free to reach out with any questions.
And please share!
apply.interfolio.com/173031
Couldn't the same argument be made for conference presentations (which 90% of the time only describe published work)?
When _you_ publish a new paper, lots of people notice, lots of people read it. No explainer thread needed. Deservedly so, because you have a reputation for writing great papers.
When Dr. Average Scientist publishes a paper, nobody notices, nobody reads it without some legwork to get it out there.
Thanks! Let us know if you have comments or questions
In other words:
Plasticity rules like Oja's let us go beyond studying how synaptic plasticity in the brain can _match_ the performance of backprop.
Now, we can study how synaptic plasticity can _beat_ backprop in challenging, but realistic learning scenarios.
Finally, we meta-learned pure plasticity rules with no weight transport, extending our previous work. When Oja's rule was included, the meta-learned rule _outperformed_ pure backprop.
We find that Oja's rule works, in part, by preserving information about inputs in hidden layers. This is related to its known properties in forming orthogonal representations. Check the paper for more details.
Vanilla RNNs trained with pure BPTT fail on simple memory tasks. Adding Oja's rule to BPTT drastically improves performance.
We often forget how important careful weight initialization is for training neural nets because our software initializes them for us. Adding Oja's rule to backprop also eliminates the need for careful weight initialization.
We propose that plasticity rules like Oja's rule might be part of the answer. Adding Oja's rule to backprop improves learning in deep networks in an online setting (batch size 1).
For example, a 10-layer ffwd network trained on MNIST using online learning (batch size 1) performs poorly when trained with pure backprop. How does the brain learn effectively without all of these engineering hacks?
In our new preprint, we dug deeper into this observation. Our motivation is that modern machine learning depends on lots of engineering hacks beyond pure backprop: gradients averaged over batches, batchnorm, momentum, etc. These hacks don't have clear, direct biological analogues.
In previous work on this question, we meta-learned linear combos of plasticity rules. In doing so, we noticed something interesting:
One plasticity rule improved learning, but its weight updates weren't aligned with backprop's. It was doing something different. That rule is Oja's plasticity rule.
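As a hedged aside: the classic single-neuron form of Oja's rule (Δw = η·y·(x − y·w), with y = w·x) is known to converge to the unit-norm top principal component of the input covariance. The minimal simulation below is my own illustration of that textbook property, not code from the preprint.

```python
import numpy as np

# Minimal sketch of Oja's rule for one linear neuron.
# Update: dw = eta * y * (x - y * w), where y = w . x.
# Known property: w converges to the unit-norm top principal
# component of the input covariance.
rng = np.random.default_rng(0)
scales = np.array([2.0, 1.0, 0.5])  # input std devs; top PC is axis 0

w = rng.normal(size=3)
w /= np.linalg.norm(w)
eta = 0.01
for _ in range(10_000):
    x = rng.normal(size=3) * scales  # sample an input
    y = w @ x                        # neuron output
    w += eta * y * (x - y * w)       # Oja's update

# w should now point (up to sign) along axis 0, with unit norm.
print(np.abs(w[0]), np.linalg.norm(w))
```

Note the self-normalizing term −η·y²·w: unlike plain Hebbian learning, the weight norm stays bounded without any explicit normalization step.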
A lot of work in "NeuroAI," including our own, seeks to understand how synaptic plasticity rules can match the performance of backprop in training neural nets.
New preprint with my postdoc, Navid Shervani-Tabar, and former postdoc, Marzieh Alireza Mirhoseini.
Oja's plasticity rule overcomes challenges of training neural networks under biological constraints.
arxiv.org/abs/2408.08408
A scientific figure blueprint guide!
If this seems empty it's because I don't plan to use it anytime soon!
I made this figure panel size guide to avoid thinking about dimensions every time. Apparently this post is going to be a 🧵! So feel free to bookmark it and save yourself some time.
Interesting comment, but you need to define what you mean by "neuroanatomy." Does such a thing actually exist? As a thing in itself or as a phenomenon? What would Kant have to say? ;)
Sorry, I didn't mean to phrase that antagonistically.
I just think that unless we're talking just about anatomy and we're restricting to a direct synaptic pathway (which maybe you are) then it's difficult to make this type of question precise without concluding that everything can query everything
Unless we're talking about a direct synapse, I don't know how we can expect to answer this question meaningfully when a neuromuscular junction in my pinky toe can "read out" and "query" photoreceptors in my retina.
Thanks. Yeah, I think this example helps clarify 2 points:
1) large negative eigenvalues are not necessary for LRS, and
2) high-dim input and stable dynamics are not sufficient for high-dim responses.
Motivated by this conversation, I added eigenvalues to the plot and edited the text a bit, thx!
Well deserved. Congratulations, Adrienne!
^ I feel like this is a problem you'd be good at tackling
One thing I tried to work out, but couldn't: We assumed a discrete number of large sing vals of W, but what if there's a continuous but slow decay (e.g., a power law)?
How would one derive the decay rate of the var expl vals in terms of the sing val decay rate and the overlap matrix?
Maybe it's possible to write this condition on sing vals of P in terms of eigenspectrum of W in a simple way, but I don't know how.
High-dim dynamics has additional constraints, but when the low-rank part has rank > 1, it's not just negative overlaps between sing vecs. Instead, the "overlap matrix" needs to lack small singular values.
Attached is an example (Fig 2d,e in paper) with pos and neg overlaps (P is the overlap matrix).
I don't think your reduction to eigenvalues captures everything, though.
For example, LRS is very general; it occurs in the attached example, where the dominant left and right singular vectors are near-orthogonal. The e-vals are negative but O(1) in magnitude, not separated from the bulk.
To clarify before I continue:
LRS is defined as the presence of a small number of suppressed directions (the last blue dot in the var expl figure we are replying to).
High-dim responses are defined as the absence of a small number of amplified directions.
I attached our assumptions and conditions for each.
Your example with a small bulk and a separate e-val near 1 can be a normal matrix and would not give LRS. In fact, the separated e-val need not be near 1, just O(1) and <1. But the net would still produce high-dim responses.
We had this example in a draft, but it got removed, will add it to Supp.