We found more optimizations. Now fitting an infinite Gaussian mixture model from 0 to 0.99 ARI on 1 billion rows and 2 columns in < 20 sec. No variational inference. No subsampling. Just good old-fashioned MCMC. 🤯
This is Lace's successor, which is still in development. Hoping to have a demo showing scaling to trillions of records via distributed inference in a couple of months.
To the people out there saying we need boosted trees and neural nets because we can't be #bayesian at scale: here I am using MCMC to fit a *100 million* row Infinite Mixture Model in #rustlang in less than 30 seconds on a MacBook Pro.
"Sorry, we can't make this technology that sucks and nobody wants and that uses enough power to blow up the moon unless we *also* steal people's shit to throw into our content woodchipper in order to produce mediocre digital particleboard out of the cumulative artbarf."
futurism.com/the-byte/ope...
If you have a lot of people in your house who like pancakes, try a Dutch baby instead. Way easier. And fancier (according to my 6yo).
PPLs have struggled to gain traction in industry. Conventional wisdom blames scaling. I argue that PPLs' challenges aren't about scaling at all. They're about learning. And sometimes, to go faster, we need to slow down.
heresy.ai/a-better-ppl/
#bayesian #machinelearning
WRT the last post re: compile-time-generated Dirichlet process mixture models in #rustlang: we're doing a sweep of serial collapsed Gibbs on a 100k-row by 5-column table in ~55ms on an M4 MacBook Pro.
For those in the thread: we're comparing against standard serial Gibbs.
We have implementations of split-merge and parallel slice in Lace. The per-iteration times and times to converge are quite different for these kernels. It's usually best to alternate them, as they're better at different kinds of moves.
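For anyone curious what a single sweep actually involves, here's a minimal, dependency-free sketch of serial collapsed Gibbs for a Dirichlet process mixture of 1-D Gaussians (known variance, conjugate Normal prior on the component means), roughly in the spirit of Neal's Algorithm 3. This is not the Lace code; the struct, field names, and toy RNG are all illustrative, and the kernels mentioned above (split-merge, slice) make different moves on the same kind of state.

```rust
/// Dirichlet process mixture of 1-D Gaussians with known observation variance
/// and a conjugate Normal prior on each component mean.
struct Dpmm {
    alpha: f64,          // DP concentration parameter
    prior_mean: f64,     // prior on component means
    prior_var: f64,
    obs_var: f64,        // known within-component variance
    assign: Vec<usize>,  // cluster index per row
    counts: Vec<usize>,  // rows per cluster
    sums: Vec<f64>,      // sum of x per cluster (the sufficient statistic)
}

impl Dpmm {
    /// Log posterior-predictive density of x under a cluster holding n points
    /// that sum to s (n = 0 gives the prior predictive for a new cluster).
    fn log_pred(&self, x: f64, n: f64, s: f64) -> f64 {
        let prec = 1.0 / self.prior_var + n / self.obs_var;
        let mean = (self.prior_mean / self.prior_var + s / self.obs_var) / prec;
        let var = 1.0 / prec + self.obs_var;
        -0.5 * ((x - mean).powi(2) / var + var.ln() + (2.0 * std::f64::consts::PI).ln())
    }

    /// One serial collapsed-Gibbs sweep over the row assignments.
    fn sweep(&mut self, data: &[f64], unif: &mut impl FnMut() -> f64) {
        for (i, &x) in data.iter().enumerate() {
            // Remove row i from its current cluster.
            let z = self.assign[i];
            self.counts[z] -= 1;
            self.sums[z] -= x;

            // Unnormalized log weights: existing clusters, then a fresh one.
            let mut logw = Vec::with_capacity(self.counts.len() + 1);
            for (&n, &s) in self.counts.iter().zip(&self.sums) {
                if n == 0 {
                    logw.push(f64::NEG_INFINITY); // emptied slots stay unused
                } else {
                    logw.push((n as f64).ln() + self.log_pred(x, n as f64, s));
                }
            }
            logw.push(self.alpha.ln() + self.log_pred(x, 0.0, 0.0));

            // Sample a new assignment from the normalized weights.
            let max = logw.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
            let w: Vec<f64> = logw.iter().map(|l| (l - max).exp()).collect();
            let total: f64 = w.iter().sum();
            let mut u = unif() * total;
            let mut k = w.len() - 1;
            for (j, wj) in w.iter().enumerate() {
                if u < *wj {
                    k = j;
                    break;
                }
                u -= wj;
            }

            // Drawing the "new cluster" slot grows the state.
            if k == self.counts.len() {
                self.counts.push(0);
                self.sums.push(0.0);
            }
            self.assign[i] = k;
            self.counts[k] += 1;
            self.sums[k] += x;
        }
    }
}

fn main() {
    // Toy data: two well-separated groups; everything starts in one cluster.
    let data: Vec<f64> = (0..50).map(|i| if i < 25 { -5.0 } else { 5.0 }).collect();
    let n = data.len();
    let mut model = Dpmm {
        alpha: 1.0,
        prior_mean: 0.0,
        prior_var: 10.0,
        obs_var: 1.0,
        assign: vec![0; n],
        counts: vec![n],
        sums: vec![data.iter().sum()],
    };
    // Tiny xorshift uniform generator so the sketch has no dependencies.
    let mut state: u64 = 0x9E3779B97F4A7C15;
    let mut unif = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        (state >> 11) as f64 / (1u64 << 53) as f64
    };
    for _ in 0..100 {
        model.sweep(&data, &mut unif);
    }
    let occupied = model.counts.iter().filter(|&&c| c > 0).count();
    println!("occupied clusters after 100 sweeps: {occupied}");
}
```

Each sweep touches every row once and scores it against every occupied cluster plus one fresh cluster; that's where the per-iteration cost of plain collapsed Gibbs comes from, and why the choice and mix of kernels matters so much at scale.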
#rustlang has been an awesome choice for our probabilistic programming language backend. We've been experimenting with declarative macros to build custom ML structures at compile time. We're seeing 3-4x inference speedups over using Vecs and enums 🔥
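To make the idea concrete, here's a toy illustration of the pattern rather than our actual macro: `table!`, `Features`, and `Datum` are made-up names, and the speedup numbers above come from the real structures, not this sketch. The point is that a declarative macro can expand a column spec into a struct with concretely typed fields, so the hot loops are monomorphized instead of matching on an enum tag per cell.

```rust
// The dynamic layout this replaces: a cell-level enum matched on every access.
#[allow(dead_code)]
enum Datum {
    Continuous(f64),
    Categorical(u8),
}

// Toy declarative macro: expands a column spec into a struct with one
// concretely typed Vec per column, generated entirely at compile time.
macro_rules! table {
    ($name:ident { $($field:ident : $ty:ty),+ $(,)? }) => {
        struct $name {
            $(pub $field: Vec<$ty>,)+
        }

        impl $name {
            fn new() -> Self {
                Self { $($field: Vec::new(),)+ }
            }
        }
    };
}

// Illustrative columns loosely modeled on a predictive-maintenance table.
table!(Features {
    temperature: f64,
    torque: f64,
    failure: u8,
});

fn main() {
    let mut t = Features::new();
    t.temperature.push(300.1);
    t.torque.push(42.6);
    t.failure.push(0);

    // Typed field access: no per-cell dispatch in the hot loop.
    let mean: f64 = t.temperature.iter().sum::<f64>() / t.temperature.len() as f64;
    println!("mean temperature: {mean}");

    // Contrast: a dynamically typed column pays an enum match per cell.
    let dynamic: Vec<Datum> = t.temperature.iter().copied().map(Datum::Continuous).collect();
    let dyn_sum: f64 = dynamic
        .iter()
        .map(|d| match d {
            Datum::Continuous(x) => *x,
            Datum::Categorical(c) => *c as f64,
        })
        .sum();
    println!("sum via enum column: {dyn_sum}");
}
```

With concrete field types the compiler can monomorphize, inline, and auto-vectorize the column loops; the enum-per-cell layout costs a tag plus padding per value and a branch per access, which adds up fast in inference inner loops.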
Found a small number of errors in the UCI ML repo's AI4I synthetic predictive maintenance dataset. A cleaned version is hosted on our site.
Note that the data aren't erroneous per se; the processes and code behind them are. I've used similar techniques to find bugs in my own code.
redpoll.ai/blog/errors-...
Plover is a tool that finds errors/anomalies in databases. We were able to compile bits of it to WebAssembly (it's written in #rustlang), so you can try it in your browser, client-side (no sending your data off to some server). If you have a CSV, it's mostly drag and drop.
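If you're curious what the client-side packaging looks like, here's a hedged sketch of the general wasm-bindgen pattern, not Plover's actual interface: `scan_csv` and its toy structural check are stand-ins for the real error/anomaly detection, and the crate needs `wasm-bindgen` as a dependency and a WebAssembly build (e.g. via wasm-pack). The point is that the Rust runs in the browser and the data never leaves the tab.

```rust
// Sketch only: a stand-in for the real checks, wrapped for the browser.
use wasm_bindgen::prelude::*;

/// Scan a CSV string entirely client-side and report suspicious rows.
/// Toy check: flag rows whose field count doesn't match the header.
#[wasm_bindgen]
pub fn scan_csv(csv: &str) -> String {
    let mut lines = csv.lines();
    let n_cols = lines.next().map_or(0, |header| header.split(',').count());
    let flagged: Vec<usize> = lines
        .enumerate()
        .filter(|(_, row)| row.split(',').count() != n_cols)
        .map(|(i, _)| i + 2) // 1-indexed line numbers, counting the header
        .collect();
    format!("{} suspicious rows: {:?}", flagged.len(), flagged)
}
```

On the JavaScript side you'd read the dropped file with a FileReader and pass the text straight to the exported function; whatever comes back just gets rendered into the page.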
Decided to kick things off with something that is a bit inflammatory, but is my whole reason for being right now: AI is really hard to use for good.
heresy.ai/ai-leans-evil/