August ML coding challenge is alive and kicking - join for free and learn something with me
www.sarahglasmacher.com/ml-repo-stru...
now i'll go and finish the setup for my august coding challenge and send out the newsletter and maybe i'll get back to my optuna code after running or tomorrow
galaxy-inferno-codes.kit.com
this week was a bit full and nothing felt like enough with the upcoming deadline, but we made great progress merging all our feature branches together, and we're still getting a tracked and registered machine learning model in the end
happy friday, coders! technically the work week is over - however I have an important project presentation next tuesday, so I think i'll tweak my gridsearch some more
for me it's also "it's more and more embarrassing to be wrong or to admit I don't know"
personally, I believe we should admit to not knowing a lot more than we do, but the longer I'm in corporate, the more I think it damages my image if I do admit this publicly.
Being an adult truly involves a lot of cleaning - not only your apartment but for some reason also your computer? And your phone camera roll (don't tell me I'm the only one with 2k screenshots) - and then there is work documentation to do as well. Anyways, I did some cleaning this weekend
this one felt okay though & I think it's important for data scientists to move more towards tools like this to ever be able to deploy projects. That's my large goal for the rest of the year: how can we enable data science teams to develop "better" code in small achievable steps
published my 4th article of the year on my blog and my first ever post about mlflow, where i'm trying to overcome my imposter syndrome: i'm learning a lot, but am far from an expert and am sooo afraid of giving bad advice
buff.ly/pwLyjbL
my linkedin post finally published *with* its image attached
(also someone pls tell me why writing 400 word social posts feels easy but writing an 800 word blog post feels impossible?)
"Rust" always trips me up because I can never tell on first glance if we're talking about the game or the coding language - somehow my brain always defaults to the game and ends up confused
it's not just the pay - the mindset is so extremely risk-averse and stubborn in my experience. And I'm in Germany, we're not big risk takers to begin with, but public companies are an extreme even here. It's really hard to change things or create new ideas.. at some point you get tired of it
No thank you, chatgpt, i would *not* like a printable tracker, I haven't used my printer in 5 years. Is this a new thing or did I somehow tell it I love printing things?
what are the more appealing options? i'm out of the loop
Another day of triple checking all data and... finding out two temperature columns are not the same?! I will never take sorted and orderly data for granted again.
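A quick sketch of the kind of check that surfaces a mismatch like this. The column names (`temp_sensor`, `temp_export`), the tolerance, and the rows are all made up for illustration; this is not the actual audit code, just one way to list rows where two supposedly identical columns disagree.

```python
import math


def column_mismatches(rows, col_a, col_b, tolerance=1e-6):
    """Return (row_index, value_a, value_b) for every row where the columns differ.

    A value missing in only one of the two columns also counts as a mismatch.
    """
    mismatches = []
    for index, row in enumerate(rows):
        a, b = row.get(col_a), row.get(col_b)
        if a is None or b is None:
            # Both missing is treated as "equal"; one missing is a mismatch.
            if not (a is None and b is None):
                mismatches.append((index, a, b))
            continue
        if not math.isclose(a, b, rel_tol=0.0, abs_tol=tolerance):
            mismatches.append((index, a, b))
    return mismatches


# Toy rows with hypothetical column names, purely for illustration.
rows = [
    {"temp_sensor": 21.5, "temp_export": 21.5},
    {"temp_sensor": 19.0, "temp_export": 19.4},
    {"temp_sensor": None, "temp_export": 18.2},
    {"temp_sensor": 22.1, "temp_export": 22.1},
]

mismatches = column_mismatches(rows, "temp_sensor", "temp_export")
for index, a, b in mismatches:
    print(f"row {index}: temp_sensor={a!r} vs temp_export={b!r}")
```

Listing the row indices (instead of just a yes/no answer) makes it easier to see whether the difference is a handful of gaps or a systematic offset.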
New newsletter: my team and I were deep into a data audit - the forecasting model has already been trained - so the data should be very clean and complete, right? After all, someone has already worked with it... Spoiler alert: our data wasn't nearly as complete as we thought.
Is there any good source on comparing which LLMs/providers people actually use day to day in terms of percentages? I feel like we always hear from vocal minorities when new models launch, but how much has, for example, Gemini actually grown on the market in total?
not sure, i just tried to google it but most results just describe "normal" packages. I do the "pip install -e ." in the folder of my local library and then add it as a dev dependency to my pyproject.toml too. Maybe not quite the *correct* uv way, but at least it works
not sure if i understand your set up correctly, but i think you need to build the library first before you can install it? Or run something like "pip install -e ." which keeps updating the build on changes. not sure if "uv add" handles all of that
building the prototype was never the real challenge?! so I struggle to see the "life changing" advantage here
building a fast prototype is nice and all, but if it's fast and dirty, it tells you nothing about the upcoming challenges in the deployment process and you will need to start completely from scratch to build it into a fully deployable and maintainable product
vibe coding?
see you in 2 years, when you start discussing "vibe debugging", "vibe tech debt", "vibe monitoring", "vibe deployment", "vibe security" and "vibe cost analytics"
However, it also seems like many embedding models are *trained* using cosine similarity for the loss function, so in a way it makes sense to use it for retrieval via embeddings too - and many of the vector metrics share properties, so it's not like another metric computes smth completely different
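To make the "shared properties" point concrete, here is a small sketch with made-up toy vectors (not real embeddings). Once vectors are normalized to length 1, squared Euclidean distance equals 2 - 2 * cosine similarity, so both metrics rank the documents in the same order.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors, invented for illustration only.
query = [1.0, 2.0, 0.5]
docs = {
    "a": [0.9, 2.1, 0.4],
    "b": [-1.0, 0.5, 2.0],
    "c": [2.0, 0.1, 0.1],
}

unit_query = normalize(query)

# Rank by cosine similarity (higher is closer) ...
by_cosine = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]), reverse=True)
# ... and by Euclidean distance on unit vectors (lower is closer).
by_euclid = sorted(docs, key=lambda k: euclidean_distance(unit_query, normalize(docs[k])))

print(by_cosine, by_euclid)
```

This only holds for normalized vectors; on raw, unnormalized vectors the two rankings can differ.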
FYI even OpenAI say in their documentation of text-embedding-3-large the following: "We recommend cosine similarity. The choice of distance function typically doesn't matter much." platform.openai.com/docs/guides/...
I just wanted to write a quick tutorial on how I've used cosine similarity in pgvector to search for RAG sources in my side project... and now I'm doing a whole deep dive into "why tf are we using cosine similarity at all?" and that's why I'm a very inconsistent content creator
today I added all my Obsidian/second brain markdown files into the local RAG vector store and did some small improvements to the retrieval to get a flexible number of results back
RAG side project update: Somehow I thought that fine-tuning a similarity threshold for RAG would be super difficult, but it turns out for my use case the "default" of 0.5 works just fine
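A minimal sketch of what threshold-based retrieval with a flexible number of results can look like. This is plain Python on toy 2D vectors, not the actual pgvector setup from the side project; the function name `retrieve`, the `max_results` cap, and the sample chunks are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, chunks, threshold=0.5, max_results=10):
    """Return (score, text) pairs with score >= threshold, best first.

    The number of results is flexible: anything from zero up to max_results,
    depending on how many chunks clear the threshold.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:max_results]


# Toy chunks with invented 2D "embeddings", for illustration only.
chunks = [
    ("note on pgvector", [1.0, 0.1]),
    ("grocery list", [-0.2, 1.0]),
    ("note on embeddings", [0.8, 0.6]),
]
query_vec = [1.0, 0.2]

results = retrieve(query_vec, chunks, threshold=0.5)
strict_results = retrieve(query_vec, chunks, threshold=0.95)

for score, text in results:
    print(f"{score:.3f}  {text}")
```

Raising the threshold trades recall for precision: with the toy data above, a stricter threshold returns fewer chunks.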
this will be mainly an exploration task, i don't mean to solve the whole problem in 4 days, but 4 days is enough to try out 1-2 simple techniques in a notebook or quick script and write a short post about the problem itself, so end of month sprint, let's go
I promised I would figure out what to do with my side project - here we are: i quickly threw together a short post on my website about a veery quick February project
I said I would do a coding project each month, and I don't intend to break that promise even if it means doing it in 4 days
the irony of starting a RAG project & then immediately being thrown into a time series forecasting project at work, so now i'm spending all my free time learning about time series. It's not even just about the available time - i find it incredibly hard to go "sponge-mode" on two topics at once.
i've been in the office quite a bit more lately, the parking situation is horrendous but the smaller building is cozy - it's pretty much exclusively IT of the company so I know most of the people I meet in the hallway and we have our own bigger kitchen to hang out in