Really interesting article about the storage needs of LLM training and inference, covering data gathering/preprocessing, training, and inference
Interesting points I picked up:
1. To keep GPUs busy during LLM training, use node-local SSDs to store training data
2. A fast parallel distributed file system is not needed for training and adds unnecessary complexity to the system
3. Use tiered asynchronous checkpointing (DRAM -> node-local storage -> shared storage) to avoid blocking GPUs
I'll be honest, though, it was the Peep Show reference in the article image that got me reading!
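Here's a minimal sketch of the tiered asynchronous checkpointing idea (DRAM -> node-local storage -> shared storage). The training loop only waits for the in-memory snapshot. A background thread then writes it to node-local storage and copies it to shared storage. Everything here is hypothetical: `TieredCheckpointer`, the file naming, and the use of `copy.deepcopy` and `pickle` as stand-ins for a real GPU-to-host copy and a real checkpoint format. It illustrates the shape of the technique, not how the article's systems implement it.

```python
import copy
import os
import pickle
import shutil
import tempfile
import threading
from pathlib import Path


class TieredCheckpointer:
    """Hypothetical sketch: DRAM -> node-local storage -> shared storage."""

    def __init__(self, local_dir: Path, shared_dir: Path) -> None:
        self.local_dir = Path(local_dir)    # e.g. a node-local NVMe SSD
        self.shared_dir = Path(shared_dir)  # e.g. a shared/network file system
        self.local_dir.mkdir(parents=True, exist_ok=True)
        self.shared_dir.mkdir(parents=True, exist_ok=True)
        self._pending: list[threading.Thread] = []

    def save(self, step: int, state: dict) -> None:
        # Tier 1 (DRAM): take an in-memory snapshot. This is the only part the
        # training loop waits for; deepcopy stands in for a GPU -> host copy.
        snapshot = copy.deepcopy(state)
        # Tiers 2 and 3 are written by a background thread, so training
        # can carry on while the checkpoint drains to slower storage.
        worker = threading.Thread(target=self._persist, args=(step, snapshot))
        worker.start()
        self._pending.append(worker)

    def _persist(self, step: int, snapshot: dict) -> None:
        name = f"ckpt_{step:08d}.pkl"
        # Tier 2: fast write to node-local storage.
        local_path = self.local_dir / name
        with open(local_path, "wb") as f:
            pickle.dump(snapshot, f)
        # Tier 3: copy to shared storage under a temporary name, then rename,
        # so readers never see a partially written checkpoint.
        tmp_path = self.shared_dir / (name + ".tmp")
        shutil.copy2(local_path, tmp_path)
        os.replace(tmp_path, self.shared_dir / name)

    def wait(self) -> None:
        # Block until every outstanding checkpoint has reached shared storage.
        for worker in self._pending:
            worker.join()
        self._pending.clear()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = TieredCheckpointer(Path(tmp) / "local", Path(tmp) / "shared")
        weights = {"w": [0.0, 1.0]}
        for step in range(3):
            weights["w"][0] += 1.0  # stand-in for one training step
            ckpt.save(step, weights)
        ckpt.wait()
        print(sorted(p.name for p in (Path(tmp) / "shared").iterdir()))
```

Because the snapshot is taken before `save` returns, later in-place updates to the weights don't leak into a checkpoint that is still being written.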
My end-of-year work Xmas present to myself was to spend the afternoon of my last work day playing with dev containers (containers.dev), a tool I had been wanting to try out for some time.
Quite impressed. After reading the VSCode dev containers docs (code.visualstudio.com/docs/devcont...), it didn't take long for me to get a containerised dev environment set up…
A really nice feature is that dev containers make it very easy to do post-build customisation of a base image/Containerfile involving the source code being developed. You specify commands to run in the built container, with your source code workspace (from the host) mounted into the container.
I used this to do an editable pip install of a src-layout Python package I was working on in the dev container, with all the development dependencies. Now when the dev container is launched, the Python package is already installed and ready to develop/test/debug! Very handy.
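The post-build step described above is configured in `.devcontainer/devcontainer.json` via the `postCreateCommand` lifecycle hook, which runs once after the container is created, with the workspace mounted. To keep everything in one checkable language, the sketch below builds the config as a Python dict and prints it as JSON. The `name`, the `../Containerfile` path and the `[dev]` extra (assumed to be declared under optional dependencies in `pyproject.toml`) are hypothetical placeholders, not the exact setup from my project.

```python
import json


def make_devcontainer_config(containerfile: str = "../Containerfile") -> dict:
    """Build a minimal devcontainer.json config (hypothetical project layout)."""
    return {
        "name": "my-package-dev",  # hypothetical container name
        "build": {
            # Both paths are resolved relative to devcontainer.json
            "dockerfile": containerfile,
            "context": "..",
        },
        # Runs once after the container is created, inside the mounted
        # workspace: an editable install of the package plus its dev extras
        "postCreateCommand": "pip install -e '.[dev]'",
    }


if __name__ == "__main__":
    # Print the JSON; save it as .devcontainer/devcontainer.json in the repo
    print(json.dumps(make_devcontainer_config(), indent=2))
```

An editable install means source edits in the mounted workspace show up immediately in the container's Python environment, with no reinstall needed.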
I had a great time at #CIUK. My first time at the event. Still processing all the input received!
An unordered list of interesting things I learned:
* How to write a ReFrame test
* What a roofline model is
* There are many novel/exotic hardware testbeds dotted around the UK (thanks to the ExCALIBUR H&ES programme)
* People really like to see and hold the GH200 and Grace CPU superchips
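For anyone else new to the roofline model: in its basic form, attainable performance is the lower of the machine's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). Below is a minimal sketch of that formula. The function names and the machine numbers (1000 GFLOP/s peak, 100 GB/s bandwidth) are made up for illustration and don't describe any of the testbeds.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Basic roofline: performance is capped by compute or by memory traffic.

    intensity is arithmetic intensity in FLOPs per byte moved from memory,
    so bandwidth (GB/s) * intensity (FLOP/byte) gives GFLOP/s.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    # Intensity at which a kernel stops being memory-bound and becomes compute-bound
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth
    peak, bw = 1000.0, 100.0
    print(ridge_point(peak, bw))              # 10.0 FLOP/byte
    print(attainable_gflops(peak, bw, 0.25))  # 25.0: memory-bound
    print(attainable_gflops(peak, bw, 50.0))  # 1000.0: compute-bound
```

Plotting `attainable_gflops` against intensity on log-log axes gives the characteristic "roof": a sloped bandwidth line that flattens at the ridge point.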
Hello, BlueSky
I'm not sure how or if you fit into my life, but let's give it a try and find out!