
@braaannigan

119 Followers · 180 Following · 158 Posts · Joined 18.11.2023

Latest posts by @braaannigan

@davidho.bsky.social Do you know why my post doesn't appear on the oceanography feed?

27.02.2026 12:44 👍 0 🔁 0 💬 1 📌 0
MITgcm/MITgcm | DeepWiki This document provides a high-level introduction to the MITgcm (MIT General Circulation Model) repository structure, core architecture, and key systems. MITgcm is a flexible ocean and climate modeling

Want to understand the source code of the MITgcm ocean model? We can do this with the DeepWiki tool, which uses LLMs to build detailed docs

deepwiki.com/MITgcm/MITgcm
🌊

27.02.2026 12:43 👍 1 🔁 0 💬 1 📌 0
Getting started | uv uv is an extremely fast Python package and project manager, written in Rust.

If you work with Python, I highly recommend uv as the single tool you use to manage:
- installing Python for a project
- creating and running virtual envs
- managing dependencies
- packaging to run on other machines

docs.astral.sh/uv/getting-s...

It's faster and more comprehensive than pip/venv/etc

26.02.2026 09:46 👍 3 🔁 0 💬 0 📌 0

Lots of scientists still use Jupyter notebooks for analysis, but these don't integrate well with agentic coding.

As an alternative I'd suggest marimo notebooks, which have a similar interface but which an agent can run like a script

24.02.2026 10:52 👍 4 🔁 0 💬 0 📌 0

Polars has built-in date/datetime/duration functions. I use them a lot because they have a consistent API across Python versions, and the syntax for working with time zones is a lot easier to remember than Python datetimes!

26.09.2025 10:03 👍 0 🔁 0 💬 0 📌 0

Polars has neat built-in approaches for casting common string datetime formats these days. So long, .str.strptime followed by some format pattern I could never remember

25.09.2025 15:32 👍 0 🔁 0 💬 0 📌 0

Need to find performance bottlenecks? Then pyinstrument is an excellent tool. Recently it showed me that my pipeline runs weren't slow because of my data; they were slow because I was re-authenticating to AWS every time. You get this nice visual, which makes it easy to spot the laggards

08.09.2025 10:03 👍 0 🔁 0 💬 0 📌 0

I'm finding that o3 generates technically valid Polars code, but it leans very heavily on treating Series like NumPy arrays and never comes close to proper lazy-mode Polars syntax

10.07.2025 14:03 👍 0 🔁 0 💬 0 📌 0

New blog post from NVIDIA and Polars showing how you can process datasets too large to fit in GPU memory (link below). For a single GPU it may be best to use the spill-to-system-memory approach, while for multi-GPU setups there is a new streaming engine approach

02.07.2025 13:02 👍 2 🔁 0 💬 0 📌 0
Generating Polars code with LLMs - Polars user guide. Large Language Models (LLMs) can sometimes return Pandas code or invalid Polars code in their output. This guide presents approaches that help LLMs generate valid Polars code more consistently. These approaches have been developed by the Polars community through test...

I put together a user guide page on getting the best Polars code from LLMs. That was months ago, however! How do you think it needs to be updated?

20.06.2025 10:31 👍 0 🔁 0 💬 0 📌 0

As projects mature you will want to invest in a tool to validate the schema and data in your dataframes. This blog post gives a good summary of the different options for Polars users: https://posit-dev.github.io/pointblank/blog/validation-libs-2025/

18.06.2025 10:03 👍 2 🔁 0 💬 0 📌 0

PyPI download stats work in mysterious ways. Over the last few months Polars showed slow, steady growth. Then, basically overnight, downloads almost doubled and became much more variable. Why?

09.06.2025 11:31 👍 1 🔁 0 💬 1 📌 0

Let me count the ways that lazy mode in Polars ❤️ Parquet files

1. Polars can get the schema to start the query
2. Polars can use projection pushdown to subset columns
3. Polars can use predicate pushdown to limit the row groups it reads from the file when a filter is applied

19.05.2025 09:11 👍 1 🔁 0 💬 0 📌 0
Forecasting: Principles and Practice, the Pythonic Way

Interested in forecasting in Python? A major new free online textbook by leading forecasting academics and practitioners has been released: https://otexts.com/fpppy/

This adapts Rob Hyndman's excellent R forecasting book to the Python world

07.05.2025 09:02 👍 2 🔁 0 💬 1 📌 0

Using pytest with Polars? When there's an error the default traceback is often very long and you have to scroll through a lot to get to the relevant part. You can make it snappier by passing --tb=short to your pytest command to get to the point!

01.05.2025 12:31 👍 0 🔁 0 💬 0 📌 0

You can add a new column to a Polars DataFrame at a specified index position with insert_column. Your data needs to be a Polars Series first

01.05.2025 10:03 👍 1 🔁 0 💬 0 📌 0

One habit I've picked up with LLMs: if I'm working in a terminal but have too much data to read, then I generate a function that takes my dataframe and produces an HTML page with plotly charts that I can then open in the browser. Basically an on-demand dashboard

30.04.2025 13:03 👍 0 🔁 0 💬 0 📌 0

We can handle tricky JSON with Polars nested dtypes.

Here we have a list of dicts. But each row also contains a list of dicts. We deal with this by exploding the inner list of dicts to get each entry on its own row. Then we unnest the inner dicts so each field is its own column

30.04.2025 09:12 👍 1 🔁 0 💬 1 📌 0

Not at the moment, I'm afraid, they come from my O'Reilly workshop

28.04.2025 10:24 👍 1 🔁 0 💬 0 📌 0

It should be called look-at-the-data science

24.04.2025 12:30 👍 2 🔁 0 💬 0 📌 0

One thing to be careful about with Polars is using pl.when.then in cases where it isn't needed, as Polars pre-calculates all of the possible paths. It may be that a pl.when.then can be replaced by a join or replace_strict. This query is 5x faster as a join, for example

22.04.2025 09:02 👍 1 🔁 0 💬 0 📌 0

GPUs are a great fit for dataframes, but their use remains niche. However, the sheer volume of GPU manufacturing capacity means the cost/hassle of using them will drop. NVIDIA is pushing forward on the software side with Polars to make this a much more common experience

14.04.2025 15:02 👍 1 🔁 0 💬 0 📌 0

The XGBoost Random Forest (XGBRFRegressor) is a criminally underrated forecasting model. You can see how overlooked it is by the fact that if I ask LLMs to use the XGBoost Random Forest they still start using the extremely slow sklearn Random Forest instead

06.04.2025 09:01 👍 3 🔁 0 💬 1 📌 0

Polars has native support for nested data types - it's a long way from object columns with Python dictionaries in Pandas. Native support means Polars has an API built to work with nested data and a query engine that can do vectorized transformations on nested data

02.04.2025 10:02 👍 2 🔁 0 💬 0 📌 0

One tool I use a lot these days is token-count. I use it to check how many tokens there are in one or more files before adding them to model context. It's a command line tool that can be pip installed. In this example we see that there are 300k tokens in just one Polars crate!

01.04.2025 08:01 👍 0 🔁 0 💬 0 📌 0

Frantically trying to finish my Polars LLM evals experiments before my online event on Wednesday. I'll be evaluating which models work best for Polars and how you can use prompt engineering to get even better results. DeepSeek-V3 is the hot (and cheap) new entrant!

30.03.2025 09:01 👍 2 🔁 0 💬 0 📌 0

You can change display properties for Polars with pl.Config settings. In the snippet below I change to markdown format. This can be very handy - in JIRA, for example, with the markdown format a dataframe renders as a nice table rather than a mess of data

28.03.2025 10:45 👍 1 🔁 0 💬 0 📌 0

You can set a default engine for Polars instead of specifying it in every .collect statement. You do this with the POLARS_ENGINE_AFFINITY env var. The options are in-memory (default), streaming or gpu. If your query isn't supported with the last 2 then it reverts to in-memory

26.03.2025 10:01 👍 3 🔁 0 💬 1 📌 0

We can make a column based on if-elif-else in Polars with when.then.otherwise. The trick is that we can chain together as many when.thens as we need.

In this example we classify under 18 as a child, 18-64 as working age and over 64 as retired (as if any of us will retire at 65😭)

25.03.2025 09:18 👍 4 🔁 0 💬 0 📌 0