Thanks! All the fiddling with SVGs was worth it!
We released DuckDB v1.4.2, the second patch release of our LTS edition.
We are shipping new Iceberg features, improved logger/profiler integration, and several bugfixes. The new DuckDB version can also read and write Vortex files.
For more details, read
duckdb.org/2025/11/12/a...
DuckDB does not support GraphQL. GraphQL itself is a bit of a misnomer, as it is not a full-fledged graph query language; it's primarily intended to query REST endpoints. GQL and SQL/PGQ are full-fledged graph query languages, supporting both pattern matching and path finding.
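For contrast, here is a hedged sketch of what SQL/PGQ pattern matching can look like. The property graph name, labels, and columns are all made up for illustration; in DuckDB, this style of query requires the duckpgq community extension and a property graph defined beforehand.

```sql
-- Find pairs of people connected by a 'Knows' edge
-- in a hypothetical property graph called 'social'.
SELECT *
FROM GRAPH_TABLE (social
    MATCH (a:Person)-[k:Knows]->(b:Person)
    COLUMNS (a.name AS person, b.name AS friend)
);
```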
We are hosting the first "DuckDB in Science" meetup in London on September 4, co-located with VLDB 2025.
We'll have two deep-dive presentations from DuckDB's developers and four lightning talks from community members.
For details and registration, see duckdb.org/events/2025/...
On September 4, we are hosting a new kind of meetup in London, which will focus on the use of DuckDB in Science and Education!
We still have some spots for lightning talks. If you're working with DuckDB in your research and/or classroom, consider sharing your story!
duckdb.org/events/2025/...
The DuckDB 1.3.2 bugfix release is out!
The Python and CLI clients are already on the latest version, while the rest will follow in the coming days.
See the detailed change log at github.com/duckdb/duckd...
I work at DuckDB Labs so obviously I am biased but this really looks like a prime use case for @duckdb.org
Last year I reimplemented a lot of the cut/awk/csvkit examples from the "Data Science at the Command Line" book in DuckDB and got good results:
szarnyasg.org/posts/data-s...
I don't think there is such a test in DuckDB at the moment. You'd have to look at the binary code with a disassembler and try to find vector instructions.
It would be an interesting experiment to try to make use of RISC-V RVV but I'm not aware of any attempts.
In the official DuckDB code base, the engine doesn't have any platform-specific code to ensure portability. So it's up to the compilers to auto-vectorize the code.
I wrote a small Bash snippet to streamline my workflow for using Cloudflare R2 with the AWS CLI:
szarnyasg.org/posts/cloudf...
It turns out DuckDB can load Latin-2 encoded CSV files just fine with the combination of iconv and the shellfs extension.
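As a hedged sketch of that combination (the file name and its exact encoding are assumed): the shellfs community extension treats a file name ending in a pipe character as a command whose output should be read, so iconv can transcode the CSV on the fly.

```sql
INSTALL shellfs FROM community;
LOAD shellfs;

-- Transcode the Latin-2 (ISO-8859-2) file to UTF-8 on the fly
-- and read the resulting stream as a CSV.
SELECT *
FROM read_csv('iconv -f ISO-8859-2 -t UTF-8 data_latin2.csv |');
```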
New blog post by @a13x.bsky.social and @archie.sarrewood.com:
Reading and Writing Google Sheets in DuckDB
This post introduces the gsheets community extension, which enables DuckDB to directly read from and write to Google Sheets.
duckdb.org/2025/02/26/g...
Here is a query making use of prefix aliases in all three clauses:
SELECT
"Station name": s.name_short,
"Max distance": max(d.distance)
FROM s: 's3://duckdb-blobs/stations.parquet'
JOIN d: 's3://duckdb-blobs/distances.parquet'
ON d.station1 = s.code
GROUP BY ALL
ORDER BY "Max distance" DESC;
I recently added your instructions for building DuckDB on RISC-V to the DuckDB documentation: duckdb.org/docs/dev/bui...
Thanks for the great work on this!
I don't think this is possible at the moment. I would go the other route and try to do unnest and join. To save memory, you could peel away the nested column (CREATE TEMP TABLE tmp AS SELECT column FROM original_table), do the unnest and join on this table, then join it back to the original.
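A minimal sketch of that workaround, with made-up table and column names (assume original_table has an id key and a nested list column nested_col):

```sql
-- Peel off just the key and the nested column to save memory.
CREATE TEMP TABLE tmp AS
    SELECT id, nested_col
    FROM original_table;

-- Flatten the nested column: one row per list element.
CREATE TEMP TABLE flattened AS
    SELECT id, unnest(nested_col) AS elem
    FROM tmp;

-- ... perform the join against 'flattened' here ...

-- Finally, join the flattened result back to the original table.
SELECT o.*, f.elem
FROM original_table o
JOIN flattened f USING (id);
```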
The list_reduce function iterates through the list and picks the correct category.
You can generalize this and put a MAP value into the list_reduce function to capture the mapping, then do exact matching on the MAP's keys. For more details, see list_reduce in the docs: duckdb.org/docs/sql/fun...
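One way to read that suggestion, as a hedged sketch with made-up table, column, and mapping values: capture the mapping in a MAP literal and look up categories by exact key match. Note that, depending on your DuckDB version, bracket extraction on a MAP may return the value directly or a single-element list.

```sql
-- Hypothetical example: assign each post a category code
-- by exact matching on the keys of a MAP literal.
SELECT
    post_type,
    MAP {'note': 0, 'reply': 1, 'quote': 2}[post_type] AS category
FROM posts;
```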
I ran into a similar problem recently when I needed to categorize posts according to their length:
– 0: 0 ≤ length < 40
– 1: 40 ≤ length < 80
– 2: 80 ≤ length < 160
– 3: 160 ≤ length
I came up with this:
list_reduce([0, 40, 80, 160], (acc, x, i) -> IF(x <= length, i - 1, acc)) AS category
My post on DuckDB vs. wc received a lot of feedback. Based on these suggestions, I ran a few more experiments to see how DuckDB stacks up against parallelized wc and grep/ripgrep on Linux.
I wrote up my results in a blog post.
TL;DR: it depends but DuckDB is still pretty fast!
szarnyasg.org/posts/duckdb...
Oops, that's the difference of reading the CSV with or without its header. Well-spotted!
3) The ts command adds a timestamp at the beginning of each line. On macOS, it's available in the moreutils package on Homebrew.
2) A single sed command can include multiple search-and-replace pairs separated by semicolons. This makes sed commands *even less readable*, so use it with caution.
1) The bat tool (an alternative to cat) prints the newline characters if it's invoked with the -A switch. This output mode reveals whether a file uses CR/LF or LF newlines (or both).
I solved the exercises in Chapter 5 of the "Data Science at the Command Line" book using the @duckdb.org CLI client. Most exercises had a straightforward solution in DuckDB's SQL; see here:
szarnyasg.org/posts/data-s...
I gathered some ideas for DuckDB blog posts and learned a few new CLI tricks: