
Gabor Szarnyas

@szarnyasg.org

Head of DevRel at DuckDB Labs

Joined 28.10.2024

Latest posts by Gabor Szarnyas @szarnyasg.org

Thanks! All the fiddling with SVGs was worth it! 🙌

17.11.2025 21:34

🚀 We released DuckDB v1.4.2, the second patch release of our LTS edition.

🔎 We are shipping new Iceberg features, improved logger/profiler integration and several bugfixes. The new DuckDB version can also read and write Vortex files.

📖 For more details, read
duckdb.org/2025/11/12/a...

12.11.2025 13:22

DuckDB does not support GraphQL. GraphQL itself is a bit of a misnomer: it is not a full-fledged graph query language but is primarily intended for querying web API endpoints. GQL and SQL/PGQ are full-fledged graph query languages, supporting both pattern matching and path finding.

24.10.2025 07:31

🔬 We are hosting the first “DuckDB in Science” meetup in London on September 4, co-located with VLDB 2025.

πŸ” We'll have two deep-dive presentations from DuckDB's developers and four lightning talks from community members.

πŸ“ For details and registration, see duckdb.org/events/2025/...

28.08.2025 22:03

🎓 On September 4, we are hosting a new kind of meetup in London which will focus on the use of DuckDB in Science and Education!

⚑️ We still have some spots for lightning talks. If you're working with DuckDB in your research and/or classroom, consider sharing your story!

🔗 duckdb.org/events/2025/...

18.08.2025 18:41

🚀 The DuckDB 1.3.2 bugfix release is out!

📦 The Python and CLI clients are already on the latest version, while the rest will follow in the coming days.

🔖 See the detailed change log at github.com/duckdb/duckd...

08.07.2025 11:01

I work at DuckDB Labs, so obviously I am biased, but this really looks like a prime use case for @duckdb.org

Last year I reimplemented a lot of the cut / awk / csvkit examples of the “Data Science at the Command Line” book in DuckDB and got good results:

szarnyasg.org/posts/data-s...

16.05.2025 12:13

I don't think there is such a test in DuckDB at the moment. You'd have to look at the binary code with a disassembler and try to find vector instructions.

09.05.2025 08:30

It would be an interesting experiment to try to make use of RISC-V RVV but I'm not aware of any attempts.

In the official DuckDB code base, the engine doesn't have any platform-specific code to ensure portability. So it's up to the compilers to auto-vectorize the code.

08.05.2025 10:12

I wrote a small Bash snippet to streamline my workflow for using Cloudflare R2 with the AWS CLI:

szarnyasg.org/posts/cloudf...

06.04.2025 06:09

It turns out DuckDB can load Latin-2 encoded CSV files just fine with the combination of iconv and the shellfs extension.
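The key step is the transcoding: iconv turns the Latin-2 (ISO-8859-2) bytes into UTF-8, which is DuckDB's default CSV encoding, and shellfs lets DuckDB read the command's output. The exact command isn't reproduced here; below is a minimal Python sketch of just the transcoding step, using the standard library's `iso-8859-2` codec on a made-up sample (the function `latin2_csv_to_rows` and the data are hypothetical). It also shows why Latin-1 is not a safe stand-in for Hungarian text: the bytes 0xF5 and 0xFB decode to ő and ű in Latin-2 but to õ and û in Latin-1.

```python
import csv
import io


def latin2_csv_to_rows(raw: bytes) -> list[list[str]]:
    """Decode Latin-2 (ISO-8859-2) CSV bytes and parse them into rows.

    This mirrors what `iconv -f ISO-8859-2 -t UTF-8` does before the CSV
    reader sees the data: once the bytes are decoded with the right codec,
    the parser works on proper text.
    """
    text = raw.decode("iso-8859-2")
    return list(csv.reader(io.StringIO(text)))


# Hypothetical sample: a header plus one Hungarian word encoded in Latin-2.
sample = b"id,word\n1,\xe1rv\xedzt\xfbr\xf5\n"

print(latin2_csv_to_rows(sample))  # [['id', 'word'], ['1', 'árvíztűrő']]

# Decoding the same bytes as Latin-1 silently produces the wrong letters.
print(sample.decode("latin-1").splitlines()[1])  # 1,árvíztûrõ
```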

07.03.2025 13:13

New blog post by @a13x.bsky.social and @archie.sarrewood.com:

Reading and Writing Google Sheets in DuckDB

This post introduces the gsheets community extension, which enables DuckDB to directly read from and write to Google Sheets.

duckdb.org/2025/02/26/g...

28.02.2025 09:53

Here is a query making use of prefix aliases in all three clauses:

SELECT
"Station name": s.name_short,
"Max distance": max(d.distance)
FROM s: 's3://duckdb-blobs/stations.parquet'
JOIN d: 's3://duckdb-blobs/distances.parquet'
ON d.station1 = s.code
GROUP BY ALL
ORDER BY "Max distance" DESC;

25.02.2025 15:04

I recently added your instructions for building DuckDB on RISC-V to the DuckDB documentation: duckdb.org/docs/dev/bui...

Thanks for the great work on this!

21.02.2025 20:10

I don't think this is possible at the moment. I would go the other route and use UNNEST and a join instead. To save memory, you could peel away the nested column together with a key (CREATE TEMP TABLE tmp AS SELECT id, nested_column FROM original_table, where id and nested_column stand for your own key and nested columns), do the unnest and join on this table, then join the result back to the original table via the key.

19.02.2025 11:19

The list_reduce function iterates through the list and picks the correct category.

You can generalize this and put a MAP value into the list_reduce function to capture the mapping, then do exact matching on the MAP's keys. For more details, see list_reduce in the docs: duckdb.org/docs/sql/fun...

19.02.2025 10:27

I ran into a similar problem recently when I needed to categorize posts according to their length:

– 0: 0 ≤ length < 40
– 1: 40 ≤ length < 80
– 2: 80 ≤ length < 160
– 3: 160 ≤ length

I came up with this:

list_reduce([0, 40, 80, 160], (acc, x, i) -> IF(x <= length, i - 1, acc)) AS category
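The trick relies on list_reduce without an initial value: the first list element seeds the accumulator, and the lambda runs on the remaining elements. Here is a minimal Python sketch of the same fold, assuming (as the category mapping above implies) that `i` is the 1-based position of the current element; `list_reduce_with_index` and `category` are hypothetical helpers, not DuckDB APIs.

```python
THRESHOLDS = [0, 40, 80, 160]


def list_reduce_with_index(values, fn):
    """Fold without an initial value: values[0] seeds the accumulator and
    fn(acc, x, i) is called for the remaining elements, where i is the
    element's 1-based position in the list (assumed semantics)."""
    acc = values[0]
    for i, x in enumerate(values[1:], start=2):
        acc = fn(acc, x, i)
    return acc


def category(length: int) -> int:
    # Same lambda as the SQL: IF(x <= length, i - 1, acc)
    return list_reduce_with_index(
        THRESHOLDS, lambda acc, x, i: i - 1 if x <= length else acc
    )


print([category(n) for n in (0, 39, 40, 79, 80, 159, 160, 300)])
# [0, 0, 1, 1, 2, 2, 3, 3]
```

The seed only works out because the first threshold is 0, so the seeded accumulator already equals category 0. The thresholds also need to be in ascending order, since the fold keeps the last threshold that the length reaches.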

19.02.2025 10:27

My post on DuckDB vs. wc received a lot of feedback. Based on this feedback, I ran a few more experiments to see how DuckDB stacks up against parallelized wc and grep/ripgrep on Linux.

I wrote up my results in a blog post.

TL;DR: it depends, but DuckDB is still pretty fast!
szarnyasg.org/posts/duckdb...

04.12.2024 21:25

Oops, that's the difference between reading the CSV with and without its header. Well spotted!

02.12.2024 22:58

3) The ts command adds a timestamp at the beginning of each line. On macOS, it's available in the moreutils package on Homebrew.
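What ts does is simple enough to sketch. Below is a minimal Python version, assuming ts's default `%b %d %H:%M:%S` prefix format; the function `add_timestamps` and its injectable clock are hypothetical, added so the output is reproducible.

```python
from datetime import datetime
from typing import Callable, Iterable, Iterator


def add_timestamps(
    lines: Iterable[str],
    now: Callable[[], datetime] = datetime.now,
    fmt: str = "%b %d %H:%M:%S",
) -> Iterator[str]:
    """Prefix each line with the time at which it was processed."""
    for line in lines:
        yield f"{now().strftime(fmt)} {line}"


# A fixed clock makes the output deterministic for this demo.
fixed_clock = lambda: datetime(2024, 11, 30, 19, 50, 0)
print(list(add_timestamps(["build started", "build done"], now=fixed_clock, fmt="%H:%M:%S")))
# ['19:50:00 build started', '19:50:00 build done']
```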

30.11.2024 19:50

2) A single sed command can include multiple search-and-replace pairs separated by semicolons. This makes sed commands *even less readable*, so use it with caution.
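Part of the readability problem is that the pairs run in order and each one sees the output of the previous one. A minimal Python sketch of `sed 's/cat/dog/; s/dog/bird/'` on literal patterns (the helper `sed_script` is hypothetical, and Python's regex syntax differs from sed's basic regular expressions):

```python
import re


def sed_script(line: str, pairs: list[tuple[str, str]]) -> str:
    """Apply (pattern, replacement) pairs in order, like a sed script
    's/p1/r1/; s/p2/r2/'. Without the g flag, sed replaces only the first
    match on each line, hence count=1."""
    for pattern, repl in pairs:
        line = re.sub(pattern, repl, line, count=1)
    return line


# echo 'cat and dog' | sed 's/cat/dog/; s/dog/bird/'
print(sed_script("cat and dog", [("cat", "dog"), ("dog", "bird")]))  # bird and dog
```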

30.11.2024 19:50

1) The bat tool – an alternative to cat – prints the newline characters if it's invoked with the -A switch. This output mode reveals whether a file is using CR/LF or LF newlines (or both).
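The same check can be scripted. A minimal Python sketch (the function `line_ending_style` is hypothetical) that counts CRLF pairs and bare LFs in a file's raw bytes; lone CRs, as used by classic Mac OS, are not classified.

```python
def line_ending_style(data: bytes) -> str:
    """Classify the newlines in raw bytes as 'CRLF', 'LF', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract them to get bare LFs.
    bare_lf = data.count(b"\n") - crlf
    if crlf and bare_lf:
        return "mixed"
    if crlf:
        return "CRLF"
    if bare_lf:
        return "LF"
    return "none"


print(line_ending_style(b"a,b\r\n1,2\n"))  # mixed
```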

30.11.2024 19:50

I solved the exercises in Chapter 5 of the "Data Science at the Command Line" book using the @duckdb.org CLI client. Most exercises had a straightforward solution in DuckDB's SQL – see here:

szarnyasg.org/posts/data-s...

I gathered some ideas for DuckDB blog posts and learned a few new CLI tricks:

30.11.2024 19:50