
Andy Grove

@andygrove.io

Apache Arrow & DataFusion PMC Member. Original creator of Apache DataFusion.

2,714 Followers
83 Following
52 Posts
Joined 03.07.2023

Latest posts by Andy Grove @andygrove.io

Post image

Helpful advice. Thanks, Claude.

27.01.2026 17:48 👍 16 🔁 0 💬 0 📌 0
Databases in 2025: A Year in Review The world tried to kill Andy off but he had to stay alive to talk about what happened with databases in 2025.

I've posted my latest recap of the world of databases: www.cs.cmu.edu/~pavlo/blog/...

All the hot topics from the last year:
• More Postgres action!
• MCP for everyone!
• MongoDB gets litigious with FerretDB!
• File formats!
• Market movements!
• The richest person in the history of the world!

05.01.2026 14:14 👍 77 🔁 26 💬 1 📌 6
feat: Add microbenchmark for string functions by andygrove · Pull Request #26 · apache/datafusion-benchmarks This PR adds microbenchmarks for scanning a Parquet file and evaluating a single string expression per row. The benchmark runs against DuckDB and DataFusion and compares the results. Assuming that ...

Is there anyone in my network with DuckDB skills who could review a PR that runs a Python script to compare the performance of DataFusion and DuckDB for some simple SQL queries?

github.com/apache/dataf...

30.12.2025 21:52 👍 3 🔁 1 💬 0 📌 0
Future of Iceberg Support in Comet · Issue #2921 · apache/datafusion-comet What is the problem the feature request solves? Comet currently has two different approaches to scanning Iceberg tables. One approach is based on integrating with the Iceberg Java library, and the ...

There is a new Comet issue to discuss the future of Iceberg support and whether we should focus on using the iceberg-rust or Java implementation of Iceberg. Please add your thoughts if this is something that you care about!

github.com/apache/dataf...

17.12.2025 16:31 👍 1 🔁 0 💬 0 📌 0
Apache DataFusion Comet 0.11.0 Release - Apache DataFusion Blog

On behalf of the DataFusion PMC, I'm excited to announce the release of version 0.11.0 of the Comet accelerator for Apache Spark!

datafusion.apache.org/blog/2025/10...

22.10.2025 14:21 👍 6 🔁 0 💬 0 📌 0
Post image

It’s steak night tonight and our dog is patiently waiting for her share.

06.10.2025 02:14 👍 11 🔁 1 💬 0 📌 0

I like the name “RAD stack” for this.

24.09.2025 13:16 👍 8 🔁 0 💬 0 📌 0
Apache DataFusion Comet 0.10.0 Release - Apache DataFusion Blog

Check out the latest release of the Comet accelerator for Apache Spark

datafusion.apache.org/blog/2025/09...

18.09.2025 03:06 👍 10 🔁 1 💬 0 📌 0
Video thumbnail

Introducing Iron Vector: native, columnar, vectorized, high-performance accelerator for Apache Flink SQL and Table API built on top of Rust, Arrow and DataFusion.

Reduce your Flink compute cost by up to 2x or handle 2x more data with the same infrastructure.

15.09.2025 16:04 👍 17 🔁 5 💬 1 📌 1
crates.io phishing campaign | Rust Blog Empowering everyone to build reliable and efficient software.

We received reports of a phishing campaign targeting crates.io users. Do not click on links asking to authenticate to protect your account. More information: blog.rust-lang.org/2025/09/12/c...

12.09.2025 14:22 👍 112 🔁 57 💬 0 📌 2
Post image

Thanks to @clflushopt.bsky.social, make massive TPCH datasets with tpchgen-cli 2.0:

SF1000 (1TB raw, 220GB in @ApacheParquet) in less than 10 mins (6m45s) on an aging laptop

Try it now:

pip install tpchgen-cli
tpchgen-cli --scale-factor 1000 --parts 100 --format=parquet

github.com/clflushopt/t...

04.09.2025 12:51 👍 4 🔁 1 💬 0 📌 0
Post image

I've been helping our analytics team integrate our DataFusion-based query engine for Postgres into EDB Postgres Distributed and finally here's an end-to-end demo.

You get HA Postgres plus seamless replication and DataFusion-based queries. This query turned out 6x faster than PG.

04.09.2025 16:16 👍 14 🔁 4 💬 1 📌 0
Post image

How my day is going

22.08.2025 19:45 👍 7 🔁 0 💬 0 📌 0
Comet Roadmap - Apache DataFusion Comet documentation

We now have a roadmap section in the Comet contributor guide, in case anyone was wondering what we are focusing on lately and what features will be arriving in future releases.

datafusion.apache.org/comet/contri...

20.08.2025 21:12 👍 6 🔁 0 💬 0 📌 0
Software Engineer, ASE Cassandra Storage - Jobs - Careers at Apple Apply for a Software Engineer, ASE Cassandra Storage job at Apple. Read about the role and find out if it’s right for you.

The Cassandra team at Apple is searching for a fresh grad / person early in their career to join our ranks in SF/Bay Area!

Come work on super interesting problems with a world-class team. Help us build better Cassandra!

Ping me if you’re interested!

jobs.apple.com/en-us/detail...

18.07.2025 21:02 👍 16 🔁 10 💬 0 📌 0
perf: Add performance tracing capability by andygrove · Pull Request #1706 · apache/datafusion-comet Which issue does this PR close? Closes #1705 Rationale for this change This feature makes it possible to visualize the flow of calls during query execution. What changes are included in this PR?...

It took me a really long time to understand the flow of execution between JVM and native code during query execution in Comet. I wish I had thought about adding a tracing capability earlier.

github.com/apache/dataf...

02.05.2025 15:08 👍 6 🔁 1 💬 0 📌 0
Apache DataFusion Python 46.0.0 Released - Apache DataFusion Blog

We're pleased to announce that Apache DataFusion in Python 46.0.0 is released! Since the last announcement post we've had a lot of great features and new contributors. Please check out the blog post with details.

datafusion.apache.org/blog/2025/03...

#DataFusion #Python #DataFrame #PyData #Apache

07.04.2025 12:27 👍 6 🔁 2 💬 0 📌 0
Senior Software Development Engineer (Apache Spark) - Apple Data Platform - Jobs - Careers at Apple Apply for a Senior Software Development Engineer (Apache Spark) - Apple Data Platform job at Apple. Read about the role and find out if it’s right for you.

We have a position open in the Spark team at Apple, in our Cupertino, CA office. The role would include working on Apache DataFusion Comet.

jobs.apple.com/en-us/detail...

02.04.2025 17:28 👍 14 🔁 6 💬 0 📌 0
Apache DataFusion Comet: Benchmarks Derived From TPC-H - Apache DataFusion Comet documentation

We have TPC-H benchmarks for single node with a small scale factor in the contributors guide. We only benchmark against Spark though and not against Spark RAPIDS.

datafusion.apache.org/comet/contri...

21.03.2025 00:49 👍 1 🔁 0 💬 0 📌 0

Here's the blog post announcing Comet 0.7.0

datafusion.apache.org/blog/2025/03...

21.03.2025 00:32 👍 7 🔁 1 💬 0 📌 0

I hate to say it, but "it depends". I'd recommend running your own benchmarks for your specific workloads. Performance will also vary greatly by environment (number of CPUs vs GPUs, different GPU types, and so on).

19.03.2025 22:20 👍 0 🔁 0 💬 2 📌 0
GitHub - apache/datafusion-comet: Apache DataFusion Comet Spark Accelerator Apache DataFusion Comet Spark Accelerator. Contribute to apache/datafusion-comet development by creating an account on GitHub.

DataFusion Comet 0.7.0 is now available in Maven. We'll be publishing a blog post next week with all the details.

The repo has been updated with the latest benchmark results. For single executor TPC-H @ 100 GB, we now see a 2.2x speedup over Spark (up from 2x in 0.6.0).

github.com/apache/dataf...

19.03.2025 17:11 👍 12 🔁 1 💬 1 📌 1

One month on, and I have zero regrets about quitting Facebook & Instagram.

I have replaced the scrolling time with listening to podcasts.

I now stay in touch with family overseas via email and photo sharing, and I use Snapchat for sharing photos with immediate family, privately. Works great.

18.02.2025 17:55 👍 15 🔁 0 💬 1 📌 0
Comparing Apache, CNCF, and Commonhaus | cnr.sh I've used open source projects for over 30 years and contributed for about 20 of those. My first interaction with an open source foundation was with Apache when I began working with Apache Hadoop ...

Chris Riccomini (@chris.blue) shares his thoughts on Open Source foundations: Apache, CNCF, Commonhaus. He also explains why Commonhaus is a better fit for SlateDB.

cnr.sh/posts/compar...

18.02.2025 12:49 👍 14 🔁 6 💬 0 📌 0
Apache DataFusion Comet 0.6.0 Release - Apache DataFusion Blog

Comet 0.6.0 has been released. This is a smaller release than usual now that we have moved to an approximately monthly release cadence to match core DataFusion.

datafusion.apache.org/blog/2025/02...

18.02.2025 17:29 👍 6 🔁 0 💬 0 📌 0
Apache DataFusion Ballista 43.0.0 Released - Apache DataFusion Blog

Ballista 43.0.0 has been released, and now provides seamless integration with DataFusion.

datafusion.apache.org/blog/2025/02...

12.02.2025 17:49 👍 16 🔁 0 💬 0 📌 0
Apache DataFusion Community Meeting 2025/01/22 08:57 MST - Recording
YouTube video by Datadog

Check out this excellent presentation from @robtandy.bsky.social on his work with the DataFusion Ray project from last week's DataFusion community meetup.

It is a great overview of how to build a distributed system on top of DataFusion.

www.youtube.com/watch?v=ceTo...

29.01.2025 14:48 👍 11 🔁 2 💬 1 📌 2
This Week in Comet (Jan 26) · Issue #1342 · apache/datafusion-comet Introduction These notes reflect things I am personally involved in or thinking about and may not cover all activities. Feel free to add comments for anything that I missed. Previous week's issue: ...

This Week in DataFusion Comet (Jan 26):

github.com/apache/dataf...

26.01.2025 20:13 👍 6 🔁 0 💬 0 📌 0
Communication - Apache DataFusion documentation

Is this using Arrow and/or DataFusion? If so, our Discord is probably a good place to ask.

datafusion.apache.org/contributor-...

23.01.2025 22:17 👍 0 🔁 0 💬 1 📌 0

I've finally decided to quit using Facebook. My feed is overwhelmed with nonsense content that I am not interested in and cannot seem to block.

It is a real shame, though, because it was a good way to stay connected with family.

Is there a viable alternative? What are others using instead?

18.01.2025 17:45 👍 6 🔁 0 💬 4 📌 1