
Xiangpeng Hao

@xiangpeng.systems

Database/storage Flight/DataFusion/Arrow/Parquet PhD student@UW-Madison https://xiangpeng.systems

1,031 Followers · 121 Following · 36 Posts · Joined 22.10.2024

Latest posts by Xiangpeng Hao @xiangpeng.systems

parquet-linter: A better Parquet is Parquet itself – Xiangpeng’s blog Unleash the performance potential of your Parquet files

Simply applying basic linting rules (like not compressing pages where compression doesn't help) reduces Parquet file sizes by 5% and decreases decode time by 20%.

@xiangpeng.systems shows how in his latest blog post:
blog.xiangpeng.systems/posts/parque...

23.02.2026 15:26
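The rule "don't compress pages where it doesn't help" boils down to a per-page size comparison. Below is a minimal sketch of that decision, not parquet-linter's actual implementation: a toy run-length encoder stands in for a real codec such as Snappy or ZSTD, and the `choose_page_encoding` helper and its 10% `min_saving` threshold are hypothetical.

```rust
/// Toy run-length encoder standing in for a real codec (Snappy, ZSTD, ...).
/// Emits (run_length, byte) pairs, with runs capped at 255.
fn rle_compress(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let byte = data[i];
        let mut run = 1;
        while i + run < data.len() && data[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// How a single page ends up being stored.
#[derive(Debug, PartialEq)]
enum PageEncoding {
    Compressed(Vec<u8>),
    Uncompressed(Vec<u8>),
}

/// Hypothetical lint rule: keep the compressed bytes only if they save at
/// least `min_saving` (a fraction, e.g. 0.1 = 10%) of the page size;
/// otherwise store the page as-is so readers can skip decompression.
fn choose_page_encoding(page: &[u8], min_saving: f64) -> PageEncoding {
    let compressed = rle_compress(page);
    let saved = page.len() as f64 - compressed.len() as f64;
    if !page.is_empty() && saved / page.len() as f64 >= min_saving {
        PageEncoding::Compressed(compressed)
    } else {
        PageEncoding::Uncompressed(page.to_vec())
    }
}

fn main() {
    // Highly repetitive page: RLE shrinks it dramatically.
    let repetitive = vec![7u8; 1000];
    // No two neighbouring bytes are equal, so RLE would double its size.
    let noisy: Vec<u8> = (0..1000u32).map(|i| (i * 31 % 251) as u8).collect();
    for (name, page) in [("repetitive", &repetitive), ("noisy", &noisy)] {
        match choose_page_encoding(page, 0.1) {
            PageEncoding::Compressed(c) => {
                println!("{name}: compressed {} -> {} bytes", page.len(), c.len())
            }
            PageEncoding::Uncompressed(u) => {
                println!("{name}: kept uncompressed ({} bytes)", u.len())
            }
        }
    }
}
```

Storing an incompressible page uncompressed keeps its size about the same while letting readers skip the decompression step, which is plausibly where decode-time savings of this kind come from.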

Thank you for sharing, glad to see many people think alike!

04.02.2026 15:30
Stop building systems for agents – Xiangpeng’s blog Build agent systems for human.

Stop building systems for agents; build systems for humans.
We need infrastructure that helps us take accountability for agent-written code.
blog.xiangpeng.systems/posts/stop-b...

02.02.2026 17:34
GitHub - microsoft/bf-tree: Bf-Tree is a modern read-write-optimized concurrent larger-than-memory range index in Rust from MS Research.

Two years after its publication, Bf-Tree is finally open-sourced: github.com/microsoft/bf...
The GitHub repo just hit the Hacker News front page 🎉🎉🎉

29.01.2026 02:31

Mitchell is the creator of the github.com/mosure/bevy_... render-pipeline plugin for Bevy, and he’ll dive into real-time radiance-field rendering, GPU data layouts, kernels, profiling, and more.

03.12.2025 16:12

Appreciate the kind words!

03.11.2025 23:24

Nice to see this getting shared! 🙌 Now I’m even more motivated to turn it into a full course.

29.10.2025 19:01

Just like other big cities, Madison is getting its own systems talk series. Come join us!

24.10.2025 20:08

LiquidCache: a distributed pushdown cache for DataFusion, designed to cut down S3 requests for diskless databases.

💻 Code: github.com/XiangpengHao...
📄 Paper (VLDB 2026): github.com/XiangpengHao...

10.09.2025 20:48
What is LiquidCache?

Thank you for sharing! Slides are here 👉 what-is-liquid-cache.xiangpeng.systems

02.09.2025 00:26

Hey Tyler 👋 welcome back! I'd be happy to chat. I work in the data systems space (databases + storage + cloud), in the same group that also studies storage faults!

01.08.2025 19:52
GitHub - XiangpengHao/liquid-cache: 10x lower latency for cloud-native DataFusion

Project repo: github.com/XiangpengHao...

16.05.2025 00:53
Data-Aware Caching for Cloud Analytics

Join my PhD prelim talk next Monday:

Data-Aware Caching for Cloud Analytics

🕐 May 19, 1PM CDT
📍 CS2310 or Zoom: uwmadison.zoom.us/j/3081128886

16.05.2025 00:52
Optimizing SQL (and DataFrames) in DataFusion: Part 1 This post reviews what a Query Optimizer is, what it does, and why you need one for SQL and DataFrames. It also describes how industrial Query Optimizers are structured and standard optimization class...

My manifesto on optimizing SQL and DataFrames in query engines (including an explanation of why Apache DataFusion doesn't have a complex join ordering algorithm):
www.influxdata.com/blog/optimiz...

04.04.2025 16:41
Build your own S3-Select in 400 lines of Rust – Xiangpeng’s blog DataFusion is ALL YOU NEED

New blog post: "Build your own S3-Select in 400 lines of Rust"

Check it out 😉: blog.xiangpeng.systems/posts/build-...

24.03.2025 14:13
GitHub - excalidraw/excalidraw: Virtual whiteboard for sketching hand-drawn like diagrams

Credit goes to github.com/excalidraw/e... for making it easy😉

14.03.2025 14:00
Experimental parquet decoder with first-class selection pushdown support by XiangpengHao · Pull Request #6921 · apache/arrow-rs Which issue does this PR close? Many long lasting issues in DataFusion and Parquet. Note that this PR may or may not close these issues, but (imo) it will be the foundation to future more optimiza...

Here's the PR: github.com/apache/arrow...

13.03.2025 18:36
Efficient Filter Pushdown in Parquet – Xiangpeng’s blog How to implement efficient filter pushdown in Parquet readers and why it’s challenging in practice.

I submitted a PR that cuts average ClickBench latency by 15% for DataFusion! But reviewing it wasn't straightforward because of the complex dynamics of performance tuning, so I wrote a blog post explaining why it works. Check it out: blog.xiangpeng.systems/posts/parque...

13.03.2025 18:36
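Filter (selection) pushdown in a Parquet reader roughly means: evaluate the predicate on the filter column first, record which rows survive as a row selection, then decode the remaining projected columns only for those rows. The sketch below illustrates that idea on plain in-memory slices. The `Run` type is a simplified, hypothetical stand-in loosely modeled on arrow-rs's `RowSelector` (a `row_count` plus a `skip` flag); it is not the arrow-rs API, and real readers operate on encoded pages rather than decoded vectors.

```rust
/// One run of consecutive rows that are either skipped or selected.
/// Hypothetical, simplified stand-in for arrow-rs's `RowSelector`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Step 1: evaluate the predicate on the filter column only and compress
/// the per-row outcome into skip/select runs.
fn build_selection(filter_col: &[i64], pred: impl Fn(i64) -> bool) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &value in filter_col {
        let skip = !pred(value);
        // Extend the previous run if it has the same skip/select state.
        let same_as_last = matches!(runs.last(), Some(last) if last.skip == skip);
        if same_as_last {
            runs.last_mut().unwrap().row_count += 1;
        } else {
            runs.push(Run { row_count: 1, skip });
        }
    }
    runs
}

/// Step 2: materialize a projected column only for the selected rows;
/// skipped runs are jumped over without copying their values.
fn apply_selection<T: Clone>(col: &[T], runs: &[Run]) -> Vec<T> {
    let mut out = Vec::new();
    let mut pos = 0;
    for run in runs {
        if !run.skip {
            out.extend_from_slice(&col[pos..pos + run.row_count]);
        }
        pos += run.row_count;
    }
    out
}

fn main() {
    // e.g. SELECT name FROM t WHERE price > 100
    let price: [i64; 6] = [5, 120, 130, 7, 200, 3];
    let name = ["a", "b", "c", "d", "e", "f"];
    let runs = build_selection(&price, |p| p > 100);
    let selected = apply_selection(&name, &runs);
    println!("{runs:?}");
    println!("{selected:?}"); // ["b", "c", "e"]
}
```

Run-length selections stay compact when matching rows are clustered; when matches alternate row by row, the run list approaches one entry per row.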
Evil Scheduler: Mastering Concurrency Through Interactive Debugging – Ao Li TLDR Watch the video below to see how Fray debugger works! I enjoy the concept of Deadlock Empire, an interactive game that teaches the semantics of locks and other concurrency primitives. The core id...

We are excited to share Fray Debugger (aoli.al/blogs/deadlo...), an IntelliJ plugin that allows you to control concurrent execution deterministically!

We have translated the Deadlock Empire (deadlockempire.github.io) into Java to demonstrate how to use Fray Debugger.

12.03.2025 19:25

Meanwhile, as a PhD student, I still feel frustrated comparing my systems to many ideas that seem novel but lack practical impact. That said, I find “feet on the ground, head in the clouds” research very inspiring -- it’s probably what keeps me motivated to stay in academia.

10.03.2025 19:11

Thanks for the insightful points, Marc! I totally agree that academia is important in many areas. I'm planning a follow-up post discussing the kinds of research that are impactful and beneficial to people, and your examples strongly resonate with what I have in mind!

10.03.2025 19:05

Thanks for sharing your perspective! It’s always helpful to hear insights from folks who’ve spent time in industry. There’s definitely room for academia to evolve, and I’m hopeful it will :)

10.03.2025 18:49

@xiangpeng.systems shared a great post about systems researchers. I wrote a comment on it and would like to share some thoughts here and offer complementary ideas.

In short: build papers with open source.

xuanwo.io/links/2025/0...

10.03.2025 07:26
Where are we now, system researchers? – Xiangpeng’s blog

Wrote a blog post reflecting on DeepSeek, NSF funding, and the systems research community in general. Apologies for the bold claims; I hope they invite some discussion.
blog.xiangpeng.systems/posts/system...

10.03.2025 04:49

Compiling to WASM is a very interesting idea! I think Fray explored this at some point, but I'm not sure about its current status.

22.02.2025 18:52
shuttle - Rust Shuttle is a library for testing concurrent Rust code, heavily inspired by Loom.

Current approaches need to replace std locks with framework-provided locks, like the ones in shuttle: docs.rs/shuttle/late...

I think binary instrumentation like the one in this paper is possible, but I'm not an expert on this. www.microsoft.com/en-us/resear...

22.02.2025 18:49
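One common way to apply the "replace std locks with framework-provided locks" approach is to route every synchronization import through a single module and switch it at compile time. The sketch below assumes a hypothetical Cargo feature named `shuttle` (with the shuttle crate as an optional dependency). By default it builds against `std`; with the feature enabled, the same code uses `shuttle::sync` and `shuttle::thread`, which mirror the std APIs.

```rust
// Route every sync primitive through one module so tests can swap them.
// `shuttle` is a hypothetical Cargo feature enabling the optional shuttle crate.
#[cfg(not(feature = "shuttle"))]
mod sync {
    pub use std::sync::{Arc, Mutex};
    pub use std::thread;
}

#[cfg(feature = "shuttle")]
mod sync {
    // shuttle's drop-in replacements with the same API shape as std.
    pub use shuttle::sync::{Arc, Mutex};
    pub use shuttle::thread;
}

use crate::sync::{thread, Arc, Mutex};

/// Two threads each increment a shared counter once; returns the final value.
fn increment_twice() -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Bind the value first so the MutexGuard is dropped before `counter`.
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("counter = {}", increment_twice());
}

// With the `shuttle` feature on, a test could explore many interleavings:
// #[test]
// fn counter_is_two() {
//     shuttle::check_random(|| assert_eq!(increment_twice(), 2), 1000);
// }
```

The production code never names `std::sync` or `shuttle::sync` directly, so swapping the scheduler-controlled primitives in and out is a one-line build change rather than an edit to every call site.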

I heard from a Fray developer that it is getting a built-in interactive debugger that visualizes what each thread is doing at a given moment. I can see it being incredibly useful!

22.02.2025 18:44
GitHub - awslabs/shuttle: Shuttle is a library for testing concurrent Rust code

Yes, Loom and shuttle: github.com/awslabs/shut...

They are incredibly useful for identifying and reproducing bugs, but I find them quite hard to use with a debugger: lldb has to jump between different stacks frequently, and I soon lose track of what's going on...

22.02.2025 18:42

Check out the underlying framework: github.com/cmu-pasta/fray
Looking forward to a future Rust support😉

22.02.2025 16:50
Gemini API pricing  |  Google AI for Developers The Gemini API for developers offers a robust free tier and flexible pricing as you scale.

It uses Gemini free tier API to translate natural language to SQL: ai.google.dev/pricing#1_5f...

24.11.2024 20:01