Simply applying basic linting rules (like not compressing pages where it doesn't help) reduces Parquet file sizes by 5% and decreases decode time by 20%.
@xiangpeng.systems shows how in his latest blog post
blog.xiangpeng.systems/posts/parque...
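To make the "don't compress pages where it doesn't help" rule concrete, here is a minimal sketch of the idea. It is not the linter from the blog post: `encode_page`, the `min_saving` threshold, and the use of `zlib` as a stand-in codec are all assumptions for illustration.

```python
import os
import zlib


def encode_page(raw: bytes, min_saving: float = 0.05) -> tuple[str, bytes]:
    """Return (codec, payload) for one page.

    Hypothetical rule: keep the page uncompressed unless compression
    shrinks it by at least `min_saving` (a made-up threshold).
    zlib stands in for whatever codec a real Parquet writer would use.
    """
    compressed = zlib.compress(raw)
    if len(compressed) <= len(raw) * (1 - min_saving):
        return "zlib", compressed
    # Compression did not pay off: store raw bytes and skip decompression on read.
    return "uncompressed", raw


# A highly repetitive page compresses well; random bytes do not.
repetitive_page = b"madison" * 1000
random_page = os.urandom(7000)

for name, page in [("repetitive", repetitive_page), ("random", random_page)]:
    codec, payload = encode_page(page)
    print(f"{name}: codec={codec}, stored {len(payload)} of {len(page)} bytes")
```

Pages that stay uncompressed cost no decompression work at read time, which is where a rule like this would save decode time.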
Thank you for sharing, glad to see many people think alike!
Stop building systems for agents, build systems for humans.
We need infrastructure that helps us hold agents' code accountable.
blog.xiangpeng.systems/posts/stop-b...
Two years after publication, Bf-Tree is finally open-sourced: github.com/microsoft/bf...
The GitHub repo just hit the Hacker News front page 🎉🎉🎉
Mitchell is the creator of the github.com/mosure/bevy_... render-pipeline plugin for Bevy, and he’ll dive into real-time radiance-field rendering, GPU data layouts, kernels, profiling, and more.
Appreciate the kind words!
Nice to see this getting shared! 🙌 Now I’m even more motivated to turn it into a full course.
Just like other big cities, Madison is getting its own systems talk series. Come join us!
LiquidCache is a distributed pushdown cache for DataFusion, designed to cut down S3 requests for diskless databases.
💻 Code: github.com/XiangpengHao...
📄 Paper (VLDB 2026): github.com/XiangpengHao...
Thank you for sharing! Slides are here 👉 what-is-liquid-cache.xiangpeng.systems
Hey Tyler 👋 welcome back! I'd be happy to chat. I work in the data systems space (database + storage + cloud), in the same group that also studies storage faults!
Join my PhD prelim talk next Monday:
Data-Aware Caching for Cloud Analytics
🕐 May 19, 1PM CDT
📍 CS2310 or Zoom: uwmadison.zoom.us/j/3081128886
My manifesto on optimizing SQL and DataFrames in query engines (including an explanation of why Apache DataFusion doesn't have a complex join ordering algorithm):
www.influxdata.com/blog/optimiz... www.influxdata.com/blog/optimiz...
New blog post: "Build your own S3-Select in 400 lines of Rust"
Check it out 😉: blog.xiangpeng.systems/posts/build-...
I submitted a PR that cuts average ClickBench latency by 15% for DataFusion! But reviewing it wasn't straightforward because performance tuning involves complex dynamics, so I wrote a blog post to explain why it works -- check it out: blog.xiangpeng.systems/posts/parque...
We are excited to share Fray Debugger (aoli.al/blogs/deadlo...), an IntelliJ plugin that allows you to control concurrent execution deterministically!
We have translated the Deadlock Empire (deadlockempire.github.io) into Java to demonstrate how to use Fray Debugger.
Meanwhile, as a PhD student, I still feel frustrated comparing my systems to many ideas that seem novel but lack practical impact. That said, I find “feet on the ground, head in the clouds” research very inspiring -- it’s probably what keeps me motivated to stay in academia.
Thanks for the insightful points, Marc! I totally agree that academia is important in many areas. I'm planning a follow-up post discussing the kinds of research that are impactful and beneficial to people, and your examples strongly resonate with what I have in mind!
Thanks for sharing your perspective! It’s always helpful to hear insights from folks who’ve spent time in industry. There’s definitely room for academia to evolve, and I’m hopeful it will :)
@xiangpeng.systems shared a great post about system researchers. I wrote a comment on it and would like to share some thoughts here and offer complementary ideas.
In short: build papers with open source.
xuanwo.io/links/2025/0...
Wrote a blog post reflecting on DeepSeek, NSF funding, and the systems research community in general. Apologies for the bold claims -- I hope they invite some discussion.
blog.xiangpeng.systems/posts/system...
Compiling to WASM is a very interesting idea! I think Fray explored this a bit at some point, but I'm not sure about the current status.
Current approaches need to replace std locks with framework-provided locks, like the ones in shuttle: docs.rs/shuttle/late...
I think binary instrumentation like the one in this paper is possible, but I'm not an expert on this. www.microsoft.com/en-us/resear...
I heard from a Fray dev that it is getting a built-in interactive debugger, which visualizes what each thread is doing at a given moment. I can see it being incredibly useful!
Yes, Loom and shuttle: github.com/awslabs/shut...
They are incredibly useful for identifying and reproducing bugs, but I find it quite hard to use them with a debugger: lldb needs to jump between different stacks frequently, and I soon lose track of what's going on...
Check out the underlying framework: github.com/cmu-pasta/fray
Looking forward to future Rust support 😉
It uses the Gemini free-tier API to translate natural language to SQL: ai.google.dev/pricing#1_5f...