Inside Husky's query engine: Real-time access to 100 trillion events | Datadog
See how Husky enables interactive querying across 100 trillion events daily by combining caching, smart indexing, and query pruning.
It's interesting how the Elasticsearch and Datadog (www.datadoghq.com/blog/enginee...) approaches to wildcard search differ. Both use n-gram indexes, but with different strategies to contain storage amplification. Datadog hashes 4-grams while ES aggressively normalizes 3-grams.
13.10.2025 08:27
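A minimal sketch of the two strategies above (class and method names and the 2^20 bucket count are my own illustration, not either system's actual implementation): a string can match a wildcard pattern only if it contains every n-gram of the pattern, so the n-gram index acts as a pre-filter; hashing the 4-grams bounds the size of the term dictionary at the cost of false positives that must be verified against the raw value.

```java
import java.util.*;

public class NgramSketch {
    // Elasticsearch-style: index normalized (here: just lowercased) 3-grams directly.
    static Set<String> trigrams(String s) {
        Set<String> out = new HashSet<>();
        String n = s.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 3 <= n.length(); i++) out.add(n.substring(i, i + 3));
        return out;
    }

    // Datadog-style: hash each 4-gram into a bounded bucket space, so the term
    // dictionary cannot grow without limit (2^20 buckets is an assumption).
    static Set<Integer> hashed4grams(String s) {
        Set<Integer> out = new HashSet<>();
        for (int i = 0; i + 4 <= s.length(); i++)
            out.add(s.substring(i, i + 4).hashCode() & ((1 << 20) - 1));
        return out;
    }

    // A doc can match *pattern* only if it has all the pattern's (hashed) 4-grams.
    // Hashing adds false positives, so candidates still need verification.
    static boolean maybeMatches(String doc, String pattern) {
        return hashed4grams(doc).containsAll(hashed4grams(pattern));
    }
}
```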
Lucene CombinedOrHighMed queries/sec
Ge Song merged a good ~15% speedup for BM25F queries in Lucene benchmarks.mikemccandless.com/CombinedOrHi... (last data point) github.com/apache/lucen...
12.10.2025 07:15
BM25F from scratch
BM25 run across multiple fields isn't as simple as summing a bunch of field-level BM25 scores.
BM25F is an adjustment to BM25 that accounts for multiple fields, and it beats naively summing per-field BM25 scores
softwaredoug.com/blog/2025/09...
18.09.2025 17:33
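To make the difference concrete, here is a toy single-term, idf-free scorer (the class name and the constants k1=1.2, b=0.75 are mine; the structure follows the usual BM25F formulation). Naive summing applies the saturation curve once per field, so occurrences spread across fields outscore the same occurrences concentrated in one field; BM25F combines weighted, length-normalized field frequencies into one pseudo-frequency and saturates once.

```java
public class Bm25fSketch {
    // Illustrative constants; real systems tune k1/b (and per-field b in BM25F).
    static final double K1 = 1.2, B = 0.75;

    static double norm(double len, double avgLen) {
        return 1 - B + B * len / avgLen; // length-normalization factor
    }

    // Naive: apply the BM25 saturation curve per field, then sum (idf omitted).
    static double naiveSum(double[] tf, double[] len, double[] avgLen, double[] w) {
        double score = 0;
        for (int f = 0; f < tf.length; f++) {
            double t = tf[f] / norm(len[f], avgLen[f]);
            score += w[f] * t / (K1 + t);
        }
        return score;
    }

    // BM25F: build one weighted pseudo-frequency across fields, then apply the
    // saturation curve a single time for the whole document.
    static double bm25f(double[] tf, double[] len, double[] avgLen, double[] w) {
        double pseudoTf = 0;
        for (int f = 0; f < tf.length; f++) {
            pseudoTf += w[f] * tf[f] / norm(len[f], avgLen[f]);
        }
        return pseudoTf / (K1 + pseudoTf);
    }
}
```

With equal weights and average-length fields, `bm25f` scores tf = {1, 1} and tf = {2, 0} identically, while `naiveSum` rewards spreading the occurrences across fields.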
Good question! I don't have a good intuition for how much comes from SIMD vs. the rest. I could run benchmarks with SIMD instructions disabled to find out.
14.09.2025 11:36
This is the case here: filtered term queries get a slowdown but they are still pretty fast, while harder queries like OrHighHigh (top-level disjunctive query) or FilteredOrHighHigh (likewise but with a filter) get faster.
11.09.2025 13:57
In case you're wondering how Lucene decides whether a change is good when there is a mix of speedups and slowdowns: the general rule is that it's better to make the slow queries faster and the fast queries slower than the other way around, as this results in lower tail latencies.
11.09.2025 13:57
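The rule can be illustrated numerically (synthetic latencies and hypothetical 20% adjustments, purely for illustration): making the slow half of queries 20% faster at the cost of slowing the fast half by 20% lowers p99, while the opposite trade raises it.

```java
import java.util.*;

public class TailLatencySketch {
    static double p99(double[] latencies) {
        double[] s = latencies.clone();
        Arrays.sort(s);
        return s[(int) Math.ceil(0.99 * s.length) - 1];
    }

    // Apply a change that multiplies queries at or above the median latency by
    // slowFactor and those below it by fastFactor.
    static double[] adjust(double[] latencies, double slowFactor, double fastFactor) {
        double[] s = latencies.clone();
        Arrays.sort(s);
        double median = s[s.length / 2];
        double[] out = new double[latencies.length];
        for (int i = 0; i < latencies.length; i++)
            out[i] = latencies[i] * (latencies[i] >= median ? slowFactor : fastFactor);
        return out;
    }
}
```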
Lucene just bumped the block size of its postings lists from 128 to 256. This gave very good speedups (up to 45%) to most queries, and up to 10-15% slowdowns to filtered term queries. benchmarks.mikemccandless.com/2025.09.10.1...
11.09.2025 13:46
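A toy model of why block size trades off this way (plain int deltas instead of Lucene's bit-packed encoding, and the names here are mine): decoding is all-or-nothing per block, so exhaustive scans amortize fixed per-block costs better with big blocks, while a filtered query that skips to sparse targets decodes whole blocks it barely uses.

```java
public class BlockPostingsSketch {
    final int blockSize;
    final int[] deltas;     // gaps between consecutive doc IDs
    final int[] blockBase;  // first doc ID of each block (toy skip data)
    int[] decoded;          // doc IDs of the currently decoded block
    int curBlock = -1;
    long decodeOps = 0;     // counts per-doc decode work

    BlockPostingsSketch(int[] docIds, int blockSize) {
        this.blockSize = blockSize;
        this.deltas = new int[docIds.length];
        this.blockBase = new int[(docIds.length + blockSize - 1) / blockSize];
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            if (i % blockSize == 0) blockBase[i / blockSize] = docIds[i];
            deltas[i] = docIds[i] - prev;
            prev = docIds[i];
        }
    }

    // Decoding is all-or-nothing per block.
    private void decode(int b) {
        int start = b * blockSize, end = Math.min(start + blockSize, deltas.length);
        decoded = new int[end - start];
        int doc = blockBase[b];
        decoded[0] = doc;
        for (int i = start + 1; i < end; i++) {
            doc += deltas[i];
            decoded[i - start] = doc;
        }
        decodeOps += end - start;
        curBlock = b;
    }

    // advance(target): use the skip data to find the block, decode it, scan it.
    int advance(int target) {
        int b = 0;
        while (b + 1 < blockBase.length && blockBase[b + 1] <= target) b++;
        if (b != curBlock) decode(b);
        for (int d : decoded) if (d >= target) return d;
        if (b + 1 < blockBase.length) { decode(b + 1); return decoded[0]; }
        return Integer.MAX_VALUE; // postings exhausted
    }
}
```

Advancing to sparse targets with small blocks touches far fewer decoded docs than with large blocks, which is the filtered-term-query slowdown in miniature.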
I just ran the Tantivy benchmark (tantivy-search.github.io/bench/) on Lucene 10.2 vs a Lucene 10.3 snapshot build. Lucene 10.2 already performed very well, but Lucene 10.3 is on another level. Very exciting.
30.08.2025 20:12
A: The "if score >= minRequiredScore" branch is hard to predict, so this loop is expensive. This small refactoring helped the JVM take advantage of the cmov (conditional move) instruction and make this code branchless.
09.07.2025 06:31
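A minimal illustration of the pattern (the method names and the counting example are mine, not the actual Lucene change): both loops compute the same result, but the second expresses the condition as a data dependency, which the JIT can compile to a conditional move (cmov) instead of an unpredictable jump.

```java
public class BranchlessSketch {
    // Branchy: whether the branch is taken depends on the data, so the CPU's
    // branch predictor fails often on scores near the threshold.
    static int countCompetitiveBranchy(float[] scores, float minRequiredScore) {
        int count = 0;
        for (float score : scores) {
            if (score >= minRequiredScore) count++;
        }
        return count;
    }

    // Branchless: the increment is computed unconditionally as 0 or 1, giving
    // the JIT the opportunity to emit cmov/setcc instead of a jump.
    static int countCompetitiveBranchless(float[] scores, float minRequiredScore) {
        int count = 0;
        for (float score : scores) {
            count += score >= minRequiredScore ? 1 : 0;
        }
        return count;
    }
}
```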
This small change yielded a ~5% speedup on several queries of Lucene's nightly benchmarks (see last data point at benchmarks.mikemccandless.com/OrStopWords....). Can you guess why?
09.07.2025 06:24
Lucene BooleanQuery (OR, high freq, medium freq term) queries/sec
Last month, Lucene changed query evaluation to work in a more term-at-a-time fashion within small-ish windows of doc IDs. This yielded a good speedup on its own (annotation IL benchmarks.mikemccandless.com/OrHighMed.html).
04.07.2025 11:28
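A toy sketch of the windowed idea (the 8-doc window and the flat score of 1.0 per hit are illustrative; Lucene's windows are much larger and scores come from the similarity): within each window, every term's postings are consumed in a tight term-at-a-time loop over a small accumulator buffer before moving to the next window.

```java
import java.util.*;

public class WindowedTaatSketch {
    static final int WINDOW = 8; // small for the sketch

    // postingsPerTerm.get(t) = sorted doc IDs for term t.
    static Map<Integer, Double> score(List<int[]> postingsPerTerm, int maxDoc) {
        Map<Integer, Double> scores = new TreeMap<>();
        double[] acc = new double[WINDOW];
        int[] cursors = new int[postingsPerTerm.size()];
        for (int windowBase = 0; windowBase < maxDoc; windowBase += WINDOW) {
            Arrays.fill(acc, 0);
            // Term-at-a-time within the window: one tight loop per term over a
            // small, cache-friendly score buffer.
            for (int t = 0; t < postingsPerTerm.size(); t++) {
                int[] postings = postingsPerTerm.get(t);
                int i = cursors[t];
                while (i < postings.length && postings[i] < windowBase + WINDOW) {
                    acc[postings[i] - windowBase] += 1.0;
                    i++;
                }
                cursors[t] = i;
            }
            for (int d = 0; d < WINDOW && windowBase + d < maxDoc; d++)
                if (acc[d] > 0) scores.put(windowBase + d, acc[d]);
        }
        return scores;
    }
}
```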
Lucene is getting an increasing number of high-quality contributions from ByteDance employees, especially around performance. Good to see that this project keeps attracting contributors from all around the world.
26.06.2025 15:40
Thank you Alex
26.06.2025 15:39
Not that I know of, but we do need this kind of benchmark.
26.06.2025 11:17
Another common point I did not expect: Vespa's strict vs. unstrict iterator distinction is quite similar to Lucene's two-phase iteration. And both projects use this feature to effectively combine dynamic pruning with filtering (a hard and underappreciated problem IMO).
25.06.2025 12:53
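A sketch of the shared idea (the interface here is a simplification of mine, not Lucene's or Vespa's actual API): a cheap approximation enumerates candidate docs, an expensive second-phase check confirms them, and a filter can be consulted between the two phases so filtered-out docs never pay the expensive check.

```java
import java.util.function.IntPredicate;

public class TwoPhaseSketch {
    // Two-phase iteration: advanceApproximation is cheap but may yield false
    // positives; matches() is the expensive confirmation (e.g. phrase positions).
    interface TwoPhase {
        int advanceApproximation(int target);
        boolean matches();
    }

    static int countMatches(TwoPhase query, IntPredicate filter, int maxDoc) {
        int count = 0;
        int doc = query.advanceApproximation(0);
        while (doc < maxDoc) {
            // Test the filter BEFORE the expensive second phase, so docs the
            // filter rejects never pay the matches() cost.
            if (filter.test(doc) && query.matches()) count++;
            doc = query.advanceApproximation(doc + 1);
        }
        return count;
    }

    // Toy query: the approximation enumerates multiples of 2, while the
    // confirmed matches are the multiples of 4.
    static TwoPhase toyQuery() {
        return new TwoPhase() {
            int doc = -1;
            public int advanceApproximation(int target) {
                doc = target + (target % 2 == 0 ? 0 : 1);
                return doc;
            }
            public boolean matches() { return doc % 4 == 0; }
        };
    }
}
```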
Andrei Dan kindly captured pictures of Luca and me telling the story of how the Lucene 10 release went
16.06.2025 13:43
Lucene FilteredOrHighMed queries/sec
Via @rmuir.org: Linux 6.15 introduced a big speedup for Lucene on AMD processors benchmarks.mikemccandless.com/FilteredOrHi... (last data point, not annotated yet) thanks to faster TLB invalidation www.phoronix.com/review/amd-i...
16.06.2025 13:34
Uwe now explains how Lucene takes advantage of the Panama foreign memory and vector APIs even though these features are still in preview/incubating status in the JDK
16.06.2025 10:21
Uwe Schindler gives a short history of Apache Lucene at #bbuzzz
16.06.2025 10:08