Closing the gap, part 2: Probability and profitability
Welcome back to the second post in this series looking at how we can improve the performance of RISC-V code from LLVM.
One of the nice parts of #llvm is that when you find yourself needing some sort of non-trivial analysis, there's usually already a pass for it.
Here's how you can reuse a block frequency analysis to make a chess engine 7% faster on #riscv: lukelau.me/2026/01/26/c...
27.01.2026 13:49
A title card with a photo of Mikhail and the same information, but adding 11:50am (in Santa Clara)
We're looking forward to the RISC-V Summit North America next week where Mikhail Gadelha (one of our compiler engineers) will be presenting "Unlocking 15% More Performance: A Case Study in LLVM Optimization for RISC-V". Be sure to catch his talk next Thursday.
riscvsummit2025.sched.com/event/28OTp/...
17.10.2025 14:09
Picture of a presenter showing a slide that details outcomes of RISE-funded RISC-V software ecosystem projects.
I'm delighted to see two of @igalia.com's projects for RISE highlighted at the RISC-V Summit Europe.
Find out more about our work on both LLVM optimisation and testing/CI on the RISE blog (with more to come in the future!):
riseproject.dev/2025/05/08/p...
riseproject.dev/2024/10/15/w...
14.05.2025 10:50
@camel-cdr.bsky.social rvv-bench is used here!
18.04.2025 10:33
We're looking forward to EuroLLVM next week in Berlin. Be sure to check out talks from my colleague @lukel97.bsky.social and myself on:
* Work to further improve RISC-V vector codegen (extending the VL Optimizer), and
* Work done with the support of RISE to improve RISC-V LLVM testing.
12.04.2025 07:30
FEX 2503 Tagged
Here we are again, another month and some more cool changes with FEX. Let's dive in and see what has changed!
What if I told you 3DNow! square root reciprocals are defined for negative numbers?... Also, the amazing FEX 2503 is out. Read about some of my work, and the work of other FEX maintainers, in the release notes: fex-emu.com/FEX-2503/ #fex #igalia #gaming #linux #arm64
06.03.2025 15:50
ccache for LLVM builds across multiple directories
TL;DR: ccache base_dir saves the day
Some notes on ccache+LLVM. Summary: if you do a lot of builds across different checkouts/worktrees/builddirs, be sure to set the base_dir option and -DLLVM_USE_RELATIVE_PATHS_IN_DEBUG_INFO=ON muxup.com/2025q1/ccach...
27.02.2025 18:39
Inside SiFive's P550 Microarchitecture
RISC-V is a relatively young and open source instruction set. So far, it has gained traction in microcontrollers and academic applications. For example, Nvidia replaced the Falcon microcontrollers …
Hello you fine Internet folks,
Today's article is on SiFive's P550 microarchitecture. The P550 core is one of the fastest RISC-V cores currently available and is claimed to be comparable to ARM's Cortex-A75.
Hope y'all enjoy!
old.chipsandcheese.com/2025/01/26/i...
open.substack.com/pub/chipsand...
26.01.2025 22:14
256 loads, since it's an LMUL 8 load with VLEN=256! I'm not sure how it compares to the scalar equivalent, but my guess is that the vlse8.v is loading one element at a time under the hood
11.12.2024 11:17
A screenshot of a terminal:
luke@bananapif3:~/slowest-instr$ cat main.S
.section .rodata
str: .asciz "Cycles: %d\n"
foo: .zero 256 * STRIDE
.section .text
.global main
main:
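addi sp, sp, -32 # keep sp 16-byte aligned (psABI)
sd ra, 24(sp)
sd s1, 16(sp) # s1-s3 are callee-saved
sd s2, 8(sp)
sd s3, 0(sp)
rdcycle s1
rdcycle s2
sub s3, s2, s1 # rdcycle overhead
la a0, foo
li a1, STRIDE
vsetvli t0, zero, e8, m8, tu, mu
rdcycle s1
vlse8.v v8, (a0), a1
rdcycle s2
sub s1, s2, s1
sub s1, s1, s3
la a0, str
mv a1, s1
call printf
ld ra, 24(sp)
ld s1, 16(sp)
ld s2, 8(sp)
ld s3, 0(sp)
li a0, 0 # return 0 from main
addi sp, sp, 32
ret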
luke@bananapif3:~/slowest-instr$ clang main.S -DSTRIDE=65536 -march=rv64gv
luke@bananapif3:~/slowest-instr$ perf stat -e cycles:u ./a.out
Cycles: 66640979
Performance counter stats for './a.out':
78,064,581 cycles:u
0.049648957 seconds time elapsed
0.000000000 seconds user
0.049907000 seconds sys
Trying to find the slowest possible RISC-V instruction. This single vlse8.v with a stride of 65536 bytes takes 66 million cycles on a Banana Pi F3. That's 0.04 seconds @1.6GHz
#risc-v
11.12.2024 09:40
The maximum possible vl is 2^16 I think, so that would fit in XLEN=32?
06.12.2024 16:28
With that said, I forgot how confusing the V extension hierarchy can be. After thinking about EEW=64 on XLEN=32 I think I need to go lie down a bit 😵‍💫
06.12.2024 16:21
Otherwise EEW=64 is supported as usual, since there's also this bit at the bottom:
> The V extension requires the scalar processor implements the F and D extensions
06.12.2024 16:18
Is it this bit here?
> The V extension supports all vector load and store instructions (Section Vector Loads and Stores), except the V extension does not support EEW=64 for index values when XLEN=32.
I'm interpreting "index values" as only the indices, i.e. the offsets passed to vluxei64.v and friends
06.12.2024 16:16
Are you talking about zve32x? That doesn't include any FP support, but zve32f and zve64f should mandate F, and zve64d should mandate D, I think
06.12.2024 04:48
'RVV mask tricks'
# broadcast nth bit (mNth has only bit n set)
vmand.mm v8, in, mNth
vcpop.m t0, v8 # t0 = 1 if bit n of in is set, else 0
sub t0, x0, t0 # 0 or -1 (all ones)
vmv.v.x v8, t0
# prefix xor
viota.m v8, in # v8[i] = number of set bits of in below i
vand.vi v8, v8, 1
vmsne.vi v8, v8, 0
vmor.mm v0, v8, in # can often be omitted
# move nth bit to first
vmand.mm v8, in, mNth
vcpop.m t0, v8
vmv.v.x v8, t0
vmsof.m v0, v8
# move mask to GPR
vmv.x.s t0, v0
# move GPR to mask
vmv.s.x v0, t0
# for the two moves above: assuming vl<=64, set SEW=64 before
# these two should really be dedicated instructions
# (SEW=8, so each element holds 8 mask bits)
# shift mask up by 1 (in must be v0, since vmadd overwrites its vd)
vslide1up.vx v8, in, x0
vsrl.vi v8, v8, 7
li t1, 2
vmadd.vx v0, t1, v8 # v0 = 2*v0 + v8
# shift mask down by 1
vslide1down.vx v8, in, x0
vsrl.vi v0, in, 1
li t1, 128
vmacc.vx v0, t1, v8 # v0 += 128*v8
Here are some slightly tricky RVV mask patterns.
03.12.2024 21:37
Even better is being able to measure the numbers yourself without the need for vendor tables. RISC-V support for llvm-exegesis is landing soon IIUC, with RVV not too far behind either.
03.12.2024 03:02
RVV benchmark
The RVV Agner Fog is camel-cdr.github.io/rvv-bench-re..., it's an incredibly useful resource. We use it all the time for LLVM!
03.12.2024 00:52