@MaxWipfli MaxWipfli commented Jun 20, 2025

To use this, run the following commands from the repository root.

make "$PWD/target/sim/verilator/cheshire_soc.vlt"
./target/sim/verilator/cheshire_soc.vlt +BINARY=/absolute/path/to/binary.elf

Performance

I have benchmarked this simulation (using Verilator 5.034 and Clang 16.0.6) against VCS (2025.06) and VSIM (2025.1) on an AMD Ryzen 9 9900X machine with 64 GB RAM (badile39.ee.ethz.ch), achieving a speed-up of up to 6x (depending on the workload).

Limitations

Sometimes, simulation misbehaves and either (a) locks up or (b) results in wrong behavior which eventually traps the core. This has been observed in particular while attempting to boot Linux.

I suspect it is caused by this issue: verilator/verilator#5350

It looks like this doesn't occur when using only a single thread, so passing CHS_VERILATOR_THREADS=1 to make can be used to (most likely) avoid this issue.

This bug seems very dependent on the exact RTL and this behavior can be triggered/changed/avoided using very minor RTL changes that seem entirely unrelated.

Also, the current flow assumes that the IIC-OSIC-TOOLS container is available as oseda, as is the case on ETH Zurich infrastructure. It is possible to use natively installed tools instead by setting VERILATOR_PREFIX to an empty value in target/sim/verilator/verilator.mk.

RTL Optimizations

I have performed a number of RTL optimizations to speed up Verilator simulation (by around 75% combined according to my measurements), which are not yet fully upstreamed:

The 2 ** X to 1 << X conversion will no longer be required from Verilator 5.040 as I fixed that upstream: verilator/verilator#6203

These changes only affect simulation performance and not functionality (except maybe due to the Verilator issue described above).

Booting Linux

I attempted booting Linux using this simulator, which is described in a series of comments below. This only worked partially, due to wrong (and non-deterministic) behavior based on the Verilator issue described above. Thanks to @HepoH3 for their helpful comments on this.

@MaxWipfli
Author

MaxWipfli commented Jul 11, 2025

It is possible to attempt booting Linux using the faster Verilator simulator.

The idea is to bundle the Linux image (instead of U-Boot) with OpenSBI and chain-load it directly, as suggested in #214. This requires patches in several places, as the load address of the flattened device tree (FDT) needs to be moved out of the way of the Linux payload (which is significantly larger than the U-Boot one). Furthermore, we also raise the UART baud rate from the standard 115200 to 921600 (8x) to speed up the boot process.

Step 1: Patch ZSL

In sw/include/params.h, set __BOOT_BAUDRATE to 921600 and __BOOT_ZSL_DTB to 0x90000000.
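
The two edits could be scripted along these lines. This is only a sketch: it assumes both parameters appear in params.h as plain `#define NAME VALUE` lines (not confirmed here), `set_define` is a hypothetical helper that is not part of the repository, and `sed -i` is used in its GNU form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the value of a `#define NAME VALUE` line in a header.
# Assumes one definition per line; anything after the name (including a trailing
# comment) is replaced. Lines that do not match are left untouched.
set_define() {
    local file=$1 name=$2 value=$3
    sed -i -E "s|^([[:space:]]*#[[:space:]]*define[[:space:]]+${name}[[:space:]]+).*|\1${value}|" "$file"
}

# Usage, from the repository root:
#   set_define sw/include/params.h __BOOT_BAUDRATE 921600
#   set_define sw/include/params.h __BOOT_ZSL_DTB  0x90000000
```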

Step 2: Build ZSL

make "$PWD"/sw/boot/zsl.rom.elf

Step 3: Clone CVA6 SDK

git submodule update --init --recursive sw/deps/cva6-sdk

(documentation)

Step 4: Patch OpenSBI

In sw/deps/cva6-sdk/platform/generic/config.mk, set FW_JUMP_FDT_ADDR=0x90000000.

Step 5: Build OpenSBI+Linux Payload

make -C sw/deps/cva6-sdk spike_payload

Step 6: Dump Payload Binary

riscv64-unknown-elf-objcopy -O binary --set-start=0x80000000 sw/deps/cva6-sdk/install64/spike_fw_payload.elf spike_fw_payload.bin
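
Since the FDT now sits at 0x90000000 and the payload is loaded at 0x80000000, the payload has to stay within 0x10000000 bytes (256 MiB) to avoid overlapping the device tree. A quick sanity check could look like the sketch below; `payload_fits` is a hypothetical helper name, and the check ignores any extra memory the kernel claims after it starts.

```shell
#!/usr/bin/env bash
# Addresses taken from the steps in this comment.
PAYLOAD_BASE=$((0x80000000))
FDT_ADDR=$((0x90000000))

# Succeeds if a payload of SIZE bytes loaded at PAYLOAD_BASE ends at or below FDT_ADDR.
payload_fits() {
    local size=$1
    (( PAYLOAD_BASE + size <= FDT_ADDR ))
}

# Usage:
#   size=$(wc -c < spike_fw_payload.bin)
#   payload_fits "$size" && echo "FDT is clear of the payload" || echo "payload overlaps the FDT"
```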

Step 7: Patch Device Tree (DTS)

In sw/boot/cheshire.dtsi, update the baud rate (s/115200/921600/) and clock frequency from 50 MHz to 200 MHz (s/50000000/200000000/).
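
The two substitutions can be applied in one go. A minimal sketch, assuming GNU `sed` (BSD `sed` needs `-i ''`); `patch_dts` is just an illustrative wrapper name. As written, each expression replaces the first match on every line, so it is worth reviewing the resulting diff.

```shell
#!/usr/bin/env bash
# Apply the baud rate and clock frequency substitutions from Step 7 to a DTS file.
patch_dts() {
    sed -i -e 's/115200/921600/' -e 's/50000000/200000000/' "$1"
}

# Usage, from the repository root:
#   patch_dts sw/boot/cheshire.dtsi
```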

Step 8: Build Device Tree

dtc -I dts -O dtb -o cheshire.dtb sw/boot/cheshire.dtsi
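
Before passing the result to the simulator, it may be worth confirming that the output really is a flattened device tree. The sketch below only checks the 4-byte FDT magic number (0xd00dfeed, stored big-endian at the start of the blob); `is_dtb` is a hypothetical helper and says nothing about whether the contents are correct.

```shell
#!/usr/bin/env bash
# Succeeds if FILE begins with the flattened device tree magic number 0xd00dfeed.
is_dtb() {
    local magic
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "d00dfeed" ]
}

# Usage:
#   is_dtb cheshire.dtb && echo "cheshire.dtb looks like a DTB"
```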

Step 9: Build Simulator Binary

rm -rf target/sim/verilator/obj_dir
make "$PWD"/target/sim/verilator/cheshire_soc.vlt CHS_VERILATOR_UART_BAUD=921600

Step 10: Run Simulation

./target/sim/verilator/cheshire_soc.vlt \
    +BINARY="$PWD"/sw/boot/zsl.rom.elf \
    +FW_PAYLOAD="$PWD"/spike_fw_payload.bin \
    +FW_DTB="$PWD"/cheshire.dtb

Expected Output

The simulator STDOUT includes UART output, which should look as follows (removing all the non-UART output):

[UART]  /\___/\       Boot mode:       0
[UART] ( o   o )      Real-time clock: 32768 Hz
[UART] (  =^=  )      System clock:    200078887 Hz
[UART] (        )     Read global ptr: 0x00000000
[UART] (    P    )    Read pointer:    0x10000000
[UART] (  U # L   )   Read argument:   0x00000000
[UART] (    P      )
[UART] (           ))))))))))
[UART] 
[UART] [ZSL] Launch firmware at 80000000 with device tree at 90000000
[UART] 
[UART] OpenSBI v0.9
[UART]    ____                    _____ ____ _____
[UART]   / __ \                  / ____|  _ \_   _|
[UART]  | |  | |_ __   ___ _ __ | (___ | |_) || |
[UART]  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
[UART]  | |__| | |_) |  __/ | | |____) | |_) || |_
[UART]   \____/| .__/ \___|_| |_|_____/|____/_____|
[UART]         | |
[UART]         |_|
[UART] 
[UART] Platform Name             : eth,cheshire
[UART] Platform Features         : medeleg
[UART] Platform HART Count       : 1
[UART] Platform IPI Device       : aclint-mswi
[UART] Platform Timer Device     : aclint-mtimer @ 1000000Hz
[UART] Platform Console Device   : uart8250
[UART] Platform HSM Device       : ---
[UART] Platform Reboot Device    : ---
[UART] Platform Shutdown Device  : ---
[UART] Firmware Base             : 0x80000000
[UART] Firmware Size             : 248 KB
[UART] Runtime SBI Version       : 0.3
[UART] 
[UART] Domain0 Name              : root
[UART] Domain0 Boot HART         : 0
[UART] Domain0 HARTs             : 0*
[UART] Domain0 Region00          : 0x0000000002040000-0x000000000207ffff (I)
[UART] Domain0 Region01          : 0x0000000080000000-0x000000008003ffff ()
[UART] Domain0 Region02          : 0x0000000000000000-0xffffffffffffffff (R,W,X)
[UART] Domain0 Next Address      : 0x0000000080200000
[UART] Domain0 Next Arg1         : 0x0000000090000000
[UART] Domain0 Next Mode         : S-mode
[UART] Domain0 SysReset          : yes
[UART] 
[UART] Boot HART ID              : 0
[UART] Boot HART Domain          : root
[UART] Boot HART ISA             : rv64imafdcsuh
[UART] Boot HART Features        : scounteren,mcounteren,mcountinhibit
[UART] Boot HART PMP Count       : 0
[UART] Boot HART PMP Granularity : 0
[UART] Boot HART PMP Address Bits: 0
[UART] Boot HART MHPM Count      : 6
[UART] Boot HART MIDELEG         : 0x0000000000000666
[UART] Boot HART MEDELEG         : 0x0000000000f0b509
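
To reduce the simulator output to the UART lines shown above, a simple prefix filter should be enough. This is a sketch that relies only on the `[UART]` prefix visible in the log; `uart_only` is an illustrative name.

```shell
#!/usr/bin/env bash
# Keep only the simulator output lines that carry UART data.
uart_only() {
    grep '^\[UART\]'
}

# Usage: pipe the simulator's standard output through the filter, e.g.
#   ./target/sim/verilator/cheshire_soc.vlt +BINARY="$PWD"/sw/boot/zsl.rom.elf | uart_only
```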

On a Ryzen 9 9900X and running on 4 threads, this achieves a simulation rate of roughly 98 kHz, taking 170 seconds to progress through the zero-stage loader (ZSL) and OpenSBI until control is passed to Linux.
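
For a sense of scale, these numbers can be turned into simulated cycles and simulated time. The sketch assumes the rate is constant over the whole run and refers to the 200 MHz system clock; results are rounded down by integer arithmetic.

```shell
#!/usr/bin/env bash
SIM_RATE_HZ=98000        # reported simulation rate
WALL_SECONDS=170         # reported wall-clock time until the handoff to Linux
SYS_CLK_HZ=200000000     # system clock configured in Step 7

cycles=$(( SIM_RATE_HZ * WALL_SECONDS ))      # simulated clock cycles
sim_ms=$(( cycles * 1000 / SYS_CLK_HZ ))      # simulated time in milliseconds
slowdown=$(( SYS_CLK_HZ / SIM_RATE_HZ ))      # slowdown relative to real hardware

echo "cycles=$cycles simulated_ms=$sim_ms slowdown=${slowdown}x"
```

Under those assumptions, ZSL and OpenSBI together account for roughly 16.7 million cycles, i.e. about 83 ms of simulated time, at a slowdown of about 2000x.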

Linux Behavior

Then, control jumps to the Linux image at 0x80200000, which starts booting but later fails with a fatal page fault while parsing the FDT. It is unclear why this happens, but it may be due to issues Verilator has with packed structs (see verilator/verilator#5350).

@MaxWipfli
Author

Further notes regarding booting Linux:

  1. I had to disable BR2_PACKAGE_RAMSPEED in sw/deps/cva6-sdk/buildroot64_defconfig, as the upstream URL of that package seems to be unreachable.
  2. I had to change earlyprintk to earlycon in the kernel command line in sw/deps/cva6-sdk/linux64_defconfig to obtain early serial output. earlyprintk is an architecture-specific feature not supported on RISC-V.
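
Both changes can be scripted. A sketch under two assumptions: the package is currently enabled as a `BR2_PACKAGE_RAMSPEED=y` line, and `earlyprintk` only occurs in the kernel command line of linux64_defconfig. The helper names are illustrative, and `sed -i` is the GNU form.

```shell
#!/usr/bin/env bash
# Turn off the ramspeed package in a Buildroot defconfig (Kconfig "is not set" form).
disable_ramspeed() {
    sed -i 's/^BR2_PACKAGE_RAMSPEED=y$/# BR2_PACKAGE_RAMSPEED is not set/' "$1"
}

# Replace earlyprintk with earlycon wherever it appears in a kernel defconfig.
use_earlycon() {
    sed -i 's/earlyprintk/earlycon/g' "$1"
}

# Usage, from the repository root:
#   disable_ramspeed sw/deps/cva6-sdk/buildroot64_defconfig
#   use_earlycon     sw/deps/cva6-sdk/linux64_defconfig
```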

@HepoH3
Contributor

HepoH3 commented Jul 23, 2025

Hello!
I tried to reproduce your steps, but I'm facing an issue with the missing oseda tool. Is there a guide on how to install it?
A web search gives me ambiguous results about what it actually is.

@MaxWipfli
Author

> Hello! I tried to reproduce your steps, but I'm facing an issue with the missing oseda tool. Is there a guide on how to install it? A web search gives me ambiguous results about what it actually is.

As far as I know, this is the IIC-OSIC-TOOLS Docker container. This is installed as oseda on our infrastructure at ETH Zürich. However, it is probably easier for you to set VERILATOR_PREFIX to an empty value (in target/sim/verilator/verilator.mk) and install Verilator natively on your system. FYI, we are using Verilator v5.036.

@HepoH3
Contributor

HepoH3 commented Aug 4, 2025

It turned out I never actually posted my reply—it stayed in draft for two weeks 😅.

> it is probably easier for you to set VERILATOR_PREFIX to an empty value

Yup, I tried that initially, but ran into unsupported compile options during the build. I thought it might be some kind of wrapper that helps Verilator build Cheshire. Eventually, I realized I had an old version of Verilator installed via apt (v4.038). Cloning and building Verilator v5.036 solved that issue.
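
A version check up front would catch this kind of mismatch early. The sketch below compares version strings with GNU `sort -V`; `version_ge` is a hypothetical helper, and the usage line assumes that `verilator --version` prints `Verilator <version> ...` on its first line.

```shell
#!/usr/bin/env bash
# Succeeds if version $1 is greater than or equal to version $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   installed=$(verilator --version | awk 'NR == 1 { print $2 }')
#   version_ge "$installed" 5.036 || echo "Verilator $installed is too old" >&2
```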

However, I still couldn't compile Cheshire due to two problems:

  1. Missing optional template in the std namespace during compilation.
  2. Missing DRAMSys library during linking.

The first issue was resolved by adding the -std=c++17 flag. The second was fixed by building the DRAMSys library. To do that before compilation, I added the chs-dramsys-all target as the first prerequisite of $(CHS_ROOT)/target/sim/verilator/cheshire_soc.vlt. After building the library, it also needs to be added to LD_LIBRARY_PATH (CHS_ROOT/target/sim/dramsys/build/lib).
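
For reference, the library path can be set roughly like this. A sketch assuming CHS_ROOT points at the repository root; `prepend_ld_path` is an illustrative helper that avoids leaving an empty entry when LD_LIBRARY_PATH was previously unset.

```shell
#!/usr/bin/env bash
# Prepend DIR to LD_LIBRARY_PATH; the ${VAR:+...} expansion only adds the
# separator and the old value when the variable is already set and non-empty.
prepend_ld_path() {
    export LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Usage:
#   prepend_ld_path "$CHS_ROOT/target/sim/dramsys/build/lib"
```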

But even after building DRAMSys, the project still wouldn't link correctly. As far as I understand, the DRAMSys library built by chs-dramsys-all requires an sc_main function (while main.cpp uses the standard main). I added a dummy sc_main function to the file, and after that, the simulation finally started.

During simulation, I was able to see ZSL logs, but not the OpenSBI logs—even after updating the baudrate to 921600 in the device tree:

[ELF] preload complete
[ELF] starting execution, entry point 0x10000000
Mem64Master: emptied write queue
[UART]  /\___/\       Boot mode:       0
[UART] ( o   o )      Real-time clock: 32768 Hz
[UART] (  =^=  )      System clock:    199947815 Hz
[UART] (        )     Read global ptr: 0x00000000
[UART] (    P    )    Read pointer:    0x10000000
[UART] (  U # L   )   Read argument:   0x00000000
[UART] (    P      )
[UART] (           ))))))))))
[UART] 
[UART] [ZSL] Launch firmware at 80000000 with device tree at 82200000

It's hard to debug this because I couldn't find a trace log file (I expected it in the logs directory, but it's empty). I tried uncommenting the "Tracing" section in verilator.mk, but as noted there, it only enables VCD logging.

@paulsc96 paulsc96 mentioned this pull request Aug 5, 2025
11 tasks
@paulsc96 paulsc96 linked an issue Aug 5, 2025 that may be closed by this pull request
@MaxWipfli MaxWipfli force-pushed the verilator branch 2 times, most recently from 29caece to ed71f20 Compare August 13, 2025 13:12
@paulsc96 paulsc96 added this to the v0.4.0 milestone Sep 30, 2025
@paulsc96
Member

paulsc96 commented Sep 30, 2025

I finally had some time to look into this, and got to the handoff between OpenSBI and the kernel. Not sure how long output is supposed to take from here with earlycon.

@MaxWipfli how did you diagnose your pagefault? Did you use tracing or attach OCD for debugging? Something else?

@MaxWipfli
Author

> I finally had some time to look into this, and got to the handoff between OpenSBI and the kernel. Not sure how long output is supposed to take from here with earlycon.

If I recall correctly, the first line of kernel output (the "banner") appears quite quickly, within at most 5 minutes from U-Boot exiting (I think it's even less than a minute).

> how did you diagnose your pagefault? Did you use tracing or attach OCD for debugging? Something else?

I used instruction tracing and wrote a dirty script to match the addresses with the symbols in vmlinux. Without instruction tracing, I noticed that the weird Verilator bugs seem to occur more often. I may have also had other kinds of tracing enabled (e.g. printing all the transactions on the DRAM's AR/AW channel). You can also disable multi-threading and enable VCD dumping, which kind of "de-optimizes" things even more.

However, I think even then it could still misbehave, although I am not sure about that. Basically, this is a game of trying to de-optimize in various ways until things work, sadly.
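
The address-to-symbol matching mentioned above could be approximated as follows. This is not the original script, just a sketch of the idea: it reads a symbol table in the format of `nm -n vmlinux` (16-digit lowercase hex address, type, name, sorted by address) on standard input and prints the last symbol at or below a given program counter. `resolve_symbol` is a hypothetical name, and the instruction trace format itself is not covered.

```shell
#!/usr/bin/env bash
# Print the nearest symbol at or below program counter $1.
# stdin: `nm -n` style lines "<16-hex-digit address> <type> <name>", sorted by address.
resolve_symbol() {
    local pc
    # Normalise the address: strip 0x, lower-case, zero-pad to 16 hex digits.
    pc=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    pc=$(printf '%16s' "$pc" | tr ' ' '0')
    # Fixed-width lowercase hex compares correctly as a string in the C locale;
    # concatenating "" forces awk to compare strings rather than numbers.
    LC_ALL=C awk -v pc="$pc" '
        (($1 "") <= (pc "")) { best = $3 }
        END { if (best != "") print best }
    '
}

# Usage:
#   nm -n vmlinux | resolve_symbol 0xffffffff80001abc
```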

@MaxWipfli
Author

Before I forget, here are the slides of my presentation on the work I did for this course project: cheshire_verilator_project.pdf
These mostly focus on the performance aspect, though.

@paulsc96
Member

paulsc96 commented Oct 6, 2025

Thanks for the info, that's very helpful 👍

I could reproduce the boot up to the SW IOTLB init, where it gets stuck. Sadly, while verilator/verilator#5350 is closed as completed, the problem still occurs with a current master build.

So the next step is to exclude any possible issues on our side (x-related, testbench memory model, etc.) and then really dig in to find the root cause.

@MaxWipfli
Author

> Thanks for the info, that's very helpful 👍
>
> I could reproduce the boot up to the SW IOTLB init, where it gets stuck. Sadly, while verilator/verilator#5350 is closed as completed, the problem still occurs with a current master build.
>
> So the next step is to exclude any possible issues on our side (x-related, testbench memory model, etc.) and then really dig in to find the root cause.

Okay, it's great that you could reproduce it. I guess one thing that would be really nice (although it makes things slower) is to always run with a single thread only, and check what happens there.

In fact, it is probably worth removing --threads 1 completely, since that option still generates thread-safe code and merely runs it on a single thread. This should probably get rid of the non-determinism, but if I recall correctly things were still buggy even like this.


Development

Successfully merging this pull request may close these issues.

Add Verilator Setup

3 participants