Add Verilator fast simulation #230
We have to be careful where to place this, as its reset logic is very flaky with respect to event order. This should be fixed within axi_sim_mem itself.
It is possible to attempt booting Linux using the faster Verilator simulator. The idea is to bundle the Linux image (instead of U-Boot) with OpenSBI and chain-load it directly, as suggested in #214. This requires patches in several places, because the load address of the flattened device tree (FDT) must be moved out of the way of the Linux payload (which is significantly larger than the U-Boot one). Furthermore, we also raise the UART baud rate from the standard 115200 to 921600 (8x) to speed up the boot process.
Step 1: Patch ZSL
In
Step 2: Build ZSL
Step 3: Clone CVA6 SDK
Step 4: Patch OpenSBI
In
Step 5: Build OpenSBI+Linux Payload
Step 6: Dump Payload Binary
Step 7: Patch Device Tree (DTS)
In
Step 8: Build Device Tree
Step 9: Build Simulator Binary
Step 10: Run Simulation
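To put the 8x baud-rate increase mentioned above into numbers, here is a minimal sketch that computes the wire time of one UART character at both rates. It assumes 8N1 framing (10 bits per character), which is an assumption and not stated in this thread.

```python
# Time to transmit one UART character at the two baud rates from the text.
# Assumption: 8N1 framing, i.e. 1 start bit + 8 data bits + 1 stop bit = 10 bits.
BITS_PER_CHAR = 10


def char_time_us(baud: int) -> float:
    """Return the wire time of one character in microseconds."""
    return BITS_PER_CHAR / baud * 1e6


standard = char_time_us(115200)
boosted = char_time_us(921600)

print(f"115200 baud: {standard:.2f} us per character")
print(f"921600 baud: {boosted:.2f} us per character")
print(f"ratio: {standard / boosted:.1f}x")
```

Since every printed character costs simulated time, the shorter character time directly reduces how long the simulator spends on console output.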
Expected Output
The simulator STDOUT includes UART output, which should look as follows (with all non-UART output removed):
On a Ryzen 9 9900X running on 4 threads, this achieves a simulation rate of roughly 98 kHz, taking 170 seconds to progress through the zero-stage loader (ZSL) and OpenSBI until control is passed to Linux.
Linux Behavior
Then, control jumps to the Linux image at
Further notes regarding booting Linux:
Hello!
As far as I know, this is the IIC-OSIC-TOOLS Docker container. This is installed as
It turned out I never actually posted my reply; it stayed in draft for two weeks 😅.
Yup, I tried that initially, but ran into unsupported compile options during the build. I thought it might be some kind of wrapper that helps Verilator build Cheshire. Eventually, I realized I had an old version of Verilator installed via However, I still couldn't compile Cheshire due to two problems:
The first issue was resolved by adding the But even after building DRAMSys, the project still wouldn't link correctly. As far as I understand, the DRAMSys library built by During simulation, I was able to see ZSL logs, but not the OpenSBI logs—even after updating the baudrate to
It's hard to debug this because I couldn't find a trace log file (I expected it in the
29caece to ed71f20
I finally had some time to look into this, and got to the handoff between OpenSBI and the kernel. Not sure how long output is supposed to take from here with @MaxWipfli how did you diagnose your page fault? Did you use tracing or attach OCD for debugging? Something else?
If I recall correctly, the first line of kernel output (the "banner") appears quite quickly, within at most 5 minutes from U-Boot exiting (I think it's even less than a minute).
I used instruction tracing and wrote a quick-and-dirty script to match the addresses with the symbols in However, I think even then it could still misbehave, although I am not sure about that. Basically, this is a game of trying to de-optimize in various ways until things work, sadly.
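The address-to-symbol matching described above could look roughly like the following sketch. It is not the script used in this thread. It assumes a symbol table in `nm`-style text form ("hex address, type, name" per line), and the sample addresses and names (`func_a`, `func_b`, `func_c`) are made up for illustration.

```python
import bisect


class SymbolTable:
    """Maps program counter values to the closest preceding symbol."""

    def __init__(self, nm_text: str):
        entries = []
        for line in nm_text.splitlines():
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank lines and undefined symbols (no address)
            addr, _kind, name = parts
            entries.append((int(addr, 16), name))
        entries.sort()
        self.addrs = [a for a, _ in entries]
        self.names = [n for _, n in entries]

    def lookup(self, pc: int) -> str:
        # Find the last symbol whose start address is <= pc.
        i = bisect.bisect_right(self.addrs, pc) - 1
        if i < 0:
            return f"<unknown 0x{pc:x}>"
        return f"{self.names[i]}+0x{pc - self.addrs[i]:x}"


# Hypothetical symbol table; real input would come from the ELF being traced.
SAMPLE_NM = """\
0000000080200000 T func_a
0000000080200040 T func_b
0000000080201000 T func_c
"""

table = SymbolTable(SAMPLE_NM)
for pc in (0x80200004, 0x80200044, 0x80201010, 0x1000):
    print(f"0x{pc:x} -> {table.lookup(pc)}")
```

Sorting once and using a binary search keeps each lookup cheap, which matters when the instruction trace contains millions of program counter values.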
Before I forget, here are the slides of my presentation on the work I did for this course project: cheshire_verilator_project.pdf
Thanks for the info, that's very helpful 👍 I could reproduce the boot up to the SW IOTLB init, where it gets stuck. Sadly, while verilator/verilator#5350 is closed as completed, the problem still occurs with a current master build. So the next step is to exclude any possible issues on our side (
Okay, it's great that you could reproduce. I guess one thing that would be really nice (although it makes stuff slower) is to always run with a single thread only, and check what happens there. In fact, probably even remove
To use this, run the following commands from the repository root.
Performance
I have benchmarked this simulation (using Verilator 5.034 and Clang 16.0.6) against VCS (2025.06) and VSIM (2025.1) on an AMD Ryzen 9 9900X machine with 64 GB RAM (badile39.ee.ethz.ch), achieving a speed-up of up to 6x (depending on the workload).
Limitations
Sometimes, simulation misbehaves and either (a) locks up or (b) results in wrong behavior which eventually traps the core. This has been observed in particular while attempting to boot Linux.
I suspect it is caused by this issue: verilator/verilator#5350
It looks like this doesn't occur when only using a single thread, so passing CHS_VERILATOR_THREADS=1 to make can be used to (most likely) avoid this issue.
This bug seems very dependent on the exact RTL, and the behavior can be triggered, changed, or avoided by very minor RTL changes that seem entirely unrelated.
Also, the current flow assumes that the IIC-OSIC-TOOLS container is available as oseda, as is the case on ETH Zurich infrastructure. It is possible to use natively installed tools instead by setting VERILATOR_PREFIX to an empty value in target/sim/verilator/verilator.mk.
RTL Optimizations
I have performed a number of RTL optimizations to speed up Verilator simulation (by around 75% combined according to my measurements), which are not yet fully upstreamed:
The 2 ** X to 1 << X conversion will no longer be required from Verilator 5.040 as I fixed that upstream: verilator/verilator#6203
These changes only affect simulation performance and not functionality (except maybe due to the Verilator issue described above).
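The 2 ** X to 1 << X rewrite mentioned above relies on the two expressions being equal for non-negative integer X. The following quick check illustrates that identity in Python rather than SystemVerilog; it is only a sketch of the arithmetic and does not model RTL bit widths.

```python
# Check that 2 ** x and 1 << x agree for non-negative integer exponents.
# Python integers are arbitrary precision; SystemVerilog results are also
# subject to expression bit widths, which this sketch does not model.
MAX_EXPONENT = 64

mismatches = [x for x in range(MAX_EXPONENT + 1) if 2 ** x != 1 << x]

print(f"checked exponents 0..{MAX_EXPONENT}, mismatches: {mismatches}")
```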
Booting Linux
I attempted booting Linux using this simulator, which is described in a series of comments below. This only worked partially, due to wrong (and non-deterministic) behavior caused by the Verilator issue described above. Thanks to @HepoH3 for their helpful comments on this.