Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syz-fuzzer: add NULL check in supported features #4763

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

alessandrocarminati
Copy link

Kernel supported features are detected using debugfs. However, if the filesystem is not mounted, syz-fuzzer panics without providing any clues as to why.

2024/05/04 10:12:49 connecting to manager...
2024/05/04 10:12:49 fuzzer vm-1 connected
2024/05/04 10:12:49 checking machine...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6019a8]

goroutine 1 [running]:
main.main()
        /home/alessandro/go/src/syzkaller/syz-fuzzer/fuzzer.go:169 +0x958
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: free: client-session, nchannels 2
debug1: channel 1: free: 127.0.0.1, nchannels 1
Transferred: sent 5180, received 6532 bytes, in 2.5 seconds
Bytes per second: sent 2097.7, received 2645.2
debug1: Exit status 2

This simple patch prevents syz-fuzzer from crashing allowing it to terminate cleanly, while provides a possible cause why this issue is occurring.

2024/05/04 10:15:14 connecting to manager...
2024/05/04 10:15:14 fuzzer vm-1 connected
2024/05/04 10:15:14 checking machine...
2024/05/04 10:15:14 SYZFATAL: The currently running kernel image seems not to support any required feature, have you forgotten to mount debugfs?
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: free: client-session, nchannels 2
debug1: channel 1: free: 127.0.0.1, nchannels 1
Transferred: sent 5160, received 5016 bytes, in 2.4 seconds
Bytes per second: sent 2106.7, received 2047.9
debug1: Exit status 1

Kernel supported features are detected using debugfs.
However, if the filesystem is not mounted, `syz-fuzzer` panics without
providing any clues as to why.

```
2024/05/04 10:12:49 connecting to manager...
2024/05/04 10:12:49 fuzzer vm-1 connected
2024/05/04 10:12:49 checking machine...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6019a8]

goroutine 1 [running]:
main.main()
        /home/alessandro/go/src/syzkaller/syz-fuzzer/fuzzer.go:169 +0x958
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: free: client-session, nchannels 2
debug1: channel 1: free: 127.0.0.1, nchannels 1
Transferred: sent 5180, received 6532 bytes, in 2.5 seconds
Bytes per second: sent 2097.7, received 2645.2
debug1: Exit status 2
```

This simple patch prevents `syz-fuzzer` from crashing and allows it to
terminate cleanly, while provides a possible cause why this issue is
occurring.

```
2024/05/04 10:15:14 connecting to manager...
2024/05/04 10:15:14 fuzzer vm-1 connected
2024/05/04 10:15:14 checking machine...
2024/05/04 10:15:14 SYZFATAL: The currently running kernel image seems not to support any required feature, have you forgotten to mount debugfs?
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: free: client-session, nchannels 2
debug1: channel 1: free: 127.0.0.1, nchannels 1
Transferred: sent 5160, received 5016 bytes, in 2.4 seconds
Bytes per second: sent 2106.7, received 2047.9
debug1: Exit status 1
```

Signed-off-by: Alessandro Carminati <[email protected]>
@dvyukov
Copy link
Collaborator

dvyukov commented May 6, 2024

Hi Alessandro,

Features are not returned as nil when debugfs is not mounted, the only case I see where we return nil features is this:

return nil, fmt.Errorf("enabling FeatureCoverage requires enabling ExecutorUsesShmem")

Are you trying to enable coverage for OS that does not use shmem for executor communication?
We should disable coverage feature in this case with the explanation, or detect this setting earlier then syz-manager starts.

@alessandrocarminati
Copy link
Author

I hope you will forgive my unconventional debugging approach.
Additionally, please bear in mind my limited knowledge of the code, and the tool in general.
I recently encountered a syzkaller kernel bug in my backlog, and to reproduce it, I decided to delve into syzkaller from scratch.

My initial step was to set up a Linux environment and a filesystem for learning purposes.
Here are the specifications of my testing environment:

  • Architecture: host: x86_64, target: aarch64
  • Default buildroot-2024.02.1 rootfs
  • Linux-6.8.9 built for aarch64, using the default configuration along with suggestions. I also introduced a simple bug to ensure timely results.

My initial syzkaller run resulted in a panic.

2024/05/04 10:12:49 connecting to manager...
2024/05/04 10:12:49 fuzzer vm-1 connected
2024/05/04 10:12:49 checking machine...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6019a8]

goroutine 1 [running]:
main.main()
        /home/alessandro/go/src/syzkaller/syz-fuzzer/fuzzer.go:169 +0x958

By inserting print debugs in various locations, similar to what is demonstrated here:

diff --git a/syz-fuzzer/fuzzer.go b/syz-fuzzer/fuzzer.go
index 15b27bb86..e7dd2640a 100644
--- a/syz-fuzzer/fuzzer.go
+++ b/syz-fuzzer/fuzzer.go
@@ -147,16 +147,23 @@ func main() {
        }
        var checkReq *rpctype.CheckArgs
        if r.Features == nil {
+               fmt.Println("r.Features == nil")
                checkArgs.featureFlags = featureFlags
                checkReq, err = checkMachine(checkArgs)
                if err != nil {
+                       fmt.Println("    r.Features == nil && err != nil")
+                       fmt.Printf("   checkReq=%v, errstr=\"%s\"\n", checkReq,err.Error())
                        if checkReq == nil {
+                               fmt.Println("        r.Features == nil && err != nil && checkReq == nil")
                                checkReq = new(rpctype.CheckArgs)
+                               fmt.Printf("        checkReq =%v\n", checkReq)
                        }
                        checkReq.Error = err.Error()
+                       fmt.Printf("   checkReq.Erro=\"%s\"\n", err.Error())
                }
                r.Features = checkReq.Features
        } else {
+               fmt.Println("r.Features != nil")
                if err = host.Setup(target, r.Features, featureFlags, executor); err != nil {
                        log.SyzFatal(err)
                }

I observed that:

2024/05/06 07:06:10 connecting to manager...
r.Features == nil
2024/05/06 07:06:10 checking machine...
    r.Features == nil && err != nil
   checkReq=<nil>, errstr="coverage is not supported (CONFIG_KCOV is not enabled)"
        r.Features == nil && err != nil && checkReq == nil
        checkReq =&{  <nil> map[] []}
   checkReq.Erro="coverage is not supported (CONFIG_KCOV is not enabled)"

Since I was confident that my features were enabled int the kernel, I investigated how syzkaller detected these features. My conclusion was that syzkaller couldn't identify the enabled features because debugfs wasn't mounted.

Creating a new buildroot image with debugfs mounted prevented the panic.
Consequently, I inferred that syzkaller expected debugfs to be mounted.

Have I overlooked anything significant in my analysis?

@dvyukov
Copy link
Collaborator

dvyukov commented May 6, 2024

Yes, use of KCOV requires DEBUGFS, but syzkaller shouldn't crash if DEBUGFS is not mounted.

I see, we get nil r.Features if we get here:

checkReq = new(rpctype.CheckArgs)

and it crashes when printing here:

for _, feat := range r.Features.Supported() {

We probably could remove printing entirely, but I guess it will still crash here:

gateCb := fuzzerTool.useBugFrames(r.Features, r.MemoryLeakFrames, r.DataRaceFrames)

This part of the code is currently in flux and will need to change more significantly, but a reasonable local fix may be to do this:

checkReq = &rpctype.CheckArgs{
	Features: new(host.Features),
}

here:
https://github.com/google/syzkaller/blob/610f2a54d02f8cf4f2454c03bf679b602e6e59b6/syz-fuzzer/fuzzer.go#L154C5-L154C38

then we shouldn't crash and pass the error to the manager to print to user.

@alessandrocarminati
Copy link
Author

I understand.
Since debugfs mounting is not a requirement, my initial assumption may not be accurate.
Furthermore, I've confirmed that simply disabling the feature print does not resolve the panic issue.
I've tested it myself, and the crash still occurs shortly after.

Given this, should I amend my PR with a more suitable solution, such as testing the one you proposed, or would it be more appropriate to close this PR altogether?

What are your thoughts?

@dvyukov
Copy link
Collaborator

dvyukov commented May 6, 2024

Whatever you prefer.
A fix that prevents r.Features from being nil in the first place looks reasonable to merge.

@alessandrocarminati
Copy link
Author

Please don't misunderstand me; I'm only asking because I see this as an opportunity for me to delve deeper into this code.

I've tested the suggestion from your previous message, and here are the results:

2024/05/06 08:23:11 connecting to manager...
2024/05/06 08:23:11 checking machine...
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 : 
2024/05/06 08:23:11 starting 1 executor processes
2024/05/06 08:23:11 machine check failed: coverage is not supported (CONFIG_KCOV is not enabled)
2024/05/06 08:23:11 SYZFATAL: Manager.Check call failed: machine check failed: coverage is not supported (CONFIG_KCOV is not enabled)

Indeed, it doesn't panic, which is a positive outcome.

However, it provides a message that might confuse the user:

  • Coverage is not supported (CONFIG_KCOV is not enabled), which is not accurate; In this scenario, CONFIG_KCOV is enabled, and the user is simply using an image that doesn't mount debugfs.

Moreover, the final message the user receives indicates that syzkaller exited because the KCOV feature isn't enabled in the kernel, suggesting it's a requirement.

2024/05/06 08:23:11 SYZFATAL: Manager.Check call failed: machine check failed: coverage is not supported (CONFIG_KCOV is not enabled)

However, based on our interaction, I understood KCOV isn't mandatory for syzkaller to function.
Looking at syz-manager/rpc.go, where the error is generated, it appears that it considers any rpctype.CheckArgs reported error as fatal, is this the intended behavior?

@dvyukov
Copy link
Collaborator

dvyukov commented May 6, 2024

If you want better error message, then it should be done here:

func checkCoverage() string {
if reason := checkDebugFS(); reason != "" {
return reason
}
if !osutil.IsExist("/sys/kernel/debug/kcov") {
return "CONFIG_KCOV is not enabled"
}
if err := osutil.IsAccessible("/sys/kernel/debug/kcov"); err != nil {
return err.Error()
}
return ""
}

I see it already checks for debugfs, so perhaps this function needs improvement:

func checkDebugFS() string {
if err := osutil.IsAccessible("/sys/kernel/debug"); err != nil {
return "debugfs is not enabled or not mounted"
}
return ""
}

Using if suppFeatures == nil { to report this error is wrong, it may be nil for other reasons, and it may be not nil when debugfs is not mounted.

@alessandrocarminati
Copy link
Author

Hello again,
I've taken a bit of time to reflect on the issue that I still have doubts about.
In the fuzzer, there's this nice check that appears to be a placeholder or preparation for something that isn't there yet.
In fact, if you look at the checkMachine() function, you'll notice that if it succeeds, it returns a non-null string and a null error; if it fails, it returns a null string and a non-null error.
Currently, there's no case where both are non-null.
Another aspect that gives me pause for thought is the design of checkMachine().
It's structured in a way that if just one of the features is missing, it gives up without checking the others, and KCOV is the first check.
This suggests to me that the designer might have considered the features mandatory... Probably I'm overthinking.

In any case, if you manually provide an empty feature object as I interpret your suggestion in this thread, the fuzzer doesn't crash but eventually terminates, stating that KCOV is not enabled.
So, assuming syzkaller, as you said, should work without KCOV, I attempted to reverse engineer the piece of code responsible for the premature termination.
It seems that checkReq plays a crucial role, as the manager appears to inspect the error string field in there and terminate execution if anything other than an empty string is found.

In an effort to force execution without KCOV detection, I removed the code that populates the error field in checkReq. However, subsequent code still attempted to utilize KCOV and terminated VM execution upon failure.

syscalls                : 12/4407
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :
                        :

2024/05/07 16:03:58 corpus                  : 44 (0 broken, 0 seeds)
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:57 fuzzer detected executor failure='executor 0: EOF
', retrying #1
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:57 fuzzer detected executor failure='executor 0: EOF
', retrying #2
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:58 fuzzer detected executor failure='executor 0: EOF
', retrying #3
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:58 fuzzer detected executor failure='executor 0: EOF
', retrying #4
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:58 fuzzer detected executor failure='executor 0: EOF
', retrying #5
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:58 fuzzer detected executor failure='executor 0: EOF
', retrying #6
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:58 fuzzer detected executor failure='executor 0: EOF
', retrying #7
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:58 fuzzer detected executor failure='executor 0: EOF
', retrying #8
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:59 fuzzer detected executor failure='executor 0: EOF
', retrying #9
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:59 fuzzer detected executor failure='executor 0: EOF
', retrying #10
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:59 fuzzer detected executor failure='executor 0: EOF
', retrying #11
SYZFAIL: open of /sys/kernel/debug/kcov failed
 (errno 2: No such file or directory)
2024/05/07 16:03:59 fuzzer detected executor failure='executor 0: EOF
', retrying #12
2024/05/07 16:03:59 SYZFATAL: executor 0 failed 11 times: executor 0: EOF

debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: free: client-session, nchannels 2
debug1: channel 1: free: 127.0.0.1, nchannels 1
Transferred: sent 6892, received 26440 bytes, in 17.6 seconds
Bytes per second: sent 392.6, received 1506.3
debug1: Exit status 1

I might be about to say something very wrong, so please forgive me for that, but seems like this code is written assuming KCOV.

In the meantime, I've rewritten the checkDebugFS() function to make it more robust, if you consider this worthy, I can open another PR for this.

index 393617e45..ffd7d66d3 100644
--- a/pkg/host/features_linux.go
+++ b/pkg/host/features_linux.go
@@ -244,10 +244,14 @@ func checkVhciInjection() string {
 }
 
 func checkDebugFS() string {
-       if err := osutil.IsAccessible("/sys/kernel/debug"); err != nil {
-               return "debugfs is not enabled or not mounted"
-       }
-       return ""
+        var stat syscall.Statfs_t
+        if err := syscall.Statfs("/sys/kernel/debug", &stat); err != nil {
+                return err.Error()
+        }
+        if stat.Type != 0x64626720 { //DEBUGFS_MAGIC
+                return "debugfs not mounted"
+        }
+        return ""
 }
 
 func checkKCSAN() string {

Now is able to detect if debugfs is mounted more accurately, but for the overall issue it makes no difference, since having debugfs mounted or not wasn't apparently a blocking issue.
Any thoughts?

@dvyukov
Copy link
Collaborator

dvyukov commented May 15, 2024

In the meantime, I've rewritten the checkDebugFS() function to make it more robust, if you consider this worthy, I can open another PR for this.

As I mentioned this part is being changed almost completely. I finally got it to PR stage: #4790.
Overall better diagnostics for end users are good, but the change will need to fit into the new architecture.

@alessandrocarminati
Copy link
Author

Hello,

I took a quick look at the changes you made to the codebase, and it has indeed changed significantly since the last time I reviewed it. Given these updates, my previous proposals are no longer relevant, so please feel free to close this PR as it is no longer applicable.

Allow me to make one final remark:
In my test rootfs, where debugfs is not mounted, the code no longer crashes and terminates cleanly with the following statement:

2024/05/17 07:41:39 [FATAL] check failed: coverage is not supported: only 0 calls are executed out of 3

Although it is possible to have kcov missing while debugfs is mounted, accessing it from userspace requires debugfs to be mounted. Since kcov is necessary, it is the user's responsibility to provide it.
I agree that it is not strictly necessary to check whether debugfs is mounted or not.
Does it make sense to you to introduce code that tries to mount debugfs if not mounted, or we can just conclude a smart user can figure out what the problem is, and help himself?

@dvyukov
Copy link
Collaborator

dvyukov commented May 21, 2024

I think we could try to mount debugfs during setup procedure in executor in setup_features. But we need to mount it before any features are checked, not just coverage, because debugfs is used by a number of features. We shouldn't fail if the mount fails and instead rely on the check for each feature.
Additionally we could now also improve error message in coverage feature check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants