fix(userspace/libscap): check bound before reading past socket buffer #2271

Open
shane-lawrence wants to merge 2 commits into master

@shane-lawrence shane-lawrence commented Feb 5, 2025

Signed-off-by: Shane Lawrence [email protected]

What type of PR is this?
/kind bug

Any specific area of the project related to this PR?
/area libscap

Does this PR require a change in the driver versions?
I don't think so.

What this PR does / why we need it:
This PR corrects a bug in libscap where the next character in a buffer is read before checking if it's out of bounds. This can cause a segfault when the 1 MB buffer ends with a TIME_WAIT socket.
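The ordering problem described above can be illustrated with a minimal sketch. `skip_spaces` is a hypothetical helper, not the actual libscap code; it only shows the bound being tested before the byte is read, so a run of spaces that ends exactly at the end of the buffer never causes a read past it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the libscap function): skip a run of spaces
 * starting at index `pos` in a buffer holding `len` valid bytes.
 * Returns the index of the first non-space byte, or `len` if the
 * buffer ends before one is found.
 */
size_t skip_spaces(const char *buf, size_t len, size_t pos) {
	assert(buf != NULL);
	/* The bound is tested first. `&&` short-circuits, so buf[pos] is
	 * only read while pos < len. With the operands swapped, the loop
	 * would read buf[len] when the buffer ends in spaces. */
	while(pos < len && buf[pos] == ' ') {
		pos++;
	}
	return pos;
}
```

Called on a 4-byte buffer whose last three bytes are spaces, starting at index 1, the helper returns 4 (the buffer length) without touching the byte after the buffer.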

Which issue(s) this PR fixes:
Fixes #2272 and #2276.

Special notes for your reviewer:
I had trouble getting the C++ test suite to work with the older C code in scap_fds.c, so I put them in separate files. Please let me know if there's a better way to handle it.

Does this PR introduce a user-facing change?:
no

NONE

@poiana
Contributor

poiana commented Feb 5, 2025

Welcome @shane-lawrence! It looks like this is your first PR to falcosecurity/libs 🎉

@poiana poiana added the size/XS label Feb 5, 2025
@FedeDP
Contributor

FedeDP commented Feb 5, 2025

Thanks for this contribution; it makes sense to me.
/milestone 0.21.0

@poiana poiana added this to the 0.21.0 milestone Feb 5, 2025

github-actions bot commented Feb 5, 2025

Perf diff from master - unit tests

     3.83%     +2.02%  [.] next_event_from_file
     7.27%     +1.25%  [.] sinsp::next
    18.25%     -0.78%  [.] sinsp_threadinfo::get_main_thread
     7.00%     -0.63%  [.] sinsp_parser::reset
     1.40%     +0.39%  [.] sinsp_parser::process_event
     5.02%     +0.35%  [.] sinsp_evt::get_type
     8.22%     +0.35%  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
     2.11%     -0.34%  [.] sinsp_evt::load_params
     0.58%     -0.33%  [.] scap_next
     9.59%     -0.29%  [.] sinsp_thread_manager::create_thread_dependencies

Heap diff from master - unit tests

peak heap memory consumption: -14.45K
peak RSS (including heaptrack overhead): 0B
total memory leaked: -14.45K

Heap diff from master - scap file

peak heap memory consumption: -14.45K
peak RSS (including heaptrack overhead): 0B
total memory leaked: -14.45K

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            -0.0544         -0.0544           150           142           150           142
BM_sinsp_split_median                                          -0.0490         -0.0490           151           143           151           143
BM_sinsp_split_stddev                                          +0.7722         +0.7721             1             2             1             2
BM_sinsp_split_cv                                              +0.8742         +0.8740             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  -0.0212         -0.0212            63            61            63            61
BM_sinsp_concatenate_paths_relative_path_median                -0.0287         -0.0287            63            61            63            61
BM_sinsp_concatenate_paths_relative_path_stddev                -0.1798         -0.1803             1             1             1             1
BM_sinsp_concatenate_paths_relative_path_cv                    -0.1621         -0.1625             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     +0.0321         +0.0321            24            25            24            25
BM_sinsp_concatenate_paths_empty_path_median                   +0.0273         +0.0273            24            24            24            24
BM_sinsp_concatenate_paths_empty_path_stddev                   +4.6197         +4.6495             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_cv                       +4.4450         +4.4738             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  +0.0137         +0.0137            62            63            62            63
BM_sinsp_concatenate_paths_absolute_path_median                +0.0140         +0.0140            62            63            62            63
BM_sinsp_concatenate_paths_absolute_path_stddev                -0.4247         -0.4247             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_cv                    -0.4325         -0.4325             0             0             0             0
BM_sinsp_split_container_image_mean                            -0.0150         -0.0150           395           389           395           389
BM_sinsp_split_container_image_median                          -0.0187         -0.0187           396           388           396           388
BM_sinsp_split_container_image_stddev                          -0.4687         -0.4686             4             2             4             2
BM_sinsp_split_container_image_cv                              -0.4607         -0.4605             0             0             0             0


codecov bot commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.32%. Comparing base (0d94d2b) to head (d4391a8).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2271   +/-   ##
=======================================
  Coverage   75.32%   75.32%           
=======================================
  Files         280      280           
  Lines       34556    34556           
  Branches     5902     5902           
=======================================
  Hits        26031    26031           
  Misses       8525     8525           
Flag Coverage Δ
libsinsp 75.32% <ø> (ø)


@shane-lawrence
Author

I added tests and confirmed that they trigger a segfault without the fix and pass with it.

@shane-lawrence shane-lawrence marked this pull request as ready for review February 6, 2025 03:30
@poiana poiana requested a review from leogr February 6, 2025 03:30
@shane-lawrence
Author

Just rebased on master to pick up the API changes @ekoops made recently.

@leogr
Member

leogr commented Feb 6, 2025

Hey @shane-lawrence

Thank you for this PR! Just noticed 👇
See https://github.com/falcosecurity/libs/actions/runs/13171693500/job/36769287438?pr=2271

Could you fix the code formatting, please?

LucaGuerra
LucaGuerra previously approved these changes Feb 6, 2025
Contributor

@LucaGuerra LucaGuerra left a comment

Great Catch! Thank you!

@poiana
Contributor

poiana commented Feb 6, 2025

LGTM label has been added.

Git tree hash: 431f76b8d8276d05cd2821e4cd7cc62d0faf0a32

@LucaGuerra
Contributor

I'm restarting the CI; if it passes, it's good for me.

@shane-lawrence
Author

> Please note that since drivers CI is pretty heavy, it only triggers when either userspace/libscap, userspace/libpman or driver folders are touched.

Oh that's it! When I ran the new test without the changes to libscap, CI skipped libscap_test. I understand now that's because it didn't see any changes to userspace/libscap.

> I agree with you that we should definitely run the libscap_test with asan enabled; feel free to just add -DUSE_ASAN=On -DUSE_UBSAN=On to the cmake configure options of the test-scap CI job :)

Great! I added this in 824363c.

@FedeDP
Contributor

FedeDP commented Feb 14, 2025

Oh nice, enabling asan found another issue in the test-scap 😆

[ RUN      ] scap_event.empty_clone
/home/runner/work/libs/libs/test/libscap/test_suites/userspace/scap_event.cpp:107:2: runtime error: reference binding to misaligned address 0x6030000056c6 for type 'const unsigned int', which requires 4 byte alignment
0x6030000056c6: note: pointer points here
 00 00 de 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00

EDIT: of course, evt->nparams is not 4-byte aligned (it sits at byte offset 22 without the sentinel) because we pack the struct:

#pragma pack(push, 1)
#endif
struct ppm_evt_hdr {
#ifdef PPM_ENABLE_SENTINEL
	uint32_t sentinel_begin;
#endif
	uint64_t ts;      /* timestamp, in nanoseconds from epoch */
	uint64_t tid;     /* the tid of the thread that generated this event */
	uint32_t len;     /* the event len, including the header */
	uint16_t type;    /* the event type */
	uint32_t nparams; /* the number of parameters of the event */
};
#pragma pack(pop)

This thread is interesting: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51628
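To make the alignment arithmetic concrete, here is a small sketch. `demo_evt_hdr` is an illustrative copy of the header quoted above with the optional sentinel left out; it is an assumption for demonstration, not the real libscap type. The fields before `nparams` add up to 8 + 8 + 4 + 2 = 22 bytes, and 22 is not a multiple of 4, which matches the reported address ending in `...c6`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the packed header, sentinel omitted. */
#pragma pack(push, 1)
struct demo_evt_hdr {
	uint64_t ts;
	uint64_t tid;
	uint32_t len;
	uint16_t type;
	uint32_t nparams;
};
#pragma pack(pop)

/* With 1-byte packing there is no padding: 8 + 8 + 4 + 2 + 4 = 26. */
static_assert(sizeof(struct demo_evt_hdr) == 26, "packed header has no padding");

/* Byte offset of nparams inside the packed header. */
size_t demo_nparams_offset(void) {
	return offsetof(struct demo_evt_hdr, nparams);
}

/* Returns 1 if nparams lands on a 4-byte boundary when the header
 * itself starts on one, 0 otherwise. */
int demo_nparams_is_4byte_aligned(void) {
	return demo_nparams_offset() % 4 == 0;
}
```

Without the packing pragma, a typical compiler would insert 2 bytes of padding after `type` so that `nparams` starts at offset 24.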

Contributor

Was this committed by error?

Author

@shane-lawrence shane-lawrence Feb 17, 2025

No, it's mock data to allow testing. I couldn't use real data from a production host but we need input that meets the specific conditions for triggering the bug, so I generated this. It should mimic a valid sockets table and must have a line with consecutive spaces at the 1MB buffer boundary and the inode number just beyond that.

The mock data is read for the test in scap_fds_impl.c:31-43.

@FedeDP
Copy link
Contributor

FedeDP commented Feb 17, 2025

scap_event.empty_clone

About the test-scap tests failing, cc @LucaGuerra any idea?

@shane-lawrence
Author

shane-lawrence commented Feb 17, 2025

> About the test-scap tests failing, cc @LucaGuerra any idea?

Most of the tests that trigger this can avoid it by copying the field into an aligned local before evaluating it, like this:

-       EXPECT_EQ(evt->nparams, 0);
+       uint32_t nparams_copy;
+       memcpy(&nparams_copy, &evt->nparams, sizeof(uint32_t));
+       EXPECT_EQ(nparams_copy, 0);

That won't work for test_scap_create_event though, since it's asserting that each field is in the correct location in the packed struct. I'm a bit out of my depth here, but it looks like allowing 2 bytes of padding would satisfy the alignment check on 4-byte boundaries, but it would also be a sweeping change and even a couple bytes can be significant when it's on every event.

Would it be okay if I just suppress these for now and open a separate issue?

@FedeDP
Contributor

FedeDP commented Feb 18, 2025

> I'm a bit out of my depth here, but it looks like allowing 2 bytes of padding would satisfy the alignment check on 4-byte boundaries, but it would also be a sweeping change and even a couple bytes can be significant when it's on every event.

Yep we cannot change the struct padding :/

I am ok with your solution, but at that point it would be better to expose helper functions in scap_event.h like get_ts(scap_event *evt, uint64_t *ts), since that is the correct way to access these fields (perhaps with a big comment :D )

> Would it be okay if I just suppress these for now and open a separate issue?

If you agree, I'd just suppress test_scap_create_event for now and fix the others.

Or, you can revert the enablement of ASAN in the libscap test CI and we will re-enable it in a follow up PR trying also to fix these failures :) That works too!
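As a rough sketch of the helper-function idea mentioned in this comment: the getters below copy a field out of the packed header with `memcpy`, so no pointer or reference to the misaligned field is ever formed. The names `demo_evt_hdr`, `demo_get_ts` and `demo_get_nparams` are hypothetical and do not claim to match what scap_event.h exposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for the packed header, sentinel omitted. */
#pragma pack(push, 1)
struct demo_evt_hdr {
	uint64_t ts;
	uint64_t tid;
	uint32_t len;
	uint16_t type;
	uint32_t nparams;
};
#pragma pack(pop)

/* Copy the timestamp into an aligned variable owned by the caller. */
void demo_get_ts(const struct demo_evt_hdr *evt, uint64_t *ts) {
	assert(evt != NULL && ts != NULL);
	memcpy(ts, (const uint8_t *)evt + offsetof(struct demo_evt_hdr, ts), sizeof(*ts));
}

/* Return nparams by value; the byte-wise copy has no alignment
 * requirement on the source address. */
uint32_t demo_get_nparams(const struct demo_evt_hdr *evt) {
	uint32_t value;
	assert(evt != NULL);
	memcpy(&value, (const uint8_t *)evt + offsetof(struct demo_evt_hdr, nparams), sizeof(value));
	return value;
}
```

A test can then compare the returned value, for example checking `demo_get_nparams(evt)` against the expected count, instead of passing the packed field itself to an assertion macro that may bind a reference to it.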

@shane-lawrence
Author

I added a getter to safely check nparams, and changed the test to use memcmp with a uint8_t* so we can safely compare bytes in the event without memory alignment issues.
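The byte-wise comparison described here could look roughly like the sketch below. `demo_bytes_equal_at` is a hypothetical helper written for illustration, not the code in the PR: it checks that a field at a given offset of a raw event buffer holds the expected bytes, using `memcmp` on a `uint8_t` pointer so the field's alignment does not matter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the `n` bytes at `offset` inside
 * the raw event buffer equal the `n` bytes at `expected`, else 0.
 * A range that would run past `event_len` is reported as a mismatch.
 */
int demo_bytes_equal_at(const uint8_t *event,
                        size_t event_len,
                        size_t offset,
                        const void *expected,
                        size_t n) {
	assert(event != NULL && expected != NULL);
	if(offset > event_len || n > event_len - offset) {
		return 0; /* requested range is out of bounds */
	}
	return memcmp(event + offset, expected, n) == 0;
}
```

This keeps the property that test_scap_create_event cares about, namely that each field sits at the expected position in the packed layout, while only ever reading single bytes.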

@shane-lawrence
Author

shane-lawrence commented Feb 19, 2025

Next I ran into this (unsigned int -1) value triggering an error for undefined behaviour:

[ RUN      ] scap_ppm_sc.scap_ppm_sc_from_name
/usr/src/falcosecurity/libs/build/googletest-src/googletest/include/gtest/gtest.h:1358:11: runtime error: load of value 4294967295, which is not a valid value for type 'ppm_sc_code'

I started to fix this in 6e103ba, but the test fails because it expects -1 for each error condition. We can revert that if UNKNOWN isn't a suitable reply for an error condition. One possible solution would be to add an error value to the enum and return something like PPM_SC_ERROR. What do you think?

@FedeDP
Contributor

FedeDP commented Feb 19, 2025

Since the test below (scap_native_id_to_ppm_sc) checks that:

ASSERT_EQ(scap_native_id_to_ppm_sc(80000000), PPM_SC_UNKNOWN);
ASSERT_EQ(scap_native_id_to_ppm_sc(-12), PPM_SC_UNKNOWN);

i.e., a wrong syscall ID leads to PPM_SC_UNKNOWN, I think it's OK for a wrong syscall name to return PPM_SC_UNKNOWN (and not -1).
I'd fix the scap_ppm_sc_from_name test to check for PPM_SC_UNKNOWN instead of -1.

Again, thanks; you are doing a terrific job in this PR (and sorry if it is growing a little bit beyond your expectations!)

EDIT: oh, but looking at some usages of those APIs, we explicitly check for != -1, e.g. in libsinsp::events::sc_names_to_sc_set. You'll need to update all of them to use PPM_SC_UNKNOWN ;) (and scap_ppm_sc_to_native_id is used multiple times!)
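The approach discussed in this comment, returning an in-range "unknown" enumerator instead of -1 cast to the enum type, can be sketched as follows. `demo_sc_code`, its values, and `demo_sc_from_name` are all hypothetical stand-ins chosen for illustration; they are not the real `ppm_sc_code` definitions or the libscap lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the syscall-code enum; values are made up. */
typedef enum demo_sc_code {
	DEMO_SC_UNKNOWN = 0,
	DEMO_SC_OPEN = 1,
	DEMO_SC_CLOSE = 2,
	DEMO_SC_MAX
} demo_sc_code;

static_assert(DEMO_SC_UNKNOWN < DEMO_SC_MAX, "error value must be a valid enumerator");

/* Look up a code by name. Every failure path returns DEMO_SC_UNKNOWN,
 * which is a declared enumerator, rather than (demo_sc_code)-1, which
 * is not. */
demo_sc_code demo_sc_from_name(const char *name) {
	static const struct {
		const char *name;
		demo_sc_code code;
	} table[] = {
	        {"open", DEMO_SC_OPEN},
	        {"close", DEMO_SC_CLOSE},
	};
	if(name == NULL) {
		return DEMO_SC_UNKNOWN;
	}
	for(size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if(strcmp(table[i].name, name) == 0) {
			return table[i].code;
		}
	}
	return DEMO_SC_UNKNOWN;
}
```

Callers written this way compare against the unknown enumerator rather than against -1, which is the kind of call-site update mentioned above.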

@shane-lawrence
Author

shane-lawrence commented Feb 19, 2025

Thanks for all of your help so far! I would like to get the undefined behaviours fixed but they're out of scope for this PR. All of the address sanitizer issues detected by the tests have been resolved and it will help to ensure that future changes to libscap don't reintroduce a similar tricky segfault. I'm leaving asan enabled but disabling ubsan and moving the alignment and enum fixes to a separate branch for a future PR.

While we can see that some potential bugs are ignored here, I think this leaves us with a resolution to my issue and detection in place for a whole class of bugs, while keeping the PR to a reasonable scope. Let me know if you want me to rebase to clean up the commit history a bit, but hopefully the overall changes are now ready for a final review :).

@FedeDP
Contributor

FedeDP commented Feb 20, 2025

I fully agree, and am really sorry that what seemed to be a simple patch became much larger :/
Thanks for your hard work!

Contributor

@FedeDP FedeDP left a comment

LGTM, left a small comment.
Also, do you mind squashing the changes into 2 commits (the fix + the new test)?
Thanks, then we are good to go for real ahah

@FedeDP FedeDP changed the title Check bound before reading past socket buffer. fix(userspace/libscap): check bound before reading past socket buffer Feb 20, 2025
@shane-lawrence
Author

I rebased on master with two clean commits and CI is running now. Let me know if you have any other suggestions.

@FedeDP
Contributor

FedeDP commented Feb 20, 2025

Everything 🟢 on my side. Thank you very much once again!

Contributor

@FedeDP FedeDP left a comment

/approve

@poiana
Contributor

poiana commented Feb 20, 2025

LGTM label has been added.

Git tree hash: 5f821cb6085f2838efb01fc7576c398fa9df2fb7

@poiana
Contributor

poiana commented Feb 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FedeDP, shane-lawrence

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

Segfault in libscap reading IPv4 sockets from /proc.
5 participants