The Dirty Pipe Vulnerability — The Dirty Pipe Vulnerability documentation [cm4all.com]:
Pipes and Buffers and Pages
Why pipes, anyway? In our setup, the web service which generates ZIP files communicates with the web server over pipes; it talks the Web Application Socket [github.com] protocol which we invented because we were not happy with CGI, FastCGI and AJP. Using pipes instead of multiplexing over a socket (like FastCGI and AJP do) has a major advantage: you can use splice() in both the application and the web server for maximum efficiency. This reduces the overhead for having web applications out-of-process (as opposed to running web services inside the web server process, like Apache modules do). This allows privilege separation without sacrificing (much) performance.
Short detour on Linux memory management [kernel.org]: The smallest unit of memory managed by the CPU is a page (usually 4 kB). Everything in the lowest layer of Linux’s memory management is about pages. If an application requests memory from the kernel, it will get a number of (anonymous) pages. All file I/O is also about pages: if you read data from a file, the kernel first copies a number of 4 kB chunks from the hard disk into kernel memory, managed by a subsystem called the page cache. From there, the data will be copied to userspace. The copy in the page cache remains for some time, where it can be used again, avoiding unnecessary hard disk I/O, until the kernel decides it has a better use for that memory (“reclaim”). Instead of copying file data to userspace memory, pages managed by the page cache can be mapped directly into userspace using the mmap() system call (a trade-off for reduced memory bandwidth at the cost of increased page faults and TLB flushes). The Linux kernel has more tricks: the sendfile() system call allows an application to send file contents into a socket without a roundtrip to userspace (an optimization popular in web servers serving static files over HTTP). The splice() system call is kind of a generalization of sendfile(): It allows the same optimization if either side of the transfer is a pipe; the other side can be almost anything (another pipe, a file, a socket, a block device, a character device). The kernel implements this by passing page references around, not actually copying anything (zero-copy).
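As a rough illustration of the two ways of getting at file data described above (copying out of the page cache with read() versus mapping page-cache pages with mmap()), here is a minimal Python sketch. The helper names `pages_spanned`, `read_by_copy`, `read_by_mmap` and `demo` are made up for this example, and the 4 kB page size is only the usual default, not a guarantee.

```python
import mmap
import os
import tempfile


def pages_spanned(offset, length, page_size=4096):
    """Number of pages touched by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


def read_by_copy(path):
    """read(): the kernel copies bytes from the page cache into our buffer."""
    with open(path, "rb") as f:
        return f.read()


def read_by_mmap(path):
    """mmap(): page-cache pages are mapped into our address space."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # The slice copies the bytes out so they outlive the mapping.
            return m[:]


def demo():
    """Write a small temp file and read it back both ways."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 10000)
        os.close(fd)
        return read_by_copy(path), read_by_mmap(path)
    finally:
        os.remove(path)
```

Both functions return the same bytes; what differs is how the data reaches the process. A 10000-byte file spans three 4 kB pages, so `pages_spanned(0, 10000)` gives 3.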
A pipe is a tool for unidirectional inter-process communication. One end is for pushing data into it, the other end can pull that data. The Linux kernel implements this by a ring [github.com] of struct pipe_buffer [github.com], each referring to a page. The first write to a pipe allocates a page (space for 4 kB worth of data). If the most recent write does not fill the page completely, a following write may append to that existing page instead of allocating a new one. This is how “anonymous” pipe buffers work (anon_pipe_buf_ops [github.com]).
If you, however, splice() data from a file into the pipe, the kernel will first load the data into the page cache. Then it will create a struct pipe_buffer pointing inside the page cache (zero-copy), but unlike anonymous pipe buffers, additional data written to the pipe must not be appended to such a page because the page is owned by the page cache, not by the pipe.
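The append rule can be sketched with a toy model in Python. This is not kernel code: `ToyPipe`, `Page`, `PipeBuffer` and `splice_from` are hypothetical names, the ring is a plain list, and a write is merged only when it fits entirely into the last page, which simplifies what the kernel does.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this model


@dataclass
class Page:
    owner: str  # "pipe" for anonymous pages, "page_cache" for file pages
    data: bytearray = field(default_factory=lambda: bytearray(PAGE_SIZE))


@dataclass
class PipeBuffer:
    page: Page
    offset: int
    length: int
    can_merge: bool  # may a later write append to this buffer's page?


class ToyPipe:
    def __init__(self):
        self.bufs = []  # stands in for the ring of struct pipe_buffer

    def write(self, data):
        # Try to append to the most recent buffer if that is allowed.
        if self.bufs:
            last = self.bufs[-1]
            end = last.offset + last.length
            if last.can_merge and end + len(data) <= PAGE_SIZE:
                last.page.data[end:end + len(data)] = data
                last.length += len(data)
                return
        # Otherwise allocate fresh anonymous page(s) owned by the pipe.
        for i in range(0, len(data), PAGE_SIZE):
            chunk = data[i:i + PAGE_SIZE]
            page = Page(owner="pipe")
            page.data[:len(chunk)] = chunk
            self.bufs.append(PipeBuffer(page, 0, len(chunk), can_merge=True))

    def splice_from(self, cached_page, offset, length):
        # Zero-copy: reference the page-cache page, never mark it mergeable.
        self.bufs.append(PipeBuffer(cached_page, offset, length, can_merge=False))
```

In this model two small writes share one page, while a write that follows a spliced buffer gets a new anonymous page, so the page owned by the page cache is left untouched.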
History of the check for whether new data can be appended to an existing pipe buffer:
Long ago, struct pipe_buf_operations had a flag called can_merge.
Commit 5274f052e7b3 “Introduce sys_splice() system call” (Linux 2.6.16, 2006) [github.com] featured the splice() system call, introducing page_cache_pipe_buf_ops, a struct pipe_buf_operations implementation for pipe buffers pointing into the page cache, the first one with can_merge=0 (not mergeable).
Commit 01e7187b4119 “pipe: stop using ->can_merge” (Linux 5.0, 2019) [github.com] converted the can_merge flag into a struct pipe_buf_operations pointer comparison because only anon_pipe_buf_ops has this flag set.
Commit f6dd975583bd “pipe: merge anon_pipe_buf*_ops” (Linux 5.8, 2020) [github.com] converted this pointer comparison to per-buffer flag PIPE_BUF_FLAG_CAN_MERGE.
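The three forms of the check listed above can be put side by side in a small Python sketch. The class layout, the function names and the numeric value of `PIPE_BUF_FLAG_CAN_MERGE` are assumptions made for illustration. Only the identifiers anon_pipe_buf_ops, page_cache_pipe_buf_ops, can_merge and PIPE_BUF_FLAG_CAN_MERGE come from the history itself.

```python
from dataclasses import dataclass

PIPE_BUF_FLAG_CAN_MERGE = 0x10  # illustrative bit value for this sketch


@dataclass(frozen=True)
class BufOps:
    name: str
    can_merge: bool = False  # only consulted by the oldest form of the check


anon_pipe_buf_ops = BufOps("anon_pipe_buf_ops", can_merge=True)
page_cache_pipe_buf_ops = BufOps("page_cache_pipe_buf_ops", can_merge=False)


@dataclass
class PipeBuffer:
    ops: BufOps
    flags: int = 0


def mergeable_by_ops_flag(buf):
    """Original form: a can_merge flag stored in the operations table."""
    return buf.ops.can_merge


def mergeable_by_ops_identity(buf):
    """Linux 5.0 form: compare the ops pointer against anon_pipe_buf_ops."""
    return buf.ops is anon_pipe_buf_ops


def mergeable_by_buffer_flag(buf):
    """Linux 5.8 form: a per-buffer flag bit."""
    return bool(buf.flags & PIPE_BUF_FLAG_CAN_MERGE)


CHECKS = (mergeable_by_ops_flag, mergeable_by_ops_identity, mergeable_by_buffer_flag)
```

For an anonymous buffer whose flags include the merge bit, and for a page-cache buffer whose flags are zero, all three checks agree. The first two derive the answer from the ops table, while the third reads it from each buffer's own flags field.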
Over the years, this check was refactored back and forth, which was okay. Or was it?
patch-your-kernels-and-reboot-soon dept.