It's pseudo-code, so it doesn't spell out exactly what to search for. Pseudo-code is only an outline of the steps, not a full implementation. The function is called 'findNextMarker', and the file format defines a marker as: "the type is defined by a marker: 2 bytes, FF then a non-zero byte (*)".
So no, it's not supposed to look for just any 0xFF. It's supposed to look for 0xFF followed by any byte other than 0x00.
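As a rough sketch of that rule on its own (the helper name isMarker is made up for this example and is not part of my parser), a two-byte check could look something like this:

```php
<?php
//Hypothetical helper for illustration only: returns true when the two bytes
//form a marker, i.e. \xFF followed by any byte other than \x00.
function isMarker(string $bytes) : bool{
    return strlen($bytes) === 2
        && $bytes[0] === "\xFF"
        && $bytes[1] !== "\x00";
}

var_dump(isMarker("\xFF\xD8")); //bool(true): start of image marker
var_dump(isMarker("\xFF\x00")); //bool(false): \xFF followed by \x00 is data
var_dump(isMarker("\x12\xD8")); //bool(false): no leading \xFF
```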
It's not skipping SOI or EOI. It's just not parsing them for a data segment because they do not have one. Again, from the file format reference:
A string is effectively just an array of characters. There's no difference between the two from a security perspective. PHP code also isn't subject to something like a buffer overflow leading to arbitrary code execution (unless there's some problem in the PHP engine itself). You can also analyze a string just as easily as (or more easily than) an array of individual characters.
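To illustrate, here is a small sketch (the sample bytes are made up for the example) showing that a string can be indexed, searched, and decoded directly, without first splitting it into an array:

```php
<?php
//Sample bytes invented for this example: an APP0 marker, a length, and an id.
$data = "\xFF\xE0\x00\x10JFIF\x00";

//Index the string byte by byte, exactly like an array of characters.
$firstByte = ord($data[0]);

//Search it without writing a loop.
$idPos = strpos($data, 'JFIF');

//Decode a big-endian 16-bit field straight out of it.
$length = unpack('nlength', substr($data, 2, 2))['length'];

printf("%02X %d %d\n", $firstByte, $idPos, $length); //FF 4 16
```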
Then ignore that step, it doesn't change the others.
You haven't even seen my actual code yet, so I'm not sure how you're able to judge it. Since I said I'd share after letting you ponder the advice for a while though, here it is:
<?php
function parseJpeg(string $file) : array{
    $fp = fopen($file, 'rb');
    if (!$fp){
        throw new \RuntimeException('Unable to open file');
    }

    //First two bytes should be the start of image marker, \xFF\xD8.
    if (fread($fp, 2) !== "\xFF\xD8"){
        fclose($fp);
        throw new \RuntimeException('Invalid image file.');
    }

    $output = [];
    //Find each segment by looking for the marker values \xFF??
    while (!feof($fp) && ($marker = findNextMarker($fp))){
        $blockData = [
            'marker' => $marker
        ];

        //The end of image marker has no segment data and ends the parse.
        if ($marker === 0xD9){
            $output[] = $blockData;
            break;
        }

        //Restart markers (0xD0-0xD7) have no segment data, only more image data.
        $isRestart = $marker >= 0xD0 && $marker <= 0xD7;
        if (!$isRestart){
            //Parse the segment data for this marker.
            $blockData['segmentData'] = parseSegmentData($fp);
        }

        if ($marker === 0xDA || $isRestart){ //Start of scan, or a restart inside the scan.
            //Parse the image data that follows.
            $blockData['imageData'] = parseImageData($fp);
        } else if ($marker === 0xE0){ //If the marker is the app0 header.
            $blockData['headerData'] = parseApp0Header($blockData['segmentData']);
        }

        $output[] = $blockData;
    }

    fclose($fp);
    return $output;
}
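function parseSegmentData($fp) : string{
    //Markers indicate the start of a segment which is composed of <length><data> sections.
    //The length is two bytes (big-endian) and is the length of the entire segment,
    //including the two bytes used to define the length.
    //Extract the length from the next two bytes.
    $lengthBytes = fread($fp, 2);
    if ($lengthBytes === false || strlen($lengthBytes) !== 2){
        throw new \RuntimeException('Unexpected end of file in segment length.');
    }
    $dataLength = unpack('nlength', $lengthBytes)['length'];
    //A length of 2 means there is no data, and fread() rejects a length below 1.
    if ($dataLength <= 2){
        return '';
    }
    //Read the remaining data using that length value. Subtract 2 because
    //$dataLength includes the two bytes we just read to obtain the length.
    $data = fread($fp, $dataLength - 2);
    return $data === false ? '' : $data;
}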
function parseImageData($fp) : string{
    $imageData = '';
    //Read data until we find another marker or hit end of file.
    while (!feof($fp)){
        $c = fread($fp, 1);
        if ($c === false || $c === ''){
            break; //Nothing left to read.
        }

        //We might have found another marker.
        if ($c === "\xFF"){
            //Save our position in the file. If we found a marker,
            //we need to rewind to just before it.
            $pos = ftell($fp);
            //We only found a marker if the next byte is not \x00
            $next = fread($fp, 1);
            if ($next !== "\x00"){
                //Rewind the file to just before the marker we just found and exit the loop.
                fseek($fp, $pos - 1);
                break;
            }
            //\xFF\x00 is a stuffed data byte: keep the \xFF and drop the \x00.
        }
        $imageData .= $c;
    }
    return $imageData;
}
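function parseApp0Header(string $data) : ?array{
    //A JFIF header is 14 bytes: id (5), version (2), units (1), density (4), thumbnail size (2).
    if (strlen($data) < 14){
        return null;
    }

    //Use unsigned formats (C, n). A signed c would turn values above 127 negative.
    $unpacked = unpack('Z5id/C2version/Cunits/n2dpi/C2thumb', $data);
    if (!$unpacked){
        return null;
    }

    return [
        'id' => $unpacked['id']
        //The minor version is conventionally written with two digits, e.g. 1.02.
        , 'version' => sprintf('%d.%02d', $unpacked['version1'], $unpacked['version2'])
        , 'units' => $unpacked['units']
        , 'density' => $unpacked['dpi1'] . 'x' . $unpacked['dpi2']
        , 'thumbnail' => $unpacked['thumb1'] . 'x' . $unpacked['thumb2']
    ];
}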
function findNextMarker($fp) : int{
    //Scan the file content for the next \xFF?? marker.
    //This scans one byte at a time which is terrible for
    //performance but easy. Loading more data into memory
    //and using strpos would be better, but since you like
    //low-memory...
    do {
        $markerIndicator = fread($fp, 1);
        if ($markerIndicator === "\xFF"){
            $marker = fread($fp, 1);
            //Any number of \xFF fill bytes may precede the marker code.
            while ($marker === "\xFF"){
                $marker = fread($fp, 1);
            }
            //\xFF\x00 is data, not a marker. An empty read means end of file.
            if ($marker !== "\x00" && $marker !== '' && $marker !== false){
                return ord($marker);
            }
        }
    } while (!feof($fp));

    throw new \RuntimeException('Marker not found');
}
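
For comparison, here is a rough sketch of the strpos approach mentioned in the comment in findNextMarker. It assumes the data is already loaded into a string, and the name findNextMarkerInString and its return shape are invented for this example. It also skips runs of \xFF fill bytes before the marker code, which is my own assumption about how you'd want it to behave.

```php
<?php
//Hypothetical in-memory alternative to findNextMarker(). Returns
//[marker code, offset of the byte after the marker], or null when there is
//no marker at or after $offset.
function findNextMarkerInString(string $data, int $offset = 0) : ?array{
    $length = strlen($data);
    while ($offset < $length && ($pos = strpos($data, "\xFF", $offset)) !== false){
        //Skip any run of \xFF fill bytes to reach the marker code.
        $codePos = $pos + 1;
        while ($codePos < $length && $data[$codePos] === "\xFF"){
            $codePos++;
        }
        if ($codePos >= $length){
            return null; //Ran out of data after \xFF.
        }
        if ($data[$codePos] !== "\x00"){
            return [ord($data[$codePos]), $codePos + 1];
        }
        //\xFF\x00 is data, not a marker. Keep searching after it.
        $offset = $codePos + 1;
    }
    return null;
}

//Made-up sample data: SOI, some bytes, a stuffed \xFF\x00, more bytes, EOI.
$data = "\xFF\xD8junk\xFF\x00more\xFF\xD9";
$offset = 0;
$found = [];
while (($result = findNextMarkerInString($data, $offset)) !== null){
    [$marker, $offset] = $result;
    $found[] = sprintf('%02X', $marker);
}
echo implode(' ', $found), "\n"; //D8 D9
```

The trade-off is the one the comment hints at: strpos does the scanning in native code instead of one fread call per byte, at the cost of holding the data (or at least a sizeable chunk of it) in memory.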