
seeking tips with reading files




1 hour ago, jodunno said:

I'm sorry but i do not see the difference between 0xFFx?? and checking it in the array.

The list of marker codes you are checking for is only a limited subset of possible marker codes.  That means you'll be missing a bunch of potential markers. For example,

  • 0xFFFE - Comment
  • 0xFFDD - Restart interval
  • 0xFFE2 - 0xFFEF - App specific markers.
  • and more

You may not care what those markers do, but you need to correctly identify them to ensure the file as a whole is parsed correctly.
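As an illustration of identifying every marker rather than a fixed list, here is a minimal PHP sketch that classifies the byte following 0xFF by range. The function name `classifyMarkerByte` and its return labels are made up for this example, and treating every remaining code (including reserved ones) as a length-carrying segment is an assumption.

```php
<?php
// Hypothetical helper: classify the byte that follows 0xFF.
function classifyMarkerByte(int $code): string
{
    if ($code === 0x00) {
        return 'stuffed';     // FF 00 inside scan data is a literal 0xFF byte, not a marker
    }
    if ($code === 0xFF) {
        return 'fill';        // FF FF is padding that may precede a marker
    }
    if ($code === 0x01 || ($code >= 0xD0 && $code <= 0xD9)) {
        return 'standalone';  // TEM, RST0-RST7, SOI and EOI carry no length field
    }
    return 'segment';         // assumption: everything else is followed by a 2-byte length
}
```

With this, 0xFE (comment), 0xDD (restart interval) and 0xE2 - 0xEF all come back as `'segment'`, so they can be stepped over even if their contents are never examined.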

1 hour ago, jodunno said:

Is the marker data different using my code than the output using your code?

By marker data I am referring to the data attached to the marker, i.e. the hexdump of stuff shown after a marker in my script's output.  The next two bytes after a marker define a length value, then after that is arbitrary data.  You need to skip over all that as you're searching for markers or you might catch something that looks like a marker but actually is not.  Skipping over that would save a bunch of time reading bytes as well.

Your idea of just finding the markers then using ftell to record their offsets is fine.  As you noted you can then go back and fseek to that position and read the marker data.  You just want to ensure you're finding the markers correctly by skipping past that data and the image data during your initial scan.
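A small sketch of why skipping matters, using made-up bytes rather than a real image: the APP1 payload below happens to contain FF D9. The function names `naiveScan` and `lengthAwareScan` are hypothetical, and the length-aware version only handles the few marker types present in this sample.

```php
<?php
// Synthetic data (not a decodable JPEG): SOI, one APP1 segment whose payload
// contains the bytes FF D9, then the real EOI.
$payload = "AB\xFF\xD9CD";                      // 6 bytes of arbitrary data
$length  = pack('n', strlen($payload) + 2);     // the length field counts its own 2 bytes
$bytes   = "\xFF\xD8" . "\xFF\xE1" . $length . $payload . "\xFF\xD9";

// Naive scan: report every FF xx pair where xx is not 00 or FF.
function naiveScan(string $bytes): array
{
    $found = [];
    for ($i = 0; $i < strlen($bytes) - 1; $i++) {
        $next = ord($bytes[$i + 1]);
        if ($bytes[$i] === "\xFF" && $next !== 0x00 && $next !== 0xFF) {
            $found[] = sprintf('%02x@%d', $next, $i);
        }
    }
    return $found;
}

// Length-aware scan: after each segment marker, jump over the declared length.
function lengthAwareScan(string $bytes): array
{
    $found = [];
    $i = 0;
    while ($i + 1 < strlen($bytes) && $bytes[$i] === "\xFF") {
        $code = ord($bytes[$i + 1]);
        $found[] = sprintf('%02x@%d', $code, $i);
        $i += 2;
        if ($code === 0xD8) { continue; }           // SOI has no length field
        if ($code === 0xD9) { break; }              // EOI ends the stream
        if ($i + 2 > strlen($bytes)) { break; }     // truncated length field
        $i += unpack('n', substr($bytes, $i, 2))[1];
    }
    return $found;
}

print_r(naiveScan($bytes));        // includes a false d9 at offset 8
print_r(lengthAwareScan($bytes));  // only d8, e1 and the real d9 at offset 12
```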

1 hour ago, jodunno said:

C0 is supposed to hold 17bytes of data.

Just make sure you are using the encoded length value and not trying to hard-code data lengths.  The length of the data attached to a marker isn't necessarily fixed.
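For reference, a minimal sketch of reading the encoded length in PHP. The helper name `readPayloadLength` is hypothetical. The length field is a 16-bit big-endian value that counts its own two bytes, so the payload that follows is two bytes shorter than the declared value.

```php
<?php
// Hypothetical helper: read the 2-byte big-endian length that follows a marker.
// Returns the payload size (declared length minus the 2 length bytes),
// or null if the field is truncated or impossible.
function readPayloadLength($fp): ?int
{
    $raw = fread($fp, 2);
    if ($raw === false || strlen($raw) !== 2) {
        return null;                        // file ended inside the length field
    }
    $declared = unpack('n', $raw)[1];       // 'n' = unsigned 16-bit, big-endian
    return $declared >= 2 ? $declared - 2 : null;   // a value below 2 cannot be valid
}
```

A declared length of 0x0011 (17) therefore means 15 payload bytes remain after the length field.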

Edited by kicken
33 minutes ago, kicken said:

The list of marker codes you are checking for is only a limited subset of possible marker codes

By design i am skipping those markers. I can easily add a function to check each character for code and base64 encoded code, which will include those markers. I can also add them if it is perceived as being a complete scan but correct is dependent upon what one is seeking and for what purpose. Comments are not defined in the specification as being a valid jpeg, they are optional markers.

34 minutes ago, kicken said:

Just make sure you are using the encoded length value

I know that the data lengths exist but if you trust the data in the file, then you are essentially trusting user input. The app0/signature is 16 bytes according to every document that i have read: FF E0 0 10, followed by the 10 byte signature. It is good to check the data in the image but the overall acceptance should be ruled by the code and not the image, since it should not be more than 16 bytes total as a valid app0 signature. The same can be said of c0 only it is mentioned in the specification as being 17bytes. DB and C4 are variable size. I am aware of it but i am not willing to allow the image to control the size or i will blindly accept an invalid header.

34 minutes ago, kicken said:

The next two byte after a marker define a length value, then after that is arbitrary data.  You need to skip over all that as you're searching for markers or you might catch something that looks like a marker but actually is not

I have yet to see any arbitrary data scanning for markers even with a simple script like the one that i posted to begin this thread. In fact, when i use a function with if 0xFF return next marker, there is 16 zeroes after the app0 marker and 17 zeroes after the C0 marker etc. I have only seen arbitrary data when i overused functions and if blocks to force discovery of markers. I will check all of my photos using both of our scripts to compare the output this weekend. I will see if there is something missing in my markers. I'll tweak the code to look for other markers but my final website code will exclude comments.

I have a lot of work to do on my code as it is only in early creation stages. I have a lot of testing, tweaking and hacking to do with this code. I want to make it better, faster, smarter and stronger. I have a long way to go. I will offer two modes and a flag now: search for all markers or search for select markers. That should satisfy complete parsing.

Also, i have added the mime types and extensions for the missing jpeg possibilities in my file upload script. I read over the rfc for JPEG 2000 today. Good lord, i am missing 6 extensions and three mime types. I have a ways to go to get this script polished and ready for usage.

Thank you for all of the help and tips, Kicken. You're a good person to take time to help an amateur. I truly appreciate you and this forum. I cannot say Thank You enough times.

Best wishes,

John

31 minutes ago, jodunno said:

I know that the data lengths exist but if you trust the data in the file, then you are essentially trusting user input.

It's not trusting user input, it's the format.  Unless you find some specification that explicitly says "Marker C0 has 17 bytes of data" then you shouldn't hard-code that condition.  The validation would go a little something like this:

  • Read 0xFF?? marker
  • Read length value (ie, $length=0x0011 / 17 bytes)
  • Read $length bytes
  • loop

If the file is valid, each iteration of the loop would read a marker, read its length, then read its data.  If someone tried to sneak in a file with an invalid length, the first read of the loop would not be a valid marker and you can abort.
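The loop described above could be sketched roughly as follows. This is not the script from this thread: the function name `listSegments` is hypothetical, it stops at the SOS marker (where the image data begins), and for brevity it rejects fill bytes (FF FF) even though they are legal before a marker. Because the length value counts its own two bytes, the skip is `$length - 2`.

```php
<?php
// Hypothetical function: walk the segment structure from SOI up to SOS.
// Returns a list of [marker code, offset of the FF byte, payload length],
// or null as soon as the structure does not match.
function listSegments($fp): ?array
{
    if (fread($fp, 2) !== "\xFF\xD8") {
        return null;                                // no SOI at the start
    }
    $segments = [];
    while (true) {
        $offset = ftell($fp);
        $marker = fread($fp, 2);
        if ($marker === false || strlen($marker) !== 2 || $marker[0] !== "\xFF") {
            return null;                            // expected a marker here
        }
        $code = ord($marker[1]);
        if ($code < 0xC0 || $code === 0xFF || ($code >= 0xD0 && $code <= 0xD9)) {
            return null;                            // not a length-carrying segment marker
        }
        $raw = fread($fp, 2);
        if ($raw === false || strlen($raw) !== 2) {
            return null;                            // truncated length field
        }
        $length = unpack('n', $raw)[1];
        if ($length < 2) {
            return null;                            // the length counts itself
        }
        $segments[] = [$code, $offset, $length - 2];
        if ($code === 0xDA) {
            return $segments;                       // SOS: image data follows
        }
        if (fseek($fp, $length - 2, SEEK_CUR) !== 0) {
            return null;                            // could not skip the payload
        }
    }
}
```

A wrong length makes the next two bytes land in the middle of some payload, where they are very unlikely to start with 0xFF, so the function returns null instead of reporting markers that are not there.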

Once you have the marker list, then for each marker you're interested in you'd have to parse the associated data.  If that data is invalid, then you can fail during that parsing.  Parsing data for validity can get complex and time consuming (which is why I'd just use a trusted library if possible).
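As one example of parsing a marker's data for validity, here is a hedged sketch for the baseline frame header (FF C0). The function name `parseSof0` is hypothetical. The payload holds a 1-byte precision, 2-byte height, 2-byte width and a 1-byte component count, followed by 3 bytes per component, so the declared length is 8 + 3 × components. That gives 17 for a three-component image and 11 for grayscale, which is why 17 cannot simply be hard-coded.

```php
<?php
// Hypothetical helper: validate and decode a baseline SOF0 (FF C0) payload.
// $payload is the segment data that follows the 2-byte length field.
function parseSof0(string $payload): ?array
{
    if (strlen($payload) < 6) {
        return null;                                // fixed part is missing
    }
    $h = unpack('Cprecision/nheight/nwidth/Ccomponents', $payload);
    // Each component adds 3 bytes, so the payload must be exactly 6 + 3 * components.
    if ($h['components'] < 1 || strlen($payload) !== 6 + 3 * $h['components']) {
        return null;
    }
    return $h;
}

// Three components: 15 payload bytes, declared length 17.
$rgb = "\x08" . pack('n', 600) . pack('n', 800) . "\x03" . str_repeat("\x01\x11\x00", 3);
print_r(parseSof0($rgb));
```

The check compares the actual payload size against what the component count implies, so the rule lives in the code while the numbers still come from the file.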

1 hour ago, kicken said:

You need to skip over all that as you're searching for markers or you might catch something that looks like a marker but actually is not.

As an example of this, try an image that contains an embedded thumbnail with your code, such as this one.

50 minutes ago, kicken said:

As an example of this, try an image that contains an embedded thumbnail with your code, such as this one.

if you tweak my code it reads the thumbnails. hello? i am not interested in these markers. My algorithm for enforcing maximum 96% on the jpeg quality will block images with overloaded data (thumbnails, exif and comments). I am not interested in that but i tweaked my code and i read them and much faster than your code. Here is a comparison of output:

your script (copy and paste)

blacksmith
Found marker D3
Found marker D4
Found marker D7
Found marker D6
Found marker D6
Found marker D7
Found marker D7
Found marker D3
Found marker D6
Found marker D1
Found marker D4
Found marker D0
Found marker DA
Found marker C4
Found marker C0
Found marker EE
Found marker DD
Found marker DB
Found marker E1
Found marker E2
Found marker ED
Found marker E1

362888 bytes of memory usage

my script:

ffe1 4
ffd8 318
ffdb 320
ffdd 454
ffee 460
ffc0 476
ffc4 495
ffda 659
ffd0 1103
ffd1 1502
ffd2 1932
ffd3 2336
ffd4 2746
ffd5 3091
ffd6 3520
ffd7 4020
ffd0 4750
ffd1 5610
ffd2 6404
ffd3 7091
ffd4 7597
ffd5 8017
ffd6 8271
ffd7 8477
ffd0 8700

Array ( [0] => Array ( [d8] => 1 ) [1] => Array ( [e1] => 4 ) [2] => Array ( [d8] => 318 ) [3] => Array ( [db] => 320 ) [4] => Array ( [dd] => 454 ) [5] => Array ( [ee] => 460 ) [6] => Array ( [c0] => 476 ) [7] => Array ( [c4] => 495 ) [8] => Array ( [da] => 659 ) [9] => Array ( [d0] => 1103 ) [10] => Array ( [d1] => 1502 ) [11] => Array ( [d2] => 1932 ) [12] => Array ( [d3] => 2336 ) [13] => Array ( [d4] => 2746 ) [14] => Array ( [d5] => 3091 ) [15] => Array ( [d6] => 3520 ) [16] => Array ( [d7] => 4020 ) [17] => Array ( [d0] => 4750 ) [18] => Array ( [d1] => 5610 ) [19] => Array ( [d2] => 6404 ) [20] => Array ( [d3] => 7091 ) [21] => Array ( [d4] => 7597 ) [22] => Array ( [d5] => 8017 ) [23] => Array ( [d6] => 8271 ) [24] => Array ( [d7] => 8477 ) [25] => Array ( [d0] => 8700 ) [26] => Array ( [d9] => 1 ) )

1

1736 bytes of memory usage

also, i tried your script on a 4mb photo of mine and php choked with the following information:

Fatal error: Allowed memory size of 134217728 bytes
exhausted (tried to allocate 67108872 bytes) in
parseJpeg.php on line 78 [$imageData[] = $c;]

my script scanned the file no problem and here are the results (including the tweaked code markers)

Array ( [0] => Array ( [d8] => 1 ) [1] => Array ( [e1] => 4 ) [2] => Array ( [fe] => 6 ) [3] => Array ( [ff] => 1813 ) [4] => Array ( [8a] => 3038 ) [5] => Array ( [c7] => 3341 ) [6] => Array ( [cd] => 3899 ) [7] => Array ( [5e] => 4016 ) [8] => Array ( [cd] => 4411 ) [9] => Array ( [5e] => 4528 ) [10] => Array ( [8a] => 4574 ) [11] => Array ( [47] => 4877 ) [12] => Array ( [8a] => 5086 ) [13] => Array ( [6e] => 5443 ) [14] => Array ( [8a] => 5598 ) [15] => Array ( [47] => 5901 ) [16] => Array ( [cd] => 6459 ) [17] => Array ( [5e] => 6576 ) [18] => Array ( [cd] => 6971 ) [19] => Array ( [5e] => 7088 ) [20] => Array ( [cd] => 7483 ) [21] => Array ( [5e] => 7600 ) [22] => Array ( [5e] => 8624 ) [23] => Array ( [cd] => 9019 ) [24] => Array ( [5e] => 9136 ) [25] => Array ( [cd] => 9531 ) [26] => Array ( [5e] => 9648 ) [27] => Array ( [cd] => 10043 ) [28] => Array ( [5e] => 10160 ) [29] => Array ( [cd] => 10555 ) [30] => Array ( [5e] => 10672 ) [31] => Array ( [5e] => 11184 ) [32] => Array ( [cd] => 11579 ) [33] => Array ( [df] => 11642 ) [34] => Array ( [5e] => 11696 ) [35] => Array ( [cd] => 12091 ) [36] => Array ( [5e] => 12208 ) [37] => Array ( [5e] => 12720 ) [38] => Array ( [cf] => 13115 ) [39] => Array ( [4c] => 13581 ) [40] => Array ( [59] => 14425 ) [41] => Array ( [39] => 14613 ) [42] => Array ( [8a] => 14634 ) [43] => Array ( [ff] => 15050 ) [44] => Array ( [d8] => 15362 ) [45] => Array ( [db] => 15364 ) [46] => Array ( [c0] => 15498 ) [47] => Array ( [c4] => 15517 ) [48] => Array ( [da] => 15937 ) [49] => Array ( [d9] => 1 ) )


22384 bytes memory usage !!!

i also compared  a 1mb photo.

1.06mb image: 6298016 bytes memory usage ! = not good
1.06mb image: 4568 bytes

i like my script and i can easily include the useless data if i so choose to do so but my script doesn't crash. And when i analyze the data, i will yield it to the script to prevent a crash.

1 hour ago, jodunno said:

also, i tried your script on a 4mb photo of mine and php choked with the following information:

My code uses memory because it captures the data to generate the hexdumps.  Remove that, and it'd probably use barely any more than yours.  I know ways it could certainly use less, but memory usage isn't generally a huge concern for me.  Memory exists to be used, not using it is a waste of resources.
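One way to keep memory flat while still being able to produce a hexdump is to store only each segment's offset and length during the scan and fetch the bytes on demand. The sketch below assumes that approach; `readPayloadAt` is a hypothetical helper, not part of either script in this thread. Appending every byte to a PHP array, as in the `$imageData[] = $c` line from the error above, costs far more than one byte per element.

```php
<?php
// Hypothetical helper: fetch a segment's payload only when it is needed.
// $offset is the position of the first payload byte, $length its size in bytes.
function readPayloadAt($fp, int $offset, int $length): ?string
{
    if ($length < 0) {
        return null;
    }
    if ($length === 0) {
        return '';                              // fread() rejects a zero length
    }
    if (fseek($fp, $offset) !== 0) {
        return null;                            // offset could not be reached
    }
    $data = fread($fp, $length);
    // A short read means the declared length runs past the end of the file.
    return ($data !== false && strlen($data) === $length) ? $data : null;
}

// Usage with made-up bytes: SOI, an APP0 segment with a 2-byte payload, EOI.
$fp = fopen('php://memory', 'w+b');
fwrite($fp, "\xFF\xD8\xFF\xE0\x00\x04AB\xFF\xD9");
echo bin2hex(readPayloadAt($fp, 6, 2)), "\n";   // hexdump of just that payload
```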

Your code:

ffe1 4
ffdb 320
ffc0 476
ffc4 495
ffda 659
ffd9 8921
Peak memory usage: 429.30KB

(pre-mature end of image due to thumbnail)


My code, without the hexdump stuff.

Found marker 0xE1 @ 0x00000002 (2)
Found marker 0xED @ 0x000022D9 (8921)
Found marker 0xE2 @ 0x000046DD (18141)
Found marker 0xE1 @ 0x0000491F (18719)
Found marker 0xDB @ 0x00007545 (30021)
Found marker 0xDD @ 0x000075CB (30155)
Found marker 0xEE @ 0x000075D1 (30161)
Found marker 0xC0 @ 0x000075E1 (30177)
Found marker 0xC4 @ 0x000075F4 (30196)
Found marker 0xDA @ 0x000076BC (30396)
Found marker 0xD9 @ 0x000CBC21 (834593)
Peak memory usage: 433.62KB

The extra memory is probably just due to the code structure differences.

Edited by kicken
12 hours ago, kicken said:

pre-mature end of image due to thumbnail

I mentioned that my code needs tweaked. Adjust it but at the end of the day, you have too many functions and too much hamster wheeling happening. I prefer a single handler for the markers. Get in, get out and the data is available for analysis (stored positions for gathering the data.) My tweak might be gathering false markers. I have not examined the data at those markers. The Blacksmith image has a ton of markers.

my tweaked code (also checks for null before FF not just following FF)

<?php
  $startMemory = memory_get_usage();

    $SID_filePointer = fopen("Blacksmith.jpg", 'rb');
    $SID_JPEGmarkers = (array) []; $SID_eoi = 0;

    switch (fread($SID_filePointer, 2)) {
        case "\xFF\xD8": // SOI found
            $SID_JPEGmarkers[] = (array) ['d8' => 1];
            while (!feof($SID_filePointer)/*.*/) {
                $SID_markerID = fread($SID_filePointer, 1);
                switch ($SID_markerID) {
                  case "\xFF":
                      $SID_nextMarkerID = fread($SID_filePointer, 1);
                      if ($SID_nextMarkerID === "\x0" || $SID_nextMarkerID === "\xFF") { break; }
                      if ($SID_nextMarkerID === "\xD9") {
                          $SID_offset = ftell($SID_filePointer);
                          $SID_thirdMarkerID = fread($SID_filePointer, 1);
                          if ($SID_thirdMarkerID === "\x0" || $SID_thirdMarkerID === "\xFF") { fseek($SID_filePointer, $SID_offset); break; }
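                          if ($SID_thirdMarkerID === '' || $SID_thirdMarkerID === false) { $SID_eoi = 1; } // fread() returns '' at end of file; a plain !$x test would also match the byte "0"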
                      }
                      $SID_JPEGmarkers[] = (array) [dechex(ord($SID_nextMarkerID)) => ftell($SID_filePointer)];
                      echo dechex(ord($SID_markerID)) . dechex(ord($SID_nextMarkerID)) . ' ' . sprintf("0x%'.08X", ftell($SID_filePointer)) . ' (' . ftell($SID_filePointer) . ')<br>';
                      if ($SID_eoi) { break(2); }
                  break;
                  case "\x0":
                      $SID_offset = ftell($SID_filePointer);
                      if (fread($SID_filePointer, 1) === "\xFF") { break; }
                      fseek($SID_filePointer, $SID_offset);
                }
            }

        break;
        default:
            echo 'The file is not readable.'; 
    }

    fclose($SID_filePointer);
?>

<html>
<head>
  <title></title>
</head>
<body>
<p><?php print_r($SID_JPEGmarkers); ?></p>
<div style="background: #ffffff; color: #000000;"><?php echo memory_get_usage() - $startMemory, ' bytes'; ?></div>

</body>
</html>

you could actually be useful in tweaking my code to accomplish what you think is not working rather than insisting on coding it with your ideology. I think that the tweaked code is not prematurely finding the eoi. But the code is probably reading false markers as it finds a ton of data in Blacksmith.jpg

I tweaked my code and it shows the following for Blacksmith image

ffe1 0x00000004 (4)
ffdb 0x00000140 (320)
ffdd 0x000001C6 (454)
ffee 0x000001CC (460)
ffc0 0x000001DC (476)
ffc4 0x000001EF (495)
ffda 0x00000293 (659)
ffd0 0x0000044F (1103)
ffd1 0x000005DE (1502)
ffd2 0x0000078C (1932)
ffd3 0x00000920 (2336)
ffd4 0x00000ABA (2746)
ffd5 0x00000C13 (3091)
ffd6 0x00000DC0 (3520)
ffd7 0x00000FB4 (4020)
ffd0 0x0000128E (4750)
ffd1 0x000015EA (5610)
ffd2 0x00001904 (6404)
ffd3 0x00001BB3 (7091)
ffd4 0x00001DAD (7597)
ffd5 0x00001F51 (8017)
ffd6 0x0000204F (8271)
ffd7 0x0000211D (8477)
ffd0 0x000021FC (8700)
ffed 0x000022DB (8923)
ffd8 0x00002525 (9509)
ffdb 0x00002527 (9511)
ffdd 0x000025AD (9645)
ffee 0x000025B3 (9651)
ffc0 0x000025C3 (9667)
ffc4 0x000025D6 (9686)
ffda 0x0000267A (9850)
ffd0 0x00002836 (10294)
ffd1 0x000029C5 (10693)
ffd2 0x00002B73 (11123)
ffd3 0x00002D07 (11527)
ffd4 0x00002EA1 (11937)
ffd5 0x00002FFA (12282)
ffd6 0x000031A7 (12711)
ffd7 0x0000339B (13211)
ffd0 0x00003675 (13941)
ffd1 0x000039D1 (14801)
ffd2 0x00003CEB (15595)
ffd3 0x00003F9A (16282)
ffd4 0x00004194 (16788)
ffd5 0x00004338 (17208)
ffd6 0x00004436 (17462)
ffd7 0x00004504 (17668)
ffd0 0x000045E3 (17891)
ffe2 0x000046DF (18143)
ffe1 0x00004921 (18721)
ffdb 0x00007547 (30023)
ffdd 0x000075CD (30157)
ffee 0x000075D3 (30163)
ffc0 0x000075E3 (30179)
ffc4 0x000075F6 (30198)
ffda 0x000076BE (30398)
ffd0 0x00008DBC (36284)
ffd1 0x0000A42C (42028)
ffd2 0x0000BAAD (47789)
ffd3 0x0000D199 (53657)
ffd4 0x0000E7F0 (59376)
ffd5 0x0000FDEB (65003)
ffd6 0x0001140D (70669)
ffd7 0x00012950 (76112)
ffd0 0x00013ED0 (81616)
ffd1 0x000154A9 (87209)
ffd2 0x000169E2 (92642)
ffd3 0x00017F3B (98107)
ffd4 0x000193FA (103418)
ffd5 0x0001A893 (108691)
ffd6 0x0001BD88 (114056)
ffd7 0x0001D20F (119311)
ffd0 0x0001E65C (124508)
ffd1 0x0001FA60 (129632)
ffd2 0x00020ED1 (134865)
ffd3 0x0002238A (140170)
ffd4 0x00023930 (145712)
ffd5 0x00024E24 (151076)
ffd6 0x00026282 (156290)
ffd7 0x000276E6 (161510)
ffd0 0x00028AD5 (166613)
ffd1 0x00029EA4 (171684)
ffd2 0x0002B2B8 (176824)
ffd3 0x0002C630 (181808)
ffd4 0x0002D9C1 (186817)
ffd5 0x0002EDBD (191933)
ffd6 0x000301A4 (197028)
ffd7 0x00031568 (202088)
ffd0 0x00032A54 (207444)
ffd1 0x00033E82 (212610)
ffd2 0x000352E1 (217825)
ffd3 0x0003684F (223311)
ffd4 0x00037C6E (228462)
ffd5 0x00039026 (233510)
ffd6 0x0003A491 (238737)
ffd7 0x0003B7FA (243706)
ffd0 0x0003CB84 (248708)
ffd1 0x0003E025 (253989)
ffd2 0x0003F3DB (259035)
ffd3 0x0004081C (264220)
ffd4 0x00041D17 (269591)
ffd5 0x000431A8 (274856)
ffd6 0x000446D3 (280275)
ffd7 0x00045D4C (286028)
ffd0 0x0004725A (291418)
ffd1 0x000486B4 (296628)
ffd2 0x00049BDA (302042)
ffd3 0x0004B056 (307286)
ffd4 0x0004C554 (312660)
ffd5 0x0004DB3B (318267)
ffd6 0x0004F0E7 (323815)
ffd7 0x000506E2 (329442)
ffd0 0x00051EA4 (335524)
ffd1 0x000535E4 (341476)
ffd2 0x00054DBE (347582)
ffd3 0x0005657B (353659)
ffd4 0x00057CED (359661)
ffd5 0x00059477 (365687)
ffd6 0x0005AD39 (372025)
ffd7 0x0005C9C3 (379331)
ffd0 0x0005E7F8 (387064)
ffd1 0x000606AD (394925)
ffd2 0x000628A8 (403624)
ffd3 0x00064D67 (413031)
ffd4 0x0006733E (422718)
ffd5 0x00069A78 (432760)
ffd6 0x0006C461 (443489)
ffd7 0x0006EFAA (454570)
ffd0 0x000718D7 (465111)
ffd1 0x00073FF1 (475121)
ffd2 0x00076644 (484932)
ffd3 0x00078CF9 (494841)
ffd4 0x0007B2A9 (504489)
ffd5 0x0007D426 (513062)
ffd6 0x0007F3E8 (521192)
ffd7 0x000813E5 (529381)
ffd0 0x0008341D (537629)
ffd1 0x00085447 (545863)
ffd2 0x000875B0 (554416)
ffd3 0x000896E0 (562912)
ffd4 0x0008B959 (571737)
ffd5 0x0008DB57 (580439)
ffd6 0x0008FDA6 (589222)
ffd7 0x00091F2E (597806)
ffd0 0x0009402B (606251)
ffd1 0x0009603A (614458)
ffd2 0x00097F5D (622429)
ffd3 0x00099D30 (630064)
ffd4 0x0009B79D (636829)
ffd5 0x0009D0B8 (643256)
ffd6 0x0009E7FB (649211)
ffd7 0x0009FE20 (654880)
ffd0 0x000A13FA (660474)
ffd1 0x000A2908 (665864)
ffd2 0x000A3E2C (671276)
ffd3 0x000A541E (676894)
ffd4 0x000A692F (682287)
ffd5 0x000A7E5E (687710)
ffd6 0x000A9425 (693285)
ffd7 0x000AA79E (698270)
ffd0 0x000ABA70 (703088)
ffd1 0x000ACDE5 (708069)
ffd2 0x000ADFB9 (712633)
ffd3 0x000AF1B3 (717235)
ffd4 0x000B0422 (721954)
ffd5 0x000B153C (726332)
ffd6 0x000B2632 (730674)
ffd7 0x000B37DC (735196)
ffd0 0x000B4917 (739607)
ffd1 0x000B59DD (743901)
ffd2 0x000B6BA0 (748448)
ffd3 0x000B7C95 (752789)
ffd4 0x000B8D4F (757071)
ffd5 0x000B9EBD (761533)
ffd6 0x000BAFB5 (765877)
ffd7 0x000BC03C (770108)
ffd0 0x000BD152 (774482)
ffd1 0x000BE20A (778762)
ffd2 0x000BF29C (783004)
ffd3 0x000C035D (787293)
ffd4 0x000C1444 (791620)
ffd5 0x000C24F4 (795892)
ffd6 0x000C35D0 (800208)
ffd7 0x000C4699 (804505)
ffd0 0x000C57AE (808878)
ffd1 0x000C688F (813199)
ffd2 0x000C7906 (817414)
ffd3 0x000C89A3 (821667)
ffd4 0x000C9A7E (825982)
ffd5 0x000CAB50 (830288)
ffd9 0x000CBC23 (834595)
Array ( [0] => Array ( [d8] => 1 ) [1] => Array ( [e1] => 4 ) [2] => Array ( [db] => 320 ) [3] => Array ( [dd] => 454 ) [4] => Array ( [ee] => 460 ) [5] => Array ( [c0] => 476 ) [6] => Array ( [c4] => 495 ) [7] => Array ( [da] => 659 ) [8] => Array ( [d0] => 1103 ) [9] => Array ( [d1] => 1502 ) [10] => Array ( [d2] => 1932 ) [11] => Array ( [d3] => 2336 ) [12] => Array ( [d4] => 2746 ) [13] => Array ( [d5] => 3091 ) [14] => Array ( [d6] => 3520 ) [15] => Array ( [d7] => 4020 ) [16] => Array ( [d0] => 4750 ) [17] => Array ( [d1] => 5610 ) [18] => Array ( [d2] => 6404 ) [19] => Array ( [d3] => 7091 ) [20] => Array ( [d4] => 7597 ) [21] => Array ( [d5] => 8017 ) [22] => Array ( [d6] => 8271 ) [23] => Array ( [d7] => 8477 ) [24] => Array ( [d0] => 8700 ) [25] => Array ( [ed] => 8923 ) [26] => Array ( [d8] => 9509 ) [27] => Array ( [db] => 9511 ) [28] => Array ( [dd] => 9645 ) [29] => Array ( [ee] => 9651 ) [30] => Array ( [c0] => 9667 ) [31] => Array ( [c4] => 9686 ) [32] => Array ( [da] => 9850 ) [33] => Array ( [d0] => 10294 ) [34] => Array ( [d1] => 10693 ) [35] => Array ( [d2] => 11123 ) [36] => Array ( [d3] => 11527 ) [37] => Array ( [d4] => 11937 ) [38] => Array ( [d5] => 12282 ) [39] => Array ( [d6] => 12711 ) [40] => Array ( [d7] => 13211 ) [41] => Array ( [d0] => 13941 ) [42] => Array ( [d1] => 14801 ) [43] => Array ( [d2] => 15595 ) [44] => Array ( [d3] => 16282 ) [45] => Array ( [d4] => 16788 ) [46] => Array ( [d5] => 17208 ) [47] => Array ( [d6] => 17462 ) [48] => Array ( [d7] => 17668 ) [49] => Array ( [d0] => 17891 ) [50] => Array ( [e2] => 18143 ) [51] => Array ( [e1] => 18721 ) [52] => Array ( [db] => 30023 ) [53] => Array ( [dd] => 30157 ) [54] => Array ( [ee] => 30163 ) [55] => Array ( [c0] => 30179 ) [56] => Array ( [c4] => 30198 ) [57] => Array ( [da] => 30398 ) [58] => Array ( [d0] => 36284 ) [59] => Array ( [d1] => 42028 ) [60] => Array ( [d2] => 47789 ) [61] => Array ( [d3] => 53657 ) [62] => Array ( [d4] => 59376 ) [63] => Array ( [d5] => 65003 
) [64] => Array ( [d6] => 70669 ) [65] => Array ( [d7] => 76112 ) [66] => Array ( [d0] => 81616 ) [67] => Array ( [d1] => 87209 ) [68] => Array ( [d2] => 92642 ) [69] => Array ( [d3] => 98107 ) [70] => Array ( [d4] => 103418 ) [71] => Array ( [d5] => 108691 ) [72] => Array ( [d6] => 114056 ) [73] => Array ( [d7] => 119311 ) [74] => Array ( [d0] => 124508 ) [75] => Array ( [d1] => 129632 ) [76] => Array ( [d2] => 134865 ) [77] => Array ( [d3] => 140170 ) [78] => Array ( [d4] => 145712 ) [79] => Array ( [d5] => 151076 ) [80] => Array ( [d6] => 156290 ) [81] => Array ( [d7] => 161510 ) [82] => Array ( [d0] => 166613 ) [83] => Array ( [d1] => 171684 ) [84] => Array ( [d2] => 176824 ) [85] => Array ( [d3] => 181808 ) [86] => Array ( [d4] => 186817 ) [87] => Array ( [d5] => 191933 ) [88] => Array ( [d6] => 197028 ) [89] => Array ( [d7] => 202088 ) [90] => Array ( [d0] => 207444 ) [91] => Array ( [d1] => 212610 ) [92] => Array ( [d2] => 217825 ) [93] => Array ( [d3] => 223311 ) [94] => Array ( [d4] => 228462 ) [95] => Array ( [d5] => 233510 ) [96] => Array ( [d6] => 238737 ) [97] => Array ( [d7] => 243706 ) [98] => Array ( [d0] => 248708 ) [99] => Array ( [d1] => 253989 ) [100] => Array ( [d2] => 259035 ) [101] => Array ( [d3] => 264220 ) [102] => Array ( [d4] => 269591 ) [103] => Array ( [d5] => 274856 ) [104] => Array ( [d6] => 280275 ) [105] => Array ( [d7] => 286028 ) [106] => Array ( [d0] => 291418 ) [107] => Array ( [d1] => 296628 ) [108] => Array ( [d2] => 302042 ) [109] => Array ( [d3] => 307286 ) [110] => Array ( [d4] => 312660 ) [111] => Array ( [d5] => 318267 ) [112] => Array ( [d6] => 323815 ) [113] => Array ( [d7] => 329442 ) [114] => Array ( [d0] => 335524 ) [115] => Array ( [d1] => 341476 ) [116] => Array ( [d2] => 347582 ) [117] => Array ( [d3] => 353659 ) [118] => Array ( [d4] => 359661 ) [119] => Array ( [d5] => 365687 ) [120] => Array ( [d6] => 372025 ) [121] => Array ( [d7] => 379331 ) [122] => Array ( [d0] => 387064 ) [123] => Array ( [d1] => 394925 ) 
[124] => Array ( [d2] => 403624 ) [125] => Array ( [d3] => 413031 ) [126] => Array ( [d4] => 422718 ) [127] => Array ( [d5] => 432760 ) [128] => Array ( [d6] => 443489 ) [129] => Array ( [d7] => 454570 ) [130] => Array ( [d0] => 465111 ) [131] => Array ( [d1] => 475121 ) [132] => Array ( [d2] => 484932 ) [133] => Array ( [d3] => 494841 ) [134] => Array ( [d4] => 504489 ) [135] => Array ( [d5] => 513062 ) [136] => Array ( [d6] => 521192 ) [137] => Array ( [d7] => 529381 ) [138] => Array ( [d0] => 537629 ) [139] => Array ( [d1] => 545863 ) [140] => Array ( [d2] => 554416 ) [141] => Array ( [d3] => 562912 ) [142] => Array ( [d4] => 571737 ) [143] => Array ( [d5] => 580439 ) [144] => Array ( [d6] => 589222 ) [145] => Array ( [d7] => 597806 ) [146] => Array ( [d0] => 606251 ) [147] => Array ( [d1] => 614458 ) [148] => Array ( [d2] => 622429 ) [149] => Array ( [d3] => 630064 ) [150] => Array ( [d4] => 636829 ) [151] => Array ( [d5] => 643256 ) [152] => Array ( [d6] => 649211 ) [153] => Array ( [d7] => 654880 ) [154] => Array ( [d0] => 660474 ) [155] => Array ( [d1] => 665864 ) [156] => Array ( [d2] => 671276 ) [157] => Array ( [d3] => 676894 ) [158] => Array ( [d4] => 682287 ) [159] => Array ( [d5] => 687710 ) [160] => Array ( [d6] => 693285 ) [161] => Array ( [d7] => 698270 ) [162] => Array ( [d0] => 703088 ) [163] => Array ( [d1] => 708069 ) [164] => Array ( [d2] => 712633 ) [165] => Array ( [d3] => 717235 ) [166] => Array ( [d4] => 721954 ) [167] => Array ( [d5] => 726332 ) [168] => Array ( [d6] => 730674 ) [169] => Array ( [d7] => 735196 ) [170] => Array ( [d0] => 739607 ) [171] => Array ( [d1] => 743901 ) [172] => Array ( [d2] => 748448 ) [173] => Array ( [d3] => 752789 ) [174] => Array ( [d4] => 757071 ) [175] => Array ( [d5] => 761533 ) [176] => Array ( [d6] => 765877 ) [177] => Array ( [d7] => 770108 ) [178] => Array ( [d0] => 774482 ) [179] => Array ( [d1] => 778762 ) [180] => Array ( [d2] => 783004 ) [181] => Array ( [d3] => 787293 ) [182] => Array ( [d4] => 
791620 ) [183] => Array ( [d5] => 795892 ) [184] => Array ( [d6] => 800208 ) [185] => Array ( [d7] => 804505 ) [186] => Array ( [d0] => 808878 ) [187] => Array ( [d1] => 813199 ) [188] => Array ( [d2] => 817414 ) [189] => Array ( [d3] => 821667 ) [190] => Array ( [d4] => 825982 ) [191] => Array ( [d5] => 830288 ) [192] => Array ( [d9] => 834595 ) )

95136 bytes

so i guess that a restart marker is being read.

42 minutes ago, jodunno said:

My tweak might be gathering false markers. I have not examined the data at those markers. The Blacksmith image has a ton of markers.

The markers seem to all be valid, the list just includes markers that belong to embedded images as well.

ffe1 0x00000004 (4)
--------------------- Embedded image markers:
ffdb 0x00000140 (320)
ffdd 0x000001C6 (454)
ffee 0x000001CC (460)
ffc0 0x000001DC (476)
ffc4 0x000001EF (495)
ffda 0x00000293 (659)
ffd0 0x0000044F (1103)
ffd1 0x000005DE (1502)
ffd2 0x0000078C (1932)
ffd3 0x00000920 (2336)
ffd4 0x00000ABA (2746)
ffd5 0x00000C13 (3091)
ffd6 0x00000DC0 (3520)
ffd7 0x00000FB4 (4020)
ffd0 0x0000128E (4750)
ffd1 0x000015EA (5610)
ffd2 0x00001904 (6404)
ffd3 0x00001BB3 (7091)
ffd4 0x00001DAD (7597)
ffd5 0x00001F51 (8017)
ffd6 0x0000204F (8271)
ffd7 0x0000211D (8477)
ffd0 0x000021FC (8700)
---------------------------
ffed 0x000022DB (8923)
--------------------------- Another embedded image:
ffd8 0x00002525 (9509)
ffdb 0x00002527 (9511)
ffdd 0x000025AD (9645)
ffee 0x000025B3 (9651)
ffc0 0x000025C3 (9667)
ffc4 0x000025D6 (9686)
ffda 0x0000267A (9850)
ffd0 0x00002836 (10294)
ffd1 0x000029C5 (10693)
ffd2 0x00002B73 (11123)
ffd3 0x00002D07 (11527)
ffd4 0x00002EA1 (11937)
ffd5 0x00002FFA (12282)
ffd6 0x000031A7 (12711)
ffd7 0x0000339B (13211)
ffd0 0x00003675 (13941)
ffd1 0x000039D1 (14801)
ffd2 0x00003CEB (15595)
ffd3 0x00003F9A (16282)
ffd4 0x00004194 (16788)
ffd5 0x00004338 (17208)
ffd6 0x00004436 (17462)
ffd7 0x00004504 (17668)
ffd0 0x000045E3 (17891)
---------------------------
ffe2 0x000046DF (18143)
ffe1 0x00004921 (18721)
ffdb 0x00007547 (30023)
ffdd 0x000075CD (30157)
ffee 0x000075D3 (30163)
ffc0 0x000075E3 (30179)
ffc4 0x000075F6 (30198)
ffda 0x000076BE (30398)
--------------------------- Image data restart markers:
ffd0 0x00008DBC (36284)
ffd1 0x0000A42C (42028)
ffd2 0x0000BAAD (47789)
ffd3 0x0000D199 (53657)
ffd4 0x0000E7F0 (59376)
ffd5 0x0000FDEB (65003)
ffd6 0x0001140D (70669)
ffd7 0x00012950 (76112)
ffd0 0x00013ED0 (81616)
ffd1 0x000154A9 (87209)
ffd2 0x000169E2 (92642)
ffd3 0x00017F3B (98107)
ffd4 0x000193FA (103418)
ffd5 0x0001A893 (108691)
ffd6 0x0001BD88 (114056)
ffd7 0x0001D20F (119311)
ffd0 0x0001E65C (124508)
ffd1 0x0001FA60 (129632)
ffd2 0x00020ED1 (134865)
ffd3 0x0002238A (140170)
ffd4 0x00023930 (145712)
ffd5 0x00024E24 (151076)
ffd6 0x00026282 (156290)
ffd7 0x000276E6 (161510)
ffd0 0x00028AD5 (166613)
ffd1 0x00029EA4 (171684)
ffd2 0x0002B2B8 (176824)
ffd3 0x0002C630 (181808)
ffd4 0x0002D9C1 (186817)
ffd5 0x0002EDBD (191933)
ffd6 0x000301A4 (197028)
ffd7 0x00031568 (202088)
ffd0 0x00032A54 (207444)
ffd1 0x00033E82 (212610)
ffd2 0x000352E1 (217825)
ffd3 0x0003684F (223311)
ffd4 0x00037C6E (228462)
ffd5 0x00039026 (233510)
ffd6 0x0003A491 (238737)
ffd7 0x0003B7FA (243706)
ffd0 0x0003CB84 (248708)
ffd1 0x0003E025 (253989)
ffd2 0x0003F3DB (259035)
ffd3 0x0004081C (264220)
ffd4 0x00041D17 (269591)
ffd5 0x000431A8 (274856)
ffd6 0x000446D3 (280275)
ffd7 0x00045D4C (286028)
ffd0 0x0004725A (291418)
ffd1 0x000486B4 (296628)
ffd2 0x00049BDA (302042)
ffd3 0x0004B056 (307286)
ffd4 0x0004C554 (312660)
ffd5 0x0004DB3B (318267)
ffd6 0x0004F0E7 (323815)
ffd7 0x000506E2 (329442)
ffd0 0x00051EA4 (335524)
ffd1 0x000535E4 (341476)
ffd2 0x00054DBE (347582)
ffd3 0x0005657B (353659)
ffd4 0x00057CED (359661)
ffd5 0x00059477 (365687)
ffd6 0x0005AD39 (372025)
ffd7 0x0005C9C3 (379331)
ffd0 0x0005E7F8 (387064)
ffd1 0x000606AD (394925)
ffd2 0x000628A8 (403624)
ffd3 0x00064D67 (413031)
ffd4 0x0006733E (422718)
ffd5 0x00069A78 (432760)
ffd6 0x0006C461 (443489)
ffd7 0x0006EFAA (454570)
ffd0 0x000718D7 (465111)
ffd1 0x00073FF1 (475121)
ffd2 0x00076644 (484932)
ffd3 0x00078CF9 (494841)
ffd4 0x0007B2A9 (504489)
ffd5 0x0007D426 (513062)
ffd6 0x0007F3E8 (521192)
ffd7 0x000813E5 (529381)
ffd0 0x0008341D (537629)
ffd1 0x00085447 (545863)
ffd2 0x000875B0 (554416)
ffd3 0x000896E0 (562912)
ffd4 0x0008B959 (571737)
ffd5 0x0008DB57 (580439)
ffd6 0x0008FDA6 (589222)
ffd7 0x00091F2E (597806)
ffd0 0x0009402B (606251)
ffd1 0x0009603A (614458)
ffd2 0x00097F5D (622429)
ffd3 0x00099D30 (630064)
ffd4 0x0009B79D (636829)
ffd5 0x0009D0B8 (643256)
ffd6 0x0009E7FB (649211)
ffd7 0x0009FE20 (654880)
ffd0 0x000A13FA (660474)
ffd1 0x000A2908 (665864)
ffd2 0x000A3E2C (671276)
ffd3 0x000A541E (676894)
ffd4 0x000A692F (682287)
ffd5 0x000A7E5E (687710)
ffd6 0x000A9425 (693285)
ffd7 0x000AA79E (698270)
ffd0 0x000ABA70 (703088)
ffd1 0x000ACDE5 (708069)
ffd2 0x000ADFB9 (712633)
ffd3 0x000AF1B3 (717235)
ffd4 0x000B0422 (721954)
ffd5 0x000B153C (726332)
ffd6 0x000B2632 (730674)
ffd7 0x000B37DC (735196)
ffd0 0x000B4917 (739607)
ffd1 0x000B59DD (743901)
ffd2 0x000B6BA0 (748448)
ffd3 0x000B7C95 (752789)
ffd4 0x000B8D4F (757071)
ffd5 0x000B9EBD (761533)
ffd6 0x000BAFB5 (765877)
ffd7 0x000BC03C (770108)
ffd0 0x000BD152 (774482)
ffd1 0x000BE20A (778762)
ffd2 0x000BF29C (783004)
ffd3 0x000C035D (787293)
ffd4 0x000C1444 (791620)
ffd5 0x000C24F4 (795892)
ffd6 0x000C35D0 (800208)
ffd7 0x000C4699 (804505)
ffd0 0x000C57AE (808878)
ffd1 0x000C688F (813199)
ffd2 0x000C7906 (817414)
ffd3 0x000C89A3 (821667)
ffd4 0x000C9A7E (825982)
ffd5 0x000CAB50 (830288)
---------------------------
ffd9 0x000CBC23 (834595)

All the 0xFFD0 - 0xFFD7 markers are restart markers (RST0 - RST7) within the image data.  My code skips those since they are only relevant to decoding the image data and not the overall file structure.
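A rough sketch of how the image data after SOS can be scanned while ignoring those markers. This is an assumption about one workable approach, not the code from my script, and `nextMarkerAfterScan` is a hypothetical name. It reads in chunks, treats FF 00 as a stuffed data byte, FF FF as fill, skips FF D0 - FF D7, and stops at the first other marker.

```php
<?php
// Hypothetical helper: call with the file pointer positioned just after the SOS header.
// Returns [marker code, offset of its FF byte], leaving the pointer right after
// the marker, or null if the data ends before another marker is found.
function nextMarkerAfterScan($fp): ?array
{
    $prevWasFF = false;
    $base = ftell($fp);                         // file offset of the current chunk
    while (($chunk = fread($fp, 8192)) !== false && $chunk !== '') {
        $n = strlen($chunk);
        for ($i = 0; $i < $n; $i++) {
            $byte = ord($chunk[$i]);
            if ($prevWasFF) {
                if ($byte === 0xFF) {
                    continue;                   // fill byte, keep waiting for the code
                }
                $prevWasFF = false;
                if ($byte !== 0x00 && ($byte < 0xD0 || $byte > 0xD7)) {
                    $offset = $base + $i - 1;   // the FF directly before the code
                    fseek($fp, $offset + 2);
                    return [$byte, $offset];
                }
                // FF 00 (stuffed byte) or a restart marker: carry on scanning
            } elseif ($byte === 0xFF) {
                $prevWasFF = true;
            }
        }
        $base += $n;
    }
    return null;
}
```

For a well-formed single-scan file the marker returned here is EOI (0xD9); a progressive file would instead return the next segment marker, after which the length-based walk can resume.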

46 minutes ago, jodunno said:

you have too many functions

If you prefer to code in spaghetti style, go ahead.  Functions are not some horrible thing to be avoided though, one would generally be hard pressed to ever have "too many".

Functions let you name a block of code, which can make the overall code much easier to understand what is happening. Your code requires careful reading of each line to know what is happening.  My code can be easily skimmed through and still know what's happening thanks to the function names.

While not used much here, functions allow you to easily re-use code rather than copy/pasting it and having to maintain separate instances.  Main reason I have a parseJpeg function to start with instead of just a block of code was so I could re-use it in separate web (easier to share) and cli (easier to develop/debug) versions of the script.

 

22 minutes ago, kicken said:

The markers seem to all be valid

I don't know if it is accurate. I have not found an algorithm for accurately detecting markers. I only ever read xFF followed by x?? and not null x00 but that is false in code because it detects rescan and restart data and perhaps more! I have a compact camera and i checked a photo from that camera and i am seeing strange markers which lead me to believe that my code is incorrect (ff b5, ff ee, ff dd and so forth). How can i get the accurate markers and not all of the dn rescan markers? what is the algorithm?

i found a document that also mentions IFD markers and i wonder if the b5 ee dd is IFD data.

i am getting frustrated.

edit: there should not be more than one c0 but my code for blacksmith shows several c0 entries. I sometimes see ff d8 in a scan with this code which seems to be incorrect.

Edited by jodunno
added info
4 minutes ago, jodunno said:

How can i get the accurate markers and not all of the dn rescan markers? what is the algorithm?

You are getting the restart markers (and likely the other odd markers) for the reason I mentioned earlier:

19 hours ago, kicken said:

Since you're not yet scanning properly for the marker and image data (which is fine, baby steps remember) you might be getting some false markers in your output.

Each marker has associated data.  Remember the general format? \xFF\x??<marker>\x????<length>\x??...<data>

You need to read the two-byte length that follows the marker, then skip over the rest of that segment.  The length value counts its own two bytes, so $length - 2 bytes of data remain after you have read it.  That data is arbitrary and may contain bytes that look like markers.  That's where the embedded images are stored, for example, so if you skipped over that data you'd stop finding those markers as well, which would fix that premature end-of-image marker problem.

I'm not sure how to implement the data length to skip over it in my code but i notice that this data is accurate compared to your data except all of the data contained in the e1 metadata marker. I looked at my code by adding an echo to see where this data is coming from. I can see that it is contained in the e1 marker. Then my code picks up at e2, where yours picks up. I'll have to think about how to skip the metadata in e1, i suppose.

1 hour ago, jodunno said:

I'm not sure how to implement the data length to skip over it in my code

You essentially just need to do what I did in my parseSegmentData function.  If you don't want to use a function, you can just inline it like the rest of your code, it's only two lines.

function parseSegmentData($fp) : string{
    //Markers indicate the start of a segment which is composed of <length><data> sections.
    //The length is two bytes and is the length of the entire segment including the two bytes
    //used to define the length.

    //Extract the length from the next two bytes.
    $dataLength = unpack('nlength', fread($fp, 2))['length'];

    //Read the remaining data using that length value.  Subtract 2 because
    //$dataLength includes the two bytes we just read to obtain the length
    $data = fread($fp, $dataLength - 2);

    return $data;
}

After you've identified a marker, read two more bytes and use unpack to convert them into an integer value.  Then you can either read, or fseek() forward, $length-2 bytes.
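For illustration, the fseek() variant might look something like the sketch below. Note that skipSegmentData is a hypothetical name and not part of the script posted in this thread, and the in-memory stream is only a stand-in for a real file handle. The error checks are an assumption about how you would want to handle a truncated file.

```php
<?php
// Hypothetical helper (not from the thread's script): skip a segment's
// payload without reading it into memory. Returns the number of bytes skipped.
function skipSegmentData($fp): int
{
    // The two bytes after the marker hold a big-endian length that
    // counts itself plus the payload.
    $raw = fread($fp, 2);
    if ($raw === false || strlen($raw) !== 2) {
        throw new RuntimeException('Unexpected end of file while reading segment length');
    }

    $length = unpack('nlength', $raw)['length'];
    if ($length < 2) {
        throw new RuntimeException('Invalid segment length: ' . $length);
    }

    // Jump past the payload; the stream should now sit on the next marker.
    if (fseek($fp, $length - 2, SEEK_CUR) !== 0) {
        throw new RuntimeException('Seek failed');
    }

    return $length - 2;
}

// Usage: a fake APP0 segment followed by an EOI marker, built in memory.
$fp = fopen('php://memory', 'w+b');
fwrite($fp, "\xFF\xE0" . pack('n', 6) . "ABCD" . "\xFF\xD9");
rewind($fp);

fread($fp, 2);                    // consume the \xFF\xE0 marker
$skipped = skipSegmentData($fp);  // skips the four payload bytes "ABCD"
$next = bin2hex(fread($fp, 2));   // the marker that follows the segment
fclose($fp);

printf("skipped %d bytes, next marker: %s\n", $skipped, $next);
```

Seeking instead of reading means the payload never gets copied into a PHP string, which is all you need if you only want the marker offsets.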

Skipping past the image data is more complicated as there's no length value.  You just need to read the data until you find another marker (excluding the restart markers).  The original parseImageData function I posted here failed to skip the restart markers, but the updated code on my working example has that bug fixed.  Storing the bytes read into $imageData is unnecessary if you don't want to do anything with the image data, so code your version without that.
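As a rough sketch of that idea, a version that stores nothing could look like the following. skipImageData is a hypothetical name, not the parseImageData function from my script. Besides the 0xFFD0 - 0xFFD7 markers it also ignores 0xFF00, which is how a literal 0xFF byte is written inside the compressed data. Reading one byte at a time with fgetc() is slow and is used here only to keep the logic easy to follow.

```php
<?php
// Hypothetical helper: advance past the compressed image data that follows
// an SOS segment and return the next real marker code (for example 0xD9),
// or null if the file ends first.
function skipImageData($fp): ?int
{
    while (($byte = fgetc($fp)) !== false) {
        if ($byte !== "\xFF") {
            continue;               // ordinary image data
        }

        // Collapse any run of 0xFF fill bytes, then look at what follows.
        do {
            $next = fgetc($fp);
        } while ($next === "\xFF");

        if ($next === false) {
            break;                  // file ended in the middle of a marker
        }

        $code = ord($next);
        if ($code === 0x00) {
            continue;               // stuffed byte: a literal 0xFF in the data
        }
        if ($code >= 0xD0 && $code <= 0xD7) {
            continue;               // restart markers belong to the image data
        }

        return $code;               // a real marker ends the scan
    }

    return null;
}

// Usage: fake image data containing a stuffed byte and a restart marker,
// followed by the end-of-image marker.
$fp = fopen('php://memory', 'w+b');
fwrite($fp, "\x12\xFF\x00\x34\xFF\xD0\x56\xFF\xD9");
rewind($fp);

$marker = skipImageData($fp);
$offset = ftell($fp);
fclose($fp);

printf("next marker: ff%02x, stream offset: %d\n", $marker, $offset);
```

The stream is left just past the marker that was found, so whatever loop called this can carry on with the normal length-then-data handling from there.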

I have the answer. LOL. I thought about it now for two hours and it dawned on me how to erase the thumbnail data from my code. I'm not telling my solution because you have no desire to accept reality that is not your own. The world doesn't need to code the way you code and inaccurately naming code that does not conform to your ideology as being spaghetti code is downright ridiculous. Spaghetti code is php weaving in-and-out of html like strings of spaghetti noodles. My code is pure php. The added html is for display of the memory usage only. Simply remove it and echo memory usage to the screen. 0.0 I doubt that you could fix my code to exclude the thumbnails because you believe that data size is somewhere in the TIFF header. I know where the end can be found and i just watched it disappear from my code with one tweak. It is quite brilliant because most people would not think of it.

So my current code displays the same markerids as your code but simpler. I am happy with my code and now i have positions to all of the data as well. I'm moving on from this project now.

Best wishes, John

4 hours ago, jodunno said:

I have the answer. LOL. I thought about it now for two hours and it dawned on me how to erase the thumbnail data from my code. I'm not telling my solution because you have no desire to accept reality that is not your own.

Seriously?

I have a weak code injection scanner that was reading lines via a generator function that yields each line. I didn't know how to read a file at the time of creation as i am not a programmer yet. Anyway, i decided to try that Blacksmith image and my code reports that extra data was found, which means that php or javascript code was found. I am not sure if it is a false positive or not. I will have to tweak my code to find the exact position of the match and decode it. I may not have enough time to rewrite the function today but i will keep working on it. Just a notice in case the scanner result is not a false positive. I wouldn't want anyone to run that image through insecure php code if it really does contain encoded code injection. None of my other images return extra data alerts except for the known injection images, so i find it to be quite interesting.

  • 2 weeks later...

update: I appreciate your help Kicken but maintaining two copies of the image in memory is not what i was seeking (if you strip your functions and look at the output, then you can see that the data is being copied into an array. Thus, two copies of the image are now in memory.) Also, i am not trying to build a hex viewer, so this hex dump is beyond my usage. I simply wanted to read the necessary data from the file while scanning it for code injection and weak steganography. I have spent a lot of time learning how to do it myself and i am now reading the exif data and extracting the data that i would like to view. Your code lacks all of this data when parsing an App1 segment. I have uploaded a video to my YouTube channel so that you can see my new code in action. I used the Blacksmith image in this video. The video will need to be adjusted in YouTube to see the results better.

I have now extracted over half of the data that i was seeking. I only have to grab image dimensions and comments. I also have to calculate the dpicm when the unit is 3. I will do this eventually. In the video, you will see that i stop at the first encounter of a thumbnail in the Blacksmith photo. I still have to get the offset of the thumbnail and its length. Then i will also scan the thumbnail for code injection while reporting thumbnail count and thumbnail image type.

I am not a programmer, so this has been a long process for me. I am happy that i have written my own EXIF reader.

Best wishes,

John

35 minutes ago, jodunno said:

update: I appreciate your help Kicken but maintaining two copies of the image in memory is not what i was seeking (if you strip your functions and look at the output, then you can see that the data is being copied into an array. Thus, two copies of the image are now in memory.) Also, i am not trying to build a hex viewer, so this hex dump is beyond my usage.

I'm not sure what you're talking about with storing two copies of the image.  None of my code does that, at worst the version with the hex dump stores one copy.  The version without the hex dump doesn't store anything, as shown by the low memory usage above.  The hex dump stuff was there purely as part of the example/demo to help see how the data is structured, of course it's not something you'd be putting into a final result so why complain about it?

Maybe you think it's storing multiple copies because the functions read data into an array and return it, which then gets copied into another array, but that's not how PHP works.  PHP uses something called copy-on-write, which means if you just assign a value to another variable then both variables refer to the same data in memory.  The data is only copied if you attempt to change one of the variables' values in some way.  Since my code never modifies the contents of those arrays after their initial creation, they are never duplicated and thus there's only one copy of them in memory.
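If you want to see copy-on-write for yourself, a small experiment along these lines should show it. This is only a sketch: it uses a large string rather than an array to keep it short (arrays behave the same way), the variable names are made up for the demo, and the exact byte counts will vary between PHP versions and builds.

```php
<?php
// Roughly 1 MB of data standing in for bytes read from a file.
$original = str_repeat('x', 1024 * 1024);

$before = memory_get_usage();

// Plain assignment: both variables refer to the same underlying data.
$copy = $original;
$afterAssign = memory_get_usage();

// Writing to one of them forces PHP to make a separate copy first.
$copy[0] = 'y';
$afterWrite = memory_get_usage();

printf("assignment grew memory by %d bytes\n", $afterAssign - $before);
printf("write grew memory by %d bytes\n", $afterWrite - $afterAssign);
```

The first figure should be close to zero and the second close to the size of the data, which is the behaviour described above: nothing is duplicated until one of the variables is modified.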

38 minutes ago, jodunno said:

Your code lacks all of this data when parsing an App1 segment.

Of course, my code was never intended to be a full parser, just a small demo for your benefit to show how one goes about reading and parsing the binary data of a file.  Since the file reference guide only showed details on the JFIF header, that's all I bothered to implement.  It is enough of a demo that you should be able to extrapolate from it how to parse whatever other structures you want.

43 minutes ago, jodunno said:

I am happy that i have written my own EXIF reader.

Congrats, it's always nice when you can finally see something working.

1 hour ago, kicken said:

Congrats, it's always nice when you can finally see something working.

well, Thank you, Kicken. I have worked very hard on this project. Honestly, i learned on my own how to do this reader/scanner. I did not use anything from your code. Your code does not read addresses and jump to their offsets. I had to think about it and i came up with a solution. I am proud of my work because i am not a programmer. I am very happy that i am reading and scanning a jpeg and accurately extracting data. I laugh because my Windows 10 does not show a date for the Blacksmith image but i have the datetime stamp from the exif data. I beat MS at this game, LOL. I am not arrogant or anything like that. I am surprised that i am able to do this. I still have to extract a few things, then i need to optimize the code etc. I started from scratch because i tried to upload an image that my Wife made using an old Motorola phone and your script almost crashed my Edge browser. I found out that the image has extra data after the EOI marker, so your code took almost 12 seconds to display the data. I decided that i need to build my own script from scratch. I needed the Motorola image to learn how the byte order works. I now process her image with no problems and even extract the exif data.

the scanner is meant to be used with my upload script. I made another video to show the script in action. I tested with several code injected images and tried various file violations to see how my code handles the situation.

anyway, i have learned a lot by reading exif metadata tutorials and thinking about how to extract this data. I finally learned how to unpack the various data types. I still need to learn how to unpack the cm resolution since it is different. I will do that eventually, my next step is getting the image dimensions, then processing the thumbnail data.

Best wishes, John
