RohanH Posted October 15, 2021 Author
1 hour ago, Barand said:
    if (ctype_digit($sdata[0])) {    // does it look like a student number?
        $res[$kc]['students'][] = $sdata;
    }
If they want me to add the "N/A", "No Record" etc. entries to the same array along with the dates, what do we change here? And when they talk about a test, what could that test be? The task says: "Validates the output with a test, executable from command line along with the result."
Barand Posted October 15, 2021
Test if it's a student number or "N/A"?
    if (ctype_digit($sdata[0]) || $sdata[0]=='N/A') {
        $res[$kc]['students'][] = $sdata;
    }
If you want those with "AAVL", "CBSE" etc. as well, then
    if (!empty($sdata[0]) && $sdata[0] != '&nbsp;') {
        $res[$kc]['students'][] = $sdata;
    }
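To see how the three filters discussed so far differ, here is a minimal sketch that runs each of them over a few first-cell values. The `$firstCells` list is a made-up sample modelled on values seen in this thread, and it assumes blank cells arrive as the literal string `&nbsp;`.

```php
<?php
// Hypothetical sample of first-cell values (not the real table).
$firstCells = ['8751', 'N/A', 'AAVL', '&nbsp;', ''];

// Filter 1: numeric student numbers only.
$numeric = array_values(array_filter($firstCells, 'ctype_digit'));

// Filter 2: student numbers or "N/A".
$numericOrNa = array_values(array_filter($firstCells, function ($v) {
    return ctype_digit($v) || $v === 'N/A';
}));

// Filter 3: anything that is not empty and not a blank placeholder cell.
$nonBlank = array_values(array_filter($firstCells, function ($v) {
    return !empty($v) && $v !== '&nbsp;';
}));

echo json_encode($numeric), PHP_EOL;      // ["8751"]
echo json_encode($numericOrNa), PHP_EOL;  // ["8751","N\/A"]
echo json_encode($nonBlank), PHP_EOL;     // ["8751","N\/A","AAVL"]
```

One caveat with the third filter: `empty('0')` is true in PHP, so a student number of "0" would be dropped.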
RohanH Posted October 15, 2021 Author
12 minutes ago, Barand said: If you want those with "AAVL", "CBSE" etc as well, then
FINALLY!! Thank you... It works!!! 😀 Now, one last question: if I want to run a test on this (and also get the output on the command line), how can I do that?
gw1500se Posted October 15, 2021 (edited)
From the command line:
    <path to php.exe> -f "<path to script.php>" -- -arg1 -arg2 -arg3
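A script started this way can read its arguments from `$argv`. The sketch below is only an illustration: `parseFlags()` is a hypothetical helper, and the flag names are the placeholders from the command above.

```php
<?php
// Hypothetical helper: collect dash-prefixed arguments, skipping the script name.
function parseFlags(array $argv): array
{
    $flags = [];
    foreach (array_slice($argv, 1) as $arg) {
        if (strlen($arg) > 1 && $arg[0] === '-') {
            $flags[] = ltrim($arg, '-');
        }
    }
    return $flags;
}

// $argv is only populated when PHP runs from the command line.
$flags = parseFlags($argv ?? []);
echo 'Flags: ', implode(', ', $flags), PHP_EOL;
```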
RohanH Posted October 15, 2021 (edited) Author
1 hour ago, gw1500se said: From command line: <path to php.exe> -f "<path to script.php>" -- -arg1 -arg2 -arg3
But don't I need to write some script to perform the test on the output?
gw1500se Posted October 15, 2021
You want to process the output? Why not do that in PHP as well?
RohanH Posted October 15, 2021 Author
19 minutes ago, gw1500se said: You want to process the output? Why not do that in PHP as well?
Yes, that is basically what I intend to do, but I am not sure how! The final task says I need to validate the output with a test.
gw1500se Posted October 15, 2021
Only you can tell what output is valid. You need to tell us what criteria you want to use. What happened to the json part you wanted?
RohanH Posted October 15, 2021 (edited) Author
21 minutes ago, gw1500se said: Only you can tell what output is valid. You need to tell us what criteria you want to use. What happened to the json part you wanted?
{"0":{"date":"May29Tue","student_details":"N\/A"},"1":{"date":"May30Wed","student_details":[{"student_id":"AAVL","report_time":"06:00","leaving_time":"14:00","office":"null","start_time":"null","destination":"null"}]},"2":{"date":"May31Thu","student_details":[{"student_id":"8751","report_time":"03:55","leaving_time":"04:55","office":"WFH","start_time":"COMP AVL","destination":"08:00"},{"student_id":"8752","report_time":"08:35","leaving_time":"COMP AVL","office":"WFH","start_time":"11:55","destination":"12:25"}]},"3":{"date":"Jun01Fri","student_details":[{"student_id":"8462","report_time":"04:30","leaving_time":"05:30","office":"WFH","start_time":"COMP NOT AVL","destination":"07:10"},{"student_id":"8465","report_time":"07:45","leaving_time":"COMP NOT
AVL","office":"WFH","start_time":"09:20","destination":"09:50"}]},"4":{"date":"Jun02Sat","student_details":[{"student_id":"CBSE","report_time":"02:00","leaving_time":"10:00","office":"null","start_time":"null","destination":"null"}]},"5":{"date":"Jun03Sun","student_details":"N\/A"},"6":{"date":"Jun04Mon","student_details":"N\/A"},"7":{"date":"Jun05Tue","student_details":"N\/A"},"8":{"date":"Jun06Wed","student_details":"N\/A"},"9":{"date":"Jun07Thu","student_details":"N\/A"},"10":{"date":"Jun08Fri","student_details":[{"student_id":"8113","report_time":"05:05","leaving_time":"06:05","office":"WFH","start_time":"ZRH","destination":"07:50"},{"student_id":"8114","report_time":"08:25","leaving_time":"ZRH","office":"WFH","start_time":"10:10","destination":"null"},{"student_id":"8277","report_time":"11:05","leaving_time":"WFH","office":"MAD","start_time":"13:40","destination":"14:10"}]},"11":{"date":"Jun09Sat","student_details":[{"student_id":"8274","report_time":"04:00","leaving_time":"05:00","office":"MAD","start_time":"WFH","destination":"07:25"},{"student_id":"8221","report_time":"08:10","leaving_time":"WFH","office":"VLC","start_time":"10:30","destination":"null"},{"student_id":"8222","report_time":"11:05","leaving_time":"VLC","office":"WFH","start_time":"14:00","destination":"14:30"}]},"12":{"date":"Jun10Sun","student_details":"N\/A"},"13":{"date":"Jun11Mon","student_details":"N\/A"},"14":{"date":"Jun12Tue","student_details":[{"student_id":"AAVL","report_time":"05:15","leaving_time":"13:15","office":"null","start_time":"null","destination":"null"}]},"15":{"date":"Jun13Wed","student_details":[{"student_id":"8973","report_time":"04:05","leaving_time":"05:05","office":"WFH","start_time":"SOF","destination":"08:05"},{"student_id":"8974","report_time":"08:50","leaving_time":"SOF","office":"WFH","start_time":"12:10","destination":"12:40"}]},"16":{"date":"Jun14Thu","student_details":[{"student_id":"ADTY","report_time":"09:30","leaving_time":"16:30","office":"null","start_tim
e":"null","destination":"null"}]},"17":{"date":"Jun15Fri","student_details":[{"student_id":"8233","report_time":"12:25","leaving_time":"13:25","office":"WFH","start_time":"SSP","destination":"15:40"},{"student_id":"8237","report_time":"16:10","leaving_time":"SSP","office":"WFH","start_time":"18:25","destination":"18:55"}]},"18":{"date":"Jun16Sat","student_details":"N\/A"},"19":{"date":"Jun17Sun","student_details":"N\/A"},"20":{"date":"Jun18Mon","student_details":[{"student_id":"807","report_time":"11:35","leaving_time":"12:35","office":"WFH","start_time":"OMV","destination":"14:10"},{"student_id":"808","report_time":"14:35","leaving_time":"OMV","office":"WFH","start_time":"16:15","destination":"null"},{"student_id":"837","report_time":"16:50","leaving_time":"WFH","office":"BFS","start_time":"18:25","destination":"null"},{"student_id":"840","report_time":"18:55","leaving_time":"BFS","office":"WFH","start_time":"20:25","destination":"20:55"}]},"21":{"date":"Jun19Tue","student_details":[{"student_id":"8551","report_time":"10:50","leaving_time":"11:50","office":"WFH","start_time":"MJV","destination":"14:30"},{"student_id":"8552","report_time":"15:00","leaving_time":"null","office":"WFH","start_time":"17:40","destination":"null"},{"student_id":"8187","report_time":"18:55","leaving_time":"WFH","office":"LIN","start_time":"20:50","destination":"21:20"}]},"23":{"date":"N\/A","student_details":[{"student_id":"06:00","report_time":"14:00","leaving_time":"null","office":"null","start_time":"null","destination":"null"}]},"24":{"date":"AAVL","student_details":[{"student_id":"03:55","report_time":"04:55","leaving_time":"WFH","office":"COMP AVL","start_time":"08:00","destination":"null"},{"student_id":"08:35","report_time":"COMP AVL","leaving_time":"WFH","office":"11:55","start_time":"12:25","destination":"null"}]},"25":{"date":"8751","student_details":[{"student_id":"04:30","report_time":"05:30","leaving_time":"WFH","office":"COMP NOT 
AVL","start_time":"07:10","destination":"null"},{"student_id":"07:45","report_time":"COMP NOT AVL","leaving_time":"null","office":"09:20","start_time":"09:50","destination":"null"}]},"26":{"date":"8462","student_details":[{"student_id":"02:00","report_time":"10:00","leaving_time":"null","office":"null","start_time":"null","destination":"null"}]}}
This is the output, which I converted to JSON because that is the format they want. No criteria are defined for the test; the task simply says that the output must be validated by a test.
gw1500se Posted October 15, 2021
Without criteria there can be no test.
RohanH Posted October 15, 2021 Author
9 minutes ago, gw1500se said: Without criteria there can be no test.
What could the criteria be for testing the JSON output in this case? Can you suggest some, so that I am aware of them, and maybe show an example too?
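As one possible answer, here is a sketch of a command-line check. The criteria are assumptions chosen for illustration, not requirements from the task: every date should look like "May31Thu", `student_details` should be either "N/A" or a list, and `report_time` and `leaving_time` should be HH:MM or the string "null". The function name `validateOutput()` is hypothetical, and the sample entries are copied from the JSON posted above.

```php
<?php
// Sketch only: these criteria are assumed, the task does not define any.
function validateOutput(string $json): array
{
    $days = json_decode($json, true);
    if (!is_array($days)) {
        return ['Output is not valid JSON'];
    }
    $errors = [];
    foreach ($days as $i => $day) {
        $date = is_array($day) ? (string) ($day['date'] ?? '') : '';
        // Dates in the thread look like "May31Thu".
        if (!preg_match('/^[A-Z][a-z]{2}\d{2}[A-Z][a-z]{2}$/', $date)) {
            $errors[] = "Entry $i: unexpected date '$date'";
        }
        $details = is_array($day) ? ($day['student_details'] ?? null) : null;
        if ($details === 'N/A') {
            continue; // a day with no students
        }
        if (!is_array($details)) {
            $errors[] = "Entry $i: student_details is neither a list nor 'N/A'";
            continue;
        }
        foreach ($details as $n => $student) {
            foreach (['report_time', 'leaving_time'] as $field) {
                $value = is_array($student) ? (string) ($student[$field] ?? '') : '';
                if ($value !== 'null' && !preg_match('/^\d{2}:\d{2}$/', $value)) {
                    $errors[] = "Entry $i, student $n: $field '$value' is not HH:MM";
                }
            }
        }
    }
    return $errors;
}

// Three entries taken from the output posted above (the first shortened to one student).
$sample = <<<'JSON'
{"2":{"date":"May31Thu","student_details":[{"student_id":"8751","report_time":"03:55","leaving_time":"04:55","office":"WFH","start_time":"COMP AVL","destination":"08:00"}]},
 "5":{"date":"Jun03Sun","student_details":"N\/A"},
 "24":{"date":"AAVL","student_details":[{"student_id":"03:55","report_time":"04:55","leaving_time":"WFH","office":"COMP AVL","start_time":"08:00","destination":"null"}]}}
JSON;

$errors = validateOutput($sample);
echo $errors ? implode(PHP_EOL, $errors) : 'OK', PHP_EOL;
```

On this sample the check flags entry 24 twice, once for the date "AAVL" and once for the leaving time "WFH". A real test script would more likely read the JSON from a file named on the command line and return a non-zero exit code when the error list is not empty.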
Barand Posted October 15, 2021
    [{"student_id":"AAVL","report_time":"06:00","leaving_time":"14:00", ...
       ^^^^^^^^^^          ^^^^^^^^^^^           ^^^^^^^^^^^^
How are you getting those field names into the data? There is no consistent pattern in the data blocks and no labels. For example
    [date] => May31Thu
    [students] => Array
        (
            [0] => Array
                (
                    [0] => 8751          Student
                    [1] => 03:55         2 times
                    [2] => 04:55
                    [3] => WFH           2 texts
                    [4] => COMP AVL
                    [5] => 08:00         1 time
                )
            [1] => Array
                (
                    [0] => 8752          Student
                    [1] => 08:35         1 time
                    [2] => COMP AVL      2 texts
                    [3] => WFH
                    [4] => 11:55         2 times
                    [5] => 12:25
                )
        )
RohanH Posted October 15, 2021 Author
On 10/14/2021 at 10:47 PM, RohanH said: 9th of June Student Id:8274 Report Time:4:00 Leaving Time:5:00 OFFICE:MAD Start Time:7:00 DESTINATION:WFH
So, the sequence was given as an example; I am just using that as a reference.
    for ($i = 1; $i <= 4; $i++) {
        foreach ($results[$i] as $key => $sdata) {
            if (!empty($sdata[0]) && $sdata[0] != '&nbsp;') {
                if ($sdata[0] == 'N/A') {
                    $res[$key]['student_details'] = $sdata[0];
                } else {
                    foreach ($sdata as $rkey => $rdata) {
                        // blank cells become the string "null"
                        $replace_nbsp = str_replace("&nbsp;", "null", $rdata);
                        if ($rkey == 0) {
                            $new_arr['student_id'] = ($rdata != '&nbsp;') ? $rdata : $replace_nbsp;
                        } else if ($rkey == 1) {
                            $new_arr['report_time'] = ($rdata != '&nbsp;') ? $rdata : $replace_nbsp;
                        } else if ($rkey == 2) {
                            $new_arr['leaving_time'] = ($rdata != '&nbsp;') ? $rdata : $replace_nbsp;
                        } else if ($rkey == 3) {
                            $new_arr['office'] = ($rdata != '&nbsp;') ? $rdata : $replace_nbsp;
                        } else if ($rkey == 4) {
                            $new_arr['start_time'] = ($rdata != '&nbsp;') ? $rdata : $replace_nbsp;
                        } else if ($rkey == 5) {
                            $new_arr['destination'] = ($rdata != '&nbsp;') ? $rdata : $replace_nbsp;
                        }
                    }
                    $res[$key]['student_details'][] = $new_arr;
                }
            }
        }
    }
Barand Posted October 15, 2021
The point is you can't apply the same set of field names, because the data appears to be randomly ordered for each block of students. Where there is the same text (like "COMP AVL"), sometimes it's the third item, sometimes it's the fifth. (Why do you think you got the Award Certificate?) See the example in my previous post.
RohanH Posted October 15, 2021 (edited) Author
6 minutes ago, Barand said: The point is you can't apply the same set of field names because the data appears to be randomly ordered for each block of students. Where they the same text (like "COMP AVL") sometimes it's the third item, sometimes it's the fifth. (Why do you think you got the Award Certificate?) See example in my previous post.
I just realized that. I also realized that, because there is no identifiable pattern, no matter what we do we cannot get the field names and data organized. They got the table so messed up! I am not sure how they want us to figure out which field name belongs to which data. Do you suggest I submit the task without the field names? From what I can see, the data is ultimately going to be incorrect under those field names. Also, they want us to perform a test; what test can we really perform on this JSON data model?
Barand Posted October 15, 2021
The only way I can see of doing it is to supply the labels to be applied when you define the data blocks. For example
    [ 'start'  => [ 'r'=>6, 'c'=>0 ],
      'end'    => [ 'r'=>11, 'c'=>21 ],
      'labels' => [ 'student_id', 'report_time', 'leaving-time', 'office', 'comp_avl', 'first_race']
    ]
4 minutes ago, RohanH said: And just realized that.
That's strange. It's one of the things I pointed out right at the start - I said there was no consistency in the data.
gw1500se Posted October 15, 2021
Sounds to me like you need to back up and rewrite the original data schema. Where is the data coming from?
Barand Posted October 15, 2021
I'd assumed they'd given someone 15 minutes html training and told them to construct the table of student data manually.
RohanH Posted October 15, 2021 Author
10 minutes ago, Barand said: The only way I can see of doing it is to supply the labels to be applied when you define the data blocks
Won't it still be a problem to detect the data pattern and assign the field names?
Barand Posted October 15, 2021
The label arrays define the data pattern for each block.
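A small sketch of what that means in practice: the two rows below are the May31Thu students quoted earlier in the thread, and the two label lists are the per-block orders suggested for them. With `array_combine()`, the same text ends up under the same key even though it sits at a different position in each row.

```php
<?php
// Rows copied from the May31Thu example earlier in this thread.
$rowA = ['8751', '03:55', '04:55', 'WFH', 'COMP AVL', '08:00'];
$rowB = ['8752', '08:35', 'COMP AVL', 'WFH', '11:55', '12:25'];

// One label list per block, each in that block's own column order.
$labelsA = ['student_id', 'report_time', 'leaving-time', 'office', 'other', 'first_race'];
$labelsB = ['student_id', 'report_time', 'other', 'office', 'leaving-time', 'first_race'];

$a = array_combine($labelsA, $rowA);
$b = array_combine($labelsB, $rowB);

echo $a['other'], ' / ', $b['other'], PHP_EOL;                // COMP AVL / COMP AVL
echo $a['leaving-time'], ' / ', $b['leaving-time'], PHP_EOL;  // 04:55 / 11:55
```

Note that `array_combine()` requires both arrays to have the same number of elements, so each label list must match the number of rows in its block.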
RohanH Posted October 15, 2021 Author
13 minutes ago, gw1500se said: Sounds to me like you need to backup and rewrite the original data schema. From where is the data coming?
No clue, they just sent me the zip over Skype, and it contained this broken HTML code.
RohanH Posted October 15, 2021 Author
1 minute ago, Barand said: The label arrays define the data pattern for each block
Okay, and given the random order of the data in our table, will it still work?
Barand Posted October 15, 2021
The proof of the pudding is in the eating... (The array_combine() call in getColumns(), marked with a comment, applies the labels to the student data.)

    $html = file_get_contents('c:/inetpub/wwwroot/test/doc1355/rohan.html');      // the table html

    $ranges = [
        [ 'start'  => [ 'r'=>5, 'c'=>0 ],          // DATES
          'end'    => [ 'r'=>5, 'c'=>21 ],
          'labels' => ['date']
        ],
        [ 'start'  => [ 'r'=>6, 'c'=>0 ],
          'end'    => [ 'r'=>11, 'c'=>21 ],
          'labels' => [ 'student_id', 'report_time', 'leaving-time', 'office', 'other', 'first_race']
        ],
        [ 'start'  => [ 'r'=>13, 'c'=>0 ],
          'end'    => [ 'r'=>18, 'c'=>21 ],
          'labels' => [ 'student_id', 'report_time', 'other', 'office', 'leaving-time', 'first_race']
        ],
        [ 'start'  => [ 'r'=>19, 'c'=>0 ],
          'end'    => [ 'r'=>24, 'c'=>21 ],
          'labels' => [ 'student_id', 'report_time', 'other', 'office', 'leaving-time', 'first_race']
        ],
        [ 'start'  => [ 'r'=>25, 'c'=>0 ],
          'end'    => [ 'r'=>30, 'c'=>21 ],
          'labels' => [ 'student_id', 'report_time', 'other', 'office', 'leaving-time', 'first_race']
        ]
    ];

    $results = [];
    foreach ($ranges as $range) {
        $results[] = getColumns($html, $range);
    }
    $results_by_date = getResultsByDate($results);

    function getResultsByDate($results)
    {
        $res = [];
        // the first range holds the dates, one per column
        foreach ($results[0] as $kc => $date) {
            $res[$kc] = [ 'date' => $date['date'], 'students' => [] ];
        }
        // the remaining four ranges hold the student blocks
        for ($i=1; $i<=4; $i++) {
            foreach ($results[$i] as $kc => $sdata) {
                #if (ctype_digit($sdata[0]) || $sdata[0]=='N/A') {
                if (!empty($sdata['student_id']) && $sdata['student_id'] != '&nbsp;') {
                    $res[$kc]['students'][] = $sdata;
                }
            }
        }
        // remove dates with no students
        $res = array_filter($res, function($v) { return !empty($v['students']); });
        return $res;
    }

    function getColumns(&$html, $range)
    {
        $rows = [];
        $kr = 0;
        $p1 = 0;
        // find first row in our range
        for ($r=0; $r<=$range['start']['r']; $r++) {
            $p1 = strpos($html, '<tr', $p1);
            ++$p1;
        }
        $p1--;
        for ($kr=$range['start']['r']; $kr<=$range['end']['r']; $kr++) {
            $rows[$kr] = getCells($html, $range, $p1);
            $p1 = strpos($html, '<tr', $p1+1);
        }
        $cols = [];
        for ($kc=$range['start']['c']; $kc<=$range['end']['c']; $kc++) {
            $cols[] = array_combine($range['labels'], array_column($rows, $kc));      // ASSIGN THE LABELS TO THE DATA
        }
        return $cols;
    }

    function getCells(&$html, $range, $p1)
    {
        $cells = [];
        for ($kc=$range['start']['c']; $kc<=$range['end']['c']; $kc++) {
            $p1 = strpos($html, '<td', $p1+1);      // start of the cell tag
            $p1 = strpos($html, '>', $p1+1);        // end of the cell tag
            $p2 = strpos($html, '<td', $p1);        // start of the next cell
            $cells[$kc] = trim(strip_tags(substr($html, $p1+1, $p2-$p1-1)));
        }
        return $cells;
    }

    echo '<pre>' . print_r($results_by_date, 1) . '</pre>';

Gives

    Array
    (
        [0] => Array
            (
                [date] => May29Tue
                [students] => Array
                    (
                        [0] => Array
                            (
                                [student_id] => N/A
                                [report_time] =>
                                [leaving-time] =>
                                [office] =>
                                [other] =>
                                [first_race] =>
                            )
                    )
            )
        [1] => Array
            (
                [date] => May30Wed
                [students] => Array
                    (
                        [0] => Array
                            (
                                [student_id] => AAVL
                                [report_time] => 06:00
                                [leaving-time] => 14:00
                                [office] =>
                                [other] =>
                                [first_race] =>
                            )
                    )
            )
        [2] => Array
            (
                [date] => May31Thu
                [students] => Array
                    (
                        [0] => Array
                            (
                                [student_id] => 8751
                                [report_time] => 03:55
                                [leaving-time] => 04:55
                                [office] => WFH
                                [other] => COMP AVL
                                [first_race] => 08:00
                            )
                        [1] => Array
                            (
                                [student_id] => 8752
                                [report_time] => 08:35
                                [other] => COMP AVL
                                [office] => WFH
                                [leaving-time] => 11:55
                                [first_race] => 12:25
                            )
                    )
            )
        ... etc ...
or if you want the json output
    echo json_encode($results_by_date);
[{"date":"May29Tue","students":[{"student_id":"N\/A","report_time":" ","leaving-time":" ","office":" ","other":" ","first_race":" "}]},{"date":"May30Wed","students":[{"student_id":"AAVL","report_time":"06:00","leaving-time":"14:00","office":" ","other":" ","first_race":" "}]},{"date":"May31Thu","students":[{"student_id":"8751","report_time":"03:55","leaving-time":"04:55","office":"WFH","other":"COMP AVL","first_race":"08:00"},{"student_id":"8752","report_time":"08:35","other":"COMP AVL","office":"WFH","leaving-time":"11:55","first_race":"12:25"}]},{"date":"Jun01Fri","students":[{"student_id":"8462","report_time":"04:30","leaving-time":"05:30","office":"WFH","other":"COMP NOT AVL","first_race":"07:10"},{"student_id":"8465","report_time":"07:45","other":"COMP NOT AVL","office":"WFH","leaving-time":"09:20","first_race":"09:50"}]},{"date":"Jun02Sat","students":[{"student_id":"CBSE","report_time":"02:00","leaving-time":"10:00","office":" ","other":" ","first_race":" "}]},{"date":"Jun03Sun","students":[{"student_id":"N\/A","report_time":" ","leaving-time":" ","office":" ","other":" ","first_race":" "}]},{"date":"Jun04Mon","students":[{"student_id":"N\/A","report_time":" ","leaving-time":" ","office":" ","other":" ","first_race":" "}]},{"date":"Jun05Tue","students":[{"student_id":"N\/A","report_time":" ","leaving-time":" ","office":" ","other":" ","first_race":" "}]},{"date":"Jun06Wed","students":[{"student_id":"N\/A","report_time":" ","leaving-time":" ","office":" ","other":" ","first_race":" "}]},{"date":"Jun07Thu","students":[{"student_id":"N\/A","report_time":" ","leaving-time":" ","office":" ","other":" ","first_race":" "}]},{"date":"Jun08Fri","students":[{"student_id":"8113","report_time":"05:05","leaving-time":"06:05","office":"WFH","other":"ZRH","first_race":"07:50"},{"student_id":"8114","report_time":"08:25","other":"ZRH","office":"WFH","leaving-time":"10:10","first_race":" "},{"student_id":"8277","report_time":"11:05","other":"WFH","office":"MAD","leaving-time":"13:40","first_race":"14:10"}]},{"date":"Jun09Sat","students":[{"student_id":"8274","report_time":"04:00","leaving-time":"05:00","office":"MAD","other":"WFH","first_race":"07:25"},{"student_id":"8221","report_time":"08:10","other":"WFH","office":"VLC","leaving-time":"10:30","first_race":" "},{"student_id":"8222","report_time":"11:05","other":"VLC","office":"WFH","leaving-time":"14:00","first_race":"14:30"}]},{"date":"Jun10Sun","students":[{"student_id":"N\/A","report_time":" ","leaving-time":" ","office":" ","other":" ","first_race":" "}]},{"date":"Jun11Mon","students":[{"student_id":"N\/A","report_time":" ","leaving-time":" ","office":" ","other":" ","first_race":" "}]},{"date":"Jun12Tue","students":[{"student_id":"AAVL","report_time":"05:15","leaving-time":"13:15","office":" ","other":" ","first_race":" "}]},{"date":"Jun13Wed","students":[{"student_id":"8973","report_time":"04:05","leaving-time":"05:05","office":"WFH","other":"SOF","first_race":"08:05"},{"student_id":"8974","report_time":"08:50","other":"SOF","office":"WFH","leaving-time":"12:10","first_race":"12:40"}]},{"date":"Jun14Thu","students":[{"student_id":"ADTY","report_time":"09:30","leaving-time":"16:30","office":" ","other":" ","first_race":" "}]},{"date":"Jun15Fri","students":[{"student_id":"8233","report_time":"12:25","leaving-time":"13:25","office":"WFH","other":"SSP","first_race":"15:40"},{"student_id":"8237","report_time":"16:10","other":"SSP","office":"WFH","leaving-time":"18:25","first_race":"18:55"}]},{"date":"Jun16Sat","students":[{"student_id":"N\/A","report_time":" ","leaving-time":" ","office":" ","other":" ","first_race":" "}]},{"date":"Jun17Sun","students":[{"student_id":"N\/A","report_time":" ","leaving-time":" ","office":" ","other":" ","first_race":" "}]},{"date":"Jun18Mon","students":[{"student_id":"807","report_time":"11:35","leaving-time":"12:35","office":"WFH","other":"OMV","first_race":"14:10"},{"student_id":"808","report_time":"14:35","other":"OMV","office":"WFH","leaving-time":"16:15","first_race":" "},{"student_id":"837","report_time":"16:50","other":"WFH","office":"BFS","leaving-time":"18:25","first_race":" "},{"student_id":"840","report_time":"18:55","other":"BFS","office":"WFH","leaving-time":"20:25","first_race":"20:55"}]},{"date":"Jun19Tue","students":[{"student_id":"8551","report_time":"10:50","leaving-time":"11:50","office":"WFH","other":"MJV","first_race":"14:30"},{"student_id":"8552","report_time":"15:00","other":" ","office":"WFH","leaving-time":"17:40","first_race":" "},{"student_id":"8187","report_time":"18:55","other":"WFH","office":"LIN","leaving-time":"20:50","first_race":"21:20"}]}]
Barand Posted October 15, 2021 (edited)
1 hour ago, RohanH said: okay, and as per the random order of data in our table will it still work?
As far as I can see, they appear consistent within each block. But there is no way of knowing - if the times were reversed, who could know which was which? The best processor for this table is a document shredder.
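One rough way to check that kind of consistency is to reduce each student row to a "shape" and compare shapes within a block. This is only a sketch: `rowShape()` is a hypothetical helper, and it assumes blank cells are either empty or the literal string `&nbsp;`. It can show that two rows put their times and texts in different positions, but, as noted above, it cannot tell two swapped times apart.

```php
<?php
// Hypothetical helper: T = looks like a time, X = other text, - = blank cell.
function rowShape(array $row): string
{
    $shape = '';
    foreach ($row as $value) {
        $value = trim($value);
        if ($value === '' || $value === '&nbsp;') {
            $shape .= '-';
        } elseif (preg_match('/^\d{1,2}:\d{2}$/', $value)) {
            $shape .= 'T';
        } else {
            $shape .= 'X';
        }
    }
    return $shape;
}

// The two May31Thu rows quoted earlier in the thread.
$first  = ['8751', '03:55', '04:55', 'WFH', 'COMP AVL', '08:00'];
$second = ['8752', '08:35', 'COMP AVL', 'WFH', '11:55', '12:25'];

echo rowShape($first), PHP_EOL;   // XTTXXT
echo rowShape($second), PHP_EOL;  // XTXXTT
```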
RohanH Posted October 16, 2021 Author
7 hours ago, Barand said: [{"date":"May29Tue","students":[{"student_id":"N\/A","report_time":" ","leaving-time":" ","office":" ","other":" ","first_race":" "}]}, ...
And again, what if we want to test this JSON?