
Web Scraping - Unstructured html table


RohanH
Solved by Barand


One way to test it would be to work backwards from the JSON: decode the JSON string and recreate a table from its data, so you can visually compare the results with a display or print of the original. It would be too tricky to recreate something as bad as the original, so go for something simpler, say

[image: a simpler table rebuilt from the JSON data]
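A minimal PHP sketch of that idea, assuming the decoded JSON is a list of days shaped like ['date' => ..., 'students' => [[...], ...]] (the shape used in the CLI script later in this thread). The function name renderTable and the sample data are hypothetical; it writes one header row per date and one table row per student.

```php
<?php
// Minimal sketch: rebuild a simple HTML table from the decoded JSON so it can be
// compared by eye with the original page.
// ASSUMPTION: the JSON is a list of days shaped like
//   ['date' => string, 'students' => [[value, value, ...], ...]]
// renderTable() and the sample data below are hypothetical.

function renderTable(array $days): string
{
    $html = "<table border='1'>\n";
    foreach ($days as $day) {
        // the widest student row decides how many columns the date header spans
        $width = 1;
        foreach ($day['students'] as $student) {
            $width = max($width, count($student));
        }
        $html .= sprintf("<tr><th colspan='%d'>%s</th></tr>\n", $width, htmlspecialchars($day['date']));

        // one table row per student, one cell per value
        foreach ($day['students'] as $student) {
            $html .= '<tr>';
            foreach ($student as $val) {
                $html .= '<td>' . htmlspecialchars((string) $val) . '</td>';
            }
            $html .= "</tr>\n";
        }
    }
    return $html . "</table>\n";
}

// hypothetical sample JSON
$json = '[{"date":"Monday","students":[["Student A","Task 1","Done"],["Student B","Task 2"]]}]';
$days = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
echo renderTable($days);
```

Redirecting the output to a file (for example php rebuild.php > check.html, file names hypothetical) lets you open it in a browser next to the original page.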

If you want to test it in CLI mode, then, unless you want the extra fun of drawing grids on the screen, you could go for a simple column display, such as

[image: the same data shown as a simple column display]


54 minutes ago, Barand said:

you could go for a simple column display

How do we do that: with assertions, or just a simple try/catch? (What I mean is: how do we tell that the test failed?)
Another question: to run the test, do we write a test function (using the foreach loops to print the column values) and then run the file from the command line with

php test.php

?


52 minutes ago, RohanH said:

(What I mean is: how do we tell that the test failed?)

It fails if the output values from the new test script don't match the original table's values. For example, the output from my test script, built from the JSON data, looks like this (which can easily be compared with the original)...

[image: output of the test script, built from the JSON data]

You aren't compelled to use my method. Feel free to think up your own.
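If you want the check to be automatic rather than visual, neither assertions nor try/catch is strictly needed: compare the decoded JSON with values copied by hand from the original table, print any mismatches, and exit with a non-zero status. A minimal sketch, assuming the same ['date' => ..., 'students' => [...]] shape; findMismatches and both data sets are hypothetical.

```php
<?php
// Minimal sketch of an automated "does the JSON match the original table?" check.
// ASSUMPTIONS: the JSON is a list of days with 'date' and 'students' keys;
// $expected is typed in by hand from the original HTML table.
// findMismatches() and both data sets below are hypothetical.

/** Return human-readable mismatches; an empty array means the test passed. */
function findMismatches(array $expected, array $actual): array
{
    $errors = [];
    if (count($expected) !== count($actual)) {
        $errors[] = sprintf('expected %d days, got %d', count($expected), count($actual));
    }
    foreach ($expected as $i => $day) {
        if (!isset($actual[$i])) {
            $errors[] = "day $i missing";
            continue;
        }
        if ($day['date'] !== $actual[$i]['date']) {
            $errors[] = sprintf("day %d: expected date '%s', got '%s'", $i, $day['date'], $actual[$i]['date']);
        }
        // strict comparison: same values, same order, same types
        if ($day['students'] !== $actual[$i]['students']) {
            $errors[] = sprintf('day %d (%s): student data differs', $i, $day['date']);
        }
    }
    return $errors;
}

$expected = [
    ['date' => 'Monday', 'students' => [['Student A', 'Task 1'], ['Student B', 'Task 2']]],
];
$actual = json_decode(
    '[{"date":"Monday","students":[["Student A","Task 1"],["Student B","Task 2"]]}]',
    true, 512, JSON_THROW_ON_ERROR
);

$errors = findMismatches($expected, $actual);
if ($errors) {
    // a non-zero exit status is how the shell (or a CI job) knows the test failed
    fwrite(STDERR, implode(PHP_EOL, $errors) . PHP_EOL);
    exit(1);
}
echo 'OK: JSON matches the original table' . PHP_EOL;
```

Run it with php test.php; after a failure, echo $? (or echo %ERRORLEVEL% on Windows) shows 1.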


4 hours ago, Barand said:

simple column display

I want to display the columns in the CLI but I can't get it right!
This is what I tried in order to print them in the CLI:

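// $results_by_date: one element per day, ['date' => ..., 'students' => [...]]
foreach($results_by_date as $name => $work)
{
    foreach($work as $kdate => $date)
    {
        if($kdate === 'date'){
            echo $date . "\n";                  // the day's date
        }else{
            // 'students': one array of values per student
            foreach($date as $kdata => $date_data){
                foreach($date_data as $kdatakey => $data){
                    echo " " . $data . "\n";    // one value per line
                }
            }
        }
    }
}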


Solution

The data looks OK. It could do with some separation between the blocks of student data to make it easier to read. Perhaps...

foreach($results_by_date as $name => $work)
{
    foreach($work as $kdate => $date)
    {
        if($kdate === 'date'){
            echo str_repeat('-', 20) . "\n";    // divider line before each day
            echo $date . "\n\n";
        }else{
            foreach($date as $kdata => $date_data){
                foreach($date_data as $kdatakey => $data){
                    echo " " . $data . "\n";
                }
                echo " \n";                     // blank line after each student's block
            }
        }
    }
}

FYI, this is my version for CLI output:

// JSON from previous script saved in file
$results = json_decode(file_get_contents('c:/inetpub/wwwroot/test/doc1355/rohan.json'), true);

$pad = str_repeat(' ', 8);                      // indent for the student values
$divs = $pad . str_repeat('-', 12) . PHP_EOL;   // divider between students
$divd = str_repeat('-', 20) . PHP_EOL;          // divider between days

echo '<pre>';  // test only - not required in CLI mode

foreach ($results as $day) {
    echo $day['date'] . PHP_EOL . PHP_EOL;
    foreach ($day['students'] as $k => $sdata) {
        if ($k) echo $divs;                     // no divider before the first student
        foreach ($sdata as $val) {
            echo $pad . $val . PHP_EOL;
        }
    }
    echo $divd;
}
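A possible variant of the same column layout that returns the text instead of echoing it, so the output can be saved to a file or compared in a test. This is a sketch, not the code from the thread; formatColumns, its $indent parameter and the sample data are hypothetical, and the JSON shape is assumed to be the same as above.

```php
<?php
// Sketch: same layout as the CLI script above, but built as a string.
// ASSUMPTION: $days has the shape [['date' => ..., 'students' => [[...], ...]], ...].
// formatColumns() and the sample data are hypothetical.

function formatColumns(array $days, int $indent = 8): string
{
    $pad = str_repeat(' ', $indent);
    $studentDivider = $pad . str_repeat('-', 12) . PHP_EOL; // between students
    $dayDivider = str_repeat('-', 20) . PHP_EOL;            // after each day
    $out = '';
    foreach ($days as $day) {
        $out .= $day['date'] . PHP_EOL . PHP_EOL;
        foreach ($day['students'] as $k => $student) {
            if ($k > 0) {
                $out .= $studentDivider; // no divider before the first student
            }
            foreach ($student as $val) {
                $out .= $pad . $val . PHP_EOL;
            }
        }
        $out .= $dayDivider;
    }
    return $out;
}

echo formatColumns([
    ['date' => 'Monday', 'students' => [['Student A', 'Task 1'], ['Student B', 'Task 2']]],
]);
```

Because the result is a plain string, a test can compare it against a saved copy of known-good output instead of relying on a visual check.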
