

kpn320

Help with file_get_html


The following code gives the expected results:

 

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title>Generate Description and Tags</title>
    </head>
    <body>
        <?php
        include 'simple_html_dom.php';
        $entityURL = "http://google.com";
        echo "Entity URL: ", $entityURL,"<br>";
        $html = file_get_html($entityURL);
        $tot_html_str = $html->save();
        $tot_html_strlen = strlen($tot_html_str);
        echo "Strlen of total HTML: ",$tot_html_strlen,"<br>";
        ?>
    </body>
</html>

 

results in:

 

Entity URL: http://google.com

Strlen of total HTML: 7403

 

However, if I read the URL from a file, I am not able to scrape the HTML:

 

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title>Generate Description and Tags</title>
    </head>
    <body>
        <?php
        include 'simple_html_dom.php';
        $myFile = "/home/prakash/foo.txt";
        $src_handle = fopen($myFile, "r");
        $entityURL = fgets($src_handle, 4096);
        echo "Entity URL: ", $entityURL,"<br>";
        $html = file_get_html($entityURL);
        $tot_html_str = $html->save();
        $tot_html_strlen = strlen($tot_html_str);
        echo "Strlen of total HTML: ",$tot_html_strlen,"<br>";
        ?>
    </body>
</html>

 

results in:

 

Entity URL: http://google.com

Strlen of total HTML: 0

 

What do I need to do differently to read the URL from a file?


Can you not just use

file_get_contents("http://www.google.com");

to get the html code?

[EDIT]

I think I'm misunderstanding, actually. Don't worry.


Even when I change to use file_get_contents (instead of file_get_html), the result is the same.


file_get_contents will only work with a URL if allow_url_fopen is enabled in php.ini. A better alternative would be cURL.
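For reference, a cURL-based fetch could look roughly like the sketch below. The function name fetch_url_with_curl and the 10-second timeout are made up for illustration, and it assumes the cURL extension is installed. It is not part of simple_html_dom.

```php
<?php
// Hypothetical helper (not from simple_html_dom): fetch a page body with cURL,
// which does not depend on the allow_url_fopen setting.
function fetch_url_with_curl(string $url, int $timeoutSeconds = 10)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch); // string on success, false on failure
    if ($body === false) {
        // Surface the reason instead of silently returning an empty result.
        trigger_error('cURL error: ' . curl_error($ch), E_USER_WARNING);
    }
    return $body;
}

// Example usage (commented out so the sketch makes no network call by itself):
// var_dump(ini_get('allow_url_fopen'));   // shows whether URL wrappers are enabled
// $page = fetch_url_with_curl('http://google.com');
```

If the returned string is needed as a DOM object, simple_html_dom's str_get_html() can parse it.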

 

Edit: Wait, I misunderstood your problem.

 

Your problem might be caused by whitespace in the value returned by $entityURL = fgets($src_handle, 4096); because fgets keeps the trailing newline of the line it reads. Try $entityURL = trim(fgets($src_handle, 4096)); instead.
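To see the difference without touching the network, here is a small sketch. It assumes foo.txt holds the URL followed by a newline, which is how most editors save a one-line file, and it uses a temporary file as a stand-in for /home/prakash/foo.txt. The browser collapses the newline, so the echoed URL looks identical in both versions even though the strings differ, which is likely why the fetch comes back empty.

```php
<?php
// Stand-in for foo.txt (assumption: the URL is followed by a newline).
$path = tempnam(sys_get_temp_dir(), 'url');
file_put_contents($path, "http://google.com\n");

$handle = fopen($path, 'r');
$raw = fgets($handle, 4096);   // fgets keeps the trailing "\n"
fclose($handle);
unlink($path);

$entityURL = trim($raw);       // strip surrounding whitespace, including the newline

// var_dump shows the exact length and content, unlike echo in an HTML page.
var_dump($raw);
var_dump($entityURL);
var_dump($raw === $entityURL); // the two strings are not equal
```

In the original script, only the fgets line needs to change; the trimmed $entityURL can then be passed to file_get_html as before.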

