kpn320 Posted November 27, 2009

The following code gives the expected results:

    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Generate Description and Tags</title>
    </head>
    <body>
    <?php
    include 'simple_html_dom.php';
    $entityURL = "http://google.com";
    echo "Entity URL: ", $entityURL,"<br>";
    $html = file_get_html($entityURL);
    $tot_html_str = $html->save();
    $tot_html_strlen = strlen($tot_html_str);
    echo "Strlen of total HTML: ",$tot_html_strlen,"<br>";
    ?>
    </body>
    </html>

results in:

    Entity URL: http://google.com
    Strlen of total HTML: 7403

However, if I read the URL from a file, I am not able to scrape the HTML:

    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Generate Description and Tags</title>
    </head>
    <body>
    <?php
    include 'simple_html_dom.php';
    $myFile = "/home/prakash/foo.txt";
    $src_handle = fopen($myFile, "r");
    $entityURL = fgets($src_handle, 4096);
    echo "Entity URL: ", $entityURL,"<br>";
    $html = file_get_html($entityURL);
    $tot_html_str = $html->save();
    $tot_html_strlen = strlen($tot_html_str);
    echo "Strlen of total HTML: ",$tot_html_strlen,"<br>";
    ?>
    </body>
    </html>

results in:

    Entity URL: http://google.com
    Strlen of total HTML: 0

What do I need to do differently to read the URL from a file?
plznty Posted November 27, 2009 Can you not just use file_get_contents("http://www.google.com"); to get the HTML code? [EDIT] I think I'm misunderstanding, actually. Don't worry.
kpn320 Posted November 27, 2009 Author Even when I change to use file_get_contents (instead of file_get_html), the result is the same.
Alex Posted November 27, 2009 file_get_contents will only work in that context if the fopen wrappers are enabled (the allow_url_fopen setting). A better alternative would be cURL. Edit: Wait, I misunderstood your problem. It might be caused by whitespace in the value returned by $entityURL = fgets($src_handle, 4096); because fgets() keeps the trailing newline of the line it reads. Try $entityURL = trim(fgets($src_handle, 4096));
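To see why trim() helps, here is a minimal sketch that does not touch the network. It assumes foo.txt holds the URL followed by a newline, which is how most editors save a one-line file; the temporary file below is a hypothetical stand-in for /home/prakash/foo.txt.

```php
<?php
// Hypothetical stand-in for foo.txt: one URL followed by a newline.
$path = tempnam(sys_get_temp_dir(), 'url');
file_put_contents($path, "http://google.com\n");

$src_handle = fopen($path, 'r');
$raw = fgets($src_handle, 4096);   // fgets() returns the line including its "\n"
fclose($src_handle);
unlink($path);

$entityURL = trim($raw);           // strips leading/trailing whitespace such as "\n" and "\r"

var_dump($raw === "http://google.com\n");      // bool(true)
var_dump($entityURL === "http://google.com");  // bool(true)
echo strlen($raw), ' vs ', strlen($entityURL), "\n";  // 18 vs 17
```

The extra character is invisible when echoed into an HTML page, which is probably why both runs printed the same "Entity URL" line even though only the trimmed string is a usable URL.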
kpn320 Posted November 27, 2009 Author That was it. Thanks, AlexWD.