kpn320 Posted November 27, 2009

The following code gives the expected results:

    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Generate Description and Tags</title>
    </head>
    <body>
    <?php
    include 'simple_html_dom.php';
    $entityURL = "http://google.com";
    echo "Entity URL: ", $entityURL, "<br>";
    $html = file_get_html($entityURL);
    $tot_html_str = $html->save();
    $tot_html_strlen = strlen($tot_html_str);
    echo "Strlen of total HTML: ", $tot_html_strlen, "<br>";
    ?>
    </body>
    </html>

results in:

    Entity URL: http://google.com
    Strlen of total HTML: 7403

However, if I read the URL from a file, I am not able to scrape the HTML:

    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Generate Description and Tags</title>
    </head>
    <body>
    <?php
    include 'simple_html_dom.php';
    $myFile = "/home/prakash/foo.txt";
    $src_handle = fopen($myFile, "r");
    $entityURL = fgets($src_handle, 4096);
    echo "Entity URL: ", $entityURL, "<br>";
    $html = file_get_html($entityURL);
    $tot_html_str = $html->save();
    $tot_html_strlen = strlen($tot_html_str);
    echo "Strlen of total HTML: ", $tot_html_strlen, "<br>";
    ?>
    </body>
    </html>

results in:

    Entity URL: http://google.com
    Strlen of total HTML: 0

What do I need to do differently to read the URL from a file?
plznty Posted November 27, 2009

Can you not just use file_get_contents("http://www.google.com"); to get the HTML code?

[EDIT] I think I'm misunderstanding, actually. Don't worry.
kpn320 Posted November 27, 2009 (Author)

Even when I change to use file_get_contents (instead of file_get_html), the result is the same.
Alex Posted November 27, 2009

file_get_contents will only work in that context if the URL fopen wrappers are enabled (the allow_url_fopen setting in php.ini). A better alternative would be cURL.

Edit: Wait, I misunderstood your problem. It might be caused by whitespace returned by $entityURL = fgets($src_handle, 4096);. fgets keeps the trailing newline of the line it reads, so the URL you pass on ends with a line break. Try $entityURL = trim(fgets($src_handle, 4096));
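To make the whitespace issue visible, here is a minimal sketch that needs no network access. It assumes the file holds the URL followed by a newline, uses a temporary file as a stand-in for foo.txt, and introduces a hypothetical helper, read_first_url, that is not part of the original code or of simple_html_dom.

```php
<?php
// Hypothetical helper (not from the original post): returns the first
// line of a file with surrounding whitespace, including "\n", removed.
function read_first_url(string $path): string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $line = fgets($handle, 4096);
    fclose($handle);

    // fgets returns false on EOF or error; treat that as "no URL".
    return $line === false ? '' : trim($line);
}

// Assumption: the file contains the URL followed by a newline.
$tmp = tempnam(sys_get_temp_dir(), 'url');
file_put_contents($tmp, "http://google.com\n");

// Raw read, as in the original code: the newline is still attached.
$handle = fopen($tmp, 'r');
$raw = fgets($handle, 4096);
fclose($handle);

// Trimmed read via the helper.
$clean = read_first_url($tmp);

// var_dump shows the length, which exposes the hidden newline.
var_dump($raw);
var_dump($clean);
```

The two strings look identical when echoed into an HTML page, because the browser collapses the trailing newline. That is why the "Entity URL" output in the question appeared correct. Dumping the value with var_dump, or comparing strlen before and after trim, is a quick way to spot this.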
kpn320 Posted November 27, 2009 (Author)

That was it. Thanks, AlexWD.