dunkhippo33 Posted June 14, 2007

Hi everyone,

I'm a beginner at PHP and would appreciate any advice. To learn PHP, I've been trying scripts that read the source code of webpages and save it to text files. Various example scripts I've found on the web work some of the time, but none of them work properly all of the time. One example:

<html>
<head><title>Test</title></head>
<BODY>
<?php
// Local file that receives a copy of the remote page
$myFile = "test.txt";
$fh = fopen($myFile, 'w') or die("can't open file");

// Open the remote page over HTTP and copy it line by line
$fextpg = fopen("http://www.yahoo.com", "r");
if ($fextpg) {
    while (!feof($fextpg)) {
        $buffer = fgets($fextpg, 4096);
        echo $buffer;
        fwrite($fh, $buffer);
    }
    fclose($fextpg);
}
fclose($fh);
?>
</body>
</html>

In this script, I've used someone else's example code to read each line of the source code at yahoo.com and save it to test.txt. However, the page you get after copying each line of Yahoo's source code is very different from the "real" yahoo.com main page. When I try other websites such as google.com, this script works perfectly.

Are there some sites that block functions such as fgets? Or is this script not reading every line of Yahoo's source code?

Any help would be much appreciated! Thanks so much!

Best,
Elizabeth
Full-Demon Posted June 14, 2007

They can't block it; you just read the HTML output of the server, just as your browser does.

fgets($fextpg, 4096);

The limit is 4096. Perhaps some lines are longer?

FD
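The line-length theory can be checked without touching the network. The following is a minimal sketch, assuming an in-memory stream stands in for the remote page; the helper name `copy_stream_by_lines` and the sample data are hypothetical and do not come from the thread. With a length argument, `fgets()` returns a long line in several pieces over successive loop iterations, so a copy loop like the one in the original script should still write every byte.

```php
<?php
// Hypothetical helper: copies $in to $out with the same fgets() loop
// as the script in the first post.
function copy_stream_by_lines($in, $out, int $length = 4096): int
{
    $reads = 0;
    while (!feof($in)) {
        // fgets() returns at most $length - 1 bytes per call
        $buffer = fgets($in, $length);
        if ($buffer === false) {
            break; // nothing left to read
        }
        fwrite($out, $buffer);
        $reads++;
    }
    return $reads;
}

// Sample data (an assumption): one 10,000-character line, then a short line.
$source = str_repeat('x', 10000) . "\n" . "short line\n";

$in = fopen('php://memory', 'w+');
fwrite($in, $source);
rewind($in);

$out = fopen('php://memory', 'w+');
$reads = copy_stream_by_lines($in, $out);

rewind($out);
$copied = stream_get_contents($out);

fclose($in);
fclose($out);

echo $copied === $source ? "identical copy" : "copy differs", " after $reads reads\n";
```

If the copy comes out identical, the 4096 limit only splits long lines across several reads and is unlikely to explain missing content.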
dunkhippo33 Posted June 19, 2007 (Author)

Thanks, FD. Unfortunately, even after increasing this number significantly, there is still a parsing problem. In particular, pages with huge amounts of JavaScript or CSS seem to cause problems: the JavaScript and CSS simply get cut out of the saved page. I don't know why this is, because I thought fgets would work for all strings.

Interestingly, if I save the source code of a webpage locally, parsing the saved page works just fine, regardless of how much JavaScript or CSS it has.

What is the difference between parsing pages locally and remotely?

Thanks!
Elizabeth
GingerRobot Posted June 19, 2007

I would guess that the problem is that the content of Yahoo's pages depends significantly on the information it gathers from the user. For instance, I would expect that Yahoo can be accessed from a mobile phone, but it would look very different. Yahoo is probably getting information including browser, OS, etc. to configure the page for optimal display.

I don't think that using fgets will give Yahoo any of this information, so I would expect you to get the cut-down version, perhaps something similar to what would be displayed on a mobile phone.

This would also explain why it works when you try it with the locally saved copy: the HTML, CSS and JavaScript have all already been generated in your saved page.

I wonder if you would achieve better results using cURL, as you can pass many of the things that the server might want (I seem to remember you can send along a user agent, for instance).

Try looking into the uses of cURL in PHP.
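The cURL suggestion above could look roughly like this. It is a minimal sketch, not the code the poster ended up with (the thread does not show it); the helper name `fetch_with_user_agent`, the command-line usage, and the User-Agent string are all assumptions chosen for illustration. It requires PHP's cURL extension.

```php
<?php
// Hypothetical helper: fetches $url while sending a browser-like
// User-Agent header, then returns the response body as a string.
function fetch_with_user_agent(string $url, string $userAgent): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // identify as a browser
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects, as a browser would
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);           // give up after 30 seconds

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (assumed): php fetch.php http://www.yahoo.com
// Runs only when a URL is passed on the command line.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    // Example User-Agent string; any browser-like value can be substituted.
    $agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)';
    $html = fetch_with_user_agent($argv[1], $agent);
    file_put_contents('test.txt', $html);
    echo strlen($html), " bytes saved to test.txt\n";
}
```

If staying with `fopen()` is preferred, PHP's `user_agent` ini setting (for example `ini_set('user_agent', ...)` before the `fopen()` call) makes the HTTP wrapper send a User-Agent header as well.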
dunkhippo33 Posted June 19, 2007 (Author)

Awesome, Ben! You were right, and we were able to solve the problem. Thanks!

Elizabeth