muppet77 Posted March 2, 2015

My script tells me that there is not enough memory when I try to interrogate a webpage. I run a loop on the page to pull out certain sections. I have extended my php5.ini file so that the memory limit is 512M, and this seemed to allow a few more loops, but not many. I then tried output-buffer flushes, which I believe may be redundant. Anyway, would writing the output to a file rather than echoing to a browser help at all? Thank you.
mac_gyver Posted March 2, 2015

Your program likely has a logic error in it, so that it will consume all available memory no matter how much you make available. 512M bytes is a substantial amount of memory. You would need to debug what the code is doing in order to find the problem. If you want us to help, you would need to post the code needed to reproduce the problem, less any sort of private values it may contain.
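One way to start that debugging, without assuming anything about the script itself, is to log memory use on each pass of the suspect loop and watch whether it climbs. This is a minimal self-contained sketch; the str_repeat() call merely stands in for whatever real work accumulates data:

```php
<?php
// Minimal sketch: watch per-iteration memory to spot a leak.
// str_repeat() stands in for whatever real work the loop does.
$chunk = array();
$usage = array();
for ($i = 1; $i <= 5; $i++) {
    $chunk[] = str_repeat('x', 100000); // ~100KB kept alive every pass
    $usage[$i] = memory_get_usage();
    printf("iteration %d: %d bytes in use\n", $i, $usage[$i]);
}
// A steady climb points at whatever the loop keeps accumulating.
```

If the reported numbers rise on every pass, the loop is holding onto something; if they stay flat, look for a single oversized allocation instead.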
Psycho Posted March 2, 2015

Without knowing how your code is structured, it's kind of hard to give any definitive answers. I suspect you are creating new variables (or array indexes) or appending to variables on each iteration. If you define a value on an iteration of the loop and don't need it on subsequent loops, then unset it. Writing to a file could help, depending on what you are actually doing. But I have a hard time believing that the data on an external page is so large that it would cause this problem. You likely have some unnecessary inefficiency in your process that is the root of the problem.
muppet77 Posted March 2, 2015 (Author)

Ok thanks. I will post it up and desensitise the content. Thank you.
muppet77 Posted March 2, 2015 (Author, edited)

```php
<?php
$id = 755303;      // a unique number that helps to make up the url
$maxnumber = 5;    // number of loops or web pages to look at
$count = 1;        // a way to count how many loops completed

$url = "??????";            // removed for now
$username = "??????????";   // removed for now
$password = "????????";     // removed for now
$data_to_send = "username=" . $username . "&password=" . $password;

include "simple_html_dom.php";

$cookie_jar = tempnam('/tmp', 'cookie');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_to_send);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_jar);
$output = curl_exec($ch);

$data = array();

while ($count <= $maxnumber) {
    $url = "?????????????????????????????" . $id . "/"; // removed for now
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    $output = curl_exec($ch);
    $html = str_get_html($output);

    $find = strpos($html, "some text to find ");
    $trim = substr($html, $find + 18, 4);
    echo $trim;
    echo "<br>";

    $count++;
    $id++;
    $data[] = $html;
    usleep(1000000);
}
?>
```

So the code logs into a site, then goes to the url, which is a page with data on it. It looks for "some text to find" and echoes what follows it. It then loops to the next id (page) and does the same. It gives a memory error when it has to do more than 12 loops; it never does 13. Any suggestions for sharpening it up, please?

Edited March 2, 2015 by muppet77
mac_gyver Posted March 2, 2015

The simple html dom object is huge, even for a small amount of html, like the str_get_html('<html><body>Hello!</body></html>') example in the documentation (I don't know if this is a bug or intentional). Edit: that str_get_html example from the documentation uses 11K bytes of memory; creating the html dom object for the thread on this forum that we are looking at, for a non-logged-in visitor, uses 2M+ (2,416,456) bytes of memory. You need to extract the data you need inside the loop and store only that data, not the simple html dom object itself.
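A sketch of what mac_gyver describes, using plain string functions instead of simple_html_dom so it runs standalone; the marker text and 4-character extraction mirror the posted code, and the page contents here are made up:

```php
<?php
// Keep only the small extracted value per page, never the parsed page itself.
$pages = array( // stand-ins for successive curl_exec() results
    "<html><body>some text to find 1234 etc</body></html>",
    "<html><body>some text to find 5678 etc</body></html>",
);
$data = array();
foreach ($pages as $output) {
    $find = strpos($output, "some text to find ");
    if ($find !== false) {
        $data[] = substr($output, $find + 18, 4); // store the 4-char value only
    }
}
print_r($data); // each element is a few bytes, not a multi-megabyte DOM object
```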
ginerjm Posted March 2, 2015

First let me say I have never used curl and don't fully understand it. Now, how your script operates:

1 - You set up and execute a curl call, but you don't do anything with the returned value(s).
2 - You then start a loop to do more curl calls.
3 - With each curl call you do some unknown function (I can't find it) named str_get_html.
4 - With the results of this function you do a search, and then you attempt to handle the contents found by this search. BAD CODE! What if there was nothing found?
5 - You append the just-read html (from the unknown function call) to an array as a new element. Is $data the reason for your memory overload?

Since count starts at 1 and maxnumber is set to 5, you should only do this 5 times. Where is 12 coming into play, as you mentioned?
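Point 4 above (a search that finds nothing) can be handled with a single guard, since strpos() returns false on a miss. A sketch using the same marker/offset logic as the posted code:

```php
<?php
// Guard against strpos() returning false when the marker is absent,
// instead of calling substr() from a bogus offset.
function extract_value($page) {
    $find = strpos($page, "some text to find ");
    if ($find === false) {
        return null; // caller can report "not found" for this id
    }
    return substr($page, $find + 18, 4); // the 4 characters after the marker
}

echo extract_value("xx some text to find 9876 yy"); // prints 9876
```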
muppet77 Posted March 2, 2015 (Author)

Ok thanks. To be honest, I got help writing this code. Please can you suggest what I need to change? Thank you.
muppet77 Posted March 2, 2015 (Author)

I altered the 5 to 12 in the code; 5 is just what I left it at when copying to this thread. Any suggestions in newbie language? Thank you.
ginerjm Posted March 2, 2015

You got someone to write this for you? So you just wanted to browse a bunch of websites and find some data in each of them, and you got someone to write it for you because you couldn't. And now you want US to solve your problem. Hmmm... Seems like you should get the author to help you out with this. You guys dreamed this up, not us. Or you could listen to mac_gyver and make his suggested changes.
muppet77 Posted March 2, 2015 (Author)

Really sorry, but yes, I did get help. I am just a beginner and am after some help. I don't really understand mac_gyver's post fully. I assume he means just delete that line? Apart from that, I'm in the dark. Sorry, you seem annoyed! Not my intention to annoy you!
ginerjm Posted March 2, 2015

Do you know why you chose such a tricky, meaningless project to start learning php and programming in general? Beginners usually start with easier tasks.
muppet77 Posted March 2, 2015 (Author)

That's a bit harsh! In what way is it meaningless, please?
ginerjm Posted March 2, 2015

You appear to know nothing about php, so why choose such a hard project? One usually chooses projects (as I said) that give you a chance to learn, with less complex goals. Might I ask what you are searching for in these places you are using curl to extract from?
muppet77 Posted March 2, 2015 (Author)

I am afraid that tricky doesn't make it meaningless. Anyway, thank you for your advice, and sorry to annoy you.
ginerjm Posted March 2, 2015

Good luck with your chosen learning curve. You're trying to read at a 12th grade level but have already stated you are a newcomer, aka a first grader.
muppet77 Posted March 2, 2015 (Author)

ginerjm, I understand and respect your opinion.
Psycho Posted March 2, 2015

As mac_gyver stated, the problem is the fact that you are saving the entire result of str_get_html() here:

$data[] = $html;

That is a HUGE object of data. You need to parse $html down to the data you need and save only that to the array.
muppet77 Posted March 3, 2015 Author Share Posted March 3, 2015 (edited) As mac_gyver stated,t eh problem is the fact that you are saving the entire results from str_get_html() here $data[] = $html; That is a HUGE object of data. You need to parse the results of $html to the data you need and save that to the array. ok, i have deleted this line and also $html = str_get_html($output); and include "simple_html_dom.php"; and usleep and $data= array(); have ALL gone. the code still works and chugs aong doing 23 loops rather than the original 12. 24 loops is too many. Are there any other ways to allow it to move more efficiently please? currently it is looking like: <?php $id=755303; // this is a unique number that helps to make up the url $maxnumber = 23; // number of loops or web pages to look at $count=1; // a way to count how many loops completed $url = "??????"; // removed for now $username="??????????"; // removed for now $password = "????????"; // removed for now $data_to_send = "username=".$username."&password=".$password; $cookie_jar = tempnam('/tmp','cookie'); $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch,CURLOPT_POST, 2); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch,CURLOPT_POSTFIELDS, $data_to_send); curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie_jar ); curl_setopt( $ch, CURLOPT_COOKIEFILE, $cookie_jar ); $output = curl_exec($ch); while($count <= $maxnumber) { $url="?????????????????????????????".$id."/"; // removed for now curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); $output = curl_exec($ch); $find = strpos($html, "some text to find "); $trim = substr($html,$find+18, 4); echo $trim; echo "</br>"; $count++; $id++; } ?> many thanks Edited March 3, 2015 by muppet77 Quote 
Link to comment Share on other sites More sharing options...
muppet77 Posted March 5, 2015 (Author)

Does anyone else have any suggestions? I need the first curl call to log in, and then the second curl call inside the loop gets a different page as the url changes each loop, so I think that one needs to stay in the loop. Thank you.
mac_gyver Posted March 5, 2015

> the code still works and chugs along, doing 23 loops rather than the original 12. 24 loops is too many.

You didn't state what sort of problem occurs at that point. Is it a memory error or a timeout/maximum-execution error? How many ids do you need to retrieve information for? Does the site/api you are reading have a way of getting data for multiple ids in one request?
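mac_gyver's question can be answered empirically by instrumenting the loop to record peak memory and elapsed time on each pass, so whichever limit is being hit shows up in the numbers. A self-contained sketch, with usleep() standing in for the real curl fetch:

```php
<?php
// Record peak memory and elapsed time per iteration so a crash log
// shows whether memory_limit or max_execution_time is being hit.
$start = microtime(true);
$log = array();
for ($count = 1; $count <= 3; $count++) {
    usleep(100000); // stand-in for curl_exec() plus parsing
    $log[] = sprintf("loop %d: peak %d bytes, %.2f s elapsed",
                     $count, memory_get_peak_usage(true), microtime(true) - $start);
}
echo implode("\n", $log), "\n";
```

In the real script these lines would go to error_log() rather than echo, so they survive a 500 response.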
muppet77 Posted March 5, 2015 (Author, edited)

Hi. It is an internal server error, code 500. I can get 20 or so ids done, but ideally I'd like to do 30, 40, 50++.

Edited March 5, 2015 by muppet77
muppet77 Posted March 5, 2015 (Author)

And no, I don't think multi requests are possible, but I will check. Assume no if I don't post.
CroNiX Posted March 6, 2015

Did you check your server and php error logs?
muppet77 Posted March 6, 2015 (Author)

No I haven't. Whereabouts would they be, and what would they be called, on GoDaddy? Thanks.