muppet77 Posted March 2, 2015

My script tells me that there is not enough memory when I try to interrogate a webpage. I run a loop on the page to pull out certain sections. I have extended my php5.ini file so that the memory limit is 512M, and this seemed to allow a few more loops, but not many. I then tried output-buffer flushes, which I believe may be redundant. Anyway, would writing the output to a file rather than echoing to a browser help at all? Thank you.
mac_gyver Posted March 2, 2015

Your program likely has a logic error in it, so that it will consume all available memory no matter how much you make available. 512M bytes is a substantial amount of memory. You would need to debug what the code is doing in order to find the problem. If you want us to help, you would need to post the code needed to reproduce the problem, less any sort of private values it may contain.
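One way to start that debugging, without assuming anything about the script itself, is to log memory use on each pass of the suspect loop and watch whether it climbs. This is a minimal self-contained sketch; the str_repeat() call merely stands in for whatever real work accumulates data:

```php
<?php
// Minimal sketch: watch per-iteration memory to spot a leak.
// str_repeat() stands in for whatever real work the loop does.
$chunk = array();
$usage = array();
for ($i = 1; $i <= 5; $i++) {
    $chunk[] = str_repeat('x', 100000); // ~100KB kept alive every pass
    $usage[$i] = memory_get_usage();
    printf("iteration %d: %d bytes in use\n", $i, $usage[$i]);
}
// A steady climb points at whatever the loop keeps accumulating.
```

If the reported numbers rise on every pass, the loop is holding onto something; if they stay flat, look for a single oversized allocation instead.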
Psycho Posted March 2, 2015

Without knowing how your code is structured, it's kind of hard to give any definitive answers. I suspect you are creating new variables (or array indexes) or appending to variables on each iteration. If you define a value on an iteration of the loop and don't need it on subsequent loops, then unset it. Writing to a file could help, depending on what you are actually doing. But I have a hard time believing that the data on an external page is so large that it would cause this problem. You likely have some unnecessary inefficiency in your process that is the root of the problem.
muppet77 Posted March 2, 2015 (Author)

Ok thanks. I will post it up and desensitise the content. Thank you.
muppet77 Posted March 2, 2015 (Author, edited)

```php
<?php
$id = 755303;      // a unique number that helps to make up the url
$maxnumber = 5;    // number of loops or web pages to look at
$count = 1;        // a way to count how many loops completed

$url = "??????";            // removed for now
$username = "??????????";   // removed for now
$password = "????????";     // removed for now
$data_to_send = "username=" . $username . "&password=" . $password;

include "simple_html_dom.php";

$cookie_jar = tempnam('/tmp', 'cookie');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_to_send);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_jar);
$output = curl_exec($ch);

$data = array();

while ($count <= $maxnumber) {
    $url = "?????????????????????????????" . $id . "/"; // removed for now
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    $output = curl_exec($ch);
    $html = str_get_html($output);

    $find = strpos($html, "some text to find ");
    $trim = substr($html, $find + 18, 4);
    echo $trim;
    echo "<br>";

    $count++;
    $id++;
    $data[] = $html;
    usleep(1000000);
}
?>
```

So the code logs into a site, then goes to the url, which is a page with data on it. It looks for "some text to find" and echoes what follows it. It then loops to the next id (page) and does the same. It gives a memory error when it has to do more than 12 loops; it never does 13. Any suggestions for sharpening it up, please?

Edited March 2, 2015 by muppet77
mac_gyver Posted March 2, 2015

The simple html dom object is huge, even for a small amount of html, like the str_get_html('<html><body>Hello!</body></html>') example in the documentation (I don't know if this is a bug or intentional). Edit: that str_get_html example from the documentation uses 11K bytes of memory; creating the html dom object for the thread on this forum that we are looking at, for a non-logged-in visitor, uses 2M+ (2,416,456) bytes of memory. You need to extract the data you need inside the loop and store only that data, not the simple html dom object itself.
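A sketch of what mac_gyver describes, using plain string functions instead of simple_html_dom so it runs standalone; the marker text and 4-character extraction mirror the posted code, and the page contents here are made up:

```php
<?php
// Keep only the small extracted value per page, never the parsed page itself.
$pages = array( // stand-ins for successive curl_exec() results
    "<html><body>some text to find 1234 etc</body></html>",
    "<html><body>some text to find 5678 etc</body></html>",
);
$data = array();
foreach ($pages as $output) {
    $find = strpos($output, "some text to find ");
    if ($find !== false) {
        $data[] = substr($output, $find + 18, 4); // store the 4-char value only
    }
}
print_r($data); // each element is a few bytes, not a multi-megabyte DOM object
```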
ginerjm Posted March 2, 2015

First let me say I have never used curl and don't fully understand it. Now, how your script operates:

1 - You set up and execute a curl call, but you don't do anything with the returned value(s).
2 - You then start a loop to do more curl calls.
3 - With each curl call you do some unknown function (I can't find it) named str_get_html.
4 - With the results of this function you do a search, and then you attempt to handle the contents found by this search. BAD CODE! What if there was nothing found?
5 - You append the just-read html (from the unknown function call) to an array as a new element. Is $data the reason for your memory overload?

Since count starts at 1 and maxnumber is set to 5, you should only do this 5 times. Where is 12 coming into play, as you mentioned?
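Point 4 above (a search that finds nothing) can be handled with a single guard, since strpos() returns false on a miss. A sketch using the same marker/offset logic as the posted code:

```php
<?php
// Guard against strpos() returning false when the marker is absent,
// instead of calling substr() from a bogus offset.
function extract_value($page) {
    $find = strpos($page, "some text to find ");
    if ($find === false) {
        return null; // caller can report "not found" for this id
    }
    return substr($page, $find + 18, 4); // the 4 characters after the marker
}

echo extract_value("xx some text to find 9876 yy"); // prints 9876
```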
muppet77 Posted March 2, 2015 (Author)

Ok thanks. To be honest, I got help writing this code. Please can you suggest what I need to change? Thank you.
muppet77 Posted March 2, 2015 (Author)

I altered the 5 to 12 in the code; 5 is just what I left it at when copying to this thread. Any suggestions in newbie language? Thank you.
ginerjm Posted March 2, 2015

You got someone to write this for you? So you just wanted to browse a bunch of websites and find some data in each of them, and you got someone to write it for you because you couldn't. And now you want US to solve your problem. Hmmm... Seems like you should get the author to help you out with this. You guys dreamed this up, not us. Or you could listen to mac_gyver and make his suggested changes.
muppet77 Posted March 2, 2015 (Author)

Really sorry, but yes, I did get help. I am just a beginner and am after some help. I don't really understand mac_gyver's post fully. I assume he means just delete that line? Apart from that, I'm in the dark. Sorry, you seem annoyed! Not my intention to annoy you!
ginerjm Posted March 2, 2015

Do you know why you chose such a tricky, meaningless project to start learning php and programming in general? Beginners usually start with easier tasks.
muppet77 Posted March 2, 2015 (Author)

That's a bit harsh! In what way is it meaningless, please?
ginerjm Posted March 2, 2015

You appear to know nothing about php, so why choose such a hard project? One usually chooses projects (as I said) that give you a chance to learn, with less complex goals. Might I ask what you are searching for in these places you are using curl to extract from?
muppet77 Posted March 2, 2015 (Author)

I am afraid that tricky doesn't make it meaningless. Anyway, thank you for your advice, and sorry to annoy you.
ginerjm Posted March 2, 2015

Good luck with your chosen learning curve. You're trying to read at a 12th grade level but have already stated you are a newcomer, aka a first grader.
muppet77 Posted March 2, 2015 (Author)

ginerjm, I understand and respect your opinion.
Psycho Posted March 2, 2015

As mac_gyver stated, the problem is the fact that you are saving the entire result of str_get_html() here:

$data[] = $html;

That is a HUGE object of data. You need to parse $html down to the data you need and save only that to the array.
muppet77 Posted March 3, 2015 Author Share Posted March 3, 2015 (edited) As mac_gyver stated,t eh problem is the fact that you are saving the entire results from str_get_html() here $data[] = $html; That is a HUGE object of data. You need to parse the results of $html to the data you need and save that to the array. ok, i have deleted this line and also $html = str_get_html($output); and include "simple_html_dom.php"; and usleep and $data= array(); have ALL gone. the code still works and chugs aong doing 23 loops rather than the original 12. 24 loops is too many. Are there any other ways to allow it to move more efficiently please? currently it is looking like: <?php $id=755303; // this is a unique number that helps to make up the url $maxnumber = 23; // number of loops or web pages to look at $count=1; // a way to count how many loops completed $url = "??????"; // removed for now $username="??????????"; // removed for now $password = "????????"; // removed for now $data_to_send = "username=".$username."&password=".$password; $cookie_jar = tempnam('/tmp','cookie'); $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch,CURLOPT_POST, 2); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch,CURLOPT_POSTFIELDS, $data_to_send); curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie_jar ); curl_setopt( $ch, CURLOPT_COOKIEFILE, $cookie_jar ); $output = curl_exec($ch); while($count <= $maxnumber) { $url="?????????????????????????????".$id."/"; // removed for now curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); $output = curl_exec($ch); $find = strpos($html, "some text to find "); $trim = substr($html,$find+18, 4); echo $trim; echo "</br>"; $count++; $id++; } ?> many thanks Edited March 3, 2015 by muppet77 Quote 
Link to comment Share on other sites More sharing options...
muppet77 Posted March 5, 2015 (Author)

Does anyone else have any suggestions? I need the first curl call to log in, and then the second curl call inside the loop gets a different page as the url changes each loop, so I think that one needs to stay in the loop. Thank you.
mac_gyver Posted March 5, 2015

> the code still works and chugs along, doing 23 loops rather than the original 12. 24 loops is too many.

You didn't state what sort of problem occurs at that point. Is it a memory error or a timeout/maximum-execution error? How many ids do you need to retrieve information for? Does the site/api you are reading have a way of getting data for multiple ids in one request?
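mac_gyver's question can be answered empirically by instrumenting the loop to record peak memory and elapsed time on each pass, so whichever limit is being hit shows up in the numbers. A self-contained sketch, with usleep() standing in for the real curl fetch:

```php
<?php
// Record peak memory and elapsed time per iteration so a crash log
// shows whether memory_limit or max_execution_time is being hit.
$start = microtime(true);
$log = array();
for ($count = 1; $count <= 3; $count++) {
    usleep(100000); // stand-in for curl_exec() plus parsing
    $log[] = sprintf("loop %d: peak %d bytes, %.2f s elapsed",
                     $count, memory_get_peak_usage(true), microtime(true) - $start);
}
echo implode("\n", $log), "\n";
```

In the real script these lines would go to error_log() rather than echo, so they survive a 500 response.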
muppet77 Posted March 5, 2015 (Author, edited)

Hi. It is an internal server error, code 500. I can get 20 or so ids done, but ideally I'd like to do 30, 40, 50++.

Edited March 5, 2015 by muppet77
muppet77 Posted March 5, 2015 (Author)

And no, I don't think multi requests are possible, but I will check. Assume no if I don't post.
CroNiX Posted March 6, 2015

Did you check your server and php error logs?
muppet77 Posted March 6, 2015 (Author)

No I haven't. Whereabouts would they be, and what would they be called, on GoDaddy? Thanks.