
Save memory ideas


muppet77

Recommended Posts

My script reports that there is not enough memory when I try to interrogate a webpage.

I run a loop on the page to get certain sections.

 

I have raised the memory_limit in my php5.ini file to 512M, and this seemed to allow a few more loops, but not many.

 

I then tried output buffer flushes (ob_flush()), which I believe may be redundant.

 

Anyway, would writing the output to a file, rather than echoing it to the browser, help at all?

 

Thank you.

Link to comment
Share on other sites

your program likely has a logic error in it, so that it consumes all available memory no matter how much you make available. 512M is a substantial amount of memory. you would need to debug what the code is doing in order to find the problem. if you want us to help, you would need to post the code needed to reproduce the problem, less any private values it may contain.
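
for example, logging memory usage on each pass of a loop will show whether it climbs steadily (a generic sketch, not your code):

<?php
// generic sketch: print memory usage once per iteration to spot a leak
for ($i = 1; $i <= 20; $i++) {
    // ... the real per-page work would go here ...
    echo "iteration $i: " . number_format(memory_get_usage()) . " bytes\n";
}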

Link to comment
Share on other sites

Without knowing how your code is structured, it's kind of hard to give any definitive answers. I suspect you are creating new variables (or array indexes) or appending to variables on each iteration. If you define a value on an iteration of the loop and don't need it on subsequent loops, then unset it. Writing to a file could help - depending on what you are actually doing. But, I have a hard time believing that the data on an external page is so large that it would cause this problem. You likely have some unnecessary inefficiency in your process that is the root of the problem.
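
For example (a generic sketch - get_page() and extract_value() are stand-ins for whatever you are actually doing):

while ($count <= $maxnumber) {
    $page = get_page($id);     // hypothetical fetch, stands in for your curl call
    echo extract_value($page); // hypothetical parse; keep only the small result
    unset($page);              // free the large page string before the next iteration
    $count++;
    $id++;
}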

Link to comment
Share on other sites

<?php

$id = 755303;     // this is a unique number that helps to make up the url
$maxnumber = 5;   // number of loops or web pages to look at
$count = 1;       // a way to count how many loops completed

$url = "??????"; // removed for now
$username = "??????????"; // removed for now
$password = "????????"; // removed for now
$data_to_send = "username=" . $username . "&password=" . $password;

include "simple_html_dom.php";

$cookie_jar = tempnam('/tmp', 'cookie'); // temp file to hold the session cookie
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true); // CURLOPT_POST expects a boolean
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page instead of printing it
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_to_send);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_jar);
$output = curl_exec($ch); // log in and establish the session

$data = array();

while ($count <= $maxnumber) {

    $url = "?????????????????????????????" . $id . "/"; // removed for now
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    $output = curl_exec($ch);
    $html = str_get_html($output);

    $find = strpos($html, "some text to find ");
    $trim = substr($html, $find + 18, 4); // the 4 characters after the marker
    echo $trim;
    echo "<br>";

    $count++;
    $id++;

    $data[] = $html;

    usleep(1000000); // pause one second between requests
}
?>

So the code logs into a site and goes to the url, which is a page with data on it. It looks for "some text to find" and then echoes the characters that follow it. It then loops to the next id, or page, and does the same. It gives a memory error when it has to do more than 12 loops; it never completes 13.

 

Any suggestions for sharpening it up, please?

Link to comment
Share on other sites

the simple html dom object is huge, even for a small amount of html, like the str_get_html('<html><body>Hello!</body></html>') example in the documentation (don't know if this is a bug or intentional).

 

edit: the str_get_html example from the documentation uses 11K bytes of memory. creating the html dom object for the thread on this forum that we are looking at, for a non-logged in visitor, uses 2M+ (2,416,456) bytes of memory.
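
(measured along these lines - a sketch, not the exact code:)

<?php
include "simple_html_dom.php";

$before = memory_get_usage();
$html = str_get_html('<html><body>Hello!</body></html>');
echo memory_get_usage() - $before; // bytes held by the dom object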

 

you need to extract the data you need inside the loop and store only that data, not the simple html dom object itself.
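
a sketch of that change inside your loop, keeping your strpos/substr extraction (simple_html_dom's clear() method breaks the object's internal cross-references so the memory can actually be freed):

$output = curl_exec($ch);
$html = str_get_html($output);

$find = strpos($output, "some text to find ");
$data[] = substr($output, $find + 18, 4); // store only the small extracted value

$html->clear(); // break simple_html_dom's circular references
unset($html);   // now the object can be garbage-collected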

Link to comment
Share on other sites

First let me say I have never used curl and don't fully understand it.

 

Now - how your script operates.

 

1 - you set up and execute a curl call but you don't do anything with the returned value(s).

2 - you then start a loop to do more curl calls

3 - with each curl call you do some unknown function (I can't find it) named str_get_html

4 - with the results of this function you do a search and then you attempt to handle the contents found by that search. BAD CODE! What if nothing was found? (See the sketch after this list.)

5 - you append the just-read html (from the unknown function call) to an array as a new element. Is $data the reason for your memory overload?
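
For item 4, a minimal guard around your existing strpos/substr logic (strpos returns false when the text is absent, and false would silently be treated as offset 0):

$find = strpos($html, "some text to find ");
if ($find !== false) {
    $trim = substr($html, $find + 18, 4);
    echo $trim;
} else {
    echo "text not found for id " . $id;
}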

 

Since $count starts at 1 and $maxnumber is set to 5, you should only do this 5 times. Where does 12 come into play, as you mentioned?

Link to comment
Share on other sites

You got someone to write this for you? So - you just wanted to browse a bunch of websites and find some data in each of them, and you got someone to write it for you because you couldn't. And now you want US to solve your problem.

 

Hmmm... Seems like you should get the author to help you out with this. You guys dreamed this up, not us.

 

Or you could listen to mac_gyver and make his suggested changes.

Link to comment
Share on other sites

Really sorry but yes I did get help. I am just a beginner and am after some help.

I don't really understand Mac's post fully. I assume he means just delete that line?

Apart from that, I'm in the dark. Sorry, you seem annoyed! Not my intention to annoy you!

Link to comment
Share on other sites

You appear to know nothing about php, so why choose such a hard project? One usually chooses projects (as I said) with less complex goals that give you a chance to learn.

 

Might I ask what you are searching for on these pages you are using curl to extract from?

Link to comment
Share on other sites

As mac_gyver stated, the problem is that you are saving the entire result from str_get_html() here

 

$data[] = $html;

 

That is a HUGE object. You need to parse $html down to the data you need and save only that to the array.
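
Something like this inside the loop, keeping your existing $trim extraction:

$trim = substr($html, $find + 18, 4);
$data[] = $trim; // store the small extracted string, not the whole dom object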

Link to comment
Share on other sites

As mac_gyver stated, the problem is that you are saving the entire result from str_get_html() here

$data[] = $html;

That is a HUGE object. You need to parse $html down to the data you need and save only that to the array.

 

OK, I have deleted this line, and also

 

$html = str_get_html($output);

and

include "simple_html_dom.php";

and

the usleep() call

and

$data = array();

 

have ALL gone. The code still works and chugs along, doing 23 loops rather than the original 12. 24 loops is too many.

 

Are there any other ways to make it run more efficiently, please?

 

Currently it is looking like:

    <?php
     
    $id = 755303;     // this is a unique number that helps to make up the url
    $maxnumber = 23;  // number of loops or web pages to look at
    $count = 1;       // a way to count how many loops completed

    $url = "??????"; // removed for now
    $username = "??????????"; // removed for now
    $password = "????????"; // removed for now
    $data_to_send = "username=" . $username . "&password=" . $password;
     
  
    $cookie_jar = tempnam('/tmp', 'cookie');
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true); // CURLOPT_POST expects a boolean
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data_to_send);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_jar);
    $output = curl_exec($ch); // log in and establish the session
        
    while ($count <= $maxnumber) {
        $url = "?????????????????????????????" . $id . "/"; // removed for now
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
        $output = curl_exec($ch);

        $find = strpos($output, "some text to find "); // search $output - $html no longer exists
        $trim = substr($output, $find + 18, 4);
        echo $trim;
        echo "<br>";

        $count++;
        $id++;
    }
    ?>

many thanks

Link to comment
Share on other sites

have ALL gone. The code still works and chugs along, doing 23 loops rather than the original 12. 24 loops is too many.

 

 

you didn't state what sort of problem occurs at that point. is it a memory error or a timeout/maximum execution time error?

 

how many ids do you need to retrieve information for? does the site/api you are reading have a way of getting data for multiple ids in one request?

Link to comment
Share on other sites
