
Automatic recurring/updating script


interpim


So, I have a form that I built that sets up a script to parse a website based on an ID number...  It works, but it is manual :(  What I want to do is set up a cron job to run this automatically, stepping the number up each time so it isn't parsing the same pages constantly.  Then, when it reaches a certain number, I want it to reset to 0 and start over on a different server.

 

Here is my form:

<form method='post' action='db_build.php'>
<table><tr><td align='right'>
Server:</td><td><select name='server'>
<option value='122'>Warpstone</option>
<option value='130'>Tyrion</option>
</select></td></tr><tr><td align='right'>
Starting Player ID:</td><td><input type='text' name='player_start'></td></tr><tr><td>
Only 25 players will be parsed!</td><td>
<input type='submit' value='start parse'></td></tr></table>
</form>

 

I have been putting the numbers in manually and going back and adding 25 to the last number.  But there are 60,000+ pages for each server, and it is quite an effort to sit at a computer and retype the input every time it loads.

 

My server cannot process much more than 25 players at a time, so I cannot just set it to go and walk away. 

 

I have tried writing a script that submits the POST data and updates a text file with the last number used, then set up a cron job to run it, but for some reason it will not run.

 

I really think the problem is the cron job I have set up. I don't know much about cron, and I don't think it is running at all.

 

So if I want to run a PHP script, how would I actually set up the cron job?

If the host matters, I use bluehost.com.


You may not even need cron.  Cron is used when you want to run a script on a set interval (like every 5 minutes).  It sounds like you just want to iterate through one server, then repeat on the next.  I suspect you cannot do more than 25 pages because the PHP page is timing out.  When you run a script from the command line, there is no time limit by default (the CLI's max_execution_time defaults to 0).  You can have a script that loops forever, which is essentially how daemons (services, in Windows speak) work.

 

So take your db_build.php script and add this to the top, replacing the $name = $_POST assignments.

 

<?php

$name = isset($_SERVER["argv"][1]) ? $_SERVER["argv"][1] : null; // server ID from the command line
if(!$name) die("No server defined!\n"); 

?>

 

Then modify the rest of it to loop through the entire server, instead of just the first 25.

 

Then open a terminal on the server, chmod the file then execute it:

 

chmod a+x db_build.php

 

php -q db_build.php 122 &

 

 

If you need to run this on a schedule (say you want to back up the website every day at noon), then you would insert that command into the crontab.
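As a rough sketch of the command-line approach above: the helper below validates the server ID taken from the first argument. The function name `server_from_argv` is made up for illustration, and the crontab line in the comment uses assumed paths (`/usr/bin/php`, `/home/youruser/`), so adjust both for your host.

```php
<?php
// Hypothetical helper (not part of the original db_build.php): read and
// validate the server ID passed as the first command-line argument.
function server_from_argv(array $argv): int
{
    if (!isset($argv[1]) || !ctype_digit((string) $argv[1])) {
        throw new InvalidArgumentException("Usage: php db_build.php <server id>");
    }
    return (int) $argv[1];
}

// At the top of the script you would call:
//   $server = server_from_argv($_SERVER['argv']);
//
// Example crontab entry for "every day at noon" (paths are assumptions):
//   0 12 * * * /usr/bin/php /home/youruser/db_build.php 122
```

Running `php db_build.php 122` by hand first is a quick way to confirm the script works before handing it to cron.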

 

 

Well... I also want to lessen the load on the server I am retrieving from.  And I think iterating through those 60k+ pages for about 50 different servers all at once would put a heck of a load on them.  I was thinking running 25 every 5 minutes would be ideal... I know it would take about 8 days of parsing to get through the first 60k, but I can accept that.  You are right though, my php page is timing out with any more than 25, and still does every once in a while.

 

I am parsing the HTML from a server that isn't mine, and the data is updated continuously, so I will need to run the script quite often.  Maybe less often as time goes on and my database nears completion.
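One way the "25 every 5 minutes" plan could be wired up is to keep the position in a small state file and let cron call the script with no arguments. This is only a sketch under assumptions that are not in the thread: the state is stored as JSON, `MAX_PLAYER_ID` marks where a server's IDs are considered exhausted, and the server list holds just the two IDs from the form.

```php
<?php
// Assumed values, not from the original post.
const BATCH_SIZE = 25;
const MAX_PLAYER_ID = 60000;
const SERVERS = [122, 130];

// Return the state for the next run: step the start ID by one batch, and
// once it reaches the maximum, reset to 0 and move to the next server.
function advance_state(array $state, int $batchSize = BATCH_SIZE, int $maxId = MAX_PLAYER_ID, array $servers = SERVERS): array
{
    $next = $state['player_start'] + $batchSize;
    if ($next < $maxId) {
        return ['server_index' => $state['server_index'], 'player_start' => $next];
    }
    return [
        'server_index' => ($state['server_index'] + 1) % count($servers),
        'player_start' => 0,
    ];
}

// Load the saved position, falling back to the very beginning when the
// file is missing or unreadable.
function load_state(string $file): array
{
    $default = ['server_index' => 0, 'player_start' => 0];
    if (!is_file($file)) {
        return $default;
    }
    $state = json_decode(file_get_contents($file), true);
    if (!is_array($state) || !isset($state['server_index'], $state['player_start'])) {
        return $default;
    }
    return $state;
}

function save_state(string $file, array $state): void
{
    file_put_contents($file, json_encode($state), LOCK_EX);
}

// Typical use at the top of the cron-driven script:
//   $state  = load_state(__DIR__ . '/parse_state.json');
//   $server = SERVERS[$state['server_index']];
//   ... parse the 25 IDs starting at $state['player_start'] ...
//   save_state(__DIR__ . '/parse_state.json', advance_state($state));
//
// Crontab entry for every 5 minutes (paths are assumptions):
//   */5 * * * * /usr/bin/php /home/youruser/db_build.php >> /home/youruser/db_build.log 2>&1
```

Saving the state only after a batch finishes means a run that times out is simply retried on the next cron tick. Redirecting output to a log file also helps when a cron job seems not to run at all, since any PHP errors end up somewhere visible.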

 

 

I'm assuming you are using a loop to iterate through the pages.

 

If you are worried about DoSing the site, insert a sleep() call inside the loop.  On each pass through the loop, the script will then pause for however many seconds you tell it to.

 

You could also do something like:

 

<?php

for($i=0;$i<$records;$i++){

    //download page #$i

    // pause only when $i is a non-zero multiple of 25
    if($i > 0 && $i % 25 == 0) sleep(60);

}

?>

 

 

That would make the script sleep for 1 minute every 25th record.

 

 

In this configuration (specifying the server as a command line arg) one running process will access one server.  With a little trickery you could get the script to alternate servers, so that it never hits the same server twice in a row.
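The "trickery" for alternating servers could be as simple as putting the server loop inside the ID loop. The sketch below is one possible take, not the only one; `interleave_servers` is a made-up name, and the actual page download is left to the caller.

```php
<?php
// Yield [server, playerId] pairs in round-robin order so that consecutive
// requests go to different servers (given at least two servers).
function interleave_servers(array $servers, int $startId, int $endId): Generator
{
    for ($id = $startId; $id < $endId; $id++) {
        foreach ($servers as $server) {
            yield [$server, $id];
        }
    }
}

// Possible use inside the parsing script:
//   foreach (interleave_servers([122, 130], 0, 25) as [$server, $id]) {
//       // download and parse the page for $id on $server
//   }
```

With only one server in the list this degrades to a plain loop, so a sleep() between requests is still worth keeping.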

 

You might want to take a look at some articles on daemon programming as a lot of the concepts apply to this sort of thing.

Well... I cannot really show the meat of the file... it parses a webpage for a video game that is still under NDA.  The only information I am passing to db_build.php is a server number and a character number.  With that it builds the URL, retrieves each page, parses the data from it, and inserts it into my database.

I guess it cannot hurt to post how I am getting there; I edited out the username/password info.

 

$player_start = $_POST['player_start'];
$server = $_POST['server'];

$pagenum = $player_start;
$pageend = $player_start + 25;


$URL="http://realmwar.warhammeronline.com/realmwar/UserLoginAuthentication.war";
$ch = curl_init();   
curl_setopt($ch, CURLOPT_URL,"$URL"); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookiefilename");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookiefilename");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "user=xxxxxx&password=xxxxxxx");
$pageResults = curl_exec($ch);    

while($pagenum < $pageend){

curl_setopt($ch, CURLOPT_URL, "http://realmwar.warhammeronline.com/realmwar/CharacterInfo.war?id=$pagenum&server=$server");
$blah = curl_exec($ch);

#### code for parsing the data from the HTML ##### 

$pagenum++; // move on to the next player ID
}
curl_close($ch);
