Automatic recurring/updating script


interpim

So, I have a form that I built that sets up a script to parse a website based on an ID number...  It works, but it is manual :(  What I want to be able to do is set up a CRON job to run this automatically, stepping the number up each time so it does not parse the same pages over and over.  Then, when it reaches a certain number, I want it to reset to 0 and start over on a different server.

 

here is my form

<form method='post' action='db_build.php'>
<table>
<tr><td align='right'>Server:</td><td><select name='server'>
<option value='122'>Warpstone</option>
<option value='130'>Tyrion</option>
</select></td></tr>
<tr><td align='right'>Starting Player ID:</td><td><input type='text' name='player_start'></td></tr>
<tr><td>Only 25 players will be parsed!</td><td><input type='submit' value='start parse'></td></tr>
</table>
</form>

 

I have been putting the numbers in manually and going back and adding 25 to the last number.  But there are 60,000+ pages for each server, and it is quite an effort to sit at a computer and retype the input every time it loads.

 

My server cannot process much more than 25 players at a time, so I cannot just set it to go and walk away. 

 

I have tried writing a script that will submit post data and update a text file with the last number used, then used a CRON to run that, but for some reason it will not run.

 

I really think the problem is the CRON I have set up. I don't know much about it, and I don't think it is running at all.

 

So if I want to run a php script, how would I actually set up the CRON?

If host matters I use bluehost.com

Link to comment
Share on other sites

You may not even need cron.  Cron is used if you want to run a script on a set interval (like every 5 minutes).  It sounds like you just want to iterate through one server, then repeat on the next.  I suspect you cannot do more than 25 pages because the PHP page is timing out.  When you run a script from the command line, there is no timeout.  You can have a script that loops forever (essentially how daemons, or services in Windows speak, work).

 

so take your db_build.php script and add this to the top, removing the $name = $_POST stuff.

 

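<?php

// Read the server ID from the first command-line argument, e.g. "php db_build.php 122".
$name = isset($_SERVER["argv"][1]) ? $_SERVER["argv"][1] : null;
if(!$name) die("No server defined!\n");

?>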

 

Then modify the rest of it to loop through the entire server, instead of just the first 25.

 

Then open a terminal on the server, chmod the file then execute it:

 

chmod a+x db_build.php

 

php -q db_build.php 122 &

 

 

if you need to run this on a schedule (say you want to backup the website every day at noon), then you would insert that command into the crontab.
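For the "25 IDs per run, then move on" case from the original question, one possible approach is to keep the position in a small state file and let cron start the script. The following is only a sketch under assumptions: the function names (`loadState`, `nextState`, `saveState`), the JSON state file, and the limit of 60000 IDs are all hypothetical and not taken from db_build.php. A crontab entry along the lines of `*/5 * * * * php /path/to/db_build.php` would run it every 5 minutes; the path and the location of the `php` binary depend on the host.

```php
<?php
// Hypothetical helpers for stepping through IDs in batches across cron runs.

// Read the saved position, or start at the first server and ID 0.
function loadState(string $file): array
{
    if (is_readable($file)) {
        $state = json_decode(file_get_contents($file), true);
        if (is_array($state) && isset($state['server_index'], $state['start'])) {
            return $state;
        }
    }
    return ['server_index' => 0, 'start' => 0];
}

// Advance by one batch; past $maxId, reset to 0 and move to the next server.
function nextState(array $state, array $servers, int $batchSize, int $maxId): array
{
    $start = $state['start'] + $batchSize;
    $index = $state['server_index'];
    if ($start >= $maxId) {
        $start = 0;
        $index = ($index + 1) % count($servers); // wrap back to the first server
    }
    return ['server_index' => $index, 'start' => $start];
}

// Write the position back so the next cron run continues from it.
function saveState(string $file, array $state): void
{
    file_put_contents($file, json_encode($state), LOCK_EX);
}
```

Each run would call `loadState()`, parse the 25 IDs starting at `$state['start']` on `$servers[$state['server_index']]`, and finish with `saveState($file, nextState($state, $servers, 25, 60000))`. Saving only after a successful parse means a timed-out run is simply retried on the next cron tick.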

 

 

Link to comment
Share on other sites

Well... I also want to lessen the load on the server I am retrieving from.  And I think iterating through those 60k+ pages for about 50 different servers all at once would put a heck of a load on them.  I was thinking running 25 every 5 minutes would be ideal... I know it would take about 8 days of parsing to get through the first 60k, but I can accept that.  You are right though, my php page is timing out with any more than 25, and still does every once in a while.

 

I am parsing the HTML from a server that isn't mine, the data is updated continuously, so I will need to run the script quite often.  maybe less as time goes on as my database is nearer completion.

 

 

Link to comment
Share on other sites

i'm assuming you are using a loop to iterate through the pages.

 

if you are worried about DOSing the site, then insert a sleep() command within the loop.  This will run the loop once, then make the script sleep for however many seconds you tell it to.

 

You could also do something like:

 

<?php

for($i=0;$i<$records;$i++){

    //download page #$i

    // pause for 60 seconds after every 25th page
    if(($i + 1) % 25 == 0) sleep(60);

}

?>

 

 

That would make the script sleep for 1 minute every 25th record.

 

 

In this configuration (specifying the server as a command line arg) one running process will access one server.  With a little trickery you could get the script to alternate servers, so that it never hits the same server twice in a row.
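One way to read "alternate servers" is to interleave the requests, so that each ID is fetched from every server before moving to the next ID. This is a rough sketch, not part of db_build.php: the generator name `interleave` is hypothetical, and the server IDs 122 and 130 are simply the two values from the form earlier in the thread.

```php
<?php
// Hypothetical sketch: yield [server, id] pairs in round-robin order so that
// two consecutive requests go to different servers (given at least two servers).
function interleave(array $servers, int $maxId): Generator
{
    for ($id = 0; $id < $maxId; $id++) {
        foreach ($servers as $server) {
            yield [$server, $id];
        }
    }
}

// Example use inside the download loop (commented out; the fetch is up to you):
// foreach (interleave([122, 130], 60000) as [$server, $id]) {
//     // download and parse the page for $id on $server
//     sleep(1); // small pause between requests
// }
```

With only one server in the list this degrades to a plain sequential loop, so the sleep() throttling above is still needed.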

 

You might want to take a look at some articles on daemon programming as a lot of the concepts apply to this sort of thing.

Link to comment
Share on other sites

Well... I cannot really show the meat of the file... it parses a webpage for a video game that is still under NDA.  The only information I am passing to db_build.php is a server number and a character number.  With that it builds the URL, retrieves each page, parses the data from it, and submits it into my database.

Link to comment
Share on other sites

I guess it cannot hurt to post how I am getting there, i edited out username/password info.

 

// Cast to int so the +25 below is plain integer arithmetic.
$player_start = (int) $_POST['player_start'];
$server = $_POST['server'];

$pagenum = $player_start;
$pageend = $player_start + 25;

// Log in once; the session cookie is saved to the cookie file and reused below.
$URL = "http://realmwar.warhammeronline.com/realmwar/UserLoginAuthentication.war";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $URL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookiefilename");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookiefilename");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "user=xxxxxx&password=xxxxxxx");
$pageResults = curl_exec($ch);

// Fetch the 25 character pages starting at $player_start.
while ($pagenum < $pageend) {

    curl_setopt($ch, CURLOPT_URL, "http://realmwar.warhammeronline.com/realmwar/CharacterInfo.war?id=$pagenum&server=$server");
    $blah = curl_exec($ch);

    #### code for parsing the data from the HTML #####

    $pagenum++; // move on to the next player ID
}

curl_close($ch);
