interpim Posted August 20, 2008

I have a form that sets up a script to parse a website based on an ID number. It works, but it is manual. What I want is a cron job that runs this automatically, stepping the number up each time so it does not parse the same pages over and over. When it reaches a certain number, I want it to reset to 0 and start over on a different server.

Here is my form:

<form method='post' action='db_build.php'>
<table>
<tr><td align='right'>Server:</td><td><select name='server'>
<option value='122'>Warpstone</option>
<option value='130'>Tyrion</option>
</select></td></tr>
<tr><td align='right'>Starting Player ID:</td><td><input type='text' name='player_start'></td></tr>
<tr><td>Only 25 players will be parsed!</td><td><input type='submit' value='start parse'></td></tr>
</table>
</form>

I have been putting the numbers in manually, going back and adding 25 to the last number each time. But there are 60,000+ pages for each server, and it is quite an effort to sit at a computer and retype the input every time the page loads. My server cannot process much more than 25 players at a time, so I cannot just set it going and walk away.

I have tried writing a script that submits the POST data and updates a text file with the last number used, then running that from cron, but for some reason it will not run. I think the problem is the cron job itself: I don't know much about cron, and I suspect it isn't running at all.

So if I want to run a PHP script, how would I actually set up the cron job? If the host matters, I use bluehost.com.
mbeals Posted August 21, 2008

You may not even need cron. Cron is for running a script on a set interval (like every 5 minutes). It sounds like you just want to iterate through one server, then repeat on the next.

I suspect you cannot do more than 25 pages because the PHP page is timing out. When you run a script from the command line, there is no timeout. You can have a script that loops forever (essentially how daemons, or services in Windows terms, work).

So take your db_build.php script and add this to the top, removing the $name = $_POST stuff:

<?php
// The first command-line argument is the server ID.
$name = isset($_SERVER["argv"][1]) ? $_SERVER["argv"][1] : null;
if (!$name) die("No server defined!\n");
?>

Then modify the rest of it to loop through the entire server instead of just the first 25. Then open a terminal on the server, chmod the file, and execute it:

chmod a+x db_build.php
php -q db_build.php 122 &

If you need to run this on a schedule (say you want to back up the website every day at noon), then you would insert that command into the crontab.
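A minimal sketch of the argument handling described above, assuming PHP 7.1 or later. The function name `parse_cli_args` and the optional second argument (a starting player ID) are illustrative assumptions, not part of the original db_build.php.

```php
<?php
/**
 * Parse arguments of the form: php db_build.php <server> [start_id]
 * Returns [server, start] as integers, or null when the server
 * argument is missing or not a plain number.
 * The optional start_id is a hypothetical addition for this sketch.
 */
function parse_cli_args(array $argv): ?array
{
    // $argv[0] is the script name; $argv[1] should be the server ID.
    if (!isset($argv[1]) || !preg_match('/^\d+\z/', (string) $argv[1])) {
        return null;
    }

    // Fall back to 0 when no valid starting ID is given.
    $start = 0;
    if (isset($argv[2]) && preg_match('/^\d+\z/', (string) $argv[2])) {
        $start = (int) $argv[2];
    }

    return [(int) $argv[1], $start];
}
```

In the script itself this would be called with `$_SERVER['argv']`, exiting with a message such as "No server defined!" when it returns null.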
interpim Posted August 21, 2008

Well, I also want to lessen the load on the server I am retrieving from, and I think iterating through 60k+ pages for about 50 different servers all at once would put a heck of a load on it. I was thinking that running 25 every 5 minutes would be ideal. I know it would take about 8 days of parsing to get through the first 60k, but I can accept that.

You are right, though: my PHP page times out with any more than 25, and still does every once in a while. I am parsing HTML from a server that isn't mine, and the data is updated continuously, so I will need to run the script quite often, maybe less as time goes on and my database gets closer to complete.
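The "25 every 5 minutes" plan could be sketched roughly as below: each cron run reads the last position from a small state file, parses one batch, and stores the next position, resetting to 0 and moving to the next server at the end. This is an illustration under stated assumptions, not the poster's script. The file names, the crontab paths, the `MAX_ID` limit of 60000, and every function name here are hypothetical, and the code assumes PHP 7.1 or later.

```php
<?php
// Hypothetical crontab entry (paths are assumptions, adjust for your host):
// */5 * * * * /usr/bin/php -q /home/youruser/db_build_cron.php

const BATCH_SIZE = 25;    // players parsed per run, as in the thread
const MAX_ID     = 60000; // assumed upper bound of player IDs per server

/** Return the state for the next run: step by one batch, or roll over. */
function advance_state(array $state, int $serverCount): array
{
    $state['next_id'] += BATCH_SIZE;
    if ($state['next_id'] >= MAX_ID) {
        // Finished this server: reset to 0 and move on to the next one.
        $state['next_id'] = 0;
        $state['server_index'] = ($state['server_index'] + 1) % $serverCount;
    }
    return $state;
}

/** Read the saved state, falling back to the very beginning. */
function load_state(string $file): array
{
    $default = ['server_index' => 0, 'next_id' => 0];
    if (!is_file($file)) {
        return $default;
    }
    $data = json_decode((string) file_get_contents($file), true);
    $valid = is_array($data) && isset($data['server_index'], $data['next_id']);
    return $valid ? $data : $default;
}

/** Persist the state for the next cron run. */
function save_state(string $file, array $state): void
{
    file_put_contents($file, json_encode($state), LOCK_EX);
}

/** One cron run: parse a single batch, then store where to resume. */
function run_once(string $file, array $servers, callable $parseBatch): void
{
    $state = load_state($file);
    $parseBatch($servers[$state['server_index']], $state['next_id'], BATCH_SIZE);
    save_state($file, advance_state($state, count($servers)));
}
```

The script named in the crontab line would then call something like `run_once('/home/youruser/parse_state.json', ['122', '130'], $parseBatch)`, where `$parseBatch` stands in for the existing cURL and parsing code. At 25 IDs per run, 60,000 IDs take 2,400 runs, which at one run every 5 minutes is a little over 8 days.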
mbeals Posted August 21, 2008

I'm assuming you are using a loop to iterate through the pages. If you are worried about DoSing the site, insert a sleep() call within the loop. That runs the loop body once, then makes the script sleep for however many seconds you tell it to. You could also do something like:

<?php
for ($i = 0; $i < $records; $i++) {
    // download page #$i
    // Pause once after every 25th record (not before the first one).
    if ($i > 0 && $i % 25 == 0) sleep(60);
}
?>

That would make the script sleep for 1 minute after every 25th record.

In this configuration (specifying the server as a command-line argument), one running process will access one server. With a little trickery you could get the script to alternate servers, so that it never hits the same server twice in a row.

You might want to take a look at some articles on daemon programming, as a lot of the concepts apply to this sort of thing.
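One way to sketch both ideas from this post, the pause after every 25th record and the alternation between servers, is shown below. The function names are hypothetical, `$fetch` stands in for the real download and parsing code, and the sketch assumes PHP 7.1 or later.

```php
<?php
/** True when a pause is due: after every $batch-th record, never before the first. */
function pause_due(int $i, int $batch = 25): bool
{
    return $i > 0 && $i % $batch === 0;
}

/**
 * Build a request order that rotates through the servers for each ID,
 * so consecutive requests never hit the same server (given two or more servers).
 * Each entry is [server, id].
 */
function interleave(array $servers, int $startId, int $count): array
{
    $order = [];
    for ($id = $startId; $id < $startId + $count; $id++) {
        foreach ($servers as $server) {
            $order[] = [$server, $id];
        }
    }
    return $order;
}

/** Walk the order, pausing every 25 requests; $fetch is supplied by the caller. */
function crawl(array $order, callable $fetch, int $pauseSeconds = 60): void
{
    foreach ($order as $i => [$server, $id]) {
        if (pause_due($i)) {
            sleep($pauseSeconds);
        }
        $fetch($server, $id);
    }
}
```

Note that the pause condition compares the remainder with 0. A bare `if ($i % 25)` is true for every index that is not a multiple of 25, so it would sleep on almost every iteration instead of once per 25 records.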
trq Posted August 21, 2008

You can forget about the form. Can we see db_build.php? From there we can make some adjustments and show you how to execute a PHP script via cron.
interpim Posted August 21, 2008

Well, I cannot really show the meat of the file. It parses a webpage for a video game that is still under NDA. The only information I am passing to db_build.php is a server number and a character number. With those it builds the URL, retrieves each page, parses the data from it, and submits it to my database.
interpim Posted August 21, 2008

I guess it cannot hurt to post how I am getting there. I edited out the username/password info.

$player_start = $_POST['player_start'];
$server = $_POST['server'];
$pagenum = $player_start;
$pageend = $player_start + 25;
$URL="http://realmwar.warhammeronline.com/realmwar/UserLoginAuthentication.war";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"$URL");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookiefilename");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookiefilename");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "user=xxxxxx&password=xxxxxxx");
$pageResults = curl_exec($ch);
while($pagenum < $pageend){
curl_setopt($ch, CURLOPT_URL, "http://realmwar.warhammeronline.com/realmwar/CharacterInfo.war?id=$pagenum&server=$server");
$blah = curl_exec($ch);
#### code for parsing the data from the HTML #####
interpim Posted August 21, 2008

OK, I don't think I can actually issue commands from a command line. I use Bluehost, and the cPanel version they have is 11.23.4-RELEASE. If anyone knows how to issue command-line commands from this, please let me know.
trq Posted August 21, 2008

You really need to find out what is available to you on the server. I wouldn't even be using PHP for such a task (curl has a perfectly good CLI), and definitely not through Apache.