cxpzadan Posted July 20, 2008
There's a website that updates statistics approximately every hour, but they don't archive their pages, so I have to physically be at my computer every time they update. I'm looking for a script that I could put on my website that would automatically visit the given URL every hour and save the page in a folder on my server. I'd greatly appreciate it if someone could help.
teynon Posted July 20, 2008
Does it have to download the page images or just the page?
trq Posted July 20, 2008
Look into cron and wget. You don't even need PHP for this.
cxpzadan Posted July 20, 2008
"Does it have to download the page images or just the page?"
Images as well.
"Look into cron and wget. You don't even need php for this."
What is this? Unix? I don't have any experience with it, though it looks fairly straightforward. How would I do this?
trq Posted July 20, 2008
"How would I do this?"
Read the man pages I linked to; it's all covered. Cron allows you to execute commands at a given time, and wget downloads complete web pages.
cxpzadan Posted July 20, 2008
I read it. I need help with the basics. I don't know where to start. I've never used Unix before.
trq Posted July 20, 2008
Before I go posting any examples, is your server running Linux, and do you have access to cron?
cxpzadan Posted July 20, 2008
Yes, it is Linux. I think I do have access to cron, but I'm not 100% sure. :-\
cxpzadan Posted July 20, 2008
Yes, I do have access to cron. I just checked.
cxpzadan Posted July 20, 2008
I figured out cron jobs, but wget is really confusing.
trq Posted July 20, 2008
This....
    /usr/bin/wget -N -E -H -k -K -p -P /home/${HOME}/domain.com/$(date +%s) http://domain.com/page.html
will save a copy of the complete (images and all) web page within a directory called domain.com/<seconds since epoch> in your home directory.
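For readers who find the one-letter options hard to follow, here is a minimal sketch that wraps the same wget call in a shell function and annotates each flag. The function name `fetch_snapshot`, the `WGET` override variable, and the example directory in the last comment are assumptions made for illustration, not part of the original post.

```shell
#!/bin/sh
# WGET can be overridden; the default matches the path used in the thread.
WGET="${WGET:-/usr/bin/wget}"

# fetch_snapshot DEST URL: save URL and its page requisites under DEST.
fetch_snapshot() {
    dest="$1"
    url="$2"
    # -N  timestamping: skip files that are not newer than the local copy
    # -E  save HTML documents with an .html extension
    # -H  allow requisites to be fetched from other hosts
    # -k  convert links so the saved page can be viewed locally
    # -K  keep an .orig backup of each file that -k rewrites
    # -p  download images, stylesheets and other page requisites
    # -P  directory prefix under which everything is saved
    "$WGET" -N -E -H -k -K -p -P "$dest" "$url"
}

# Example call (hypothetical directory and URL), left commented out:
# fetch_snapshot "$HOME/domain.com/$(date +%s)" "http://domain.com/page.html"
```

Running the function by hand from a shell first, before handing it to cron, makes any wget error messages visible straight away.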
cxpzadan Posted July 21, 2008
"This.... /usr/bin/wget -N -E -H -k -K -p -P /home/${HOME}/domain.com/$(date +%s) http://domain.com/page.html will save a copy of the complete (images and all) web page within a directory called domain.com/<seconds since epoch> in your home directory."
I made a crontab entry using this code, but nothing happens. No directory is created anywhere on my server. How do I proceed?
trq Posted July 21, 2008
Post your crontab line.
cxpzadan Posted July 21, 2008
I used the cron job manager in cPanel. It's set to run every minute, every hour, day, month... In the command line I copied exactly what you suggested:
    /usr/bin/wget -N -E -H -k -K -p -P /home/${HOME}/domain.com/$(date +%s) http://domain.com/page.html
I saved it, and nothing happens.
trq Posted July 21, 2008
There is no such domain as domain.com; you need to change that to the address of the page you're trying to steal.
cxpzadan Posted July 21, 2008
I used yahoo.com.
trq Posted July 21, 2008
Did you tell cron how often to execute the command?
cxpzadan Posted July 21, 2008
What do you mean? I posted the code in the command line and saved it.
trq Posted July 21, 2008
Sorry, I've never used cPanel or whatever interface it is you're using to access cron. Generally, though, you need to tell cron when to execute a given command. In a crontab file it would look something like....
    # minute
    # # hour
    # # # day of month
    # # # # month
    # # # # # day of week
    # # # # #
    0 * * * * /usr/bin/command
The above would execute /usr/bin/command every hour on the hour.
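To make the five schedule fields concrete, the following small sketch labels each field of a schedule string. The helper name `describe_schedule` is hypothetical. The second call shows the schedule that "every minute, every hour, day, month" in a control panel would presumably correspond to.

```shell
#!/bin/sh
# describe_schedule "M H DOM MON DOW": print each crontab time field by name.
describe_schedule() {
    set -f          # stop "*" from being expanded as a filename pattern
    set -- $1       # split the schedule string into its five fields
    set +f
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

describe_schedule "0 * * * *"   # minute 0 of every hour
describe_schedule "* * * * *"   # every minute
```

Only the first field differs between the two: `0` restricts the job to the top of each hour, while `*` matches every minute.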
cxpzadan Posted July 21, 2008
Could it be that wget doesn't work on my site (or is installed somewhere else), or am I completely wrong?
trq Posted July 21, 2008
Yes, that is always a possibility. Try this in a PHP script and show us the output.
    <?php
    echo "<pre>";
    echo shell_exec('whereis wget');
    echo "</pre>";
    ?>
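If shell access is available, a similar check can be made without PHP. This is a sketch only: the helper name `locate_tool` is an assumption, and `command -v` searches the current PATH, whereas cron typically runs with a much shorter PATH, which is one reason the thread's command uses the full /usr/bin/wget path.

```shell
#!/bin/sh
# locate_tool NAME: print where NAME is found on PATH, or fail with a message.
locate_tool() {
    command -v "$1" || { echo "$1 not found on PATH" >&2; return 1; }
}

# Report whether wget is available without aborting the script if it is not.
if locate_tool wget >/dev/null 2>&1; then
    echo "wget is available"
else
    echo "wget is missing"
fi
```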
cxpzadan Posted July 21, 2008
    wget: /usr/bin/wget /usr/share/man/man1/wget.1.gz
trq Posted July 21, 2008
Then it is in the location it should be.
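With wget confirmed at /usr/bin/wget, two details of the command as it was pasted into cron may be worth checking. First, HOME normally already holds an absolute path such as /home/user, so /home/${HOME} would point at a doubled path. Second, in a crontab command field an unescaped % ends the command and turns the rest into standard input, so `date +%s` has to be written as `date +\%s`. The sketch below prints an hourly crontab line with both adjustments applied. The helper name `escape_for_crontab` and the log file name wget-cron.log are assumptions for illustration, and domain.com is still a placeholder.

```shell
#!/bin/sh
# escape_for_crontab TEXT: put a backslash before every "%" so the text can
# be used in a crontab command field.
escape_for_crontab() {
    printf '%s\n' "$1" | sed 's/%/\\%/g'
}

# The thread's command with ${HOME} used on its own (assumption: HOME is an
# absolute path) and wget's messages appended to a hypothetical log file.
# Single quotes keep ${HOME} and $(date +%s) literal until cron runs them.
cmd='/usr/bin/wget -N -E -H -k -K -p -P ${HOME}/domain.com/$(date +%s) http://domain.com/page.html >> ${HOME}/wget-cron.log 2>&1'

# Print an hourly crontab line with the "%" escaped.
printf '0 * * * * %s\n' "$(escape_for_crontab "$cmd")"
```

The log file gives cron a place to record wget's error messages, which would otherwise go unseen when the job fails silently.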