Dysfunktional Posted January 26, 2012
Hi all, this is my first post here and I'm still very new to PHP. I'm trying to write a web crawler script, except I want it to crawl only the one target website I enter. Basically, I want my script to go to ultimateguitar.com or 911tabs.com or any other guitar tabs website, crawl the site, and index any guitar tabs they have in their database. This will provide my website with a "phonebook" of guitar tabs. It's not illegal or in breach of any copyrights; I'm only making a database of links. Any help would be greatly appreciated!
ManiacDan Posted January 26, 2012
Look into regular expressions or the DOM object. You will probably have to write a separate spider for each site unless you're very talented.
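To make the DOM suggestion concrete, here is a minimal sketch of pulling links out of a page with PHP's built-in DOMDocument class. The function name `extractLinks` and the sample HTML are made up for illustration; a real tabs site will need its own selectors, which is why a separate spider per site is likely.

```php
<?php
// Hypothetical helper (not from the thread): return the unique href values
// of every <a> tag found in an HTML string.
function extractLinks(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}

// Made-up sample markup standing in for a downloaded page.
$sample = '<html><body>'
    . '<a href="/tabs/song-one">One</a> '
    . '<a href="/tabs/song-two">Two</a> '
    . '<a href="/tabs/song-one">One again</a>'
    . '</body></html>';

print_r(extractLinks($sample));
```

The links come back exactly as written in the page, so relative paths such as `/tabs/song-one` would still need to be resolved against the site's base URL before being fetched.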
QuickOldCar Posted January 26, 2012
How to make a web crawler/scraper is a lot of information to cover in one post. The basic concept: Designate a URL, either by input, from a list, or from a database. Connect to it using curl, file_get_contents, or another method. Obtain the desired information, which could be header info, meta info, or some content that matches within that page. preg_match with regular expressions is a common way to find the related content. DOM, or something like Simple HTML DOM, could also find specific areas in the content. Once links are found, you could insert them into a database; those same links could later be used to visit each page and acquire more links. You could build a system that tracks pages already visited, or simply not insert duplicate URLs into the database. Have your scraper keep running and visiting these pages in loops. I have seen some example scraper scripts on the net. They could give you an idea of how to do it, but none of them is a complete solution, and you would have to do a lot of work to adapt them to your needs. Consider using this ready-made search spider or something similar if you do not want to invest the time to make your own. http://www.sphider.eu/
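The loop described above could be sketched roughly as follows. This is only an illustration under several assumptions: `crawlSite` and `$fetch` are hypothetical names, `$fetch` stands in for whatever download method is used (curl or file_get_contents), an in-memory array replaces the database of visited URLs, and only absolute links on the starting host are followed.

```php
<?php
// Sketch of a single-site crawl loop. $fetch is a hypothetical callable that
// takes a URL and returns the page HTML, or null on failure.
function crawlSite(string $startUrl, callable $fetch, int $maxPages = 50): array
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    $queue = [$startUrl];
    $visited = []; // stands in for the "already visited" database table

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue; // never fetch the same URL twice
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if (!is_string($html) || $html === '') {
            continue; // failed or empty download: nothing to parse
        }

        // Simple regular expression for quoted href values in <a> tags.
        if (preg_match_all('/<a\s[^>]*href=["\']([^"\'#]+)["\']/i', $html, $m)) {
            foreach ($m[1] as $link) {
                // Stay on the target site. Relative links are ignored here
                // because resolving them is left out of this sketch.
                if (parse_url($link, PHP_URL_HOST) === $host && !isset($visited[$link])) {
                    $queue[] = $link;
                }
            }
        }
    }

    return array_keys($visited);
}

// Made-up pages so the sketch runs without network access.
$pages = [
    'http://example.test/' =>
        '<a href="http://example.test/tabs/a">A</a> <a href="http://other.test/x">X</a>',
    'http://example.test/tabs/a' =>
        '<a href="http://example.test/">Home</a> <a href="http://example.test/tabs/b">B</a>',
    'http://example.test/tabs/b' => '',
];
$fetch = function (string $url) use ($pages) {
    return $pages[$url] ?? null;
};

print_r(crawlSite('http://example.test/', $fetch));
```

For a live site, `$fetch` could wrap curl, and a polite delay between requests plus a check of the site's robots.txt would be sensible additions; `$maxPages` is there only to keep the loop from running indefinitely.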