OM2 Posted December 19, 2014

I need to scrape pages, one page at a time, and I'm only looking for two or three bits of data within each page. Can someone give me some pointers on where to start?

I've searched and seen names like DOMXPath and XPath mentioned. Do I need these?

It's important that I can run the script on standard Linux hosting with no extra packages installed. I'd like something I can use immediately with standard PHP and its built-in functions.

I've seen plenty of tutorials and YouTube videos; I'm just looking for recommendations and pointers on recommended practices.

Thanks

OM
QuickOldCar Posted December 19, 2014

If you gave more info about the site you want to scrape, or the data you need, it might make this easier.

There are various ways to connect to websites. Which one fits depends on the type of data and the complexity of what you are trying to send or retrieve. curl would be my preference over all of them, since you have much more control over everything. A simple way to connect would be using file_get_contents().

For XML you can use simplexml_load_file(). I prefer to use curl for the connection and responses, then simplexml_load_string() to create an object.

Once you get the raw data, you need to access the items you want. This is called parsing. The options include json, simplexml, dom, simplehtmldom, and preg_match() and preg_match_all().

Once you have the data you can output it or save it to your database.
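As a rough sketch of the fetch-then-parse approach described above, the following uses only PHP's bundled cURL and DOM extensions, which are usually enabled on shared Linux hosting but are not guaranteed to be. The function names fetch_page() and extract_fields(), the user-agent string, the sample HTML, and the XPath expressions are all hypothetical placeholders for illustration; the real expressions depend entirely on the markup of the page being scraped.

```php
<?php
// Fetch a page with cURL. Returns the response body, or null on failure.
// (Hypothetical helper; the options shown are a reasonable starting point only.)
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 15,   // give up after 15 seconds
        CURLOPT_USERAGENT      => 'ExampleScraper/1.0', // placeholder value
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Run named XPath queries against an HTML string.
// $queries maps a field name to an XPath expression; for each field the
// trimmed text of the first matching node is returned, or null if none match.
function extract_fields(string $html, array $queries): array
{
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($doc);
    $result = [];
    foreach ($queries as $name => $expression) {
        $nodes = $xpath->query($expression);
        $result[$name] = ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : null;
    }
    return $result;
}

// Demonstration on an inline sample, so nothing is requested over the network.
$sample = '<html><body><h1 id="title">Example Widget</h1>'
        . '<span class="price">$19.99</span></body></html>';

$fields = extract_fields($sample, [
    'title' => '//h1[@id="title"]',
    'price' => '//span[@class="price"]',
]);
print_r($fields);
```

For a live page, the string returned by fetch_page() would be passed to extract_fields() in place of $sample, after checking that it is not null. Keeping the XPath expressions in an array is one way to adapt the same code to different sites: only the expressions change, not the fetching or parsing logic.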
OM2 Posted December 31, 2014

@QuickOldCar, thanks for the reply. I didn't know someone had replied to my post; I received no email notification.

I've posted the project on a freelancer website, as I'm looking to get help writing the code. Yes, I can write code myself, but I would rather get a kickstart for the code, have it fully explained, and have some guidance on how to adapt it for different websites.

Send me a PM if you think you can help (I'll check my PMs tonight and tomorrow).

The websites I need to use this on are eBay and Amazon.

Thanks
Psycho Posted December 31, 2014

Why would you want to scrape their sites as opposed to using one of the APIs they provide? If they make any changes to the format of their pages, your code will break, whereas they have a vested interest in maintaining the APIs they provide.

Here are a couple of links to get you started.

http://go.developer.ebay.com/developers/ebay/php-perl-and-python-developer-center
http://aws.amazon.com/sdk-for-php/