Scraping: need some pointers


OM2

I need to scrape pages - I only need one page at a time

I'm only looking for 2/3 bits of data within each page

 

Can someone give me some pointers where to start?

I've searched and seen names like DOMXPath and XPath mentioned - do I need these?

 

It's important that I can run the script on standard Linux hosting with no extra packages installed - I'd like something I can use immediately with standard PHP and its built-in functions

 

I've seen plenty of tutorials + youtube videos - just looking for recommendations and pointers for recommended practices

 

Thanks

 

 

OM


If you gave more info about the site you want to scrape, or the data you need, it might make this easier.

 

There are various ways to connect to websites.

Which one fits depends on the type of data and the complexity of what you are trying to send or retrieve.

 

curl would be my preference over all of them; you have a lot more control over everything (timeouts, redirects, headers, cookies).

A simple way to connect would be using file_get_contents()
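To make the two options concrete, here's a rough sketch of a fetch helper that uses curl when the extension is loaded and falls back to file_get_contents() otherwise. The name fetchPage(), the timeout values and the user-agent string are my own placeholders, not anything standard, so adjust them to taste.

```php
<?php
// Fetch one page and return its body as a string, or null on failure.
// fetchPage() is a hypothetical helper name, not a built-in function.
function fetchPage(string $url, int $timeoutSeconds = 15): ?string
{
    // Fallback for hosts without the curl extension.
    // For http(s) URLs this needs allow_url_fopen enabled in php.ini.
    if (!function_exists('curl_init')) {
        $body = @file_get_contents($url);
        return $body === false ? null : $body;
    }

    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'ExampleScraper/0.1', // placeholder value
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and HTTP 4xx/5xx responses as failures.
    if ($body === false || $status >= 400) {
        return null;
    }

    return $body;
}
```

Usage would be something like `$html = fetchPage('https://example.com/some-page');` followed by a null check before you try to parse anything.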

For XML you can use simplexml_load_file(), but I prefer to use curl for the connection and response, then simplexml_load_string() to create an object.
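As a minimal sketch of the simplexml_load_string() step, assuming you already have the XML in a string: the function name extractTitles() and the item/title element names are made up for illustration, so swap in whatever the real feed uses.

```php
<?php
// Turn an XML string into a list of <title> values, one per <item>.
// extractTitles() and the <items>/<item>/<title> layout are hypothetical.
function extractTitles(string $xml): array
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false) {
        return []; // not well-formed XML
    }

    $titles = [];
    foreach ($doc->item as $item) {
        // Cast the SimpleXMLElement to a plain string before storing it.
        $titles[] = trim((string) $item->title);
    }

    return $titles;
}
```

Casting to string matters here: without it you end up storing SimpleXMLElement objects rather than text.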

 

Once you get the raw data, you need to access the items you want.

This would be called parsing.

json

simplexml

dom

simplehtmldom

preg_match() and preg_match_all()
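For HTML pages, the dom option plus a little preg_match() covers the "2/3 bits of data per page" case using only extensions that ship with standard PHP. Here's a rough sketch; the function name extractProduct(), the h1 id "title" and the span class "price" are assumptions about a made-up page, so inspect the real page source and change the XPath queries to match.

```php
<?php
// Pull a title and a price out of an HTML string with DOMDocument + DOMXPath.
// extractProduct() and the element id/class names are hypothetical.
function extractProduct(string $html): array
{
    $result = ['title' => null, 'price' => null];
    if (trim($html) === '') {
        return $result; // loadHTML() rejects empty input
    }

    // Real-world HTML is rarely valid; keep libxml's complaints quiet.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // First <h1 id="title"> in the document, if any.
    $titleNode = $xpath->query('//h1[@id="title"]')->item(0);
    if ($titleNode !== null) {
        $result['title'] = trim($titleNode->textContent);
    }

    // First <span> whose class list contains the word "price".
    $priceNode = $xpath->query(
        '//span[contains(concat(" ", normalize-space(@class), " "), " price ")]'
    )->item(0);

    // Use a regex only on the small text fragment, not on the whole page.
    if ($priceNode !== null
        && preg_match('/\d+(?:\.\d{1,2})?/', $priceNode->textContent, $m)) {
        $result['price'] = (float) $m[0];
    }

    return $result;
}
```

The idea is to let the DOM parser find the element and keep preg_match() for cleaning up the text inside it. Running regexes over the entire page tends to break as soon as the markup shifts slightly.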

 

Once you have the data you can output it or save to your database.
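If you go the database route, a sketch with PDO and a prepared statement could look like the following. The function name saveProduct(), the table name scraped_products and its columns are all invented for the example, and the CREATE TABLE statement is written for SQLite; for MySQL you'd want something like VARCHAR(255) for the url column.

```php
<?php
// Store one scraped record, overwriting any earlier row for the same URL.
// saveProduct(), the table name and the column names are hypothetical.
function saveProduct(PDO $pdo, string $url, array $product): void
{
    // SQLite-flavoured schema; adjust the column types for other databases.
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS scraped_products (
            url        TEXT PRIMARY KEY,
            title      TEXT,
            price      REAL,
            fetched_at TEXT
        )'
    );

    // Prepared statement: scraped values are never concatenated into the SQL.
    $stmt = $pdo->prepare(
        'REPLACE INTO scraped_products (url, title, price, fetched_at)
         VALUES (:url, :title, :price, :fetched_at)'
    );

    $stmt->execute([
        ':url'        => $url,
        ':title'      => $product['title'] ?? null,
        ':price'      => $product['price'] ?? null,
        ':fetched_at' => gmdate('c'), // UTC timestamp in ISO 8601 format
    ]);
}
```

You'd call it with an existing connection, e.g. `saveProduct(new PDO('sqlite:/path/to/scrape.db'), $url, $data);` where the path is a placeholder.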


  • 2 weeks later...

@QuickOldCar, thanks for the reply

i didn't know someone had replied to my post - no email notification received

 

i've posted the project on a freelancer website

i'm looking to get help writing the code

YES: i can write code myself... but would rather get a kickstart for the code + have it fully explained + have some guidance on how to adapt it for different websites

send me a PM if you think you can help (i'll check my PM tonight and tomorrow)

 

the websites i need to use it on are eBay and Amazon

 

thanks


Why would you want to scrape their sites as opposed to using one of the APIs they provide? If they make any changes to the format of their pages, your scraping code will break, whereas the APIs are something they have a vested interest in maintaining.

 

Here are a couple of links to get you started.

http://go.developer.ebay.com/developers/ebay/php-perl-and-python-developer-center

http://aws.amazon.com/sdk-for-php/

