Scraping data


fatmach

More information on the job posting.

 

I am looking to fetch information from daily deal websites such as tuango.ca, socialliving.com, groupon.com, etc.

 

I want to retrieve data from different daily deal sites, and for each site I want all the deals of the day from every city it covers.

 

For example, www.tuango.ca has a deal a day in Montreal, Toronto, etc.

 

I want to be able to retrieve data from all the different locations within the site.

 

I want the script to fetch the deal data. To be more clear, I want the script to fetch:

 

What site the deal was on

What location was it for

What's the title of the deal

What price is the deal

What's the value of the deal

What's the saving in percentage of the deal

How many were sold

What's the minimum number of purchases before the deal becomes activated

What's the company who did the deal

Company address 

Company postal code

Company phone number

(there might be more categories... will talk more if you pass this stage of the interview process)
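One way to pin the list above down is to write it as a record type first. This is only a sketch: the field names are my own guesses at what each item means, not anything the deal sites define, and the saving is assumed to be (value - price) / value.

```python
from dataclasses import dataclass


@dataclass
class Deal:
    """One scraped deal. Field names are illustrative, not from any site."""
    site: str              # what site the deal was on
    city: str              # what location it was for
    title: str
    price: float           # what the buyer pays
    value: float           # regular price of the offer
    sold: int = 0          # coupons sold
    minimum: int = 0       # purchases needed before the deal activates
    company: str = ""
    address: str = ""
    postal_code: str = ""
    phone: str = ""

    @property
    def saving_percent(self) -> float:
        """Saving as a percentage of the regular value (assumed formula)."""
        if self.value <= 0:
            return 0.0
        return round(100 * (self.value - self.price) / self.value, 1)

    @property
    def activated(self) -> bool:
        """True once enough coupons were sold to reach the minimum."""
        return self.sold >= self.minimum


# Made-up sample values, only to show how the record is used.
deal = Deal(site="example-deals", city="Montreal", title="Dinner for two",
            price=25.0, value=50.0, sold=120, minimum=100)
```

Computing the saving from price and value, instead of scraping it, gives you one less field that can break when a site changes its layout.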

 

Once all this data is fetched, I need it to be stored in a database automatically.
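For the storage step, here is a minimal sketch using SQLite from Python's standard library. The table layout, the column names and the `save_deals` helper are assumptions for illustration; any database would work the same way. The primary key is chosen so that running the script twice on the same day overwrites rows instead of duplicating them.

```python
import sqlite3

# Hypothetical schema; only a few of the wanted fields, to keep it short.
SCHEMA = """
CREATE TABLE IF NOT EXISTS deals (
    site       TEXT NOT NULL,
    city       TEXT NOT NULL,
    title      TEXT NOT NULL,
    price      REAL,
    value      REAL,
    sold       INTEGER,
    fetched_on TEXT NOT NULL,
    PRIMARY KEY (site, city, title, fetched_on)
)
"""


def save_deals(conn, deals):
    """Insert a list of deal dicts; a rerun replaces rows with the same key."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO deals "
            "(site, city, title, price, value, sold, fetched_on) "
            "VALUES (:site, :city, :title, :price, :value, :sold, :fetched_on)",
            deals,
        )


conn = sqlite3.connect(":memory:")  # use a file path for a real run
rows = [{"site": "example-deals", "city": "Montreal", "title": "Dinner for two",
         "price": 25.0, "value": 50.0, "sold": 120, "fetched_on": "2011-05-01"}]
save_deals(conn, rows)
save_deals(conn, rows)  # second run does not create a duplicate
count = conn.execute("SELECT COUNT(*) FROM deals").fetchone()[0]
```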

 

Every morning at 4:00 am (Eastern time) I need it to run the script, because the day's deals finish at midnight and it's the only way of getting the total number of coupons sold. You'll usually see the final stats of a deal on the site's recent deals page.
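On a Unix server the usual tool for a daily run is a cron job. If the script has to schedule itself instead, the timing logic could look roughly like this. The `next_run` helper is hypothetical, and it assumes the machine's clock is already set to Eastern time.

```python
from datetime import datetime, timedelta


def next_run(now: datetime, hour: int = 4) -> datetime:
    """Return the next occurrence of hour:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


# Just before midnight, the next run is 4:00 the following morning.
late = next_run(datetime(2011, 5, 1, 23, 59))
# At 3:00 in the morning, the next run is one hour away.
early = next_run(datetime(2011, 5, 1, 3, 0))
```

A long-running script would sleep for `(next_run(datetime.now()) - datetime.now()).total_seconds()` and then start the scrape.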

 

I want to know how a site like http://onespout.com/deals/montreal did it.

I'm not asking somebody to do it for me; I'm just asking someone to guide me in taking the right steps.


There are two basic approaches I would use:

 

1.  A regexp to extract the info

2.  Parsing the HTML using a library, either event-based or one that converts the HTML into a structure from which the data can be extracted.
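To give an idea of the first approach, here is a rough sketch in Python. The HTML snippet and its class names are invented for the example; every real deal site has its own markup, so the pattern would have to be written per site and fixed whenever the site changes its layout.

```python
import re

# Hypothetical markup; real deal pages will look different.
HTML = """
<div class="deal">
  <h2 class="title">Dinner for two</h2>
  <span class="price">$25</span>
  <span class="value">$50</span>
</div>
"""

# Named groups capture each field; \s* skips the whitespace between tags.
PATTERN = re.compile(
    r'<h2 class="title">(?P<title>.*?)</h2>\s*'
    r'<span class="price">\$(?P<price>[\d.]+)</span>\s*'
    r'<span class="value">\$(?P<value>[\d.]+)</span>',
    re.DOTALL,
)

# One dict per deal found on the page.
deals = [m.groupdict() for m in PATTERN.finditer(HTML)]
```

The pattern only matches when the tags appear in exactly this order, which is the main weakness of the regexp route.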



Do you think there is any other way? I tried learning regexp and it's just too complicated for me to understand.
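The library route mentioned earlier in the thread needs no regexp at all. Below is a minimal sketch of an event-based parser using Python's built-in `html.parser`. The `DealParser` class, the sample HTML and its class names are all made up for illustration; a real site would need its own tag and class names.

```python
from html.parser import HTMLParser

# Hypothetical markup; real deal pages will look different.
HTML = """
<div class="deal"><h2 class="title">Dinner for two</h2>
  <span class="price">$25</span><span class="value">$50</span></div>
<div class="deal"><h2 class="title">Spa day</h2>
  <span class="price">$40</span><span class="value">$100</span></div>
"""


class DealParser(HTMLParser):
    """Collects the text of elements whose class names a wanted field."""
    FIELDS = {"title", "price", "value"}

    def __init__(self):
        super().__init__()
        self.deals = []      # one dict per deal block
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "deal":
            self.deals.append({})        # a new deal block starts
        elif cls in self.FIELDS and self.deals:
            self._field = cls            # next text belongs to this field

    def handle_data(self, data):
        if self._field:
            deal = self.deals[-1]
            deal[self._field] = deal.get(self._field, "") + data

    def handle_endtag(self, tag):
        self._field = None               # stop collecting at any closing tag


parser = DealParser()
parser.feed(HTML)
```

After `feed`, `parser.deals` holds one dictionary per deal, and the order of the fields inside a deal block no longer matters.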
