crawl sites data?

gangsterwanster1 · June 16, 2009

What would be the best way to parse a site's data daily?

Basically, the site I am trying to gather data from lists newly sold items every day. (Each new entry gets a higher page number, e.g. '6110.html'.)

What I hope to accomplish is to parse the values for:

Retail price:

Sold for:

That way I can eventually take all the data and graph the average price per product, which will help immensely when using the site.

Any ideas will be appreciated.

Alex · June 16, 2009

Getting the values specifically depends on how the site is set up. If you want help with that part, show the source from the area around the values you wish to extract. For running it once daily, you'll probably want to look into cron jobs.
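As a rough sketch of the daily run: cron can call a PHP script from the command line, and the script can work out which pages are new from the last page number it has already stored. The function name `next_page_urls`, the `example.com` base URL, and the paths in the crontab comment are all made up for illustration; only the "number goes up" page naming comes from the first post.

```php
<?php
// Hypothetical crontab entry (both paths are assumptions), run daily at 06:00:
//   0 6 * * * /usr/bin/php /home/user/crawl.php

// Build the URLs of the $count pages that follow the last page ID already stored.
// Assumes pages are named "<id>.html" under $baseUrl, as described in the first post.
function next_page_urls(string $baseUrl, int $lastId, int $count): array
{
    $urls = [];
    for ($id = $lastId + 1; $id <= $lastId + $count; $id++) {
        $urls[] = $baseUrl . $id . '.html';
    }
    return $urls;
}

print_r(next_page_urls('http://example.com/latest/', 6110, 3));
```

Storing the last ID (in a file or database) between runs keeps each daily run from fetching the same pages again.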

gangsterwanster1 · June 16, 2009

http://bidstick.com/latest/wii/6401.html

The data is:

Name: <title>Wii</title>

<div class="bid_info"> <- all the bidding info (sold for, etc)

<div class="time_info"> <- clock

Maq · June 16, 2009

You can accomplish this with either file_get_contents or cURL, combined with preg_match_all. If you search either here or in the PHP Regex section, you should find helpful threads with similar ideas. Good luck.
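A minimal sketch of that combination, assuming the values sit inside a div such as `<div class="bid_info">` with no nested divs. The helper names `fetch_page` and `extract_divs` and the sample markup are invented for illustration; the real page source may be laid out differently.

```php
<?php
// Download a page; returns null on failure.
// Not called below, so the sketch runs without network access.
function fetch_page(string $url): ?string
{
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

// Return the trimmed inner HTML of every <div> whose class attribute equals $class.
// Assumes no nested <div>: the lazy (.*?) stops at the first closing </div>.
function extract_divs(string $html, string $class): array
{
    $pattern = '~<div\s+class="' . preg_quote($class, '~') . '"[^>]*>(.*?)</div>~is';
    preg_match_all($pattern, $html, $matches);
    return array_map('trim', $matches[1]);
}

// Made-up sample markup standing in for a downloaded page.
$sample = '<title>Wii</title><div class="bid_info"> Sold for: $10.00 </div>'
        . '<div class="time_info">00:00</div>';

print_r(extract_divs($sample, 'bid_info'));
```

On a live page you would pass the result of `fetch_page($url)` to `extract_divs` after checking that it is not null.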

gangsterwanster1 · June 16, 2009

<?php
$html = "<b>bold text</b><a href=howdy.html>click me</a>";

preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    echo "matched: " . $val[0] . "\n";
    echo "part 1: " . $val[1] . "\n";
    echo "part 2: " . $val[3] . "\n";
    echo "part 3: " . $val[4] . "\n\n";
}
?>

So, based on this example, would you just do something like this?

Change

$html = "<b>bold text</b><a href=howdy.html>click me</a>";

to

$html = "<b>bold text</b><a href=http://bidstick.com/latest/d/6401.html>click me</a>";

I'm fairly confused. Any ideas on making this work?

Maq · June 16, 2009

Kind of. You would have to match on those characters and patterns.

It would look something like:

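<?php
$html = "<b>bold text</b><a href=http://bidstick.com/latest/d/6401.html>click me</a>";
// Capture the bold text and the link text; the dots in the URL are escaped.
$pattern = "~^<b>(.*)</b><a\shref=http://bidstick\.com/latest/d/6401\.html>(.*)</a>~i";
preg_match_all($pattern, $html, $matches);
echo "Bold: " . $matches[1][0] . "\nhref: " . $matches[2][0];
?>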

- Each (.*) captures zero or more characters at that position and adds them to the $matches array.

- The '\s' matches a whitespace character.

- I had to escape the '.' characters because an unescaped dot is a special character (a wildcard). Once escaped, the pattern treats them as literal dots.

- The tildes (~) are my delimiters; you need a delimiter on each side of the pattern.

- Finally, the 'i' flag makes the match case-insensitive.

For more information, read the tutorial here on phpfreaks: Regular Expressions (Part 1) - Basic Syntax. You should also look at the manual's documentation for preg_match_all.
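The same building blocks (delimiters, '\s', an escaped dot, the 'i' flag) could be applied to the two values asked about at the start of the thread. This is only a sketch: it assumes the page text looks like "Retail price: $249.99" once tags are stripped, which has not been confirmed against the real source, and `parse_prices` is a made-up helper name.

```php
<?php
// Pull the two labelled prices out of a chunk of HTML.
// Assumes each label is followed by an optional "$" and a number such as 249.99.
function parse_prices(string $html): array
{
    $text = strip_tags($html); // drop markup so only the visible text remains
    $prices = [];
    foreach (['retail' => 'Retail price', 'sold' => 'Sold for'] as $key => $label) {
        // ~ delimiters, \s* for optional whitespace, \. for a literal dot, i for case-insensitivity
        $pattern = '~' . preg_quote($label, '~') . ':\s*\$?(\d+(?:\.\d+)?)~i';
        // Store the price as a float, or null when the label is not found.
        $prices[$key] = preg_match($pattern, $text, $m) ? (float) $m[1] : null;
    }
    return $prices;
}

// Made-up sample markup.
$sample = '<div class="bid_info">Retail price: <b>$249.99</b> Sold for: <b>$12.34</b></div>';
print_r(parse_prices($sample));
```

Returning null for a missing label makes it easy to skip incomplete pages before averaging the prices per product.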
