gangsterwanster1 Posted June 16, 2009

What would be a good option for parsing a site's data daily? The site I am trying to gather data from lists newly sold items every day (each new entry gets the next page number, e.g. '6110.html').

What I hope to accomplish is to parse the values for:
Retail price:
Sold for:

Eventually I want to use all the data to graph the average price per product, which will help immensely when using the site. Any ideas will be appreciated.

Link to comment: https://forums.phpfreaks.com/topic/162313-crawl-sites-data/
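For the averaging step mentioned in this post, a minimal sketch might look like the following. The record layout (`name`, `sold`) and the sample values are made up for illustration; nothing here comes from the actual site.

```php
<?php
// Average the "sold for" price per product.
// Each record is assumed to look like ['name' => string, 'sold' => float].
function averageSoldPrice(array $records): array
{
    $totals = [];
    $counts = [];
    foreach ($records as $record) {
        $name = $record['name'];
        $totals[$name] = ($totals[$name] ?? 0.0) + $record['sold'];
        $counts[$name] = ($counts[$name] ?? 0) + 1;
    }

    $averages = [];
    foreach ($totals as $name => $total) {
        $averages[$name] = $total / $counts[$name];
    }
    return $averages;
}

// Hypothetical data, as it might look after a few days of crawling.
$records = [
    ['name' => 'Wii',  'sold' => 30.0],
    ['name' => 'Wii',  'sold' => 40.0],
    ['name' => 'iPod', 'sold' => 12.5],
];

$averages = averageSoldPrice($records);
foreach ($averages as $name => $average) {
    echo $name, ': ', number_format($average, 2), "\n";
}
```

The resulting array (product name to average price) is the kind of input most charting tools accept directly.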
Alex Posted June 16, 2009

How to extract the values depends on how the site's markup is set up. If you want help with that part, post the source from the area around the values you want to extract. For running it once daily, you'll probably want to look into cron jobs.
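To make the cron suggestion concrete, here is a rough sketch of a script that cron could run once a day. The file name `crawl.php`, the paths in the crontab line, and the helper `nextItemUrls` are all hypothetical, and the URL layout is only assumed from the links posted in this thread.

```php
<?php
// crawl.php (hypothetical name). A crontab entry like the following would
// run it every day at 06:00; the paths are placeholders:
//   0 6 * * * /usr/bin/php /path/to/crawl.php >> /path/to/crawl.log 2>&1

// Build the URLs for the next $count item pages after the last one seen.
// Assumes item pages are numbered consecutively under a category folder.
function nextItemUrls(int $lastSeenId, int $count, string $category = 'wii'): array
{
    $urls = [];
    for ($id = $lastSeenId + 1; $id <= $lastSeenId + $count; $id++) {
        $urls[] = "http://bidstick.com/latest/{$category}/{$id}.html";
    }
    return $urls;
}

// Example: the last page already stored was 6401, so check the next two.
$urls = nextItemUrls(6401, 2);
foreach ($urls as $url) {
    echo $url, "\n";
}
```

In a real run, the last seen number would have to be stored somewhere between runs (a file or a database row), which this sketch leaves out.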
gangsterwanster1 (Author) Posted June 16, 2009

http://bidstick.com/latest/wii/6401.html

The data is:
Name: <title>Wii</title>
<div class="bid_info"> <- all the bidding info (sold for, etc.)
<div class="time_info"> <- clock
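Given that structure, one possible way to pull out the name and the bidding block is PHP's DOM extension with an XPath query, rather than a regex. This is only a sketch: the sample page below is invented, and the real contents of `bid_info` are not shown in the thread.

```php
<?php
// Hypothetical sample page; only the <title> and the two div class names
// come from the post above. The text inside bid_info is made up.
$html = '<html><head><title>Wii</title></head><body>'
      . '<div class="bid_info">Retail price: $249.99 Sold for: $31.50</div>'
      . '<div class="time_info">00:00:00</div>'
      . '</body></html>';

// Return the page title and the text of the bid_info block (null if absent).
function extractSections(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $title = $xpath->query('//title')->item(0);
    $bid   = $xpath->query('//div[@class="bid_info"]')->item(0);

    return [
        'name'     => $title ? trim($title->textContent) : null,
        'bid_info' => $bid ? trim($bid->textContent) : null,
    ];
}

$sections = extractSections($html);
echo $sections['name'], "\n", $sections['bid_info'], "\n";
```

The `@class="bid_info"` test only matches when the class attribute is exactly that string, so it would need adjusting if the real div carries several classes.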
Maq Posted June 16, 2009

You can accomplish this by fetching the page with either file_get_contents or cURL, then extracting the values with preg_match_all. If you search either this forum or the PHP Regex section, you should find helpful threads and similar ideas that will assist you. Good luck.
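As a rough sketch of that suggestion, the function below runs preg_match_all over a page's HTML and picks out the two prices. The markup in `$sample` and the function name `parsePrices` are assumptions for illustration; the pattern would need adjusting to the site's real source.

```php
<?php
// Extract "Retail price" and "Sold for" amounts from a chunk of HTML.
// Assumed format: the label, a colon, optional tags/whitespace, an optional
// dollar sign, then a number such as 31.50 or 1,249.99.
function parsePrices(string $html): array
{
    $pattern = '~(Retail price|Sold for):\s*(?:<[^>]+>\s*)*\$?([0-9][0-9,]*(?:\.[0-9]+)?)~i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $prices = [];
    foreach ($matches as $match) {
        $key = strtolower($match[1]) === 'retail price' ? 'retail' : 'sold';
        // Drop thousands separators before converting to a number.
        $prices[$key] = (float) str_replace(',', '', $match[2]);
    }
    return $prices;
}

// Hypothetical markup standing in for the real bid_info block.
$sample = '<div class="bid_info">Retail price: <b>$1,249.99</b><br>Sold for: $31.50</div>';
$prices = parsePrices($sample);
print_r($prices);

// Against the live page, the HTML would come from file_get_contents instead:
//   $html = file_get_contents('http://bidstick.com/latest/wii/6401.html');
//   if ($html !== false) { $prices = parsePrices($html); }
```

Keeping the parsing in its own function means it can be tried on saved copies of pages before pointing it at the live site.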
gangsterwanster1 (Author) Posted June 16, 2009

<?php
$html = "<b>bold text</b><a href=howdy.html>click me</a>";

preg_match_all("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    echo "matched: " . $val[0] . "\n";
    echo "part 1: " . $val[1] . "\n";
    echo "part 2: " . $val[3] . "\n";
    echo "part 3: " . $val[4] . "\n\n";
}
?>

So, based on this example, would you just do something like this? Change

$html = "<b>bold text</b><a href=howdy.html>click me</a>";

to

$html = "<b>bold text</b><a href=http://bidstick.com/latest/d/6401.html>click me</a>";

I'm fairly confused. Any ideas on making this work?
Maq Posted June 16, 2009

Kind of. You would have to match on those characters and patterns. It would look something like:

<?php
$html = "<b>bold text</b><a href=http://bidstick.com/latest/d/6401.html>click me</a>";
$pattern = "~^<b>(.*)</b><a\shref=http://bidstick\.com/latest/d/6401\.html>(.*)</a>~i";
preg_match_all($pattern, $html, $matches);
echo "Bold: " . $matches[1][0] . " href: " . $matches[2][0];
?>

- Each (.*) captures zero or more characters at that position and adds them to the $matches array.
- The '\s' matches a whitespace character.
- I had to escape the '.' characters because an unescaped dot is a special character (a wildcard). Escaped, the pattern treats them as literal dots.
- The tildes (~) are my delimiters; you need delimiters around your pattern.
- Finally, the 'i' flag makes the match case-insensitive.

For more information, read the tutorial here on phpfreaks - Regular Expressions (Part1) - Basic Syntax. You should also take a look at the function documentation in the manual - preg_match_all.
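A variant of the same idea, offered as a sketch rather than as what was originally posted: instead of hard-coding the URL in the pattern, it captures the href value, and it uses lazy quantifiers so each capture stays inside its own tag. The input string is the one from the question above; the variable names are illustrative.

```php
<?php
$html = "<b>bold text</b><a href=http://bidstick.com/latest/d/6401.html>click me</a>";

// ~ ... ~ are the delimiters and the trailing i makes the match
// case-insensitive. \s matches the space in "<a href". (.*?) is lazy, so it
// stops at the first closing tag, and [^>\s]+ takes the unquoted href value
// up to the end of the tag.
$pattern = '~<b>(.*?)</b>\s*<a\shref=([^>\s]+)>(.*?)</a>~i';

preg_match_all($pattern, $html, $matches);

// $matches[N][0] is capture group N of the first match.
$bold = $matches[1][0];
$href = $matches[2][0];
$text = $matches[3][0];

echo "Bold: $bold\nhref: $href\nLink text: $text\n";
```

Because the href is captured rather than spelled out, no dots need escaping in this version, and the same pattern works for any item page.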