hosker Posted June 4, 2011

I am using an HTML DOM parser that pulls a specific div tag from a website and then puts that information on my site. The data being grabbed is within a table. I want the data within that table that I am grabbing to be stored into my MySQL database. I want it to update/overwrite it each time. The data is constantly changing and I do not need the old data. Here is the portion of the code I am using that actually gets the data:

$grabber = new wlWgProcessor(
    "http://sports.yahoo.com/golf/pga/leaderboard/2011/19",
    new wlWgParam(
        '<div id="leaderboardtable">', //the desired tag to be extracted
        array(
            "search" => array( //needles ...
                'class="title"',
                'class="ss"',
                'class="download"',
                'class="preview"',
                'class="getwix"',
                'class="templateleft"',
                'class="templateright"',
                '<a>',
            ),
            "replace" => array( //replaces ...
                'class="your-title-class"',
                'class="your-ss-class"',
                'class="your-download-class"',
                'class="your-preview-class"',
                'class="your-getwix-class"',
                'class="your-templateleft-class"',
                'class="your-templateright-class"',
                '',
            )
        ),
        array( //remove tags and their contents that contains ...
            // '<h1>', //all the <h1> tags including the Free Website Templates header text
            // '<div class="pages"', //the pages links
            // '<div class="about">', //the upper paragraph starting with "Website templates are pre-designed websites ..."
            // '<div style="clear:', //some empty div tag: note that this tag is incomplete, it will remove <div style="clear:both;"> and <div style="clear">
            // '<div class="clear">', //some empty div tag
            // '<div style="margin-left:31px;display:block;">', //the Previous, Next links and the bottom paragraph starting with "All free website templates have been coded ..."
            // '<div class="templatedaily">' //the Template of the day header
        )
    ),
    wlWgConfig::CACHE_TIME_1_MIN //the caching time (expressed in minutes)
);
$grabber->draw(); //print out the extracted processed content

Link to comment https://forums.phpfreaks.com/topic/238421-storing-an-array-help/
mikesta707 Posted June 4, 2011

It sounds like what you are looking for is output buffering, and specifically, storing the contents of the buffer into a variable. Look into the functions
ob_start: http://www.php.net/manual/en/function.ob-start.php
and ob_get_contents: http://www.php.net/manual/en/function.ob-get-contents.php
It may also be useful to read about ob_end_flush: http://www.php.net/manual/en/function.ob-end-flush.php

An example of their usage:

<?php
ob_start();
echo "Hello ";
$out1 = ob_get_contents(); //$out1 now has "Hello "
echo "World";
$out2 = ob_get_contents(); //$out2 now has "Hello World", because the buffer keeps accumulating
ob_end_flush(); //this ends output buffering and prints what's in the buffer to the screen
//alternatively, if you didn't want to output what was in the buffer, use ob_end_clean()
?>
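Applied to the grabber call from the first post, the buffering idea might look like the sketch below. It assumes PHP 7 or later; `captureOutput()` and `drawLeaderboard()` are hypothetical names made up for this illustration, and `drawLeaderboard()` only stands in for `$grabber->draw()` so the snippet runs on its own.

```php
<?php
// Hypothetical stand-in for $grabber->draw(): any code that echoes HTML.
function drawLeaderboard(): void
{
    echo '<table><tr><td>1</td><td>Keegan Bradley</td></tr></table>';
}

/**
 * Runs a callable that prints its result and returns the printed text
 * as a string instead of sending it to the browser.
 */
function captureOutput(callable $printer): string
{
    ob_start();              // start collecting everything that is echoed
    $printer();              // the echoed HTML goes into the buffer
    return ob_get_clean();   // return the buffer contents and discard the buffer
}

$html = captureOutput('drawLeaderboard');
```

With the real object the call would presumably be `captureOutput(array($grabber, 'draw'))`, which leaves the table HTML in `$html` for further processing.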
hosker Posted June 5, 2011

I don't see how using output buffering would help me store the data I pulled from the other website into my database. Below is a sample of the data I want stored:

Pos Name Rd1 Rd2 Rd3 Rd4 Playoff Today Total Strokes Purse
1 x-Keegan Bradley 66 71 72 68 4 -2 -3 277 $1,170,000
2 Ryan Palmer 65 67 73 72 5 +2 -3 277 $702,000
T3 Ryuji Imada 69 68 70 71 - +1 -2 278 $377,000
T3 Joe Ogilvie 66 70 72 70 - E -2 278 $377,000
5 Jason Day 72 71 69 67 - -3 -1 279 $260,000
T6 Matt Kuchar 69 71 68 72 - +2 E 280 $225,875
T6 John Rollins 68 70 71 71 - +1 E 280 $225,875
T8 Arjun Atwal 68 72 67 74 - +4 +1 281 $169,000
T8 James Driscoll 70 71 74 66 - -4 +1 281 $169,000
T8 Jason Dufner 70 70 72 69 - -1 +1 281 $169,000
T8 Jeff Overton 64 74 71 72 - +2 +1 281 $169,000
T8 Rod Pampling 70 68 71 72 - +2 +1 281 $169,000
T8 Nick Watney 68 68 73 72 - +2 +1 281 $169,000
T14 Chad Collins 67 69 75 71 - +1 +2 282 $107,250
T14 Steve Flesch 70 69 71 72 - +2 +2 282 $107,250
T14 Harrison Frazar 71 72 71 68 - -2 +2 282 $107,250
T14 Brian Gay 71 72 69 70 - E +2 282 $107,250
T14 Hunter Haas 70 72 69 71 - +1 +2 282 $107,250
T14 Justin Hicks 70 69 76 67 - -3 +2 282 $107,250
T20 Sergio Garcia 66 66 74 77 - +7 +3 283 $70,417
T20 Robert Garrigus 70 69 75 69 - -1 +3 283 $70,417
T20 Charles Howell III 71 70 72 70 - E +3 283 $70,417
T20 Brandt Jobe 67 72 72 72 - +2 +3 283 $70,417
T20 Dustin Johnson 66 75 69 73 - +3 +3 283 $70,417
T20 Tim Petrovic 69 66 74 74 - +4 +3 283 $70,417
26 Scott Piercy 66 69 74 75 - +5 +4 284 $52,000
T27 J.J. Henry 69 72 72 72 - +2 +5 285 $46,150
T27 Fredrik Jacobson 70 73 70 72 - +2 +5 285 $46,150
T27 Jerry Kelly 67 71 75 72 - +2 +5 285 $46,150
T27 Billy Mayfair 72 70 74 69 - -1 +5 285 $46,150
T27 Vijay Singh 68 73 69 75 - +5 +5 285 $46,150
T32 Ricky Barnes 67 72 75 72 - +2 +6 286 $35,193
T32 Chris DiMarco 70 67 75 74 - +4 +6 286 $35,193
T32 William McGirt 69 71 74 72 - +2 +6 286 $35,193
T32 George McNeill 69 74 73 70 - E +6 286 $35,193
T32 Michael Putnam 67 72 75 72 - +2 +6 286 $35,193
T32 Jordan Spieth 69 68 72 77 - +7 +6 286 -
T32 Will Strickler 66 76 76 68 - -2 +6 286 $35,193
T32 Brett Wetterich 69 69 72 76 - +6 +6 286 $35,193
T40 Chad Campbell 69 74 71 73 - +3 +7 287 $26,650
T40 K.J. Choi 71 71 74 71 - +1 +7 287 $26,650
T40 Carl Pettersson 70 69 76 72 - +2 +7 287 $26,650
T40 D.A. Points 68 75 71 73 - +3 +7 287 $26,650
T40 Vaughn Taylor 67 73 70 77 - +7 +7 287 $26,650
T45 Greg Chalmers 73 70 75 70 - E +8 288 $20,800
T45 Scott Gordon 70 71 72 75 - +5 +8 288 $20,800
T45 Garth Mulroy 67 74 73 74 - +4 +8 288 $20,800
T45 Chris Riley 66 71 73 78 - +8 +8 288 $20,800
T49 Michael Bradley 68 73 73 75 - +5 +9 289 $16,337
T49 Robert Gamez 68 72 74 75 - +5 +9 289 $16,337
T49 Tim Herron 68 75 74 72 - +2 +9 289 $16,337
T49 Scott McCarron 69 73 76 71 - +1 +9 289 $16,337
T49 Fran Quinn 69 70 73 77 - +7 +9 289 $16,337
T49 Gary Woodland 69 71 68 81 - +11 +9 289 $16,337
T55 Michael Connell 71 70 74 75 - +5 +10 290 $14,820
T55 Martin Piller 68 72 75 75 - +5 +10 290 $14,820
T55 Ted Purdy 68 71 76 75 - +5 +10 290 $14,820
T55 Paul Stankowski 69 70 80 71 - +1 +10 290 $14,820
T55 Kyle Stanley 70 70 73 77 - +7 +10 290 $14,820
T60 Rich Beem 73 70 75 73 - +3 +11 291 $14,300
T60 Steven Bowditch 75 65 80 71 - +1 +11 291 $14,300
T60 D.J. Trahan 72 70 77 72 - +2 +11 291 $14,300
T63 Ben Crane 71 71 74 76 - +6 +12 292 $13,780
T63 Kevin Kisner 72 69 75 76 - +6 +12 292 $13,780
T63 Zack Miller 67 74 73 78 - +8 +12 292 $13,780
T63 Alex Prugh 71 72 72 77 - +7 +12 292 $13,780
T63 Jeff Quinney 66 75 72 79 - +9 +12 292 $13,780
T68 Anthony Kim 72 71 76 74 - +4 +13 293 $13,260
T68 Alexandre Rocha 71 70 78 74 - +4 +13 293 $13,260
T68 Josh Teater 66 71 76 80 - +10 +13 293 $13,260
71 Cameron Percy 71 72 75 77 - +7 +15 295 $13,000
72 Tag Ridings 70 73 81 74 - +4 +18 298 $12,870
73 Tommy Gainey 72 71 76 80 - +10 +19 299 $12,740
74 Tom Gillis 69 72 80 80 - +10 +21 301 $12,610
T75 Woody Austin 71 73 MC MC - - - 144 -
T75 Joseph Bramlett 75 69 MC MC - - - 144 -
T75 Bob Estes 69 75 MC MC - - - 144 -
T75 Todd Fischer 71 73 MC MC - - - 144 -
T75 Andres Gonzales 73 71 MC MC - - - 144 -
T75 Jim Herman 70 74 MC MC - - - 144 -
T75 Sunghoon Kang 71 73 MC MC - - - 144 -
T75 Jarrod Lyle 69 75 MC MC - - - 144 -
T75 Ben Martin 72 72 MC MC - - - 144 -
T75 John Senden 70 74 MC MC - - - 144 -
T75 Chris Stroud 69 75 MC MC - - - 144 -
T75 Duffy Waldorf 72 72 MC MC - - - 144 -
T75 Mike Weir 74 70 MC MC - - - 144 -
T88 Briny Baird 70 75 MC MC - - - 145 -
T88 Shane Bertsch 70 75 MC MC - - - 145 -
T88 Colt Knost 74 71 MC MC - - - 145 -
T88 John Mallinger 73 72 MC MC - - - 145 -
T88 Shaun Micheel 73 72 MC MC - - - 145 -
T88 Bryce Molder 74 71 MC MC - - - 145 -
T88 Michael Thompson 73 72 MC MC - - - 145 -
T88 Chris Tidland 69 76 MC MC - - - 145 -
T88 Charlie Wi 69 76 MC MC - - - 145 -
T97 Kevin Chappell 73 73 MC MC - - - 146 -
T97 Joe Durant 71 75 MC MC - - - 146 -
T97 Kent Jones 72 74 MC MC - - - 146 -
T97 David Mathis 71 75 MC MC - - - 146 -
T97 Parker McLachlin 74 72 MC MC - - - 146 -
T97 Matt McQuillan 73 73 MC MC - - - 146 -
T97 John Merrick 70 76 MC MC - - - 146 -
T97 Nick O'Hern 69 77 MC MC - - - 146 -
T97 Chez Reavie 71 75 MC MC - - - 146 -
T97 Michael Sim 71 75 MC MC - - - 146 -
T97 Heath Slocum 76 70 MC MC - - - 146 -
T97 Jimmy Walker 75 71 MC MC - - - 146 -
T97 Dean Wilson 69 77 MC MC - - - 146 -
T110 Robert Allenby 69 78 MC MC - - - 147 -
T110 Scott Gutschewski 69 78 MC MC - - - 147 -
T110 J.P. Hayes 74 73 MC MC - - - 147 -
T110 David Hearn 73 74 MC MC - - - 147 -
T110 Marc Leishman 70 77 MC MC - - - 147 -
T110 Justin Leonard 76 71 MC MC - - - 147 -
T110 Andres Romero 70 77 MC MC - - - 147 -
T110 Scott Verplank 73 74 MC MC - - - 147 -
T110 Charles Warren 75 72 MC MC - - - 147 -
T119 Chris Baryla 71 77 MC MC - - - 148 -
T119 Brian Davis 72 76 MC MC - - - 148 -
T119 Nathan Green 75 73 MC MC - - - 148 -
T119 Charley Hoffman 74 74 MC MC - - - 148 -
T119 Bio Kim 72 76 MC MC - - - 148 -
T119 Michael Letzig 72 76 MC MC - - - 148 -
T119 Sean O'Hair 75 73 MC MC - - - 148 -
T119 Nate Smith 75 73 MC MC - - - 148 -
T119 Sam Smith 75 73 MC MC - - - 148 -
T119 Matt Weibring 72 76 MC MC - - - 148 -
T119 Garrett Willis 72 76 MC MC - - - 148 -
T130 Matt Bettencourt 73 76 MC MC - - - 149 -
T130 Kris Blanks 73 76 MC MC - - - 149 -
T130 Martin Flores 72 77 MC MC - - - 149 -
T130 Aron Price 77 72 MC MC - - - 149 -
T130 Jim Renner 78 71 MC MC - - - 149 -
T130 Cameron Tringale 71 78 MC MC - - - 149 -
T136 Cameron Beckman 71 79 MC MC - - - 150 -
T136 Steve Elkington 73 77 MC MC - - - 150 -
T136 Richard S. Johnson 75 75 MC MC - - - 150 -
T136 Jerod Turner 72 78 MC MC - - - 150 -
T140 Fabian Gomez 73 78 MC MC - - - 151 -
T140 Todd Hamilton 72 79 MC MC - - - 151 -
T140 Rory Sabbatini 69 82 MC MC - - - 151 -
143 Lee Janzen 74 78 MC MC - - - 152 -
T144 Stephen Ames 71 82 MC MC - - - 153 -
T144 Bobby Gates 71 82 MC MC - - - 153 -
T144 Billy Horschel 73 80 MC MC - - - 153 -
T144 Derek Lamely 72 81 MC MC - - - 153 -
T144 Scott Stallings 76 77 MC MC - - - 153 -
T149 Troy Matteson 76 78 MC MC - - - 154 -
T149 Daniel Summerhays 72 82 MC MC - - - 154 -
T151 D.J. Brigman 80 76 MC MC - - - 156 -
T151 Rick Woodson 77 79 MC MC - - - 156 -
153 Jeff Maggert 72 34 WD WD - - - 106 -
154 Chris Kirk 68 75 DQ DQ - - - 143 -
155 Alex Cejka 74 WD WD WD - - - 74 -
- Blake Adams - - - - - - - - -
mikesta707 Posted June 5, 2011

Well, based on what you said and the code you posted, I assumed that this line printed the information you get:

$grabber->draw();

and that what you want to do is store what is printed there. Is my assumption incorrect? If so, can you clarify what you wanted?
hosker Posted June 5, 2011

That is the code that outputs the data, but if I use your output buffering method, how can I store each row of the table into its own row in the database? As you can tell I am fairly new to PHP and I am still learning. Your help is much appreciated.
mikesta707 Posted June 5, 2011

I see. So you basically need to parse the output table and insert each row of it into its own row in the database table. There is no easy way to go about this. If you have access to the grabber function, you can alter that. Otherwise, you will have to use some combination of explode and simple substring matching, or more complicated regular expression matching, on the string you catch with output buffering. See:
explode: http://php.net/manual/en/function.explode.php
substr: http://php.net/manual/en/function.substr.php
strpos: http://php.net/manual/en/function.strpos.php
For regular expressions see preg_match: http://php.net/manual/en/function.preg-match.php
For a general tutorial on what regular expressions are see: http://www.regular-expressions.info/tutorial.html

Hope this helps
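As a sketch of the parsing step, here is one way to split a captured HTML string into rows of cells. It swaps in PHP's bundled DOMDocument class for the explode/regex approach suggested above, because a DOM parser tends to cope better with nested or untidy markup. The function name `tableToRows()` and the sample HTML are made up for illustration; the real markup inside `<div id="leaderboardtable">` is not shown in the thread and may need different handling.

```php
<?php
/**
 * Returns every <tr> found in $html as an array of trimmed cell texts.
 * Both <td> and <th> cells are collected; rows without cells are skipped.
 */
function tableToRows(string $html): array
{
    $doc = new DOMDocument();
    // Scraped markup is rarely valid, so keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common workaround to make loadHTML() read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = array();
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = array();
        foreach ($tr->childNodes as $cell) {
            if ($cell instanceof DOMElement
                && in_array($cell->tagName, array('td', 'th'), true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        if ($cells) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample in the spirit of the leaderboard data posted above.
$sample = '<div id="leaderboardtable"><table>'
    . '<tr><th>Pos</th><th>Name</th><th>Strokes</th></tr>'
    . '<tr><td>1</td><td> x-Keegan Bradley </td><td>277</td></tr>'
    . '<tr><td>2</td><td>Ryan Palmer</td><td>277</td></tr>'
    . '</table></div>';

$rows = tableToRows($sample);
```

Each element of `$rows` is then one leaderboard line, ready to be mapped onto database columns; the header row can be dropped with `array_shift($rows)` if it is not wanted.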
hosker Posted June 5, 2011

I do have complete access to the PHP Grabber application code. By having access to the code itself, which option would you recommend I use?
mikesta707 Posted June 5, 2011

Well, assuming you didn't write it yourself, I wouldn't mess around with it. Perhaps if you posted what methods it has, or at least the function it uses to get the data, I would be able to give a more definitive answer.
hosker Posted June 5, 2011

I have posted the code below. Also here is a link to the online documentation: http://wiseloop.net/wiseloop/phpwebgrabber/documentation/html/index.html

<?php
/**
 * WiseLoop Web Grabber Processor class definition<br/>
 * This class is designed to retreive various tag contents from an url and stores them in the $_result variable.<br/>
 * Also, it is capable to do some processing (string replacements and tags removal) on the extracted contents.<br/>
 * The information needed to extract and process must be provided into one array consisting of a list of wlWgParam objects.
 * @note WiseLoop takes no responsibility if the targeted url changes its tag structure or its HTML DOM tree, resulting in unexpected data retrieval;
 * this will not be considered as malfunction or bug, and you should check the targeted url's HTML DOM tree for changes and modify the code that instatiates this class or any inherited classes.<br/>
 * Also, WiseLoop assumes no responsibility for any abusive use of this class and/or violation of terms of usage of the target url.
 * @see wlWgParam
 * @author WiseLoop
 */
class wlWgProcessor
{
    /**
     * String used as a separator when writting to cache the different grabbed contents extracted from the same url
     */
    const DELIMITER = "<!--WLWG-->";

    /**
     * @var wlCurl the real target url to be parsed, scanned and processed
     */
    private $_curl;

    /**
     * @var array|wlWgParam the parameters that contains the information to extract and process the full grabbed content of the $_targetUrl
     */
    private $_params;

    /**
     * @var int caching time expressed in minutes
     * @see wlWgConfig
     */
    private $_cacheTime;

    /**
     * @var array the resulting processed grabed contents
     */
    private $_result;

    /**
     * Constructor.<br/>
     * Creates a wlWgProcessor object.
     * @param string $targetUrl real target url to be parsed, scanned and processed
     * @param array|wlWgParam $params the parameters that contains the information to extract and process the full grabbed content of the $_targetUrl
     * @param int $cacheTime
     * @return void
     */
    public function __construct($targetUrl, $params = null, $cacheTime = wlWgConfig::DEFAULT_CACHE_TIME)
    {
        $this->setUrl($targetUrl);
        if (is_array($params)) {
            $this->_params = $params;
        }else {
            $this->_params = array($params);
        }
        $this->_cacheTime = $cacheTime;
        $this->_result = null;
    }

    /**
     * Sets the target url to be parsed, scanned and processed
     * @param string $targetUrl real target url to be parsed, scanned and processed
     * @return void
     */
    public function setUrl($targetUrl)
    {
        if(!isset($this->_curl)) {
            $this->_curl = new wlCurl($targetUrl);
        }
        $this->_curl->setUrl($targetUrl);
    }

    /**
     * Returns the target url string to be parsed, scanned and processed
     * @return string
     */
    public function getUrl()
    {
        return $this->_curl->getUrl();
    }

    /**
     * Sets the caching time
     * @param int $cacheTime the new caching time expressed in minutes
     * @return void
     */
    public function setCacheTime($cacheTime)
    {
        $this->_cacheTime = $cacheTime;
    }

    /**
     * Returns the caching time
     * @return int the cache time
     */
    public function getCacheTime()
    {
        return $this->_cacheTime;
    }

    /**
     * Appends a wlWgParam object to the $_params list
     * @param wlWgParam $param
     * @return void
     */
    public function addParam($param)
    {
        $this->_params[] = $param;
    }

    /**
     * Removes all the wlWgParam objects from parameters list
     * @return void
     */
    public function removeParams()
    {
        unset($this->_params);
        $this->_params = null;
    }

    /**
     * Parses the $_targetUrl contents and fills the $_result with the grabbed contents obtained by processing all the parameters founded in $_params against the $_targetUrl's contents.
     * @return void
     */
    private function process()
    {
        $ret = array();
        try {
            $urlContent = $this->loadUrl();
        } catch (Exception $ex) {
            $this->_result = array($ex->getMessage());
            return;
        }
        /**
         * @var wlWgParam $param
         */
        if(isset($this->_params)) {
            foreach ($this->_params as $param) {
                $content = $urlContent;
                if (isset($param->tagSlice)) {
                    $content = wlHtmlDom::getTagContent($content, $param->tagSlice);
                    if (false === $content) {
                        $content = htmlentities(sprintf("Tag %s not found.", $param->tagSlice));
                    } else {
                        if (isset($param->removeTags)) {
                            if (is_array($param->removeTags)) {
                                foreach ($param->removeTags as $rTag) {
                                    $rTagContents = wlHtmlDom::getTagContents($content, $rTag);
                                    $content = str_replace($rTagContents, '', $content);
                                }
                            }
                        }
                        if (isset($param->stripTags)) {
                            if (is_array($param->stripTags)) {
                                foreach ($param->stripTags as $sTag) {
                                    $sTagContentsFull = wlHtmlDom::getTagContents($content, $sTag, false);
                                    $sTagContentsStripped = wlHtmlDom::getTagContents($content, $sTag, true);
                                    $content = str_replace($sTagContentsFull, $sTagContentsStripped, $content);
                                }
                            }
                        }
                        if (isset($param->replaceStrings)) {
                            $search = wlWgUtils::getArrayValue($param->replaceStrings, array("search", 0), "");
                            $replace = wlWgUtils::getArrayValue($param->replaceStrings, array("replace", 0), "");
                            if ('' !== $search) {
                                $content = str_replace($search, $replace, $content);
                            }
                        }
                    }
                    $ret[] = $content;
                }
            }
        }
        $this->_result = $ret;
        if ($this->_cacheTime) {
            $this->saveCache();
        }
    }

    /**
     * Reads an entire content of the $_targetUrl
     * @return string the contens of the $_targetUrl
     */
    private function loadUrl()
    {
        if (!$this->_curl->getExists()) {
            $msg = '<div class="error">';
            $msg .= 'URL "'.$this->_curl->getUrl().'" does not exist, is not readable or is protected against scraping.<br/>';
            $msg .= 'Check if your IP address "'.$_SERVER["SERVER_ADDR"].'" has access permission to this URL.<br/>';
            if(!wlCurl::isCurlEnabled() || !wlCurl::isFopenEnabled()) {
                $msg .= wlCurl::getUnableMessage();
            }
            $hdrs = $this->_curl->getHeaders();
            if(isset($hdrs)) {
                $msg .= 'Headers received:<br/>';
                $msg .= ('<pre>'.print_r($hdrs, true).'</pre>');
            }
            $msg .= '</div>';
            throw new Exception($msg);
        }
        return $this->_curl->getContents();
    }

    /**
     * Loads the results form the cache.
     * @return void
     */
    private function loadCache()
    {
        $cache = new wlCurl($this->getCacheFilePath());
        $content = $cache->getContents();
        $this->_result = explode(self::DELIMITER, $content);
    }

    /**
     * Returns the grabbed results.
     * @return array the grabbed results
     */
    public function get()
    {
        if ($this->_result === null) {
            if ($this->isCacheUpdated()) {
                $this->loadCache();
            }else {
                $this->process();
            }
        }
        return $this->_result;
    }

    /**
     * Prints the grabbed results.
     * @return void
     */
    public function draw()
    {
        $ret = $this->get();
        foreach ($ret as $item) {
            echo $item;
        }
    }

    /**
     * Saves the grabbed results to the cache.
     * @return bool if the save was sucesfull
     */
    private function saveCache()
    {
        $cacheFilePath = $this->getCacheFilePath();
        if (!$cacheFilePath) {
            return false;
        }
        $fh = @fopen($cacheFilePath, "w");
        if (!$fh) {
            return false;
        }
        $ret = "";
        foreach ($this->_result as $content) {
            $ret .= ($content . self::DELIMITER);
        }
        if (substr($ret, -1 * strlen(self::DELIMITER)) == self::DELIMITER) {
            $ret = substr($ret, 0, strlen($ret) - strlen(self::DELIMITER));
        }
        fwrite($fh, $ret);
        fclose($fh);
        return true;
    }

    /**
     * Tests if the html cache is up to date.
     * @return bool if html cache is up to date
     */
    private function isCacheUpdated()
    {
        $cacheFilePath = $this->getCacheFilePath();
        if (!$cacheFilePath) {
            return false;
        }
        if (file_exists($cacheFilePath) && filemtime($cacheFilePath) + ($this->_cacheTime * 60) >= time()) {
            return true;
        }
        return false;
    }

    /**
     * Generates an unique cache file name.
     * @return string the cache file name
     */
    private function getCacheFileName()
    {
        $ret = $this->_curl->getUrl();
        if (isset($this->_params)) {
            if (is_array($this->_params)) {
                $ret .= serialize($this->_params);
            }
        }
        return md5($ret) . ".html";
    }

    /**
     * Returns the html cache real path.
     * @return string the cache file path
     */
    private function getCacheFilePath()
    {
        $cacheFileName = $this->getCacheFileName();
        if (!$cacheFileName) {
            return false;
        }
        return dirname(__FILE__) . "/../cache/" . $cacheFileName;
    }
}
?>
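Judging from the class posted above, `get()` is public and returns the same array that `draw()` echoes, so `implode('', $grabber->get())` would presumably yield the HTML without any output buffering. Once that HTML has been split into rows, the "overwrite each time" requirement could be handled roughly as in the sketch below. Everything about the database is an assumption made for illustration: the `leaderboard` table, its `pos`, `name` and `strokes` columns, the function name `replaceLeaderboard()` and the connection details are hypothetical, and PDO is simply one of the standard ways to talk to MySQL from PHP.

```php
<?php
/**
 * Replaces the whole contents of the (hypothetical) `leaderboard` table
 * with $rows, where each row is array(pos, name, strokes).
 * Returns the number of rows written.
 */
function replaceLeaderboard(PDO $pdo, array $rows): int
{
    $pdo->beginTransaction();
    try {
        // Old data is not needed, so clear the table before inserting.
        $pdo->exec('DELETE FROM leaderboard');
        // A prepared statement keeps names such as O'Hern from breaking the SQL.
        $insert = $pdo->prepare(
            'INSERT INTO leaderboard (pos, name, strokes) VALUES (?, ?, ?)'
        );
        foreach ($rows as $row) {
            $insert->execute(array($row[0], $row[1], (int) $row[2]));
        }
        $pdo->commit();
    } catch (Exception $e) {
        // On any failure, undo the DELETE so the previous data stays in place.
        $pdo->rollBack();
        throw $e;
    }
    return count($rows);
}

// Usage sketch (placeholders, not real credentials):
// $pdo = new PDO('mysql:host=localhost;dbname=golf;charset=utf8', 'user', 'pass');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// replaceLeaderboard($pdo, $rows); // $rows parsed from implode('', $grabber->get())
```

Wrapping the delete and the inserts in one transaction means a failed update should leave the previous leaderboard untouched, provided the MySQL table uses a transactional engine such as InnoDB.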