I am using an HTML DOM parser that pulls a specific div tag from a website and then puts that information on my site. The data being grabbed is within a table. I want the data within that table that I am grabbing to be stored into my MySQL database. I want it to update/overwrite it each time. The data is constantly changing and I do not need the old data. Here is the portion of the code I am using that actually gets the data:

 



$grabber = new wlWgProcessor(
    "http://sports.yahoo.com/golf/pga/leaderboard/2011/19",                     
    new wlWgParam(
        '<div id="leaderboardtable">',                                  //the desired tag to be extracted
        array(
            "search" => array(                                  //needles ...
                'class="title"',
                'class="ss"',
                'class="download"',
                'class="preview"',
                'class="getwix"',
                'class="templateleft"',
                'class="templateright"',
			'<a>',

            ),
            "replace" => array(                                 //replaces ...
                'class="your-title-class"',
                'class="your-ss-class"',
                'class="your-download-class"',
                'class="your-preview-class"',
                'class="your-getwix-class"',
                'class="your-templateleft-class"',
                'class="your-templateright-class"',
			'',
            )
        ),
        array(                                                  //remove tags and their contents that contains ...
        //    '<h1>',                                             //all the <h1> tags including the Free Website Templates header text
         //   '<div class="pages"',                               //the pages links
        //    '<div class="about">',                              //the upper paragraph starting with "Website templates are pre-designed websites ..."
        //    '<div style="clear:',                               //some empty div tag: note that this tag is incomplete, it will remove <div style="clear:both;"> and <div style="clear">
        //    '<div class="clear">',                              //some empty div tag
        //    '<div style="margin-left:31px;display:block;">',    //the Previous, Next links and the bottom paragraph starting with "All free website templates have been coded ..."
        //    '<div class="templatedaily">'                       //the Template of the day header
	)
    ),
    wlWgConfig::CACHE_TIME_1_MIN                               //the caching time (expressed in minutes)
);
$grabber->draw();                                               //print out the extracted processed content


It sounds like what you are looking for is output buffering: capturing what a script prints and storing it in a variable instead of sending it straight to the browser.

 

Look into ob_start: http://www.php.net/manual/en/function.ob-start.php

and ob_get_contents: http://www.php.net/manual/en/function.ob-get-contents.php

 

It may also be useful to read about ob_end_flush: http://www.php.net/manual/en/function.ob-end-flush.php

 

An example of their usage:

<?php

ob_start();

echo "Hello ";

$out1 = ob_get_contents(); // $out1 is now "Hello "

echo "World";

$out2 = ob_get_contents(); // $out2 is now "Hello World": the buffer keeps accumulating until it is cleaned

ob_end_flush(); // ends output buffering and sends what is in the buffer to the browser
// alternatively, if you don't want to output what is in the buffer, use ob_end_clean()
?>
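
Applied to the grabber, the same idea could look roughly like the sketch below. drawLeaderboard() is a hypothetical stand-in for $grabber->draw() so that the snippet runs on its own, and captureOutput() is a helper name made up for illustration, not part of any library.

```php
<?php
// Hypothetical stand-in for $grabber->draw(): all it does is echo markup.
function drawLeaderboard(): void
{
    echo '<table><tr><td>1</td><td>x-Keegan Bradley</td></tr></table>';
}

/**
 * Run any callable that prints its result and return what it printed
 * instead of letting it reach the browser.
 */
function captureOutput(callable $printer): string
{
    ob_start();
    try {
        $printer();
        // The return value is read before the finally block discards the buffer.
        return (string) ob_get_contents();
    } finally {
        ob_end_clean();
    }
}

$html = captureOutput('drawLeaderboard');
// With the real object this would presumably be:
// $html = captureOutput(array($grabber, 'draw'));
```

After this runs, $html holds the markup as a plain string, which can then be parsed and stored.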

I don't see how using output buffering would help me store the data I pulled from the other website into my database. Below is a sample of the data I want stored:

 

 

Round

Pos Name 1 2 3 4 Playoff Today Total Strokes Purse

1 x-Keegan Bradley 66 71 72 68 4 -2 -3 277 $1,170,000

2 Ryan Palmer 65 67 73 72 5 +2 -3 277 $702,000

T3 Ryuji Imada 69 68 70 71 - +1 -2 278 $377,000

T3 Joe Ogilvie 66 70 72 70 - E -2 278 $377,000

5 Jason Day 72 71 69 67 - -3 -1 279 $260,000

T6 Matt Kuchar 69 71 68 72 - +2 E 280 $225,875

T6 John Rollins 68 70 71 71 - +1 E 280 $225,875

T8 Arjun Atwal 68 72 67 74 - +4 +1 281 $169,000

T8 James Driscoll 70 71 74 66 - -4 +1 281 $169,000

T8 Jason Dufner 70 70 72 69 - -1 +1 281 $169,000

T8 Jeff Overton 64 74 71 72 - +2 +1 281 $169,000

T8 Rod Pampling 70 68 71 72 - +2 +1 281 $169,000

T8 Nick Watney 68 68 73 72 - +2 +1 281 $169,000

T14 Chad Collins 67 69 75 71 - +1 +2 282 $107,250

T14 Steve Flesch 70 69 71 72 - +2 +2 282 $107,250

T14 Harrison Frazar 71 72 71 68 - -2 +2 282 $107,250

T14 Brian Gay 71 72 69 70 - E +2 282 $107,250

T14 Hunter Haas 70 72 69 71 - +1 +2 282 $107,250

T14 Justin Hicks 70 69 76 67 - -3 +2 282 $107,250

T20 Sergio Garcia 66 66 74 77 - +7 +3 283 $70,417

T20 Robert Garrigus 70 69 75 69 - -1 +3 283 $70,417

T20 Charles Howell III 71 70 72 70 - E +3 283 $70,417

T20 Brandt Jobe 67 72 72 72 - +2 +3 283 $70,417

T20 Dustin Johnson 66 75 69 73 - +3 +3 283 $70,417

T20 Tim Petrovic 69 66 74 74 - +4 +3 283 $70,417

26 Scott Piercy 66 69 74 75 - +5 +4 284 $52,000

T27 J.J. Henry 69 72 72 72 - +2 +5 285 $46,150

T27 Fredrik Jacobson 70 73 70 72 - +2 +5 285 $46,150

T27 Jerry Kelly 67 71 75 72 - +2 +5 285 $46,150

T27 Billy Mayfair 72 70 74 69 - -1 +5 285 $46,150

T27 Vijay Singh 68 73 69 75 - +5 +5 285 $46,150

T32 Ricky Barnes 67 72 75 72 - +2 +6 286 $35,193

T32 Chris DiMarco 70 67 75 74 - +4 +6 286 $35,193

T32 William McGirt 69 71 74 72 - +2 +6 286 $35,193

T32 George McNeill 69 74 73 70 - E +6 286 $35,193

T32 Michael Putnam 67 72 75 72 - +2 +6 286 $35,193

T32 Jordan Spieth 69 68 72 77 - +7 +6 286 -

T32 Will Strickler 66 76 76 68 - -2 +6 286 $35,193

T32 Brett Wetterich 69 69 72 76 - +6 +6 286 $35,193

T40 Chad Campbell 69 74 71 73 - +3 +7 287 $26,650

T40 K.J. Choi 71 71 74 71 - +1 +7 287 $26,650

T40 Carl Pettersson 70 69 76 72 - +2 +7 287 $26,650

T40 D.A. Points 68 75 71 73 - +3 +7 287 $26,650

T40 Vaughn Taylor 67 73 70 77 - +7 +7 287 $26,650

T45 Greg Chalmers 73 70 75 70 - E +8 288 $20,800

T45 Scott Gordon 70 71 72 75 - +5 +8 288 $20,800

T45 Garth Mulroy 67 74 73 74 - +4 +8 288 $20,800

T45 Chris Riley 66 71 73 78 - +8 +8 288 $20,800

T49 Michael Bradley 68 73 73 75 - +5 +9 289 $16,337

T49 Robert Gamez 68 72 74 75 - +5 +9 289 $16,337

T49 Tim Herron 68 75 74 72 - +2 +9 289 $16,337

T49 Scott McCarron 69 73 76 71 - +1 +9 289 $16,337

T49 Fran Quinn 69 70 73 77 - +7 +9 289 $16,337

T49 Gary Woodland 69 71 68 81 - +11 +9 289 $16,337

T55 Michael Connell 71 70 74 75 - +5 +10 290 $14,820

T55 Martin Piller 68 72 75 75 - +5 +10 290 $14,820

T55 Ted Purdy 68 71 76 75 - +5 +10 290 $14,820

T55 Paul Stankowski 69 70 80 71 - +1 +10 290 $14,820

T55 Kyle Stanley 70 70 73 77 - +7 +10 290 $14,820

T60 Rich Beem 73 70 75 73 - +3 +11 291 $14,300

T60 Steven Bowditch 75 65 80 71 - +1 +11 291 $14,300

T60 D.J. Trahan 72 70 77 72 - +2 +11 291 $14,300

T63 Ben Crane 71 71 74 76 - +6 +12 292 $13,780

T63 Kevin Kisner 72 69 75 76 - +6 +12 292 $13,780

T63 Zack Miller 67 74 73 78 - +8 +12 292 $13,780

T63 Alex Prugh 71 72 72 77 - +7 +12 292 $13,780

T63 Jeff Quinney 66 75 72 79 - +9 +12 292 $13,780

T68 Anthony Kim 72 71 76 74 - +4 +13 293 $13,260

T68 Alexandre Rocha 71 70 78 74 - +4 +13 293 $13,260

T68 Josh Teater 66 71 76 80 - +10 +13 293 $13,260

71 Cameron Percy 71 72 75 77 - +7 +15 295 $13,000

72 Tag Ridings 70 73 81 74 - +4 +18 298 $12,870

73 Tommy Gainey 72 71 76 80 - +10 +19 299 $12,740

74 Tom Gillis 69 72 80 80 - +10 +21 301 $12,610

T75 Woody Austin 71 73 MC MC - - - 144 -

T75 Joseph Bramlett 75 69 MC MC - - - 144 -

T75 Bob Estes 69 75 MC MC - - - 144 -

T75 Todd Fischer 71 73 MC MC - - - 144 -

T75 Andres Gonzales 73 71 MC MC - - - 144 -

T75 Jim Herman 70 74 MC MC - - - 144 -

T75 Sunghoon Kang 71 73 MC MC - - - 144 -

T75 Jarrod Lyle 69 75 MC MC - - - 144 -

T75 Ben Martin 72 72 MC MC - - - 144 -

T75 John Senden 70 74 MC MC - - - 144 -

T75 Chris Stroud 69 75 MC MC - - - 144 -

T75 Duffy Waldorf 72 72 MC MC - - - 144 -

T75 Mike Weir 74 70 MC MC - - - 144 -

T88 Briny Baird 70 75 MC MC - - - 145 -

T88 Shane Bertsch 70 75 MC MC - - - 145 -

T88 Colt Knost 74 71 MC MC - - - 145 -

T88 John Mallinger 73 72 MC MC - - - 145 -

T88 Shaun Micheel 73 72 MC MC - - - 145 -

T88 Bryce Molder 74 71 MC MC - - - 145 -

T88 Michael Thompson 73 72 MC MC - - - 145 -

T88 Chris Tidland 69 76 MC MC - - - 145 -

T88 Charlie Wi 69 76 MC MC - - - 145 -

T97 Kevin Chappell 73 73 MC MC - - - 146 -

T97 Joe Durant 71 75 MC MC - - - 146 -

T97 Kent Jones 72 74 MC MC - - - 146 -

T97 David Mathis 71 75 MC MC - - - 146 -

T97 Parker McLachlin 74 72 MC MC - - - 146 -

T97 Matt McQuillan 73 73 MC MC - - - 146 -

T97 John Merrick 70 76 MC MC - - - 146 -

T97 Nick O'Hern 69 77 MC MC - - - 146 -

T97 Chez Reavie 71 75 MC MC - - - 146 -

T97 Michael Sim 71 75 MC MC - - - 146 -

T97 Heath Slocum 76 70 MC MC - - - 146 -

T97 Jimmy Walker 75 71 MC MC - - - 146 -

T97 Dean Wilson 69 77 MC MC - - - 146 -

T110 Robert Allenby 69 78 MC MC - - - 147 -

T110 Scott Gutschewski 69 78 MC MC - - - 147 -

T110 J.P. Hayes 74 73 MC MC - - - 147 -

T110 David Hearn 73 74 MC MC - - - 147 -

T110 Marc Leishman 70 77 MC MC - - - 147 -

T110 Justin Leonard 76 71 MC MC - - - 147 -

T110 Andres Romero 70 77 MC MC - - - 147 -

T110 Scott Verplank 73 74 MC MC - - - 147 -

T110 Charles Warren 75 72 MC MC - - - 147 -

T119 Chris Baryla 71 77 MC MC - - - 148 -

T119 Brian Davis 72 76 MC MC - - - 148 -

T119 Nathan Green 75 73 MC MC - - - 148 -

T119 Charley Hoffman 74 74 MC MC - - - 148 -

T119 Bio Kim 72 76 MC MC - - - 148 -

T119 Michael Letzig 72 76 MC MC - - - 148 -

T119 Sean O'Hair 75 73 MC MC - - - 148 -

T119 Nate Smith 75 73 MC MC - - - 148 -

T119 Sam Smith 75 73 MC MC - - - 148 -

T119 Matt Weibring 72 76 MC MC - - - 148 -

T119 Garrett Willis 72 76 MC MC - - - 148 -

T130 Matt Bettencourt 73 76 MC MC - - - 149 -

T130 Kris Blanks 73 76 MC MC - - - 149 -

T130 Martin Flores 72 77 MC MC - - - 149 -

T130 Aron Price 77 72 MC MC - - - 149 -

T130 Jim Renner 78 71 MC MC - - - 149 -

T130 Cameron Tringale 71 78 MC MC - - - 149 -

T136 Cameron Beckman 71 79 MC MC - - - 150 -

T136 Steve Elkington 73 77 MC MC - - - 150 -

T136 Richard S. Johnson 75 75 MC MC - - - 150 -

T136 Jerod Turner 72 78 MC MC - - - 150 -

T140 Fabian Gomez 73 78 MC MC - - - 151 -

T140 Todd Hamilton 72 79 MC MC - - - 151 -

T140 Rory Sabbatini 69 82 MC MC - - - 151 -

143 Lee Janzen 74 78 MC MC - - - 152 -

T144 Stephen Ames 71 82 MC MC - - - 153 -

T144 Bobby Gates 71 82 MC MC - - - 153 -

T144 Billy Horschel 73 80 MC MC - - - 153 -

T144 Derek Lamely 72 81 MC MC - - - 153 -

T144 Scott Stallings 76 77 MC MC - - - 153 -

T149 Troy Matteson 76 78 MC MC - - - 154 -

T149 Daniel Summerhays 72 82 MC MC - - - 154 -

T151 D.J. Brigman 80 76 MC MC - - - 156 -

T151 Rick Woodson 77 79 MC MC - - - 156 -

153 Jeff Maggert 72 34 WD WD - - - 106 -

154 Chris Kirk 68 75 DQ DQ - - - 143 -

155 Alex Cejka 74 WD WD WD - - - 74 -

- Blake Adams - - - - - - - - -

Well, based on what you said and the code you posted, I assumed that this line prints the information you get:

$grabber->draw(); 

and that what you want is to store what it prints. Is my assumption incorrect? If so, can you clarify what you want?

I see. So you basically need to parse the output table and insert each row of it as its own row in the database table. There is no easy way to go about this. If you have access to the grabber's source, you can alter that. Otherwise, you will have to use some combination of explode and simple substring matching, or more complicated regular-expression matching, on the string you capture with output buffering.

 

See:

explode: http://php.net/manual/en/function.explode.php

substr: http://php.net/manual/en/function.substr.php

strpos: http://php.net/manual/en/function.strpos.php

 

For regular expressions, see

preg_match: http://php.net/manual/en/function.preg-match.php

For a general tutorial on what regular expressions are, see: http://www.regular-expressions.info/tutorial.html
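
As a rough illustration of the regular-expression route, here is a sketch that splits one plain-text line of the sample data into fields. It assumes the tags have already been stripped, so that each player is on one line in the column order shown in the sample (Pos, Name, rounds 1 to 4, Playoff, Today, Total, Strokes, Purse). The function name and the field names are made up for the example.

```php
<?php
/**
 * Split one plain-text leaderboard line into named fields.
 * Returns null for lines that do not look like a player row
 * (for example the "Pos Name 1 2 3 4 ..." header line).
 */
function parseLeaderboardLine(string $line): ?array
{
    // Position ("1", "T3" or "-"), then the name (which may contain spaces),
    // then exactly nine columns that contain no whitespace.
    $pattern = '/^(?<pos>T?\d+|-)\s+(?<name>.+?)'
        . '\s+(?<r1>\S+)\s+(?<r2>\S+)\s+(?<r3>\S+)\s+(?<r4>\S+)'
        . '\s+(?<playoff>\S+)\s+(?<today>\S+)\s+(?<total>\S+)'
        . '\s+(?<strokes>\S+)\s+(?<purse>\S+)$/';

    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null;
    }

    // preg_match returns both numeric and named keys; keep only the named ones.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$row = parseLeaderboardLine('T20 Charles Howell III 71 70 72 70 - E +3 283 $70,417');
// $row['name'] is "Charles Howell III", $row['purse'] is "$70,417"
```

Because the nine trailing columns never contain spaces, the name simply takes whatever is left between the position and those columns, so multi-word names need no special handling.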

 

Hope this helps

I have posted the code below. Also here is a link to the online documentation: http://wiseloop.net/wiseloop/phpwebgrabber/documentation/html/index.html

 


<?php
/**
* WiseLoop Web Grabber Processor class definition<br/>
* This class is designed to retrieve various tag contents from a URL and store them in the $_result variable.<br/>
* It can also do some processing (string replacements and tag removal) on the extracted contents.<br/>
* The information needed for extraction and processing must be provided as an array of wlWgParam objects.
* @note WiseLoop takes no responsibility if the targeted url changes its tag structure or its HTML DOM tree, resulting in unexpected data retrieval;
* this will not be considered a malfunction or bug, and you should check the targeted url's HTML DOM tree for changes and modify the code that instantiates this class or any inherited classes.<br/>
* Also, WiseLoop assumes no responsibility for any abusive use of this class and/or violation of terms of usage of the target url.
* @see wlWgParam
* @author WiseLoop
*/
class wlWgProcessor {
    /**
     * String used as a separator when writing to the cache the different grabbed contents extracted from the same url
     */
    const DELIMITER = "<!--WLWG-->";

    /**
     * @var wlCurl the real target url to be parsed, scanned and processed
     */
    private $_curl;

    /**
     * @var array|wlWgParam the parameters that contains the information to extract and process the full grabbed content of the $_targetUrl
     */
    private $_params;

    /**
     * @var int caching time expressed in minutes
     * @see wlWgConfig
     */
    private $_cacheTime;

    /**
     * @var array the resulting processed grabbed contents
     */
    private $_result;

    /**
     * Constructor.<br/>
     * Creates a wlWgProcessor object.
     * @param string $targetUrl real target url to be parsed, scanned and processed
     * @param array|wlWgParam $params the parameters that contains the information to extract and process the full grabbed content of the $_targetUrl
     * @param int $cacheTime
     * @return void
     */
    public function __construct($targetUrl, $params = null, $cacheTime = wlWgConfig::DEFAULT_CACHE_TIME) {
        $this->setUrl($targetUrl);
        if (is_array($params)) {
            $this->_params = $params;
        }else {
            $this->_params = array($params);
        }
        $this->_cacheTime = $cacheTime;
        $this->_result = null;
    }

    /**
     * Sets the target url to be parsed, scanned and processed
     * @param string $targetUrl real target url to be parsed, scanned and processed
     * @return void
     */
    public function setUrl($targetUrl) {
        if(!isset($this->_curl)) {
            $this->_curl = new wlCurl($targetUrl);
        }
        $this->_curl->setUrl($targetUrl);
    }

    /**
     * Returns the target url string to be parsed, scanned and processed
     * @return string
     */
    public function getUrl() {
        return $this->_curl->getUrl();
    }

    /**
     * Sets the caching time
     * @param int $cacheTime the new caching time expressed in minutes
     * @return void
     */
    public function setCacheTime($cacheTime) {
        $this->_cacheTime = $cacheTime;
    }

    /**
     * Returns the caching time
     * @return int the cache time
     */
    public function getCacheTime() {
        return $this->_cacheTime;
    }

    /**
     * Appends a wlWgParam object to the $_params list
     * @param wlWgParam $param
     * @return void
     */
    public function addParam($param) {
        $this->_params[] = $param;
    }

    /**
     * Removes all the wlWgParam objects from parameters list
     * @return void
     */
    public function removeParams() {
        unset($this->_params);
        $this->_params = null;
    }

    /**
     * Parses the $_targetUrl contents and fills $_result with the grabbed contents obtained by processing all the parameters found in $_params against the $_targetUrl's contents.
     * @return void
     */
    private function process() {
        $ret = array();
        try
        {
            $urlContent = $this->loadUrl();
        } catch (Exception $ex)
        {
            $this->_result = array($ex->getMessage());
            return;
        }

        /**
         * @var wlWgParam $param
         */
        if(isset($this->_params)) {
            foreach ($this->_params as $param) {
                $content = $urlContent;
                if (isset($param->tagSlice)) {
                    $content = wlHtmlDom::getTagContent($content, $param->tagSlice);
                    if (false === $content) {
                        $content = htmlentities(sprintf("Tag %s not found.", $param->tagSlice));
                    }
                    else
                    {
                        if (isset($param->removeTags)) {
                            if (is_array($param->removeTags)) {
                                foreach ($param->removeTags as $rTag) {
                                    $rTagContents = wlHtmlDom::getTagContents($content, $rTag);
                                    $content = str_replace($rTagContents, '', $content);
                                }
                            }
                        }

                        if (isset($param->stripTags)) {
                            if (is_array($param->stripTags)) {
                                foreach ($param->stripTags as $sTag) {
                                    $sTagContentsFull = wlHtmlDom::getTagContents($content, $sTag, false);
                                    $sTagContentsStripped = wlHtmlDom::getTagContents($content, $sTag, true);
                                    $content = str_replace($sTagContentsFull, $sTagContentsStripped, $content);
                                }
                            }
                        }

                        if (isset($param->replaceStrings)) {
                            $search = wlWgUtils::getArrayValue($param->replaceStrings, array("search", 0), "");
                            $replace = wlWgUtils::getArrayValue($param->replaceStrings, array("replace", 0), "");

                            if ('' !== $search) {
                                $content = str_replace($search, $replace, $content);
                            }
                        }
                    }
                    $ret[] = $content;
                }
            }
        }
        $this->_result = $ret;

        if ($this->_cacheTime) {
            $this->saveCache();
        }
    }

    /**
     * Reads the entire content of the $_targetUrl
     * @return string the contents of the $_targetUrl
     */
    private function loadUrl() {
        if (!$this->_curl->getExists()) {
            $msg = '<div class="error">';
            $msg .= 'URL "'.$this->_curl->getUrl().'" does not exist, is not readable or is protected against scraping.<br/>';
            $msg .= 'Check if your IP address "'.$_SERVER["SERVER_ADDR"].'" has access permission to this URL.<br/>';
            if(!wlCurl::isCurlEnabled() || !wlCurl::isFopenEnabled()) {
                $msg .= wlCurl::getUnableMessage();
            }
            $hdrs = $this->_curl->getHeaders();
            if(isset($hdrs)) {
                $msg .= 'Headers received:<br/>';
                $msg .= ('<pre>'.print_r($hdrs, true).'</pre>');
            }
            $msg .= '</div>';

            throw new Exception($msg);
        }

        return $this->_curl->getContents();
    }

    /**
     * Loads the results from the cache.
     * @return void
     */
    private function loadCache() {
        $cache = new wlCurl($this->getCacheFilePath());
        $content = $cache->getContents();
        $this->_result = explode(self::DELIMITER, $content);
    }

    /**
     * Returns the grabbed results.
     * @return array the grabbed results
     */
    public function get() {
        if ($this->_result === null) {
            if ($this->isCacheUpdated()) {
                $this->loadCache();
            }else {
                $this->process();
            }
        }
        return $this->_result;
    }

    /**
     * Prints the grabbed results.
     * @return void
     */
    public function draw() {
        $ret = $this->get();
        foreach ($ret as $item) {
            echo $item;
        }
    }

    /**
     * Saves the grabbed results to the cache.
     * @return bool true if the save was successful
     */
    private function saveCache() {
        $cacheFilePath = $this->getCacheFilePath();
        if (!$cacheFilePath) {
            return false;
        }
        $fh = @fopen($cacheFilePath, "w");
        if (!$fh) {
            return false;
        }
        $ret = "";
        foreach ($this->_result as $content) {
            $ret .= ($content . self::DELIMITER);
        }
        if (substr($ret, -1 * strlen(self::DELIMITER)) == self::DELIMITER) {
            $ret = substr($ret, 0, strlen($ret) - strlen(self::DELIMITER));
        }
        fwrite($fh, $ret);
        fclose($fh);
        return true;
    }

    /**
     * Tests if the html cache is up to date.
     * @return bool if html cache is up to date
     */
    private function isCacheUpdated() {
        $cacheFilePath = $this->getCacheFilePath();
        if (!$cacheFilePath) {
            return false;
        }

        if (file_exists($cacheFilePath) && filemtime($cacheFilePath) + ($this->_cacheTime * 60) >= time()) {
            return true;
        }

        return false;
    }

    /**
     * Generates a unique cache file name.
     * @return string the cache file name
     */
    private function getCacheFileName() {
        $ret = $this->_curl->getUrl();
        if (isset($this->_params)) {
            if (is_array($this->_params)) {
                $ret .= serialize($this->_params);
            }
        }
        return md5($ret) . ".html";
    }

    /**
     * Returns the html cache real path.
     * @return string the cache file path
     */
    private function getCacheFilePath() {
        $cacheFileName = $this->getCacheFileName();
        if (!$cacheFileName) {
            return false;
        }
        return dirname(__FILE__) . "/../cache/" . $cacheFileName;
    }
}

?>
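
Looking at the class above, draw() only echoes whatever the public get() method returns, and get() returns the grabbed contents as an array of strings. So output buffering may not be needed at all: the markup could be taken from get(), parsed, and written to the database. Below is a rough sketch of that idea, not a tested solution for this site. It assumes the grabbed fragment is an HTML table whose cells follow the column order of the sample data, and a hypothetical table named leaderboard with columns pos, name and strokes. Both function names are made up for the example.

```php
<?php
/**
 * Extract the text of every <td> in every <tr> of an HTML fragment.
 * Rows without <td> cells (for example header rows built from <th>) are skipped.
 */
function tableRowsFromHtml(string $html): array
{
    if (trim($html) === '') {
        return array();
    }
    $doc = new DOMDocument();
    // Scraped markup is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = array();
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = array();
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells !== array()) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

/**
 * Replace the whole contents of the (hypothetical) leaderboard table.
 * Assumed cell order: 0 = Pos, 1 = Name, ..., 9 = Strokes, 10 = Purse.
 */
function replaceLeaderboard(PDO $pdo, array $rows): void
{
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->beginTransaction();
    try {
        // Old data is not needed, so clear the table and insert the fresh rows.
        $pdo->exec('DELETE FROM leaderboard');
        $insert = $pdo->prepare(
            'INSERT INTO leaderboard (pos, name, strokes) VALUES (?, ?, ?)'
        );
        foreach ($rows as $cells) {
            if (count($cells) < 11) {
                continue; // not a full player row
            }
            $insert->execute(array($cells[0], $cells[1], $cells[9]));
        }
        $pdo->commit();
    } catch (Throwable $e) {
        if ($pdo->inTransaction()) {
            $pdo->rollBack();
        }
        throw $e;
    }
}

// Possible usage with the grabber (connection details are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=golf;charset=utf8mb4', 'user', 'secret');
// replaceLeaderboard($pdo, tableRowsFromHtml(implode('', $grabber->get())));
```

The DELETE and the INSERTs run in one transaction so that visitors never see a half-filled table. In MySQL that only works on a transactional engine such as InnoDB, and TRUNCATE would not fit here because it commits implicitly.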
