phoenixx Posted November 15, 2010

I had a simple regex working that I somehow broke this morning - call it exhaustion. Here's the data I'm scraping:

<tr>
<td><font size="1"><b>Description:</b></font></td>
<td width="550">The rich contemporary style of the "Theo" Counter Height Table combines faux marble and a warm finish to create dining room furniture that adds an exciting style to the decor of any home. The thick polyurethane coated faux marble table top perfectly accentuates the warm brown finish flowing over the straight-lined contemporary design of the apron and legs to help create an exceptional dining experience. With the beautiful stitching and button tufting details of the faux leather upholstered bar stools, the "Theo" Counter Height Table is a refreshing addition to any home.</td>
</tr>
<tr bgcolor="#F8F7E4">
<td><font size="1"><b>Series Features:</b></font></td>
<td width="550">Table top made with polyurethane coated print marble. Aprons and legs made from select veneer and solids with a warm brown finish. Chair is upholstered in a brown PVC with accent top stitching. D158-233 bar stool dimension: 18"W x 21"D x 40"H.</td>
</tr>
<tr>
<td><font size="1"><b>Printable Page:</b></font></td>
<td width="550"> <a href="javascript:LoadBrochure('D158')"><b>Click here</b> </a>to download full color page for the<b> Theo </b>series.
</td>
</tr>
<tr bgcolor="#F8F7E4">
<td><font size="1"><b>Image Downloads:</b></font></td>
<td width="550"><a href="../Downloads/download_results.asp?varSeriesNumber=D158&NAV=fromSeriesDetail"><b>Click here</b></a> for complete image download listing for series <b>D158</b>.</td>
</tr>

And here is the code I'm using:

$url = "http://xxxxxxxxxxx.html";
$data = file_get_contents($url);
preg_match_all('/<td>.*?width="550">([^"]*)".*?<\/td>.*?width="550">([^"]*)<\/td>/is',$data,$out2);
if ((isset($out2[1]) && isset($out2[2])) == FALSE) {
    // Let's do some error checking to see if there is data to insert into the database. If not let's end the script
    die();
}
$d = array_combine($out2[1], $out2[2]);
foreach($d as $k=>$v){
    echo $k . "<br>" . $v . "<br>";
}

I had it broken up where it would show:

Description: (description output)
Series Features: (series features output)
Printable Page: (printable page output)
Image Downloads: (image downloads output)

I had it working and accidentally overwrote my backup file. Any help the community could give would be great. It's resulting in no output at all.
Pikachu2000 Posted November 15, 2010

Most likely, the script is reaching the die(). Put a message in it to see what's happening, e.g. die('Script died in conditional')
phoenixx (Author) Posted November 15, 2010

Here's the error it's generating:

Warning: array_combine() [function.array-combine]: Both parameters should have at least 1 element in /home/xxxxxxxxxxxxxxx/index.php on line 49
Warning: Invalid argument supplied for foreach() in /home/xxxxxxxxxxxxxxx/index.php on line 50
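For readers following along, these warnings suggest the script never reached the die(): preg_match_all() fills the result array with empty sub-arrays even when nothing matches, so the isset() test passes and array_combine() receives two empty arrays (which, on PHP versions before 5.4, produced this warning). The sketch below illustrates that behaviour; the `$data` string is a hypothetical stand-in for the fetched page, and checking the return count is just one possible guard, not necessarily what the poster ended up using.

```php
<?php
// Hypothetical stand-in for the fetched page: it contains no matching cells.
$data = '<p>no table here</p>';

// Same pattern as in the original post.
$pattern = '/<td>.*?width="550">([^"]*)".*?<\/td>.*?width="550">([^"]*)<\/td>/is';
$count = preg_match_all($pattern, $data, $out2);

// preg_match_all() still creates $out2[1] and $out2[2] as empty arrays,
// so isset() is true even though nothing matched.
var_dump(isset($out2[1]) && isset($out2[2])); // bool(true)

// Testing the match count catches the "no data" case instead.
if ($count === 0 || $count === false) {
    echo "No matches: check the pattern before inserting anything\n";
} else {
    $d = array_combine($out2[1], $out2[2]);
}
```

Checking `empty($out2[1])` would work equally well as the guard.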
phoenixx (Author) Posted November 16, 2010

OK - it looks like I made it more complicated than it needs to be. Here is the data I'm trying to scrape - I need what is in between the <td width="550">DATA TO SCRAPE</td>. Here's a sample of the data:

<tr>
<td><font size="1"><b>Description:</b></font></td>
<td width="550">The rich contemporary style of the "Theo" Counter Height Table combines faux marble and a warm finish to create dining room furniture that adds an exciting style to the decor of any home. The thick polyurethane coated faux marble table top perfectly accentuates the warm brown finish flowing over the straight-lined contemporary design of the apron and legs to help create an exceptional dining experience. With the beautiful stitching and button tufting details of the faux leather upholstered bar stools, the "Theo" Counter Height Table is a refreshing addition to any home.</td>
</tr>
<tr bgcolor="#F8F7E4">
<td><font size="1"><b>Series Features:</b></font></td>
<td width="550">Table top made with polyurethane coated print marble. Aprons and legs made from select veneer and solids with a warm brown finish. Chair is upholstered in a brown PVC with accent top stitching. D158-233 bar stool dimension: 18"W x 21"D x 40"H.</td>
</tr>
<tr>
<td><font size="1"><b>Printable Page:</b></font></td>
<td width="550"> <a href="javascript:LoadBrochure('D158')"><b>Click here</b> </a>to download full color page for the<b> Theo </b>series. </td>
</tr>
<tr bgcolor="#F8F7E4">
<td><font size="1"><b>Image Downloads:</b></font></td>
<td width="550"><a href="../Downloads/download_results.asp?varSeriesNumber=D158&NAV=fromSeriesDetail"><b>Click here</b></a> for complete image download listing for series <b>D158</b>.</td>
</tr>
sasa Posted November 16, 2010

<?php
$test = '<tr>
<td><font size="1"><b>Description:</b></font></td>
<td width="550">The rich contemporary style of the "Theo" Counter Height Table combines faux marble and a warm finish to create dining room furniture that adds an exciting style to the decor of any home. The thick polyurethane coated faux marble table top perfectly accentuates the warm brown finish flowing over the straight-lined contemporary design of the apron and legs to help create an exceptional dining experience. With the beautiful stitching and button tufting details of the faux leather upholstered bar stools, the "Theo" Counter Height Table is a refreshing addition to any home.</td>
</tr>
<tr bgcolor="#F8F7E4">
<td><font size="1"><b>Series Features:</b></font></td>
<td width="550">Table top made with polyurethane coated print marble. Aprons and legs made from select veneer and solids with a warm brown finish. Chair is upholstered in a brown PVC with accent top stitching. D158-233 bar stool dimension: 18"W x 21"D x 40"H.</td>
</tr>
<tr>
<td><font size="1"><b>Printable Page:</b></font></td>
<td width="550"> <a href="javascript:LoadBrochure(\'D158\')"><b>Click here</b> </a>to download full color page for the<b> Theo </b>series. </td>
</tr>
<tr bgcolor="#F8F7E4">
<td><font size="1"><b>Image Downloads:</b></font></td>
<td width="550"><a href="../Downloads/download_results.asp?varSeriesNumber=D158&NAV=fromSeriesDetail"><b>Click here</b></a> for complete image download listing for series <b>D158</b>.</td>
</tr>';
preg_match_all('~<td\s+width="550">(.*?)</td>~', $test, $out);
print_r($out[1]);
?>
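One caveat worth noting for anyone reusing this pattern: without the `s` modifier, `.` does not match newlines, so a cell whose content spans more than one line would be skipped. The sketch below shows the difference on a shortened sample; the sample text and the line break inside the second cell are assumptions for illustration, and the `trim()`/`strip_tags()` cleanup is an optional extra rather than part of the answer above.

```php
<?php
// Shortened, hypothetical sample; the second cell is split across two lines.
$html = "<tr><td><b>Description:</b></td>\n"
      . "<td width=\"550\">Faux marble table.</td></tr>\n"
      . "<tr><td><b>Printable Page:</b></td>\n"
      . "<td width=\"550\"> <a href=\"#\"><b>Click here</b></a> to download.\n"
      . "</td></tr>";

// Without "s", "." stops at newlines, so the multi-line cell is missed.
preg_match_all('~<td\s+width="550">(.*?)</td>~', $html, $plain);

// With "s" (PCRE_DOTALL), "." also matches newlines.
preg_match_all('~<td\s+width="550">(.*?)</td>~s', $html, $dotall);

// strip_tags() drops inner markup; trim() removes surrounding whitespace.
$cells = array_map('trim', array_map('strip_tags', $dotall[1]));

echo count($plain[1]), " vs ", count($dotall[1]), "\n";
print_r($cells);
```

If the real page keeps every cell on a single line, the original pattern behaves the same with or without the modifier.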
phoenixx (Author) Posted November 16, 2010

Many thanks! Here is the final code I used to break down the different arrays.

preg_match_all('~<td\s+width="550">(.*?)</td>~', $data, $series_description); // Series Descriptions
$d = array_merge($series_description[0]);
foreach($d as $k=>$v){
    echo "<p align='left'>" . $v . "</p>";
}
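For anyone who wants the label/value layout described in the first post (Description: ..., Series Features: ...), here is a minimal sketch that captures the label cell and the value cell together and pairs them with array_combine(). It assumes the markup always follows the `<td><font size="1"><b>Label:</b></font></td>` shape shown in the sample; the shortened `$data` string is hypothetical, and this is not the code the poster settled on.

```php
<?php
// Shortened, hypothetical sample in the same shape as the scraped rows.
$data = '<tr> <td><font size="1"><b>Description:</b></font></td> '
      . '<td width="550">The "Theo" Counter Height Table.</td> </tr> '
      . '<tr bgcolor="#F8F7E4"> <td><font size="1"><b>Series Features:</b></font></td> '
      . '<td width="550">Table top made with print marble.</td> </tr>';

// Group 1 = label text before the colon, group 2 = contents of the 550-wide cell.
$pattern = '~<td><font size="1"><b>(.*?):</b></font></td>\s*<td\s+width="550">(.*?)</td>~s';

$d = array();
if (preg_match_all($pattern, $data, $m) > 0) {
    // Labels become keys, trimmed cell contents become values.
    $d = array_combine($m[1], array_map('trim', $m[2]));
}

foreach ($d as $label => $value) {
    echo $label . ":<br>" . $value . "<br>\n";
}
```

Note that array_combine() keeps only the last value when two rows share the same label, so a page with repeated labels would need a different structure.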