funstein, posted October 18, 2011:

I have <tr> tags that contain extra attributes, like <tr style=blabla>, and closing </tr> tags. I want to make PHP grab the data in between. Please check the example.

The data:

    <tr style="font-weight: bold; background-color: #aaa;"> <td>School</td><td>Position</td><td>Name</td><td>Surname</td><td>Delegation</td><td>Commitee</td> </tr>
    <tr style="font-weight: bold; background-color: #aaa;"> <td>School1</td><td>Position1</td><td>Name1</td><td>Surname1</td><td>Delegation1</td><td>Commitee1</td> </tr>

It should return these:

    $array[0] will be <td>School</td><td>Position</td><td>Name</td><td>Surname</td><td>Delegation</td><td>Commitee</td>
    $array[1] will be <td>School1</td><td>Position1</td><td>Name1</td><td>Surname1</td><td>Delegation1</td><td>Commitee1</td>
xyph, posted October 18, 2011:

Did you want to support nested <tr>'s as well?

    <table>
      <tr>
        <td>
          <table>
            <tr>
              <td></td>
            </tr>
          </table>
        </td>
      </tr>
    </table>

If not, you probably want something like this. Keep in mind there will be a lot of backtracking, because you have to use a lazy quantifier, which has to verify at each character that the next part of the expression can't be matched.

    %<tr[^>]++>(.*?)</tr>%s

Options: dot matches newline (s)

Match the characters “<tr” literally «<tr»
Match any character that is NOT a “>” «[^>]++»
    Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
Match the character “>” literally «>»
Match the regular expression below and capture its match into backreference number 1 «(.*?)»
    Match any single character «.*?»
        Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “</tr>” literally «</tr>»
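For anyone following along, here is a minimal sketch of how the expression above could be run against the data from the question. The variable names ($html, $pattern, $rows) and the shortened sample rows are assumptions made for illustration; only the pattern itself comes from the post.

```php
<?php
// Sample input modelled on the question (two rows, columns shortened).
$html = '<tr style="font-weight: bold; background-color: #aaa;"> <td>School</td><td>Position</td> </tr> '
      . '<tr style="font-weight: bold; background-color: #aaa;"> <td>School1</td><td>Position1</td> </tr>';

// The pattern is passed as one string that includes its % delimiters
// and the trailing s modifier (dot matches newline).
$pattern = '%<tr[^>]++>(.*?)</tr>%s';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds capture group 1: the text between <tr ...> and </tr>.
// trim() drops the spaces that surround the cells in the sample data.
$rows = array_map('trim', $matches[1]);

print_r($rows);

// Caveat: [^>]++ needs at least one character after "<tr", so a bare <tr>
// with no attributes is not matched by the pattern as written.
$bare = preg_match($pattern, '<tr><td>x</td></tr>');
```

Because `[^>]++` requires one or more characters, this fits the question's data, where every row carries a style attribute. Changing `++` to `*+` would probably be the smallest adjustment if attribute-less rows also had to match.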
funstein (author), posted October 18, 2011:

I tried it, but for some reason it won't work, and it says:

    Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '<' in C:\blabla\test.php on line 7

Also, not all <tr> tags are on new lines. Can you please send me another working one?
xyph, posted October 18, 2011:

A working one? Really? My expression works fine. It's not my fault you don't know how to use the preg_match_all function. Whether it's ignorance or laziness, perhaps you should check the manual - or even copy the error into a search engine.

I handed you the solution to the harder part of the problem. I'm going to leave the rest up to you.
silkfire, posted October 18, 2011:

Seriously, xyph, you should know better than to give him a preg solution to a problem that should be solved with DOM.

Mate, someone should teach you how to use the marvelous DOM parser included in PHP:

    function innerHTML($node, $escape = false) {
        $innerHTML = '';
        $children = $node->childNodes;

        foreach ($children as $child) {
            // Copy each child into a scratch document so it can be serialized on its own.
            $dom = new DOMDocument();
            $dom->appendChild($dom->importNode($child, true));
            $innerHTML .= ($escape ? htmlspecialchars($dom->saveHTML()) : $dom->saveHTML());
        }

        return trim($innerHTML) . "\r\n\r\n";
    }

    $dom = new DOMDocument();
    @$dom->loadHTML($html); // Put your HTML in this variable first!

    $xpath = new DOMXPath($dom);
    $trs = $xpath->query('//tr');

    $rows = array();
    foreach ($trs as $tr) {
        // true = escape the markup so it is shown as text in a browser
        $rows[] = innerHTML($tr, true);
    }

    print_r($rows);
funstein (author), posted October 18, 2011:

I seriously sound like a n00b here, xyph, but what I meant by "it doesn't work" was that it returns associative arrays. I have no idea why that is happening. All I know is that it should return an array that has the first match as $array[0] and the second one as $array[1], and it doesn't.

And silkfire, what does that do, and how do I run a regex on that?
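The shape of the result is probably the source of the confusion here: preg_match_all fills its third argument with a nested, numerically indexed array rather than a flat list of matches. The sketch below shows the two standard orderings. The two-row sample string and the variable names ($byGroup, $byMatch) are made up for illustration.

```php
<?php
// Hypothetical two-row sample; the pattern is the one posted earlier in the thread.
$html = '<tr class="a"><td>A</td></tr><tr class="b"><td>B</td></tr>';
$pattern = '%<tr[^>]++>(.*?)</tr>%s';

// Default ordering (PREG_PATTERN_ORDER): results are grouped by capture group.
// $byGroup[0] lists the full matches, $byGroup[1] lists the contents of group 1.
preg_match_all($pattern, $html, $byGroup);
$rows = $byGroup[1];

// PREG_SET_ORDER: results are grouped per match instead.
// $byMatch[0] is the first match, and $byMatch[0][1] is its group 1.
preg_match_all($pattern, $html, $byMatch, PREG_SET_ORDER);

print_r($rows);
print_r($byMatch);
```

With the default ordering, `$matches[1]` is already the flat array the question asks for, so `$rows[0]` and `$rows[1]` are the contents of the first and second row.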
cags, posted October 19, 2011:

funstein wrote: "And silkfire, what does that do, and how do I run a regex on that?"

You don't. That's the point. You are attempting to parse data out of an HTML element, something that regular expressions are not really suited to. The solution shown by silkfire loads the HTML into a DOM object, which is built specifically for handling HTML.
funstein (author), posted October 19, 2011:

OK, I see. But can you point out how I can import the HTML and get an array of <tr> tags? Thanks.
silkfire, posted October 19, 2011:

Depends on where you're getting the HTML from. Is it your own site, or are you scraping? What is the address?
funstein (author), posted October 19, 2011:

I'm getting HTML input from Google Spreadsheets Visualizations. It's a pretty simple one, actually. I get the data using file_get_contents().
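To tie this back to the DOM approach suggested earlier, here is a rough sketch of how markup fetched as a string might be turned into an array of row contents. It is an alternative to silkfire's code, not a copy of it. The helper name trInnerHtml is hypothetical, and the sketch assumes PHP 7 or later (for the type declarations) and the node argument of DOMDocument::saveHTML(), available since PHP 5.3.6. A literal string stands in for the file_get_contents() result so that it runs offline.

```php
<?php
// Hypothetical helper: returns the inner HTML of every <tr> found in $html.
function trInnerHtml(string $html): array
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $inner = '';
        // Serialize each child node (the <td> cells and any text between them).
        foreach ($tr->childNodes as $child) {
            $inner .= $dom->saveHTML($child);
        }
        $rows[] = trim($inner);
    }

    return $rows;
}

// In the thread this string would come from file_get_contents($url);
// the URL is not given there, so sample markup is used here instead.
$html = '<table>'
      . '<tr style="font-weight: bold;"><td>School</td><td>Position</td></tr>'
      . '<tr style="font-weight: bold;"><td>School1</td><td>Position1</td></tr>'
      . '</table>';

$rows = trInnerHtml($html);
print_r($rows);
```

Unlike the regex, this does not depend on whether a row has attributes or where the line breaks fall. Note that getElementsByTagName('tr') also returns rows of nested tables, so nested markup would need extra filtering.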
silkfire, posted October 19, 2011:

You have the answer, mate...
funstein (author), posted October 19, 2011:

Which is?
funstein (author), posted October 19, 2011:

Oh, I really am sorry. My browser didn't display the iframe scroll bar. Thanks for everything!