dsaba Posted December 3, 2007
Here's some more sample bulk HTML containing the pattern I want to match. It's easier to read without the automatic conversion of HTML entities:
</td> </tr> <tr valign="top"> <td> </td> <td class="smallfont" valign="bottom" align="right"> <div>Last Activity: Today <span class="time">04:11 PM</span> </div> <div>Viewing Thread <a href="showthread.php?t=160518" title="V1.01 is the opening day rosters (ie before trades/drops/waivers etc) using 2007 stats. V2.91 is end of season rosters with adjustments for trades. Included: 1° Every MLB player that played in 2007 with their real stats (thus some who have had a poor season will have a future peak set at a...">2007 Rosters for BM08</a> @ 04:11 PM </div> </td> </tr> </table> </td> </tr> </table> <!-- / main info - avatar, profilepic etc. --> <!-- button row --> <!-- / button row --> <br />
Here's another:
</tr> <tr> <td class="vbmenu_option" title="nohilite"> <form action="index.php" method="get" onsubmit="return this.gotopage()" id="pagenav_form"> <input type="text" class="bginput" id="pagenav_itxt" style="font-size:11px" size="4" /> <input type="button" class="button" id="pagenav_ibtn" value="סע" /> </form> </td> </tr> </table> </div> <!-- / PAGENAV POPUP --> <!-- main info - avatar, profilepic etc. 
--> <table class="tborder" cellpadding="6" cellspacing="1" border="0" width="100%" align="center"> <tr> <td class="tcat">צפיה בפרופיל<span class="normal">: RAN2007</span></td> </tr> <tr> <td class="alt2"> <table cellpadding="0" cellspacing="0" border="0" width="100%"> <tr> <td style="border-bottom:1px solid #D1D1E1" width="100%" colspan="2"> <div class="bigusername">RAN2007 <img class="inlineimg" src="images/statusicon/user_offline.gif" alt="RAN2007 is offline" border="0" /> </div> </td> </tr> <tr valign="top"> <td><img src="image.php?u=4469&dateline=1193899515" width="150" height="112" alt="RAN2007's Avatar" border="0" style="border:1px solid #D1D1E1; border-top:none" /></td> <td class="smallfont" valign="bottom" align="left"> <div>ביקור אחרון: 29-11-07 <span class="time">12:56</span> </div> </td> </tr> </table> </td> </tr> </table> <!-- / main info - avatar, profilepic etc. --> <!-- button row --> <!-- / button row --> <br /> <table class="tborder" cellpadding="6" cellspacing="1" border="0" width="100%" align="center"> <tr> <td class="tcat" width="50%">פרטים ממערכת הפורומים</td> <td class="tcat" width="50%">שמור על קשר</td> </tr> <?php $raw = ' <tr valign="top"> <td><img src="image.php?u=4469&dateline=1193899515" width="150" height="112" alt="RAN2007's Avatar" border="0" style="border:1px solid #D1D1E1; border-top:none" /></td> <td class="smallfont" valign="bottom" align="left"> <div>ביקור אחרון: 29-11-07 <span class="time">12:56</span> </div> </td> </tr> </table> </td> </tr> </table> <!-- / main info - avatar, profilepic etc. 
--> '; $pattern = "~\<div\>(.*){200}:(.*){200}\<span class=\"time\"\>(.*){2}:(.*){5}\<\/span\>\ \<\/div\>~"; $lala = preg_match_all($pattern,$raw,$captArr); ?>
Here are two examples of what I want to match:
<div>Last Activity: Today <span class="time">04:11 PM</span> </div>
<div>ביקור אחרון: 29-11-07 <span class="time">12:56</span> </div>
In pseudo-regex, this is what I want to say:
<div>less than 200 chars: less than 200 chars <span class="time">2 chars or less:5 chars or less</span> </div>
The result I get from my attempt above is:
Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) [3] => Array ( ) [4] => Array ( ) )
Can you help me fix the pattern so that it matches correctly? Thanks
Link to comment https://forums.phpfreaks.com/topic/80025-regex-for-matching-simple-pattern-within-bulk-of-html/
effigy Posted December 3, 2007
{200} means exactly 200 repetitions; you want {0,200}. You may be better off making the match ungreedy, as in (.*?), or using ([^<]+) if you're not expecting HTML tags. There's no need to escape < and >.
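To make the difference concrete, here is a small sketch comparing an exact count with a bounded count, and greedy matching with ungreedy matching and a negated character class. The sample string and variable names are made up for illustration and do not come from the thread's HTML.

```php
<?php
// Hypothetical sample string, chosen only to show the behaviour.
$s = '<b>one</b><b>two</b>';

// a{3} requires exactly three repetitions; a{0,3} allows zero to three.
$exact   = preg_match('~^a{3}$~', 'aa');    // no match
$bounded = preg_match('~^a{0,3}$~', 'aa');  // match

// Greedy: .* takes as much as it can, so it runs to the last </b>.
preg_match('~<b>(.*)</b>~', $s, $greedy);
// Ungreedy: .*? takes as little as it can, so it stops at the first </b>.
preg_match('~<b>(.*?)</b>~', $s, $lazy);
// Negated class: [^<]+ can never cross a tag boundary.
preg_match('~<b>([^<]+)</b>~', $s, $negated);

echo $exact, ' ', $bounded, "\n";  // 0 1
echo $greedy[1], "\n";             // one</b><b>two
echo $lazy[1], "\n";               // one
echo $negated[1], "\n";            // one
```

Note that < and > appear unescaped in every pattern above, since they have no special meaning on their own.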
dsaba Posted December 3, 2007
Quote from effigy: {200} means exactly 200 repetitions; you want {0,200}. You may be better off making the match ungreedy, as in (.*?), or using ([^<]+) if you're not expecting HTML tags. There's no need to escape < and >.
I changed the pattern to use {0,200} and still got the same blank result. I would like to try your advice, but I don't really understand what you mean by "ungreedy". Could you give me an example pattern that shows this approach? Thank you.
Edit: I then tried what I understood to be your advice:
$pattern = "~\<div\>(.*?){0,200}:(.*?){0,200}\<span class=\"time\"\>(.*?){0,2}:(.*?){0,5}\<\/span\>\ \<\/div\>~";
This actually worked! I would still like to try your other suggestion, though. How can I implement it?
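One way the ([^<]+) idea could be applied to the two example lines is sketched below. This is an illustration, not the pattern anyone in the thread posted: the bounds are read off the pseudo-regex, the variable names are hypothetical, \s* is an assumption about the whitespace between the closing tags, and the u modifier is an assumption so that the Hebrew text is counted per character rather than per byte.

```php
<?php
// The two example lines from the question.
$samples = [
    '<div>Last Activity: Today <span class="time">04:11 PM</span> </div>',
    '<div>ביקור אחרון: 29-11-07 <span class="time">12:56</span> </div>',
];

// Bounded negated classes instead of (.*){n}:
//   [^<:]{1,200}  up to 200 characters that are neither "<" nor ":"
//   [^<]{1,200}   up to 200 characters that are not "<"
$pattern = '~<div>([^<:]{1,200}):([^<]{1,200})'
         . '<span class="time">([^<:]{1,2}):([^<]{1,5})</span>\s*</div>~u';

$found = [];
foreach ($samples as $html) {
    if (preg_match($pattern, $html, $m)) {
        // Groups: 1 = label, 2 = date part, 3 and 4 = the two halves of the time.
        $found[] = [trim($m[1]), trim($m[2]), $m[3] . ':' . $m[4]];
    }
}

foreach ($found as $row) {
    echo implode(' | ', $row), "\n";
}
```

Because the classes exclude <, none of the groups can spill into a neighbouring tag, which is what makes the length limits meaningful here.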
effigy Posted December 4, 2007
Greediness, etc.
preg_match_all('%
    <div>
    (.*?)
    <span\x20class="time">
    (.*?)
    </span>
    \s*    # the samples have whitespace between these two closing tags
    </div>
%xs', $raw, $matches);
print_r($matches);
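An empty result array on its own does not show whether the pattern simply failed to match or whether PCRE stopped with an error. preg_match_all() returns false in the error case, and preg_last_error() gives the error code. The sketch below checks for that and walks the matches with PREG_SET_ORDER. The $html string is a hypothetical stand-in built from the two example lines, and the \s* between the closing tags is an assumption about the whitespace in the real pages.

```php
<?php
// Hypothetical input: the two example lines, one with a newline before </div>.
$html = "<div>Last Activity: Today <span class=\"time\">04:11 PM</span>\n</div>\n"
      . "<div>ביקור אחרון: 29-11-07 <span class=\"time\">12:56</span> </div>";

// s modifier: "." also matches newlines. PREG_SET_ORDER: one array per match.
$count = preg_match_all(
    '%<div>(.*?)<span\x20class="time">(.*?)</span>\s*</div>%s',
    $html,
    $matches,
    PREG_SET_ORDER
);

if ($count === false) {
    // An error (for example a backtracking limit), not just "no match".
    echo 'PCRE error code: ', preg_last_error(), "\n";
} else {
    foreach ($matches as $m) {
        // $m[1] is the text before the span, $m[2] is the time.
        echo trim($m[1]), ' | ', $m[2], "\n";
    }
}
```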