dsaba Posted December 3, 2007

Here's some more sample HTML containing the pattern I want to match. It's easier to read without the automatic conversion of HTML entities:

</td> </tr> <tr valign="top"> <td> </td> <td class="smallfont" valign="bottom" align="right"> <div>Last Activity: Today <span class="time">04:11 PM</span> </div> <div>Viewing Thread <a href="showthread.php?t=160518" title="V1.01 is the opening day rosters (ie before trades/drops/waivers etc) using 2007 stats. V2.91 is end of season rosters with adjustments for trades. Included: 1° Every MLB player that played in 2007 with their real stats (thus some who have had a poor season will have a future peak set at a...">2007 Rosters for BM08</a> @ 04:11 PM </div> </td> </tr> </table> </td> </tr> </table> <!-- / main info - avatar, profilepic etc. --> <!-- button row --> <!-- / button row --> <br />

Here's another:

</tr> <tr> <td class="vbmenu_option" title="nohilite"> <form action="index.php" method="get" onsubmit="return this.gotopage()" id="pagenav_form"> <input type="text" class="bginput" id="pagenav_itxt" style="font-size:11px" size="4" /> <input type="button" class="button" id="pagenav_ibtn" value="סע" /> </form> </td> </tr> </table> </div> <!-- / PAGENAV POPUP --> <!-- main info - avatar, profilepic etc.
--> <table class="tborder" cellpadding="6" cellspacing="1" border="0" width="100%" align="center"> <tr> <td class="tcat">צפיה בפרופיל<span class="normal">: RAN2007</span></td> </tr> <tr> <td class="alt2"> <table cellpadding="0" cellspacing="0" border="0" width="100%"> <tr> <td style="border-bottom:1px solid #D1D1E1" width="100%" colspan="2"> <div class="bigusername">RAN2007 <img class="inlineimg" src="images/statusicon/user_offline.gif" alt="RAN2007 is offline" border="0" /> </div> </td> </tr> <tr valign="top"> <td><img src="image.php?u=4469&dateline=1193899515" width="150" height="112" alt="RAN2007's Avatar" border="0" style="border:1px solid #D1D1E1; border-top:none" /></td> <td class="smallfont" valign="bottom" align="left"> <div>ביקור אחרון: 29-11-07 <span class="time">12:56</span> </div> </td> </tr> </table> </td> </tr> </table> <!-- / main info - avatar, profilepic etc. --> <!-- button row --> <!-- / button row --> <br /> <table class="tborder" cellpadding="6" cellspacing="1" border="0" width="100%" align="center"> <tr> <td class="tcat" width="50%">פרטים ממערכת הפורומים</td> <td class="tcat" width="50%">שמור על קשר</td> </tr>

Here is my code:

<?php
$raw = ' <tr valign="top"> <td><img src="image.php?u=4469&dateline=1193899515" width="150" height="112" alt="RAN2007\'s Avatar" border="0" style="border:1px solid #D1D1E1; border-top:none" /></td> <td class="smallfont" valign="bottom" align="left"> <div>ביקור אחרון: 29-11-07 <span class="time">12:56</span> </div> </td> </tr> </table> </td> </tr> </table> <!-- / main info - avatar, profilepic etc. --> ';
$pattern = "~\<div\>(.*){200}:(.*){200}\<span class=\"time\"\>(.*){2}:(.*){5}\<\/span\>\ \<\/div\>~";
$lala = preg_match_all($pattern, $raw, $captArr);
print_r($captArr);
?>

Here are two examples of what I want to match:

<div>Last Activity: Today <span class="time">04:11 PM</span> </div>
<div>ביקור אחרון: 29-11-07 <span class="time">12:56</span> </div>

(ביקור אחרון is Hebrew for "Last visit".)

In pseudo-regex, this is what I want to say:

<div>less than 200 chars:less than 200 chars <span class="time">2 chars or less:5 chars or less</span> </div>

The result I get from the code above is:

Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) [3] => Array ( ) [4] => Array ( ) )

Can you help me fix it so that it matches the pattern correctly? Thanks
effigy Posted December 3, 2007

{200} means exactly 200 repetitions of the preceding item; you want {0,200}. You may be better off making the match ungreedy, (.*?), or using ([^<]+) if you're not expecting HTML tags. There's no need to escape < and >.
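A quantifier applies only to the item directly before it, so (.*){200} repeats the whole group 200 times instead of counting 200 characters. Below is a minimal sketch of the pseudo-regex from the question written along the lines of the advice above: bounded quantifiers on a character class, [^<] instead of .*, and unescaped < and >. The sample strings are copied from the question; excluding ":" from the label and hour groups, allowing \s* before </div>, and adding the u modifier (so that {0,200} counts UTF-8 characters rather than bytes, which matters for the Hebrew sample) are assumptions for illustration.

```php
<?php
// The two sample snippets from the question, used as test input.
$raw = '<div>Last Activity: Today <span class="time">04:11 PM</span> </div>'
     . "\n"
     . '<div>ביקור אחרון: 29-11-07 <span class="time">12:56</span> </div>';

// <div>, label (at most 200 chars, no "<" or ":"), ":", value (at most 200 chars, no "<"),
// <span class="time">, hour (at most 2 chars), ":", rest (at most 5 chars), </span>,
// optional whitespace, </div>. The u modifier makes {0,N} count characters, not bytes.
$pattern = '~<div>([^<:]{0,200}):([^<]{0,200})<span class="time">([^<:]{0,2}):([^<]{0,5})</span>\s*</div>~u';

$count = preg_match_all($pattern, $raw, $captArr);
print_r($captArr);
```

Because each group is limited to a character class that cannot cross a tag, the engine has very little to backtrack over, unlike a group such as (.*) repeated 200 times.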
dsaba (author) Posted December 3, 2007

I changed the pattern to use {0,200} and still got the same empty result. I'd like to try your advice, but I don't really understand what you mean by "ungreedy". Could you give me an example pattern using that approach? Thank you.

Edit: I then tried what I understood your advice to be:

$pattern = "~\<div\>(.*?){0,200}:(.*?){0,200}\<span class=\"time\"\>(.*?){0,2}:(.*?){0,5}\<\/span\>\ \<\/div\>~";

This actually worked! I would still like to try your other suggestion, though. How can I implement it?
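To illustrate what "ungreedy" means, here is a small sketch comparing a greedy .* with a lazy (ungreedy) .*? on the same input. The input string is made up for illustration: it simply puts two time spans on one line.

```php
<?php
// Made-up input containing two time spans on one line.
$html = '<span class="time">04:11 PM</span> and <span class="time">12:56</span>';

// Greedy: .* grabs as much as it can and backtracks only as far as needed,
// so the single match runs up to the LAST </span> on the line.
$greedy = preg_match_all('~<span class="time">(.*)</span>~', $html, $g);

// Lazy (ungreedy): .*? grabs as little as it can, so each match stops
// at the FIRST </span> after it.
$lazy = preg_match_all('~<span class="time">(.*?)</span>~', $html, $l);

print_r($g[1]);
print_r($l[1]);
```

The greedy version finds one match whose group swallows the text between the two spans; the lazy version finds both times separately.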
effigy Posted December 4, 2007

Greediness, etc.

// x: whitespace in the pattern is ignored, so a literal space is written as \x20
// s: . also matches newlines
preg_match_all('%
    <div>
    (.*?)
    <span\x20class="time">
    (.*?)
    </span>
    \s*    # the whitespace between </span> and </div> in the HTML
    </div>
%xs', $raw, $matches);
print_r($matches);
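A self-contained sketch running the lazy x/s pattern from the reply above against the two snippets from the question. Two details are additions, not part of the original reply: \s* before </div> (with the x modifier, the literal space in the pattern is ignored, so the space between </span> and </div> in the HTML has to be matched explicitly), and the PREG_SET_ORDER flag, which returns one array per match instead of one array per group.

```php
<?php
// The two sample snippets from the question, used as test input.
$raw = '<div>Last Activity: Today <span class="time">04:11 PM</span> </div>'
     . "\n"
     . '<div>ביקור אחרון: 29-11-07 <span class="time">12:56</span> </div>';

// x: whitespace in the pattern is ignored (\x20 is a literal space);
// s: . also matches newlines.
$count = preg_match_all('%
    <div>
    (.*?)
    <span\x20class="time">
    (.*?)
    </span>
    \s*
    </div>
%xs', $raw, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER, $m[0] is the full match and $m[1], $m[2] are the groups.
foreach ($matches as $m) {
    echo trim($m[1]), ' | ', $m[2], "\n";
}
```

Grouping by match rather than by capture group keeps each label next to its time, which is usually what you want when looping over results.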