blommer Posted February 7, 2010

Hi, I'm trying to parse out the title, URL, and thread score from a forum. I would like to display it very simply. For the examples below, I want it to look like:

    Cleveland Indians 2010 Odds, showthread.php?t=18523545, 1
    We need another thing like the Albert Belle Candy Bar, showthread.php?t=18523546&page=, -2

Can somebody give me some tips to do this?

    <td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
    <div class="mobile_threadbit">
    <div class="mobile_threadbit_title">
    <a href="showthread.php?t=18523545" id="thread_title_18523545">Cleveland Indians 2010 Odds</a>
    </div>
    <div class="mobile_threadbit_details">
    Today 06:31 PM - <a href="member.php?u=4971" rel="nofollow">larrypaging</a>
    <!-- Replies Views -->
    <br>Replies: 0 - Views: 142 - <a href="showthread.php?t=18523545&page=">Last Page</a>
    <br>
    <span>
    <img class="inlineimg" src="/images/rating/rating1.gif" border="0" alt="Votes: 1 Score: 1" />
    </span>
    </div>
    </div>
    </td><td class="mobile_fun_right"></td></tr><tr><td class="mobile_fun_left_alt"></td><td class="mobile_fun_alt">
    <div class="mobile_threadbit">
    <div class="mobile_threadbit_title">
    <a href="showthread.php?t=18523546" id="thread_title_18523546" style="font-weight:bold">We need another thing like the Albert Belle Candy Bar</a>
    </div>
    <div class="mobile_threadbit_details">
    Today 06:24 PM - <a href="member.php?u=5614" rel="nofollow">lofton4ever</a>
    <!-- Replies Views -->
    <br>Replies: 3 - Views: 292 - <a href="showthread.php?t=18523546&page=">Last Page</a>
    <br>
    <span>
    <img class="inlineimg" src="/images/rating/rating-2.gif" border="0" alt="Votes: 2 Score: -2" />
    </span>
    </div>
    </div>
    </td>

MadTechie Posted February 7, 2010

Try this:

    <a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>

blommer (Author) Posted February 7, 2010

I also need to grab the score, which is located on this line:

    <img class="inlineimg" src="/images/rating/rating1.gif" border="0" alt="Votes: 1 Score: 1" />

MadTechie Posted February 7, 2010

Try this:

    preg_match('%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>.*?alt="Votes:\s+\d+\s+Score: (\d+)%si', $html, $regs));
    unset($regs[0]);
    print_r($regs);

blommer (Author) Posted February 7, 2010

I think I'm following you, but I'm getting a parse error.

MadTechie Posted February 7, 2010

Ahh, there is an extra closing parenthesis.

    $html, $regs));

should be

    $html, $regs);

MadTechie Posted February 7, 2010

Oops, I just noticed the minus sign on the score. Here is a full example:

    preg_match_all('%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>.*?alt="Votes:\s+\d+\s+Score: ([\d-]+)%si', $html, $results, PREG_SET_ORDER);
    foreach ($results as $result) {
        echo $result[1] . " -> ";
        echo $result[2] . " -> ";
        echo $result[3] . "<BR />\n";
    }

blommer (Author) Posted February 7, 2010

Thank you, MadTechie, for your assistance! However, I am still having some problems.

The URL is coming back partially corrupted. Instead of just a number, it has "?s=da7e4443638d7d2e8a430f7a0da3a0bf&t=" prepended to every URL. My output right now looks like:

    ?s=da7e4443638d7d2e8a430f7a0da3a0bf&t=18523544 -> Steroids can't be tested on everyone -> 1
    ?s=da7e4443638d7d2e8a430f7a0da3a0bf&t=1852845 -> We need another thing like the Albert Belle Candy Bar -> -2

The other problem is that the score is not being attributed to the correct thread. I think this is because I forgot to mention that some threads have no score. In the above output, the "Steroids can't be tested on everyone" thread has no score, but it is displaying the score from the "Cleveland Indians 2010 Odds" thread. If a thread does not have a score, I don't need to display it.

So instead of:

    <td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
    <div class="mobile_threadbit">
    <div class="mobile_threadbit_title">
    <a href="showthread.php?t=18523545" id="thread_title_18523545">Cleveland Indians 2010 Odds</a>
    </div>
    <div class="mobile_threadbit_details">
    Today 06:31 PM - <a href="member.php?u=4971" rel="nofollow">larrypaging</a>
    <!-- Replies Views -->
    <br>Replies: 0 - Views: 142 - <a href="showthread.php?t=18523545&page=">Last Page</a>
    <br>
    <span>
    <img class="inlineimg" src="/images/rating/rating1.gif" border="0" alt="Votes: 1 Score: 1" />
    </span>
    </div>
    </div>
    </td>

The code for a thread might look like this:

    <td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
    <div class="mobile_threadbit">
    <div class="mobile_threadbit_title">
    <a href="showthread.php?t=18523545" id="thread_title_18523544">Steroids can't be tested on everyone</a>
    </div>
    <div class="mobile_threadbit_details">
    Today 06:31 PM - <a href="member.php?u=4674" rel="nofollow">dudeabiding</a>
    <!-- Replies Views -->
    <br>Replies: 0 - Views: 142 - <a href="showthread.php?t=18523544&page=">Last Page</a>
    <br>
    </div>
    </div>
    </td>

If you need more examples from the source code, please let me know. And thank you, you've already helped me quite a bit!
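
The misattributed score follows from how `.*?` behaves with the `s` modifier: it is lazy, but nothing stops it from running past the end of one thread block and into the next until it finds `alt="Votes:`. One way to rule that out, sketched below, is to cut the page into one chunk per thread and match inside each chunk only. This assumes every thread starts with `<div class="mobile_threadbit">` as in the samples, and that the session hash shows up as `?s=...&t=` or `?s=...&amp;t=` in the raw source. The function name `extract_scored_threads` is made up for illustration.

```php
<?php
// Hypothetical helper: returns [thread id, title, score] for each thread
// that actually has a score image. Threads without a score are skipped.
function extract_scored_threads(string $html): array
{
    $rows = [];

    // Assumption: every thread block opens with this exact div.
    $chunks = explode('<div class="mobile_threadbit">', $html);
    array_shift($chunks); // discard everything before the first thread

    // [?&;] accepts ?t=, &t= and &amp;t=, so a leading s=<hash> is skipped
    // and only the numeric thread id is captured.
    $pattern = '%<a href="showthread\.php[^"]*?[?&;]t=(\d+)[^"]*"[^>]*>([^<]+)</a>'
             . '.*?alt="Votes:\s+\d+\s+Score:\s+(-?\d+)%si';

    foreach ($chunks as $chunk) {
        // The lazy .*? can no longer reach a neighbouring thread,
        // because the chunk ends where the next thread begins.
        if (preg_match($pattern, $chunk, $m)) {
            $rows[] = [$m[1], trim($m[2]), (int) $m[3]];
        }
    }

    return $rows;
}
```

Calling `extract_scored_threads($html)` on the page source would then give one row per scored thread, which can be echoed in whatever format is needed.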

MadTechie Posted February 7, 2010

Okay, I'm a little unsure, but I think this is what you want!

    preg_match_all('%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>.*?<br>\s*?<span>\s*<img [^>]*? alt="Votes:\s+\d+\s+Score: ([\d-]+)%si', $html, $results, PREG_SET_ORDER);
    foreach ($results as $result) {
        echo $result[1] . " -> ";
        echo $result[2] . " -> ";
        echo $result[3] . "<BR />\n";
    }

blommer (Author) Posted February 7, 2010

No, I'm still getting the same problem. I'll work on it tonight, and if I still have the problem tomorrow, I'll post some more examples.

blommer (Author) Posted February 7, 2010

Unfortunately, I still haven't figured it out. Here is the exact code sample, with the first thread having NO score. Right now, the regex is assigning the score of the "Cleveland Indians 2010 Odds" thread to the "Steroids can't be tested on everyone" thread. I don't want "Steroids can't be tested on everyone" to be in the output at all. Any ideas what could fix this?

    <td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
    <div class="mobile_threadbit">
    <div class="mobile_threadbit_title">
    <a href="showthread.php?t=18523545" id="thread_title_18523544">Steroids can't be tested on everyone</a>
    </div>
    <div class="mobile_threadbit_details">
    Today 06:31 PM - <a href="member.php?u=4674" rel="nofollow">dudeabiding</a>
    <!-- Replies Views -->
    <br>Replies: 0 - Views: 142 - <a href="showthread.php?t=18523544&page=">Last Page</a>
    <br>
    </div>
    </div>
    </td><td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
    <div class="mobile_threadbit">
    <div class="mobile_threadbit_title">
    <a href="showthread.php?t=18523545" id="thread_title_18523545">Cleveland Indians 2010 Odds</a>
    </div>
    <div class="mobile_threadbit_details">
    Today 06:31 PM - <a href="member.php?u=4971" rel="nofollow">larrypaging</a>
    <!-- Replies Views -->
    <br>Replies: 0 - Views: 142 - <a href="showthread.php?t=18523545&page=">Last Page</a>
    <br>
    <span>
    <img class="inlineimg" src="/images/rating/rating1.gif" border="0" alt="Votes: 1 Score: 1" />
    </span>
    </div>
    </div>
    </td><td class="mobile_fun_right"></td></tr><tr><td class="mobile_fun_left_alt"></td><td class="mobile_fun_alt">
    <div class="mobile_threadbit">
    <div class="mobile_threadbit_title">
    <a href="showthread.php?t=18523546" id="thread_title_18523546" style="font-weight:bold">We need another thing like the Albert Belle Candy Bar</a>
    </div>
    <div class="mobile_threadbit_details">
    Today 06:24 PM - <a href="member.php?u=5614" rel="nofollow">lofton4ever</a>
    <!-- Replies Views -->
    <br>Replies: 3 - Views: 292 - <a href="showthread.php?t=18523546&page=">Last Page</a>
    <br>
    <span>
    <img class="inlineimg" src="/images/rating/rating-2.gif" border="0" alt="Votes: 2 Score: -2" />
    </span>
    </div>
    </div>
    </td>

blommer (Author) Posted February 10, 2010

Anybody have any ideas? One thought I had: you could restrict the matches to threads that have votes, because for those threads the text "Score:" always appears within ~400 characters of the thread title. Does anyone follow what I mean?

blommer (Author) Posted February 10, 2010

I think I figured it out! I changed:

    preg_match_all('%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>.*?<br>\s*?<span>\s*<img [^>]*? alt="Votes:\s+\d+\s+Score: ([\d-]+)%si', $html, $results, PREG_SET_ORDER);

to:

    preg_match_all('%id="thread_title_([^"]*)"[^>]*>([^<]+)</a>.{350,1000}?alt="Votes:\s+\d+\s+Score: ([\d-]+)%si', $html, $results, PREG_SET_ORDER);

Capturing the thread id from the id attribute instead of the href gives a clean number (somehow they were embedding a session link WITHIN THEIR SOURCE CODE), and now it only returns threads where "Score:" appears between 350 and 1000 characters after the thread title.

I stand on the shoulders of MadTechie. Thanks!
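
The `{350,1000}` window works for this page, but it depends on the markup between title and score keeping roughly the same length. A possible alternative, sketched here under the assumption that every title link carries `id="thread_title_<number>"` as in the samples, is a negative lookahead that stops the gap from stepping over the next thread's title. The `$html` string below is a shortened stand-in, not the real page.

```php
<?php
// Sketch: between the title and the score, refuse to cross another
// id="thread_title_..." attribute, so a score can only attach to the
// thread it belongs to, regardless of how many characters lie between.
$pattern = '%id="thread_title_(\d+)"[^>]*>([^<]+)</a>'
         . '(?:(?!id="thread_title_).)*?'
         . 'alt="Votes:\s+\d+\s+Score:\s+(-?\d+)%si';

// Shortened stand-in for the page source (assumption for illustration).
$html = '<a href="showthread.php?t=1" id="thread_title_1">No score here</a> <br>'
      . '<a href="showthread.php?t=2" id="thread_title_2">Scored thread</a> <br>'
      . '<img alt="Votes: 2 Score: -2" />';

preg_match_all($pattern, $html, $results, PREG_SET_ORDER);

foreach ($results as $result) {
    // prints: 2 -> Scored thread -> -2
    echo $result[1] . ' -> ' . $result[2] . ' -> ' . $result[3] . "\n";
}
```

The first thread is dropped because the only score after it sits behind the second thread's title, which the lookahead will not cross.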

MadTechie Posted February 10, 2010

Okay, well, if you must use a RegEx, then here's an ugly one. I tested it against this input:

    <td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
    <div class="mobile_threadbit">
    <div class="mobile_threadbit_title">
    <a href="showthread.php?t=18523545" id="thread_title_18523544">Steroids can't be tested on everyone</a>
    </div>
    <div class="mobile_threadbit_details">
    Today 06:31 PM - <a href="member.php?u=4674" rel="nofollow">dudeabiding</a>
    <!-- Replies Views -->
    <br>Replies: 0 - Views: 142 - <a href="showthread.php?t=18523544&page=">Last Page</a>
    <br>
    </div>
    </div>
    </td><td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
    <div class="mobile_threadbit">
    <div class="mobile_threadbit_title">
    <a href="showthread.php?t=18523545" id="thread_title_18523545">Cleveland Indians 2010 Odds</a>
    </div>
    <div class="mobile_threadbit_details">
    Today 06:31 PM - <a href="member.php?u=4971" rel="nofollow">larrypaging</a>
    <!-- Replies Views -->
    <br>Replies: 0 - Views: 142 - <a href="showthread.php?t=18523545&page=">Last Page</a>
    <br>
    <span>
    <img class="inlineimg" src="/images/rating/rating1.gif" border="0" alt="Votes: 1 Score: 1" />
    </span>
    </div>
    </div>
    </td><td class="mobile_fun_right"></td></tr><tr><td class="mobile_fun_left_alt"></td><td class="mobile_fun_alt">
    <div class="mobile_threadbit">
    <div class="mobile_threadbit_title">
    <a href="showthread.php?t=18523546" id="thread_title_18523546" style="font-weight:bold">We need another thing like the Albert Belle Candy Bar</a>
    </div>
    <div class="mobile_threadbit_details">
    Today 06:24 PM - <a href="member.php?u=5614" rel="nofollow">lofton4ever</a>
    <!-- Replies Views -->
    <br>Replies: 3 - Views: 292 - <a href="showthread.php?t=18523546&page=">Last Page</a>
    <br>
    <span>
    <img class="inlineimg" src="/images/rating/rating-2.gif" border="0" alt="Votes: 2 Score: -2" />
    </span>
    </div>
    </div>
    </td>

The code:

    preg_match_all('%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>\s*</div>\s*<div class="mobile_threadbit_details">\s*[^<]*[^<]*[^>]*>[^>]*>\s*<!-- Replies Views -->\s*<br>Replies: \d+\s+[^>]*>[^>]*>\s*<br>\s+<span>\s*<img [^>]*? alt="Votes:\s+\d+\s+Score: ([\d-]+)%sim', $html, $results, PREG_SET_ORDER);
    foreach ($results as $result) {
        echo $result[1] . " -> ";
        echo $result[2] . " -> ";
        echo $result[3] . "<BR />\n";
    }

The output:

    ?t=18523545 -> Cleveland Indians 2010 Odds -> 1
    ?t=18523546 -> We need another thing like the Albert Belle Candy Bar -> -2

Seems correct to me!

blommer (Author) Posted February 10, 2010

Thanks, MT. Would there be a better way of doing this besides RegEx?

MadTechie Posted February 10, 2010

You could use DOMXPath.
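
A minimal sketch of what the DOMXPath route might look like, using PHP's bundled `DOMDocument` and `DOMXPath` classes. It assumes the class names and the `thread_title_` id prefix from the samples earlier in the thread. The function name `scored_threads_xpath` is hypothetical, and rebuilding the URL from the id (rather than reading the href, which may carry a session hash) is a design choice made here, not something the thread settled on.

```php
<?php
// Hypothetical helper: returns [title, url, score] for each thread block
// that contains a score image. Threads without a score are skipped.
function scored_threads_xpath(string $html): array
{
    $doc = new DOMDocument();
    // Scraped forum markup is rarely valid HTML; collect parser warnings
    // internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];

    // One iteration per thread block, so a score can never be picked up
    // from a neighbouring thread.
    foreach ($xpath->query('//div[@class="mobile_threadbit"]') as $bit) {
        $link = $xpath->query('.//a[starts-with(@id, "thread_title_")]', $bit)->item(0);
        $img  = $xpath->query('.//img[contains(@alt, "Score:")]', $bit)->item(0);
        if ($link === null || $img === null) {
            continue; // no title link, or the thread has no score
        }
        if (!preg_match('/Score:\s*(-?\d+)/', $img->getAttribute('alt'), $m)) {
            continue;
        }
        // Build a clean URL from the numeric part of the id attribute.
        $id = substr($link->getAttribute('id'), strlen('thread_title_'));
        $rows[] = [trim($link->textContent), 'showthread.php?t=' . $id, (int) $m[1]];
    }

    return $rows;
}
```

Each returned row could then be printed with `echo implode(', ', $row) . "\n";` to get the "title, url, score" layout asked for in the first post.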