
Parsing out title, url, and thread score


blommer


Hi, I'm trying to parse the title, URL, and thread score out of a forum's thread list and display them very simply. For the examples below, I want the output to look like:

 

Cleveland Indians 2010 Odds, showthread.php?t=18523545, 1

We need another thing like the Albert Belle Candy Bar, showthread.php?t=18523546, -2

 

Can anyone give me some tips on how to do this?

 

<td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
<div class="mobile_threadbit">	
<div class="mobile_threadbit_title">


		 <a href="showthread.php?t=18523545" id="thread_title_18523545">Cleveland Indians 2010 Odds</a>

	</div>

	<div class="mobile_threadbit_details">
	Today 06:31 PM -  
            
            <a href="member.php?u=4971" rel="nofollow">larrypaging</a>
            


<!-- Replies Views -->

	 <br>Replies: 0
	- Views: 142 - <a href="showthread.php?t=18523545&page=">Last Page</a>


<br>

		<span>



			<img class="inlineimg" src="/images/rating/rating1.gif" border="0" alt="Votes: 1 Score: 1" />
		</span>


</div>
</div>
</td><td class="mobile_fun_right"></td></tr><tr><td class="mobile_fun_left_alt"></td><td class="mobile_fun_alt">
<div class="mobile_threadbit">	
<div class="mobile_threadbit_title">


		 <a href="showthread.php?t=18523546" id="thread_title_18523546" style="font-weight:bold">We need another thing like the Albert Belle Candy Bar</a>
	</div>


	<div class="mobile_threadbit_details">
	Today 06:24 PM -  
            
            <a href="member.php?u=5614" rel="nofollow">lofton4ever</a>
            


<!-- Replies Views -->

	 <br>Replies: 3
	- Views: 292 - <a href="showthread.php?t=18523546&page=">Last Page</a>


<br>

		<span>


			<img class="inlineimg" src="/images/rating/rating-2.gif" border="0" alt="Votes: 2 Score: -2" />

		</span>


</div>
</div>
</td>
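For comparison, the same data can be pulled out with PHP's DOM extension instead of a regex. This avoids one pattern running across several threads. Below is a minimal sketch. It assumes each thread is wrapped in <div class="mobile_threadbit">, that the title link has an id starting with thread_title_, and that the score sits in an <img> alt of the form "Votes: N Score: S", as in the sample above. The function name extractThreads is hypothetical.

```php
<?php
// Sketch: extract title, URL and score per thread with DOMDocument/DOMXPath.
// Assumptions (from the sample HTML): one <div class="mobile_threadbit"> per
// thread, title link id starts with "thread_title_", score is in an <img> alt.
// extractThreads() is a hypothetical helper name.
function extractThreads(string $html): array
{
    $doc = new DOMDocument();
    // The page fragment is not a complete, valid document; silence libxml warnings.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $threads = [];
    foreach ($xpath->query('//div[@class="mobile_threadbit"]') as $block) {
        // Search only inside the current thread's div.
        $link = $xpath->query('.//a[starts-with(@id, "thread_title_")]', $block)->item(0);
        $img  = $xpath->query('.//img[starts-with(@alt, "Votes:")]', $block)->item(0);
        if ($link === null || $img === null) {
            continue; // no title link or no rating image: skip this thread
        }
        if (!preg_match('/Score:\s*(-?\d+)/', $img->getAttribute('alt'), $m)) {
            continue;
        }
        $threads[] = [
            'title' => trim($link->textContent),
            'url'   => $link->getAttribute('href'),
            'score' => (int) $m[1],
        ];
    }
    return $threads;
}
```

Threads without a rating image are skipped, so a missing score can never be borrowed from a neighbouring thread.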


Oops, I hadn't noticed that the score can be negative (the minus sign).

Here is a full example:

// [1] = query string of the thread link, [2] = thread title, [3] = score (may be negative)
preg_match_all('%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>.*?alt="Votes:\s+\d+\s+Score: ([\d-]+)%si', $html, $results, PREG_SET_ORDER);
foreach ($results as $result) {
    echo $result[1] . " -> ";
    echo $result[2] . " -> ";
    echo $result[3] . "<BR />\n";
}
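To see what each capture group holds, here is a minimal runnable check of the pattern above. It runs against a trimmed-down copy of the first thread. The shortened $html string is an assumption made for the demo, not the real page.

```php
<?php
// Trimmed-down copy of one thread from the sample HTML (assumed demo input).
$html = '<a href="showthread.php?t=18523545" id="thread_title_18523545">Cleveland Indians 2010 Odds</a>'
      . ' ... <img class="inlineimg" src="/images/rating/rating1.gif" border="0" alt="Votes: 1 Score: 1" />';

preg_match_all(
    '%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>.*?alt="Votes:\s+\d+\s+Score: ([\d-]+)%si',
    $html,
    $results,
    PREG_SET_ORDER
);

// PREG_SET_ORDER gives one array per match: [0] is the whole match,
// [1] the query string, [2] the title, [3] the score.
foreach ($results as $result) {
    printf("%s -> %s -> %s\n", $result[1], $result[2], $result[3]);
}
// prints: ?t=18523545 -> Cleveland Indians 2010 Odds -> 1
```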


Thank you for your help, MadTechie! However, I'm still having some problems. The URL is coming back partially corrupted: instead of just the thread number, every URL has "?s=da7e4443638d7d2e8a430f7a0da3a0bf&t=" stuck on the front.

 

My output right now looks like:

 

?s=da7e4443638d7d2e8a430f7a0da3a0bf&t=18523544 -> Steroids can't be tested on everyone -> 1

?s=da7e4443638d7d2e8a430f7a0da3a0bf&t=1852845 -> We need another thing like the Albert Belle Candy Bar -> -2
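One way to clean up the captured link is to decode it, parse the query string with parse_str(), and rebuild the URL from the t parameter alone. This is a minimal sketch. It assumes the extra s=... value is a session hash that the forum adds to its links, and that only t identifies the thread. threadUrl is a hypothetical helper name.

```php
<?php
// Sketch: rebuild a clean thread link from a captured query string.
// Assumption: "s" is a session hash added by the forum and only "t"
// (the thread id) is needed. threadUrl() is a hypothetical helper.
function threadUrl(string $query): string
{
    // The raw href may contain "&amp;" entities, so decode before parsing.
    $query = html_entity_decode(ltrim($query, '?'), ENT_QUOTES);
    parse_str($query, $params);
    if (!isset($params['t']) || !is_string($params['t']) || !ctype_digit($params['t'])) {
        return ''; // no usable thread id
    }
    return 'showthread.php?t=' . $params['t'];
}

echo threadUrl('?s=da7e4443638d7d2e8a430f7a0da3a0bf&t=18523544'), "\n";
// prints: showthread.php?t=18523544
```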

 

The other problem is that scores are being attributed to the wrong threads. I think this is because I forgot to mention that some threads have no score. In the output above, the "Steroids can't be tested on everyone" thread has no score, but it is showing the score from the "Cleveland Indians 2010 Odds" thread. Threads without a score don't need to appear in the output at all.
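The mismatch comes from the .*? in the pattern. When a thread has no rating image, the lazy match keeps going into the following thread until it finds one. A minimal sketch of one workaround is to split the page into one chunk per thread before matching. It assumes every thread starts with <div class="mobile_threadbit">, as in the samples. threadsWithScores is a hypothetical name.

```php
<?php
// Sketch: cut the page into one chunk per thread, then search each chunk on
// its own so no match can reach into the next thread.
// Assumption: every thread begins with <div class="mobile_threadbit">.
// threadsWithScores() is a hypothetical helper name.
function threadsWithScores(string $html): array
{
    $out = [];
    $chunks = preg_split('%<div class="mobile_threadbit">%', $html);
    array_shift($chunks); // drop the text before the first thread
    foreach ($chunks as $chunk) {
        // The first showthread.php link in a chunk is the title link.
        if (!preg_match('%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>%', $chunk, $title)) {
            continue;
        }
        if (!preg_match('%alt="Votes:\s+\d+\s+Score: (-?\d+)"%', $chunk, $score)) {
            continue; // thread has no score: leave it out
        }
        // [0] query string, [1] title, [2] score as an integer
        $out[] = [$title[1], $title[2], (int) $score[1]];
    }
    return $out;
}
```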

 

So instead of:

<td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
   <div class="mobile_threadbit">   
   <div class="mobile_threadbit_title">
         
         
          <a href="showthread.php?t=18523545" id="thread_title_18523545">Cleveland Indians 2010 Odds</a>

      </div>
   
      <div class="mobile_threadbit_details">
      Today 06:31 PM - 
           
            <a href="member.php?u=4971" rel="nofollow">larrypaging</a>
           


<!-- Replies Views -->
   
       <br>Replies: 0
      - Views: 142 - <a href="showthread.php?t=18523545&page=">Last Page</a>
         
   
<br>
   
         <span>

            

            <img class="inlineimg" src="/images/rating/rating1.gif" border="0" alt="Votes: 1 Score: 1" />
         </span>
         

</div>
</div>
</td>

 

the HTML for a thread with no score looks like this, with no rating <span> at all:

 

<td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
   <div class="mobile_threadbit">   
   <div class="mobile_threadbit_title">
         
         
          <a href="showthread.php?t=18523545" id="thread_title_18523544">Steroids can't be tested on everyone</a>

      </div>
   
      <div class="mobile_threadbit_details">
      Today 06:31 PM - 
           
            <a href="member.php?u=4674" rel="nofollow">dudeabiding</a>
           


<!-- Replies Views -->
   
       <br>Replies: 0
      - Views: 142 - <a href="showthread.php?t=18523544&page=">Last Page</a>
         
   
<br>
         

</div>
</div>
</td>

 

If you need more examples from the page source, please let me know. And thank you, you've already helped me quite a bit!


Okay, I'm not entirely sure, but I think this is what you want:

// Same captures as before, but the score must now follow a <br>, a <span> and an <img> tag
preg_match_all('%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>.*?<br>\s*?<span>\s*<img [^>]*? alt="Votes:\s+\d+\s+Score: ([\d-]+)%si', $html, $results, PREG_SET_ORDER);
foreach ($results as $result) {
    echo $result[1] . " -> ";
    echo $result[2] . " -> ";
    echo $result[3] . "<BR />\n";
}

 


Unfortunately, I still haven't got it working. Here is the exact HTML, where the first thread has NO score. Right now, the regex assigns the score of "Cleveland Indians 2010 Odds" to "Steroids can't be tested on everyone". I don't want "Steroids can't be tested on everyone" to appear in the output at all. Any ideas how to fix this?

 

<td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
   <div class="mobile_threadbit">   
   <div class="mobile_threadbit_title">
         
         
          <a href="showthread.php?t=18523545" id="thread_title_18523544">Steroids can't be tested on everyone</a>

      </div>
   
      <div class="mobile_threadbit_details">
      Today 06:31 PM -
           
            <a href="member.php?u=4674" rel="nofollow">dudeabiding</a>
           


<!-- Replies Views -->
   
       <br>Replies: 0
      - Views: 142 - <a href="showthread.php?t=18523544&page=">Last Page</a>
         
   
<br>
         

</div>
</div>
</td><td class="mobile_pen_right_alt"></td></tr><tr><td class="mobile_pen_left"></td><td class="mobile_pen">
   <div class="mobile_threadbit">   
   <div class="mobile_threadbit_title">
         
         
          <a href="showthread.php?t=18523545" id="thread_title_18523545">Cleveland Indians 2010 Odds</a>

      </div>
   
      <div class="mobile_threadbit_details">
      Today 06:31 PM - 
           
            <a href="member.php?u=4971" rel="nofollow">larrypaging</a>
           


<!-- Replies Views -->
   
       <br>Replies: 0
      - Views: 142 - <a href="showthread.php?t=18523545&page=">Last Page</a>
         
   
<br>
   
         <span>

            

            <img class="inlineimg" src="/images/rating/rating1.gif" border="0" alt="Votes: 1 Score: 1" />
         </span>
         

</div>
</div>
</td><td class="mobile_fun_right"></td></tr><tr><td class="mobile_fun_left_alt"></td><td class="mobile_fun_alt">
   <div class="mobile_threadbit">   
   <div class="mobile_threadbit_title">
         
         
          <a href="showthread.php?t=18523546" id="thread_title_18523546" style="font-weight:bold">We need another thing like the Albert Belle Candy Bar</a>
      </div>

   
      <div class="mobile_threadbit_details">
      Today 06:24 PM - 
           
            <a href="member.php?u=5614" rel="nofollow">lofton4ever</a>
           


<!-- Replies Views -->
   
       <br>Replies: 3
      - Views: 292 - <a href="showthread.php?t=18523546&page=">Last Page</a>
         
   
<br>
   
         <span>
            

            <img class="inlineimg" src="/images/rating/rating-2.gif" border="0" alt="Votes: 2 Score: -2" />

         </span>
         

</div>
</div>
</td>
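If a single pattern is preferred, the lazy .*? can be replaced by a tempered token, (?:(?!</td>).)*?. This token refuses to step past the </td> that closes each thread. This is a minimal sketch, assuming every thread's markup ends with </td> as in the sample above. scoredThreads is a hypothetical name.

```php
<?php
// Sketch: keep one regex but stop the lazy match at the end of the current
// thread. (?:(?!</td>).)*? consumes one character at a time, but never one
// where "</td>" starts, so a thread without a score simply fails to match.
// Assumption: each thread's markup ends with </td>, as in the sample HTML.
// scoredThreads() is a hypothetical helper name.
function scoredThreads(string $html): array
{
    $pattern = '%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>'
             . '(?:(?!</td>).)*?'
             . 'alt="Votes:\s+\d+\s+Score: (-?\d+)%si';
    preg_match_all($pattern, $html, $results, PREG_SET_ORDER);
    return $results; // each entry: [1] query string, [2] title, [3] score
}
```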


I think I figured it out! :)

 

I changed:

preg_match_all ( '%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>.*?<br>\s*?<span>\s*<img [^>]*? alt="Votes:\s+\d+\s+Score: ([\d-]+)%si', $html, $results, PREG_SET_ORDER );

 

to:

preg_match_all('%id="thread_title_([^"]*)"[^>]*>([^<]+)</a>.{350,1000}?alt="Votes:\s+\d+\s+Score: ([\d-]+)%si', $html, $results, PREG_SET_ORDER);

 

It seems to have fixed the URL: the pattern now captures the number from the id="thread_title_..." attribute instead of the href, so the extra "?s=...&t=" text no longer shows up. It also only returns threads where "Score:" appears between 350 and 1000 characters after the title link.

 

I stand on the shoulders of MadTechie. Thanks!
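Because the id="thread_title_..." pattern captures only the numeric ID, the link in the format requested at the top of the thread has to be rebuilt. This is a minimal sketch. It assumes $results comes from that pattern with PREG_SET_ORDER, where [1] is the ID, [2] is the title, and [3] is the score. formatThreads is a hypothetical name, and the sample rows are typed in by hand from the thread's data.

```php
<?php
// Sketch: print matches in the "title, url, score" format from the first post.
// Assumes each $result holds [1] = thread id, [2] = title, [3] = score, as
// captured by the id="thread_title_..." pattern. formatThreads() is hypothetical.
function formatThreads(array $results): string
{
    $lines = [];
    foreach ($results as $result) {
        // Rebuilding the link from the id keeps extra query parameters out.
        $lines[] = sprintf('%s, showthread.php?t=%s, %s', $result[2], $result[1], $result[3]);
    }
    return implode("\n", $lines);
}

echo formatThreads([
    [1 => '18523545', 2 => 'Cleveland Indians 2010 Odds', 3 => '1'],
    [1 => '18523546', 2 => 'We need another thing like the Albert Belle Candy Bar', 3 => '-2'],
]), "\n";
```

The .{350,1000}? window depends on how much markup sits between the title and the rating. A layout change could break it. A thread without a score could also pick up the next thread's score if that score falls inside the window.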


Okay, well, if you must use a regex, here's an ugly one:

 


// Walks from the title link to the rating image using character classes that
// cannot cross a tag boundary, so a thread with no rating <span> cannot match
preg_match_all('%<a href="showthread\.php([^"]*)"[^>]*>([^<]+)</a>\s*</div>\s*<div class="mobile_threadbit_details">\s*[^<]*[^<]*[^>]*>[^>]*>\s*<!-- Replies Views -->\s*<br>Replies: \d+\s+[^>]*>[^>]*>\s*<br>\s+<span>\s*<img [^>]*? alt="Votes:\s+\d+\s+Score: ([\d-]+)%sim', $html, $results, PREG_SET_ORDER);
foreach ($results as $result) {
    echo $result[1] . " -> ";
    echo $result[2] . " -> ";
    echo $result[3] . "<BR />\n";
}

 

?t=18523545 -> Cleveland Indians 2010 Odds -> 1

?t=18523546 -> We need another thing like the Albert Belle Candy Bar -> -2

 

Seems correct to me!

