Regex Help Required :)


5 replies to this topic

#1 Orio

Orio
  • Staff Alumni
  • Advanced Member
  • 2,491 posts

Posted 21 October 2006 - 10:07 AM

Hello :)

I am coding a little project for myself, but I ran into a problem when it came to regex.

Basically, what I want to do is filter the results I get back when I run a search using cURL.
So I have HTML stored in the variable $html, and it holds something that looks like this:
// A lot of HTML above

<table>
  <tr>
    <td class="box_content" align="center">186,996</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1384922">User1</a></td>
    <td class="box_content" align="center">268,655,655</td>
    <td class="box_content" align="center">660</td>
    <td class="box_content" align="center">12</td>
    <td class="box_content" align="center"><img src="/images/images/2.gif"></td>
  </tr>
  <tr>
    <td class="box_content" align="center">186,997</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1183963">User2</a></td>
    <td class="box_content" align="center">778,138</td>
    <td class="box_content" align="center">163</td>
    <td class="box_content" align="center">12</td>
    <td class="box_content" align="center"><img src="/images/images/3.gif"></td>
  </tr>

//////////More and more of these table rows....

  <tr>
    <td class="box_content" align="center">187,000</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1172426">User50</a></td>
    <td class="box_content" align="center">364,387,830</td>
    <td class="box_content" align="center">6,200</td>
    <td class="box_content" align="center">12</td>
    <td class="box_content" align="center"><img src="/images/images/4.gif"></td>
  </tr>
</table>

//More HTML below

As you can see, every page holds 50 results. Moving to the next page is no problem; the problem is the filtering itself. I want to echo only the users whose third column holds a value of 200 million or greater. In this HTML example, that means User1 and User50.

So what I basically thought of doing is to break up the code using explode("<tr>", $html) and then (using a regular expression and preg_match_all, I suppose) get all the info I need. It sits between the <td> tags, and I don't mind echoing the link to the username too. Then I'll "clean" the numbers of commas etc. and check whether the number I want is at least 200 million. If it is, print the user.
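The plan described above (split on <tr>, pull out the <td> contents, strip the commas, compare) could be sketched roughly like this. The function name usersAboveThreshold and the cut-down sample in $html are made up for illustration, and the sketch assumes every data row has the username in its second cell and the number in its third, as in the sample above:

```php
<?php
// Hypothetical helper (not from the thread): returns the usernames whose
// third column is at least $min.
function usersAboveThreshold(string $html, int $min): array
{
    $users = [];
    // Split on <tr>; the first chunk is whatever came before the first row.
    foreach (explode('<tr>', $html) as $row) {
        // Capture the contents of every <td>...</td> pair in this chunk.
        if (preg_match_all('/<td[^>]*>(.*?)<\/td>/is', $row, $cells) < 3) {
            continue; // fewer than three cells, so not a data row
        }
        $name  = trim(strip_tags($cells[1][1]));           // 2nd cell: the profile link
        $value = (int) str_replace(',', '', $cells[1][2]); // 3rd cell: "268,655,655" -> 268655655
        if ($value >= $min) {
            $users[] = $name;
        }
    }
    return $users;
}

// Cut-down sample: two rows taken from the HTML above.
$html = <<<'HTML'
<table>
  <tr>
    <td class="box_content" align="center">186,996</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1384922">User1</a></td>
    <td class="box_content" align="center">268,655,655</td>
  </tr>
  <tr>
    <td class="box_content" align="center">186,997</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1183963">User2</a></td>
    <td class="box_content" align="center">778,138</td>
  </tr>
</table>
HTML;

echo implode(', ', usersAboveThreshold($html, 200000000)), "\n"; // prints "User1"
```

Working row by row means a value can never be paired with a username from a different row, at the cost of one regex call per row.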

So, I'd really appreciate it if someone could help me get the information between the <td> tags. It has really been a struggle for me, and I think Google is going to ban me soon for searching too much :D


Thanks a lot :)
Orio.
Think you're smarty?

(Gone until 20 to November)

#2 c4onastick

c4onastick
  • Members
  • Advanced Member
  • 216 posts

Posted 21 October 2006 - 03:54 PM

I'm certainly not a master at regex, but I created a similar project. I think you might be shooting yourself in the foot with the explode command. Sure, you can do it that way, but it seems to make a bigger mess. I'd do one "preg_match_all" call with subexpressions. That'll dump all the information into an array that you can step through to filter out the users who don't hit 200 million.

preg_match_all('/<tr>\s*<td class="box_content" align="center">[0-9,]+<\/td>\s*<td class="box_content"><a href="\/viewprofile\.php\?session=&id=[0-9]+">([a-zA-Z0-9]+)<\/a><\/td>\s*<td class="box_content" align="center">([0-9,]{11})<\/td>/i', $html, $result);

I tested it at http://regexlib.com/RETester.aspx, and it works for the example you posted. And (I'm a little proud of this part) because the {11} only accepts an 11-character number such as 268,655,655, it'll only grab people with at least 100 million, so you get half of your filtering done right there with one function!
Regex Tester::Unicode Regex::PHP Function List::MySQL 5.1
"Sorry sweetheart... but this all day sucker is down to the soggy white stick." -- Topper Harley

#3 Orio

Orio
  • Staff Alumni
  • Advanced Member
  • 2,491 posts

Posted 21 October 2006 - 04:01 PM

Looks good :D Thanks!
I haven't tested it yet, because I haven't finished the whole script, but can you tell me what $result will look like so I can use it properly?

Orio.

#4 c4onastick

c4onastick
  • Members
  • Advanced Member
  • 216 posts

Posted 21 October 2006 - 04:52 PM

$result will be a multidimensional array looking something like this:
Array (
    Array (Each occurrence of the full pattern matched, for you <tr>blah blah User1 blah blah 200 million</td>),
    Array (Each occurrence of the first subpattern, here User1),
    Array (Each occurrence of the second subpattern, here 200 million)
      )
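To see that layout concretely, here is a small sketch that runs preg_match_all on a made-up, simplified string (the <b>/<i> markup and the pattern are stand-ins, not the real page). It also shows the PREG_SET_ORDER flag, which groups the captures per match instead of per subpattern and can be easier to loop over:

```php
<?php
// Simplified stand-in input and pattern, chosen only to show the array shapes.
$html    = '<b>User1</b><i>268,655,655</i><b>User50</b><i>364,387,830</i>';
$pattern = '/<b>(\w+)<\/b><i>([0-9,]+)<\/i>/';

// Default order (PREG_PATTERN_ORDER): one sub-array per subpattern.
preg_match_all($pattern, $html, $result);
// $result[0] holds the full matches,
// $result[1] is ['User1', 'User50'],
// $result[2] is ['268,655,655', '364,387,830'].

// PREG_SET_ORDER: one sub-array per match.
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);
// $sets[0] is ['<b>User1</b><i>268,655,655</i>', 'User1', '268,655,655'].

foreach ($sets as $match) {
    echo $match[1], ' => ', $match[2], "\n";
}
```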

I'd step through them with something like:
for($i=0; $i<count($result[0]); $i++)
{
   $user = $result[1][$i];
   $number = $result[2][$i];
   //More code
}
I'm not sure where the 200 million number is going to go, but that preg_match_all will pull it out with the commas in it, so you'll have to remove those if you need to do math on it.
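One way the comma removal and the 200 million check might look, as a sketch only: the $result array below is filled in by hand to stand in for what preg_match_all returns, and "User7" with 150,000,000 is an invented entry added to show a nine-digit value that still falls below the cutoff.

```php
<?php
// Hand-built stand-in for the preg_match_all output.
// User1 and User50 come from the sample HTML; User7 is invented.
$result = [
    [], // full matches, not needed here
    ['User1', 'User7', 'User50'],
    ['268,655,655', '150,000,000', '364,387,830'],
];

$threshold = 200000000;
$matching  = [];

for ($i = 0; $i < count($result[1]); $i++) {
    // "268,655,655" -> 268655655, so it can be compared as a number.
    $number = (int) str_replace(',', '', $result[2][$i]);
    if ($number >= $threshold) {
        $matching[] = $result[1][$i];
    }
}

echo implode(', ', $matching), "\n"; // prints "User1, User50"
```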


#5 Orio

Orio
  • Staff Alumni
  • Advanced Member
  • 2,491 posts

Posted 21 October 2006 - 07:41 PM

I got everything working perfectly :D
Thanks a ton!!!

Orio.

#6 c4onastick

c4onastick
  • Members
  • Advanced Member
  • 216 posts

Posted 21 October 2006 - 09:52 PM

Glad to help!



