Orio

Regex Help Required :)

Hello :)

I am coding a little project for myself, but I ran into a problem when it came to regex.

Basically, what I want to do is filter the results I get when I search using cURL.
I have the HTML stored in the variable $html, and it looks something like this:
[code]// A lot of HTML above

<table>
  <tr>
    <td class="box_content" align="center">186,996</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1384922">User1</a></td>
    <td class="box_content" align="center">268,655,655</td>
    <td class="box_content" align="center">660</td>
    <td class="box_content" align="center">12</td>
    <td class="box_content" align="center"><img src="/images/images/2.gif"></td>
  </tr>
  <tr>
    <td class="box_content" align="center">186,997</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1183963">User2</a></td>
    <td class="box_content" align="center">778,138</td>
    <td class="box_content" align="center">163</td>
    <td class="box_content" align="center">12</td>
    <td class="box_content" align="center"><img src="/images/images/3.gif"></td>
  </tr>

//////////More and more of these table rows....

  <tr>
    <td class="box_content" align="center">187,000</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1172426">User50</a></td>
    <td class="box_content" align="center">364,387,830</td>
    <td class="box_content" align="center">6,200</td>
    <td class="box_content" align="center">12</td>
    <td class="box_content" align="center"><img src="/images/images/4.gif"></td>
  </tr>
</table>

//More HTML below[/code]

As you can see, every page holds 50 results. Moving to the next page is no problem; the problem is the filtering itself. I want to echo only the users whose third column holds a value of 200 million or greater. In this HTML example, that would be User1 and User50.

So what I basically thought of doing is to break the HTML apart using explode("<tr>", $html) and then (using a regular expression and preg_match_all, I suppose) get all the info I need from between the <td> tags (I don't mind echoing the link along with the username). Then I'll "clean" the commas out of the numbers and check whether the number I want is at least 200 million. If it is, print the user.

So I'd really appreciate it if someone could help me get the information between the <td> tags. It has really been a struggle for me, and I think Google is going to ban me soon for searching too much :D


Thanks a lot :)
Orio.

I'm certainly not a master at regex, but I built a similar project. I think you might be shooting yourself in the foot with explode. Sure, you can do it that way, but it seems to make a bigger mess. I'd do a single preg_match_all call with subpatterns. That'll dump all the information into an array that you can step through to filter out the users who don't hit 200 million.

[code]// \s* between the cells keeps each match inside one row; {11,} accepts 100,000,000 and up
preg_match_all('/<tr>\s*<td class="box_content" align="center">[0-9,]+<\/td>\s*<td class="box_content"><a href="\/viewprofile\.php\?session=&id=[0-9]+">([a-zA-Z0-9]+)<\/a><\/td>\s*<td class="box_content" align="center">([0-9,]{11,})<\/td>/i', $html, $result);[/code]

I tested it at [url=http://regexlib.com/RETester.aspx]http://regexlib.com/RETester.aspx[/url], and it works for the example you posted. And (I'm a little proud of this part) it skips anyone under 100 million, so you get half of your filtering done right there with one function!
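
If you'd rather check it locally than on the tester, something along these lines should do. Treat it as a rough sketch: $html is just a cut-down copy of your sample (three rows, first three cells each), $cell is a helper name I made up, and the pattern uses \s* between the cells so a match can't run on into the next row.

```php
<?php
// Cut-down copy of the sample table: three rows, first three cells only.
$html = <<<'HTML'
<table>
  <tr>
    <td class="box_content" align="center">186,996</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1384922">User1</a></td>
    <td class="box_content" align="center">268,655,655</td>
  </tr>
  <tr>
    <td class="box_content" align="center">186,997</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1183963">User2</a></td>
    <td class="box_content" align="center">778,138</td>
  </tr>
  <tr>
    <td class="box_content" align="center">187,000</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1172426">User50</a></td>
    <td class="box_content" align="center">364,387,830</td>
  </tr>
</table>
HTML;

// Built in pieces for readability; \s* only allows whitespace between cells.
$cell    = '<td class="box_content" align="center">';
$pattern = '/<tr>\s*' . $cell . '[0-9,]+<\/td>\s*'
         . '<td class="box_content"><a href="\/viewprofile\.php\?session=&id=[0-9]+">'
         . '([a-zA-Z0-9]+)<\/a><\/td>\s*'
         . $cell . '([0-9,]{11,})<\/td>/i';

preg_match_all($pattern, $html, $result);

// Print each captured username next to its third-column number.
foreach ($result[1] as $i => $user) {
    echo $user, ' ', $result[2][$i], "\n";
}
```

On this sample it should list User1 and User50 and leave User2 out, because 778,138 is shorter than the 11 characters the last group asks for.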

Looks good :D Thanks!
I haven't tested it yet, because I haven't finished the whole script, but can you tell me what $result will look like so I can use it properly?

Orio.

$result will be a multidimensional array that looks something like this:
[code]Array (
    Array (each occurrence of the full pattern matched, for you <tr>blah blah User1 blah blah 200 million</td>),
    Array (each occurrence of the first subpattern, here User1),
    Array (each occurrence of the second subpattern, here 200 million)
)[/code]
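
To make that layout concrete, here is a toy run. The pattern and input string are made up purely for illustration and have nothing to do with your table. It also shows the PREG_SET_ORDER flag, which regroups the same data per match instead of per subpattern, in case you find that easier to loop over.

```php
<?php
// Toy pattern and input, invented only to show the array layout.
$subject = 'a=1 b=22';
$pattern = '/([a-z])=(\d+)/';

// Default layout (PREG_PATTERN_ORDER): one array per subpattern.
preg_match_all($pattern, $subject, $byGroup);
// $byGroup[0] holds the full matches:     'a=1', 'b=22'
// $byGroup[1] holds the first subpattern:  'a',   'b'
// $byGroup[2] holds the second subpattern: '1',   '22'

// PREG_SET_ORDER: one array per match instead.
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);
// $byMatch[0] is 'a=1',  'a', '1'
// $byMatch[1] is 'b=22', 'b', '22'

print_r($byGroup);
print_r($byMatch);
```

The rest of this post sticks with the default layout.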

I'd step through them with something like:
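[code]$count = count($result[0]); // number of rows matched
for ($i = 0; $i < $count; $i++)
{
  $user = $result[1][$i];   // first subpattern: the username
  $number = $result[2][$i]; // second subpattern: the number, commas included
  //More code
}[/code]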
I'm not sure where the 200 million number is going to go, but preg_match_all will pull it out with the commas still in it, so you'll have to strip those before doing any math on it.
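
For the comma stripping and the 200 million check, a minimal sketch could look like the following. The $result array here is hand-written stand-in data in the same layout preg_match_all fills in (User7 is an invented row), and $threshold and $keep are just names I picked.

```php
<?php
// Stand-in for the array preg_match_all() fills in (hypothetical data).
$result = [
    ['<tr>...</td>', '<tr>...</td>', '<tr>...</td>'], // full matches, not used below
    ['User1', 'User7', 'User50'],                     // first subpattern: usernames
    ['268,655,655', '150,000,000', '364,387,830'],    // second subpattern: numbers
];

$threshold = 200000000; // 200 million
$keep = [];

foreach ($result[1] as $i => $user) {
    // Strip the thousands separators, then convert to an integer.
    $number = (int) str_replace(',', '', $result[2][$i]);

    if ($number >= $threshold) {
        $keep[] = [$user, $number];
    }
}

// Print the users that passed, with the commas put back for display.
foreach ($keep as [$user, $number]) {
    echo $user, ': ', number_format($number), "\n";
}
```

str_replace and number_format are standard PHP functions; everything else here is an assumption about how you'd want to store the results.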

I got everything working perfectly :D
Thanks a ton!!!

Orio.
