
Regex Help Required :)


Orio


Hello :)

I am coding a little project for myself, but I ran into a problem when it came to regex.

Basically, what I want to do is filter the results I get when I search using cURL.
So I have HTML stored in the variable $html, and it holds something that looks like this:
[code]// A lot of HTML above

<table>
  <tr>
    <td class="box_content" align="center">186,996</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1384922">User1</a></td>
    <td class="box_content" align="center">268,655,655</td>
    <td class="box_content" align="center">660</td>
    <td class="box_content" align="center">12</td>
    <td class="box_content" align="center"><img src="/images/images/2.gif"></td>
  </tr>
  <tr>
    <td class="box_content" align="center">186,997</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1183963">User2</a></td>
    <td class="box_content" align="center">778,138</td>
    <td class="box_content" align="center">163</td>
    <td class="box_content" align="center">12</td>
    <td class="box_content" align="center"><img src="/images/images/3.gif"></td>
  </tr>

//////////More and more of these table rows....

  <tr>
    <td class="box_content" align="center">187,000</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1172426">User50</a></td>
    <td class="box_content" align="center">364,387,830</td>
    <td class="box_content" align="center">6,200</td>
    <td class="box_content" align="center">12</td>
    <td class="box_content" align="center"><img src="/images/images/4.gif"></td>
  </tr>
</table>

//More HTML below[/code]

As you can see, every page holds 50 results. Moving to the next page is no problem; the problem is the filtering itself. I want to echo only the users who have a value of 200 million or greater in their third column. In this HTML example, that would be User1 and User50.

So what I basically thought of doing is to split the HTML using explode("<tr>", $html) and then (using a regular expression and preg_match_all, I suppose) get all the info I need (the text between the <td> tags; I don't care about echoing the link to the username too). Then I'll "clean" the commas etc. out of the numbers and check whether the number I want is at least 200 million. If it is, print the user.

So, I'd really appreciate it if someone could help me get the information between the <td> tags. It has really been a struggle for me, and I think Google is going to ban me soon for searching too much :D


Thanks a lot :)
Orio.

I'm certainly not a master of regex, but I created a similar project. I think you might be shooting yourself in the foot with explode. Sure, you can do it that way, but it seems to make a bigger mess. I'd do one preg_match_all call with sub-expressions. That'll dump all the information into an array that you can step through to filter out the users that don't hit 200 million.

[code]preg_match_all('/<tr>\s*<td class="box_content" align="center">[0-9,]+<\/td>\s*<td class="box_content"><a href="\/viewprofile\.php\?session=&id=[0-9]+">([a-zA-Z0-9]+)<\/a><\/td>\s*<td class="box_content" align="center">([0-9,]{11,})<\/td>/i', $html, $result);[/code]

I tested it at [url=http://regexlib.com/RETester.aspx]http://regexlib.com/RETester.aspx[/url], and it works for the example you posted. And (I'm a little proud of this part) it'll only grab people with at least 100 million, so you get half of your filtering done right there with one function!
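
The "at least 100 million" pre-filter works because of string length: with comma separators, a whole number is 11 or more characters long only when it has at least nine digits. Here's a minimal sketch of that idea, assuming the column always holds well-formed, comma-grouped integers. `looksLikeAtLeast100Million()` is a hypothetical helper made up for this illustration, not something from the code above.

```php
<?php
// Hypothetical helper: true when a comma-formatted integer has at least
// 11 characters, i.e. at least 9 digits (100,000,000 or more).
// Assumes well-formed input such as "268,655,655".
function looksLikeAtLeast100Million(string $formatted): bool
{
    return preg_match('/^[0-9,]{11,}$/', $formatted) === 1;
}

$samples = [778138, 99999999, 100000000, 268655655, 999999999, 1000000000];
foreach ($samples as $n) {
    $formatted = number_format($n); // e.g. 268655655 becomes "268,655,655"
    printf(
        "%15s  length %2d  %s\n",
        $formatted,
        strlen($formatted),
        looksLikeAtLeast100Million($formatted) ? 'passes' : 'filtered out'
    );
}
```

Keep in mind this is only a coarse pre-filter: 150,000,000 passes too, so the 200 million check still needs a numeric comparison once the commas are removed.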

$result will be a multidimensional array looking something like this:
[code]Array (
    Array (Each occurrence of the full pattern matched, for you <tr>blah blah User1 blah blah 200 million</td>),
    Array (Each occurrence of the first sub-pattern, here User1),
    Array (Each occurrence of the second sub-pattern, here 200 million)
      )[/code]
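
To see that layout for real, here is a small sketch using a made-up input string and a simplified regex with two sub-patterns (both are stand-ins, not the actual page or pattern). It also shows the `PREG_SET_ORDER` flag, which groups the results per match instead of per sub-pattern, in case that turns out to be easier to loop over.

```php
<?php
// Stand-in input and pattern, for illustration only.
$text = 'User1=268,655,655; User50=364,387,830';
$pattern = '/([A-Za-z0-9]+)=([0-9,]+)/';

// Default ordering (PREG_PATTERN_ORDER): one array per sub-pattern.
preg_match_all($pattern, $text, $result);
print_r($result);
// $result[0] holds the full matches,
// $result[1] the first sub-pattern (names),
// $result[2] the second sub-pattern (numbers).

// PREG_SET_ORDER: one array per match, each with its own sub-patterns.
preg_match_all($pattern, $text, $rows, PREG_SET_ORDER);
foreach ($rows as $row) {
    echo $row[1], ' has ', $row[2], "\n";
}
```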

I'd step through them with something like:
[code]for ($i = 0; $i < count($result[0]); $i++)
{
  $user = $result[1][$i];
  $number = $result[2][$i]; // still has the commas in it, e.g. "268,655,655"
  //More code
}[/code]
I'm not sure where the 200 million number is going to go, but preg_match_all will pull it out with the commas still in it, so you'll have to remove those if you need to do math on it.
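
Putting the pieces together, a rough end-to-end sketch might look like the following. It assumes the rows look exactly like the sample in the question (trimmed here to three columns), uses a simplified pattern that captures every row's username and third column, and introduces `$threshold` and `$bigUsers` purely for illustration.

```php
<?php
// Sample rows adapted from the question; the real $html would come from cURL.
$html = <<<'HTML'
<table>
  <tr>
    <td class="box_content" align="center">186,996</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1384922">User1</a></td>
    <td class="box_content" align="center">268,655,655</td>
  </tr>
  <tr>
    <td class="box_content" align="center">186,997</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1183963">User2</a></td>
    <td class="box_content" align="center">778,138</td>
  </tr>
  <tr>
    <td class="box_content" align="center">187,000</td>
    <td class="box_content"><a href="/viewprofile.php?session=&id=1172426">User50</a></td>
    <td class="box_content" align="center">364,387,830</td>
  </tr>
</table>
HTML;

// Simplified pattern (an assumption, not the one posted above):
// capture 1 = username, capture 2 = third column. Only whitespace is
// allowed between cells, so a match can never spill into the next row.
$pattern = '#<tr>\s*<td[^>]*>[0-9,]+</td>\s*'
         . '<td[^>]*><a href="[^"]*">([^<]+)</a></td>\s*'
         . '<td[^>]*>([0-9,]+)</td>#i';
preg_match_all($pattern, $html, $result);

$threshold = 200000000; // 200 million
$bigUsers = [];

for ($i = 0; $i < count($result[0]); $i++) {
    $user = $result[1][$i];
    // Strip the commas before treating the value as a number.
    $number = (int) str_replace(',', '', $result[2][$i]);
    if ($number >= $threshold) {
        $bigUsers[] = $user;
        echo $user, ' (', number_format($number), ")\n";
    }
}
```

On the sample rows this keeps User1 and User50 and skips User2. If the real page has extra markup between the cells, the pattern would need loosening to match.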
