
Not getting all of Needle out of Haystack


coder007


Hey!

 

This is my RegEx (PCRE - preg):

 

preg_match("/<td><center>([0-9,.]+)<\/td><\/center><td><center>([0-9,.]+)<\/td><\/center><td><center>([0-9,.]+)<\/td><\/center><td><center>([0-9])<\/td><\/center>/", $line, $matches);

 

This is the haystack:

 

<td><center>5/30/2009 3:07:08 AM</td></center><td><center>66<br>Days</td></center><td><center><img border="0" src="images/teams/team_Orange.gif" width="14" height="13" title="Team: Orange"> </td></center><td><center>125,633.273</td></center><td><center>15,604.47</td></center><td><center>9,627.72</td></center><td><center><img src='images/war.gif' border=0 title='War is an option'> </td></center></tr><tr bgcolor='#ffffff'> 

 

The values I am trying to get are the date (5/30/2009 3:07:08 AM) and the three numbers that follow it: 125,633.273, 15,604.47, and 9,627.72.

 

Currently I am getting 0's for all but the date.

 

Thanks in advance!

 

Link to comment
Share on other sites

When dot is used literally it needs to be escaped with a backslash, i.e:

 

([0-9,\.]+)

 

Not within a character class it doesn't. Almost all meta characters lose their special meaning within the class (the dot included).
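A quick way to check this for yourself is the sketch below. The sample strings are taken from the haystack, except '12a45', which is a made-up value added to show that the dot is not acting as a wildcard inside the class.

```php
<?php
// Inside [...] the dot is literal, so both classes accept the same characters.
$unescaped = '/^[0-9,.]+$/';
$escaped   = '/^[0-9,\.]+$/';

// '12a45' is a hypothetical value, not from the original haystack.
$samples = ['125,633.273', '15,604.47', '12a45'];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = [
        preg_match($unescaped, $sample),
        preg_match($escaped, $sample),
    ];
}

// If the unescaped dot were a wildcard, the 'a' in '12a45' would be accepted.
foreach ($results as $sample => $pair) {
    echo $sample, ': unescaped=', $pair[0], ' escaped=', $pair[1], "\n";
}
```

Both patterns should agree on every sample, which is why escaping the dot changes nothing here.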

Link to comment
Share on other sites

Note to OP... that chunk of code has improper nesting (which could make things a little trickier to navigate):

 

<td><center>125,633.273</td></center>

should be

<td><center>125,633.273</center></td>

 

So perhaps a quick and dirty way could be:

$str = <<<HTML
<td><center>5/30/2009 3:07:08 AM</td></center><td><center>66<br>Days</td></center><td><center><img border="0" src="images/teams/team_Orange.gif" width="14" height="13" title="Team: Orange"> </td></center><td><center>125,633.273</td></center><td><center>15,604.47</td></center><td><center>9,627.72</td></center><td><center><img src='images/war.gif' border=0 title='War is an option'> </td></center></tr><tr bgcolor='#ffffff'>
HTML;

preg_match('#((?:\d{1,2}/){2}\d{4}[^<]+).+?([\d,]+\.\d+).+?([\d,]+\.\d+)#', $str, $match);
echo $match[1] . "<br />\n" . $match[2] . "<br />\n" . $match[3];

 

EDIT - The way I grab those two sets of numbers relies on a decimal point being present. Again, this is a quick, no-fuss approach. Those tags are not nested correctly in some spots, which makes me question the consistency of the markup you are checking...
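If some values might arrive without a decimal part, one possible alternative is to pull out every cell first and then keep only the ones that look numeric. This is only a sketch: the row is a shortened copy of the posted haystack, the value 15,604 is a hypothetical figure with no decimal, and it assumes every cell uses the same mis-nested </td></center> closing as the sample.

```php
<?php
// Shortened sample row; same mis-nested </td></center> markup as the posted haystack.
$row = "<td><center>5/30/2009 3:07:08 AM</td></center>"
     . "<td><center>66<br>Days</td></center>"
     . "<td><center>125,633.273</td></center>"
     . "<td><center>15,604</td></center>"   // hypothetical value with no decimal part
     . "<td><center>9,627.72</td></center>"
     . "<td><center><img src='images/war.gif' border=0 title='War is an option'> </td></center>";

// Capture the raw contents of every cell.
preg_match_all('#<td><center>(.*?)</td></center>#s', $row, $cells);
$contents = array_map('trim', $cells[1]);

// The date is assumed to be the first cell of the row.
$date = $contents[0];

// Keep only cells made up entirely of digits, commas and dots.
$numbers = array_values(preg_grep('/^[\d,.]+$/', $contents));

echo $date, "\n", implode("\n", $numbers), "\n";
```

The trade-off is that this depends on the cell wrapper staying consistent, while the pattern above depends on the decimal point.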

Link to comment
Share on other sites

Well, what a shocker! Didn't know that...

 

Perhaps this will work better:

 

preg_match('#<td><center>([^<]+)</td></center><td><center>66<br>Days</td></center><td><center><img border="0" src="images/teams/team_Orange.gif" width="14" height="13" title="Team: Orange"> </td></center><td><center>([^<]+)</td></center><td><center>([^<]+)</td></center><td><center>([^<]+)</td>#', $line, $matches);

 

EDIT: Probs best off going with nrg_alpha's!

Link to comment
Share on other sites

I suppose I could have included, say, <td><center> at the start of my pattern to help ensure the date is matched in the right place (in the event there are other dates on the page). It would have been nice if those td tags had ids or classes to tell them apart, though.
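As a rough sketch of that idea, here is the earlier pattern with <td><center> prepended. The "Updated 1/02/2009 by admin" text is an invented stray date, added only to show what the anchor guards against. The rest of the string is a shortened copy of the posted haystack.

```php
<?php
$str = 'Updated 1/02/2009 by admin'   // hypothetical stray date outside any cell
     . '<td><center>5/30/2009 3:07:08 AM</td></center>'
     . '<td><center>125,633.273</td></center>'
     . '<td><center>15,604.47</td></center>';

// Earlier pattern with <td><center> prepended, so the date must start a cell.
$pattern = '#<td><center>((?:\d{1,2}/){2}\d{4}[^<]+).+?([\d,]+\.\d+).+?([\d,]+\.\d+)#';

if (preg_match($pattern, $str, $match)) {
    echo $match[1], "\n", $match[2], "\n", $match[3], "\n";
}
```

Without the prefix, the date group would latch onto the stray date first, since it appears earlier in the string.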

Link to comment
Share on other sites

Perhaps cut and paste a small portion of the code (containing one or two examples) if it differs from what you initially posted. I used the line you posted as a test and it worked, so I'm thinking there might be some variations in the code that the pattern isn't taking into account.

Link to comment
Share on other sites
