Preg_match and results

Lautarox · April 27, 2009

I want to extract phrases from a text, something like:

"The bird was flying when the nest fell" -> extract "flying", which sits between "The bird was" and "when the nest fell"

How could I do it using preg_match?

Thanks!
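Here is a minimal sketch of what the question asks for. It assumes both surrounding phrases are fixed literal text, and the variable names are only for illustration. The lazy (.+?) captures as little as possible between the two phrases:

```php
<?php
// Hypothetical sample sentence based on the question.
$text = 'The bird was flying when the nest fell';

// Literal phrases on both sides; \s+ allows one or more spaces between words.
// (.+?) is lazy, so it stops at the first "when the nest fell" it can reach.
if (preg_match('#The bird was\s+(.+?)\s+when the nest fell#', $text, $match)) {
    echo $match[1], "\n"; // flying
}
```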

Mchl · April 27, 2009

What is the condition for extracting this phrase? Anything between "was" and "when"? Anything that ends in "ing"? Something else?
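This sketch shows why the condition matters. It applies two possible rules to a made-up variant of the sample sentence. The rules agree on the original sentence, but they give different results on this one:

```php
<?php
// Hypothetical variant of the sample sentence, chosen so the two rules disagree.
$text = 'The bird was singing and flying when the nest fell';

// Rule 1: everything between the words "was" and "when".
preg_match('#\bwas\s+(.+?)\s+when\b#', $text, $between);

// Rule 2: the first whole word that ends in "ing".
preg_match('#\b(\w+ing)\b#', $text, $ing);

echo $between[1], "\n"; // singing and flying
echo $ing[1], "\n";     // singing
```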

Lautarox · April 27, 2009

In fact, it's a long string: I'm using cURL to retrieve some info from a page.

<td width="6%" height="1" valign="top" nowrap >De:</td><td width="81%" height="1" valign="top" >" Here is the text " <<A href=\'compose.php?nomUsr"

What I want is whatever appears where "Here is the text" is.

Mchl · April 27, 2009

Ahhh... that's much better

// Capture the text between 'valign="top" >' and the "<<A href" that follows it
if (preg_match('/valign="top"\s>([^<]*)<<A href/', $string, $match)) {
    echo $match[1];
}

That should do it, I think (but I only learned this stuff myself yesterday).
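The position of the * relative to the capturing group makes a difference. Here is a quick check on a made-up fragment shaped like the HTML above (the $html value is only for illustration):

```php
<?php
// Hypothetical fragment shaped like the HTML in this thread.
$html = '<td valign="top" >Here is the text <<A href=\'compose.php\'>';

// Quantifier OUTSIDE the group: the group is repeated once per character,
// and only the LAST character it matched is kept in $outside[1].
preg_match('/valign="top"\s>([^<])*<<A href/', $html, $outside);

// Quantifier INSIDE the group: the whole run of non-"<" characters is kept.
preg_match('/valign="top"\s>([^<]*)<<A href/', $html, $inside);

echo '[', $outside[1], "]\n"; // [ ]
echo '[', $inside[1], "]\n";  // [Here is the text ]
```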

Lautarox · April 27, 2009

Well.. I did something like this and it didn't work

<?
preg_match("<td width=\"6%\" height=\"1\" valign=\"top\" nowrap >De:</td><td width=\"81%\" height=\"1\" valign=\"top\" >"\s>([^<])*" <<A href='compose.php?nomUsr", $content, $s_de);
?>

I'm using the whole string because there are lots of HTML tags around the content.

Mchl · April 27, 2009

The problem is that you will have to escape special regex characters in this string (like the ? and . in 'compose.php?', for example).
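Here is a short sketch of why escaping matters. It uses 'compose.php?nomUsr' from the thread and a made-up $subject string. Without escaping, the pattern accepts the wrong text. It also fails on the real URL, because the literal ? in the text can't match the optional p? in the pattern:

```php
<?php
// Unescaped, "." matches ANY character and "?" makes the preceding "p" optional.
$loose  = '#compose.php?nomUsr#';
// Escaped, "\." and "\?" match a literal dot and a literal question mark.
$strict = '#compose\.php\?nomUsr#';

$url     = "compose.php?nomUsr=glavezzari"; // text as it appears in the HTML
$subject = 'composeXphnomUsr';              // made-up text that is NOT the URL

var_dump(preg_match($loose, $subject));  // int(1): a false positive
var_dump(preg_match($loose, $url));      // int(0): the literal "?" breaks it
var_dump(preg_match($strict, $subject)); // int(0)
var_dump(preg_match($strict, $url));     // int(1)
```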

Lautarox · April 27, 2009

It's the only option I have; I need that unique part of the code. How could I escape those characters?

Mchl · April 27, 2009

With \

Here's a great tutorial:

http://www.regular-expressions.info/tutorial.html
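You don't have to add every backslash by hand. PHP's preg_quote() can escape the literal parts of a pattern for you. This is a minimal sketch that assumes # as the delimiter. The =(\w+) group that picks up the user name is a made-up addition:

```php
<?php
// preg_quote() backslash-escapes regex metacharacters in a literal string;
// passing the delimiter ('#') as the second argument escapes it too.
$literal = "compose.php?nomUsr";
$escaped = preg_quote($literal, '#');
echo $escaped, "\n"; // compose\.php\?nomUsr

// Build the full pattern from the escaped literal plus a capturing group.
$pattern = '#' . $escaped . '=(\w+)#';
if (preg_match($pattern, "<A href='compose.php?nomUsr=glavezzari'>", $m)) {
    echo $m[1], "\n"; // glavezzari
}
```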

Lautarox · April 27, 2009

I'll take a look at it, thanks!

nrg_alpha · April 27, 2009

Well.. I did something like this and it didn't work

<?
preg_match("<td width=\"6%\" height=\"1\" valign=\"top\" nowrap >De:</td><td width=\"81%\" height=\"1\" valign=\"top\" >"\s>([^<])*" <<A href='compose.php?nomUsr", $content, $s_de);
?>

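I just glanced at this, and at first sight I spot an immediate problem: a lack of delimiters. What is actually happening is that the regex engine sees the initial < in preg_match("<td... and treats it as the opening delimiter. Since < is a bracket-style delimiter, the pattern then ends at the matching >. Here that is the first > in the string (right after nowrap), not the end of the pattern you intended. Whatever follows it is read as modifiers, so preg_match fails with an "Unknown modifier" warning. (Also note that the unescaped " just before \s> ends the PHP string early, which is a parse error on its own.) You can read about delimiters here.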
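Here is a quick way to see the delimiter behaviour described above. It uses tiny made-up subjects, and the calls only show how PHP reads the pattern string:

```php
<?php
// "<" and ">" are valid bracket-style delimiters in PHP's preg functions,
// so in "<td>" the angle brackets are consumed and the pattern is just "td".
var_dump(preg_match('<td>', 'xtdx'));   // int(1): matches plain "td"
var_dump(preg_match('<td>', '<tr>'));   // int(0)

// With explicit "#" delimiters the angle brackets are part of the pattern.
var_dump(preg_match('#<td>#', 'xtdx')); // int(0)
var_dump(preg_match('#<td>#', '<td>')); // int(1)
```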

In addition to the link Mchl provided, you can also read about regex in these:

webtoolscollection

Mastering Regular Expressions book

PHPFreaks resources

PHPFreaks tutorial

More than enough stuff to get you started

Lautarox · April 27, 2009

I've changed <td to /<\td, the width to width=\"81\/%\ and the href to <A href='compose\/.php\/?nomUsr. Is that OK? How could I create a clear delimiter?

nrg_alpha · April 27, 2009

I haven't had a close look at your pattern, but surrounding the entire pattern with proper delimiters will solve that issue. A delimiter can be any non-alphanumeric, non-whitespace character other than a backslash, so /.../, !...!, #...# or ~...~, for example, will all work.

So if you use /.../ as your delimiters, any / characters within the pattern must be escaped as \/. As a rule, I tend to avoid / as a delimiter for this very reason. I personally prefer #...#, because the odds of running into # within the pattern are slim to none, so I don't have to worry about much escaping.
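Here is a small comparison using a one-line cell taken from the HTML later in this thread. Both patterns capture the same thing, but only the / version needs \/:

```php
<?php
$html = '<td height="1" valign="top">class 1</td>';

// With "/" delimiters, every "/" inside the pattern (as in "</td>") needs escaping.
preg_match('/<td height="1" valign="top">(.*?)<\/td>/', $html, $slash);

// With "#" delimiters the same pattern needs no extra escaping.
preg_match('#<td height="1" valign="top">(.*?)</td>#', $html, $hash);

echo $slash[1], "\n"; // class 1
echo $hash[1], "\n";  // class 1
```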

Also note that you surrounded your entire pattern with double quotes: preg_match("........"). This means (as you seem to have noticed) that you must also escape double quotes inside your pattern. I would recommend getting into the habit of putting single quotes around the pattern, so you don't have to escape the double quotes within it. This is similar to using something other than / as a delimiter to avoid escaping those. There is no harm in not doing these things, mind you: as long as the right characters are escaped, the pattern will work. But to make things easier, why not use delimiters and quotes that need little or no escaping?
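Here is a quick check that the two quoting styles produce the same pattern string (the pattern itself is just an example). It also shows that double quotes make PHP process escapes like \t before PCRE ever sees them:

```php
<?php
// In a double-quoted PHP string every inner " has to be written as \" ...
$double = "#valign=\"top\">(.*?)</td>#";
// ... while a single-quoted string can contain " as-is.
$single = '#valign="top">(.*?)</td>#';

var_dump($double === $single); // bool(true): the patterns are identical

// Double quotes also make PHP handle escapes first: "\t" is a single tab
// character, whereas '\t' stays a backslash followed by "t".
var_dump(strlen("\t"), strlen('\t')); // int(1), then int(2)
```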

If you want to learn preg (PCRE), it is best to have a look at the links Mchl and I provided. It's an extremely valuable toolset to understand and have at your disposal.

Lautarox · April 28, 2009

I have this:

             <td height="1" valign="top" nowrap>Recibido el:</td>
             <td height="1" valign="top">15 Mar 2005 21:14</td>
             </tr>
<!-- INICIO DESTINATARIOS -->
           <tr>
             <td height="1" valign="top" nowrap>Para:</td>

             <td height="1" valign="top">Lavezzari Georgina <<A href='compose.php?nomUsr=glavezzari'>glavezzari</A>></td>
             </tr>
	   
           <tr>
             <td height="1" valign="top" nowrap>Asunto:</td>
             <td height="1" valign="top">class 1</td>

And using preg_match('#<td height="1" valign="top">(.*)</td>#', $content, $s_asunto); only prints:

Array
(
    [0] => <td height="1" valign="top">15 Mar 2005 21:14</td>
    [1] => 15 Mar 2005 21:14
)

Why does it only find one of them?

nrg_alpha · April 28, 2009

To find multiple instances of something, use preg_match_all instead of preg_match (which, as you just discovered, stops after the first successful match).
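Here is a minimal sketch of the difference, using a trimmed copy of the HTML above. preg_match_all returns the number of matches and fills $all[1] with every group-1 capture:

```php
<?php
// Trimmed copy of the HTML from this thread.
$content = <<<HTML
<td height="1" valign="top">15 Mar 2005 21:14</td>
<td height="1" valign="top">class 1</td>
HTML;

$pattern = '#<td height="1" valign="top">(.*)</td>#';

// preg_match stops after the first match ...
preg_match($pattern, $content, $one);

// ... while preg_match_all keeps going; $all[1] holds every group-1 capture.
$count = preg_match_all($pattern, $content, $all);

echo $one[1], "\n";                 // 15 Mar 2005 21:14
echo $count, "\n";                  // 2
echo implode(' | ', $all[1]), "\n"; // 15 Mar 2005 21:14 | class 1
```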

Lautarox · April 28, 2009

LOL, thanks. About the | metacharacter: is it OK to use it like this? preg_match_all('#<td height="1" valign="top">(.*)</td>|<<A href=#', $content, $s_asunto);

I'm trying to get the content in the middle, matching it if either </td> or <<A href= comes at the end. Regex is confusing me a little.

nrg_alpha · April 28, 2009

Just some posting advice: please use [code] or [php] tags when posting your code (instead of lumping your code in with your sentences), like this:

preg_match_all('#<td height="1" valign="top">(.*)</td>|<<A href=#', $content, $s_asunto);

Which colour codes everything for easy readability:

preg_match_all('#<td height="1" valign="top">(.*)</td>|<<A href=#', $content, $s_asunto);

The code tags will not colour code anything.

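To answer your question: no, | will not work as is. At the top level, | splits the entire pattern into two alternatives (everything before it OR everything after it). So your pattern means either <td height="1" valign="top">(.*)</td> or just <<A href=. To restrict the alternation to part of the pattern, wrap it in parentheses: (...|...). Since we don't want to capture in this alternation, we use the non-capturing form (?:...|...).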

So perhaps something like this is what you are looking for?

$content = <<<HTML
             <td height="1" valign="top" nowrap>Recibido el:</td>
             <td height="1" valign="top">15 Mar 2005 21:14</td>
             </tr>
<!-- INICIO DESTINATARIOS -->
           <tr>
             <td height="1" valign="top" nowrap>Para:</td>

             <td height="1" valign="top">Lavezzari Georgina <<A href='compose.php?nomUsr=glavezzari'>glavezzari</A>></td>
             </tr>
	   
           <tr>
             <td height="1" valign="top" nowrap>Asunto:</td>
             <td height="1" valign="top">class 1</td>
HTML;

preg_match_all('#<td height="1" valign="top">([^>]+)(?:</td>| <<A href=.+?</td>)#', $content, $s_asunto);
foreach($s_asunto[1] as $val){
echo $val . "<br />\n";
}

Output:

15 Mar 2005 21:14
Lavezzari Georgina
class 1
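You can also check the difference between a bare | and a grouped one in isolation. Here is a sketch with made-up subjects:

```php
<?php
// A bare "|" splits the WHOLE pattern in two: "^cat" OR "dog$".
var_dump(preg_match('#^cat|dog$#', 'catfish'));     // int(1): "^cat" alone matches

// Parentheses limit the alternation; (?:...) groups without capturing.
var_dump(preg_match('#^(?:cat|dog)$#', 'catfish')); // int(0)
var_dump(preg_match('#^(?:cat|dog)$#', 'dog'));     // int(1)

// A non-capturing group adds no entry to the match array.
preg_match('#(\w+) (?:cat|dog)#', 'big dog', $m);
var_dump(count($m)); // int(2): the full match plus group 1
```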

Lautarox · April 28, 2009

Yeah, thanks for your time. I'm going to take a closer look at the tutorials; using () confuses me a lot.
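Since () is the confusing part, here is a small sketch of how capture groups are numbered. It also shows how (?:...) groups without taking a number. It uses the date from the earlier HTML:

```php
<?php
$date = '15 Mar 2005 21:14';

// Every (...) captures; groups are numbered left to right by their "(".
preg_match('#(\d+) (\w+) (\d{4})#', $date, $m);
echo $m[1], '|', $m[2], '|', $m[3], "\n"; // 15|Mar|2005

// (?:...) groups without capturing, so the year moves down to $m2[2].
preg_match('#(\d+) (?:\w+) (\d{4})#', $date, $m2);
echo $m2[1], '|', $m2[2], "\n"; // 15|2005
```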

Lautarox · April 29, 2009

I've been reading some tutorials, but I can't figure out how to get the content from here:

preg_match('#<td height="99%" colspan="3" class="mensajeBody">(.+?)</td>#', $content, $s_body);

From:

             <td height="99%" colspan="3" class="mensajeBody"><P>Los que no presenten el TP de la semana 4 en el día de la fecha antes de las 22 hs. quedarán con el primer informe desaprobado.</P>
<P>Saludos</P> 
               </td>

The PHPFreaks tutorial says that ([^<]+) and (.+?) would get anything inside.
And what if I don't want to match the <...> tags and what's inside them? Would something like ([^<.*>]+) be OK?

nrg_alpha · April 29, 2009

I've been reading some tutorials, but I can't figure out how to get the content from here:
preg_match('#<td height="99%" colspan="3" class="mensajeBody">(.+?)</td>#', $content, $s_body);
From:
             <td height="99%" colspan="3" class="mensajeBody"><P>Los que no presenten el TP de la semana 4 en el día de la fecha antes de las 22 hs. quedarán con el primer informe desaprobado.</P>
<P>Saludos</P> 
               </td>

The PHPFreaks tutorial says that ([^<]+) and (.+?) would get anything inside.

The dot in (.+?) is a dot_match_all, which matches any character other than a newline. Your sample has newlines in it, so the dot can't match all the way to </td>. The answer here is to add the s modifier after the closing delimiter (the s after the final #):

preg_match('#<td height="99%" colspan="3" class="mensajeBody">(.+?)</td>#s', $content, $s_body);

Negated character classes like ([^<]+) match a newline like any other character (a newline is not a <), so newlines don't affect them. But if you want a dot_match_all to include newlines, you need the s modifier. You can read about modifiers here.
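Here is a sketch showing both points on a made-up multi-line cell modelled on the one above (the class name and text are shortened for illustration):

```php
<?php
// Hypothetical multi-line cell similar to the one in this thread.
$content = "<td class=\"mensajeBody\"><P>Line one</P>\n<P>Saludos</P>\n</td>";

// Without "s", "." does not match "\n", so (.+?) cannot reach </td>.
$without = preg_match('#<td class="mensajeBody">(.+?)</td>#', $content, $m1);

// With "s", "." also matches newlines.
$with = preg_match('#<td class="mensajeBody">(.+?)</td>#s', $content, $m2);

// A negated class such as [^<] matches "\n" with or without "s".
$class = preg_match('#</P>([^<]+)<P>#', $content, $m3);

var_dump($without, $with, $class); // int(0), int(1), int(1)
```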

And what if I don't want to match the <...> tags and what's inside them? Would something like ([^<.*>]+) be OK?

Instead of asking, you can always try it out. And no, that won't work. What you have to understand is that a character class [...] matches a single character that is either in the class or, when the class starts with ^, not in it. So ([^<.*>]+) means: one or more characters that are not a <, a dot, a star or a >. Before tackling those kinds of issues, I would focus on working through tutorials and getting more comfortable with regex basics first (kind of akin to learning how to walk before learning to do back flips).
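Trying it out is quick. This sketch uses a made-up string. The first part shows what [^<.*>]+ actually matches. The second part shows one common way to deal with whole tags, which is to match <[^>]*> as a unit. PHP's strip_tags() is another option.

```php
<?php
$html = '<P>Saludos</P>';

// [^<.*>] is ONE character that is not "<", ".", "*" or ">"; inside a class,
// "." and "*" are plain literals, not "any character" / "repeat".
preg_match_all('#[^<.*>]+#', $html, $runs);
echo implode(' | ', $runs[0]), "\n"; // P | Saludos | /P

// To skip whole tags instead, match a tag as a unit: "<", then any
// non-">" characters, then ">". Here the tags are removed with preg_replace.
echo preg_replace('#<[^>]*>#', '', $html), "\n"; // Saludos
```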
