It's actually a long string. I'm using cURL to retrieve some info from a page:

<td width="6%" height="1" valign="top" nowrap >De:</td><td width="81%" height="1" valign="top" >" Here is the text " <<A href=\'compose.php?nomUsr"

The part I want is whatever appears where "Here is the text" is.

I tried something like this, and it didn't work:

<?
preg_match("<td width=\"6%\" height=\"1\" valign=\"top\" nowrap >De:</td><td width=\"81%\" height=\"1\" valign=\"top\" >"\s>([^<])*" <<A href='compose.php?nomUsr", $content, $s_de);
?>

I'm using the whole string because there are lots of HTML tags around the content.


I just glanced at this, and at first sight I can spot an immediate problem: a lack of delimiters. What actually happens is that the regex engine sees the initial < in preg_match("<td... and treats it as the opening delimiter, so it then expects a matching closing > delimiter at the end of the pattern (which is not even the intended result). You can read about delimiters here.
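
To make the delimiter point concrete, here is a minimal sketch. The one-cell $html string is a made-up stand-in for the real page, and the pattern is deliberately simpler than the one in the question.

```php
<?php
// Hypothetical one-cell sample, not the real page.
$html = '<td>Here is the text</td>';

// No delimiters: "<" is taken as the opening delimiter and the first ">"
// closes it, so everything after "<td>" is read as (invalid) modifiers.
// The "@" only hides the warning PHP raises for this.
$bad = @preg_match('<td>([^<]*)</td>', $html, $m);
var_dump($bad);   // bool(false)

// The same pattern wrapped in "#" delimiters.
$good = preg_match('#<td>([^<]*)</td>#', $html, $m);
var_dump($good);  // int(1)
echo $m[1], "\n"; // Here is the text
```

Checking the return value against false is a quick way to tell "the pattern is broken" apart from "the pattern found nothing" (which returns 0).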

 

In addition to the link Mchl provided, you can also read about regex in these:

 

webtoolscollection

Mastering Regular Expressions book

PHPFreak resources

PHPFreaks tutorial

 

More than enough stuff to get you started ;)

 

 

I haven't had a close look at your pattern, but surrounding the entire pattern with proper delimiters (which can be any character that is not alphanumeric, whitespace, or a backslash) will solve that issue. So delimiters like /...../ or !......! or #....# or ~.....~, for example, will suffice.

So if you used /..../ as your delimiter, any / characters within the pattern would need to be escaped, like \/ (as a rule, I tend to avoid / as a delimiter for this very reason). I personally prefer #..#, as the odds of running into # within the pattern are slim to none, so I don't have to concern myself with much escaping.

 

Also note that you surrounded your entire pattern with double quotes: preg_match("........"). This means (as you seem to have noticed) that you must also escape double quotes inside your pattern. I would recommend getting into the habit of using single quotes around the pattern, just so you don't have to escape double quotes within it (similar to using something other than / for delimiters, to avoid needing to escape those as well). There is no harm in not doing these things, mind you, so long as everything is properly escaped. But for the sake of making things easier, why not use delimiters and quotes that need no escaping (or only occasionally)?
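
As a rough sketch of both habits together (# delimiters plus a single-quoted pattern), something like the following could work on the fragment from the first post. It assumes the quotation marks around "Here is the text" in that post were only markers and are not in the real page. The pattern itself is one possible approach, not a tested solution for the actual site.

```php
<?php
// Sample modelled on the fragment in the question (marker quotes left out).
$content = '<td width="6%" height="1" valign="top" nowrap >De:</td>'
         . '<td width="81%" height="1" valign="top" >Here is the text <<A href=\'compose.php?nomUsr';

// "#" delimiters and single quotes: neither the " nor the / in the HTML
// needs escaping. [^>]* skips over the attributes of the second <td>.
$pattern = '#nowrap >De:</td><td [^>]*>\s*([^<]+?)\s*<<A href=#';

if (preg_match($pattern, $content, $s_de)) {
    echo $s_de[1], "\n"; // Here is the text
}
```

Anchoring on "De:</td>" and skipping the attributes with [^>]* keeps the pattern shorter than reproducing every width and height value literally.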

 

It is best to have a look at the links Mchl and I provided if you want to learn preg (PCRE). It's an extremely valuable toolset to understand and have at your disposal.

I have this..

             <td height="1" valign="top" nowrap>Recibido el:</td>
             <td height="1" valign="top">15 Mar 2005 21:14</td>
             </tr>
<!-- INICIO DESTINATARIOS -->
           <tr>
             <td height="1" valign="top" nowrap>Para:</td>

             <td height="1" valign="top">Lavezzari Georgina <<A href='compose.php?nomUsr=glavezzari'>glavezzari</A>></td>
             </tr>
	   
           <tr>
             <td height="1" valign="top" nowrap>Asunto:</td>
             <td height="1" valign="top">class 1</td>

And using preg_match('#<td height="1" valign="top">(.*)</td>#', $content, $s_asunto); only prints:

Array
(
    [0] => <td height="1" valign="top">15 Mar 2005 21:14</td>
    [1] => 15 Mar 2005 21:14
)

Why does it only find one of them?
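
For context, preg_match() stops after the first successful match, while preg_match_all() keeps scanning the rest of the subject. A minimal sketch with a shortened copy of the markup above (the shortening is mine):

```php
<?php
// Shortened copy of the markup quoted above.
$content = <<<HTML
<td height="1" valign="top" nowrap>Recibido el:</td>
<td height="1" valign="top">15 Mar 2005 21:14</td>
<td height="1" valign="top" nowrap>Asunto:</td>
<td height="1" valign="top">class 1</td>
HTML;

$pattern = '#<td height="1" valign="top">(.*)</td>#';

// preg_match() returns as soon as it has one match.
preg_match($pattern, $content, $first);

// preg_match_all() collects every match; $all[1] holds each capture of group 1.
$count = preg_match_all($pattern, $content, $all);

echo $first[1], "\n"; // 15 Mar 2005 21:14
print_r($all[1]);     // both cell values
```

The nowrap cells are skipped in both cases because the pattern requires valign="top" to be followed directly by >.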

Lol, thanks. About the | metacharacter, is it OK to use it like this? preg_match_all('#<td height="1" valign="top">(.*)</td>|<<A href=#', $content, $s_asunto);

I'm trying to get the content in the middle, matching it whether </td> or <<A href= is found at the end. Regex is getting me a little confused.

Just some posting advice: please make use of the forum's code tags when posting your code (instead of lumping your code in with your sentences), like this:

preg_match_all('#<td height="1" valign="top">(.*)</td>|<<A href=#', $content, $s_asunto);

This sets the code apart for easy readability. Note that the code tags will not colour code anything.

 

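To answer your question: no, | will not do what you want as is. Without grouping, | splits the entire pattern in two, so it reads as "match <td height="1" valign="top">(.*)</td>, or match <<A href=". To limit the alternation to the ending, it has to go inside brackets; the format is (...|...). Since we don't want to capture in this alternation, we use a non-capturing group: (?:...|...).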

 

So perhaps something like this is what you are looking for?

$content = <<<HTML
             <td height="1" valign="top" nowrap>Recibido el:</td>
             <td height="1" valign="top">15 Mar 2005 21:14</td>
             </tr>
<!-- INICIO DESTINATARIOS -->
           <tr>
             <td height="1" valign="top" nowrap>Para:</td>

             <td height="1" valign="top">Lavezzari Georgina <<A href='compose.php?nomUsr=glavezzari'>glavezzari</A>></td>
             </tr>
	   
           <tr>
             <td height="1" valign="top" nowrap>Asunto:</td>
             <td height="1" valign="top">class 1</td>
HTML;

preg_match_all('#<td height="1" valign="top">([^>]+)(?:</td>| <<A href=.+?</td>)#', $content, $s_asunto);
foreach ($s_asunto[1] as $val) {
    echo $val . "<br />\n";
}

 

Output:

15 Mar 2005 21:14
Lavezzari Georgina
class 1
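
To see what the grouping changes, here is a small side-by-side sketch on a single line. The $line sample is shortened by me, and the grouped pattern is a simplified variant for illustration, not the exact pattern above.

```php
<?php
// One shortened line modelled on the "Para:" cell.
$line = '<td height="1" valign="top">Lavezzari Georgina <<A href=\'compose.php\'>x</A>></td>';

// Ungrouped: "|" splits the WHOLE pattern into two alternatives.
$ungrouped = '#<td height="1" valign="top">(.*)</td>|<<A href=#';

// Grouped: the alternation only covers what may follow the captured text.
$grouped = '#<td height="1" valign="top">([^<]+?)\s*(?:</td>|<<A href=)#';

preg_match($ungrouped, $line, $a);
preg_match($grouped, $line, $b);

echo $a[1], "\n"; // Lavezzari Georgina <<A href='compose.php'>x</A>>
echo $b[1], "\n"; // Lavezzari Georgina
```

With the ungrouped version the first alternative wins and (.*) runs all the way to </td>, so the link markup ends up inside the capture.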

I've been reading some tutorials, but I can't figure out how to get the content from here:

preg_match('#<td height="99%" colspan="3" class="mensajeBody">(.+?)</td>#', $content, $s_body);

From:

             <td height="99%" colspan="3" class="mensajeBody"><P>Los que no presenten el TP de la semana 4 en el día de la fecha antes de las 22 hs. quedarán con el primer informe desaprobado.</P>
<P>Saludos</P> 
               </td>

The PHPFreaks tutorial says that ([^<]+) and (.+?) would get anything inside.
And what if I don't want to match the <> and what's inside them? Would something like this be OK? ([^<.*>]+)


 

The dot in (.+?) matches any character except a newline by default. Since your sample has newlines in it, the dot doesn't get to match all the way to </td>. The answer here is to add an 's' modifier after the closing delimiter, which makes the dot match newlines as well (note the s after the final #):

 

preg_match('#<td height="99%" colspan="3" class="mensajeBody">(.+?)</td>#s', $content, $s_body);

 

Negated character classes like ([^<]+) don't care about \n (to them a newline is just another character that isn't <), so they aren't affected by this. But if you want the dot to match newlines too, you need the s modifier. You can read about modifiers here.
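
A quick sketch of the difference the s modifier makes, using a shortened stand-in for the message body (the sample text is mine):

```php
<?php
// Shortened stand-in for the multi-line cell in the question.
$content = <<<HTML
<td height="99%" colspan="3" class="mensajeBody"><P>First paragraph.</P>
<P>Saludos</P>
   </td>
HTML;

$inner = '<td height="99%" colspan="3" class="mensajeBody">(.+?)</td>';

// Without "s": the dot cannot cross the newline, so there is no match.
$without = preg_match('#' . $inner . '#', $content, $m1);

// With "s": the dot matches newlines too, so the lazy group reaches </td>.
$with = preg_match('#' . $inner . '#s', $content, $m2);

var_dump($without, $with); // int(0) and int(1)
echo trim($m2[1]), "\n";   // both <P> paragraphs
```

The trim() call is only there to drop the whitespace that precedes </td> in the sample.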

 

And what about if I want to not match the <> and what's inside? Will something like this be ok? ([^<.*>]+)

Instead of asking, you can always try it out. ;) And no, that won't work. What you have to understand is that a character class [..] matches a single character that is either in the set or, with a leading ^, not in it. So with [^<.*>]+ you are basically saying: one or more characters that are not a <, a dot, a star, or a >. Before tackling those kinds of issues, I would focus on working through tutorials and getting more comfortable with regex basics first (kind of akin to learning how to walk before learning to do back flips).
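
Trying it out might look like the sketch below. The $body sample is made up, and the preg_replace() call at the end is just one possible way to drop tags afterwards; it is not something suggested earlier in this thread.

```php
<?php
// Made-up sample with a literal dot and star in the text.
$body = '<P>Los que no presenten. Saludos*</P>';

// [^<.*>]+ is a run of characters that are none of: <  .  *  >
// Inside a character class, "." and "*" are plain literal characters.
preg_match_all('#[^<.*>]+#', $body, $runs);
print_r($runs[0]); // "P", "Los que no presenten", " Saludos", "/P"

// One possible way to remove tags: replace every <...> with nothing.
$text = preg_replace('#<[^>]*>#', '', $body);
echo $text, "\n"; // Los que no presenten. Saludos*
```

Note how the class splits the text at the dot and the star, and still picks up the tag names P and /P, which is not what the question was after.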
