Lautarox Posted April 27, 2009
I want to extract phrases from a text. For example, from "The bird was flying when the nest felt" I want to extract "flying", the part between "The bird was" and "when the nest felt". How could I do it using preg_match? Thanks!
https://forums.phpfreaks.com/topic/155903-preg_match-and-results/
Mchl Posted April 27, 2009
What is the condition to extract this phrase? Anything between "was" and "when"? Anything that ends in "ing"? Something else?
Lautarox (Author) Posted April 27, 2009
In fact it's a long string. I'm using curl to retrieve some info from a page:
<td width="6%" height="1" valign="top" nowrap >De:</td><td width="81%" height="1" valign="top" >" Here is the text " <<A href=\'compose.php?nomUsr"
I need whatever appears where "Here is the text" is.
Mchl Posted April 27, 2009
Ahhh... that's much better.
preg_match("/valign=\"top\"\s>([^<]*)<A href/", $string, $match);
echo $match[1];
That should do it, I think (but I just learned this stuff myself yesterday).
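A minimal sketch of the capture-group idea in this post, assuming a shortened, made-up version of the snippet Lautarox quoted (the `$string` value is illustrative, not the real page). It keeps the quantifier inside the parentheses so the whole run of text is captured, allows optional whitespace before `>`, and tolerates the doubled `<` that appears before `A href` in the sample as quoted.

```php
<?php
// Hypothetical sample, shortened from the snippet quoted in the thread.
$string = '<td valign="top" nowrap >De:</td><td valign="top" >" Here is the text " <<A href=\'compose.php?nomUsr';

$text = null;

// ([^<]*) captures every character up to the next "<".
// <+ accepts one or more "<" before "A href".
if (preg_match('#valign="top"\s*>([^<]*)<+A href#', $string, $match)) {
    // Strip the surrounding spaces and double quotes from the captured text.
    $text = trim($match[1], " \"");
    echo $text, "\n";
}
```

The first cell (`valign="top" nowrap >`) does not match because `nowrap` sits between the attribute and `>`, so the engine moves on to the second cell.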
Lautarox (Author) Posted April 27, 2009
Well.. I did something like this and it didn't work:
<? preg_match("<td width=\"6%\" height=\"1\" valign=\"top\" nowrap >De:</td><td width=\"81%\" height=\"1\" valign=\"top\" >"\s>([^<])*" <<A href='compose.php?nomUsr", $content, $s_de); ?>
I'm using the whole string because there are lots of HTML tags around the content.
Mchl Posted April 27, 2009
The problem is that you will have to escape special regex characters in this string (like ? and . in 'compose.php?', for example).
Lautarox (Author) Posted April 27, 2009
It's the only option I have.. I need that unique part of the code. How could I escape those characters?
Mchl Posted April 27, 2009
With \ (a backslash). Here's a great tutorial: http://www.regular-expressions.info/tutorial.html
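Escaping by hand gets tedious when the surrounding text is a long literal. A hedged sketch, assuming the text before and after the wanted part is known exactly: PHP's `preg_quote()` adds the backslashes for you. The `$content`, `$before` and `$after` values below are made up for the example.

```php
<?php
// Hypothetical sample modelled on the "Para:" row shown later in the thread.
$content = "<td>Para:</td><td>Lavezzari Georgina <<A href='compose.php?nomUsr=glavezzari'>glavezzari</A>></td>";

// Literal text expected on either side of the part we want.
$before = "<td>Para:</td><td>";
$after  = " <<A href='compose.php?nomUsr";

// preg_quote() backslash-escapes regex metacharacters such as . and ?.
// The second argument also escapes the chosen delimiter if it occurs.
$pattern = '#' . preg_quote($before, '#') . '(.*?)' . preg_quote($after, '#') . '#';

$name = null;
if (preg_match($pattern, $content, $m)) {
    $name = $m[1];
    echo $name, "\n";
}
```

Only the two literal pieces go through `preg_quote()`; the `(.*?)` in the middle is left unescaped on purpose, since it is the actual regex part.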
Lautarox (Author) Posted April 27, 2009
I'll take a look at it, thanks!
nrg_alpha Posted April 27, 2009
Quote: "Well.. I did something like this and it didn't work"
I just glanced at this, and at first sight I spot an immediate problem: lack of delimiters. What is actually happening is that the regex engine sees the initial < in preg_match("<td... and treats it as the opening delimiter. In that case (which is not even the intended result), there must be a closing > delimiter at the end of the pattern. It is worth reading up on delimiters.
In addition to the link Mchl provided, you can also read about regex in these: webtoolscollection, the Mastering Regular Expressions book, the PHPFreaks resources, and the PHPFreaks tutorial. More than enough stuff to get you started.
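A small sketch of the delimiter behaviour described here, using made-up subject strings. PHP accepts bracket-style delimiter pairs such as `<` and `>`, so a pattern string that starts with `<` is read as "delimiter, regex, delimiter" rather than as a literal tag.

```php
<?php
// '<' and '>' act as the delimiters here, so the regex is just "td".
$bracketed = preg_match('<td>', 'a td b', $m1);

// With explicit # delimiters, the angle brackets belong to the regex.
$plain_text = preg_match('#<td>#', 'a td b');          // no literal "<td>" in the subject
$real_tag   = preg_match('#<td>#', '<tr><td>x</td>');  // literal "<td>" is present

echo $bracketed, ' ', $m1[0], "\n";
echo $plain_text, ' ', $real_tag, "\n";
```

The first call matching plain "td" is the surprising part: the angle brackets were consumed as delimiters and never took part in the match.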
Lautarox (Author) Posted April 27, 2009
I've changed the <td to /<\td, the width to width=\"81\/%\, and the href to <A href='compose\/.php\/?nomUsr. Is that OK? How could I create a clear delimiter?
nrg_alpha Posted April 27, 2009
I haven't had a good close look at your pattern, but if you surround the entire pattern with proper delimiters (which can be any non-alphanumeric, non-whitespace character other than a backslash), this will solve that issue. So delimiters like /...../ or !......! or #....# or ~.....~, for example, will suffice. If you use /..../ as your delimiters, any / characters within the pattern should be escaped, like \/. (As a rule, I tend to avoid / as a delimiter for this very reason. I personally prefer #..#, as the odds of running into # within the pattern are slim to none, so I don't have to concern myself with much escaping.)
Also note that you surrounded your entire preg pattern with double quotes: preg_match("........"). This means (as you seem to have noticed) that you must also escape double quotes inside your pattern. I would recommend getting into the habit of using single quotes around the pattern, just so you don't have to escape double quotes within it (similar to using something other than / for delimiters, to avoid needing to escape those as well).
There is no harm in not doing these things, mind you. So long as you have the proper stuff escaped, that's one less problem you'll have. But for the sake of making things easier, why not use delimiters and quotes that don't need any escaping (or only occasionally)?
It is best if you have a look at the links Mchl and I provided if you want to learn preg (PCRE). It's an extremely valuable toolset to understand and have at your disposal.
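To make the quoting advice concrete, here is a minimal sketch that writes the same pattern twice, assuming a one-row sample (`$html` is made up). The variable names `$noisy` and `$clean` are illustrative only.

```php
<?php
$html = '<td height="1" valign="top">class 1</td>';

// "/" delimiters in a double-quoted string: every " and every / needs a backslash.
$noisy = "/<td height=\"1\" valign=\"top\">(.*)<\\/td>/";

// "#" delimiters in a single-quoted string: nothing in this pattern needs escaping.
$clean = '#<td height="1" valign="top">(.*)</td>#';

preg_match($noisy, $html, $a);
preg_match($clean, $html, $b);

echo $a[1], "\n";
echo $b[1], "\n";
```

Both patterns are the same regex to the engine; the second is simply easier to read and harder to get wrong.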
Lautarox (Author) Posted April 28, 2009
I have this:
<td height="1" valign="top" nowrap>Recibido el:</td>
<td height="1" valign="top">15 Mar 2005 21:14</td>
</tr>
<!-- INICIO DESTINATARIOS -->
<tr>
<td height="1" valign="top" nowrap>Para:</td>
<td height="1" valign="top">Lavezzari Georgina <<A href='compose.php?nomUsr=glavezzari'>glavezzari</A>></td>
</tr>
<tr>
<td height="1" valign="top" nowrap>Asunto:</td>
<td height="1" valign="top">class 1</td>
And using
preg_match('#<td height="1" valign="top">(.*)</td>#', $content, $s_asunto);
only prints:
Array ( [0] => <td height="1" valign="top">15 Mar 2005 21:14</td> [1] => 15 Mar 2005 21:14 )
Why does it only find one of them?
nrg_alpha Posted April 28, 2009
To find multiple instances of something, use preg_match_all instead of preg_match (which, as you just discovered, stops after the first successful match).
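A short sketch of the difference, assuming a cut-down two-row version of the HTML from the previous post (the `$content` value is illustrative):

```php
<?php
$content = <<<HTML
<td height="1" valign="top">15 Mar 2005 21:14</td>
<td height="1" valign="top">class 1</td>
HTML;

$pattern = '#<td height="1" valign="top">(.*)</td>#';

// preg_match() stops at the first match and fills $first with that match only.
$first_count = preg_match($pattern, $content, $first);

// preg_match_all() keeps going; $all[1] holds every capture of group 1.
$all_count = preg_match_all($pattern, $content, $all);

echo $first_count, ': ', $first[1], "\n";
echo $all_count, ': ', implode(' | ', $all[1]), "\n";
```

Note the different shape of the result array: with `preg_match_all()` in its default ordering, `$all[0]` lists the full matches and `$all[1]` lists what group 1 captured in each of them.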
Lautarox (Author) Posted April 28, 2009
Lol, thanks. About the | metacharacter, is it OK used like this?
preg_match_all('#<td height="1" valign="top">(.*)</td>|<<A href=#', $content, $s_asunto);
I'm trying to get the content in the middle, matching it if </td> or <<A href= is found at the end. Regex is getting me a little confused.
nrg_alpha Posted April 28, 2009
Just some posting advice: please make use of the forum's code tags when posting your code (instead of lumping your code in with your sentences). The PHP tags colour code everything for easy readability; the plain code tags will not colour code anything.
To answer your question: no, | will not work as is, because without brackets the alternation splits the whole pattern in two. To limit what it applies to, alternation makes use of brackets; the format is (...|...). Since we don't want to capture in this alternation, we use (?: .... | ... ). So perhaps something like this is what you are looking for?
$content = <<<HTML
<td height="1" valign="top" nowrap>Recibido el:</td>
<td height="1" valign="top">15 Mar 2005 21:14</td>
</tr>
<!-- INICIO DESTINATARIOS -->
<tr>
<td height="1" valign="top" nowrap>Para:</td>
<td height="1" valign="top">Lavezzari Georgina <<A href='compose.php?nomUsr=glavezzari'>glavezzari</A>></td>
</tr>
<tr>
<td height="1" valign="top" nowrap>Asunto:</td>
<td height="1" valign="top">class 1</td>
HTML;
preg_match_all('#<td height="1" valign="top">([^>]+)(?:</td>| <<A href=.+?</td>)#', $content, $s_asunto);
foreach ($s_asunto[1] as $val) {
    echo $val . "<br />\n";
}
Output:
15 Mar 2005 21:14
Lavezzari Georgina
class 1
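The scope of `|` can be seen on a much smaller example. This is a sketch with a made-up subject string, not the pattern from the thread: without a group, the alternatives are the entire left half and the entire right half of the pattern.

```php
<?php
$subject = 'name: Ana, age';

// Ungrouped: the alternatives are "name: (\w+);" and ",".
// The first cannot match (there is no ";"), so the engine settles for a bare ",".
preg_match('#name: (\w+);|,#', $subject, $m_top);

// Grouped: only the terminator alternates, and (?: ) keeps it out of the captures.
preg_match('#name: (\w+)(?:;|,)#', $subject, $m_grouped);

echo $m_top[0], "\n";      // whole match of the ungrouped pattern
echo $m_grouped[1], "\n";  // group 1 of the grouped pattern
```

In the ungrouped case group 1 never takes part in the match, so `$m_top` has no index 1 at all.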
Lautarox (Author) Posted April 28, 2009
Yeah, thanks for your time. I'm going to take a better look at the tutorials; using () confuses me a lot..
Lautarox (Author) Posted April 29, 2009
I've been reading some tutorials, but I can't figure out how to get the content with this:
preg_match('#<td height="99%" colspan="3" class="mensajeBody">(.+?)</td>#', $content, $s_body);
From:
<td height="99%" colspan="3" class="mensajeBody"><P>Los que no presenten el TP de la semana 4 en el día de la fecha antes de las 22 hs. quedarán con el primer informe desaprobado.</P>
<P>Saludos</P> </td>
The PHPFreaks tutorial says that ([^<]+) and (.+?) would get anything inside..
And what if I want to not match the <> and what's inside? Will something like this be OK? ([^<.*>]+)
nrg_alpha Posted April 29, 2009
Quote: "The PHPFreaks tutorial says that ([^<]+) and (.+?) would get anything inside.."
By default, the dot in (.+?) matches any character other than a newline. Since your sample has a newline in it, the dot doesn't get to match all the way to </td>. The answer here is to add the 's' modifier after the closing delimiter (the s right after the final #):
preg_match('#<td height="99%" colspan="3" class="mensajeBody">(.+?)</td>#s', $content, $s_body);
Character classes like ([^<]+) don't care about \n (newline) and as a result aren't affected by it. But if you want the dot to match newlines too, you need the s modifier. It is worth reading up on pattern modifiers as well.
Quote: "And what if I want to not match the <> and what's inside? Will something like this be OK? ([^<.*>]+)"
Instead of asking, you can always try it out. And no, that won't work. What you have to understand is that a character class [..] looks for a single character that is either in it or not, depending on the usage of ^. So with that pattern you are basically saying: any character that is not a <, nor a dot, nor a star, nor a >, one or more times.
Before tackling those kinds of issues, I would focus on working through tutorials and getting more comfortable with regex basics first (kind of akin to learning how to walk before learning to do back flips).
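Both points can be checked quickly. A minimal sketch, assuming a shortened stand-in for the message body (the `$content` string, with its embedded newline, is made up for the example):

```php
<?php
// Hypothetical two-line cell; "\n" is a real newline inside the content.
$content = "<td class=\"mensajeBody\"><P>Line one.</P>\n<P>Saludos</P> </td>";

// Without the s modifier the dot stops at the newline, so </td> is never reached.
$without_s = preg_match('#<td class="mensajeBody">(.+?)</td>#', $content, $m1);

// With the s modifier the dot also matches newlines.
$with_s = preg_match('#<td class="mensajeBody">(.+?)</td>#s', $content, $m2);

// [^<.*>]+ is a run of characters that are none of: <  .  *  >
// It does not skip tags; it simply stops at the first of those four characters.
preg_match('#[^<.*>]+#', $content, $m3);

echo $without_s, ' ', $with_s, "\n";
echo $m3[0], "\n";
```

The last match shows why the character class does not do what was hoped: it happily returns the inside of the opening tag, because none of those characters is one of the four excluded ones.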