
ragax

Members
  • Posts

    186

Everything posted by ragax

  1. Amen. I burned out trying to do that a few months ago. Still recovering.
  2. Superb answer, .josh! This answer gives me the motivation to start a text file with a list of useful threads for the "common expressions" stickie: Phone numbers: matching with blacklists, etc
  3. Yes, but what is the rule to tell the difference between: Non-Matches 000-000-0000 | 123-456-7890 and Matches 800-555-5555 | 333-444-5555 They both have three digits, a dash, three digits, a dash, four digits? What is the rule, in plain English, to know that one is okay and the other one is not okay?
  4. Hi Shadowing, two questions: 1. Why the star in (\d{3}-\d{3}-\d{4})*? This allows the regex to match "" (empty string) as well as 555-2222-2222555-2222222 2. How do I know that 123-456-7890 is a non-match? (What is the rule in plain English, so that I can start thinking about the regex?) Wishing you a fun weekend. [Edit: typo ("these" instead of "this") ]
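To make the first question concrete, here is a minimal sketch of what the trailing star does. The test strings and the star-free variant are assumptions for illustration only; they are not taken from Shadowing's post.

```php
<?php
// Pattern shape from the thread: a starred group may repeat zero times.
$starred = '~^(\d{3}-\d{3}-\d{4})*$~';
// Hypothetical alternative without the star: exactly one phone number.
$single  = '~^\d{3}-\d{3}-\d{4}$~';

$tests = array('', '800-555-5555', '800-555-5555800-555-5555');

foreach ($tests as $t) {
    // preg_match() returns 1 on a match and 0 otherwise
    printf("%-28s starred=%d single=%d\n",
        '"' . $t . '"',
        preg_match($starred, $t),
        preg_match($single, $t));
}
```

The starred version accepts the empty string and two numbers glued together; the star-free one accepts only a single number.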
  5. Hi .josh, I love what you've done. It really does feel like a Spring cleaning. (Even though it's Autumn here in New Zealand). Thanks for explaining about SMF. I should know, I have two SMF forums on obscure topics---they get about one post a year. On good years. Really nice to see such a cool forum as phpfreaks running on SMF, actually. Err... I decline the challenge. (Is that allowed? ) You're totally right, there isn't one. Thank you for making the regex board even more awesome. Wishing you a beautiful day, ragax
  6. By the way, if you guys are in a debating mood (Wednesdays sometimes seem to trigger that), I'd really appreciate your thoughts, input, musings, insights, focus and raves on this thread. It needs some senior blood.
  7. Not to defend it, but to lay out the pros and cons of various methods depending on the input and the user's needs. I am not attached to it, that would make me a complete idiot. I'm curious, after reading my post, does it really sound like I am attached to that regex? If so, I have a lot to learn about expressing myself. :'( I like your solution. Simple and elegant. Of course we still don't know much about the OP's input and his actual needs. Wishing you all a fun day.
  8. Hi .josh, Truly nice to hear from you, I enjoy your posts a lot.

    Thankfully not, because I tend to use delimiters that don't need escaping, depending on the expression within. Usually either commas, tildes or hashes. As you know, whichever delimiter you choose, it may need escaping sometimes. Commas are part of the syntax, so they certainly need escaping more often than tildes. On an esthetic level I like commas a lot, as they let the regex stand out like a building over the horizon. I like tildes a lot too. Hashes are a bit blocky for my taste. To me #Hashes# feel like SHOUTING. And slashes tend to degenerate into wiggle art. \/\/

    I don't want to get into an argument with you, but "no point" seems a bit strong. The expression does get us in the ballpark of something that is very close to a date, except for odd cases like 31/11. IMO, whether this is helpful or not, neither of us can judge. It's for the OP to say, depending on his needs. Not knowing his exact needs, I made something that seemed close to his question, offering to help if that was not what he was looking for.

    What you offered could be exactly what he needs, I cannot say. For instance, he could be looking at input where he knows that all strings of the xx/xx/xx format are dates perfectly formed on a dd/mm/yy pattern, so that we don't need to validate anything, and your solution is perfect. Or he could have a mixture of perfectly formed dates (again no validation needed) but in a variety of formats, such as mm/dd/yy, dd/mm/yy and dd/mm/yyyy, and be looking for a way to fish out the two formats he indicated in the post.

    Another note: in my experience, imperfect, "best guess" matching is often a reality when you look at long texts. For instance, is 07/01/12 January 7 or July 1? You have probably noticed that the board gets loads of requests to match an email address. And none of the email address checkers can truly ensure that you have a valid email address. Knowing this, should we just look for (stuff)@(stuff)? Or try to get in the ballpark? That depends on what we know about the input and on the needs of the user.

    I'm certainly not attached to the expression, I never am---always delighted to see someone coming up with something better or pointing out potential improvements. And this is one of the rare cases when I didn't even craft a regex from scratch (I tweaked something out of the RB library), so I don't have a strong feel for the "balance" of the expression---how it would feel in the hand if it were a hammer.

    Btw, did you notice that I used your benchmark() function on a recent post? Love it, thanks for that. I used to do my own home-made benchmarking in a way that wasn't nearly so elegant.

    Sorry for the long message, I just love talking about regex! Not faulting your expression in any way, just a little chat about shades of grey. Wishing you a fun day.
  9. Hi lovelycesar, For speed, I adapted the core expression from the RegexBuddy library, then added some code. Displaying all the pieces for you, just take those you need:

    Input: is there a date like 07/03/12 or 31/12/2008 in here?

    Code:
        <?php
        $regex=',(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[012])/((?:19|20)?[0-9]{2}),';
        $string='is there a date like 07/03/12 or 31/12/2008 in here?';
        $size=preg_match_all($regex, $string, $m);
        for ($i=0;$i<$size;$i++) {
          echo "Date # ".($i+1).": ".$m[0][$i]."<br />";
          echo "Day / Match # ".($i+1).": ".$m[1][$i]."<br />";
          echo "Month / Match # ".($i+1).": ".$m[2][$i]."<br />";
          echo "Year / Match # ".($i+1).": ".$m[3][$i]."<br />";
        }
        ?>

    Output:
        Date # 1: 07/03/12
        Day / Match # 1: 07
        Month / Match # 1: 03
        Year / Match # 1: 12
        Date # 2: 31/12/2008
        Day / Match # 2: 31
        Month / Match # 2: 12
        Year / Match # 2: 2008

    Note that this is NOT a date validator: for instance, it will match 31 November, even though November only has 30 days. Let me know if this is what you are looking for and if you have any questions!
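Since the expression above only finds date-shaped strings, here is one possible way to weed out impossible dates such as 31/11 after matching. This is a sketch, not part of the original answer: the helper name real_dates() is hypothetical, and it assumes that two-digit years belong to the 2000s, which may not suit your data.

```php
<?php
// Regex from the post: finds date-shaped strings, does not validate them.
$regex = ',(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[012])/((?:19|20)?[0-9]{2}),';

// Hypothetical helper: returns only the matches that are real calendar dates.
// Assumption: a two-digit year such as "12" means 2012.
function real_dates($regex, $string) {
    $valid = array();
    preg_match_all($regex, $string, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $year = (int) $m[3];
        if (strlen($m[3]) === 2) {
            $year += 2000;
        }
        // checkdate(month, day, year) knows month lengths and leap years
        if (checkdate((int) $m[2], (int) $m[1], $year)) {
            $valid[] = $m[0];
        }
    }
    return $valid;
}

print_r(real_dates($regex, 'due 31/11/2011, paid 07/03/12, filed 29/02/2012'));
```

With this input, 31/11/2011 is dropped while 07/03/12 and 29/02/2012 are kept (2012 is a leap year).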
  10. Ah, AyKay, I often agree with you, but not in this specific case. I checked because it surprised me that parse_url would be faster than a straight regex. (Maybe someone knows: is parse_url full of regex, or built from scratch?) Here's the result of benchmarking, using .josh's very own (and very cool) benchmarking function:

        times executed (each): 500000
        fastest to slowest:
        Array ( [ragax] => 0.00001112222897780482 [josh] => 0.00001650101801627486 )
        biggest difference time: 0.00000537878903847004
        fastest is 48.3607% faster than the slowest

    The code is below if anyone is interested. Wishing you all a beautiful day.

        function josh() {
          $string='http://maps.google.com/maps?q=46.055603,14.507732&num=1&t=m&z=12';
          $parts = parse_url($string);
          parse_str($parts['query'],$params);
          return $params['q'];
        }
        echo josh().'<br />';

        function ragax() {
          $string='http://maps.google.com/maps?q=46.055603,14.507732&num=1&t=m&z=12';
          $regex=',http://maps\.google\.com/maps\?q=\K[^&]+,';
          preg_match($regex,$string,$m);
          return $m[0];
        }
        echo ragax().'<br />';

        benchmark(500000,'ragax','josh');

        function benchmark ($rounds=100) {
          // from .josh on http://www.phpfreaks.com/forums/index.php?topic=338139.0
          // syntax: benchmark(10000,'lazy_matchall','greedy_matchall','negative_lookahead');
          // if an integer is not passed as first argument, default to 100
          if (!is_integer($rounds)) $rounds = 100;
          // get the rest of the arguments passed to the function
          $funcs = func_get_args();
          // remove first argument from the list, since it is $rounds
          array_shift($funcs);
          $average = array();
          // for each user defined function...
          foreach ($funcs as $func) {
            // if the function doesn't exist, skip it
            if (!function_exists($func)) continue;
            $time = array();
            // call the function the specified/default amount of times
            for ($c = 0; $c < $rounds; $c++) {
              // get the current microtime timestamp
              $start = explode(" ",microtime());
              $start = $start[0];
              // call the user defined function
              $func();
              // get the current microtime timestamp
              $end = explode(" ",microtime());
              $end = $end[0];
              // find out the difference between the two
              $diff = bcsub($end,$start,20);
              // for some unknown reason, $diff occasionally returns a negative number
              // dunno if it's a bug in bcsub or a bug in microtime or maybe sometimes
              // it just goes so damn fast it goes backwards in time! anyways...let's
              // just weed it out, make sure it doesn't skew the results.
              if ($diff > 0) $time[$c] = $diff;
            } // end $c
            // get the average time
            $average[$func] = rtrim(bcdiv(array_sum($time),count($time),20),'0');
          } // end foreach funcs
          // sort the averages ascending (preserving the keys, because we used the
          // user defined function name as the array keys)
          asort($average,SORT_NUMERIC);
          // get the slowest average (the largest time)
          $slowest = max($average);
          // get the fastest average (the smallest time)
          $fastest = min($average);
          // display how many times we executed each function
          echo "times executed (each): ".$rounds . "<br/>";
          // display the averages
          echo "fastest to slowest:<br/><pre>";print_r($average); echo "</pre>";
          // display the time difference between slowest and fastest
          echo "biggest difference time: " . rtrim(bcsub($slowest,$fastest,20),'0') . "<br/>";
          // calculate and display how much faster the fastest one was, compared
          // to the slowest, as a percentage
          $percent = rtrim(round(bcmul(bcsub(bcdiv($slowest,$fastest,20),1,20),100,20),4),'0');
          echo "fastest is " . $percent . "% faster than the slowest";
        }
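A side note on the negative differences mentioned in the comments: the function above subtracts only the fractional-second part of microtime(), and that part wraps back toward zero each time the clock ticks over to a new second, which is a likely explanation. Below is a shorter timing sketch that sidesteps the issue. It assumes PHP 7.3 or later, where hrtime(true) returns a monotonic nanosecond counter; the helper name time_per_call() is hypothetical and this is not .josh's function.

```php
<?php
// Hypothetical helper, assuming PHP 7.3+ for hrtime().
// Returns the average time per call, in seconds.
function time_per_call(callable $func, $rounds = 100) {
    $start = hrtime(true);              // monotonic clock, in nanoseconds
    for ($i = 0; $i < $rounds; $i++) {
        $func();
    }
    $elapsed = hrtime(true) - $start;   // total nanoseconds for all rounds
    return $elapsed / $rounds / 1e9;
}

$url = 'http://maps.google.com/maps?q=46.055603,14.507732&num=1&t=m&z=12';

// Same two approaches as in the post, wrapped as closures.
$with_regex = function () use ($url) {
    preg_match(',http://maps\.google\.com/maps\?q=\K[^&]+,', $url, $m);
    return $m[0];
};
$with_parse_url = function () use ($url) {
    $parts = parse_url($url);
    parse_str($parts['query'], $params);
    return $params['q'];
};

printf("regex:     %.10f s per call\n", time_per_call($with_regex, 100000));
printf("parse_url: %.10f s per call\n", time_per_call($with_parse_url, 100000));
```

Timing the whole loop once, rather than each call, also keeps the clock overhead out of the per-call figure. The numbers will vary by machine and PHP version, so treat any ranking as indicative only.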
  11. I like it love it that in a galaxy far far away some people are free of the laws of the FB empire... and that forums still exist.
  12. Hi Regex gang! Hope you all had a nice weekend. For a few months, I have been visiting the regex board nearly daily and reading all the posts. I LOVE this board and all the stimulating activity! But, seeing the five stickies at the top of the board every day, it seems to me that perhaps these threads are a tad outdated and not tremendously helpful. Of the five, three are locked, and only one seems fairly current and useful.

    I'd like to share a few thoughts about this in order to get the convo going with the ordinary contributors (as well as with gurus, mods and admins) who also love this board, hoping that some thoughts will be exchanged and that some way down the line we'll be left with an even better board. Here's my view about the five stickies. These are only opinions, and I'm sharing them "abruptly" in order to stimulate discussion.

    1. How to ask a regex question: for me this is the most important thread, as many new posters ask questions in plain English, without posting sample input and desired output. I'd suggest making this topic the top thread on the board, and unlocking it to allow current contributors to post a few examples of well-phrased regex questions.

    2. Regex Resources: It should be wonderful for us all to have a stickied Regex Resources thread... but this thread has been locked for more than five years, and effigy, who started it, hasn't been on the forum in over two years. In five years, the web has not stood still. New sites appear, some old resources disappear or become irrelevant. Take a look at the thread... For instance, is this really a good resource? Many of the regex board's current contributors were not around at the time of the stickie. So could it be time for a new regex resources thread where all the current users can contribute?... With that in mind, today I started a Regex Resources... Reloaded thread, which links to the old thread.

    3. Highlighting Search Terms: Of the hundreds of regex questions on the regex board, I cannot find a reason why this particular one needs to be stickied. Watching all the questions that go through the board week after week, I very much doubt that posters are more hungry for the info in that post than for the info in any other post. I'd suggest unstickying this post.

    4. Common Expressions: Take a look. Do the two-and-a-half posts on this thread really contain useful expressions? I don't think so. And it is hugely outdated (see the use of the deprecated eregi in the first post). If this were a useful stickie, we wouldn't get questions about "how do I match an email address" every third day. I would suggest either unlocking the thread so we can all add something meaningful, or unstickying it and using the "Regex Resources" thread to add links to libraries and "common expressions".

    5. Mastering Regular Expressions, Third Edition, is available: should this thread from 2006 really be the top thread on the board? I love Jeffrey's book, and it seems to me that it should be prominent in our "Regex Resources" thread... but does this old announcement need to be the first thing you see on the board??? Something about it reminds me of the people who knock at your door to convert you to their religion. Jeffrey's book is fab, but, for the record, PCRE has evolved since 2006: (i) MRE3 doesn't cover everything, e.g. features like \K and (?(DEFINE)), and (ii) the optimizations he mentions are interesting, but having benchmarked them in recent PCRE versions, I find they do not seem to result in speed gains.

    In summary: for my taste, the "cleaned up stickies" of our beloved regex board would only have two posts, until something else of great value comes along:

    1. How to ask a regex question, unlocked, so the people who answer questions can post examples.

    2. Sticky a new Regex Resources thread where we can all participate, whether the new Regex Resources... Reloaded or another one.

    3. I'd suggest unstickying the other three stickies and letting them fight for survival, like every other thread.

    Things almost never happen as one initially imagines they should, so I'm not expecting these suggestions to be followed to the letter. But I thought it might be time to kick up the mud a bit to get the conversation rolling and see what everyone else thinks... Whatever the result, for the people who love this board, I am sure it will be an improvement on the current situation. Wishing you all a beautiful week! Peace, ragax
  13. ragax

    Resources

    Hello Regex lovers!!! As of March 2012, the old regex resources thread has been locked for five years. Some of the resources are still great, some have disappeared, some are no longer relevant. What's more, many of us current contributors may be aware of new resources that we can share among ourselves. So it seemed like we could use a Reload of our Regex Resources section! To get the ball rolling, as a regex addict, here is the list of regex resources I like at the moment. It's only one person's taste of course. Looking forward to discovering the resources everyone else is digging!!!

    A. Regex Books

    1. For beginner to intermediate regex, my favorite book is The Regex Cookbook, co-authored by Jan Goyvaerts (from RegexBuddy and regular-expressions.info). The book starts out with a strong tutorial. Then the "cookbook approach" provides structure to dive into sub-topics: numbers, urls, markup... On the "not so strong" side:
    - to my eyes, it looks like the code examples are often generated in RegexBuddy---which does not always produce the ideal code.
    - to me, many of the "recipes" in the cookbook seem repetitive because they focus on very specific problems (e.g. matching one specific part of an url, then another specific part...), rather than offering solutions or strategies to more "general" problems that you might encounter (e.g. how to replace text in certain situations).

    2. For advanced regex, my favorite book is Jeffrey Friedl's Mastering Regular Expressions (now in its third edition). True to its title, that book will propel your regex skills a long way toward mastery. That being said, it's not a regex bible, and the reader should be aware of some weaker points:
    - Some PHP (PCRE) features are not covered (for instance \K)
    - Some features could be treated in greater depth (e.g. conditionals, recursion, matching strategy)
    - the PCRE engine has evolved since 2007. I benchmarked most of the suggested "optimizations" by clocking a million matches at a time and did not notice real improvement, suggesting that Philip Hazel & team may have rolled them into the PCRE engine---or that my benchmarking is wrong!

    3. I have looked at other books but IMO they don't come close to these two.

    B. Regex Tuts

    1. The classic Tut is Jan Goyvaerts' regular-expressions.info. Jan is the author of RegexBuddy (below) and the co-author of the Regex Cookbook (above).
    Strengths:
    - a top-notch introduction to regex. The tutorial starts out by giving you a grasp of how a regex engine does its job---very important if you're going to be writing regex.
    - covers not just php regex, but also other flavors you might encounter, such as javascript and mySQL. That's really handy.
    Weaknesses: It seems to me that Jan tends not to mention the regex features that RegexBuddy does not support... So if you rely on this tut to learn regex, you may be left with a few gaps in your knowledge. For instance, here are some features not yet supported in RB, and that I haven't found on Jan's tut:
        \K --- to reset the reported match
        (?R) --- recursive patterns
        (?1) --- to repeat sub-patterns
        (?| --- to duplicate group numbers
        (?(DEFINE)) --- to define sub-patterns
        (?(-1)A|B) --- in conditionals, to refer to relative sub-patterns

    2. Tooting my own Tut, I strongly recommend Andy's Regex Tutorial. AFAIK, along with Jan's site, it's one of the two largest online regex tuts. The focus is a little different:
    - lots of examples for those who want their regex to progress to advanced level,
    - plugging holes when features are not mentioned or developed in books or other tuts,
    - cutting diagonally across topics that are usually presented linearly, for instance:
    ----> bringing all the bits of (? syntax in one place
    ----> discussing "good regex style", matching vs capturing strategies, etc.
    - grounded in PCRE (PHP), but with examples for regex you might use in other contexts (e.g. Apache, text editors).
    Weaknesses:
    - at this stage, not ideal for someone who has never tried regex (as a regex addict, I am more interested in advanced regex). Could use a slow introduction, with detailed explanation of how regex engines work.
    - you tell me what else could be improved! Comment forms at the bottom of each page, I love feedback.

    C. Regex Tools

    1. RegexBuddy
    I don't know if I could live without RegexBuddy... well, I guess I could. Here's a permanent link to the latest RegexBuddy demo. I build all my "complex expressions" in RB: in the Test field, I dump my test strings. In the regex field, I type my expression. The matches and captures are highlighted, with instant feedback as you tweak your subject text or expressions. I usually find it faster to craft tricky expressions this way than in PHP. RegexBuddy has other five-star features:
    - support for many other regex flavors (Java, .NET, Javascript etc).
    - it will generate "template code" for you if you're not too sure of the syntax to integrate the regex in your language.
    - if you're starting out with regex, it will help you build your expression by "inserting tokens"
    On the "weaknesses side"...
    - there's a Grep function to look inside of files, but for those operations I much prefer the next tool, ABA Search and Replace
    - the "debug" tab doesn't work for me (clunky interface)
    - as reported above, a number of features that I use quite a bit are not supported by RB, so I end up developing these expressions directly in PHP: \K (?R) (?1) (?| (?(DEFINE)) (?(-1)A|B)

    2. ABA Search and Replace
    I have fallen in love with this tool, developed by a member of this forum. ABA Search and Replace is a kind of "grep" tool in the sense that it lets you search for text inside files on your disk... but it's way more powerful than a command-line grep, and, for my taste, far easier to use than Jan's PowerGrep (which I don't own but have demoed). Once you've specified your files, in one field, you type your regex (and potentially your replacement expression). In the results box, the matches (and replacements) appear immediately, as you type and modify your expressions! It's magical. You can check boxes to choose which replacements to make, you can copy the matches, etc. On the weaker side, there are several potential improvements (and Peter assures me that he's working on several of them), such as:
    - support for advanced regex syntax
    - specifying the names of the files to be searched by using regex
    - for me, as a PHP user, it would be 5-star if you could switch to the full PCRE engine in order to tap into the full power of PHP regex.

    3. Other tools
    There are free desktop and online tools. I don't use them because RegexBuddy is at my fingertips, but I've heard good things about the larsolavtorvik online tester and Rad's regex designer. Maybe some of you can share what you think of these.

    D. Library of Common Expressions

    I rarely look at expression libraries. RegexBuddy's library has a few expressions that could come in handy for frequently asked questions on this board (e.g. matching email or IP addresses). Some recommend regexlib. But frankly, for the largest collection of expressions, it would be hard to beat this board! It's the most active regex board I know of.

    -------

    Well, that's my list at the moment... Please contribute your own favorite resources so we can grow our collective knowledge base! Wishing you all a fun day. Peace, ragax
  14. ragax

    Regex

    Hi Trykiz, Welcome to the phpfreaks forum! Here's one way to deal with it.

    Code:
        <?php
        $string='Maecenas ornare viverra rhoncus http://google.com . Donec pulvinar ipsum nec felis mollis non iaculis turpis pellentesque <a href=\'http://example.com\'>http://example.com</a> . Pellentesque vel convallis lacus <a href=\'http://www.phpfreaks.com\'>phpfreaks</a>';
        $regex=',(http://[\w./=@?]++)(?![\'"<]),';
        echo htmlentities(preg_replace($regex,'<a href="\\1">\\1</a>',$string)).'<br />';
        ?>

    Output:
        Maecenas ornare viverra rhoncus <a href="http://google.com">http://google.com</a> . Donec pulvinar ipsum nec felis mollis non iaculis turpis pellentesque <a href='http://example.com'>http://example.com</a> . Pellentesque vel convallis lacus <a href='http://www.phpfreaks.com'>phpfreaks</a>

    After matching the url, the regex just checks (negative lookahead) that it is not followed by a quote or a < character. Here's more about regex lookarounds. Let me know if you have any questions.
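One detail of the pattern above that is easy to miss is the doubled plus (a possessive quantifier). The sketch below compares it with a single-plus variant, which is a hypothetical alternative and not something posted in the thread, to show why the lookahead relies on it.

```php
<?php
$html = "<a href='http://example.com'>link</a>";

$possessive = ',(http://[\w./=@?]++)(?![\'"<]),';  // pattern from the post
$greedy     = ',(http://[\w./=@?]+)(?![\'"<]),';   // hypothetical variant with one plus

// Possessive: the url is followed by a quote, the lookahead fails,
// and the engine may not give characters back, so there is no match.
var_dump(preg_match($possessive, $html));

// Greedy: after the lookahead fails, the engine backtracks by one
// character and "succeeds" on a truncated url.
preg_match($greedy, $html, $m);
var_dump($m[1]);
```

With the single plus, the url already inside the href would be matched as http://example.co and wrapped in a second link, which is exactly what the original expression avoids.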
  15. Mmm... You mean, it only works with URLs that contain coordinates (not just with the specific coordinates in the sample string). Yes, the code assumes you were feeding it a url with coordinates, per your request. If not, just add an IF:

    Code:
        <?php
        $string=array('http://maps.google.com/maps?q=46.055603,14.507732&num=1&t=m&z=12','http://google.com','http://maps.google.com/maps?q=99.999,00.000&num=1&t=m&z=12');
        $regex=',http://maps\.google\.com/maps\?q=\K[^&]+,';
        foreach ($string as $url) {
          if (preg_match($regex,$url,$m)) {
            echo $m[0].'<br />';
          } else {
            echo 'no coordinates'.'<br />';
          }
        }
        ?>

    Output:
        46.055603,14.507732
        no coordinates
        99.999,00.000

    Let me know if you have any questions.
  16. Hi Klepec!

    Input: http://maps.google.com/maps?q=46.055603,14.507732&num=1&t=m&z=12

    Code:
        <?php
        $string='http://maps.google.com/maps?q=46.055603,14.507732&num=1&t=m&z=12';
        $regex=',http://maps\.google\.com/maps\?q=\K[^&]+,';
        preg_match($regex,$string,$m);
        echo $m[0].'<br />';
        ?>

    Output: 46.055603,14.507732

    Let me know if this works for you and if you have any questions.
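For anyone unfamiliar with \K, here is a small side-by-side sketch comparing it with an ordinary capturing group. The capturing-group variant and the final split into latitude and longitude are illustrative additions, not part of the answer above.

```php
<?php
$url = 'http://maps.google.com/maps?q=46.055603,14.507732&num=1&t=m&z=12';

// \K drops everything matched so far from the reported match,
// so the coordinates end up in $m[0].
preg_match(',http://maps\.google\.com/maps\?q=\K[^&]+,', $url, $m);
echo $m[0], "\n";

// Same result with a capturing group: the coordinates are in $g[1],
// while $g[0] still holds the whole match, prefix included.
preg_match(',http://maps\.google\.com/maps\?q=([^&]+),', $url, $g);
echo $g[1], "\n";

// The pair can then be split into latitude and longitude strings.
list($lat, $lng) = explode(',', $m[0]);
echo $lat, ' / ', $lng, "\n";
```

Both forms extract the same text; \K simply saves you from indexing into a group.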
  17. Wow, that sounds full on! Good luck with your project.
  18. While fully in synch with Psycho about the method to match the content of the brackets, I'd like to offer a couple of fine-tuning suggestions for the sake of "clean and compact":
    - the "e" modifier has no place in this regex
    - to match a single character out of a list, a character class is more efficient than alternations: [{}] rather than |{|}
    - no need for the parentheses

    $regex='#\[[^\]]*\]=|[{}]#';

    Wishing you a fun weekend.
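A quick way to check that the compact version behaves the same is to run both against a test string. In the sketch below, the input string and the exact shape of the alternation-based pattern are assumptions, since the earlier posts of that thread are not reproduced here.

```php
<?php
// Hypothetical input, made up for this comparison.
$input = 'a[one]={b}[two]=c';

$with_class       = '#\[[^\]]*\]=|[{}]#';    // pattern from the post
$with_alternation = '#(\[[^\]]*\]=|{|})#';   // assumed shape of the earlier version

preg_match_all($with_class, $input, $a);
preg_match_all($with_alternation, $input, $b);

print_r($a[0]);                 // full matches found by the compact pattern
var_dump($a[0] === $b[0]);      // do both patterns find the same matches?
```

On this input both patterns return the same four matches; the character class just gets there with less syntax.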
  19. ragax

    preg_replace

    And thank you very much for letting me know about a typo on the tut, asmith! Nothing more precious than a careful reader. Wishing you a fun weekend.
  20. ragax

    preg_replace

    Just in case someone is interested:

    1. Multi-Variable Variation (taking care of both regexes, as in the first post)

    Input: <a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>

    Code:
        <?php
        $string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';
        $regex=',somethingFile\?act=([^;"]+)(;)?,';
        $output=preg_replace_callback($regex,function($m){return 'somethingNew/'.$m[1].(isset($m[2])?'/?':'/');},$string);
        echo htmlentities($output).'<br />';
        ?>

    Output: <a href="somethingNew/Act_One/"></a><a href="somethingNew/Act2/?var=X"></a><a href="somethingNew/Act3/?var=Y;var2=Z"></a>

    2. Basic option without callback (only for the first regex)

    In the first post, I didn't give a code example of the "three basic solutions" if you just wanted to fix the first regex (as the solution I proceeded to give rolled your two regexes into one). But if you were interested, here's one possibility among many (along the lines of option #3 I was mentioning).

    Code:
        $string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';
        $regex=',somethingFile\?act=([^;"]+);,';
        $replace='somethingNew/\\1?';
        $output=preg_replace($regex,$replace,$string);
        echo htmlentities($output).'<br />';

    Output: <a href="somethingFile?act=Act_One"></a><a href="somethingNew/Act2?var=X"></a><a href="somethingNew/Act3?var=Y;var2=Z"></a>

    Naturally, the first url is not replaced (it would be a target for the second regex).
  21. ragax

    preg_replace

    Hi asmith, the problem is not that the second link gets matched by the second regex. (If you want to see that, eliminate the second regex: you will get the same output.) The problem is your second greedy plus quantifier. Your second plus matches everything up to the end of the string, so that your Group 2 capture actually is:

    At that stage, after the first replacement, the whole string has been matched, so there is nothing left for the regex engine to match. This is a classic problem (you will find it explained in detail on this page of mine about various kinds of greedy and lazy regex matching). There are three basic solutions:
    - making the second plus quantifier lazy so that it only expands until the first end of string or tag marker is found (adding a question mark to the + sign)
    - changing the character class so that it cannot expand beyond the first end quote (using a negated character class, e.g. [^"])
    - the easiest: not capturing Group 2 at all, because who cares... At this stage, you are just replacing the semi-colon with a question mark, right? So you can stop.

    To take care of your two regexes in one single match, I suggest this:

    Input: <a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y"></a>

    Code:
        <?php
        $string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y"></a>';
        $regex=',somethingFile\?act=([^"]+),';
        $output=preg_replace_callback($regex,function($m){return 'somethingNew/'.str_replace(';','?',$m[1]);},$string);
        echo htmlentities($output).'<br />';
        ?>

    Output: <a href="somethingNew/Act_One"></a><a href="somethingNew/Act2?var=X"></a><a href="somethingNew/Act3?var=Y"></a>

    This solution assumes there is only one variable (aside from act) in each url, conforming to your sample, i.e. not "?act=1;v1=x;v2=y". If you need multiple variables, it's a simple modification, just let me know. I may have missed something, so please let me know if I did or if you have any questions. Wishing you a fun weekend. [Edit: added "disclaimer" about the "?act=1;v1=x;v2=y" situation.]
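To see the greedy problem and the first two fixes side by side, here is a small sketch. The three patterns are hypothetical stand-ins written for this comparison; asmith's original regex is not reproduced here.

```php
<?php
$html = '<a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y"></a>';

// Hypothetical patterns illustrating the three behaviours.
$greedy  = ',act=(.+);(.+)",';          // both pluses greedy
$lazy    = ',act=(.+?);(.+?)",';        // lazy quantifiers
$negated = ',act=([^;"]+);([^"]+)",';   // negated character classes

$patterns = array('greedy' => $greedy, 'lazy' => $lazy, 'negated' => $negated);
foreach ($patterns as $name => $regex) {
    preg_match($regex, $html, $m);
    echo $name, ': group 1 = ', $m[1], ' | group 2 = ', $m[2], "\n";
}
```

The greedy version runs across both links in a single match, so its first group swallows the markup between them. The lazy and negated versions stop inside the first link and capture Act2 and var=X.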
  22. You're welcome, jmahdi! I can't help at all with the DomDocument, but if you have any questions about the regex above, I'll be happy to help.
  23. I'm glad you posted this full working example, xyph! For someone like me who has never used DOMDocument, it will be a great reference and tut explaining some benefits of that approach.
  24. Hi jmahdi, I'm glad you finally posted an example of what you are trying to do. requinix and xyph posted some very helpful counter-examples! Here is a pattern that will exclude green and maroon.

    $regex=',<font[^>]+?color="(?!green|maroon)[^"]+"[^>]*>([^>]+)</font>,';

    The code below shows you how it works.

    Code:
        <?php
        $regex=',<font[^>]+?color="(?!green|maroon)[^"]+"[^>]*>([^>]+)</font>,';
        $string='
        <font class="example">This is okay</font> but <font color="red">this is red</font>
        <font color="blue">This is blue</font> but <font color="red">and this is red</font>
        <font color="red">weirdly nested tags <font color="blue">normal blue text is fine</font> will be lost</font>
        <font color="red" size="12">This red is okay</font> but <font color="green" size="12">This green is toast</font>
        ';
        preg_match_all($regex, $string, $m, PREG_PATTERN_ORDER);
        $size=count($m[1]);
        for ($i=0;$i<$size;$i++) echo $m[1][$i]."<br />";
        ?>

    Output:
        this is red
        This is blue
        and this is red
        normal blue text is fine
        This red is okay

    As you can see, the one situation (of the ones suggested so far) where the pattern does not capture the text is when colors are "strangely nested":

    <font color="red">weirdly nested tags <font color="blue">normal blue text is fine</font> will be lost</font>

    The blue is fine but the red is lost. Let me know if this is a problem. And maybe xyph and requinix will come up with other counter-examples. Let us know if you need more help!
  25. Could you please change my username to ragax? (all lower-case). Old: playful New: ragax Thank you.