cursed Posted November 7, 2009

Hey, so I have some code:

othercontent...
<!-- AddThis Button BEGIN -->
<br /><a href="http://www.addthis.com/bookmark.php" onclick="addthis_url = location.href; addthis_title = document.title; return addthis_click(this);" target="_blank"><img src="http://s7.addthis.com/button1-share.gif" width="125" height="16" border="0" alt="Bookmark and Share" /></a>
<script type="text/javascript">var addthis_pub = '';</script><script type="text/javascript" src="http://s7.addthis.com/js/widget.php?v=10"></script>
<br /><!-- AddThis Button END -->(other content)

and my regex to match the code looks like:

<!-- AddThis Button BEGIN -->[\s\S]*?<!-- AddThis Button END -->

Why doesn't this work?
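For reference, here is a minimal PHP sketch of the pattern above being used to find and strip the block. It assumes the page source is already in a string; the `$html` value is a trimmed, hypothetical stand-in for the markup in the post, and the pattern is the asker's original one wrapped in `#` delimiters.

```php
<?php
// Trimmed, hypothetical sample of the page markup around the AddThis block.
$html = "othercontent...\n"
    . "<!-- AddThis Button BEGIN -->\n"
    . "<br /><a href=\"http://www.addthis.com/bookmark.php\">share</a>\n"
    . "<br /><!-- AddThis Button END -->(other content)";

// [\s\S] matches any character, including newlines, so no modifier is needed.
// The lazy *? stops at the first END comment rather than the last one.
$pattern = '#<!-- AddThis Button BEGIN -->[\s\S]*?<!-- AddThis Button END -->#';

$found = preg_match($pattern, $html);       // 1 when the block is present
$clean = preg_replace($pattern, '', $html); // block removed, surrounding text kept

echo $clean;
```

The lazy quantifier matters if a page contains more than one AddThis block: a greedy `*` would swallow everything from the first BEGIN comment to the last END comment.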
cags Posted November 7, 2009

Erm, it does work. You may wish to provide more details about what you think isn't working.
cursed (Author) Posted November 7, 2009

I used RegexBuddy and RegexTester's website; they both say no match. An example of the code that needs to be matched can be found at: http://the-palm-sound.blogspot.com/ (not my website, just one I came across at random).
Alex Posted November 7, 2009

Seems to work: http://www.rubular.com/regexes/11559
cags Posted November 7, 2009

It's a very bizarre pattern; generally you wouldn't use \S inside a character class. If you wish to match basically anything, just use the full stop (.), and if it also needs to match line breaks, add the single-line modifier (s). I can't speak for RegexBuddy or RegexTester, but if you copy that input string and that pattern into preg_match, it finds the string.

~<!-- AddThis Button BEGIN -->.*?<!-- AddThis Button END -->~s
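A small sketch of why the `s` modifier matters here, using a hypothetical sample string in which the block spans several lines. Without `s`, the `.` does not match `"\n"`, so `.*?` cannot reach the END comment; with `s` it behaves like `[\s\S]` and both patterns capture the same text.

```php
<?php
// Hypothetical sample: the block between the comments spans several lines.
$html = "before <!-- AddThis Button BEGIN -->\nwidget markup\n<!-- AddThis Button END --> after";

$classPattern  = '~<!-- AddThis Button BEGIN -->[\s\S]*?<!-- AddThis Button END -->~';
$dotPattern    = '~<!-- AddThis Button BEGIN -->.*?<!-- AddThis Button END -->~';
$dotAllPattern = '~<!-- AddThis Button BEGIN -->.*?<!-- AddThis Button END -->~s';

// Without s, . stops at "\n", so the lazy .*? never reaches the END comment.
$withoutS = preg_match($dotPattern, $html);
// With s (PCRE's DOTALL, the "single line" mode), . also matches "\n".
$withS = preg_match($dotAllPattern, $html, $m1);
// [\s\S] is the union of whitespace and non-whitespace, i.e. any character.
$withClass = preg_match($classPattern, $html, $m2);

var_dump($withoutS, $withS, $withClass, $m1[0] === $m2[0]);
```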
cursed (Author) Posted November 7, 2009

Thanks guys, it seems to work fine now. I greatly appreciate the help.
Daniel0 Posted November 8, 2009

Quote (cags): "generally you wouldn't use \S inside a character class."

That's perfectly valid. You can put multiple shorthand character classes inside a character class, and the result acts as the union of them (think of set theory in mathematics). Other flavors also support things like intersection and difference. You might, for instance, do something like this:

$name = 'Daniel';
var_dump(preg_match('/^[D\p{Ll}]+$/u', $name));

to match names containing only Unicode lowercase letters or a capital Latin 'D'. I know it's a crap example, but I couldn't think of anything better right now.
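Extending that example slightly, the sketch below runs the same union class over a few hypothetical names and also checks that `[\s\S]` (the union of a class and its negation) accepts a newline. The name list is illustrative only; `"D\u{00F3}ra"` is written with a PHP 7+ Unicode escape so the file encoding does not matter.

```php
<?php
// [D\p{Ll}] is the union of the literal "D" and \p{Ll} (any Unicode lowercase
// letter); the u modifier makes PCRE treat the subject as UTF-8.
$pattern = '/^[D\p{Ll}]+$/u';

// Hypothetical test names; "D\u{00F3}ra" contains a non-ASCII lowercase letter.
$names = ['Daniel', 'daniel', "D\u{00F3}ra", 'Cags', 'DANIEL'];
$results = [];
foreach ($names as $name) {
    $results[$name] = preg_match($pattern, $name);
}
print_r($results);

// [\s\S] is whitespace united with non-whitespace, so it matches any single
// character, including "\n". \A and \z anchor the whole subject.
$anyChar = preg_match('/\A[\s\S]\z/', "\n");
var_dump($anyChar);
```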
cags Posted November 8, 2009

I didn't say it wasn't valid; I said it generally isn't used. I know full well it works, which is why I said the OP's pattern does work when they claimed it didn't. I'm also well aware that multiple shorthand character classes can be used, and I have nothing against that, but in your example none of the sets used is the negated version of a set. To quote Regular-Expressions.info on the negated shorthands, which I happen to agree with:

"Negated versions of the above. Should be used only outside character classes. (Can be used inside, but that is confusing.)"
nrg_alpha Posted November 8, 2009

Odds are you probably won't see \S inside a class, but as Dan mentioned, it's perfectly valid (I know, you weren't saying it wasn't). I suppose it really boils down to which pattern the user chooses. For example, whether the pattern is \s or [^\S], both will match whitespace characters.

On a side note, all character classes are in essence positive assertions, in that they must positively match something (even negated character classes; in that case it must positively match something not listed). The trick here is to figure out which way makes the regex engine work faster and more efficiently. The expression "Work smarter, not harder" definitely applies to regex. Given a benchmark test involving the above whitespace-matching patterns, it isn't surprising to learn that \s is indeed faster than [^\S] (although it's perfectly acceptable and valid to use the latter, I agree that it would be bizarre indeed).
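To back up the claim that \s and [^\S] match the same characters, the sketch below compares them on every ASCII byte and on the sample sentence used later in the thread (with a tab added, as an illustrative assumption). It says nothing about speed, only about which characters each pattern accepts.

```php
<?php
// Check every ASCII character: \s and [^\S] should accept exactly the same set,
// since [^\S] means "a character that is not a non-whitespace character".
$mismatches = [];
for ($i = 0; $i < 128; $i++) {
    $c = chr($i);
    if (preg_match('/\s/', $c) !== preg_match('/[^\S]/', $c)) {
        $mismatches[] = $i;
    }
}

// Same replacement with both patterns on a sample string containing a tab.
$str = "The black cat\tsat in a hat!";
$a = preg_replace('#\s#', '*', $str);
$b = preg_replace('#[^\S]#', '*', $str);

var_dump($mismatches, $a === $b);
```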
Daniel0 Posted November 8, 2009

Quote (nrg_alpha): "it isn't surprising to learn that \s is indeed faster than [^\S]"

Did you actually benchmark that? I find it surprising that the engine doesn't realize they're identical.
nrg_alpha Posted November 8, 2009

Quote (Daniel0): "Did you actually benchmark that? I find it surprising that the engine doesn't realize they're identical."

Yeah, I did, and there is a speed difference (granted, this is in a loop of 5000 iterations); on a single pass, we wouldn't perceive any difference whatsoever. While the end result is the same, I can only guess that the difference under the hood is that with \s the engine checks whether a character is a whitespace character, while with [^\S] it ends up doing two checks: first for a non-whitespace character, then whether the result satisfies the negation (but I could be wrong here; it is only a guess). Whatever is actually happening under the hood is yielding a difference in speed, especially over a larger number of loop iterations.
nrg_alpha Posted November 8, 2009

The code, just in case:

$loop = 5000;

$start = gettimeofday();
for ($a = 0; $a < $loop; $a++) {
    $str = 'The black cat sat in a hat!';
    $str = preg_replace('#[^\S]#', '*', $str);
}
$final = gettimeofday();
// 'usec' is in microseconds, so divide by 1,000,000 to convert to seconds
$sec = ($final['sec'] + $final['usec'] / 1000000) - ($start['sec'] + $start['usec'] / 1000000);
printf("Result: %s - Time of executing [^\S]: %.4f<br />\n\n", $str, $sec);

$start = gettimeofday();
for ($a = 0; $a < $loop; $a++) {
    $str = 'The black cat sat in a hat!';
    $str = preg_replace('#\s#', '*', $str);
}
$final = gettimeofday();
$sec = ($final['sec'] + $final['usec'] / 1000000) - ($start['sec'] + $start['usec'] / 1000000);
printf("Result: %s - Time of executing \s: %.4f<br />\n\n", $str, $sec);

VS

$loop = 5000;

// microtime(true) returns the current time in seconds as a float
$time_start = microtime(true);
for ($a = 0; $a < $loop; $a++) {
    $str = 'The black cat sat in a hat!';
    $str = preg_replace('#[^\S]#', '*', $str);
}
$time_end = microtime(true);
$elapsed_time = round($time_end - $time_start, 4);
printf("Result: %s - Time of executing [^\S]: %.4f<br />\n", $str, $elapsed_time);

$time_start = microtime(true);
for ($a = 0; $a < $loop; $a++) {
    $str = 'The black cat sat in a hat!';
    $str = preg_replace('#\s#', '*', $str);
}
$time_end = microtime(true);
$elapsed_time = round($time_end - $time_start, 4);
printf("Result: %s - Time of executing \s: %.4f<br />\n", $str, $elapsed_time);

At least on my system, over many refreshes, the first method comes out more or less split down the middle, while the second one overall gives \s the edge (though there is still some flip-flopping).
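Because the results flip-flop between refreshes, one possible refinement is sketched below, assuming PHP 7.3+ for `hrtime()`. It repeats the measurement over several rounds, alternates which pattern runs first, and sums the totals. `timePattern()` is a hypothetical helper, not part of the original code, and the round count is arbitrary.

```php
<?php
// Hypothetical helper: time $loop calls of preg_replace() for one pattern and
// return elapsed seconds. hrtime(true) (PHP 7.3+) returns a monotonic
// nanosecond counter, so wall-clock adjustments cannot skew the result.
function timePattern(string $pattern, int $loop): float
{
    $str = 'The black cat sat in a hat!';
    $start = hrtime(true);
    for ($i = 0; $i < $loop; $i++) {
        preg_replace($pattern, '*', $str);
    }
    return (hrtime(true) - $start) / 1e9;
}

$loop = 5000;
$rounds = 6;
$totals = ['#[^\S]#' => 0.0, '#\s#' => 0.0];
$patterns = array_keys($totals);

// Reverse the order on odd rounds so neither pattern always runs first.
for ($r = 0; $r < $rounds; $r++) {
    $order = ($r % 2 === 0) ? $patterns : array_reverse($patterns);
    foreach ($order as $pattern) {
        $totals[$pattern] += timePattern($pattern, $loop);
    }
}

foreach ($totals as $pattern => $seconds) {
    printf("%s: %.4f s over %d rounds\n", $pattern, $seconds, $rounds);
}
```

Even with this approach, differences of a few percent on a 27-character string are likely to be within run-to-run noise.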