ragax

Members
  • Posts

    186

Everything posted by ragax

  1. You're welcome, Jay had it right, wishing you a fun weekend.
  2. Running this, I'm not seeing a difference.

        $start=time();
        $string='http://www.whatever.com?12()$%^*&';
        for ($i=0;$i<1000000;$i++) $url=preg_replace('#[^/[:alnum:]_-]#','',$string);
        $lap=time();
        for ($i=0;$i<1000000;$i++) $url=preg_replace('#[^/0-9a-zA-Z_-]#','',$string);
        $end=time();
        $time1= $lap - $start;
        $time2= $end - $lap;
        echo $time1."<br />";
        echo $time2."<br />";

    Output:
        6
        6
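Since time() only counts whole seconds, a difference smaller than a second would not show up in a comparison like the one above. Here is a minimal sketch of the same comparison with microtime(true), assuming PHP 7 or later; the helper name time_pattern and the lower iteration count are illustrative choices, not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): time $iterations runs of one pattern.
function time_pattern(string $pattern, string $subject, int $iterations): float
{
    $start = microtime(true);              // float seconds, sub-second resolution
    for ($i = 0; $i < $iterations; $i++) {
        preg_replace($pattern, '', $subject);
    }
    return microtime(true) - $start;       // elapsed seconds
}

$string = 'http://www.whatever.com?12()$%^*&';
$posix  = time_pattern('#[^/[:alnum:]_-]#', $string, 100000);
$ranges = time_pattern('#[^/0-9a-zA-Z_-]#', $string, 100000);

printf("[:alnum:] class: %.4f s\n", $posix);
printf("explicit ranges: %.4f s\n", $ranges);
```

The absolute numbers depend on the machine and PHP build, so the two values are only meaningful relative to each other.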
  3. Hi Julius, In agreement with Jay. A touch more compact than what you have at the moment:

        $url = preg_replace('#[^/[:alnum:]_-]#','',$url);

    Wishing you a fun weekend. [Edit: I had forgotten the hyphen and underscore]
  4. @jmahdi, apart from xyph's suggestion, if you really want to see how to do this with regex, you need to post sample text, a sample of what you want to grab, and a sample of what you don't want to grab. Please see the post about how to ask a regex question. It is just too much work to have to guess exactly what people want to match, replace, etc.
  5. Glad to hear it, Codemunkie, you're very welcome.
  6. ragax

    going insane!

    Hi Rifts! Interesting, you have exactly the same problem as Codemunkie on the other post from today. This is an old, classic problem (which I've more fully described on my page about greedy and lazy quantifiers).

    Basically, your dot-star (.*) before "td" is "greedy": it will cause the regex engine to match every single character until the end of the string (or the end of the line, since by default the dot does not match a newline). Then, to match "td", the engine will roll back character by character until it finds a "td", which will be the very last "td" it had swallowed. This means the match may swallow many tags in a row.

    If you want to use a dot-star for this task (and there are multiple ways of doing it), that is fine, but you need to make the star lazy by adding a question mark. Here is some working code:

    Code:
        <?php
        $string='
        <td class="word fixed_text">HERE</td>
        <td class="word fixed_text">HERE 2</td>
        <td class="word fixed_text">HERE TOO</td>
        ';
        $regex=',class="word fixed_text">(.*?)</td>,';
        preg_match_all($regex, $string, $matches, PREG_PATTERN_ORDER);
        $sz=count($matches[1]);
        for ($i=0;$i<$sz;$i++) echo $matches[1][$i]."<br />";
        ?>

    Output:
        HERE
        HERE 2
        HERE TOO

    Please let me know if you have any questions!
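To see the greedy and lazy behaviour side by side, here is a small sketch. The two cells are deliberately placed on a single line (an assumption made for this illustration), because the dot does not cross newlines by default. The third pattern, with a negated character class, is one of the other possible ways of doing it, not necessarily the one the poster had in mind.

```php
<?php
$string = '<td class="word fixed_text">HERE</td><td class="word fixed_text">HERE 2</td>';

// Greedy: .* runs to the end of the line, then backtracks to the LAST </td>.
preg_match(',class="word fixed_text">(.*)</td>,', $string, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST </td>.
preg_match(',class="word fixed_text">(.*?)</td>,', $string, $lazy);

// Negated class: [^<]* cannot run past the next "<", so no backtracking is needed.
preg_match(',class="word fixed_text">([^<]*)</td>,', $string, $negated);

echo $greedy[1], "\n";   // HERE</td><td class="word fixed_text">HERE 2
echo $lazy[1], "\n";     // HERE
echo $negated[1], "\n";  // HERE
```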
  7. Hi Codemunkie! Here is some working code for you:

    Code:
        <?php
        $string='
        class="mw-redirect">Ahri</a></span></span></span>
        class="mw-redirect">Galio</a></span></span></span>
        class="mw-redirect">Graves</a></span></span></span>
        class="mw-redirect">Katarina</a></span></span></span>
        class="mw-redirect">Kog\'Maw</a></span></span></span>
        class="mw-redirect">LeBlanc</a></span></span></span>
        class="mw-redirect">Shen</a></span></span></span>
        class="mw-redirect">Skarner</a></span></span></span>
        class="mw-redirect">Soraka</a></span></span></span>
        class="mw-redirect">Warwick</a></span></span></span>
        ';
        $regex='#class="mw-redirect">(.*?)</a>#';
        preg_match_all($regex, $string, $matches, PREG_PATTERN_ORDER);
        $sz=count($matches[1]);
        for ($i=0;$i<$sz;$i++) echo $matches[1][$i]."<br />";
        ?>

    Output:
        Ahri
        Galio
        Graves
        Katarina
        Kog'Maw
        LeBlanc
        Shen
        Skarner
        Soraka
        Warwick

    The tweaks:
    - you needed a loop to iterate through the captures
    - I made your dot-star (.*) lazy (the question mark). (Here is my page on greedy and lazy quantifiers if you'd like to read up on them.) Otherwise, the "greedy star" would cause the regex engine to roll all the way down to the end of the string, then to backtrack to the last </a> etc, giving you a much longer capture than you want.

    Let me know if you have any questions.
  8. Pikachu is right, but just for the record, a regex version:

        <?php
        $regex=',[^[:alnum:]],';
        echo preg_match($regex,"almost_alnum");
        ?>

    The output is 1 because of the underscore in almost_alnum.
  9. Actually, when I run the code, everything is transformed. One small change, though. In the second replace, I used the same replacement string as in the third (copy-paste). That's a mistake, as the url in the second replacement is already formed. We need to drop the http from the replacement string:

        <a href="\0">\0</a>

    This gives us:

        $x=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="\0">\0</a>',$x);

    Now here's what happens if we run that ugly string of yours through these regexes:

    Code:
        <?php
        $x='<p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);">View the reply at:&nbsp;</span><a style="color: #1155cc; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);" href="http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new" target="_blank">http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new<br /><br /></a>adsdsadsa<br /><br />http://link1.com/<br /><br />www.link2.com<br /><br />&nbsp;</p>';
        $x=preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
        $x=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="\0">\0</a>',$x);
        echo htmlentities($x).'<br /><br />';
        ?>

    Output:
        <p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);">View the reply at:&nbsp;</span><a style="color: #1155cc; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);" href="<a href="http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new&quot">http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new&quot</a>; target="_blank"><a href="http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new<br">http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new<br</a> /><br /></a>adsdsadsa<br /><br /><a href="http://link1.com/<br">http://link1.com/<br</a> /><br /><a href="http://www.link2.com<br">www.link2.com<br</a> /><br />&nbsp;</p>

    Unless I've missed something, everything has been replaced. Now, it's also true that some weird elements have been converted into links, e.g. the end of www.link2.com<br

    1. This is the nature of the original regex you provided: the [^<>[:space:]]+ will eat all kinds of characters. As I mentioned, all I did was add checks to the effect that the replaced urls are not already part of existing formed links, as you requested.
    2. I am not sure whether this needs to be fixed, because I don't know if you are really applying the regex to that ugly string, or if it was just an "escaped" version that you pasted in your last post for some reason.

    Warmest wishes,
  10. Hi again Andy, also: Just so you know, ereg_replace is deprecated. In your code above, in the first replacement (the mailto), without looking at the regex itself, you should be able to substitute preg_replace where it says ereg_replace, as long as the pattern is wrapped in delimiters (preg_replace requires them). Wishing you a fun day.
  11. Hey Andy, For the second replacement line, we'll have to be a bit more specific than the [[:alpha:]]+ before the // and specify a protocol. Other than that, I assumed you're happy with the way the url is matched (one way in a million) and only added code to make sure the url is not already part of a link. This gives us the following (to replace your second replacement line):

        $x=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);

    It checks that the url is not preceded by =" and not followed by </a>.

    For the third replacement line, that line already checks that the www is not preceded by //, taking care of the "not preceded by" check. Adding a check for "not followed by </a>", you get:

        $x=preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);

    I checked those by running the replacement twice on sample urls to make sure they only get replaced once:

    Code:
        <?php
        $x='www.google.com/test';
        $y='<a href="http://www.google.com?q=f">http://www.google.com?q=f</a>';
        for($i=1;$i<3;$i++) {
            $x=preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
            echo htmlentities($x).'<br />';
        }
        for($i=1;$i<3;$i++) {
            $y=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$y);
            echo htmlentities($y).'<br />';
        }
        ?>

    Output:
        <a href="http://www.google.com/test">www.google.com/test</a>
        <a href="http://www.google.com/test">www.google.com/test</a>
        <a href="http://www.google.com?q=f">http://www.google.com?q=f</a>
        <a href="http://www.google.com?q=f">http://www.google.com?q=f</a>

    Again, there are a million ways of matching urls and I am only modifying what you have. Let me know if you have any questions or problems.
  12. On the other hand, sometimes the silence can be quite deep.
  13. Hi Andy, Since there are a million ways to match a url, here is a solution that uses the same way you have in your original code. Just add this line before the return statement. It matches lines that have www dot, unless they are preceded by //, as those lines have already been turned into links by your earlier regex.

        $x=preg_replace(',(?<!//)www\.[^<>[:space:]]+[[:alnum:]/],i','<a href="http://\0">\0</a>',$x);

    This is only one in a thousand ways to do this. Please let me know if you need more help with it.
  14. Hi AyKay, hi Salathe, Just because I have nothing better to do before breakfast this morning and I love regex puzzles, here is an answer that addresses nesting.

    Input:
        Replace 37.2 or $10 not {5} nor {a 6} nor {{7}} nor {an {8} or {{9}}!} but {unbalanced 10

    Code:
        <?php
        $string='Replace 37.2 or $10 not {5} nor {a 6} nor {{7}} nor {an {8} or {{9}}!} but {unbalanced 10';
        $regex=',([^{]++)({(?:([^{}]*+)(?:(?-2)(?-1))*)})?,';
        echo preg_replace_callback($regex,'call_me',$string);
        function call_me ($m) {
            $cleaned=preg_replace(',\d+(?:\.\d+)?,','{NUMBER:\0}',$m[1]);
            $donttouch=isset($m[2])?$m[2]:'';
            return $cleaned.$donttouch;
        }
        ?>

    Output:
        Replace {NUMBER:37.2} or ${NUMBER:10} not {5} nor {a 6} nor {{7}} nor {an {8} or {{9}}!} but {unbalanced {NUMBER:10}

    A note for regex lovers: the recursive part of the expression is home-made, rather than the one everyone copies from Jeffrey's book. In my benchmarks, it matches about 20 percent faster, but fails about 20 percent slower.

    For my taste Aykay's solution is still a perfect answer until the OP gives signs of life. An expression like the one in this post is a mixed blessing: it works, but it's hard to explain, and even harder to maintain if the person who receives it doesn't fully understand it.

    Wishing you all a beautiful day!
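For readers who have not met recursive patterns before, here is a much smaller sketch of the idea. It uses the widely seen (?R) form (recurse into the whole pattern) rather than the home-made expression from the post, and the sample string is made up for the illustration.

```php
<?php
$string = 'a {b {c} d} e {f';

// An opening brace, then any mix of non-brace runs and nested balanced groups
// (the (?R) call re-enters the whole pattern), then a closing brace.
$regex = '~\{(?:[^{}]++|(?R))*\}~';

// The balanced groups themselves.
preg_match_all($regex, $string, $m);
print_r($m[0]);          // one element: {b {c} d}

// The chunks outside the balanced groups; the unbalanced "{f" is left alone.
$chunks = preg_split($regex, $string);
print_r($chunks);        // "a " and " e {f"
```

Splitting on the balanced groups is one way to get the "cut the string into chunks" approach: the replacement can then be applied to each chunk without touching anything inside braces.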
  15. Wow, a fun little regex puzzle. I do see both of you guys' points. From an overall "web forum contribution ethos" standpoint, I confess that I'd approach the question like Aykay... Trying to keep it simple first before pulling out the big guns... But everyone is different, as can be seen by the difference in style and depth of responses on this very forum. For my own taste in regex, I found Aykay's solution elegant and relatively lightweight. (But taste is a personal matter.) If you wanted to kick yourself in the shins and go "all the way" just for the hell of it, that would be a bit of a mission, wouldn't it? Without fully exploring the question, I imagine that I'd have to cut the string into chunks, using a recursive pattern to identify (and leave out) the bits with curly braces. Thanks for the chance to rave about the choices we make when we answer questions. When the OP disappears, as often happens, it can feel like speaking in the dark... Unless someone else from the forum chimes in. Which happens all the time, and I appreciate that. (Thanks guys.) Wishing you all a fun day
  16. You're welcome, glad it works. And sorry for taking you in another direction originally, I too missed it on first glance.
  17. Sorry, I missed the original problem. Use $twitter=preg_replace( etc. That will fix it. Here is a working example.

        <?php
        $twitter = '@whatever';
        if (preg_match("/^@/", $twitter)) {
            $twitter=preg_replace('/@/','twitter.com/',$twitter);
        } else {
            $twitter = "twitter.com/".$twitter;
        }
        echo $twitter.'<br />';
        ?>
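If the goal is only to normalise a handle with or without a leading "@", the same result can be had without regex. A minimal sketch, assuming PHP 7 or later; the function name twitter_url is hypothetical. Unlike the preg_replace('/@/', ...) call above, it only touches "@" characters at the start of the string.

```php
<?php
// Hypothetical helper: strip any leading "@" and prefix the host.
function twitter_url(string $handle): string
{
    return 'twitter.com/' . ltrim($handle, '@');
}

echo twitter_url('@whatever'), "\n";  // twitter.com/whatever
echo twitter_url('whatever'), "\n";   // twitter.com/whatever
```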
  18. Cool, glad to hear you figured it out.
  19. Hi livethedead, It sounds like it could be an encoding problem, where what you see is not what you think it is. Before your preg_match, try running this to see the actual components of the string. If there is something odd, it should jump out.

        for($i=0;$i<strlen($twitter);$i++) echo "{{$twitter[$i]}}" . ord($twitter[$i]) . " <br />\n";
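As an illustration of what "something odd" can look like, here is a sketch with an invented input: a UTF-8 non-breaking space (bytes C2 A0) sitting invisibly before the "@". The actual cause in the original thread is not known; bin2hex() is simply a compact alternative to the ord() loop above.

```php
<?php
// Assumed example input: a non-breaking space (UTF-8 bytes C2 A0) hides before the "@".
$twitter = "\xC2\xA0@whatever";

// The anchor fails because the first byte is not "@".
var_dump(preg_match('/^@/', $twitter));   // int(0)

// Dumping the raw bytes shows the two stray bytes in front of 40 ("@").
echo bin2hex($twitter), "\n";             // c2a0407768617465766572
```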
  20. Hi TLG, Walking out the door to go hiking, but wanted to give you a quick answer: find a table of html characters and look up the hex code for >. Let's say it's 65 (it's not); then in the character class you can use \x65. If that doesn't work, it's probably an encoding story: you'll need the u (for unicode) at the end of the pattern, and someone should be able to help you. For unicode, what you put in the class looks like this: \x{201A} (wrong code though).
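A short sketch of both escape forms. The ">" character is decimal 62 in ASCII, which is 3E in hexadecimal, so the escape is \x3E. U+201A is kept only as a syntax placeholder, as in the post; the sample strings are invented.

```php
<?php
// ">" is decimal 62 in ASCII, i.e. hex 3E; \x takes the hexadecimal value.
$hasGt = preg_match('/[\x3E]/', 'a > b');

// With the u modifier, \x{...} names a Unicode code point.
// U+201A is encoded in UTF-8 as the bytes E2 80 9A.
$hasLowQuote = preg_match('/[\x{201A}]/u', "low quote: \xE2\x80\x9A");

var_dump($hasGt, $hasLowQuote);   // int(1) int(1)
```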
  21. Hi Smerny, Try this:

    Code:
        <?php
        $string='"vid":"066U0000000UG0I" "blahblah":"blah""csrf":"XpN.tQFYKrcAay1y6N1kSkg01QU.z9z2iV03dP_ukwA3SHZJ. uslyqrth.nmX_gAQDt1U.k4Vui3uinpULS.MjKVrXX8ifrU9h Z8MqxaCBau7uxhzKcJttctsXkyfdRus2BQtHr8g.u2v_nDOCP GWCgIvY4=" __sfdcSessionId = \'00DU0000000HgD9!ARgAQMPKwJ6Q.kqAWj4M0ikwvii9RTnvxGMD4mw3BV9VIT9xs 3ywp6.TwCEet6s8rU.f7lKMLl8AjJ9D_cyDSkllEJh783ux\';';
        $regex[0]=',"vid":"([^"]+),';
        $regex[1]=',"csrf":"([^"]+),';
        $regex[2]=",__sfdcSessionId\s*=\s*'([^']+),";
        foreach ($regex as $r) {
            preg_match($r, $string, $m);
            echo $m[1].'<br />';
        }
        ?>

    Output:
        066U0000000UG0I
        XpN.tQFYKrcAay1y6N1kSkg01QU.z9z2iV03dP_ukwA3SHZJ. uslyqrth.nmX_gAQDt1U.k4Vui3uinpULS.MjKVrXX8ifrU9h Z8MqxaCBau7uxhzKcJttctsXkyfdRus2BQtHr8g.u2v_nDOCP GWCgIvY4=
        00DU0000000HgD9!ARgAQMPKwJ6Q.kqAWj4M0ikwvii9RTnvxGMD4mw3BV9VIT9xs 3ywp6.TwCEet6s8rU.f7lKMLl8AjJ9D_cyDSkllEJh783ux

    Please let me know if this works for you. If it is not matching in some cases, it means you have variation in your format (e.g. extra spaces, or different delimiters). Just post these problem cases, and I (or whoever is watching the board at that time) should be able to fix it.
  22. Hey guys! Without reading the details, a couple of thoughts about the expression itself in the spirit of exploration and fine-tuning. (Nothing wrong with AyKay's expression!)

        $pattern = '~([^.,])\b([a-zA-Z-]+?)\b(:)~';

    1. You can drop the "lazy quantifier" (?), as there is no risk that the character class will ever roll over what follows (a word boundary and a colon). You can be greedy here; the engine will work a little faster, as lazy matching involves checking ahead and backtracking.
    2. I've read that case insensitive is a little faster than [a-zA-Z], not that you would notice the difference unless you ran the code a million times.
    3. You could make the quantifier possessive by adding a plus; it will fail a little faster.

    With those in, you get:

        $pattern = '~([^.,])\b([a-z-]++)\b(:)~i';

    The word boundaries are forcing the string in [a-z-]+ to start and end with a letter (it cannot start or end with a dash). Assuming this is what you want.

    I haven't read the thread in detail so I don't know how the regex performs for the task at hand. These are just optional tweaks for the regex itself (which is already a fine regex as it is). Wishing you all a fun weekend!
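To check that the three quantifier styles really capture the same thing here, a small sketch on a reduced pattern: the leading ([^.,]) group is left out, the colon after the word boundary is taken from the description above, and the sample sentence is invented.

```php
<?php
$subject = 'see note: here';

$patterns = [
    'lazy'       => '~\b([a-z-]+?)\b:~i',   // grows one character at a time
    'greedy'     => '~\b([a-z-]+)\b:~i',    // grabs the whole word, may backtrack
    'possessive' => '~\b([a-z-]++)\b:~i',   // grabs the whole word, never gives back
];

$captures = [];
foreach ($patterns as $name => $pattern) {
    preg_match($pattern, $subject, $m);
    $captures[$name] = $m[1];
}

print_r($captures);   // all three entries are "note"
```

The class [a-z-] cannot match the colon, so giving characters back can never help the match; that is why the possessive form is safe in this case.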
  23. Hi Silkfire, hi abareplace! Silkfire, that's a really fun trick, thank you for sharing it. Exploring your idea, I'm sure you know this, but for the "knowledge base", your array_pop(explode('-', $s1)), $id) can be accomplished in regex. You would use something like this at the beginning of the pattern, making the regex eat up all sequences of non-dashes followed by a dash.

        (?:[^-]*+-)*+

    In the original code, this would give us:

        <?php
        $s1='http://domain.com/xxx-c-39.html';
        $s2='http://domain.com/xxx-c-38_107.html';
        $regex=',(?:[^-]*+-)*+(\d+)(?:_(\d+))?,';
        preg_match($regex,$s1,$m);
        echo $m[1].'<br />';
        preg_match($regex,$s2,$m);
        echo $m[1].'<br />';
        echo $m[2].'<br />';
        ?>

    A few considerations for the knowledge base:
    - If you're using this idea (either in this format or Silkfire's), you must know in advance that the stem of the string contains the right domain (as it is no longer validated by the regex).
    - For the array version, you may need an ini_set('display_errors', 0); as PHP 5.3.8 complains like so: "Strict Standards: Only variables should be passed by reference".
    - The array version will take twice as long to run, but you'll only notice if you run it 100,000 times.

    Silkfire, thank you for the stimulating idea, it's fun to explore options. Wishing you all a fun day
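The "Only variables should be passed by reference" notice comes from handing the result of explode() straight to array_pop(), which takes its argument by reference. Instead of hiding the message, the array can be stored in a variable first. A minimal sketch, assuming PHP 7 or later; the function name ids_after_last_dash is hypothetical and Silkfire's full code is not shown in this listing, so this is only a guess at its shape.

```php
<?php
// Hypothetical helper: return the id(s) that follow the last dash of the url.
function ids_after_last_dash(string $url): array
{
    $parts = explode('-', $url);   // keep the array in a variable...
    $tail  = array_pop($parts);    // ...so array_pop() receives a real reference
    preg_match('/^(\d+)(?:_(\d+))?/', $tail, $m);
    return array_slice($m, 1);     // drop the full match, keep the captures
}

print_r(ids_after_last_dash('http://domain.com/xxx-c-39.html'));      // 39
print_r(ids_after_last_dash('http://domain.com/xxx-c-38_107.html'));  // 38 and 107
```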