The Little Guy Posted February 9, 2012

I am writing a CSS highlighter. It works, except in a case like the one below, where there is a colon both outside and inside the braces: the part outside the braces gets messed up and the raw HTML is displayed. Example (bottom of page): http://phplive.org/phpLive/examples/misc/highlight.php

    a.link:hover{
        text-decoration: underline;
    }

Here is what I have so far:

    $find = array(
        "/([a-zA-Z-].+?)(:)/",
        "/&#039;.+?&#039;/",
        "/&quot;.+?&quot;/",
        "/([.#:>a-zA-Z0-9].+?)(\{)/",
    );
    $replace = array(
        '<span style="color:#0000ff;font-weight:bold;">$1</span>$2',
        '<span style="color:#ce7b00;">$0</span>',
        '<span style="color:#ce7b00;">$0</span>',
        '<span style="color:#007c00;font-weight:bold;">$1</span>$2',
    );
    $this->quickString = preg_replace($find, $replace, htmlentities($content, ENT_QUOTES));

What I am thinking of doing (for the first pattern in the array) is telling it to match only between { and } and to ignore everything else, but I am not sure how to do that. How can I do that? If that isn't a good way to do it, do you have any better suggestions for me?

Thanks!

Link to comment https://forums.phpfreaks.com/topic/256747-match-between/
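One way to sketch the "only match between { and }" idea from the question is a two-pass replace: an outer preg_replace_callback finds each brace block, and the property pattern then runs only on the text inside it. This is a minimal sketch, not the approach taken in the replies; the function name highlightProperties is hypothetical, and it assumes blocks are not nested and that values do not contain a word directly followed by a colon (a URL such as http://... would be caught).

```php
<?php
// Hypothetical helper name; not part of the original library.
function highlightProperties($css)
{
    // Outer pass: find each {...} block. [^{}]* assumes blocks are not nested.
    return preg_replace_callback('~\{([^{}]*)\}~', function ($m) {
        // Inner pass: runs only on the text between the braces,
        // so "a.link:hover" outside the braces is never touched.
        $body = preg_replace(
            '~([a-zA-Z-]+)(\s*:)~',
            '<span style="color:#0000ff;font-weight:bold;">$1</span>$2',
            $m[1]
        );
        return '{' . $body . '}';
    }, $css);
}

$highlighted = highlightProperties("a.link:hover{ text-decoration: underline; }");
echo $highlighted, "\n";
```

The selector part stays as plain text here; a second pattern for selectors could be applied to whatever lies outside the brace blocks in the same way.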
AyKay47 Posted February 9, 2012

I have put together this regex, which uses word boundaries to grab the text to replace.

    $str = "a.link:hover{text-decoration: underline; color: #222; font-weight: bold;}";
    $pattern = '~[^.]\b([a-zA-Z-]+?)\b(:)~';
    $replacement = '<span style="color:#0000ff;font-weight:bold;">$1</span>$2';
    echo preg_replace($pattern, $replacement, $str);

My only concern with this is that it will remove the opening brace of the CSS code, since the [^.] in front of the word boundary consumes it. I am working on a solution for that, but this can get you going for now.
The Little Guy (Author) Posted February 9, 2012

Nice! That seems to do the trick! A few questions:

1. How does it know to grab between the braces?
2. What is causing it to remove the formatting, such as the tabs?
AyKay47 Posted February 9, 2012

1. Well, this regex took a little bit of trial and error. A word boundary (\b) placed to the left of an alphanumeric character matches only if a non-alphanumeric character is immediately to the left of it. The same goes for a boundary placed to the right of an alphanumeric character: it matches only if a non-alphanumeric character immediately follows. Since the text you want to replace always sits between a space, curly brace, colon, or semicolon, all of which are non-alphanumeric, I knew that word boundaries would match only those cases. I had to add [^.] at the beginning of the regex so it would not match a.link: the word boundary would see the non-alphanumeric period (.) followed by an alphanumeric character (l) and would match that case, which we do not want.

2. I believe the [^.] is grabbing the tab before the CSS property. What you can do is capture this character and back-reference it in the replacement string.

    $str = "a.link:hover{ text-decoration: underline; color: #222; }";
    $pattern = '~([^.,])\b([a-zA-Z-]+?)\b(:)~';
    $replacement = '$1<span style="color:#0000ff;font-weight:bold;">$2</span>$3';
    echo preg_replace($pattern, $replacement, $str);

This should add the tab back into the string.

Edit: Thinking about this, I have made the regex a little more robust so it also allows commas, for CSS rules with multiple selectors.

    $str = "a.link:hover, a.link:active{ text-decoration: underline; color: #222; }";
    $pattern = '~([^.,])\b([a-zA-Z-]+?)\b(:)~';
    $replacement = '$1<span style="color:#0000ff;font-weight:bold;">$2</span>$3';
    echo preg_replace($pattern, $replacement, $str);
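The capture-and-reinsert step described in point 2 can be checked side by side with the version that drops the character. This is a minimal sketch: the sample string with real tab characters is an assumption, and the short <b> tag stands in for the longer span only to keep the output readable.

```php
<?php
// Assumed sample input; "\t" stands in for the indentation in a real stylesheet.
$css = "a.link:hover{\n\ttext-decoration: underline;\n\tcolor: #222;\n}";

// Without a capture group, the character matched by [^.,] is consumed and lost.
$lossy = preg_replace('~[^.,]\b([a-zA-Z-]+?)\b(:)~', '<b>$1</b>$2', $css);

// Capturing it as $1 and writing it back keeps the tab (or brace) in place.
$kept = preg_replace('~([^.,])\b([a-zA-Z-]+?)\b(:)~', '$1<b>$2</b>$3', $css);

echo $lossy, "\n---\n", $kept, "\n";
```

In both runs "hover" is left alone, because the word that follows a colon outside the braces is not itself followed by a colon.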
AyKay47 Posted February 9, 2012

Let me know if this has worked out for you.
The Little Guy (Author) Posted February 9, 2012

It seems to be working! Take a look: http://phplive.org/phpLive/examples/misc/highlight.php

Let me know what you think. A little off topic: I can now style CSS within the HTML!

Thanks for the help! You're awesome!
AyKay47 Posted February 9, 2012

> It seems to be working! Take a look: http://phplive.org/phpLive/examples/misc/highlight.php Let me know what you think. A little off topic: I can now style CSS within the HTML! Thanks for the help! You're awesome!

It looks to be working nicely! If I think of any improvements to add to the regex, I will post them in this thread.
ragax Posted February 9, 2012

Hey guys!

> If I think of any improvements to add to the regex I will post them on this thread.

Without having read the details, here are a couple of thoughts about the expression itself, in the spirit of exploration and fine-tuning. (Nothing wrong with AyKay's expression!)

    $pattern = '~([^.,])\b([a-zA-Z-]+?)\b(:)~';

1. You can drop the "lazy quantifier" (?), as there is no risk that the character class will ever roll over what follows (a word boundary and a colon). You can be greedy here; the engine will work a little faster, since lazy matching involves checking ahead and backtracking.
2. I've read that the case-insensitive modifier is a little faster than [a-zA-Z], not that you would notice the difference even if you ran the code a million times.
3. You could make the quantifier possessive by adding a plus, so that it fails a little faster.

With those in, you get:

    $pattern = '~([^.,])\b([a-z-]++)\b(:)~i';

The word boundaries force the string matched by [a-z-]+ to start and end with a letter (it cannot start or end with a dash), assuming this is what you want. I haven't read the thread in detail, so I don't know how the regex performs for the task at hand. These are just optional tweaks to the regex itself (which is already a fine regex as it is).

Wishing you all a fun weekend!
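A quick way to convince yourself that the lazy, greedy, and possessive variants discussed above are interchangeable for this pattern is to run all three over the same input and compare the results. This is a minimal sketch with an assumed sample string; it checks equality of output only and says nothing about speed.

```php
<?php
// Assumed sample input with two selectors and two declarations.
$css = "a.link:hover, a.link:active{\n\ttext-decoration: underline;\n\tcolor: #222;\n}";
$replacement = '$1<b>$2</b>$3';

$patterns = array(
    'lazy'       => '~([^.,])\b([a-zA-Z-]+?)\b(:)~',
    'greedy'     => '~([^.,])\b([a-z-]+)\b(:)~i',
    'possessive' => '~([^.,])\b([a-z-]++)\b(:)~i',
);

$results = array();
foreach ($patterns as $name => $pattern) {
    $results[$name] = preg_replace($pattern, $replacement, $css);
}

// The colon is not in the character class, so only one match length can
// ever succeed and all three quantifiers end up with the same matches.
var_dump($results['lazy'] === $results['greedy']);
var_dump($results['greedy'] === $results['possessive']);
```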
The Little Guy (Author) Posted February 9, 2012

Taking this CSS:

    p > a{
        color: red;
    }

and this regex:

    /([.#:>\-_, a-zA-Z0-9 ]+?)(\{)/

the "p >" doesn't get highlighted, but the "a" does get highlighted. Any thoughts why?
The Little Guy (Author) Posted February 9, 2012

I just realized I run htmlentities first, so the > has already become &gt; by the time the regex sees it. Somehow I need to add &gt; to the character class, but how?
ragax Posted February 9, 2012

Hi TLG,

I'm walking out the door to go hiking, but wanted to give you a quick answer: find a table of HTML characters and look up the hex ASCII code for >. Let's say it's 65 (it's not); then in the character class you can use \x65. If that doesn't work, it's probably an encoding issue: you'll need the u modifier (for Unicode) at the end of the pattern, and someone should be able to help you. For Unicode, what you put in the class looks like this: \x{201A} (wrong code, though).
AyKay47 Posted February 10, 2012

> Taking this CSS: p > a{ color: red; } and this Regex: /([.#:>\-_, a-zA-Z0-9 ]+?)(\{)/ the p > doesn't get highlighted, but the a does get highlighted. Any thoughts why?

This:

    $str = "p > a{ color: red; }";
    $pattern = '~([.#:>\-_,a-z0-9 ]+)({)~i';
    $replacement = '<span style="color:#333;font-weight:bold;">$1</span>$2';
    echo preg_replace($pattern, $replacement, $str);

works for me (I tweaked it a tad).

Edit: edited for > (you can also use the hex value if you wish, which would be \x{003E}, as playful suggested, but make sure the u modifier is appended to the regex):

    $pattern = '~([.#:>\-_,a-z0-9& ]+)({)~i';
AyKay47 Posted February 10, 2012

A correction to the above post: the last regex is not meant to be there. My mistake...
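For the &gt; question above, one possible approach is to keep running htmlentities first and list the entity as an alternative next to the character class, since a class only ever matches one character at a time. This is a minimal sketch under that assumption, not the fix used in the thread; the colour value is taken from the original $replace array.

```php
<?php
$css = "p > a{ color: red; }";

// After escaping, the child combinator is the four-character text "&gt;".
$escaped = htmlentities($css, ENT_QUOTES);

// (?:&gt;|[...])+ accepts either the whole entity or one character
// from the class at each step, up to the opening brace.
$pattern = '~((?:&gt;|[.#:\-_, a-z0-9])+)(\{)~i';
$replacement = '<span style="color:#007c00;font-weight:bold;">$1</span>$2';

$highlighted = preg_replace($pattern, $replacement, $escaped);
echo $highlighted, "\n";
```

Other entities that can appear in a selector (for example &quot; inside an attribute selector) would need their own alternatives in the same group.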
abareplace Posted February 10, 2012

The Little Guy, you should use a lexer here. There are too many edge cases where regexes will not work. highlight.js is a nice ready-to-use highlighter that does lexical analysis.
The Little Guy (Author) Posted February 10, 2012

> The Little Guy, you should use a lexer here. There are too many edge cases where regexes will not work. highlight.js is a nice ready-to-use highlighter that does lexical analysis.

I am not making a website that needs this. I am building a PHP library that has a highlight function in it (library link in my signature; it is still in the alpha stages, but already very powerful).
kicken Posted February 10, 2012

Regardless of whether you're making a site or a library, regex is not the right tool for the job here. You should essentially parse the CSS code into tokens and then apply the formatting. How accurate your parser needs to be depends on how accurate you want your highlighting to be. A simple parser that separates out selectors, properties, and values shouldn't be too hard to write as a starting point.
The Little Guy (Author) Posted February 10, 2012

> Regardless of if your making a site or a library, regex is not the right tool for the job here. You should essentially parse the CSS codes into tokens then apply the formatting.

I'm not quite sure what you mean here; could you explain? What do you mean by "parse the CSS codes into tokens"?

Edit: After reading Wikipedia, it sounds like you're saying to make a dictionary and highlight according to the dictionary.
kicken Posted February 11, 2012

> What do you mean by "parse the CSS codes into tokens"?

You break it down into its fundamental parts using a parser script (aka a lexer). For example:

    p > a { color: red; }

    p > a   is a selector token
    {       is a begin-ruleset token
    color   is a property-name token
    red     is a property-value token
    ;       is an end-statement token
    }       is an end-ruleset token

You would create a lexer that breaks the CSS string down into tokens like that. Then you can reassemble the string from the tokens while applying whatever coloring or formatting you need around each token value.
kicken Posted February 11, 2012

Super simplistic example:

    <?php
    function tokenizeCss($str){
        $tokens = array();
        $len = strlen($str);
        $i = 0;
        $state = 'selector';   // type of the text currently being collected
        $newState = null;
        $tokenValue = '';
        while ($i < $len){
            $ch = $str[$i];
            switch ($ch){
                case '{':
                    // flush the collected selector, then emit the brace itself
                    $tokens[] = array('type' => $state, 'value' => $tokenValue);
                    $tokens[] = array('type' => 'ruleset-begin', 'value' => '{');
                    $state = 'ruleset';
                    $tokenValue = '';
                    break;
                case '}':
                    // flush the collected ruleset body, then emit the brace itself
                    $tokens[] = array('type' => $state, 'value' => $tokenValue);
                    $tokens[] = array('type' => 'ruleset-end', 'value' => '}');
                    $state = 'selector';
                    $tokenValue = '';
                    break;
                default:
                    $tokenValue .= $ch;
            }
            $i++;
        }
        // whatever is left after the last brace
        if (!empty($tokenValue)){
            $tokens[] = array('type' => $state, 'value' => $tokenValue);
        }
        return $tokens;
    }

    $css = '
    p > a{
        color: red;
    }
    a.link:hover{
        text-decoration: underline;
    }
    ';

    $tokens = tokenizeCss($css);
    $colors = array(
        'selector' => 'red',
        'ruleset' => 'blue',
        'ruleset-begin' => 'orange',
        'ruleset-end' => 'orange'
    );

    // reassemble the string, wrapping each token in its color
    foreach ($tokens as $tok){
        $color = $colors[$tok['type']];
        echo '<span style="color: '.$color.';">'.$tok['value'].'</span>';
    }
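The 'ruleset' token in the example above still holds the whole body between the braces. A possible next step toward the property and value tokens listed earlier is to split that body on semicolons and then on the first colon. This is a minimal sketch; the function name tokenizeDeclarations and the token type names are hypothetical, whitespace is trimmed away rather than preserved, a statement-end token is emitted even if the last declaration omits its semicolon, and semicolons inside quoted strings are not handled.

```php
<?php
// Hypothetical helper: breaks the text between { and } into finer tokens.
function tokenizeDeclarations($body)
{
    $tokens = array();
    foreach (explode(';', $body) as $declaration) {
        if (trim($declaration) === '') {
            continue; // skip the empty piece after the last semicolon
        }
        // A limit of 2 keeps colons inside the value (e.g. in a URL) intact.
        $parts = explode(':', $declaration, 2);
        if (count($parts) < 2) {
            // No colon at all: keep the text, but flag it as unrecognised.
            $tokens[] = array('type' => 'unknown', 'value' => trim($declaration));
            continue;
        }
        $tokens[] = array('type' => 'property', 'value' => trim($parts[0]));
        $tokens[] = array('type' => 'value', 'value' => trim($parts[1]));
        $tokens[] = array('type' => 'statement-end', 'value' => ';');
    }
    return $tokens;
}

// The URL is a placeholder used only to show that its colon survives.
$tokens = tokenizeDeclarations(" color: red; background: url(http://example.com/a.png); ");
print_r($tokens);
```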
The Little Guy (Author) Posted February 11, 2012

Okay! Thank you! I will have to mess around with that. It seems fairly simple!
The Little Guy (Author) Posted February 11, 2012

Okay, what would you do for something like a comment, where you need to check two characters?
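One common way to handle a two-character delimiter in a character-by-character loop is to peek ahead with substr, which never reads past the end of the string. This is a minimal sketch that only separates comments from everything else; the function name splitComments and the 'code' token type are hypothetical, and the logic would still have to be merged into the tokenizer's state machine.

```php
<?php
// Hypothetical helper: splits a string into 'comment' and 'code' chunks.
function splitComments($str)
{
    $chunks = array();
    $len = strlen($str);
    $i = 0;
    $buffer = '';
    while ($i < $len) {
        // substr() returns at most the characters that exist, so peeking
        // two characters ahead is safe even on the last character.
        if (substr($str, $i, 2) === '/*') {
            if ($buffer !== '') {
                $chunks[] = array('type' => 'code', 'value' => $buffer);
                $buffer = '';
            }
            $end = strpos($str, '*/', $i + 2);
            // An unterminated comment runs to the end of the input.
            $stop = ($end === false) ? $len : $end + 2;
            $chunks[] = array('type' => 'comment', 'value' => substr($str, $i, $stop - $i));
            $i = $stop;
            continue;
        }
        $buffer .= $str[$i];
        $i++;
    }
    if ($buffer !== '') {
        $chunks[] = array('type' => 'code', 'value' => $buffer);
    }
    return $chunks;
}

$input = "a{ /* note */ color: red; }";
$chunks = splitComments($input);
print_r($chunks);
```

Jumping straight to the closing */ with strpos also means braces and colons inside a comment never reach the rest of the tokenizer.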
The Little Guy (Author) Posted February 11, 2012

Here is what I have: http://phplive.org/phpLive/test.php

    <?php
    function tokenizeCss($str){
        $tokens = array();
        $len = strlen($str);
        $i = 0;
        $state = 'selector';
        $prevState = "";      // state to return to when a comment ends
        $newState = null;
        $tokenValue = '';
        $commenting = false;  // true while inside /* ... */
        $value = false;
        while ($i < $len){
            switch ($str[$i]){
                case '{':
                    if(!$commenting){
                        $tokens[] = array('type' => $state, 'value' => $tokenValue);
                        $tokens[] = array('type' => 'ruleset-begin', 'value' => '{');
                        $state = 'ruleset';
                        $tokenValue = '';
                    }else{
                        // braces inside a comment are just text
                        $tokenValue .= $str[$i];
                    }
                    break;
                case '}':
                    if(!$commenting){
                        $tokens[] = array('type' => $state, 'value' => $tokenValue);
                        $tokens[] = array('type' => 'ruleset-end', 'value' => '}');
                        $state = 'selector';
                        $tokenValue = '';
                    }else{
                        $tokenValue .= $str[$i];
                    }
                    break;
                default:
                    // a colon inside a ruleset ends the property name
                    if($str[$i] == ":" && !$commenting && $state == "ruleset"){
                        $value = true;
                        $state = "value";
                        $tokens[] = array('type' => "ruleset", 'value' => $tokenValue.":");
                        $tokenValue = "";
                    }
                    // substr() avoids reading past the end of the string
                    if(substr($str, $i, 2) == "/*" && !$commenting){
                        $commenting = true;
                        $prevState = $state;
                        $state = "comment";
                    }
                    if(substr($str, $i, 2) == "*/" && $commenting){
                        $commenting = false;
                        $tokens[] = array('type' => $state, 'value' => $tokenValue."*/");
                        $state = $prevState;
                        $tokenValue = "";
                        $i++;   // skip the "/" of the closing "*/"
                    }else{
                        if($prevState == "value" && $str[$i] == ";" && !$value){
                            $value = false;
                            $tokens[] = array('type' => $prevState, 'value' => $tokenValue);
                            $tokenValue = "";
                            $state = "ruleset";
                        }else{
                            if($str[$i] == ":" && $state == "value"){
                                // Removes extra colon in value
                            }else{
                                $tokenValue .= $str[$i];
                            }
                        }
                    }
            }
            $i++;
        }
        if (!empty($tokenValue)){
            $tokens[] = array('type' => $state, 'value' => $tokenValue);
        }
        return $tokens;
    }

    $css = 'p > a{
        color: red;
    }
    a.link:hover{
        text-decoration: underline;
    }
    /*
    a.link:hover{
        text-decoration: underline;
    }
    */
    p > a{
        color: red;
    }
    a.link{
        /*text-decoration: underline;*/
        text-decoration: none;
    }
    ';

    $tokens = tokenizeCss($css);
    //print_r($tokens);
    $styles = array(
        'selector' => 'font-weight: bold;color: #007c00',
        'ruleset' => 'color: #0000ff;',
        'ruleset-begin' => 'color: orange;',
        'ruleset-end' => 'color: orange;',
        'comment' => 'color: #999999;font-style: italic;',
        'value' => 'color: #00ff00;'
    );
    echo "<pre>";
    foreach ($tokens as $tok){
        $style = $styles[$tok['type']];
        echo '<span style="'.$style.'">'.$tok['value'].'</span>';
    }
    echo "</pre>";