Index everything between < and >, but ignore PHP, style and script


5 replies to this topic

#1 superaktieboy

  • Members
  • Advanced Member
  • 32 posts

Posted 29 October 2006 - 09:10 AM

Hi,

I have code like this (it is selected from a database and posted by users, so the real content varies):
<?
echo 'this is php highlighted';
?>

<html>
this is highlighted as html
</html>

<script type="text/javascript">
// this is javascript highlighted
</script>

<style>
/*
this is css highlighted
*/
</style>

Each language has its own highlighter function: highlight_php(), highlight_html(), highlight_js() and highlight_css(). On top of those I have a function called highlight($input, $unparse = false).

Now I use this to highlight the HTML:
	preg_match_all('#(<.*?>)#is', $highlighted, $matches);
	// The last valid index is count() - 1, so loop with < rather than <=
	for ($i = 0; $i < count($matches[0]); $i++)
	{
		$replacer = highlight_html($matches[0][$i]);
		$highlighted = str_replace($matches[0][$i], $replacer, $highlighted);
	}
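The same loop can be sketched with preg_replace_callback(), which replaces each match where it was found, in a single pass, so there is no index to manage and no separate str_replace() scan over text that has already been highlighted. This assumes PHP 5.3 or later (for the anonymous function) and that highlight_html() takes one tag and returns its highlighted markup, as in the post; the highlight_html() below is a hypothetical stand-in so the example runs on its own.

```php
<?php
// Hypothetical stand-in for the real highlight_html() from the post:
// it escapes the tag and wraps it in a span so the effect is visible.
function highlight_html($input)
{
    return '<span class="tag">' . htmlspecialchars($input) . '</span>';
}

// Replace every <...> in $text with highlight_html(<...>), in one pass.
function highlight_tags($text)
{
    return preg_replace_callback(
        '#<.*?>#s',
        function ($matches) {
            return highlight_html($matches[0]);
        },
        $text
    );
}

// Usage
$highlighted = highlight_tags('<b>bold</b> and <i>italic</i>');
```

With the real highlight_html() in place of the stub, highlight_tags($highlighted) would replace the preg_match_all()/for loop.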

This is the highlight_html() function:
function highlight_html($input)
{
	/**
	* These three arrays list the words to colour for preg_replace():
	* tag names (bold blue), attribute names (red) and attribute values (purple)
	*/
	$commands_boldblue = array
	(
		'#(\ba\b)#', '#(\babbr\b)#', '#(\babove\b)#', '#(\bacronym\b)#', '#(\baddress\b)#', '#(\bapplet\b)#', '#(\barea\b)#',
		'#(\barray\b)#', '#(\bb\b)#', '#(\bbase\b)#', '#(\bbdo\b)#', '#(\bbgsound\b)#', '#(\bbig\b)#', '#(\bxmp\b)#',
		'#(\bblink\b)#', '#(\bblockquote\b)#', '#(\bbody\b)#', '#(\bbox\b)#', '#(\bbr\b)#', '#(\bbutton\b)#', '#(\bcaption\b)#',
		'#(\bcenter\b)#', '#(\bcite\b)#', '#(\bcode\b)#', '#(\bcol\b)#', '#(\bcolgroup\b)#', '#(\bcomment\b)#', '#(\bdd\b)#',
		'#(\bdel\b)#', '#(\bdfn\b)#', '#(\bdir\b)#', '#(\bdiv\b)#', '#(\bdl\b)#', '#(\bdoctype\b)#', '#(\bdt\b)#', '#(\bem\b)#',
		'#(\bembed\b)#', '#(\bfieldset\b)#', '#(\bfig\b)#', '#(\bform\b)#', '#(\bframe\b)#', '#(\bframeset\b)#', '#(\btfoot\b)#',
		'#(\bh\b)#', '#(\bh1\b)#', '#(\bh2\b)#', '#(\bh3\b)#', '#(\bh4\b)#', '#(\bh5\b)#', '#(\bh6\b)#', '#(\bhead\b)#',
		'#(\bhr\b)#', '#(\bhta\b)#', '#(\bhtml\b)#', '#(\bi\b)#', '#(\biframe\b)#', '#(\bimg\b)#', '#(\binput\b)#', '#(\bins\b)#',
		'#(\bisindex\b)#', '#(\bkbd\b)#', '#(\blabel\b)#', '#(\blegend\b)#', '#(\bli\b)#', '#(\blink\b)#', '#(\blisting\b)#',
		'#(\bmap\b)#', '#(\bmarquee\b)#', '#(\bmenu\b)#', '#(\bmeta\b)#', '#(\bmulticol\b)#', '#(\bnextid\b)#', '#(\bnobr\b)#',
		'#(\bnoframes\b)#', '#(\bnoscript\b)#', '#(\bnote\b)#', '#(\bol\b)#', '#(\boptgroup\b)#', '#(\boption\b)#', '#(\bp\b)#',
		'#(\bparam\b)#', '#(\bplaintext\b)#', '#(\bpre\b)#', '#(\bq\b)#', '#(\brange\b)#', '#(\broot\b)#', '#(\bs\b)#',
		'#(\bsamp\b)#', '#(\bselect\b)#', '#(\bsmall\b)#', '#(\bsound\b)#', '#(\bspacer\b)#', '#(\btbody\b)#', '#(\btd\b)#',
		'#(\bsqrt\b)#', '#(\bstrike\b)#', '#(\bstrong\b)#', '#(\bsub\b)#', '#(\bsup\b)#', '#(\btext\b)#', '#(\btextarea\b)#',
		'#(\btable\b)#', '#(\btextflow\b)#', '#(\bth\b)#', '#(\bthead\b)#', '#(\btitle\b)#', '#(\btr\b)#', '#(\btt\b)#',
		'#(\bu\b)#', '#(\bul\b)#', '#(\bvar\b)#', '#(\bwbr\b)#',
		/**
		* 'span' is written base64-encoded ('c3Bhbg'), so this pattern
		* matches only that literal text. It never matches the <span>
		* markup inserted by the preg_replace() calls below, but it also
		* never highlights 'span' in the input.
		*/
		'#(\bc3Bhbg\b)#',
		/**
		* Same trick for 'style' (base64: 'c3R5bGU').
		*/
		'#(\bc3R5bGU\b)#',
		/**
		* Same trick for 'font' (base64: 'Zm9udA'), used in
		* [1] basefont
		* [2] font
		*/
		'#(\bbaseZm9udA\b)#',
		'#(\bZm9udA\b)#'
	);
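Writing every entry by hand as '#(\bname\b)#' is easy to get wrong (the red list below, for instance, contains 'http-equiv' twice). A minimal sketch of building the same patterns from a plain list of names; word_patterns() and word_alternation() are hypothetical helper names, and the short tag list is only an example. preg_quote() escapes characters such as '-' in 'http-equiv'.

```php
<?php
// Build the same kind of pattern the post writes by hand, '#(\bname\b)#',
// from a plain list of names.
function word_patterns(array $words)
{
    $patterns = array();
    foreach ($words as $word) {
        $patterns[] = '#(\b' . preg_quote($word, '#') . '\b)#';
    }
    return $patterns;
}

// Alternative: one pattern that matches any of the words, so preg_replace()
// runs one regex per colour group instead of one per word.
function word_alternation(array $words)
{
    $quoted = array();
    foreach ($words as $word) {
        $quoted[] = preg_quote($word, '#');
    }
    return '#\b(' . implode('|', $quoted) . ')\b#';
}

// Shortened example list; the full tag list from the post would go here.
$tags = array('a', 'abbr', 'body', 'div', 'table');
$commands_boldblue = word_patterns($tags);
$boldblue_pattern  = word_alternation($tags);
```

Either result can be passed to preg_replace() with the same '<span ...>$1</span>' replacement, because both wrap the word in capture group 1.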
	
	$commands_red = array
	(
		'#(\baccept\b)#', '#(\baccesskey\b)#', '#(\baction\b)#', '#(\balign\b)#', '#(\balink\b)#', '#(\balt\b)#',
		'#(\bapplicationname\b)#', '#(\barchive\b)#', '#(\baxis\b)#', '#(\bbackground\b)#', '#(\bbehavior\b)#', '#(\bbelow\b)#',
		'#(\bcellpadding\b)#', '#(\bbgproperties\b)#', '#(\bborder\b)#', '#(\bcellspacing\b)#', '#(\bchar\b)#',
		'#(\bcharoff\b)#', '#(\bcharset\b)#', '#(\bchecked\b)#', '#(\bclass\b)#', '#(\bclassid\b)#', '#(\bclear\b)#',
		'#(\bcodebase\b)#', '#(\bcodetype\b)#', '#(\bcols\b)#', '#(\bcompact\b)#', '#(\bhttp-equiv\b)#',
		'#(\bcontent\b)#', '#(\bcoords\b)#', '#(\bdata\b)#', '#(\bdatetime\b)#', '#(\bdeclare\b)#', '#(\bdefer\b)#',
		'#(\bdirection\b)#', '#(\bdisabled\b)#', '#(\bdynsrc\b)#', '#(\benctype\b)#', '#(\bequiv\b)#', '#(\bface\b)#',
		'#(\bfor\b)#', '#(\bframeborder\b)#', '#(\bframespacing\b)#', '#(\bgutter\b)#', '#(\bheaders\b)#', '#(\bheight\b)#',
		'#(\bhref\b)#', '#(\bhreflang\b)#', '#(\bhspace\b)#', '#(\bicon\b)#', '#(\bid\b)#', '#(\bismap\b)#',
		'#(\blanguage\b)#', '#(\bleftmargin\b)#', '#(\blongdesc\b)#', '#(\bloop\b)#', '#(\blowsrc\b)#', '#(\bmarginheight\b)#',
		'#(\bmarginwidth\b)#', '#(\bmaximizebutton\b)#', '#(\bmaxlength\b)#', '#(\bmedia\b)#', '#(\bmethod\b)#', '#(\bmethods\b)#',
		'#(\bminimizebutton\b)#', '#(\bmultiple\b)#', '#(\bname\b)#', '#(\bnohref\b)#', '#(\bnoresize\b)#', '#(\bnoshade\b)#',
		'#(\bnowrap\b)#', '#(\bobject\b)#', '#(\bonabort\b)#', '#(\bonblur\b)#', '#(\bonchange\b)#', '#(\bonclick\b)#',
		'#(\bondblclick\b)#', '#(\bonfocus\b)#', '#(\bonkeydown\b)#', '#(\bonkeypress\b)#', '#(\bonkeyup\b)#', '#(\bonload\b)#',
		'#(\bonmousedown\b)#', '#(\bonmousemove\b)#', '#(\bonmouseout\b)#', '#(\bonmouseover\b)#', '#(\bonmouseup\b)#',
		'#(\bonreset\b)#', '#(\bonselect\b)#', '#(\bonsubmit\b)#', '#(\bonunload\b)#', '#(\bprofile\b)#', '#(\bprompt\b)#',
		'#(\breadonly\b)#', '#(\brel\b)#', '#(\brev\b)#', '#(\brows\b)#',
		'#(\brules\b)#', '#(\brunat\b)#', '#(\bscheme\b)#', '#(\bscope\b)#', '#(\bscrollamount\b)#', '#(\bscrolldelay\b)#',
		'#(\bshape\b)#', '#(\bshowintaskbar\b)#', '#(\bsingleinstance\b)#', '#(\bsize\b)#', '#(\bsrc\b)#', '#(\bstandby\b)#',
		'#(\bstart\b)#', '#(\bsummary\b)#', '#(\bsysmenu\b)#', '#(\btabindex\b)#', '#(\btarget\b)#', '#(\btopmargin\b)#',
		'#(\btype\b)#', '#(\burn\b)#', '#(\busemap\b)#', '#(\bvalign\b)#', '#(\bvalue\b)#', '#(\bvaluetype\b)#',
		'#(\bversion\b)#', '#(\bvlink\b)#', '#(\bvrml\b)#', '#(\bvspace\b)#', '#(\bwidth\b)#', '#(\bwindowstate\b)#',
		'#(\bwrap\b)#', '#(\bscrolling\b)#', '#(\bselected\b)#',
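		/**
		* 'color' is written base64-encoded ('Y29sb3I'), so these five
		* patterns match only that literal text: never the color: ...
		* markup inserted by preg_replace() below, and never the words
		* in the input either. The words are:
		* [1] bordercolor
		* [2] bordercolordark
		* [3] bordercolorlight
		* [4] bgcolor
		* [5] color
		*/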
		'#(\bborderY29sb3I\b)#',
		'#(\bborderY29sb3Idark\b)#',
		'#(\bborderY29sb3Ilight\b)#',
		'#(\bbgY29sb3I\b)#',
		'#(\bY29sb3I\b)#',
		/**
		* Same trick for 'span' (base64: 'c3Bhbg'):
		* [1] rowspan
		* [2] colspan
		*/
		'#(\browc3Bhbg\b)#',
		'#(\bcolc3Bhbg\b)#',
		/**
		* Same trick for 'style' (base64: 'c3R5bGU'): 'borderstyle'
		*/
		'#(\bborderc3R5bGU\b)#'
	);
	
	$commands_purple = array
	(
		'#(\b_blank\b)#', '#(\bblack\b)#', '#(\bblue\b)#', '#(\bbottom\b)#', '#(\bgreen\b)#', '#(\bhidden\b)#', '#(\bleft\b)#',
		'#(\bmagenta\b)#', '#(\bmiddle\b)#', '#(\borange\b)#', '#(\bpublic\b)#', '#(\bpurple\b)#', '#(\bred\b)#', '#(\bright\b)#',
		'#(\btop\b)#', '#(\bwhite\b)#', '#(\byellow\b)#'
	);

	/**
	* First escape < and > as HTML entities; otherwise the browser would
	* render the tag instead of displaying it
	*/
	$temp = str_replace('<', '&lt;', $input);
	$temp = str_replace('>', '&gt;', $temp);

	/**
	* Then we give the things in the arrays $commands_purple,
	* $commands_red and $commands_boldblue their own color
	*/
	$temp = preg_replace($commands_purple, '<span style="color: #9C029C;font-weight:bold">$1</span>', $temp);
	$temp = preg_replace($commands_boldblue, '<span style="color: #0000FF;font-weight:bold">$1</span>', $temp);
	$temp = preg_replace($commands_red, '<span style="color: red;font-weight:bold">$1</span>', $temp);
	return $temp;
}
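Each preg_replace() call in highlight_html() runs over the output of the previous one, including the <span style="color: ...;font-weight:bold"> markup it just inserted, which is why words such as span, style, font and color cannot be listed as-is. A minimal sketch of a single-pass alternative, assuming PHP 5.3 or later: every word is visited once with preg_replace_callback() and looked up in a table, so inserted markup is never rescanned. highlight_html_single_pass() is a hypothetical name and the word lists are shortened examples; like the original patterns, the lookup is case-sensitive.

```php
<?php
// Sketch of a single-pass highlight_html(). The word lists are shortened,
// hypothetical examples; the full lists from the post would go here.
function highlight_html_single_pass($input)
{
    $styles = array(
        'boldblue' => 'color: #0000FF;font-weight:bold',
        'red'      => 'color: red;font-weight:bold',
        'purple'   => 'color: #9C029C;font-weight:bold',
    );
    $groups = array(
        'boldblue' => array('a', 'body', 'div', 'font', 'span', 'style', 'table'),
        'red'      => array('bgcolor', 'color', 'colspan', 'href', 'http-equiv', 'width'),
        'purple'   => array('_blank', 'left', 'red', 'top'),
    );

    // Flatten the groups into one lookup table: word => inline CSS.
    $lookup = array();
    foreach ($groups as $group => $words) {
        foreach ($words as $word) {
            $lookup[$word] = $styles[$group];
        }
    }

    // Escape < and > first, as the original does.
    $temp = str_replace(array('<', '>'), array('&lt;', '&gt;'), $input);

    // Visit each word (letters, digits, '_' and '-') exactly once. The <span>
    // markup returned here is never scanned again, so no encoding trick is needed.
    return preg_replace_callback(
        '#\b[\w-]+\b#',
        function ($m) use ($lookup) {
            if (isset($lookup[$m[0]])) {
                return '<span style="' . $lookup[$m[0]] . '">' . $m[0] . '</span>';
            }
            return $m[0];
        },
        $temp
    );
}
```

Words that are not in the table (including the lt and gt of the escaped brackets) are returned unchanged.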

But when I call this function it also matches the PHP, script and style parts, and as far as I can tell the culprit is this line: preg_match_all('#(<.*?>)#is', $highlighted, $matches);
So does anyone know how to match only the HTML tags and ignore PHP, script and style?

greetzz

Oh, by the way: I base64-encoded some words in the arrays because otherwise it didn't work properly. The comments in the arrays show which ones.

#2 superaktieboy

Posted 30 October 2006 - 09:51 AM

Nobody?

#3 effigy

  • Staff Alumni
  • Advanced Member
  • 3,600 posts
  • Location: IL

Posted 30 October 2006 - 03:18 PM

Try #(<(?!script|style|\?).*?>)#
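That pattern rejects only tags that begin with <script, <style or <?, so the closing </script> and </style> tags, and anything shaped like <...> inside a script or style block, are still matched. A minimal sketch of one way around that, assuming a PHP build whose PCRE supports the (*SKIP)(*FAIL) verbs (PCRE 7.3 or later): whole PHP, script and style blocks are matched first and discarded, and only then are ordinary tags matched. find_html_tags() is a hypothetical helper name.

```php
<?php
// Return the HTML tags in $text, skipping PHP blocks and whole
// <script>...</script> / <style>...</style> blocks (tags and contents).
function find_html_tags($text)
{
    $pattern = '#<\?.*?(?:\?>|\z)(*SKIP)(*FAIL)'         // PHP block, possibly unclosed
             . '|<script\b.*?</script\s*>(*SKIP)(*FAIL)' // script block
             . '|<style\b.*?</style\s*>(*SKIP)(*FAIL)'   // style block
             . '|<[^>]*>#is';                            // any other tag
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Usage on input shaped like the example in the first post
$sample = "<?\necho 'php';\n?>\n<html>\n<script type=\"text/javascript\">\n// <b>not a tag</b>\n</script>\n<style>\n/* css */\n</style>\n</html>";
$tags = find_html_tags($sample);
```

When a skipped block is reached, (*SKIP) makes the next match attempt start after the whole block, so nothing inside it can be returned.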
Regexp | Unicode Article | Letter Database
/\A(e)?((1)?ff(?:(?:ig)?y)?|f(?:ig)?)\z/

#4 superaktieboy

superaktieboy
  • Members
  • PipPipPip
  • Advanced Member
  • 32 posts

Posted 01 November 2006 - 03:30 PM

Well, that didn't work :( but I have found something else. Thanks anyway!

#5 effigy

Posted 01 November 2006 - 06:13 PM

Such as?

#6 superaktieboy

Posted 01 November 2006 - 06:28 PM

This is not really regex, but in highlight_html() I put the following before changing anything:
<?php
// strpos()/strstr() take the haystack first, then the needle.
// Checking '<?' also catches '<?php', so one test is enough.
if (strpos($input, 'style') !== false || strpos($input, '<?') !== false || strpos($input, 'script') !== false)
{
    return $input;
}
?>
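Testing for the substring 'style' also skips ordinary tags such as <div style="color: red">, because the attribute contains the word. A minimal sketch that checks the tag name instead; is_skipped_tag() is a hypothetical helper, not part of the original code.

```php
<?php
// Hypothetical helper: true for PHP open tags and for <script>/<style>
// opening or closing tags. Only the tag name is checked, so an attribute
// such as style="..." on another tag does not count.
function is_skipped_tag($tag)
{
    if (strpos($tag, '<?') === 0) {
        return true; // covers both <? and <?php
    }
    return preg_match('#^</?(?:script|style)\b#i', $tag) === 1;
}
```

At the top of highlight_html(), if (is_skipped_tag($input)) { return $input; } would then take the place of the strpos() checks.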

(Stupid that I didn't think of that earlier :P)



