index everything between < and > but ignoring php, style and script


superaktieboy

hi

I have code like this (it is selected from a database and posted by users, so the actual content varies):
[code]
<?
echo 'this is php highlighted';
?>

<html>
this is as html highlighted
</html>

<script language="text/javascript">
// this is javascript highlighted
</script>

<style>
/*
this is css highlighted
*/
</style>
[/code]

Each highlighter has its own function: highlight_php, highlight_html, highlight_js and highlight_css. Next to those I have a function called 'highlight($input, $unparse = false)'.

This is what I currently use to highlight the HTML:
[code]
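// Collect every <...> tag, then swap each one for its highlighted version.
preg_match_all('#(<.*?>)#is', $highlighted, $matches);
$count = count($matches[0]);
// Use < rather than <= so the loop does not read past the last match.
for ($i = 0; $i < $count; $i++)
{
$replacer = highlight_html($matches[0][$i]);
$highlighted = str_replace($matches[0][$i], $replacer, $highlighted);
}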
[/code]
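
A possible alternative to the loop above is preg_replace_callback(), which replaces each match exactly once and does not rescan text that has already been highlighted. This is only a sketch: highlight_tag_demo() and highlight_tags() are made-up names, the first being a stand-in for highlight_html() so the example runs on its own, and the anonymous function assumes PHP 5.3 or later.

[code]
<?php
// Hypothetical stand-in for highlight_html(): escapes one tag and wraps it.
function highlight_tag_demo($tag)
{
    return '<span class="tag">' . htmlspecialchars($tag) . '</span>';
}

// Highlight every <...> tag in a single pass over the source.
function highlight_tags($source)
{
    return preg_replace_callback(
        '#<.*?>#s',
        function ($m) {
            // $m[0] is the complete tag that was matched
            return highlight_tag_demo($m[0]);
        },
        $source
    );
}

echo highlight_tags('<b>bold</b>');
[/code]

Like the original loop, this leaves the text between tags untouched.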

This is the function highlight_html():
[code]
function highlight_html($input)
{
/**
* These three arrays make it easier to preg_replace the keywords with colors
*/
$commands_boldblue = array
(
'#(\ba\b)#', '#(\babbr\b)#', '#(\babove\b)#', '#(\bacronym\b)#', '#(\baddress\b)#', '#(\bapplet\b)#', '#(\barea\b)#',
'#(\barray\b)#', '#(\bb\b)#', '#(\bbase\b)#', '#(\bbdo\b)#', '#(\bbgsound\b)#', '#(\bbig\b)#', '#(\bxmp\b)#',
'#(\bblink\b)#', '#(\bblockquote\b)#', '#(\bbody\b)#', '#(\bbox\b)#', '#(\bbr\b)#', '#(\bbutton\b)#', '#(\bcaption\b)#',
'#(\bcenter\b)#', '#(\bcite\b)#', '#(\bcode\b)#', '#(\bcol\b)#', '#(\bcolgroup\b)#', '#(\bcomment\b)#', '#(\bdd\b)#',
'#(\bdel\b)#', '#(\bdfn\b)#', '#(\bdir\b)#', '#(\bdiv\b)#', '#(\bdl\b)#', '#(\bdoctype\b)#', '#(\bdt\b)#', '#(\bem\b)#',
'#(\bembed\b)#', '#(\bfieldset\b)#', '#(\bfig\b)#', '#(\bform\b)#', '#(\bframe\b)#', '#(\bframeset\b)#', '#(\btfoot\b)#',
'#(\bh\b)#', '#(\bh1\b)#', '#(\bh2\b)#', '#(\bh3\b)#', '#(\bh4\b)#', '#(\bh5\b)#', '#(\bh6\b)#', '#(\bhead\b)#',
'#(\bhr\b)#', '#(\bhta\b)#', '#(\bhtml\b)#', '#(\bi\b)#', '#(\biframe\b)#', '#(\bimg\b)#', '#(\binput\b)#', '#(\bins\b)#',
'#(\bisindex\b)#', '#(\bkbd\b)#', '#(\blabel\b)#', '#(\blegend\b)#', '#(\bli\b)#', '#(\blink\b)#', '#(\blisting\b)#',
'#(\bmap\b)#', '#(\bmarquee\b)#', '#(\bmenu\b)#', '#(\bmeta\b)#', '#(\bmulticol\b)#', '#(\bnextid\b)#', '#(\bnobr\b)#',
'#(\bnoframes\b)#', '#(\bnoscript\b)#', '#(\bnote\b)#', '#(\bol\b)#', '#(\boptgroup\b)#', '#(\boption\b)#', '#(\bp\b)#',
'#(\bparam\b)#', '#(\bplaintext\b)#', '#(\bpre\b)#', '#(\bq\b)#', '#(\brange\b)#', '#(\broot\b)#', '#(\bs\b)#',
'#(\bsamp\b)#', '#(\bselect\b)#', '#(\bsmall\b)#', '#(\bsound\b)#', '#(\bspacer\b)#', '#(\btbody\b)#', '#(\btd\b)#',
'#(\bsqrt\b)#', '#(\bstrike\b)#', '#(\bstrong\b)#', '#(\bsub\b)#', '#(\bsup\b)#', '#(\btext\b)#', '#(\btextarea\b)#',
'#(\btable\b)#', '#(\btextflow\b)#', '#(\bth\b)#', '#(\bthead\b)#', '#(\btitle\b)#', '#(\btr\b)#', '#(\btt\b)#',
'#(\bu\b)#', '#(\bul\b)#', '#(\bvar\b)#', '#(\bwbr\b)#',
/**
* 'span' is written here as 'c3Bhbg', its
* base64_encode() value without the padding,
* so the next pattern stands for 'span'
*/
'#(\bc3Bhbg\b)#',
/**
* 'style' is written here as 'c3R5bGU', its
* base64_encode() value without the padding,
* so the next pattern stands for 'style'
*/
'#(\bc3R5bGU\b)#',
/**
* 'font' is written here as 'Zm9udA', its
* base64_encode() value without the padding,
* so the next two patterns stand for
* [1] basefont
* [2] font
*/
'#(\bbaseZm9udA\b)#',
'#(\bZm9udA\b)#'
);

$commands_red = array
(
'#(\baccept\b)#', '#(\baccesskey\b)#', '#(\baction\b)#', '#(\balign\b)#', '#(\balink\b)#', '#(\balt\b)#',
'#(\bapplicationname\b)#', '#(\barchive\b)#', '#(\baxis\b)#', '#(\bbackground\b)#', '#(\bbehavior\b)#', '#(\bbelow\b)#',
'#(\bcellpadding\b)#', '#(\bbgproperties\b)#', '#(\bborder\b)#', '#(\bcellspacing\b)#', '#(\bchar\b)#',
'#(\bcharoff\b)#', '#(\bcharset\b)#', '#(\bchecked\b)#', '#(\bclass\b)#', '#(\bclassid\b)#', '#(\bclear\b)#',
'#(\bcodebase\b)#', '#(\bcodetype\b)#', '#(\bcols\b)#',  '#(\bcompact\b)#', '#(\bhttp-equiv\b)#',
'#(\bcontent\b)#', '#(\bcoords\b)#', '#(\bdata\b)#', '#(\bdatetime\b)#', '#(\bdeclare\b)#', '#(\bdefer\b)#',
'#(\bdirection\b)#', '#(\bdisabled\b)#', '#(\bdynsrc\b)#', '#(\benctype\b)#', '#(\bequiv\b)#', '#(\bface\b)#',
'#(\bfor\b)#', '#(\bframeborder\b)#', '#(\bframespacing\b)#', '#(\bgutter\b)#', '#(\bheaders\b)#', '#(\bheight\b)#',
'#(\bhref\b)#', '#(\bhreflang\b)#', '#(\bhspace\b)#', '#(\bicon\b)#', '#(\bid\b)#', '#(\bismap\b)#',
'#(\blanguage\b)#', '#(\bleftmargin\b)#', '#(\blongdesc\b)#', '#(\bloop\b)#', '#(\blowsrc\b)#', '#(\bmarginheight\b)#',
'#(\bmarginwidth\b)#', '#(\bmaximizebutton\b)#', '#(\bmaxlength\b)#', '#(\bmedia\b)#', '#(\bmethod\b)#', '#(\bmethods\b)#',
'#(\bminimizebutton\b)#', '#(\bmultiple\b)#', '#(\bname\b)#', '#(\bnohref\b)#', '#(\bnoresize\b)#', '#(\bnoshade\b)#',
'#(\bnowrap\b)#', '#(\bobject\b)#', '#(\bonabort\b)#', '#(\bonblur\b)#', '#(\bonchange\b)#', '#(\bonclick\b)#',
'#(\bondblclick\b)#', '#(\bonfocus\b)#', '#(\bonkeydown\b)#', '#(\bonkeypress\b)#', '#(\bonkeyup\b)#', '#(\bonload\b)#',
'#(\bonmousedown\b)#', '#(\bonmousemove\b)#', '#(\bonmouseout\b)#', '#(\bonmouseover\b)#', '#(\bonmouseup\b)#',
'#(\bonreset\b)#', '#(\bonselect\b)#', '#(\bonsubmit\b)#', '#(\bonunload\b)#', '#(\bprofile\b)#', '#(\bprompt\b)#',
'#(\breadonly\b)#', '#(\brel\b)#', '#(\brev\b)#', '#(\brows\b)#',
'#(\brules\b)#', '#(\brunat\b)#', '#(\bscheme\b)#', '#(\bscope\b)#', '#(\bscrollamount\b)#', '#(\bscrolldelay\b)#',
'#(\bshape\b)#', '#(\bshowintaskbar\b)#', '#(\bsingleinstance\b)#', '#(\bsize\b)#', '#(\bsrc\b)#', '#(\bstandby\b)#',
'#(\bstart\b)#', '#(\bsummary\b)#', '#(\bsysmenu\b)#', '#(\btabindex\b)#', '#(\btarget\b)#', '#(\btopmargin\b)#',
'#(\btype\b)#', '#(\burn\b)#', '#(\busemap\b)#', '#(\bvalign\b)#', '#(\bvalue\b)#', '#(\bvaluetype\b)#',
'#(\bversion\b)#', '#(\bvlink\b)#', '#(\bvrml\b)#', '#(\bvspace\b)#', '#(\bwidth\b)#', '#(\bwindowstate\b)#',
'#(\bwrap\b)#', '#(\bscrolling\b)#', '#(\bselected\b)#',
/**
* 'color' is written here as 'Y29sb3I', its
* base64_encode() value without the padding,
* so the next five patterns stand for
* [1] bordercolor
* [2] bordercolordark
* [3] bordercolorlight
* [4] bgcolor
* [5] color
*/
'#(\bborderY29sb3I\b)#',
'#(\bborderY29sb3Idark\b)#',
'#(\bborderY29sb3Ilight\b)#',
'#(\bbgY29sb3I\b)#',
'#(\bY29sb3I\b)#',
/**
* 'span' is written here as 'c3Bhbg', its
* base64_encode() value without the padding,
* so the next two patterns stand for
* [1] rowspan
* [2] colspan
*/
'#(\browc3Bhbg\b)#',
'#(\bcolc3Bhbg\b)#',
/**
* 'style' is written here as 'c3R5bGU', its
* base64_encode() value without the padding,
* so the next pattern stands for 'borderstyle'
*/
'#(\bborderc3R5bGU\b)#'
);

$commands_purple = array
(
'#(\b_blank\b)#', '#(\bblack\b)#', '#(\bblue\b)#', '#(\bbottom\b)#', '#(\bgreen\b)#', '#(\bhidden\b)#', '#(\bleft\b)#',
'#(\bmagenta\b)#', '#(\bmiddle\b)#', '#(\borange\b)#', '#(\bpublic\b)#', '#(\bpurple\b)#', '#(\bred\b)#', '#(\bright\b)#',
'#(\btop\b)#', '#(\bwhite\b)#', '#(\byellow\b)#'
);

/**
* First convert < and > to HTML entities, otherwise the tags would not be shown as text
*/
$temp = str_replace('<', '&lt;', $input);
$temp = str_replace('>', '&gt;', $temp);

/**
* Then we give the things in the arrays $commands_purple,
* $commands_red and $commands_boldblue their own color
*/
$temp = preg_replace($commands_purple, '<span style="color: #9C029C;font-weight:bold">$1</span>', $temp);
$temp = preg_replace($commands_boldblue, '<span style="color: #0000FF;font-weight:bold">$1</span>', $temp);
$temp = preg_replace($commands_red, '<span style="color: red;font-weight:bold">$1</span>', $temp);
return $temp;
}
[/code]
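
The base64-encoded keywords in the function above are presumably there to keep the inserted span markup (which itself contains 'span', 'style' and 'color') from being matched by later patterns. One way to avoid that workaround might be a single pass with one combined pattern, since preg_replace_callback() never rescans its own replacements. A minimal sketch, assuming PHP 5.3 or later; color_keywords() and the shortened keyword lists are invented for illustration and are not the full lists from the post:

[code]
<?php
// Colour keywords in an already-escaped tag.
// $groups maps a CSS colour to the list of keywords that should get it.
function color_keywords($escapedTag, array $groups)
{
    // Build a lookup table: keyword => colour
    $colorOf = array();
    foreach ($groups as $color => $words) {
        foreach ($words as $word) {
            $colorOf[$word] = $color;
        }
    }
    if (count($colorOf) === 0) {
        return $escapedTag;
    }

    // One alternation of all keywords, each quoted for use inside #...#
    $quoted = array_map(function ($w) {
        return preg_quote($w, '#');
    }, array_keys($colorOf));
    $pattern = '#\b(' . implode('|', $quoted) . ')\b#';

    // Single pass: the inserted markup is never matched again
    return preg_replace_callback($pattern, function ($m) use ($colorOf) {
        return '<span style="color: ' . $colorOf[$m[1]] . ';font-weight:bold">'
            . $m[1] . '</span>';
    }, $escapedTag);
}

$groups = array(
    '#0000FF' => array('a', 'span', 'style'),
    'red'     => array('href', 'color'),
);
echo color_keywords('&lt;a href="x"&gt;', $groups);
[/code]

With this approach 'span', 'style' and 'color' can stay in the lists as plain text.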

But when I call this function it also picks up the PHP, script and style blocks. As far as I know, the cause is this line: 'preg_match_all('#(<.*?>)#is', $highlighted, $matches);'
So does anyone know how to match only the HTML while ignoring PHP, script and style?

greetzz

Oh, by the way, I encoded some of the keywords because otherwise it wouldn't work properly. You'll see what they are in the arrays.

This is not really regex, but in the function highlight_html() I put in the following before I change anything:
[code]
<?php
// strpos() takes the haystack first, then the needle; '<?' also covers '<?php'
if(strpos($input, 'style') !== false OR strpos($input, '<?') !== false OR strpos($input, 'script') !== false)
{
    return $input;
}
?>
[/code]

(Stupid that I didn't come up with that earlier :P)
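
For the original question (match only the HTML and ignore PHP, script and style), another option might be to split the source first, so the protected blocks never reach highlight_html() at all. The sketch below uses preg_split() with PREG_SPLIT_DELIM_CAPTURE; highlight_html_only() and demo_highlight_tag() are hypothetical names, the latter standing in for highlight_html(), and the protected blocks are only escaped here where highlight_php(), highlight_js() or highlight_css() would normally be called. It assumes PHP 5.3 or later.

[code]
<?php
// Hypothetical stand-in for highlight_html(): escapes one tag and wraps it.
function demo_highlight_tag($tag)
{
    return '<span class="html">' . htmlspecialchars($tag) . '</span>';
}

function highlight_html_only($source)
{
    // PHP blocks, script blocks and style blocks are kept as whole chunks
    $protected = '#(<\?.*?\?>|<script\b.*?</script>|<style\b.*?</style>)#is';
    $chunks = preg_split($protected, $source, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($chunks as $i => $chunk) {
        if ($i % 2 === 1) {
            // Odd index: a protected block. The other highlighters would go here.
            $out .= htmlspecialchars($chunk);
        } else {
            // Even index: ordinary HTML, so highlight each tag once
            $out .= preg_replace_callback('#<.*?>#s', function ($m) {
                return demo_highlight_tag($m[0]);
            }, $chunk);
        }
    }
    return $out;
}

echo highlight_html_only("<? echo 1; ?><b>x</b><style>p{}</style>");
[/code]

Because the pattern has exactly one capturing group, the chunks alternate between plain HTML (even indexes) and protected blocks (odd indexes).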