
Finding Opening tags and Closing tags with Preg_Match()


spikeon


I'm trying to make a script that will reformat HTML into sexy, pretty code.

To do that I need indents.

But indents only change in certain spots: they increase after an opening tag and decrease before a closing tag.

What I need is a script that will take this:

<u><b><i><a href ='http://www.gmail.com/'><img src='gmail.png' /></a></i></b></u>

modified with this:


        //remove spacing and newlines
        $file = str_replace("\n", " ", $file);
        $file = str_replace("\t", " ", $file);
        $file = preg_replace('/\s+/', ' ', $file);
        $file = str_replace("/ >", "/>", $file);
        
        
        //add newlines in all the right places
        $file = str_replace(">", ">\n", $file);
        $file = str_replace("<", "\n<", $file);
        $file = str_replace("\n ", "\n", $file);
        $file = str_replace("\n\n", "\n", $file);

(which becomes this:)

<u>
<b>
<i>
<a href ='http://www.gmail.com/'>
<img src='gmail.png' />
</a>
</i>
</b>
</u>

 

into this:

<u>
 <b>
  <i>
   <a href ='http://www.gmail.com/'>
    <img src='gmail.png' />
   </a>
  </i>
 </b>
</u>

 

Here's what I've got:

 

        foreach($lines as $line){
            
            if(preg_match("[<][^/].*[^/][>]", $line)){
                
                $indent = $indent + 1;
            }
            
            if(preg_match("[<][/].*[/][>]", $line)){
                
                $indent = $indent - 1;
            }
            
            if($indent > 0){
                
                $breaks = "";
             
                for($i = 0; $i <= $indent; $i++){
                    $breaks .= " ";        
                }
                
                $line = $breaks . $line;
            }
         
            $total .= $line . "\n";
        }

 

what am i doing wrong???
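
A few things stand out in the loop above, offered as likely culprits rather than a definitive diagnosis. preg_match() expects the pattern to be wrapped in delimiters, so a pattern starting with "[" most likely gets read as a bracket-delimited pattern containing only "<", with the rest treated as invalid modifiers. The second pattern also requires "/>" at the end of the line, which describes a self-closing tag rather than a closing tag like </a>. Finally, the loop runs from 0 while $i <= $indent, which adds one space more than the indent level. A minimal sketch of the tag test with delimited patterns follows; classify_line() is a hypothetical helper name, not something from the post.

```php
<?php
// Hypothetical helper (not from the original post): classify one line
// of the already-split markup. Patterns use # as the delimiter so that
// the slashes inside them need no escaping.
function classify_line(string $line): string
{
    $line = trim($line);
    if (preg_match('#^</#', $line)) {
        return 'close';   // e.g. </a>
    }
    if (preg_match('#^<[^!?/][^>]*/>$#', $line)) {
        return 'self';    // e.g. <img src='gmail.png' />
    }
    if (preg_match('#^<[^!?/]#', $line)) {
        return 'open';    // e.g. <a href='...'>
    }
    return 'other';       // plain text, comments, doctype
}

echo classify_line("<img src='gmail.png' />"), "\n"; // self
echo classify_line("</a>"), "\n";                    // close
```

With a test like this, the indent would go down before printing a 'close' line and up after printing an 'open' line, and stay unchanged for 'self' and 'other'.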

 


Explode the markup into an array of lines, test whether each line is an opening or closing tag, and keep a running tally of the indent level.

 

Suppose you start with

<html>\n<head>\n</head>\n<body>\n<h1>Here is a title\n</h1>\n<p>Here is a paragraph\n</p>\n</body>\n</html>

 

Turn the code into a massive string, then explode it into an array...

$string = '<html>\n<head>\n</head>\n<body>\n<h1>Here is a title\n</h1>\n<p>Here is a paragraph\n</p>\n</body>\n</html>';
$array = explode('\n',$string);

 

Then output the code keeping track of how many indents each line should have...

 


$indent_count = -1;
for($i=0; $i<sizeof($array); $i++){

   if(substr($array[$i], 2) == '</'){$indent_count--;} //if the array element is a closing tag, subtract an indent
   else if(substr($array[$i], 1) == '<'){$indent_count++;} //otherwise, check to see if it is an opening tag -- if so, add one indent to the count
   

   for($j=0; $j<sizeof($indent_count); $j++){ //print as many indents as there are in the count
        echo '\t';
   }
   echo $array[$i].'\n'; //print the actual piece of code, followed by a new line


}


Found a couple of errors in my last post. Use this instead:

 

$string = "<html>\n<head>\n</head>\n<body>\n<h1>\nHere is a title\n</h1>\n<p>\nHere is a paragraph\n</p>\n</body>\n</html>";
$array = explode("\n", $string);
print_r($array);
$indent_count = -1;
for($i=0; $i<sizeof($array); $i++){

   $prev = ($i > 0) ? $array[$i-1] : ''; //previous line, or an empty string on the first pass

   if(substr($array[$i], 0, 1) == '<' && substr($array[$i], 0, 2) != '</'){$indent_count++;} //if an opening tag -- add one indent to the count
   
   //This next line is confusing but it just means:
   //"If this line is not a tag, and the last line was an opening tag, add an indent"
   if(substr($array[$i], 0, 1) != '<' && substr($prev, 0, 1) == '<' && substr($prev, 0, 2) != '</'){$indent_count++;}
   
   //Again confusing, but...
   //"If this line is a closing tag, and the last line was not a tag, subtract an indent" 
   if(substr($prev, 0, 1) != '<' && substr($array[$i],0, 2) == '</'){$indent_count--;}

   for($j=0; $j<$indent_count; $j++){ //print as many indents as there are in the count
        echo "\t";
   }
   echo $array[$i]."\n"; //print the actual piece of code, followed by a new line

   if(substr($array[$i],0, 2) == '</'){$indent_count--;} //if the array element is a closing tag, subtract an indent
}

 


Hi, me again. I was thinking about this code and realized that tags without closing tags (image tags, for example) will end up adding an extra indentation. The fix is to add another if statement to the loop that checks whether the array element is an image tag and, if so, subtracts an indentation. Something like:

 

   if(substr($array[$i],0, 4) == '<img'){$indent_count--;} //if the array element is an image tag, subtract the accidentally added indent


OK, last post, I promise.   Here are the last three posts combined.  Same concept, more elegant coding:

 


//remove spacing and newlines
  $file = str_replace("\n", " ", $file);
  $file = str_replace("\t", " ", $file);
  $file = preg_replace('/\s+/', ' ', $file);
  $file = str_replace("/ >", "/>", $file);
//add newlines in all the right places
  $file = str_replace(">", ">\n", $file);
  $file = str_replace("<", "\n<", $file);
  $file = str_replace("\n ", "\n", $file);
  $file = str_replace("\n\n", "\n", $file);

//double quotes matter here: '\n' in single quotes is a literal backslash followed by n
$array = explode("\n",$file);
$indent_count = 0;
$newfile = '';

for($i=0; $i<sizeof($array); $i++){

     /* If a closing tag, subtract an indent */
          if(substr($array[$i], 0, 2) == '</'){$indent_count--;}

     /* Add this line's indents */
          for($j=0; $j<$indent_count; $j++){$newfile .= "\t";}

     /* Add the actual line followed by a newline (since it was destroyed in the array explode) */
          $newfile .= $array[$i]."\n";

     /* If a tag, and not a closing tag, must be an opening tag, so add an indent */
          if(substr($array[$i],0, 1) == '<' && substr($array[$i],0, 2) != '</'){$indent_count++;}

     /* Previous step doesn't account for image tags (or any tag without a closing) so test for an image tag, and subtract indent if found */
          if(substr($array[$i],0, 4) == '<img'){$indent_count--;}
}

echo $newfile;


I implemented that, and it still didn't work.

I'll give out the FULL CODE so I can get help.

OK, here's a link to the page it's on:

http://www.youwereloved.org/format.php

Here's the full code:

<?php header("Content-type: text/html; charset=UTF-8") ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Format Me</title>
</head><body>


<?php
    // edit number: 42
    function format_code($dir){
        //get site
        $file = file_get_contents($dir);
        //remove spacing and newlines
//remove spacing and newlines
  $file = str_replace("\n", " ", $file);
  $file = str_replace("\t", " ", $file);
  $file = preg_replace('/\s+/', ' ', $file);
  $file = str_replace("/ >", "/>", $file);
//add newlines in all the right places
  $file = str_replace(">", ">\n", $file);
  $file = str_replace("<", "\n<", $file);
  $file = str_replace("\n ", "\n", $file);
  $file = str_replace("\n\n", "\n", $file);

  $array = explode('\n',$file);
  $indent_count = 0;
  $newfile = '';

  for($i=0; $i<sizeof($array); $i++){

     /* If a closing tag, subtract an indent */
          if(substr($array[$i], 0, 2) == '</'){$indent_count--;}

     /* Add this line's indents */
          for($j=0; $j<$indent_count; $j++){$newfile .= '\s';}

     /* Add the actual line followed by a \n (since it was destroyed in the array explode) */
          $newfile .= $array[$i].'\n';

    /* If a tag, and not a closing tag, must be an opening tag, so add an indent */
          if(substr($array[$i],0, 1) == '<' && substr($array[$i],0, 2) != '</'){$indent_count++;}

    /* Previous step doesn't account for image tags (or any tag without a closing) so test for an image tag, and subtract indent if found */
          if(substr($array[$i],0, 4) == '<img' || substr($array[$i],0, 4) == '<br' || substr($array[$i],0, 4) == '<hr' ){$indent_count--;}
  }

        return $newfile;
    }
    $dir = "http://www.akirablaid.com/index.php";
    echo "<xmp>";
    echo format_code($dir);
    echo " ";
    echo "</xmp>"
?>
</body></html>


Played around with it a little more... still doesn't work.

 


<?php header("Content-type: text/html; charset=UTF-8") ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Format Me</title>
</head><body>


<?php
    // edit number: 50
    function format_code($dir){
        //get site
        $file = file_get_contents($dir);
        //remove spacing and newlines
        $file = str_replace("\n", " ", $file);
        $file = str_replace("\t", " ", $file);
        $file = preg_replace('/\s+/', ' ', $file);
        $file = str_replace("/ >", "/>", $file);
        //make br's, img's and hr's kosher
        //$file = str_replace("<br>", "<br />" $file);
        //$file = str_replace("<br/>", "<br />" $file);
        //add newlines in all the right places
        $file = str_replace(">", ">\n", $file);
        $file = str_replace("<", "\n<", $file);
        $file = str_replace("\n ", "\n", $file);
        $file = str_replace("\n\n", "\n", $file);
        $array = explode('\n',$file);
        $indent_count = 0;
        $newfile = '';
        for($i=0; $i<sizeof($array); $i++){
     /* If a closing tag, subtract an indent */
          if(substr($array[$i], 0, 2) == '</'){$indent_count--;}

     /* Add this line's indents */
          for($j=0; $j<$indent_count; $j++){$newfile .= '\s';}
          //$array[$i] = ereg_replace("<img\s*(.*)\s*/*>", "<img \\1 />", $array[$i]);
          //$array[$i] = ereg_replace("<hr\s*(.*)/*\s*>", "<hr \\1 />", $array[$i]);

     /* Add the actual line followed by a \n (since it was destroyed in the array explode) */
          $newfile .= $array[$i].'\n';

    /* If a tag, and not a closing tag, must be an opening tag, so add an indent */
          if(substr($array[$i],0, 1) == '<' && substr($array[$i],0, 2) != '</'){$indent_count++;}

    /* Previous step doesn't account for image tags (or any tag without a closing) so test for an image tag, and subtract indent if found */
          if(substr($array[$i],0, 4) == '<img' || substr($array[$i],0, 3) == '<br' || substr($array[$i],0, 3) == '<hr' ){
                  $indent_count--;
          }
        }
        return $newfile;
    }
    $dir = "http://www.akirablaid.com/index.php";
    echo "<xmp>";
    echo format_code($dir);
    echo " ";
    echo "</xmp>"
?>
</body></html>
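
One thing worth checking in both versions of format_code() above is the quoting. In PHP, escape sequences such as \n and \t are only interpreted inside double quotes. In single quotes, '\n' is two characters (a backslash and an n), so explode('\n', $file) looks for a delimiter that never occurs in text containing real newlines, and the whole file comes back as a single array element. '\s' is not a string escape in PHP at all, in either kind of quotes; it only means "whitespace" inside a regular expression. The following small demonstration uses a made-up sample string:

```php
<?php
// Sample input with real newlines, like the output of
// str_replace(">", ">\n", ...) in the code above.
$file = "<u>\n<b>\n</b>\n</u>";

$single = explode('\n', $file); // delimiter is backslash + n (two characters)
$double = explode("\n", $file); // delimiter is an actual newline character

echo count($single), "\n";                  // 1
echo count($double), "\n";                  // 4
echo strlen('\n'), ' ', strlen("\n"), "\n"; // 2 1
```

Switching to explode("\n", $file), appending "\n" after each line, and using " " or "\t" for the indent would be the first things to try.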


This works, but a few notes: 

 

1) For the "special" array (tags that shouldn't induce indents because they have no closing tags), the comparison is case sensitive. There's got to be a way around that; I'll leave it to you.

2) Your sample code contains errors (e.g. more </strong> tags than <strong> tags). That's a pain, and it screws up the output of the script. I'm sure you could write a bit of code that tests for that and prevents changes to the indent_count on errors; I'll leave that one to you too. For instance, loop over the lines and, for each kind of tag, do a ++ for every opening tag and a -- for every closing tag, then work from there. Or just don't write erroneous code!

3) I added a str_replace so that <br> tags (breaks) don't get their own line. Most markup I've seen doesn't give each and every <br> tag its own line, but feel free to take that out if you want.

4) Your meta keyword and description tags are RIDICULOUS. CHANGE THEM NOW! First of all, search engines hardly even look at the keywords tag anymore (although the description is important). Secondly, not only does keyword stuffing NOT help, in some cases it can actually get you penalized!

 

Ok, with that...here is the code...

$file = 'text.txt';

$opFile = fopen ($file, "r");
$string = fread ($opFile, filesize ($file));
fclose ($opFile);

//remove spacing and newlines
  $string = str_replace("\n", " ", $string);
  $string = str_replace("\t", " ", $string);
  $string = preg_replace('/\s+/', ' ', $string);
  $string = str_replace("/ >", "/>", $string);
//add newlines in all the right places
  $string = str_replace(">", ">\n", $string);
  $string = str_replace("<", "\n<", $string);
  $string = str_replace("\n ", "\n", $string);
  $string = str_replace("\n\n", "\n", $string);
  $string = str_replace("\n<br", "<br", $string);

// next part, break by lines
$array = explode("\n", $string);

$indent_count = 0;
$newfile = '';
$specials = array('<br','<hr','<img','<?xml','<!DOC','<!--','<meta','<link','<INPUT');

for($i=0; $i<sizeof($array); $i++){

    /* If a closing tag, subtract an indent */
    if(substr($array[$i], 0, 2) == '</'){$indent_count--;}

    /* Add this line's indents */
    for($j=0; $j<$indent_count; $j++){$newfile .= "\t";}

    /* Append the array element (aka line) followed by a newline */
    $newfile .= $array[$i]."\n";

    /* If a tag, and not a closing tag, must be an opening tag, so add an indent */
    if(substr($array[$i],0, 1) == '<' && substr($array[$i],0, 2) != '</'){$indent_count++;}

    /* Previous step doesn't account for tags w/o closings, so test for them, and subtract indent if found */
    for ($z=0; $z<sizeof($specials); $z++){
        if(substr($array[$i],0, strlen($specials[$z])) == $specials[$z]){$indent_count--;}
    }
}

echo '<pre>'.htmlspecialchars($newfile).'</pre>';
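
Notes 1 and 2 above could be handled roughly as follows. This is a sketch under assumptions, not the code from this thread: format_html() and VOID_TAGS are illustrative names, the tag name is lowercased before it is compared so the "special" check no longer depends on case, and the indent level is clamped at zero so a stray closing tag cannot push it negative. Unlike the code above, it gives <br> its own line.

```php
<?php
// Illustrative names only; neither VOID_TAGS nor format_html() appears in the thread.
// Tags that never have a closing partner, listed in lowercase.
const VOID_TAGS = ['br', 'hr', 'img', 'meta', 'link', 'input'];

function format_html(string $html, string $indent = "\t"): string
{
    // Collapse whitespace, then put every tag on its own line.
    $html = preg_replace('/\s+/', ' ', $html);
    $html = str_replace(['>', '<'], [">\n", "\n<"], $html);

    $depth = 0;
    $out = '';
    foreach (explode("\n", $html) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip the empty pieces left between adjacent tags
        }
        $isClose = strncmp($line, '</', 2) === 0;
        // Lowercased tag name, or '' for text, closing tags, comments and doctype.
        $name = preg_match('#^<([a-z][a-z0-9]*)#i', $line, $m) ? strtolower($m[1]) : '';
        $isVoid = in_array($name, VOID_TAGS, true) || substr($line, -2) === '/>';

        if ($isClose) {
            $depth = max(0, $depth - 1); // clamp so unmatched closing tags do no harm
        }
        $out .= str_repeat($indent, $depth) . $line . "\n";
        if ($name !== '' && !$isClose && !$isVoid) {
            $depth++; // a real opening tag: indent whatever follows
        }
    }
    return $out;
}

echo format_html("<u><b><IMG src='gmail.png'></b></u>");
```

Like the rest of the thread, this splits on every < and >, so markup with those characters inside attribute values, scripts, or comments would need a real parser instead.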

 

 
