rossh Posted June 28, 2008

Hi, I've been working on a sanitize script to clean HTML files. I'm trying to figure out how to delete lines from the top of the file until the <body> tag is found. Any help would be appreciated.

R

<?php
$base_dir = 'c:\\xampp\\htdocs\\sanitize\\';
$response = '';

//CLEAN DATA HERE
$sanitize = array(
    ' & '  => ' &amp; ',
    '£'    => '&pound;',
    '<b>'  => '<strong>',
    '</b>' => '</strong>',
    '<i>'  => '<em>',
    '</i>' => '</em>'
);

//Output the current directory's content
function outputDir(){
    global $base_dir;
    $dir = opendir($base_dir) or die('Couldn\'t open directory, please contact the web administrator');
    $fname = array();
    while(($file = readdir($dir)) !== false){
        if($file != '.' && $file != '..'){
            $fname[] = $file;
        }
    }
    closedir($dir);
    $filenames = '';
    foreach($fname as $file){
        //is_dir() needs the full path, otherwise it looks in the working directory
        if(!is_dir($base_dir.$file) && $file != 'sanitize.php'){
            $filenames .= $file.',';
        }
    }
    return rtrim($filenames, ',');
}

//Read file, apply the replacements and output to new file
function sanitizeFile($file){
    global $base_dir, $sanitize;
    $lines = file($base_dir.$file);
    if(empty($lines)){
        echo 'File: '.$file.' empty!';
        return;
    }
    $new_lines = array();
    foreach($lines as $line){
        foreach($sanitize as $search => $replace){
            $line = str_replace($search, $replace, $line);
        }
        array_push($new_lines, $line);
    }
    $content = implode('', $new_lines);
    $fp = fopen($base_dir.'sanitized\\'.$file, 'w');
    fwrite($fp, $content);
    fclose($fp);
}

//Get filenames and sanitize
if(isset($_POST['submit'])){
    $filenames = explode(',', outputDir());
    $response .= '<ul>'."\n";
    foreach($filenames as $file){
        sanitizeFile($file);
        $response .= '<li>Sanitized: '.$file.'</li>'."\n";
    }
    $response .= '</ul>'."\n";
}
?>
<p>Base Directory: <?php echo $base_dir; ?></p>
<form name="sanitize_data" action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post">
<input type="submit" name="submit" value="Sanitize!" id="submit" />
</form>
<?php echo $response; ?>
Sulman Posted June 28, 2008

Here is one way you could do it:

<?php
$unsanitisedHTMLstring = ''; //use fopen & fread to get the html page into a string
$bits = explode("<body>", $unsanitisedHTMLstring, 2); //explode on body to leave 2 bits
$sanitisedHTMLstring = "<body>".$bits[1]; //get the final string and add the body tag back
?>

The script would need tidying and error checking (for instance, if the page's body tag has an onload event, you would need to alter it slightly, because the tag is then no longer a plain <body>).
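To illustrate the onload caveat mentioned above, here is a minimal sketch that searches for the start of the tag instead of the exact string "<body>", so attributes and upper-case tags are tolerated. The function name stripBeforeBody and the sample page are assumptions made for this example, not part of the original script.

```php
<?php
// Hypothetical helper (not from the thread): returns the markup from the
// opening <body ...> tag onward, or the input unchanged if no tag is found.
function stripBeforeBody(string $html): string
{
    // stripos() is case-insensitive, so <BODY> and <body onload="..."> both match.
    $pos = stripos($html, '<body');
    // Compare with === false: a match at offset 0 is a valid result but is falsy.
    return $pos === false ? $html : substr($html, $pos);
}

// Sample input, made up for the example.
$page = "<html>\n<head><title>Test</title></head>\n"
      . "<body onload=\"init()\">\n<p>Hi</p>\n</body>\n</html>";
$clean = stripBeforeBody($page);
```

Searching for '<body' keeps the original tag and its attributes in the output, so nothing has to be added back afterwards as in the explode() version.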
rossh (Author) Posted June 28, 2008

I figured it out.

<?php
$base_dir = 'c:\\xampp\\htdocs\\work\\sanitize\\';
$response = '';

//CLEAN DATA HERE
$sanitize = array(
    ' & '  => ' &amp; ',
    '£'    => '&pound;',
    '<b>'  => '<strong>',
    '</b>' => '</strong>',
    '<i>'  => '<em>',
    '</i>' => '</em>'
);

//Output the current directory's content
function outputDir(){
    global $base_dir;
    $dir = opendir($base_dir) or die('Couldn\'t open directory, please contact the web administrator');
    $fname = array();
    while(($file = readdir($dir)) !== false){
        if($file != '.' && $file != '..'){
            $fname[] = $file;
        }
    }
    closedir($dir);
    $filenames = '';
    foreach($fname as $file){
        //is_dir() needs the full path, otherwise it looks in the working directory
        if(!is_dir($base_dir.$file) && $file != 'sanitize.php'){
            $filenames .= $file.',';
        }
    }
    return rtrim($filenames, ',');
}

//Read file, drop the lines above the body tag, add title to anchor tags and output to new file
function sanitizeFile($file){
    global $base_dir, $sanitize;
    $lines = file($base_dir.$file);
    if(empty($lines)){
        echo 'File: '.$file.' empty!';
        return;
    }
    $new_lines = array();
    $body_tag_line = 0;
    $body_found = false;
    foreach($lines as $line_num => $line){
        //Remember only the first line holding the opening body tag.
        //Compare with false: the tag may sit at position 0, and a plain 'body' would also match </body>
        if(!$body_found && stripos($line, '<body') !== false){
            $body_tag_line = $line_num;
            $body_found = true;
        }
        /*CLEAN DATA HERE*/
        $line = preg_replace('/<a[^>]*?href=[\'"](.*?)[\'"][^>]*?>(.*?)<\/a>/si','<a href="$1" title="$2">$2</a>',$line); //Add titles to anchor tags
        foreach($sanitize as $search => $replace){
            $line = str_replace($search, $replace, $line);
        }
        array_push($new_lines, $line);
    }
    //Delete every line above the body tag
    for($i=0;$i<$body_tag_line;$i++){
        unset($new_lines[$i]);
    }
    $content = implode('', $new_lines);
    $fp = fopen($base_dir.'sanitized\\'.$file, 'w');
    fwrite($fp, $content);
    fclose($fp);
}

//Get filenames and sanitize
if(isset($_POST['submit'])){
    $filenames = explode(',', outputDir());
    $response .= '<ul>'."\n";
    foreach($filenames as $file){
        sanitizeFile($file);
        $response .= '<li>Sanitized: '.$file.'</li>'."\n";
    }
    $response .= '</ul>'."\n";
}
?>
<p>Base Directory: <?php echo $base_dir; ?></p>
<form name="sanitize_data" action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post">
<input type="submit" name="submit" value="Sanitize!" id="submit" />
</form>
<?php echo $response; ?>
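For anyone who only needs the line-dropping step from the script above, it can be isolated into a small function. This is a rough sketch, assuming the lines come from file() as in the post; the name dropLinesBeforeBody is made up for the example. It uses array_slice() in place of a loop with unset(), which is a swapped-in technique rather than what the script does.

```php
<?php
// Hypothetical helper (not from the thread): drops every line that comes
// before the first line containing an opening <body ...> tag.
function dropLinesBeforeBody(array $lines): array
{
    // Re-index so that array positions and keys agree, as they do for file().
    $lines = array_values($lines);
    foreach ($lines as $index => $line) {
        // strpos()/stripos() return 0 for a match at the start of the line,
        // so the result has to be compared with false using !==.
        if (stripos($line, '<body') !== false) {
            return array_slice($lines, $index);
        }
    }
    // No body tag found: leave the lines untouched.
    return $lines;
}
```

Searching for '<body' instead of 'body' avoids matching the closing </body> tag, and returning at the first hit means later occurrences cannot move the cut-off point.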