bschultz, posted August 27, 2014: I need to strip some HTML tags out of an uploaded string of code. I need to keep the <td> tags...but some of the code being uploaded includes <p> tags INSIDE the <td> tag. How would I go about stripping ALL other tags inside these allowed tags: <td> <tr> <table>?
Ch0cu3r, posted August 27, 2014: See strip_tags
Jacques1, posted August 27, 2014: Do not use strip_tags(). This function mangles the user input based on a very primitive mechanism. If you're lucky, it will only remove the parts you want to remove. But chances are it will cut off the input somewhere, either because the markup is invalid, or because the function is simply too stupid to understand the markup. Why strip_tags() is still around and gets recommended is beyond me. Do yourself a favor and use a proper filter like HTML Purifier.
bschultz, posted August 27, 2014: HTML Purifier didn't remove the inner tags either...
Jacques1, posted August 27, 2014: Of course it does. If you want to allow p elements outside of tables, you need to do more work: You have to actually parse the markup and then accept or reject tags depending on the context. But what's the point of this complicated logic?
bschultz, posted August 27, 2014: That explains it...I put the allowed tags in brackets...thanks!
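The fix mentioned above comes down to how HTML Purifier's HTML.Allowed directive is written: it takes a comma-separated list of element names, not the bracketed `<td><tr><table>` form that strip_tags() expects. A minimal sketch follows, assuming HTML Purifier is installed and that the include path shown is correct for your setup (the path and the function name `purifyRosterMarkup` are hypothetical; the thread does not show the actual code).

```php
<?php
/**
 * Keep only table, tr and td elements in pasted markup.
 * Hypothetical helper; the include path below is an assumption.
 */
function purifyRosterMarkup($dirtyHtml)
{
    // Assumption: HTML Purifier's bundled autoloader lives at this path.
    // With a Composer install you would require 'vendor/autoload.php' instead.
    require_once 'htmlpurifier/library/HTMLPurifier.auto.php';

    $config = HTMLPurifier_Config::createDefault();
    // Element names separated by commas, without angle brackets.
    $config->set('HTML.Allowed', 'table,tr,td');

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirtyHtml);
}
```

Called as `purifyRosterMarkup($pastedHtml)`, this should drop a <p> wrapper nested inside a <td> while keeping the text it contains.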
bschultz, posted August 28, 2014: Ever heard of HTML Purifier adding whitespace at the end of tags? I'm trying to import sports jersey numbers into a DB...as int. Whitespace is killing the import.
Ch0cu3r, posted August 28, 2014: Try using trim to clear the whitespace before and after the value? Alternatively, typecast the value to int: $value = (int) $value; // typecast value to int
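The two suggestions above can be combined into one small cleaning step. By default trim() only strips ASCII whitespace (space, tab, newline, carriage return, NUL and vertical tab), so this sketch also strips UTF-8 non-breaking spaces, on the unconfirmed assumption that content pasted from a WYSIWYG editor may contain them. The function names `cleanCellValue` and `toJerseyNumber` are hypothetical.

```php
<?php
/**
 * Strip leading and trailing whitespace, including UTF-8 non-breaking
 * spaces (U+00A0), which trim() leaves in place by default.
 */
function cleanCellValue($value)
{
    $cleaned = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', (string) $value);
    // preg_replace() returns null when the input is not valid UTF-8;
    // fall back to plain trim() in that case.
    return $cleaned === null ? trim((string) $value) : $cleaned;
}

/**
 * Return the jersey number as an int, or null when the cleaned cell
 * is not made up purely of digits.
 */
function toJerseyNumber($value)
{
    $cleaned = cleanCellValue($value);
    return ctype_digit($cleaned) ? (int) $cleaned : null;
}

var_dump(toJerseyNumber(" 23 \n"));
var_dump(toJerseyNumber('N/A'));
```

Validating before casting is a design choice: a bare (int) cast turns any non-numeric string into 0, which would be hard to tell apart from a real jersey number 0.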
bschultz, posted August 28, 2014: I tried trim, ltrim and rtrim last night...none seemed to work. I'll try typecast today...thanks!
Ch0cu3r, posted August 28, 2014: How are you using HTML Purifier for getting the jersey numbers?
bschultz, posted August 28, 2014: Users copy and paste the roster from the school's website into a WYSIWYG text box on my site. That entry runs through HTML Purifier to clean up formatting other than the table, tr and td tags...then it is put into an array. The array is then split by <td> to find each individual roster entry (number, name, height, weight etc.). I was using strip_tags to get rid of everything other than table, tr and td...but a new school had embedded p tags inside the td tag...which killed the import. HTML Purifier is cleaning up the entered code...but I don't know where the extra whitespace is coming from. Still haven't had a chance to play with typecast, but will soon.
Ch0cu3r, posted August 28, 2014: You may be better off scraping the data you require using DOM (or alternatively using simple_html_dom). That way you can load the roster webpage into either of those libraries, target the specific HTML elements in the document and extract the data you require.
bschultz, posted August 28, 2014 (edited August 28, 2014): No two schools put things in the same order or with the same div or column names. The method I am using lets the end user select the order of the roster they are loading at that time, based on which column is which. For example: column 1 is the number, column 2 is the name, column 3 is a picture of the player (which we don't use and ignore), column 4 is their height. Next week, it might be column 1 name, column 2 number...etc. One week the school may only provide a Word document in table format... Like I said, there are no set formats for what is being copied and pasted in. As long as the info being gathered is in table form (which is the case 95+ percent of the time), my system can adapt. If you can show me how DOM or simple_html_dom would work better...I'd love to hear it.
Ch0cu3r, posted August 28, 2014: A quick example (requires simple_html_dom.php):

    <?php
    // include simple_html_dom
    require_once 'simple_html_dom.php';

    // scrapes data from $url and returns the data defined in $elements
    function findPlayerInfoByElements($url, $elements = array())
    {
        // load page into simple_html_dom; give up if the page could not be loaded
        $html = file_get_html($url);
        if (!$html) {
            return array();
        }

        // collect one column of values per alias; the aliases become the
        // keys of the associative arrays returned by the function
        $data = array();
        foreach ($elements as $alias => $element) {
            // find all data by element
            $columnDataFound = $html->find($element);

            // keep the value of each element as plain text - removes any HTML;
            // a column that was not found stays empty so the aliases stay aligned
            $data[$alias] = $columnDataFound
                ? array_map(function ($v) { return trim($v->plaintext); }, $columnDataFound)
                : array();
        }

        // format the players array: the first column sets the number of players
        $players = array();
        $rowCount = $data ? count(reset($data)) : 0;

        // looping over the data, add each player's info into separate associative arrays
        for ($i = 0; $i < $rowCount; $i++) {
            $info = array();
            foreach ($data as $alias => $column) {
                $info[$alias] = isset($column[$i]) ? $column[$i] : null;
            }
            $players[] = $info;
        }

        // return the players' info
        return $players;
    }

    // url to scrape roster info from
    $roster_url = 'http://www.seahawks.com/team/roster.html';

    /* Provide the findPlayerInfoByElements() function with
         - the url to scrape the roster table from
         - an array of elements to get the data required, e.g. jersey no, player name, height and weight
    */
    $elements = array(
        'no'     => 'td.col-jersey', // gets the jersey numbers from the <td> element with the class of col-jersey
        'name'   => 'td.col-name',   // gets the player's name from the <td> element with the class of col-name
        'height' => 'td.col-height', // their height from the <td> element with the class of col-height
        'weight' => 'td.col-weight', // their weight from the <td> element with the class of col-weight
    );

    // returns each player's info in an associative array
    $players = findPlayerInfoByElements($roster_url, $elements);

    printf('<pre>%s</pre>', print_r($players, true));

Change $roster_url to your school's roster page. Modify the $elements array with the HTML elements you need to find the data from. $players will contain the info for each player in the roster.
bschultz, posted August 29, 2014: My clients have ZERO HTML knowledge...they copy and paste...there's no way for them to know what the elements are from one school to the next.
CroNiX, posted August 29, 2014: Create a form for them to input the data?
bschultz, posted August 29, 2014: That's what I have for them. The WYSIWYG box shows the copied data...not the raw HTML. If they don't know how to read the HTML, how are they supposed to know what to label the elements for my code?
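A DOM-based approach does not have to rely on class names or on users reading HTML. As a rough sketch, assuming the pasted markup contains ordinary <tr>/<td> rows and that the user has already chosen a column order as described earlier in the thread, PHP's built-in DOMDocument can read cells by position and take their text content, which ignores nested tags such as <p>. The function name `rosterRowsByColumnOrder` and the sample markup are hypothetical.

```php
<?php
/**
 * Read table rows from pasted HTML and map each cell to a field name
 * by its position. A null entry in $columnOrder means "ignore this column".
 */
function rosterRowsByColumnOrder($html, array $columnOrder)
{
    $doc = new DOMDocument();

    // Pasted markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix hints to libxml that the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = array();
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        // Collect the text of each direct <td> child; textContent drops
        // any nested tags and keeps only the text inside them.
        $cells = array();
        foreach ($tr->childNodes as $child) {
            if ($child instanceof DOMElement && $child->nodeName === 'td') {
                $cells[] = trim($child->textContent);
            }
        }
        if (!$cells) {
            continue; // skip rows without <td> cells, such as header rows
        }

        $row = array();
        foreach ($columnOrder as $index => $field) {
            if ($field === null) {
                continue; // column the user chose to ignore
            }
            $row[$field] = isset($cells[$index]) ? $cells[$index] : '';
        }
        $rows[] = $row;
    }

    return $rows;
}

// Hypothetical pasted roster row: number, name, picture, height.
$pasted = '<table><tr><td><p> 12 </p></td><td>Jane Smith</td>'
        . '<td><img src="photo.jpg"></td><td>5-10</td></tr></table>';
print_r(rosterRowsByColumnOrder($pasted, array('number', 'name', null, 'height')));
```

Because the cells are addressed by position, the same function works when next week's roster puts the name first: only the `$columnOrder` array changes.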
bschultz, posted August 31, 2014 (marked as solution): Typecast worked for the int values...but all the other fields had whitespace too. I ran trim in the foreach over the array...and it didn't work. I ran trim on each field in the insert into the MySQL table...and it worked. Still don't know where the whitespace came from, but I got rid of it. Thanks!
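The thread does not show the loop that failed, so the following is only a guess at a common cause: trim() returns the trimmed string rather than modifying its argument, and a plain foreach works on copies of the array values. The sketch below, using a made-up `$row` array, contrasts a loop that has no effect with two variants that do trim every element.

```php
<?php
// Hypothetical roster row with stray whitespace around the values.
$row = array('number' => " 12 ", 'name' => "Jane Smith\n");

// No effect: $value is a copy, and the return value of trim() is discarded.
foreach ($row as $value) {
    trim($value);
}
$afterNoOp = $row; // still contains the untrimmed values

// Works: array_map() returns a new array with trim() applied to each value.
$trimmed = array_map('trim', $row);

// Also works: iterate by reference and assign the result back.
foreach ($row as &$value) {
    $value = trim($value);
}
unset($value); // break the reference so later code cannot modify $row by accident
```

Trimming the whole array once, right after it is built, keeps the cleanup in one place instead of repeating trim() on every field at insert time.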