jwhite68 Posted September 27, 2007 I am receiving some fairly poor HTML from a client. For example, it contains "· " instead of using <ul><li> tags to create bullets. Does anyone have suggestions for how I can convert this kind of content to <ul><li> tags, given a series of sentences each preceded by "· "? Link to topic: https://forums.phpfreaks.com/topic/70920-how-to-handle-poor-html-data-in-php/
recklessgeneral Posted September 27, 2007 Hi, What you'd need to do is use a regular expression to replace each "· " with a <li> tag and the next newline or break with a </li>. The trickier part is placing the <ul> and </ul> tags before the first and after the last item of each block of bullets. For reference, the WikiParser class might be worth a look. It can be downloaded from http://code.blitzaffe.com/pages/phpclasses/files/wiki_parser_52-13/view/1. Basically, it converts lines beginning with asterisks into proper <ul>-style HTML lists, and it does a number of other conversions as well. Take a look at it and strip out the bits you don't need. Hope this helps, Darren.
jwhite68 Posted September 28, 2007 I will look into that. Does anyone else know of any other techniques or functions that have been developed for this?
jaymc Posted September 28, 2007 Edited: ignore
dingus Posted September 28, 2007 Mind if I see a little of the code you are getting, as a block? It would give me a better idea of what you are looking at.
jwhite68 Posted September 28, 2007 Here's an example with the bullet + nbsp I mentioned: $desc3 = "<P align=left><FONT size=2><FONT color=#ff0000>ABC1 1 is a modern luxury complex, located just 50 m away from the beach strip in the heart of town<BR></FONT></FONT><FONT size=2>· 6-storey complex, 5 sections <BR>· solid brick-built structure <BR>· each residential section has a separate lobby, reception, and lift <BR>· flats from 40sq.m -190 sq.m <BR>· maisonettes from 190 sq.m - 315 sq.m <BR>· on-site parking facilities in a 2-level basement - garage sections, parking space <BR>· each residential section is thermo-insulated <BR>· stone-panelled common parts <BR>· modern lifts <BR>· flooring: terracota, laminate <BR>· 3-layer window and door frames from the USA<BR>· 24-hour security service <BR></P></FONT>";
keeve Posted September 28, 2007 I agree with recklessgeneral: you should use regular expressions to solve it. Treat the markup as a plain string, find every occurrence of ' ', and replace it with <li>. Look into the preg_replace function.
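The regex approach suggested above could be sketched roughly as follows. This is not code from the thread: `bulletsToList` is a hypothetical helper, and it assumes the input is UTF-8, that each bullet is the middle dot (U+00B7) followed by a space, a non-breaking space, or `&nbsp;`, and that each item ends at the next `<BR>`. It also assumes item text contains no nested tags.

```php
<?php
// Hypothetical helper (not from the thread): wraps each run of
// "· item<BR>" lines in a single <ul> with one <li> per item.
// Assumes UTF-8 input and items that contain no nested tags.
function bulletsToList(string $html): string
{
    // One item: bullet, optional spacing, text up to the next <br>.
    $item = '\x{00B7}(?:&nbsp;|\x{00A0}|\s)*([^<]*?)\s*<br\s*\/?>';

    // Match a whole run of consecutive items so <ul> is emitted once per run.
    $result = preg_replace_callback(
        '/(?:' . $item . '\s*)+/iu',
        function (array $run) use ($item): string {
            // Pull the individual item texts back out of the matched run.
            preg_match_all('/' . $item . '/iu', $run[0], $m);
            $lis = '';
            foreach ($m[1] as $text) {
                $lis .= '<li>' . trim($text) . '</li>';
            }
            return '<ul>' . $lis . '</ul>';
        },
        $html
    );

    // preg_replace_callback returns null on error (e.g. invalid UTF-8).
    return $result ?? $html;
}

// Made-up sample in the shape of the client data.
$sample = "<FONT size=2>\u{00B7} modern lifts <BR>\u{00B7} 24-hour security service <BR></FONT>";
$converted = bulletsToList($sample);
echo $converted, "\n";
```

Matching the whole run first is what takes care of the "trickier part" of placing the <ul> and </ul> tags. If the client data is not UTF-8 (Windows-1252 is a guess, not something stated in the thread), it would need converting first, for example with mb_convert_encoding($html, 'UTF-8', 'Windows-1252').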
jwhite68 Posted September 28, 2007 I was able to resolve this specific issue with: $output = preg_replace("/·/", "-", $data); This replaces the bullet symbol, which does not display on my UTF-8 page (it shows as ?), with a hyphen instead. Does anyone know the code for a bullet point symbol that will display in UTF-8?
jwhite68 Posted September 28, 2007 Can anyone spot why this doesn't work: $output = preg_replace('/·/', '\x{2022}/u', $data); The bullet point symbol is supposed to be hex 2022, and I understood that the /u is needed to identify it as a Unicode value. But this just displays the text as \x{2022}/u. What am I doing wrong?
jwhite68 Posted September 28, 2007 I was able to get the desired result with: $output = preg_replace('/·/', "•", $data);
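On the earlier question of why '\x{2022}/u' was printed literally: the second argument of preg_replace is a replacement string, not a pattern, so PCRE escapes such as \x{...} are not interpreted there, and the /u modifier only has meaning after the closing delimiter of the pattern. A minimal sketch of two alternatives, assuming $data is UTF-8 (the sample string is made up, and the "\u{...}" escape needs PHP 7 or later):

```php
<?php
// Made-up sample; assumes the data is UTF-8 and uses U+00B7 as its bullet.
$data = "\u{00B7} modern lifts";

// \x{00B7} works here because it is inside the pattern and /u is set.
// Option 1: replace with an HTML entity, which does not depend on the page encoding.
$asEntity = preg_replace('/\x{00B7}/u', '&bull;', $data);

// Option 2: replace with the real U+2022 character, written with PHP's own
// "\u{...}" escape in a double-quoted string (not a PCRE escape).
$asUtf8 = preg_replace('/\x{00B7}/u', "\u{2022}", $data);

echo $asEntity, "\n", $asUtf8, "\n";
```

Since the search text is a fixed character, str_replace would do the same job without a regex.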