jwhite68 Posted July 15, 2007
I sometimes get the error "Parse error: syntax error, unexpected T_CONSTANT_ENCAPSED_STRING" when certain HTML strings are used, e.g.:
$html_string='<P class=MsoBodyText style="MARGIN: 0in 0in 0pt"><FONT size=3><FONT face=Arial><SPAN lang=EN-US>The house is a bungalow with 4 rooms: 3bedrooms,</SPAN><SPAN lang=EN-US style="mso-ansi-language: BG"> </SPAN><SPAN lang=EN-US>kitchen with oven</SPAN><SPAN lang=BG style="mso-ansi-language: BG">,</SPAN><SPAN lang=EN-US> extractor and hob</SPAN><SPAN lang=BG style="mso-ansi-language: BG">,</SPAN><SPAN lang=BG> </SPAN><SPAN lang=EN-US>3 bathrooms. It has a terrace and fully fenced wall. 2 air-conditioners are also available.</SPAN></FONT></FONT></P>\r\n<P class=MsoBodyText style="MARGIN: 0in 0in 0pt"><FONT size=3><FONT face=Arial><SPAN lang=EN-US>T</SPAN></FONT></FONT><FONT size=3><FONT face=Arial><SPAN lang=EN-US><SPAN lang=EN-US style="FONT-SIZE: 12pt; FONT-FAMILY: ''Times New Roman''; mso-ansi-language: EN-US; mso-fareast-font-family: ''Times New Roman''; mso-fareast-language: EN-US; mso-bidi-language: AR-SA"><SPAN lang=EN-US style="FONT-SIZE: 11pt; FONT-FAMILY: Arial; mso-bidi-font-size: 12.0pt; mso-ansi-language: EN-US; mso-fareast-font-family: ''Times New Roman''; mso-fareast-language: EN-US; mso-bidi-language: AR-SA">he size of the living area is 155m2</SPAN></SPAN></SPAN></FONT></FONT></P>\r\n<P class=MsoBodyText style="MARGIN: 0in 0in 0pt"><FONT size=3><FONT face=Arial><SPAN lang=EN-US><SPAN lang=EN-US style="FONT-SIZE: 12pt; FONT-FAMILY: ''Times New Roman''; mso-ansi-language: EN-US; mso-fareast-font-family: ''Times New Roman''; mso-fareast-language: EN-US; mso-bidi-language: AR-SA"><SPAN lang=EN-US style="FONT-SIZE: 11pt; FONT-FAMILY: Arial; mso-bidi-font-size: 12.0pt; mso-ansi-language: EN-US; mso-fareast-font-family: ''Times New Roman''; mso-fareast-language: EN-US; mso-bidi-language: AR-SA"></SPAN></SPAN></SPAN></FONT></FONT><SPAN lang=EN-US><FONT face=Arial size=3>The size of the plot is 1,100 sq.m</FONT></SPAN></P>\r\n<P class=MsoBodyText style="MARGIN: 0in 0in 0pt"><SPAN lang=EN-US></SPAN><SPAN lang=EN-US style="FONT-SIZE: 11pt; FONT-FAMILY: Arial; mso-bidi-font-size: 12.0pt; mso-ansi-language: EN-US; mso-fareast-font-family: ''Times New Roman''; mso-fareast-language: EN-US; mso-bidi-language: AR-SA">The price is <B>112,750 EUR / 77,759 GBP</B></SPAN> </P>';
$string=strip_tags($html_string);
echo $string;
Does anyone know what's causing this error? I will be reading HTML text from a file and simply want to apply strip_tags to remove the HTML markup, but it seems I cannot do this reliably. The same code works fine for a smaller html_string.
pocobueno1388 Posted July 15, 2007
<?php
$html_string="<P class=MsoBodyText style='MARGIN: 0in 0in 0pt'><FONT size=3><FONT face=Arial><SPAN lang=EN-US>The house is a bungalow with 4 rooms: 3bedrooms,</SPAN><SPAN lang=EN-US style='mso-ansi-language: BG'> </SPAN><SPAN lang=EN-US>kitchen with oven</SPAN><SPAN lang=BG style='mso-ansi-language: BG'>,</SPAN><SPAN lang=EN-US> extractor and hob</SPAN><SPAN lang=BG style='mso-ansi-language: BG'>,</SPAN><SPAN lang=BG> </SPAN><SPAN lang=EN-US>3 bathrooms. It has a terrace and fully fenced wall. 2 air-conditioners are also available.</SPAN></FONT></FONT></P>\r\n<P class=MsoBodyText style='MARGIN: 0in 0in 0pt'><FONT size=3><FONT face=Arial><SPAN lang=EN-US>T</SPAN></FONT></FONT><FONT size=3><FONT face=Arial><SPAN lang=EN-US><SPAN lang=EN-US style='FONT-SIZE: 12pt; FONT-FAMILY: ''Times New Roman''; mso-ansi-language: EN-US; mso-fareast-font-family: ''Times New Roman''; mso-fareast-language: EN-US; mso-bidi-language: AR-SA'><SPAN lang=EN-US style='FONT-SIZE: 11pt; FONT-FAMILY: Arial; mso-bidi-font-size: 12.0pt; mso-ansi-language: EN-US; mso-fareast-font-family: ''Times New Roman''; mso-fareast-language: EN-US; mso-bidi-language: AR-SA'>he size of the living area is 155m2</SPAN></SPAN></SPAN></FONT></FONT></P>\r\n<P class=MsoBodyText style='MARGIN: 0in 0in 0pt'><FONT size=3><FONT face=Arial><SPAN lang=EN-US><SPAN lang=EN-US style='FONT-SIZE: 12pt; FONT-FAMILY: ''Times New Roman''; mso-ansi-language: EN-US; mso-fareast-font-family: ''Times New Roman''; mso-fareast-language: EN-US; mso-bidi-language: AR-SA'><SPAN lang=EN-US style='FONT-SIZE: 11pt; FONT-FAMILY: Arial; mso-bidi-font-size: 12.0pt; mso-ansi-language: EN-US; mso-fareast-font-family: ''Times New Roman''; mso-fareast-language: EN-US; mso-bidi-language: AR-SA'></SPAN></SPAN></SPAN></FONT></FONT><SPAN lang=EN-US><FONT face=Arial size=3>The size of the plot is 1,100 sq.m</FONT></SPAN></P>\r\n<P class=MsoBodyText style='MARGIN: 0in 0in 0pt'><SPAN lang=EN-US></SPAN><SPAN lang=EN-US style='FONT-SIZE: 11pt; FONT-FAMILY: Arial; mso-bidi-font-size: 12.0pt; mso-ansi-language: EN-US; mso-fareast-font-family: ''Times New Roman''; mso-fareast-language: EN-US; mso-bidi-language: AR-SA'>The price is <B>112,750 EUR / 77,759 GBP</B></SPAN> </P>";
$string=strip_tags($html_string);
echo $string;
?>
Use only one type of quote, either single or double, inside the string, and make it different from the quote that delimits the string, unless you escape the inner quotes.
jwhite68 Posted July 15, 2007 (Author)
The problem is that I don't have control over the HTML strings. They are set by an independent content management system. Is there some other way I can parse the HTML strings first to avoid this?
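Since the HTML will come from a file rather than from a literal typed into the script, the parse error should not arise at all: PHP only interprets quote characters in its own source code, not in data loaded at runtime. A minimal sketch, assuming the CMS output is available as a file (the temporary file and its contents below are hypothetical stand-ins for that output):

```php
<?php
// Hypothetical stand-in for the CMS export: write sample HTML to a temp file.
$path = tempnam(sys_get_temp_dir(), 'cms');
file_put_contents(
    $path,
    "<P style=\"FONT-FAMILY: ''Times New Roman''\">The price is <B>112,750 EUR</B></P>"
);

// Data read at runtime is never parsed as PHP source,
// so the doubled single quotes in it cannot cause a parse error.
$html_string = file_get_contents($path);
$string = strip_tags($html_string);
echo $string, "\n";

// Remove the temporary file again.
unlink($path);
```

The quote problem only matters while the HTML is pasted into the script as a string literal for testing.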
jwhite68 Posted July 16, 2007 (Author)
Does anyone else have any ideas?
ToonMariner Posted July 16, 2007
This bit: FONT-FAMILY: ''Times New Roman''
Note that you have TWO single quotes! Inside a single-quoted PHP string, the first of them ends the string, and that is what triggers the parse error. Either escape the one set of quotes, like so:
FONT-FAMILY: \'Times New Roman\'
OR use double quotes for that, like so:
FONT-FAMILY: "Times New Roman"
You could help yourself by removing all those pointless font tags and styling it properly!
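The escaping advice can be sketched as follows. The shortened HTML snippet is illustrative only, and the nowdoc variant is an addition that assumes PHP 5.3 or later; it was not part of the suggestion in this thread.

```php
<?php
// Option 1: escape each single quote inside a single-quoted literal.
$escaped = '<SPAN style="FONT-FAMILY: \'Times New Roman\'">155m2</SPAN>';

// Option 2 (assumes PHP 5.3+): a nowdoc takes the text verbatim,
// so even the doubled single quotes need no escaping.
$nowdoc = <<<'HTML'
<SPAN style="FONT-FAMILY: ''Times New Roman''">155m2</SPAN>
HTML;

echo strip_tags($escaped), "\n";
echo strip_tags($nowdoc), "\n";
```

Either form lets the parser see one complete string literal instead of two adjacent ones.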
jwhite68 Posted July 16, 2007 (Author)
Do you know of a way I can recognise when there are 2 single quotes side by side, and automatically replace them with the \' escape?
ToonMariner Posted July 16, 2007
You need to get rid of the doubled single quotes and replace them with double quotes:
$str = str_replace('\'\'', '"', $str);
and then get rid of any duplicates:
$str = str_replace('""', '"', $str);
Note that str_replace returns the modified string rather than changing $str in place, so the result has to be assigned back.
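The two replacements can be wrapped in a small helper for strings that arrive at runtime. This is a minimal sketch; the function name normalize_quotes is hypothetical and does not come from the thread.

```php
<?php
// Hypothetical helper: collapses doubled quotes in HTML read at runtime.
function normalize_quotes($str)
{
    // Two adjacent single quotes become one double quote.
    $str = str_replace("''", '"', $str);
    // Any doubled double quotes collapse to a single one.
    $str = str_replace('""', '"', $str);
    return $str;
}

$in = "FONT-FAMILY: ''Times New Roman''";
echo normalize_quotes($in), "\n";
```

If the attribute itself is wrapped in double quotes, as in the original post, replacing '' with a single ' instead may keep the markup more consistent; that variation is an assumption rather than something suggested in the thread.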