
Remove carriage returns from scraped HTML


webhead2


Hello and thanks in advance.  This is becoming quite the pesky problem!

 

I am developing a script that scrapes some HTML from partner websites (I have no control over the formatting). The content gets inserted into a WordPress database :)

 

The problem I am having is when a carriage return is placed in the middle of a tag. Like so:

 

<body
style="color: rgb(0, 0, 0); background-color: rgb(79, 105, 59);"
alink="#000099" link="#000099" vlink="#990099">

 

When displayed in WordPress, it actually prints out

"style="color: rgb(0, 0, 0); background-color: rgb(79, 105, 59);"
alink="#000099" link="#000099" vlink="#990099">"

 

because of the carriage return. 

 

I've tried this code to remove them:

// chr(13) and "\r" are the same character (CR); "\n" is LF.
$body = str_replace(chr(13), ' ', $body);
$body = str_replace("\r", ' ', $body);
$body = str_replace("\n", ' ', $body);

 

The above does not seem to work. 
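For reference, here is a minimal sketch of a narrower approach: flatten line breaks only where they fall inside a tag, and leave the breaks between tags alone. The function name collapse_breaks_in_tags and the sample $body are made up for illustration, and the pattern assumes no attribute value contains a literal ">". It is not a confirmed fix for the WordPress display problem, only a way to rule out breaks inside tags as the cause.

```php
<?php
// Hypothetical helper (not from the original post): replace any run of
// whitespace containing a CR or LF with one space, but only inside <...>.
function collapse_breaks_in_tags(string $html): string
{
    return preg_replace_callback(
        '/<[^<>]*>/',                      // one tag, possibly spanning lines
        function (array $m): string {
            // Flatten CR, LF, or CRLF (plus surrounding whitespace) to a space.
            return preg_replace('/\s*[\r\n]+\s*/', ' ', $m[0]);
        },
        $html
    );
}

// Sample input modelled on the tag in the question (shortened).
$body  = "<body\r\nstyle=\"color: rgb(0, 0, 0);\"\nalink=\"#000099\">\nHello\n</body>";
$clean = collapse_breaks_in_tags($body);
```

Running this before the database insert would leave the text content's own line breaks untouched, so paragraph handling later on still sees them.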

 

Thanks for looking at my problem!

