johnny_up (posted June 23, 2009):

Hi, I have a bunch of flat HTML files that I need to import into our current site using an HTML import tool. However, the files contain img tags that aren't properly closed, so the utility keeps choking. Does anyone know how to turn

    <img blah blah blah>

into

    <img blah blah blah />

using regex or preg_replace? I'm sure there is a solution out there, but I don't know what it is. Any help at all would be much appreciated. Thanks.
Alex (posted June 23, 2009):

    $text = preg_replace('/<img(.*)>/' , "<img $1 />", $text);
.josh (posted June 23, 2009):

AlexWD, your pattern won't work. Your * is greedy, so it will keep gobbling up everything until it reaches the last > it can find before a newline. At the very least you need to make it non-greedy by adding a ? after the *, but the better approach is to use a negated character class. I also threw an 'i' modifier in there for a case-insensitive match on 'img', just in case.

    $text = preg_replace('/<img([^>]*)>/i' , "<img $1 />", $text);
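The difference .josh describes shows up as soon as one line holds two tags. The following is a minimal sketch, assuming a made-up sample string; the patterns and the replacement are the ones from the posts above, plus the lazy `.*?` variant that .josh mentions.

```php
<?php
// Sample input (an assumption for illustration): two unclosed img tags on one line.
$text = '<p><img src="a.png"> and <img src="b.png"></p>';

// Greedy: (.*) runs to the LAST '>' on the line, so the whole rest of the
// line is captured as if it were a single tag.
$greedy = preg_replace('/<img(.*)>/', "<img $1 />", $text);

// Lazy: (.*?) stops at the first '>' it can reach.
$lazy = preg_replace('/<img(.*?)>/', "<img $1 />", $text);

// Negated character class: [^>]* can never step over a '>'.
$negated = preg_replace('/<img([^>]*)>/i', "<img $1 />", $text);

echo $greedy, "\n";
echo $lazy, "\n";
echo $negated, "\n";
```

On this input the lazy and negated-class versions close both tags, while the greedy version leaves both img tags unclosed and instead alters the final `</p>`. The negated class is usually preferred over the lazy quantifier because it cannot cross a `>` at all, so it needs less backtracking.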
Alex (posted June 23, 2009, in reply to .josh):

I tested my example, and it worked for the limited examples I fed into it. But I'm a regex amateur.
nrg_alpha (posted June 25, 2009):

AlexWD, if you want a more elaborate explanation of what CV is talking about (with regard to greediness), you can have a look at this thread (posts #11 and #14 shed more light on these issues). So yeah, in general, .* and .+ are not the best things to use (they have their place, but should only be used in the right circumstances).
Alex (posted June 25, 2009, in reply to nrg_alpha):

Thanks. A little later I realized I had been foolish not to test it with more than one '>' in the text. I figured out why I was wrong; I was just too lazy to edit my post. Thanks for the help; I'm trying to get better at regex.
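Two side effects of the patterns in this thread are worth checking before a bulk import: a tag that is already closed gets a second slash (`<img ... / />`), and the replacement leaves a doubled space after `img`. The sketch below is one possible variant, not something posted in the thread. The function name `close_img_tags` is hypothetical, and the pattern still assumes that no attribute value contains a literal `>`.

```php
<?php
// Hypothetical helper (not from the thread): closes img tags and leaves
// already-closed ones in a single, consistent form.
function close_img_tags(string $html): string
{
    // \b          keeps the match to the img tag itself
    // ([^>]*?)    lazily captures the attributes without crossing '>'
    // \s*\/?      swallows trailing whitespace and an existing '/', if any
    $result = preg_replace('/<img\b([^>]*?)\s*\/?>/i', '<img$1 />', $html);

    // preg_replace returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$sample = '<img src="a.png"> <img src="b.png" /> <IMG SRC="dir/c.png"/>';
echo close_img_tags($sample), "\n";
```

Because existing slashes are absorbed before the replacement is written, running the function twice gives the same result as running it once. For markup that is messier than this, an HTML parser such as PHP's DOMDocument is generally a safer tool than a regex.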