
Regex (or preg_replace?) to close img tags


johnny_up


Hi,

I have a bunch of flat HTML files that I need to import into our current site using an HTML import tool. However, the img tags in those files aren't properly closed, so the utility keeps choking.

Does anyone know how to turn:

<img blah blah blah>

into <img blah blah blah />

using regex or preg_replace?

I'm sure there is a solution out there, but I don't know what it is...

Any help at all would be so appreciated.

Thanks.


AlexWD, your pattern won't work. Your * is greedy, so it will keep gobbling up everything until it reaches the last > it can find before a newline. At the very least you need to make it non-greedy by adding a ? after the *, but the better option is a negated character class. I also threw an 'i' modifier in there for a case-insensitive match on 'img', just in case.

$text = preg_replace('/<img([^>]*)>/i' , "<img $1 />", $text);
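One caveat with the pattern above: $1 already starts with the space that followed "img", so the output gets a double space (harmless in HTML), and a tag that is already self-closed, such as <img src="b.gif" />, comes out as <img  src="b.gif" / />. Here is a minimal sketch of a tweak that tolerates both, assuming the files may mix closed and unclosed tags. The function name closeImgTags is just a hypothetical wrapper. Like any [^>] pattern, it will still mangle a tag whose quoted attribute value contains a literal >, e.g. alt="a > b".

```php
<?php
// closeImgTags() is a hypothetical helper name, not a built-in.
function closeImgTags(string $html): string
{
    // <img      the tag name (case-insensitive via the /i modifier)
    // \b        word boundary, so it won't match something like <imgfoo>
    // ([^>]*?)  the attributes: anything except '>', taken lazily so the
    //           optional whitespace and slash below can claim a trailing " /"
    // \s*\/?>   optional whitespace, an optional existing '/', then '>'
    return preg_replace('/<img\b([^>]*?)\s*\/?>/i', '<img$1 />', $html);
}

// Mix of an unclosed (uppercase) tag and an already self-closed one.
echo closeImgTags('<p><IMG src="a.gif" alt="A"> text <img src="b.gif" /></p>'), "\n";
// prints: <p><img src="a.gif" alt="A" /> text <img src="b.gif" /></p>
```

Because the trailing \s*\/? is optional, running the replacement over a file twice leaves already-fixed tags unchanged.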


I tested my pattern, and it worked on the limited examples I fed into it. But I'm a regex amateur.


AlexWD, if you want a more detailed explanation of what CV is talking about with regard to greediness, have a look at this thread (posts #11 and #14 shed more light on these issues). In general, .* and .+ are not the best things to use: they have their place, but only in the right circumstances.
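To make the greediness point concrete, here is a small sketch that runs three patterns against one line containing two img tags. The sample string is made up for illustration. The greedy .* runs to the end of the line and backtracks to the last >, so it swallows both tags; the lazy .*? and the negated class [^>]* both stop at the first >.

```php
<?php
// Made-up sample: two unclosed img tags on the same line.
$text = 'a <img src="a.gif"> b <img src="b.gif"> c';

// Greedy: .* consumes the rest of the line, then backtracks to the LAST '>'.
preg_match('/<img(.*)>/i', $text, $greedy);

// Lazy: .*? expands one character at a time and stops at the FIRST '>'.
preg_match('/<img(.*?)>/i', $text, $lazy);

// Negated class: [^>]* can never cross a '>', so it stops there directly.
preg_match('/<img([^>]*)>/i', $text, $negated);

echo $greedy[0], "\n";  // <img src="a.gif"> b <img src="b.gif">
echo $lazy[0], "\n";    // <img src="a.gif">
echo $negated[0], "\n"; // <img src="a.gif">
```

One side benefit of [^>] for flat HTML files: unlike . without the s modifier, it also matches newlines, so a tag whose attributes are split across lines is still matched.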


:P Thanks. A little later I realized my mistake: I hadn't tested it with more than one '>' in the text. I figured out why it was wrong but was too lazy to edit my post. Thanks for the help; I'm trying to get better at regex.

