cyberfox Posted June 6, 2011

Hello, folks. I have ~500 files with ~3000 <img> tags, and about a third of them don't contain an alt attribute. I need a regexp that can find those tags. Right now I'm searching all <img> tags with

<[a-zA-Z]+(\s+[a-zA-Z]+\s*=\s*("([^"]*)"|'([^']*)'))*\s*/>

Could someone modify it so that it skips the tags that have an alt attribute? Thanks a billion :)
xyph Posted June 6, 2011

This is a SLOW regex as it uses lookaround (negative lookahead), and I would only use it to root out problems, not in a script that gets called all the time.

I'm assuming proper HTML formatting: all attributes are in double quotes, and there's no space between the attribute name, the = sign and the value.

<img src="blah" alt="blah" width="x" height="x"> WILL BE IGNORED
<img src = "blah" alt = "something"> WILL NOT BE IGNORED

Here's the regex:

(<img(?:[^>](?!alt=))*+>)

And in English:

Match the regular expression below and capture its match into backreference number 1 «(<img(?:[^>](?!alt=))*+>)»
   Match the characters “<img” literally «<img»
   Match the regular expression below «(?:[^>](?!alt=))*+»
      Between zero and unlimited times, as many times as possible, without giving back (possessive) «*+»
      Match any character that is NOT a “>” «[^>]»
      Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!alt=)»
         Match the characters “alt=” literally «alt=»
   Match the character “>” literally «>»

Let me know if there's anything you don't understand.
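
For anyone who wants to try the pattern quickly, here is a minimal Python sketch that runs it over the two example tags from the post plus one tag with no alt at all. A few assumptions: the helper name `find_imgs_without_alt` and the `logo.png` sample are made up for illustration, and the possessive `*+` is swapped for a plain greedy `*`, since Python's `re` module only accepts possessive quantifiers from version 3.11 on. The greedy form should accept the same tags here, because `[^>]` can never consume the closing `>`, so backtracking has nothing to find.

```python
import re

# xyph's pattern with a plain greedy `*` in place of the possessive `*+`.
# Assumption: both forms accept the same tags, because `[^>]` can never
# match the closing `>`, so backtracking cannot rescue a failed match.
IMG_WITHOUT_ALT = re.compile(r'(<img(?:[^>](?!alt=))*>)')


def find_imgs_without_alt(html):
    """Return every <img ...> tag in `html` that contains no literal `alt=`."""
    return IMG_WITHOUT_ALT.findall(html)


# The first two tags come from the post; the third is a hypothetical
# tag with no alt attribute at all.
sample = (
    '<img src="blah" alt="blah" width="x" height="x">\n'
    '<img src = "blah" alt = "something">\n'
    '<img src="logo.png" width="10">\n'
)

for tag in find_imgs_without_alt(sample):
    print(tag)
```

The first tag is skipped, while the second one is still reported even though it has an alt attribute, because of the spaces around the = sign. That is the formatting assumption from the post showing up in practice.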
cyberfox Posted June 7, 2011

Thank you very much. I will be using this on the dev side only.