johnnyk

Regex

/<p>((A |An |The )?)<b>Theodore Roosevelt<\/b>.[^\.]*(\.|\?|!)/i

I'm newish at regex, so I can't figure this out. The above regex basically matches up until it finds punctuation. For example, the following would be a match:
<p><b>Theodore Roosevelt</b> text text text.

BUT, I want it to ignore certain punctuation, for example the period after Jr., so that it would be able to match

<p><b>Theodore Roosevelt</b> (Theodore Roosevelt Jr.) is a really cool guy.

I would want it to ignore the . after Jr and go all the way up to the end of the sentence. I tried a bunch of things but couldn't figure it out. Any help would be mucho appreciated.

That's too much logic to be automatic. How is it supposed to discern between an abbreviation and the end of a sentence? Regexes are powerful but they can't read English. You could add in some things to ignore, like "Jr.", but your regex would have to keep growing as you add more.

example:
/<p>((A |An |The )?)<b>Theodore Roosevelt<\/b>([^\.]|Jr\.)*(\.|\?|!)/i
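
A note on the example above, as a rough sketch in Python (the pattern syntax is the same as in the PCRE version): alternation tries its branches left to right, so with `[^\.]` listed first the engine consumes `J` and `r` one character at a time, and the closing `(\.|\?|!)` then takes the period after `Jr`. Putting the `Jr\.` branch first appears to avoid this. The sample text and variable names below are made up for illustration.

```python
import re

# Hypothetical sample text, based on the example in the question.
text = (
    "<p><b>Theodore Roosevelt</b> (Theodore Roosevelt Jr.) "
    "is a really cool guy. Next sentence."
)

# Pattern as posted: the single-character branch is tried first.
char_first = re.compile(
    r"<p>((A |An |The )?)<b>Theodore Roosevelt</b>([^.]|Jr\.)*(\.|\?|!)",
    re.IGNORECASE,
)

# Same pattern with the abbreviation branch tried first.
abbrev_first = re.compile(
    r"<p>((A |An |The )?)<b>Theodore Roosevelt</b>(Jr\.|[^.])*(\.|\?|!)",
    re.IGNORECASE,
)

short_match = char_first.search(text).group(0)
full_match = abbrev_first.search(text).group(0)

print(short_match)  # ends at the period after "Jr"
print(full_match)   # ends at the period after "cool guy"
```

Every further abbreviation would still need its own branch ahead of `[^.]`, so the pattern keeps growing as described.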

Edit: Doesn't work. It looked like it would, don't know why it doesn't.

Edit dos: I got it to work with
$pattern = "/<p>((A |An |The )?)<b>$word<\/b> ((.[^\.]*)?(Jr\.)?(.[^\.]*))(\.|\?|!)/i";
but did I do more than I had to?

Edit 3: The above works, but it also continues on to a second period even if Jr. isn't found.
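
A likely reason for the overshoot in Edit 3: the last `(.[^\.]*)` group is required, and its leading `.` matches any character, including a period. When `Jr.` is absent and a second sentence follows, that `.` can swallow the first sentence's period and the match runs on to the next one. Below is a minimal sketch of one alternative, written in Python for illustration. The function name `first_sentence` and the `ABBREVIATIONS` list are assumptions, not part of the original code, and `re.escape` plays the role that `preg_quote` would in PHP.

```python
import re

# Abbreviations whose trailing period should not end the sentence.
# This list is an assumption; extend it as needed.
ABBREVIATIONS = ["Jr.", "Sr."]


def first_sentence(word, html):
    """Return the first sentence that opens with <b>word</b>, or None."""
    abbrev = "|".join(re.escape(a) for a in ABBREVIATIONS)
    pattern = (
        r"<p>(?:A |An |The )?<b>" + re.escape(word) + r"</b>"
        # Either a whole abbreviation, or one character that is not a terminator.
        + r"(?:\b(?:" + abbrev + r")|[^.?!])*"
        # The first real terminator ends the match.
        + r"[.?!]"
    )
    m = re.search(pattern, html, re.IGNORECASE)
    return m.group(0) if m else None


with_jr = first_sentence(
    "Theodore Roosevelt",
    "<p><b>Theodore Roosevelt</b> (Theodore Roosevelt Jr.) is a really cool guy. He was president.",
)
without_jr = first_sentence(
    "Theodore Roosevelt",
    "<p><b>Theodore Roosevelt</b> text text text. More text.",
)

print(with_jr)     # stops after "cool guy."
print(without_jr)  # stops after "text text text."
```

Because nothing in the loop is optional and no branch can match a bare period, the match cannot skip past the first real terminator. It still only knows the abbreviations it is told about.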
