
Hi everyone,

I have a regex that looks for URLs in a line of text. I actually have several, but this is the only one giving me problems that I know of. The idea is to find strings of the form something<dot>something<dot>(2 or 3 letters). That would let me find things like:

a.b.cd

www.google.com

hello.world.tv

So here is the line of code I'm currently using:

preg_match('/.+\..+\.([a-zA-Z]{2}|[a-zA-Z]{3})/i', $msg)

The problem is that it also matches strings such as "forever...or", and I see why; I just don't know how to avoid it. Is there a way to tell my regex to match everything of that form EXCEPT strings that contain "..."? Or do I need to rewrite the whole regular expression a different way?
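
A quick way to see why the original pattern fires on "forever...or" is to run it and look at what gets captured. This is a small sketch; only the pattern comes from the question, and the variable names are arbitrary.

```php
<?php
// Pattern from the question: each ".+" can consume any character, dots included.
$pattern = '/.+\..+\.([a-zA-Z]{2}|[a-zA-Z]{3})/i';

// After backtracking, the first .+ takes "forever", the second .+ takes the
// middle dot, and the two escaped \. match the other two dots.
$hit = preg_match($pattern, 'forever...or', $m);

var_dump($hit);   // int(1)
echo $m[0], "\n"; // forever...or
echo $m[1], "\n"; // or
```

Because an unescaped . is happy to match a dot, nothing stops the middle "something" from being a single "."; restricting which characters each segment may contain is what closes that hole.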

Thanks!

https://forums.phpfreaks.com/topic/177004-solved-excluding-values-from-a-regex/

Instead of using ".+", try specifying the valid domain characters explicitly, because .+ will also match a '.' (an unescaped dot matches any character, dots included).

Also, you don't need the pipe to match 2 or 3 characters; a {2,3} quantifier does it, see below...

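// Sample line: two hostnames, the "forever...or" false positive, and a short d.s.as case
$msg = "skdaj www.google.com asd hello.world.tv jaksdjflajsk forever...or dfas as d.s.as askajjsl ke jhash kj";
// Labels may only contain letters, digits (and '-' in the first label), never a dot
preg_match_all('/[0-9a-zA-Z-]+\.[0-9a-zA-Z]+\.[a-zA-Z]{2,3}/i', $msg, $matches);
print_r($matches);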
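
To check that {2,3} really can stand in for the alternation, here's a small sketch comparing both forms on a few made-up strings. Both patterns are anchored with ^...$ so only a whole-string match counts.

```php
<?php
// Alternation from the question vs. the {2,3} quantifier from the reply,
// anchored so each pattern must match the entire test string.
$withPipe  = '/^([a-zA-Z]{2}|[a-zA-Z]{3})$/';
$withRange = '/^[a-zA-Z]{2,3}$/';

$results = [];
foreach (['c', 'cd', 'com', 'info', 'c0'] as $tld) {
    // Store both results side by side: [alternation, quantifier]
    $results[$tld] = [preg_match($withPipe, $tld), preg_match($withRange, $tld)];
    printf("%-5s pipe=%d range=%d\n", $tld, $results[$tld][0], $results[$tld][1]);
}
```

Both columns agree on every input; {2,3} just says "two or three" in a single token instead of spelling out each length.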

edit:

btw... I posted this in Opera and it stripped certain characters; it should be good now...

Thanks garethp, your expression worked like a champ! I know you're anchoring to the beginning of the expression, but I'll have to sit down and look it over at some point to really understand what the heck it's doing. ;)

Actually, I'm not anchoring anything: that ^ is inside a character class, where it means something different. I'll show you:

'~[^.]+\.[^.]+\.([a-zA-Z]{2,3})~i'

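[^.] means any character that's not a dot. When ^ is the first character inside a character class, it negates the class (match whatever is NOT listed); outside a class, ^ anchors to the start of the string. Inside a class the . is already literal, so it doesn't need escaping.

+ means one or more times

\. is a literal dot (an unescaped . matches any character)

{2,3} means two or three times.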
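
Each piece of the breakdown above can be tested on its own. This is a small sketch with made-up inputs; the ^...$ around each pattern are real anchors (they sit outside any class), which is exactly the job ^ does NOT do inside [^.].

```php
<?php
// One check per token; ^...$ here are anchors, while the ^ inside [^.] negates the class.
$checks = [
    'negated class rejects a dot'    => preg_match('/^[^.]+$/', 'a.c'),
    'negated class accepts non-dots' => preg_match('/^[^.]+$/', 'abc'),
    'plus needs at least one char'   => preg_match('/^[^.]+$/', ''),
    'escaped dot is only a dot'      => preg_match('/^a\.c$/', 'abc'),
    'bare dot matches anything'      => preg_match('/^a.c$/', 'abc'),
    '{2,3} rejects one letter'       => preg_match('/^[a-z]{2,3}$/', 'a'),
    '{2,3} accepts three letters'    => preg_match('/^[a-z]{2,3}$/', 'abc'),
    '{2,3} rejects four letters'     => preg_match('/^[a-z]{2,3}$/', 'abcd'),
];
print_r($checks); // 1 = matched, 0 = no match
```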

So it all means

Match one or more characters that aren't dots. Then a dot. Then one or more characters that aren't dots. Then a dot. Then [a-zA-Z] two or three times (captured as group 1).
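
One thing worth knowing if you ever pull the matches out rather than just testing for them: [^.] excludes dots but not spaces, so on a whole line each match drags in the words in front of the hostname. The sketch below runs the pattern above on the sample line from the earlier reply, next to a variant that also excludes whitespace with \s. That variant is an assumed tweak, not something posted in this thread.

```php
<?php
$msg = "skdaj www.google.com asd hello.world.tv jaksdjflajsk forever...or dfas as d.s.as askajjsl ke jhash kj";

// Pattern from the breakdown: [^.] still allows spaces, so matches
// start at the previous dot or at the beginning of the line.
preg_match_all('~[^.]+\.[^.]+\.([a-zA-Z]{2,3})~i', $msg, $loose);

// Assumed tweak (not from the thread): also exclude whitespace with \s.
preg_match_all('~[^.\s]+\.[^.\s]+\.([a-zA-Z]{2,3})~i', $msg, $tight);

print_r($loose[0]); // "skdaj www.google.com", " asd hello.world.tv", "or dfas as d.s.as"
print_r($tight[0]); // "www.google.com", "hello.world.tv", "d.s.as"
```

For a yes/no check with preg_match, as in the original question, both versions reject "forever...or"; the difference mainly shows up in what ends up in the match array.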
