php PCRE URL matching pattern


mrbean

Hi,

 

I have spent the whole day trying to fix this, but it still doesn't work :-[

 

I am trying to match only URLs.

 

I tried this pattern:

(((https|http):\/\/)|www\.|)[a-zA-Z1-9-]{0,9}(\.[a-zA-Z1-9-]{1,5}\.[a-zA-Z1-9-]{1,5}|\.[a-zA-Z1-9-]{1,5})

 

It must match these URLs:

google.com

www.google.com

http://google.com

https://google.com

http://www.google.com

https://www.google.com

google.co.uk

www.google.co.uk

http://google.co.uk

https://google.co.uk

http://www.google.co.uk

https://www.google.co.uk

 

But it doesn't work  :-[

 

Can someone please help me with this?

Thank you in advance for your support.


Your RegExp is overly long and incorrect. Try my code:

$str = 'www.google.com
http://google.com
https://google.com
http://www.google.com
https://www.google.com
google.co.uk
www.google.co.uk
http://google.co.uk
https://google.co.uk
http://www.google.co.uk
https://www.google.co.uk';

preg_match_all("#((?:https?://)?(?:www\.)?[-a-z\d]{1,9}\.[-a-z\d]{2,5}(?:\.[-a-z\d]{2,4})?)#is", $str, $match);
echo '<pre>'.htmlspecialchars(print_r($match, 1)).'</pre>';

The result is:

Array
(
    [0] => Array
        (
            [0] => www.google.com
            [1] => http://google.com
            [2] => https://google.com
            [3] => http://www.google.com
            [4] => https://www.google.com
            [5] => google.co.uk
            [6] => www.google.co.uk
            [7] => http://google.co.uk
            [8] => https://google.co.uk
            [9] => http://www.google.co.uk
            [10] => https://www.google.co.uk
        )

    [1] => Array
        (
            [0] => www.google.com
            [1] => http://google.com
            [2] => https://google.com
            [3] => http://www.google.com
            [4] => https://www.google.com
            [5] => google.co.uk
            [6] => www.google.co.uk
            [7] => http://google.co.uk
            [8] => https://google.co.uk
            [9] => http://www.google.co.uk
            [10] => https://www.google.co.uk
        )

)
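
To see where the pattern from the question falls short, it may help to run both patterns on the same input. The sketch below is only an illustration: the helper name firstMatch and the '#' delimiters around the original pattern are assumptions, not part of either post. In the original pattern the scheme and "www." are alternatives of one group, so when both are present "www" lands in the 9-character slot and "google" has to fit a slot capped at 5 characters.

```php
<?php
// Hypothetical helper: returns the first match of $pattern in $subject, or null.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// The pattern from the question, wrapped in '#' delimiters (an assumption).
$original = '#(((https|http):\/\/)|www\.|)[a-zA-Z1-9-]{0,9}(\.[a-zA-Z1-9-]{1,5}\.[a-zA-Z1-9-]{1,5}|\.[a-zA-Z1-9-]{1,5})#';

// The pattern from the reply, unchanged.
$suggested = '#((?:https?://)?(?:www\.)?[-a-z\d]{1,9}\.[-a-z\d]{2,5}(?:\.[-a-z\d]{2,4})?)#is';

// Print what each pattern actually captures for a short and a long form.
foreach (['www.google.com', 'http://www.google.com'] as $url) {
    printf(
        "%s\n  original:  %s\n  suggested: %s\n",
        $url,
        var_export(firstMatch($original, $url), true),
        var_export(firstMatch($suggested, $url), true)
    );
}
```

Comparing the captured text with the input, rather than only checking that something matched, is what exposes the truncated match.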


www.goog isn't a complete URL.

But www is a valid domain label, and goog fits the pattern for the last part too,

so the RegExp treats www.goog as a valid match.

If you want to get only real URLs, you have to enumerate the list of allowed domains.

Try this:

$str = '
www.google.com
http://google.com
https://google.com
http://www.google.com
https://www.google.com
google.co.uk
www.google.co.uk
http://google.co.uk
https://google.co.uk
http://www.google.co.uk
https://www.google.co.uk
www.goo
go.ru
google.lol
';
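// Group 1 is defined after the conditional, so (?(1)...) could never take its first branch; it is dropped here.
// "com" is listed before "co" so that google.com is not cut off at google.co.
preg_match_all("#(?:https?://)?(?:www\.)?[-a-z\d]{2,9}\.(?:com|co|uk|us|ru|org|net)(\.[-a-z\d]{2,4})?#is", $str, $match);
echo '<pre>'.htmlspecialchars(print_r($match, true)).'</pre>';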
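
The idea of enumerating allowed domains can also be sketched by building the alternation from an array instead of typing it by hand. Everything named here (ALLOWED_SUFFIXES, buildUrlPattern, isAllowedUrl, and the inclusion of "co.uk" as a suffix) is a hypothetical choice for illustration. The sketch checks one candidate at a time and anchors the pattern with \A and \z, which is a stricter variant than searching inside a larger text.

```php
<?php
// Hypothetical allow-list of suffixes; extend it as needed.
const ALLOWED_SUFFIXES = ['com', 'co', 'uk', 'co.uk', 'us', 'ru', 'org', 'net'];

// Builds an anchored, case-insensitive pattern from the suffix list.
function buildUrlPattern(array $suffixes): string
{
    // Longest suffix first, so "com" and "co.uk" are tried before "co".
    usort($suffixes, function (string $a, string $b): int {
        return strlen($b) - strlen($a);
    });

    // preg_quote() escapes the dots; '#' is passed because it is the delimiter.
    $quoted = array_map(function (string $suffix): string {
        return preg_quote($suffix, '#');
    }, $suffixes);

    return '#\A(?:https?://)?(?:www\.)?[-a-z\d]+\.(?:' . implode('|', $quoted) . ')\z#i';
}

// True if the whole candidate is a host name ending in an allowed suffix.
function isAllowedUrl(string $candidate): bool
{
    return preg_match(buildUrlPattern(ALLOWED_SUFFIXES), $candidate) === 1;
}

foreach (['www.google.com', 'https://google.co.uk', 'www.goo', 'go.ru', 'google.lol'] as $candidate) {
    echo $candidate, ' => ', isAllowedUrl($candidate) ? 'match' : 'no match', "\n";
}
```

Sorting by length matters most for unanchored searches, where the first alternative that fits ends the match; with the anchors used here it is mainly a safeguard.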


  • 2 weeks later...

Hi MrBean,

 

I wrote a simple expression that matches all your URLs but not www.goog:

 

(?i)\b(?:http[s]?://)?(?(?=www\.)www\.)(?:[-a-z\d]+\.)+[a-z]{2,4}

 

There are a million ways to match urls, so depending on your needs, you may want to tweak it.

Is this what you were looking for?

Let me know if I can help further. :)
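
Before tweaking an expression like this one, a small check script can show which inputs it accepts in full. This is a sketch under assumptions: the '~' delimiter, the helper name matchesWhole, the escaped dots after "www", and the extra candidate google.museum are not from the post. It counts a candidate as accepted only when the first match covers the entire string.

```php
<?php
// The expression from this post, with the dots after "www" escaped; '~' is the delimiter.
$pattern = '~(?i)\b(?:http[s]?://)?(?(?=www\.)www\.)(?:[-a-z\d]+\.)+[a-z]{2,4}~';

// Hypothetical helper: true only if the first match spans the whole candidate.
function matchesWhole(string $pattern, string $candidate): bool
{
    return preg_match($pattern, $candidate, $m) === 1 && $m[0] === $candidate;
}

$candidates = ['google.com', 'https://www.google.co.uk', 'www.goog', 'google.museum'];

foreach ($candidates as $candidate) {
    echo $candidate, ' => ', matchesWhole($pattern, $candidate) ? 'accepted' : 'rejected', "\n";
}
```

The last candidate illustrates one thing that may need tweaking: the final [a-z]{2,4} stops after four letters, so longer top-level domains are only matched in part.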

