PHP PCRE URL matching pattern


mrbean

Hi,

 

I have spent the whole day trying to fix this, but it still doesn't work :-[

I am trying to match only URLs.

This is the pattern I tried:

(((https|http):\/\/)|www\.|)[a-zA-Z1-9-]{0,9}(\.[a-zA-Z1-9-]{1,5}\.[a-zA-Z1-9-]{1,5}|\.[a-zA-Z1-9-]{1,5})

 

It must match these URLs:

google.com

www.google.com

http://google.com

https://google.com

http://www.google.com

https://www.google.com

google.co.uk

www.google.co.uk

http://google.co.uk

https://google.co.uk

http://www.google.co.uk

https://www.google.co.uk

 

But it doesn't work  :-[
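
One way to narrow down where a pattern like this falls short is to anchor it and run it against every URL in the list. The sketch below does that; `failing_urls` is a hypothetical helper name, and anchoring with `^...$` is an assumption about what "match" should mean here (the whole URL, not just part of it).

```php
<?php
// Hypothetical helper: returns the URLs that $pattern does NOT match in full.
// $pattern is the regex body without delimiters; it is anchored with ^...$ here.
function failing_urls(string $pattern, array $urls): array
{
    $anchored = '~^(?:' . $pattern . ')$~';
    $failed = [];
    foreach ($urls as $url) {
        if (preg_match($anchored, $url) !== 1) {
            $failed[] = $url;
        }
    }
    return $failed;
}

// The pattern from the question, unchanged.
$pattern = '(((https|http):\/\/)|www\.|)[a-zA-Z1-9-]{0,9}(\.[a-zA-Z1-9-]{1,5}\.[a-zA-Z1-9-]{1,5}|\.[a-zA-Z1-9-]{1,5})';

// The URLs the question says must match.
$urls = [
    'google.com',
    'www.google.com',
    'http://google.com',
    'https://google.com',
    'http://www.google.com',
    'https://www.google.com',
    'google.co.uk',
    'www.google.co.uk',
    'http://google.co.uk',
    'https://google.co.uk',
    'http://www.google.co.uk',
    'https://www.google.co.uk',
];

print_r(failing_urls($pattern, $urls));
```

With full-string anchoring, the URLs that fail should be the four that carry both a scheme and www. The first group accepts a scheme or www. but not both, so "www" ends up in the name part and "google" (six characters) has to fit a part limited to five.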

 

Can someone please help me with this?

Thank you in advance for your support.


Your RegExp is big and bad. Try my code:

$str = 'www.google.com
http://google.com
https://google.com
http://www.google.com
https://www.google.com
google.co.uk
www.google.co.uk
http://google.co.uk
https://google.co.uk
http://www.google.co.uk
https://www.google.co.uk';

preg_match_all("#((?:https?://)?(?:www\.)?[-a-z\d]{1,9}\.[-a-z\d]{2,5}(?:\.[-a-z\d]{2,4})?)#is", $str, $match);
echo '<pre>'.htmlspecialchars(print_r($match, 1)).'</pre>';

The result is:

Array
(
    [0] => Array
        (
            [0] => www.google.com
            [1] => http://google.com
            [2] => https://google.com
            [3] => http://www.google.com
            [4] => https://www.google.com
            [5] => google.co.uk
            [6] => www.google.co.uk
            [7] => http://google.co.uk
            [8] => https://google.co.uk
            [9] => http://www.google.co.uk
            [10] => https://www.google.co.uk
        )

    [1] => Array
        (
            [0] => www.google.com
            [1] => http://google.com
            [2] => https://google.com
            [3] => http://www.google.com
            [4] => https://www.google.com
            [5] => google.co.uk
            [6] => www.google.co.uk
            [7] => http://google.co.uk
            [8] => https://google.co.uk
            [9] => http://www.google.co.uk
            [10] => https://www.google.co.uk
        )

)

www.goog isn't a complete URL, but the RegExp can't know that.

"www" is a valid domain name, and "goog" fits the pattern for the last part too.

So the RegExp treats www.goog as a correct match.
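
A small check makes this concrete. It is only a sketch: it reuses the pattern from the code above, written in single quotes so the backslashes reach PCRE untouched, and the variable names are just for illustration.

```php
<?php
// Pattern from the earlier code, unchanged apart from the quoting style.
$pattern = '#((?:https?://)?(?:www\.)?[-a-z\d]{1,9}\.[-a-z\d]{2,5}(?:\.[-a-z\d]{2,4})?)#is';

$matched = preg_match($pattern, 'www.goog', $m);

// The optional (?:www\.)? group can be skipped, so "www" is taken as the
// name part and "goog" as the 2-5 character ending.
echo $matched === 1 ? "matched: {$m[0]}\n" : "no match\n";
```

Because every part of the ending is described only by length and character set, nothing in the pattern can tell "goog" apart from a real TLD.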

If you want to get only correct URLs, you must enumerate a list of allowed domains.

Try it:

$str = '
www.google.com
http://google.com
https://google.com
http://www.google.com
https://www.google.com
google.co.uk
www.google.co.uk
http://google.co.uk
https://google.co.uk
http://www.google.co.uk
https://www.google.co.uk
www.goo
go.ru
google.lol
';
// "com" comes before "co" and \b follows the list, so google.com is not cut short to google.co.
preg_match_all('#(?:https?://)?(?:www\.)?[-a-z\d]{2,9}\.(?:com|co|uk|us|ru|org|net)\b(\.[-a-z\d]{2,4})?#is', $str, $match);
echo '<pre>'.(print_r($match, 1)).'</pre>';
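
If the list of domains grows, it may be easier to build the alternation from an array than to edit the RegExp by hand. The following is a minimal sketch under that assumption; `build_url_pattern` is a hypothetical helper, not part of PHP. PCRE tries alternatives from left to right, so the helper sorts longer endings first and adds `\b` after the list, which keeps "co" from winning over "com".

```php
<?php
// Hypothetical helper: builds a URL pattern from a list of allowed endings.
function build_url_pattern(array $tlds): string
{
    // Longest first, so "com" is tried before "co".
    usort($tlds, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // Escape each entry in case it contains regex metacharacters.
    $quoted = array_map(function ($tld) {
        return preg_quote($tld, '#');
    }, $tlds);

    return '#(?:https?://)?(?:www\.)?[-a-z\d]{2,9}\.(?:'
        . implode('|', $quoted)
        . ')\b(?:\.[-a-z\d]{2,4})?#i';
}

$pattern = build_url_pattern(['co', 'com', 'uk', 'us', 'ru', 'org', 'net']);
$str = "www.google.com\ngoogle.co.uk\nwww.goo\ngo.ru\ngoogle.lol";

preg_match_all($pattern, $str, $match);
print_r($match[0]);
```

With this input, www.goo and google.lol should be left out, because neither ends in one of the listed domains.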

  • 2 weeks later...

Hi MrBean,

 

I made a simple expression that matches all your URLs but not www.goog:

(?i)\b(?:https?://)?(?(?=www\.)www\.)(?:[-a-z\d]+\.)+[a-z]{2,4}
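
To try the expression in PHP, something like the sketch below should do. `first_url` is a hypothetical helper name, and the dots after "www" are escaped here on the assumption that a literal dot is intended.

```php
<?php
// Hypothetical helper: returns the first URL-like match in $text, or null.
function first_url(string $text): ?string
{
    // The conditional (?(?=www\.)www\.) forces a leading "www." to be
    // consumed as a prefix, so "www" can never serve as the domain name.
    $pattern = '~(?i)\b(?:https?://)?(?(?=www\.)www\.)(?:[-a-z\d]+\.)+[a-z]{2,4}~';

    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

var_dump(first_url('https://www.google.co.uk'));
var_dump(first_url('google.com'));
var_dump(first_url('www.goog'));
```

The first two calls should return the full URL and the last one null: once "www." is consumed, "goog" has no dot after it, so there is nothing left to act as the ending.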

 

There are a million ways to match URLs, so depending on your needs, you may want to tweak it.

Is this what you were looking for?

Let me know if I can help further. :)
