How to match a domain of this type?

ankur0101 · February 5, 2012

Hi,

I am making a whois script where the text box is named domain:

$domain = $_POST['domain'];

A user should submit a domain such as abc123-abc.tld

abc123-abc can consist of lowercase letters, digits and '-'

tld can consist only of lowercase letters

So I want to write something like:

if (regular expression condition as described above)
{
    Success
}
else
{
    Invalid domain name
}

Need help.

joe92 · February 5, 2012

Will domains be prefixed with http:// and www.? Will they be submitted with trailing GET information such as '.tld/?index.php'?

Both those questions will change the regex. However, from what you have said so far, the following should suffice:

if(preg_match("/^[a-z0-9-]+\.[a-z]{2,10}([a-z]{2,5})?$/", $domain))

Hope that helps you,

Joe

Edit: Noticed you said lowercase only, so I removed the case-insensitivity flag.

ankur0101 · February 5, 2012

Thank you so much, problem solved.

ragax · February 5, 2012

Not sure, but it seems to me there might be a slight bug.

As Joe says,

It's always a damned typo!

As it is, the regex matches

a.zzzzzzzzzzzzzzz

That is because as they are, the last two character classes

[a-z]{2,10}([a-z]{2,5})?

do not really make sense unless something is missing: apart from the capture (which ankur doesn't seem to care about), the regex above is equivalent to a simple

[a-z]{2,15}
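To see this concretely, the sketch below runs Joe's first pattern and the simplified [a-z]{2,15} version side by side over a few inputs. The sample strings are made up for illustration, and PHP 7 or later is assumed. Both patterns should accept a 15-letter "tld", both should reject 16 letters, and neither accepts a dotted suffix such as .co.uk, which is what the missing dot was presumably meant to allow.

```php
<?php
// Joe's first regex exactly as posted: the optional group has no leading dot.
$first = '/^[a-z0-9-]+\.[a-z]{2,10}([a-z]{2,5})?$/';
// The simpler pattern it reduces to (ignoring the capture).
$simple = '/^[a-z0-9-]+\.[a-z]{2,15}$/';

// Sample inputs chosen for illustration only.
$inputs = [
    'a.zz',
    'a.' . str_repeat('z', 15), // 15 letters after the dot
    'a.' . str_repeat('z', 16), // 16 letters after the dot
    'example.co.uk',            // dotted suffix
];

foreach ($inputs as $s) {
    // preg_match() returns 1 on a match and 0 otherwise.
    printf("%-20s first=%d simple=%d\n", $s, preg_match($first, $s), preg_match($simple, $s));
}
```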

Joe, did you mean to say:

if(preg_match("/^[a-z0-9-]+\.[a-z]{2,5}$/", $domain))

Or am I missing something? Just waking up, so that's entirely possible.

Wishing you all a fun Sunday.

joe92 · February 6, 2012

Oops! It's always a damned typo! Haha.

The parentheses aren't there to capture anything: the group is optional (hence the question mark), and it was meant to start with a dot. I personally don't see the point in telling the regex engine not to capture a group when we're talking about 5-10 bytes at most. That should say:

[a-z]{2,10}(\.[a-z]{2,5})?

That is to cover tlds such as .co.uk or .gov.uk addresses. Also, I was recently reading that custom tlds are very soon going to be launched on the web. In fact, I think the 'reveal' date of all the new ones is going to be 1st May. Tlds such as .museum and .aero have already been introduced, although in this case they are reserved for museums and aerospace firms. Let's not forget .name, which allows individuals (I'm guessing just the fabulously wealthy/famous ones) to register their own name as a domain. Once custom tlds become more common you could have tlds such as .hammersmith or .frankenstein. Got to plan for the future now.
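On the capturing question: in PCRE, writing the group as (?:...) instead of (...) changes only what ends up in the matches array, not which strings match. A small sketch, assuming PHP 7 or later; the test domain is made up:

```php
<?php
// Same pattern twice: with a capturing group and with a non-capturing (?:...) group.
$capturing    = '/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/';
$nonCapturing = '/^[a-z0-9-]+\.[a-z]{2,10}(?:\.[a-z]{2,5})?$/';

preg_match($capturing, 'example.co.uk', $m1);
preg_match($nonCapturing, 'example.co.uk', $m2);

print_r($m1); // full match plus the captured ".uk"
print_r($m2); // full match only
```

Either form accepts the same domains, so for a simple yes/no check the choice is mostly cosmetic.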

Anyway, good spot, playful! Here is the complete code I suggest you use, ankur. It will let some invalid ones through (e.g. .zzzzzz.zz), but unless you put a very big OR group at the end listing every legal tld, such as (com?|co\.uk|info|...), you won't be able to get around that, I'm afraid.

if(preg_match("/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/", $domain))

Here's the IANA Root Zone Database if you do wish to do that, though.

Joe
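Dropped into the if/else structure from the original question, Joe's final pattern might look like the sketch below. The helper name is_valid_domain() is made up for illustration, and PHP 7 or later is assumed.

```php
<?php
// Joe's final pattern: name, a dot, a 2-10 letter label, then an optional
// ".xx" suffix of 2-5 letters (for things like .co.uk).
const DOMAIN_PATTERN = '/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/';

// Hypothetical helper: true only when the whole string matches.
function is_valid_domain(string $domain): bool
{
    return preg_match(DOMAIN_PATTERN, $domain) === 1;
}

$domain = $_POST['domain'] ?? ''; // empty when run outside a form submission
if (is_valid_domain($domain)) {
    echo "Success\n";
} else {
    echo "Invalid domain name\n";
}
```

One PCRE detail worth knowing: $ also matches just before a trailing newline, so "abc123-abc.com\n" passes this pattern. Adding the D modifier (/.../D) or using \z instead of $ makes the end anchor strict.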

ragax · February 6, 2012

Hi Joe,

That should say:
[a-z]{2,10}(\.[a-z]{2,5})?

But that matches aa

(No tld needed.)

So my question still stands: did you mean to say

if(preg_match("/^[a-z0-9-]+\.[a-z]{2,5}$/", $domain))

Thank you for your interesting information about the future of tlds, and the link!

Wishing you a fun day.

joe92 · February 6, 2012

Hi playful,

That should say:
[a-z]{2,10}(\.[a-z]{2,5})?

But that matches aa

(No tld needed.)

Yes, that alone does, but not when included in the entire regex.

So my question still stands: did you mean to say
if(preg_match("/^[a-z0-9-]+\.[a-z]{2,5}$/", $domain))

No, I meant what I posted. I'll explain. The OP states:

A user should submit domain such as abc123-abc.tld

That means the domain consists of three parts, and each part gets its own piece of the regex:

abc123-abc  -  [a-z0-9-]+
.           -  \.
tld         -  [a-z]{2,10}(\.[a-z]{2,5})?

The reason the tld part contains so much is that there are so many types of tld. If we limit the tld to just [a-z]{2,5}, as you are suggesting, then we rule out co.uk addresses and the like: they will fail on the extra dot. However, the .uk part is optional, since .com and .net addresses don't have one. That is why I wrap it in parentheses followed by a question mark: if it's there, it is matched; if it isn't, the optional group simply matches nothing and the rest of the regex can still succeed.
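Joe's breakdown can be checked by assembling the regex from the three parts and looking at what the optional group captures. A minimal sketch, assuming PHP 7 or later; the variable names and sample domains are illustrative only.

```php
<?php
// The three parts from the breakdown, kept as separate strings.
$name = '[a-z0-9-]+';                 // abc123-abc
$dot  = '\.';                          // the literal dot
$tld  = '[a-z]{2,10}(\.[a-z]{2,5})?';  // com, net, co.uk, gov.uk, ...

$pattern = '/^' . $name . $dot . $tld . '$/';

foreach (['example.com', 'example.co.uk', 'aa'] as $domain) {
    if (preg_match($pattern, $domain, $m)) {
        // PHP leaves an unmatched trailing group out of $m, so $m[1] only exists
        // when the optional ".xx" suffix was present.
        echo $domain, ': match, suffix = ', $m[1] ?? '(none)', "\n";
    } else {
        echo $domain, ": no match\n";
    }
}
```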

Hence I stand by my original regex. It will allow for every type of tld available, and it requires characters and a dot before it:

if(preg_match("/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/", $domain))

Hope I explained myself properly,

Joe

ragax · February 6, 2012

Hi Joe,

Perfectly clear.

I thought that when you wrote

That should say:

[a-z]{2,10}(\.[a-z]{2,5})?

in your second message, you meant that you intended that to be the entire regex. This surprised me as it matches aa.

I missed the bottom of that message, where your corrected expression lived.

My bad.

Hence I stand by my original regex.

Your second version of your original regex.

For the record, then, this regex matches

a.aaaaaaaaaa

but not

a.aa.aaaaaaaaaa

In other words, you can have a ten-letter tld, but only if it is not preceded by a sub-tld. This makes me wonder whether you would prefer the middle part to be the optional one, rather than the last part. Not a big deal, though, and I think I can hear your answer from here ("I stand by my regex"?). I don't mean to enter a nitpicking contest. For the record, the history of the convo is that at first I thought I saw a bug, and there was one. Then I thought I saw a second bug, and I saw wrong. At this stage, if you are happy with the feature above, no probs, just thought I'd bring it up. There are a million ways to match a URL, each with its own personality. Nothing wrong with that; it's just nice to "date" a little and get to know a personality before getting married.
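The trade-off can be seen by comparing Joe's pattern with one possible reading of the "optional middle part" idea. The second pattern is an interpretation for illustration, not something either poster wrote; PHP 7 or later is assumed.

```php
<?php
// Joe's regex: the optional 2-5 letter label comes LAST.
$joe = '/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/';
// Hypothetical variant: the optional 2-5 letter label comes in the MIDDLE,
// so the final label is the one allowed up to 10 letters.
$middle = '/^[a-z0-9-]+(\.[a-z]{2,5})?\.[a-z]{2,10}$/';

$ten = str_repeat('a', 10); // a ten-letter label
$samples = ["a.$ten", "a.aa.$ten", "a.$ten.aa", 'example.co.uk'];

foreach ($samples as $s) {
    printf("%-16s joe=%d middle=%d\n", $s, preg_match($joe, $s), preg_match($middle, $s));
}
```

Neither ordering covers every combination: the middle-optional variant accepts a.aa.aaaaaaaaaa but rejects a.aaaaaaaaaa.aa, which Joe's version accepts.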

Wishing you a fun day

joe92 · February 6, 2012

Hey playful,

From what I've seen of tlds, the optional extension on the end is usually the shorter of the two tld components. That's why I made it the shorter one. It's neither here nor there, though. Once you start trying to match URLs in a block of text you can't even stop at the tld, since the URL might point to a file three subdirectories down the tree. This is all specific to the OP's needs.

And true, I stand by the second version of my original regex ::) haha

Joe

ankur0101 · February 7, 2012

I would like to ask one more question.

I have an array like this:

$whoisservers = array(
    "ac" =>"whois.nic.ac",
    "ae" =>"whois.nic.ae",
    "aero"=>"whois.aero",
    "af" =>"whois.nic.af",
    "ag" =>"whois.nic.ag",
    "al" =>"whois.ripe.net",
    "am" =>"whois.amnic.net",
    "arpa" =>"whois.iana.org",
    "as" =>"whois.nic.as",
    "asia" =>"whois.nic.asia",
    "at" =>"whois.nic.at",
    "au" =>"whois.aunic.net",
    "az" =>"whois.ripe.net",
    "ba" =>"whois.ripe.net",
    "be" =>"whois.dns.be",
    "bg" =>"whois.register.bg",
    "bi" =>"whois.nic.bi",
    "biz" =>"whois.biz",
    "bj" =>"whois.nic.bj",
    "br" =>"whois.registro.br",
    "bt" =>"whois.netnames.net",
    "by" =>"whois.ripe.net",
    "bz" =>"whois.belizenic.bz",
    "ca" =>"whois.cira.ca",
    "cat" =>"whois.cat",
    "cc" =>"whois.nic.cc",
    "cd" =>"whois.nic.cd",
    "ch" =>"whois.nic.ch",
    "ci" =>"whois.nic.ci",
    "ck" =>"whois.nic.ck",
    "cl" =>"whois.nic.cl",
    "cn" =>"whois.cnnic.net.cn",
    "com" =>"whois.verisign-grs.com",
    "coop" =>"whois.nic.coop",
    "cx" =>"whois.nic.cx",
    "cy" =>"whois.ripe.net",
    "cz" =>"whois.nic.cz",
    "de" =>"whois.denic.de",
    "dk" =>"whois.dk-hostmaster.dk",
    "dm" =>"whois.nic.cx",
    "dz" =>"whois.ripe.net",
    "edu" =>"whois.educause.edu",
    "ee" =>"whois.eenet.ee",
    "eg" =>"whois.ripe.net",
    "es" =>"whois.ripe.net",
    "eu" =>"whois.eu",
    "fi" =>"whois.ficora.fi",
    "fo" =>"whois.ripe.net",
    "fr" =>"whois.nic.fr",
    "gb" =>"whois.ripe.net",
    "gd" =>"whois.adamsnames.com",
    "ge" =>"whois.ripe.net",
    "gg" =>"whois.channelisles.net",
    "gi" =>"whois2.afilias-grs.net",
    "gl" =>"whois.ripe.net",
    "gm" =>"whois.ripe.net",
    "gov" =>"whois.nic.gov",
    "gr" =>"whois.ripe.net",
    "gs" =>"whois.nic.gs",
    "gw" =>"whois.nic.gw",
    "gy" =>"whois.registry.gy",
    "hk" =>"whois.hkirc.hk",
    "hm" =>"whois.registry.hm",
    "hn" =>"whois2.afilias-grs.net",
    "hr" =>"whois.ripe.net",
    "hu" =>"whois.nic.hu",
    "ie" =>"whois.domainregistry.ie",
    "il" =>"whois.isoc.org.il",
    "in" =>"whois.inregistry.net",
    "info" =>"whois.afilias.net",
    "int" =>"whois.iana.org",
    "io" =>"whois.nic.io",
    "iq" =>"vrx.net",
    "ir" =>"whois.nic.ir",
    "is" =>"whois.isnic.is",
    "it" =>"whois.nic.it",
    "je" =>"whois.channelisles.net",
    "jobs" =>"jobswhois.verisign-grs.com",
    "jp" =>"whois.jprs.jp",
    "ke" =>"whois.kenic.or.ke",
    "kg" =>"www.domain.kg",
    "ki" =>"whois.nic.ki",
    "kr" =>"whois.nic.or.kr",
    "kz" =>"whois.nic.kz",
    "la" =>"whois.nic.la",
    "li" =>"whois.nic.li",
    "lt" =>"whois.domreg.lt",
    "lu" =>"whois.dns.lu",
    "lv" =>"whois.nic.lv",
    "ly" =>"whois.nic.ly",
    "ma" =>"whois.iam.net.ma",
    "mc" =>"whois.ripe.net",
    "md" =>"whois.ripe.net",
    "me" =>"whois.meregistry.net",
    "mg" =>"whois.nic.mg",
    "mil" =>"whois.nic.mil",
    "mn" =>"whois.nic.mn",
    "mobi" =>"whois.dotmobiregistry.net",
    "ms" =>"whois.adamsnames.tc",
    "mt" =>"whois.ripe.net",
    "mu" =>"whois.nic.mu",
    "museum" =>"whois.museum",
    "mx" =>"whois.nic.mx",
    "my" =>"whois.mynic.net.my",
    "na" =>"whois.na-nic.com.na",
    "name" =>"whois.nic.name",
    "net" =>"whois.verisign-grs.net",
    "nf" =>"whois.nic.nf",
    "nl" =>"whois.domain-registry.nl",
    "no" =>"whois.norid.no",
    "nu" =>"whois.nic.nu",
    "nz" =>"whois.srs.net.nz",
    "org" =>"whois.pir.org",
    "pl" =>"whois.dns.pl",
    "pm" =>"whois.nic.pm",
    "pr" =>"whois.uprr.pr",
    "pro" =>"whois.registrypro.pro",
    "pt" =>"whois.dns.pt",
    "re" =>"whois.nic.re",
    "ro" =>"whois.rotld.ro",
    "ru" =>"whois.ripn.net",
    "sa" =>"whois.nic.net.sa",
    "sb" =>"whois.nic.net.sb",
    "sc" =>"whois2.afilias-grs.net",
    "se" =>"whois.iis.se",
    "sg" =>"whois.nic.net.sg",
    "sh" =>"whois.nic.sh",
    "si" =>"whois.arnes.si",
    "sk" =>"whois.ripe.net",
    "sm" =>"whois.ripe.net",
    "st" =>"whois.nic.st",
    "su" =>"whois.ripn.net",
    "tc" =>"whois.adamsnames.tc",
    "tel" =>"whois.nic.tel",
    "tf" =>"whois.nic.tf",
    "th" =>"whois.thnic.net",
    "tj" =>"whois.nic.tj",
    "tk" =>"whois.dot.tk",
    "tl" =>"whois.nic.tl",
    "tm" =>"whois.nic.tm",
    "tn" =>"whois.ripe.net",
    "to" =>"whois.tonic.to",
    "tp" =>"whois.nic.tl",
    "tr" =>"whois.nic.tr",
    "travel" =>"whois.nic.travel",
    "tv" => "tvwhois.verisign-grs.com",
    "tw" =>"whois.twnic.net.tw",
    "ua" =>"whois.net.ua",
    "ug" =>"whois.co.ug",
    "uk" =>"whois.nic.uk",
    "us" =>"whois.nic.us",
    "uy" =>"nic.uy",
    "uz" =>"whois.cctld.uz",
    "va" =>"whois.ripe.net",
    "vc" =>"whois2.afilias-grs.net",
    "ve" =>"whois.nic.ve",
    "vg" =>"whois.adamsnames.tc",
    "wf" =>"whois.nic.wf",
    "ws" =>"whois.website.ws",
    "yt" =>"whois.nic.yt",
    "yu" =>"whois.ripe.net");

The TLD should match one of the TLDs (the keys) in the array.

If it doesn't match, the script should redirect back to index.php with header().

I am confused about how to do this.
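One way to do this, sketched under a few assumptions: the TLD is taken to be the text after the last dot (so something.co.uk looks up "uk"), only a handful of entries from the array are repeated so the example runs on its own, and the function names whois_server_for() and handle_request() are made up for illustration. PHP 7.1 or later is assumed for the nullable return type.

```php
<?php
// A few entries copied from the $whoisservers array; the real script
// would use the full array.
$whoisservers = [
    "com" => "whois.verisign-grs.com",
    "in"  => "whois.inregistry.net",
    "uk"  => "whois.nic.uk",
];

// Hypothetical helper: returns the whois server for the domain's last label,
// or null if that label is not a key in $servers.
function whois_server_for(string $domain, array $servers): ?string
{
    $dot = strrpos($domain, '.');
    if ($dot === false) {
        return null; // no dot, so no TLD to look up
    }
    $tld = substr($domain, $dot + 1); // everything after the last dot
    return $servers[$tld] ?? null;
}

// Hypothetical request handler for the web page: validate, look up, or redirect.
function handle_request(array $post, array $servers): string
{
    $domain = $post['domain'] ?? '';
    $valid  = preg_match('/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/', $domain) === 1;
    $server = $valid ? whois_server_for($domain, $servers) : null;
    if ($server === null) {
        header('Location: index.php'); // invalid domain or unknown TLD: send the user back
        exit;
    }
    return $server; // the caller would query this server for $domain
}
```

On the page itself, something like $server = handle_request($_POST, $whoisservers); would need to run before any output is sent, since header() only works until output starts.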

ankur0101 · February 23, 2012

For the record, then, this regex matches

a.aaaaaaaaaa

but not

a.aa.aaaaaaaaaa

In other words, you can have a ten-letter tld, but only if it is not preceded by a sub-tld.

Yes, I forgot that point.

What should I do for domains such as:

something.co.in

something.com.mx

Thanks
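For what it's worth, Joe's final pattern already appears to accept both of these, because co and com fit the 2-10 letter part and in and mx fit the optional 2-5 letter suffix; the caveat above only matters when the last label is longer than five letters after a dotted middle part. A quick check (PHP 7 or later assumed), which also prints the last label, the part a whois lookup keyed on TLD would use:

```php
<?php
$pattern = '/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/'; // Joe's final regex

foreach (['something.co.in', 'something.com.mx'] as $domain) {
    $valid  = preg_match($pattern, $domain) === 1;
    $labels = explode('.', $domain);
    // end() moves the array pointer to the last element and returns it.
    echo $domain, ': ', $valid ? 'valid' : 'invalid', ', last label = ', end($labels), "\n";
}
```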

ankur0101 · February 23, 2012

Hi,

I am using the following pattern:

/^[a-z0-9][a-z0-9\-]+[a-z0-9](\.[a-z]{2,4})+$/i

Is that right?
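One way to answer that is to run the pattern against a few inputs. The sample strings are made up; the observations follow from the pattern itself: it needs at least three characters before the first dot, the /i flag lets uppercase through (the original requirement was lowercase only), each label after a dot is capped at four letters so .museum or .travel fail, and the + allows any number of extra labels. PHP 7 or later is assumed.

```php
<?php
// ankur0101's pattern, exactly as posted.
$pattern = '/^[a-z0-9][a-z0-9\-]+[a-z0-9](\.[a-z]{2,4})+$/i';

$samples = [
    'something.co.in', // two suffix labels
    'ab.com',          // only two characters before the first dot
    'Example.COM',     // uppercase, accepted because of /i
    'abc.museum',      // six-letter tld
    'abc.co.co.co',    // any number of labels
];

foreach ($samples as $s) {
    printf("%-16s %s\n", $s, preg_match($pattern, $s) === 1 ? 'match' : 'no match');
}
```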
