How to match a domain of this type?


ankur0101

Hi,

I am making a whois script, where the textbox is named domain:

 

$domain = $_POST['domain'];

 

A user should submit a domain such as abc123-abc.tld

abc123-abc can consist of lowercase letters, digits and '-'

tld can consist only of lowercase letters

 

So I want to write something like

 

 

if (/* regular expression condition as described above */)
{
    echo "Success";
}
else
{
    echo "Invalid domain name";
}

 

Need help

 


Will domains be prefixed with http:// and www.? Will they be submitted with a trailing path or query string, such as '.tld/?index.php'?

 

Both those questions will change the regex. However, from what you have said so far, the following should suffice:

 

if(preg_match("/^[a-z0-9-]+\.[a-z]{2,10}([a-z]{2,5})?$/", $domain))

 

Hope that helps you,

Joe

 

 

Edit: I noticed you said lowercase letters only, so I removed the case-insensitive flag.


Not sure, but it seems to me there might be a slight bug.

As Joe says,

It's always a damned typo!

 

As it is, the regex matches

a.zzzzzzzzzzzzzzz

 

That is because as they are, the last two character classes

[a-z]{2,10}([a-z]{2,5})?

do not really make sense unless something is missing: apart from the capture (which ankur doesn't seem to care about), the regex above is equivalent to a simple

[a-z]{2,15}
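
A quick way to check this claim, as a minimal sketch: run both patterns over tld lengths from 1 to 16 and count where they disagree. The variable names are only illustrative; the first pattern is the one posted above, the second is the simpler form suggested here.

```php
<?php
// Pattern from the first reply, and the simpler pattern it is claimed to equal.
$original = '/^[a-z0-9-]+\.[a-z]{2,10}([a-z]{2,5})?$/';
$simpler  = '/^[a-z0-9-]+\.[a-z]{2,15}$/';

// Try every tld length from 1 to 16 letters against both patterns.
$disagreements = 0;
for ($len = 1; $len <= 16; $len++) {
    $domain = 'a.' . str_repeat('z', $len);
    $m1 = preg_match($original, $domain);
    $m2 = preg_match($simpler, $domain);
    if ($m1 !== $m2) {
        $disagreements++;
    }
    printf("%-20s original=%d simpler=%d\n", $domain, $m1, $m2);
}

echo "Disagreements: $disagreements\n";
```

Both patterns should accept 2 to 15 letters after the dot and reject anything else, which is why the 15-letter example above gets through.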

 

Joe, did you mean to say:

if(preg_match("/^[a-z0-9-]+\.[a-z]{2,5}$/", $domain))

 

Or am I missing something? Just waking up, so that's entirely possible. ;)

 

Wishing you all a fun Sunday.


Oops! It's always a damned typo! Haha.

 

That isn't supposed to be used as a capturing parenthesis. It was meant to be an optional group, and it was meant to start with a dot. I personally don't see the point in making the group non-capturing when we're talking about a maximum of 5-10 bytes. That should say:

[a-z]{2,10}(\.[a-z]{2,5})?

 

That is to cover tlds such as a .co.uk or a .gov.uk address. Also, I was recently reading that personal tld strings are very soon going to be launched onto the world wide web. In fact, I think the 'reveal' date of all the new ones is going to be 1st May. Tlds such as .museum and .aero have already been introduced, although they are reserved for museums and aerospace firms. Let's not forget .name, which allows individuals (I'm guessing just the fabulously wealthy/famous ones) to have their name in a tld. Once they become more common you could have tlds such as .hammersmith or .frankenstein. We have to accommodate the future now.

 

Anyway, good spot playful! Here is the complete code I suggest you use, ankur. It will allow some wrong ones through (e.g. .zzzzzz.zz), but unless you put a very big OR statement at the end to capture every type of legal tld ((com?|co\.uk|info|..etc.)) you won't be able to get around it, I'm afraid.

 

if(preg_match("/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/", $domain))

 

Here's the Root Zone Database if you do wish to do that though :)
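
If you do go down that road, one possible sketch is to build the OR statement from a list instead of typing it by hand. Everything named here is an assumption for illustration: `$allowedTlds` is a made-up sample (not the real Root Zone contents), and `buildDomainPattern()` and `isAllowedDomain()` are hypothetical helper names.

```php
<?php
// Hypothetical sample list; a real one would come from the Root Zone Database.
$allowedTlds = array('com', 'net', 'org', 'info', 'co.uk', 'gov.uk');

// Escape each entry (so the dot in "co.uk" is literal) and join with "|".
function buildDomainPattern(array $tlds)
{
    $escaped = array();
    foreach ($tlds as $tld) {
        $escaped[] = preg_quote($tld, '/');
    }
    return '/^[a-z0-9-]+\.(' . implode('|', $escaped) . ')$/';
}

function isAllowedDomain($domain, array $tlds)
{
    return preg_match(buildDomainPattern($tlds), $domain) === 1;
}

var_dump(isAllowedDomain('abc123-abc.com', $allowedTlds));       // bool(true)
var_dump(isAllowedDomain('abc123-abc.co.uk', $allowedTlds));     // bool(true)
var_dump(isAllowedDomain('abc123-abc.zzzzzz.zz', $allowedTlds)); // bool(false)
```

The trade-off is that the list has to be kept up to date as new tlds appear.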

Joe


Hi Joe,

 

That should say:

[a-z]{2,10}(\.[a-z]{2,5})?

 

But that matches aa

(No tld needed.)

 

So my question still stands: did you mean to say

if(preg_match("/^[a-z0-9-]+\.[a-z]{2,5}$/", $domain))

;)

 

Thank you for your interesting information about the future of tlds, and the link!

 

Wishing you a fun day.


Hi playful,

 

That should say:

[a-z]{2,10}(\.[a-z]{2,5})?

 

But that matches aa

(No tld needed.)

 

Yes, that alone does, but not when included in the entire regex.
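
To illustrate the difference with a minimal check: on its own, the fragment finds a match inside "aa", but the full pattern also requires the start anchor, the name part and a literal dot, so "aa" fails there. The variable names are just for this example.

```php
<?php
$fragment = '/[a-z]{2,10}(\.[a-z]{2,5})?/';
$full     = '/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/';

var_dump(preg_match($fragment, 'aa'));  // int(1)
var_dump(preg_match($full, 'aa'));      // int(0)
var_dump(preg_match($full, 'aa.com'));  // int(1)
```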

 

So my question still stands: did you mean to say

if(preg_match("/^[a-z0-9-]+\.[a-z]{2,5}$/", $domain))

;)

 

No, I meant what I posted. I'll explain. The OP states:

 

A user should submit domain such as abc123-abc.tld

 

That means the domain consists of three parts:

abc123-abc

.

tld

 

Each part is matched by:

abc123-abc  -  [a-z0-9-]+

.  -  \.

tld  -  [a-z]{2,10}(\.[a-z]{2,5})?

 

The reason the tld part contains so much is because of how many types of tld there are. If we limit the tld to just [a-z]{2,5} as you are suggesting, then we rule out co.uk addresses etc.; they will fail on the extra dot. That said, the .uk is an optional part of the tld, as .com and .net addresses etc. don't include it. That is why I enclose it in parentheses followed by a question mark. If it's there, include it; if it isn't, the match can still succeed because that group is not required.

 

Hence I stand by my original regex. It allows for every type of tld available and requires a name and a dot before it:

if(preg_match("/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/", $domain))
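
For readability, the same pattern can be written with the x modifier so that each part carries its own label, and wrapped in the Success / Invalid check the OP asked for. This is only a sketch; `checkDomain()` is a hypothetical helper name, not something from the thread.

```php
<?php
// Same pattern as above, written with the x flag so each part can be labelled.
$pattern = '/
    ^
    [a-z0-9-]+           # name part, e.g. abc123-abc
    \.                   # the dot
    [a-z]{2,10}          # first tld part, e.g. com or co
    (\.[a-z]{2,5})?      # optional second part, e.g. .uk
    $
/x';

function checkDomain($domain, $pattern)
{
    if (preg_match($pattern, $domain)) {
        return 'Success';
    }
    return 'Invalid domain name';
}

echo checkDomain('abc123-abc.com', $pattern), "\n";    // Success
echo checkDomain('abc123-abc.co.uk', $pattern), "\n";  // Success
echo checkDomain('ABC.com', $pattern), "\n";           // Invalid domain name
echo checkDomain('abc123-abc', $pattern), "\n";        // Invalid domain name
```

With the x modifier, whitespace and `#` comments outside character classes are ignored, so the pattern behaves like the one-line version.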

 

Hope I explained myself properly,

Joe


Hi Joe,

 

Perfectly clear.

 

I thought that when you wrote

That should say:

[a-z]{2,10}(\.[a-z]{2,5})?

 

in your second message, you meant that you intended that to be the entire regex. This surprised me as it matches aa.

 

I missed the bottom of that message, where your corrected expression lived.

My bad.

:)

 

Hence I stand by my original regex.

 

Your second version of your original regex. ;)

 

For the record, then, this regex matches

a.aaaaaaaaaa

but not

a.aa.aaaaaaaaaa

 

In other words, you can have a ten-letter tld, but only if it is not preceded by a sub-tld. This makes me wonder if you wouldn't prefer the middle part to be the optional one, rather than the last part. Not a big deal, though, and I think I can hear your answer from here ("I stand by my regex" ?) I don't mean to enter a nitpicking contest. For the record, the history of the convo is that at first I thought I saw a bug, and there was one. Then I thought I saw a second bug, and I saw wrong. At this stage if you are happy with the feature above, no probs, just thought I'd bring it up. There are a million ways to match a url, all with their personalities. Nothing wrong with that, it's just nice to "date" a little and get to know a personality before getting married. :)
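
For anyone who wants to compare the two personalities side by side, here is a small sketch. `$lastOptional` is the pattern from the thread; `$middleOptional` is a hypothetical variant I am assuming for illustration, with the optional group moved to the middle.

```php
<?php
// The pattern from the thread: the optional part is last and at most 5 letters.
$lastOptional   = '/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/';
// Hypothetical variant: the optional part sits in the middle instead.
$middleOptional = '/^[a-z0-9-]+(\.[a-z]{2,5})?\.[a-z]{2,10}$/';

$ten = str_repeat('a', 10);
$samples = array('a.' . $ten, 'a.aa.' . $ten, 'a.co.uk');

foreach ($samples as $domain) {
    printf(
        "%-16s last-optional=%d middle-optional=%d\n",
        $domain,
        preg_match($lastOptional, $domain),
        preg_match($middleOptional, $domain)
    );
}
```

The variant accepts a ten-letter tld after a short sub-tld, at the cost of limiting the middle part to five letters.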

 

Wishing you a fun day


Hey playful,

 

From what I've seen of tlds, the optional extension on the end is usually the shorter of the two tld components. That's why I made it the shorter one. It's neither here nor there though. Once you start trying to match urls in a block of text you can't even end at the tld, for the url might be pointing to a file three subdirectories down the tree. This is all specific to the OP's needs.

 

And true, I stand by the second version of my original regex ::) haha

 

Joe


I would like to ask one more question.

 

I have an array such as >>

$whoisservers = array(
    "ac" =>"whois.nic.ac",
    "ae" =>"whois.nic.ae",
    "aero"=>"whois.aero",
    "af" =>"whois.nic.af",
    "ag" =>"whois.nic.ag",
    "al" =>"whois.ripe.net",
    "am" =>"whois.amnic.net",
    "arpa" =>"whois.iana.org",
    "as" =>"whois.nic.as",
    "asia" =>"whois.nic.asia",
    "at" =>"whois.nic.at",
    "au" =>"whois.aunic.net",
    "az" =>"whois.ripe.net",
    "ba" =>"whois.ripe.net",
    "be" =>"whois.dns.be",
    "bg" =>"whois.register.bg",
    "bi" =>"whois.nic.bi",
    "biz" =>"whois.biz",
    "bj" =>"whois.nic.bj",
    "br" =>"whois.registro.br",
    "bt" =>"whois.netnames.net",
    "by" =>"whois.ripe.net",
    "bz" =>"whois.belizenic.bz",
    "ca" =>"whois.cira.ca",
    "cat" =>"whois.cat",
    "cc" =>"whois.nic.cc",
    "cd" =>"whois.nic.cd",
    "ch" =>"whois.nic.ch",
    "ci" =>"whois.nic.ci",
    "ck" =>"whois.nic.ck",
    "cl" =>"whois.nic.cl",
    "cn" =>"whois.cnnic.net.cn",
    "com" =>"whois.verisign-grs.com",
    "coop" =>"whois.nic.coop",
    "cx" =>"whois.nic.cx",
    "cy" =>"whois.ripe.net",
    "cz" =>"whois.nic.cz",
    "de" =>"whois.denic.de",
    "dk" =>"whois.dk-hostmaster.dk",
    "dm" =>"whois.nic.cx",
    "dz" =>"whois.ripe.net",
    "edu" =>"whois.educause.edu",
    "ee" =>"whois.eenet.ee",
    "eg" =>"whois.ripe.net",
    "es" =>"whois.ripe.net",
    "eu" =>"whois.eu",
    "fi" =>"whois.ficora.fi",
    "fo" =>"whois.ripe.net",
    "fr" =>"whois.nic.fr",
    "gb" =>"whois.ripe.net",
    "gd" =>"whois.adamsnames.com",
    "ge" =>"whois.ripe.net",
    "gg" =>"whois.channelisles.net",
    "gi" =>"whois2.afilias-grs.net",
    "gl" =>"whois.ripe.net",
    "gm" =>"whois.ripe.net",
    "gov" =>"whois.nic.gov",
    "gr" =>"whois.ripe.net",
    "gs" =>"whois.nic.gs",
    "gw" =>"whois.nic.gw",
    "gy" =>"whois.registry.gy",
    "hk" =>"whois.hkirc.hk",
    "hm" =>"whois.registry.hm",
    "hn" =>"whois2.afilias-grs.net",
    "hr" =>"whois.ripe.net",
    "hu" =>"whois.nic.hu",
    "ie" =>"whois.domainregistry.ie",
    "il" =>"whois.isoc.org.il",
    "in" =>"whois.inregistry.net",
    "info" =>"whois.afilias.net",
    "int" =>"whois.iana.org",
    "io" =>"whois.nic.io",
    "iq" =>"vrx.net",
    "ir" =>"whois.nic.ir",
    "is" =>"whois.isnic.is",
    "it" =>"whois.nic.it",
    "je" =>"whois.channelisles.net",
    "jobs" =>"jobswhois.verisign-grs.com",
    "jp" =>"whois.jprs.jp",
    "ke" =>"whois.kenic.or.ke",
    "kg" =>"www.domain.kg",
    "ki" =>"whois.nic.ki",
    "kr" =>"whois.nic.or.kr",
    "kz" =>"whois.nic.kz",
    "la" =>"whois.nic.la",
    "li" =>"whois.nic.li",
    "lt" =>"whois.domreg.lt",
    "lu" =>"whois.dns.lu",
    "lv" =>"whois.nic.lv",
    "ly" =>"whois.nic.ly",
    "ma" =>"whois.iam.net.ma",
    "mc" =>"whois.ripe.net",
    "md" =>"whois.ripe.net",
    "me" =>"whois.meregistry.net",
    "mg" =>"whois.nic.mg",
    "mil" =>"whois.nic.mil",
    "mn" =>"whois.nic.mn",
    "mobi" =>"whois.dotmobiregistry.net",
    "ms" =>"whois.adamsnames.tc",
    "mt" =>"whois.ripe.net",
    "mu" =>"whois.nic.mu",
    "museum" =>"whois.museum",
    "mx" =>"whois.nic.mx",
    "my" =>"whois.mynic.net.my",
    "na" =>"whois.na-nic.com.na",
    "name" =>"whois.nic.name",
    "net" =>"whois.verisign-grs.net",
    "nf" =>"whois.nic.nf",
    "nl" =>"whois.domain-registry.nl",
    "no" =>"whois.norid.no",
    "nu" =>"whois.nic.nu",
    "nz" =>"whois.srs.net.nz",
    "org" =>"whois.pir.org",
    "pl" =>"whois.dns.pl",
    "pm" =>"whois.nic.pm",
    "pr" =>"whois.uprr.pr",
    "pro" =>"whois.registrypro.pro",
    "pt" =>"whois.dns.pt",
    "re" =>"whois.nic.re",
    "ro" =>"whois.rotld.ro",
    "ru" =>"whois.ripn.net",
    "sa" =>"whois.nic.net.sa",
    "sb" =>"whois.nic.net.sb",
    "sc" =>"whois2.afilias-grs.net",
    "se" =>"whois.iis.se",
    "sg" =>"whois.nic.net.sg",
    "sh" =>"whois.nic.sh",
    "si" =>"whois.arnes.si",
    "sk" =>"whois.ripe.net",
    "sm" =>"whois.ripe.net",
    "st" =>"whois.nic.st",
    "su" =>"whois.ripn.net",
    "tc" =>"whois.adamsnames.tc",
    "tel" =>"whois.nic.tel",
    "tf" =>"whois.nic.tf",
    "th" =>"whois.thnic.net",
    "tj" =>"whois.nic.tj",
    "tk" =>"whois.dot.tk",
    "tl" =>"whois.nic.tl",
    "tm" =>"whois.nic.tm",
    "tn" =>"whois.ripe.net",
    "to" =>"whois.tonic.to",
    "tp" =>"whois.nic.tl",
    "tr" =>"whois.nic.tr",
    "travel" =>"whois.nic.travel",
    "tv" => "tvwhois.verisign-grs.com",
    "tw" =>"whois.twnic.net.tw",
    "ua" =>"whois.net.ua",
    "ug" =>"whois.co.ug",
    "uk" =>"whois.nic.uk",
    "us" =>"whois.nic.us",
    "uy" =>"nic.uy",
    "uz" =>"whois.cctld.uz",
    "va" =>"whois.ripe.net",
    "vc" =>"whois2.afilias-grs.net",
    "ve" =>"whois.nic.ve",
    "vg" =>"whois.adamsnames.tc",
    "wf" =>"whois.nic.wf",
    "ws" =>"whois.website.ws",
    "yt" =>"whois.nic.yt",
    "yu" =>"whois.ripe.net");

 

The tld should match one of the tlds in the array.

If it doesn't match, the script should go back to index.php using header().

I am confused. How do I do this?
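
One way this could be approached, as a minimal sketch: take whatever follows the last dot, look it up as a key in the array, and redirect if it is missing. `findWhoisServer()` is a hypothetical helper name, the array below is cut down to three entries from the list above, and the redirect assumes no output has been sent before header() is called.

```php
<?php
// Short sample of the array from the post; the full list would be used in practice.
$whoisservers = array(
    "com" => "whois.verisign-grs.com",
    "in"  => "whois.inregistry.net",
    "uk"  => "whois.nic.uk",
);

// Hypothetical helper: returns the whois server for the last label, or null.
function findWhoisServer($domain, array $servers)
{
    $pos = strrpos($domain, '.');
    if ($pos === false) {
        return null; // no dot at all, so there is no tld to look up
    }
    $tld = substr($domain, $pos + 1);
    return isset($servers[$tld]) ? $servers[$tld] : null;
}

if (isset($_POST['domain'])) {
    $server = findWhoisServer(trim($_POST['domain']), $whoisservers);
    if ($server === null) {
        // Unknown tld: send the user back to the form and stop the script.
        header('Location: index.php');
        exit;
    }
    // ... query $server here ...
}
```

Using isset() on the key avoids looping over the array, and the exit after header() stops the rest of the script from running.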

 


  • 3 weeks later...

For the record, then, this regex matches

a.aaaaaaaaaa

but not

a.aa.aaaaaaaaaa

 

In other words, you can have a ten-letter tld, but only if it is not preceded by a sub-tld.

 

Yes, I forgot that point.

What should I do for domains such as >>

 

something.co.in

something.com.mx

 

Thanks
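
As far as I can tell, both examples already pass the last regex posted in this thread, since "co" and "com" fit [a-z]{2,10} and "in" and "mx" fit [a-z]{2,5}. For the whois lookup, the last label is probably the key to use with the array. A minimal sketch under that assumption (the `$results` array is only there to show the outcome):

```php
<?php
$pattern = '/^[a-z0-9-]+\.[a-z]{2,10}(\.[a-z]{2,5})?$/';

$results = array();
foreach (array('something.co.in', 'something.com.mx') as $domain) {
    $valid = preg_match($pattern, $domain) === 1;
    // Last label after the final dot, e.g. "in" or "mx".
    $lastLabel = substr(strrchr($domain, '.'), 1);
    $results[$domain] = array('valid' => $valid, 'key' => $lastLabel);
    printf("%s valid=%s key=%s\n", $domain, $valid ? 'yes' : 'no', $lastLabel);
}
```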

 
