regex to allow text and punctuation marks


shaddf

You'll have to be a bit more specific. What kind of punctuation marks? Period, comma, semicolon? And where exactly are they allowed? Are you OK with “.......”?

At the end of a sentence, just like in literature, i.e. period, comma, semicolon, colon, hyphen and question mark.

That regex makes absolutely no sense.

 

For example, this will match

aaaa<

but not

This is an English sentence, definitely.

And what is the quantifier combination “{1}+” supposed to do?
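
For reference: in PCRE, which PHP's preg_* functions use, a + placed after another quantifier makes that quantifier possessive. On a fixed count like {1} this changes nothing, because there is nothing to backtrack into. A minimal sketch, with patterns made up for illustration (they are not taken from the regex under discussion):

```php
<?php
// Hypothetical patterns, for illustration only.

// "{1}+" is a possessive "{1}": exactly one character, same as no quantifier at all.
$withQuantifier    = preg_match('/^[a-z]{1}+$/', 'a');
$withoutQuantifier = preg_match('/^[a-z]$/', 'a');
$twoLetters        = preg_match('/^[a-z]{1}+$/', 'ab');

// Possessiveness only matters for variable counts: "a{1,}+" never gives
// characters back, so the trailing "a" has nothing left to match.
$greedy     = preg_match('/^a{1,}a$/', 'aaa');
$possessive = preg_match('/^a{1,}+a$/', 'aaa');

var_dump($withQuantifier, $withoutQuantifier, $twoLetters, $greedy, $possessive);
```

The first two results are identical, which is why the combination reads as a mistake rather than a deliberate choice.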

 

Sentences look more like this:

<?php

$word      = '[a-zA-Z]+';                     // a single word
$words     = "$word( $word)*";                // a sequence of space-separated words
$sentence  = "$words([,;:] $words)*[.!?]";    // a sentence

$sentences = "/$sentence+/";

$testInput = 'This is an English sentence, definitely. And another one.';
var_dump(preg_match($sentences, $testInput));
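
One caveat about the snippet above: after interpolation, the + in "/$sentence+/" repeats only the final [.!?] class, and without ^ and $ the pattern accepts any input that merely contains a sentence somewhere. A possible stricter variant, assuming the whole input should be one or more sentences separated by single spaces, might look like this (the helper name isSentences is made up for illustration):

```php
<?php
// Sketch only: same building blocks as above, but grouped and anchored.
$word     = '[a-zA-Z]+';                          // a single word
$words    = "$word(?: $word)*";                   // space-separated words
$sentence = "$words(?:[,;:] $words)*[.!?]";       // one sentence

// Hypothetical helper: true if $input consists solely of sentences
// separated by single spaces.
function isSentences(string $input, string $sentence): bool
{
    return preg_match("/^$sentence(?: $sentence)*\$/", $input) === 1;
}

var_dump(isSentences('This is an English sentence, definitely. And another one.', $sentence)); // bool(true)
var_dump(isSentences('aaaa<', $sentence));     // bool(false)
var_dump(isSentences('Go.;;;;;', $sentence));  // bool(false)
```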
Edited by Jacques1

There are many characters that can legitimately appear within and at the end of a sentence "... just like in literature". If this is for a real-world application that will presumably reject sentences entered by users, false negatives are very likely.

My regex makes no sense? I think it's the other way around. The OP asked for a regex that allows punctuation marks at the end of a sentence; yours allows repeated punctuation marks anywhere but the beginning of the sentence (;;;;;;;:,,), which makes no sense to me! If you want to strictly match an English literature sentence, then something like this will work.

 

/^(?:\w+(\, |\; |\: | )){1,}(?:\w+(\.|\!|\?))$/i

Edited by printf

This is, again, nonsense. Please learn the basic syntax of regular expressions before you try to give advice.

  • The \w character class includes the underscore and digits, which means you consider “_ _!” or “123 123.” valid English sentences. This obviously makes no sense.
  • What's the point of “{1,}”? I guess what you're looking for is the + quantifier.
  • Why are you escaping characters that don't require any escaping in the first place? Characters like “,” or “;” or “:” can be written down verbatim, you know? They don't have any special meaning in regexes.
  • Why do you require more than one word? “Go!” is a valid sentence, don't you think?
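
The points above can be checked directly. The following sketch runs the posted pattern against the inputs mentioned in the list and then tries a tightened variant. Note that $tightened is only one possible rewrite and is an assumption, not something posted in this thread:

```php
<?php
// The pattern exactly as posted.
$posted = '/^(?:\w+(\, |\; |\: | )){1,}(?:\w+(\.|\!|\?))$/i';

// Assumed rewrite: letters instead of \w, no needless escapes,
// and "*" instead of "{1,}" so that a single word is enough.
$tightened = '/^(?:[a-z]+(?:[,;:] | ))*[a-z]+[.!?]$/i';

$inputs = ['_ _!', '123 123.', 'Go!', 'This is an English sentence, definitely.'];

foreach ($inputs as $input) {
    printf(
        "%-45s posted: %d  tightened: %d\n",
        $input,
        preg_match($posted, $input),
        preg_match($tightened, $input)
    );
}
```

With these inputs, the posted pattern accepts “_ _!” and “123 123.” but rejects “Go!”, while the tightened one does the opposite; both accept the last sentence.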

Anyway, I agree with Psycho that trying to “validate” text causes more harm than good.

 

The English language is much, much more complex than anything a regex could cover. Of course we can check the basic structure, but this will exclude a large amount of perfectly valid text. Who are we to decide that a text is “invalid”, anyway? People use all kinds of nonstandard language constructs, and that doesn't mean they're all wrong.

 

Assuming that words only consist of a-zA-Z is already a misconception. What about “Raison d'être”? What about “O'Neil”?
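
To make this concrete, the sketch below compares the ASCII-only word pattern with one that allows any Unicode letter plus inner apostrophes. The second pattern is an assumption made for illustration and is still far from complete (it ignores hyphens, typographic apostrophes, and so on):

```php
<?php
// ASCII letters only, as in the earlier example.
$asciiWord = '/^[a-zA-Z]+$/';

// Assumed alternative: Unicode letters (\p{L}) joined by straight
// apostrophes. The /u flag is required for \p{L} to see UTF-8 input.
$unicodeWord = "/^\\p{L}+(?:'\\p{L}+)*\$/u";

// "\u{EA}" is "ê" (escape syntax available since PHP 7).
$samples = ["O'Neil", "d'\u{EA}tre", 'Raison'];

foreach ($samples as $sample) {
    printf(
        "%-10s ascii: %d  unicode: %d\n",
        $sample,
        preg_match($asciiWord, $sample),
        preg_match($unicodeWord, $sample)
    );
}
```

Only “Raison” passes the ASCII pattern, and even the broader pattern is a guess at what users might type.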

 

So unless you have a good reason why you want to annoy your users and force them to use some primitive subset of the English language, just forget about it.
