[SOLVED] how can you negate a string literal in perl regex (preg_match_all)


dsaba

I know how to negate a single character using a character class like so:

[^a]

which will mean not "a"

 

I want to negate an entire string literal, not just one character, where a string literal is composed of many characters. I know I could probably say it this way:

if the string literal is "this"

[^t][^h][^i][^s]
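A quick check suggests the per-character version is not really "not this": it rejects any four-character word that shares even one letter position with "this". The sketch below compares it with a negative lookahead, `(?!this$)`. Treating the lookahead as the intended meaning of "negating the literal" is an assumption, and the word list is made up for illustration.

```php
<?php
// Position-wise negated classes: every position must differ from "this".
$perChar   = '/^[^t][^h][^i][^s]$/';
// Negative lookahead: any four characters, as long as they are not "this".
$lookahead = '/^(?!this$).{4}$/';

// Hypothetical test words, chosen only to show the difference.
foreach (['this', 'that', 'them', 'abcd'] as $word) {
    printf(
        "%s: per-char=%d lookahead=%d\n",
        $word,
        preg_match($perChar, $word),
        preg_match($lookahead, $word)
    );
}
// "that" and "them" are rejected by the per-character pattern (they start
// with "t") but accepted by the lookahead, because they are not "this".
```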

 

my question is, is there an easier way to negate a string literal?

can't I say this:

^(this)

 

I realize the ^ caret symbol outside of [ ] brackets anchors the match to the start of the string (or of a line in multiline mode), so the above will surely cause problems

 

-------------------------------------------------------------------------------

The above is my main question. Here is some explanation of why I am asking it. Do not be confused by this extra explanation; it is my main question that I want answered overall.

In a tutorial I read about laziness and greediness: how you can make a pattern less greedy by making it lazy, yet how this laziness will in effect slow down your regex engine by making it backtrack. The alternative I read about was to negate something so the engine gets it right the first time and does not backtrack. For example, an original pattern could be this:

Str: This is a <EM>first</EM> test.

Pattern: <.+?>

 

This will match the HTML tags, but will cause unnecessary backtracking. So the alternative they give is this:

 

str: This is a <EM>first</EM> test.

Pattern: <[^>]+>
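To make the comparison concrete, here is a minimal sketch that runs both patterns from the tutorial against the sample string. It only shows that the two find the same tags; it does not measure backtracking or speed.

```php
<?php
$str = 'This is a <EM>first</EM> test.';

// Lazy quantifier: expand one character at a time until ">" can match.
preg_match_all('/<.+?>/', $str, $lazy);
// Negated character class: consume everything that is not ">" in one run.
preg_match_all('/<[^>]+>/', $str, $negated);

print_r($lazy[0]);     // the tags found by the lazy pattern
print_r($negated[0]);  // the tags found by the negated class
var_dump($lazy[0] === $negated[0]); // bool(true)
```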

 

 

I want to do the same thing, except to negate an entire string literal instead of a single character

Str: left marker grab this! right marker

pattern: left marker ^(right marker)+ right marker

 

this is where my question stemmed from.

-thanks for reading, remember my main question is most important

 

Thank You

Link to comment
Share on other sites

I wanted to grab/match "grab this!"  (..I thought this was pretty obvious)

 

Your supplied regex matches the entire string, which is not what I wanted.

 

code:

$str = 'left marker grab this! right marker';
$pat = '~left marker(?:(?!right marker).)+right marker~';
$whatever = preg_match_all($pat, $str, $out);
print_r($out);

 

output:

Array
(
    [0] => Array
        (
            [0] => left marker grab this! right marker
        )

)

 

So maybe you can show me something that works please?

 

 

Also, you say that the effectiveness of greediness and laziness depends.. duh?

Well, in my specific example what would you say the effectiveness is?

Link to comment
Share on other sites

Are you being rude, or was something "lost in translation" here?

 

This pattern will capture: ~left marker((?:(?!right marker).)+)right marker~

 

I mentioned efficiency because you stated that laziness will slow down the expression because it backtracks. Greediness backtracks also, so based on the data you're working with, it's a matter of which will backtrack less. The more specific the expression, the better, such as [^>]+.
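As a rough illustration of the capturing pattern above, the sketch below runs it against a string with two "right marker" occurrences and compares it with a plain greedy `(.+)`. The test string and the greedy comparison pattern are assumptions added for illustration, not something from the original question.

```php
<?php
// Hypothetical input with a second "right marker" further along.
$str = 'left marker grab this! right marker and another right marker';

// Each "." is only consumed if "right marker" does not start at that position.
$tempered = '~left marker((?:(?!right marker).)+)right marker~';
// Plain greedy version, for comparison only.
$greedy   = '~left marker(.+)right marker~';

preg_match($tempered, $str, $t);
preg_match($greedy, $str, $g);

echo trim($t[1]), "\n"; // grab this!
echo trim($g[1]), "\n"; // grab this! right marker and another
```

The captured group keeps the spaces around the text, so `trim()` is used here. Alternatively, the spaces could be moved into the markers themselves.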

Link to comment
Share on other sites

hey dsaba

 

if you go here, http://www.php.net/manual/en/function.preg-match.php

you will see a note

 

 

Notes

Tip

 

Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.

 

 

this is the 2nd thread you've done on this... http://www.phpfreaks.com/forums/index.php/topic,170536.0.html

 

what you want can't be done the way you want it. Regex deals in absolutes not ambiguities

 

this is close, if you know it's going to be the first after the pattern match

 

<?php

$subject = "^whatever 123 this$";
$pattern = '/.* (.*?) .*/i';
preg_match($pattern, $subject, $matches);
echo $matches[1];

?>

 

 

Link to comment
Share on other sites

Are you being rude, or was something "lost in translation" here?

 

This pattern will capture: ~left marker((?:(?!right marker).)+)right marker~

 

I mentioned efficiency because you stated that laziness will slow down the expression because it backtracks. Greediness backtracks also, so based on the data you're working with, it's a matter of which will backtrack less. The more specific the expression, the better, such as [^>]+.

 

I've read some things about efficiency and the backtracking effects of greediness and laziness, from the link you supplied me, effigy, and other places I've gone. However, I realize I must not know EVERYTHING about this subject, and you seem to know a lot. This is why I asked you, not because I was trying to be rude, but because I was trying to learn all that I can. So don't be offended, I suppose in "learning" sometimes there are questions that challenge what the person has said before, but this is done in order to learn. I find that a lot of people get offended too easily in a conversation where knowledge gets spread, because one person eventually feels like the other is provoking or questioning his current knowledge at that moment. This probably can be offensive, but it is not meant to be.

 

I said duh because what you said seemed pretty obvious, and I thought there was something more I didn't know about. Yes, if there are more instances where one must backtrack in a regex search, this will reduce the effectiveness of the search. More backtracking = less efficiency. This seemed obvious to me, yet I asked you because maybe there is something else you know that affects the "efficiency" of the regex related to backtracking. I learn very well through example, and I had given a very specific example of the string and pattern I wanted to match. Why not just tell me the effectiveness that you speak of in my specific example? That could clarify the efficiency point I was not 100% sure about. This is why I asked you this question.

 

On the pattern: I was trying to match what is in between the two markers, hence the string I wanted to grab was named accordingly, "grab this!". You must have misunderstood me and thought I wanted to match the entire string. No, I didn't. In order for me to learn how to correctly use the metacharacters, etc., to find matches in between two string literals, it would probably help to see an example of that, wouldn't it? This is why I asked you again, because your earlier example did not do this.

You might as well have said this is the pattern I should be using:

/left marker grab this! right marker/

This would have achieved the same results as this would have:

/left marker(?:(?!right marker).)+right marker/

 

You see what I'm getting at? I thought it was obvious that I wanted to match the string literal in between the two string literal markers, do this with the "non-backtracking method", and learn how to "negate a whole literal string" to replicate a pattern similar to the one in the tutorial I read. I appreciate your help effigy, and I'd really rather not type all these explanations of why I asked things and just get to business in spreading the "knowledge", but you asked me to, and I don't like being accused of something I'm not.

 

Thank you

Link to comment
Share on other sites

here are more examples of what i'm trying to accomplish:

 

$str = 'lala grab this! right marker distraction lala grab this! right marker';
$pat = '/lala [^l][^a][^l}[^a]+ right marker/';
$whatever = preg_match_all($pat, $str, $out);
print_r($out);
THIS doesn't work, why not?

//<[^>]+>

$str = 'here is some text <em> that is nice </em> this is bold <b> hi </b>';
$pat = '/<[^>]+>/';
$whatever = preg_match_all($pat, $str, $out);
print_r($out);
This does. I'm trying to mimic this one, except with the right marker being more than one character. Take note: the right marker I'm referring to here is ">"; in the other example, the right marker is the string literal "right marker". I want to grab what's in between the two markers.

 

ben

$pattern = '/.* (.*?) .*/i';

(.*?) I've seen before, but it makes the greedy quantifier lazy, which requires backtracking. I am looking for something similar to the "alternative to this" that the tutorial was talking about, except by negating a string literal.

 

Link to comment
Share on other sites

ah I just tried your pattern in a string with duplicates of my original string:

$str = 'left marker grab this! right marker distraction left marker grab this! right marker';
$pat = '~left marker((?:(?!right marker).)+)right marker~';
$whatever = preg_match_all($pat, $str, $out);
o($out);

 

This does indeed match it. Why it didn't match when there was only one, I don't know, but if somebody had told me that it acts this way with only one occurrence, I would have known sooner.

 

Link to comment
Share on other sites

what you want can't be done the way you want it. Regex deals in absolutes not ambiguities

 

This is entirely wrong.

 

So don't be offended, I suppose in "learning" sometimes there are questions that challenge what the person has said before, but this is done in order to learn.

 

You haven't offended me at all. I was simply confused by the wording and styling of your response and wanted to see if I misunderstood something. By all means, ask away.

 

I'd really rather not type all these explanations of why I asked things and just get to business in spreading the "knowledge", but you asked me to, and I don't like being accused of something I'm not.

 

No, I did not ask you to ;) You could have simply said "No, I wasn't. There must have been a misunderstanding." And having read your reply, I see that there was. I'm just picky about words and formatting--that's all. For instance, to me "Duh" is a complete smart-ass answer, and I assume you do not perceive it the same. That's why I made the "lost in translation" suggestion. So... we're OK :)

 

this does indeed match it, why it didn't match it when there was only one, I didn't know,

 

This should work with only one. Did you try the updated pattern, or the original?
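One way to check this is to run both versions against the single-occurrence string. The sketch assumes the original pattern was the one without a capturing group. In that case `preg_match_all` still matches, but the result array only has the whole match in index 0, which may be what looked like "matching the entire string".

```php
<?php
$str = 'left marker grab this! right marker';

// Assumed reading of the original pattern: no capturing group.
$original = '~left marker(?:(?!right marker).)+right marker~';
// Updated pattern: the tempered part is wrapped in a capturing group.
$updated  = '~left marker((?:(?!right marker).)+)right marker~';

preg_match_all($original, $str, $a);
preg_match_all($updated, $str, $b);

echo count($a), "\n";      // 1: only $a[0], the whole match
echo count($b), "\n";      // 2: $b[0] plus the captured group $b[1]
echo trim($b[1][0]), "\n"; // grab this!
```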

Link to comment
Share on other sites
