
another regex question...


mck.workman


Hey! I was wondering whether you can do something like the pattern below in a single regex, or whether you would have to use two different regexes.

 

I am trying to match only the .gh files below by saying:

 

/http:\/\/www.grasshopper3d.com\/forum\/attachment\/download\?id=2985220%3AUploadedFile%3A[0-9]{6}[^.+\.gh]/

meaning: match the links to .gh files, but don't include the .gh in the match (i.e. exclude the .jpg, etc. files).

 

Data:

"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A501843">01.jpg</a>

"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A506981">SURFACE-DIAGRID-TEST.gh</a> 

 

Thank you!

McK

 


Hi, mck.workman,

 

you can use a positive lookahead to check for the presence of ".gh" without including it in the match.

 

<?php

// The lookahead (?=\.gh) requires ".gh" after the file name but keeps it out of the match.
// The dots in the host name are escaped so they only match literal dots.
$regex = '/http:\/\/www\.grasshopper3d\.com\/forum\/attachment\/download\?id=2985220%3AUploadedFile%3A[0-9]{6}">[^.]+(?=\.gh)/';

$data = '"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A501843">01.jpg</a>
"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A506981">SURFACE-DIAGRID-TEST.gh</a>';

if (preg_match($regex, $data, $matches)) {
    print_r($matches);
}

 

Read more about lookahead.
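
To make "without including it in the match" concrete, here is a minimal sketch that runs a consuming pattern and a lookahead pattern over the same names. The two file names and the variable names are made up for illustration; they are not from the forum page.

```php
<?php
// Hypothetical sample file names, not taken from the forum page.
$names = ['model.gh', 'photo.jpg'];

$results = [];
foreach ($names as $name) {
    // Consuming version: ".gh" becomes part of the match.
    $consumed = preg_match('/^[^.]+\.gh/', $name, $m1) ? $m1[0] : null;

    // Lookahead version: ".gh" must follow, but stays out of the match.
    $lookahead = preg_match('/^[^.]+(?=\.gh)/', $name, $m2) ? $m2[0] : null;

    $results[$name] = [$consumed, $lookahead];
}

var_export($results);
```

For model.gh the first pattern reports "model.gh" and the second reports "model"; for photo.jpg neither pattern matches.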

 

Hope this helps.


Hi McK!

 

aba is right that lookaheads are a nice way to do it!

 

Here's code for another solution without lookaheads, which has several benefits.

1. It's a bit more general, in case you'd like to capture files with various numbers,

2. It also works for files that have a dot in them, like try.this.gh

 

It also matches a bit faster (61 steps vs 112 for the gh string you supplied), but that's immaterial.
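
The second point is easy to check. A small sketch, using shortened stand-ins for both approaches (only the part after the > is kept, the inner group is written as non-capturing, and the sample strings are invented):

```php
<?php
// Shortened stand-ins for the two approaches; only the part after ">" is kept.
$lookahead = '~>[^.]+(?=\.gh)~';          // the name may not contain a dot
$grouped   = '~>((?:[^.<]*?\.?)*)\.gh~';  // dots are allowed inside the name

// Invented sample strings in the shape of the forum links.
$plain  = '">plain.gh</a>';
$dotted = '">try.this.gh</a>';

$lookaheadOnPlain  = preg_match($lookahead, $plain, $a);  // matches ">plain"
$lookaheadOnDotted = preg_match($lookahead, $dotted);     // no match: "try" is followed by ".this"
$groupedOnDotted   = preg_match($grouped, $dotted, $b);   // group 1 is "try.this"

var_dump($lookaheadOnPlain, $lookaheadOnDotted, $groupedOnDotted, $b[1]);
```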

 

Input:

"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A501843">01.jpg</a>

"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A506981">SURFACE-DIAGRID-TEST.gh</a>

"http://www.grasshopper3d.com/forum/attachment/download?id=88UploadedFile%3A981">Another.One.gh</a>

 

Code:

<?php
$string = '"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A501843">01.jpg</a>
"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A506981">SURFACE-DIAGRID-TEST.gh</a>
"http://www.grasshopper3d.com/forum/attachment/download?id=88UploadedFile%3A981">Another.One.gh</a>';

// Group 1: the URL plus the file name, without .gh. Group 2: the file name without .gh.
$pattern = ',(http://www\.grassh[^?]+\?id[^U]+Up[^>]+>(([^.<]*?\.?)*))\.gh,';
preg_match_all($pattern, $string, $matches, PREG_PATTERN_ORDER);

$sz = count($matches[0]);
for ($i = 0; $i < $sz; $i++) {
    echo "Match: " . $matches[1][$i] . "<br />";
    echo "File: " . $matches[2][$i] . "<br /><br />";
}
?>

 

Output:

Match: http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A506981">SURFACE-DIAGRID-TEST

File: SURFACE-DIAGRID-TEST

 

Match: http://www.grasshopper3d.com/forum/attachment/download?id=88UploadedFile%3A981">Another.One

File: Another.One

 

Nothing wrong with aba's solution, just wanting to give you another option.

Let us know if these work for you.

:)

 


Hey guys!

 

Thanks for the input--things are working beautifully :) Can you use pre and post at the same time?

 

Playful, I do have a question about understanding the '(([^.<]*?\.?)*))\.gh' part of your regex to match '01.jpg</a>'

 

Translation: "(([not including any character <] zero or more times)maybe)any character maybe)zero or more times).gh"

 

Question: Where is my translation wrong because you need to say something like ([not including any character]one or more times)<\/a>\.gh)" right?

 

Thank you again for your help!


Hey McK,

 

Great to hear from you, and to hear that the expressions from Aba and myself are helping with your project.

 

I do have a question about understanding the '(([^.<]*?\.?)*))\.gh' part of your regex

 

Sure! Here is a commented / unrolled version, using comment mode (aka whitespace mode).

(This expression will actually work in preg_match if you put it inside a pattern string with some delimiters.)

 

(?x)           # comment mode
(              # Start group 1 capture: the whole url without .gh
STUB>          # This is the part of the url up to >
(              # Start group 2 capture: this is the file name without .gh
               # On the line below, you could use (?: instead, as it is not intended to be capturing
(              # Expression "A": zero or more times... (set by the * at the end)
[^.<]*?        # Lazily match characters that are neither dots nor <, expanding as needed
\.?            # Then match one dot if available, but give it back if necessary to complete the overall match
)*             # End of Expression A, which repeats zero or more times
               # Expression A has matched a series of zero or more stuffDOT, more_stuffDOT, but gives up the last DOT to allow .gh to match.
)              # End group 2 capture
)              # End group 1 capture
\.gh           # Match .gh (but do not capture)

 

Note that this exact regex will work on STUB>AnotherOne.gh</a>

 

It is the original expression minus everything up to the >.

 

I hope this answers your question, please don't hesitate to ask if any of it is unclear!

 

:)

 

 


Couldn't resist posting working php code for this:

 

<?php
$string = 'STUB>AnotherOne.gh</a>';
if (preg_match('~(?x)           # comment mode
(              # Start group 1 capture: the whole url without .gh
STUB>          # This is the part of the url up to >
(              # Start group 2 capture: this is the file name without .gh
               # On the line below, you could use (?: instead, as it is not intended to be capturing
(              # Expression "A": zero or more times... (set by the * at the end)
[^.<]*?        # Lazily match characters that are neither dots nor <, expanding as needed
\.?            # Then match one dot if available, but give it back if necessary to complete the overall match
)*             # End of Expression A, which repeats zero or more times
               # Expression A has matched a series of zero or more stuffDOT, more_stuffDOT, but gives up the last DOT to allow .gh to match.
)              # End group 2 capture
)              # End group 1 capture
\.gh           # Match .gh (but do not capture)~', $string, $match))
{
    echo "Match: " . $match[1] . "<br />";
    echo "File: " . $match[2] . "<br /><br />";
}
?>

 

Output:

Match: STUB>AnotherOne

File: AnotherOne
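
The comment inside the pattern notes that the inner group could be non-capturing. A minimal sketch of that variant, reusing the same STUB> test string; the only intended change is (?: in place of the inner (, so no third group is stored. The layout and comment wording are mine.

```php
<?php
$string = 'STUB>AnotherOne.gh</a>';

$pattern = '~(?x)
(                  # group 1: everything before .gh
  STUB>
  (                # group 2: the file name without .gh
    (?:            # non-capturing: repeated "stuff plus optional dot"
      [^.<]*? \.?
    )*
  )
)
\.gh               # required, but outside both groups
~';

preg_match($pattern, $string, $match);
print_r($match);
```

The result array then holds only the full match, group 1 and group 2, which keeps the indexes easier to reason about if more groups are added later.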


Got it. That makes perfect sense. If you don't mind, I have just one more for you. You introduced me to using groups with regexes, which I read a bit about and have been playing with. However, when I try to use a positive lookahead and a positive lookbehind together, they don't work... but individually they do. I haven't found anything that sheds light on why.

//This works:
$url = file_get_contents("http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/template/NamlServlet.jtp?macro=user_nodes&user=68583");
$pattern1 = "/user\/SendEmail\.jtp\?type=user.+;user=\d+/";
$pattern2 = "/(?<=\">Send Email to ).+(?=<)/";
preg_match_all($pattern1, $url, $useremail);
preg_match_all($pattern2, $url, $username);
print_r($useremail);
print_r($username);

 

//This doesn't:
$url = file_get_contents("http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/template/NamlServlet.jtp?macro=user_nodes&user=68583");
$pattern = "/(user\/SendEmail\.jtp\?type=user&user=\d+)((?<="<Send Email to ).+(?=<))/";
preg_match_all($pattern, $url, $userInfo);
echo 'UserEmail: '.$userInfo[1][0];
echo 'UserName: '.$userInfo[2][0];

 

 

 


Sorry, I copied it here from a regex tester where I didn't need to escape the quote, but in my PHP code I actually did, and it's still feeding me empty arrays.

 

Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) )

 

$url = file_get_contents("http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/template/NamlServlet.jtp?macro=user_nodes&user=68583");
$pattern = "/(user\/SendEmail\.jtp\?type=user&user=\d+)((?<=\"<Send Email to ).+(?=<))/";
preg_match_all($pattern, $url, $userInfo);
print_r($userInfo);
// echo('Email: '.$userInfo[1][0]);
// echo('Name: '.$userInfo[2][0]);


Ah, yes, I should go splash some cold water on my face to wake myself up.

 

Can you paste some of the actual text that the pattern is supposed to match?

Without that, I have a hard time troubleshooting an expression.

 

 


Okay, focus on this part of your expression:

 

\d+((?<="<Send Email to ).+)

 

After the digits (\d+), you want to match STUFF (.+) that is preceded by "<Send Email to

 

But there is no such stuff.

After the digits, you go straight to "<Send Email

 

Let me explain in detail, as this is a key point of lookarounds.

 

See, the lookbehind does not JUMP over characters.

After the digits, the regex engine is standing between the 9 and the "

At this stage, if you use a lookaround, you stay PLANTED in that position between the 9 and the "

With a lookbehind, you look to the left for "<Send, and of course you're not going to find that, there are only digits.

If you used a lookahead, you'd be looking to the right of that spot between 9 and ", so you'd be seeing a double quote and some stuff.

And after each lookbehind or lookahead, you're still standing in the same spot!
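
A small sketch of the "planted in position" idea. The sample markup is an assumption about what the page looks like (one link, with "> before the text), and the variable names are invented:

```php
<?php
// Assumed sample markup; the real page may differ.
$html = '<a href="/user/SendEmail.jtp?type=user&user=195799">Send Email to shreyes</a>';

// Lookbehind placed right after \d+: the engine stands just after the last digit,
// looks to the left, and finds digits rather than the "Send Email to " text.
$stuck = preg_match('~user=\d+(?<=">Send Email to ).+(?=<)~', $html);

// The same lookbehind with nothing in front of it: the engine can try every
// position, including the one right after "Send Email to ".
$alone = preg_match('~(?<=">Send Email to ).+(?=<)~', $html, $m);

// Consuming the text in between moves the engine forward, so both parts
// fit into one pattern with two capturing groups.
$both = preg_match('~(user=\d+)">Send Email to (.+)(?=<)~', $html, $n);

var_dump($stuck, $m[0], $n[1], $n[2]);
```

The first call finds nothing, the second finds "shreyes", and the third captures "user=195799" and "shreyes" together.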

 

This might make your head spin for a moment because your current understanding of lookarounds is a different paradigm. It's like these images you can see with two geometries, with the stairs either going up or going down...

Once it clicks, it will be clear as day. :)

Ctrl + F "conditionals" in my tutorial for more on this topic. (I'm doing a major revamp, but it's not ready.)

 

Talk soon bro!

 


No. No. No. I am learning to use a piece of software called Protege for building ontologies, and I would like to get more involved with the Protege user community, but there is no way to tell whether there are any users in my area. I was learning to use KML with Google Maps and thought that if members of the forum could see other members tagged on Google Maps with a link to their email, they could contact local users by clicking that link. And it's perfect, because I don't know a lot about security, so the forum takes care of that by not letting people send an email unless they are registered and logged in! I am not a spammer. I have morals.

 

Check out the pic attached of the website I am trying to build for this to happen.

 

Ultimately... I would like to send them what I have done and ask if they would be willing to put a link on their site to my site, which allows users to connect with others in their area. If they say no... well, I will have learned a lot from the exercise.

 

No problem! You have every right to ask.

McKinnley

post-130932-13482403193209_thumb.png


McK, I'm sorry. As a geek, I'm paranoidally suspicious :)

 

Your regex will work if you include the page address in the lookbehind:

 

(?<=(user/SendEmail\.jtp\?type=user&user=\d+)">Send Email to ).+(?=<)

 

However, most regex engines don't support variable-length lookbehind (\d+ can have any length, from one character to infinity), so it will work only in .NET, RegexBuddy, or my tool.
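
A quick way to see this limitation in PHP, sketched on an assumed one-link sample string. The \K alternative in the second pattern is my own addition (it is a PCRE feature, not something proposed in this thread): it discards everything matched so far from the reported match, which gives a similar effect to a lookbehind.

```php
<?php
// Assumed sample markup; the real page may differ.
$html = '<a href="/user/SendEmail.jtp?type=user&user=195799">Send Email to shreyes</a>';

// \d+ inside the lookbehind has no upper length limit, so PCRE refuses to
// compile the pattern and preg_match() returns false.
// The @ only hides the compile warning in this demo.
$variable = @preg_match('~(?<=user=\d+">Send Email to ).+(?=<)~', $html);

// \K drops the already-matched prefix from the reported match.
$withK = preg_match('~user=\d+">Send Email to \K[^<]+~', $html, $m);

var_dump($variable);
var_dump($m[0]);
```

The first call reports false rather than 0, which is how a compile failure can be told apart from "no match"; the second reports "shreyes".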

 

In PHP, you can use the usual capturing groups:

 

<?php
$url = '<a href="/user/SendEmail.jtp?type=user&user=195799">Send Email to shreyes</a>';
$pattern = "/(user\/SendEmail\.jtp\?type=user&user=\d+)\">Send Email to (.+)(?=<)/";
preg_match_all($pattern, $url, $userInfo);
echo 'UserAddress: '.$userInfo[1][0] . "<br>\n";
echo 'UserName: '.$userInfo[2][0];
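
If the page lists several users, the same capturing-group idea can be extended. This is only a sketch under assumptions: the two-link fragment and the second user name are invented, [^<]+ replaces .+ so a name cannot run on into the next link, and the pattern expects a plain & in the URL (if the page source writes it as &amp;, the pattern would need adjusting).

```php
<?php
// Hypothetical page fragment with two links; the second user name is invented.
$page = '<a href="/user/SendEmail.jtp?type=user&user=195799">Send Email to shreyes</a> '
      . '<a href="/user/SendEmail.jtp?type=user&user=68583">Send Email to example_user</a>';

// [^<]+ instead of .+ stops each name at the closing tag of its own link.
$pattern = '~(user/SendEmail\.jtp\?type=user&user=\d+)">Send Email to ([^<]+)~';

// PREG_SET_ORDER returns one array per link, holding both captures together.
preg_match_all($pattern, $page, $users, PREG_SET_ORDER);

foreach ($users as $user) {
    echo 'UserAddress: ' . $user[1] . "\n";
    echo 'UserName: ' . $user[2] . "\n";
}
```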

 

Good luck with your project! It should be very useful for the Protege community.
