mck.workman Posted January 19, 2012

Hey! I was wondering whether you can specify something like the pattern below, or whether you would have to use two different regexes. I am trying to match only the .gh files below with:

    /http:\/\/www.grasshopper3d.com\/forum\/attachment\/download\?id=2985220%3AUploadedFile%3A[0-9]{6}[^.+\.gh]/

meaning: include the files that are .gh files, but don't include the .gh in the match (i.e. exclude the .jpg, etc. files).

Data:

    "http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A501843">01.jpg</a>
    "http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A506981">SURFACE-DIAGRID-TEST.gh</a>

Thank you!
McK

abareplace Posted January 19, 2012

Hi, mck.workman, you can use a positive lookahead to check for the presence of ".gh" without including it in the match.

    <?php
    // Dots in the host name are escaped so that they match a literal dot only.
    $regex = '/http:\/\/www\.grasshopper3d\.com\/forum\/attachment\/download\?id=2985220%3AUploadedFile%3A[0-9]{6}">[^.]+(?=\.gh)/';
    $data = '"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A501843">01.jpg</a> "http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A506981">SURFACE-DIAGRID-TEST.gh</a>';
    if (preg_match($regex, $data, $matches))
        print_r( $matches );

Read more about lookahead. Hope this helps.

ragax Posted January 19, 2012

Hi McK!

aba is right that lookaheads are a nice way to do it! Here's code for another solution without lookaheads, which has several benefits.

1. It's a bit more general, in case you'd like to capture files with various numbers.
2. It also works for files that have a dot in them, like try.this.gh

It also matches a bit faster (61 steps vs 112 for the gh string you supplied), but that's immaterial.

Input:

    "http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A501843">01.jpg</a>
    "http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A506981">SURFACE-DIAGRID-TEST.gh</a>
    "http://www.grasshopper3d.com/forum/attachment/download?id=88UploadedFile%3A981">Another.One.gh</a>

Code:

    <?php
    $string = '"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A501843">01.jpg</a> "http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A506981">SURFACE-DIAGRID-TEST.gh</a> "http://www.grasshopper3d.com/forum/attachment/download?id=88UploadedFile%3A981">Another.One.gh</a>';
    // Group 1: the URL plus the file name without .gh; group 2: the file name alone.
    $pattern = ',(http://www\.grassh[^?]+\?id[^U]+Up[^>]+>(([^.<]*?\.?)*))\.gh,';
    $hit = preg_match_all($pattern, $string, $matches, PREG_PATTERN_ORDER);
    $sz = count($matches[0]);
    for ($i = 0; $i < $sz; $i++) {
        echo "Match: ".$matches[1][$i]."<br />";
        echo "File: ".$matches[2][$i]."<br /><br />";
    }
    ?>

Output:

    Match: http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A506981">SURFACE-DIAGRID-TEST
    File: SURFACE-DIAGRID-TEST

    Match: http://www.grasshopper3d.com/forum/attachment/download?id=88UploadedFile%3A981">Another.One
    File: Another.One

Nothing wrong with aba's solution, just wanting to give you another option. Let us know if these work for you.

ragax Posted January 19, 2012

"It also matches a bit faster (61 steps vs 112 for the gh string you supplied), but that's immaterial."

Edit: I have it in reverse. Aba's is the faster one.

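The two approaches above can also be combined. What follows is only a rough sketch, not either poster's code: it keeps the lookahead idea but lets the file name contain dots by matching `[^<]+` and requiring `.gh<` to follow. The pattern, the `~` delimiter and the variable names are assumptions made for illustration; the data is the sample from this thread.

```php
<?php
// Sample links from this thread (the third one has a dot inside the file name).
$data = '"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A501843">01.jpg</a> '
      . '"http://www.grasshopper3d.com/forum/attachment/download?id=2985220%3AUploadedFile%3A506981">SURFACE-DIAGRID-TEST.gh</a> '
      . '"http://www.grasshopper3d.com/forum/attachment/download?id=88UploadedFile%3A981">Another.One.gh</a>';

// ~ is used as the delimiter so the slashes in the URL need no escaping.
// [^"]+     : the rest of the URL up to the closing quote of the href
// ">        : end of the href attribute
// ([^<]+)   : group 1, the file name (dots allowed); it backtracks until...
// (?=\.gh<) : ...".gh<" follows. The lookahead is checked but not consumed.
$pattern = '~http://www\.grasshopper3d\.com/forum/attachment/download\?id=[^"]+">([^<]+)(?=\.gh<)~';

preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "Match: " . $m[0] . "\n"; // URL plus file name, without .gh
    echo "File: " . $m[1] . "\n";  // file name only
}
```

On the sample data this should report SURFACE-DIAGRID-TEST and Another.One and skip the .jpg link. Whether it is faster or slower than the patterns above has not been measured here.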
mck.workman Posted January 20, 2012 (Author)

Hey guys! Thanks for the input, things are working beautifully. Can you use pre and post at the same time?

Playful, I do have a question about understanding the '(([^.<]*?\.?)*))\.gh' part of your regex to match '01.jpg</a>'

Translation: "(([not including any character <] zero or more times)maybe)any character maybe)zero or more times).gh"

Question: Where is my translation wrong? Because you need to say something like "([not including any character]one or more times)<\/a>\.gh)", right?

Thank you again for your help!

ragax Posted January 20, 2012

Hey McK,

Great to hear from you, and to hear that the expressions from Aba and myself are helping with your project.

"I do have a question about understanding the '(([^.<]*?\.?)*))\.gh' part of your regex"

Sure! Here is a commented / unrolled version, using comment mode (aka whitespace mode). (This expression will actually work in preg_match if you put it inside a pattern string with some delimiters.)

    (?x)      # comment mode
    (         # Start group 1 capture: the whole url without .gh
    STUB>     # This is the part of the url up to >
    (         # Start Group 2 capture: this is the file name without .gh
    # On the line below, you could use (?: instead as it is not intended to be capturing
    (         # Expression "A": Zero or More times... (set by the * at the end)
    [^.<]*?   # Lazily Match characters that are neither dots nor <, expanding as needed
    \.?       # Then match one dot if available, but give it back if necessary to complete the overall match
    )*        # End Expression A that has repeated zero or more times
    # Expression A has matched a series of zero or many stuffDOT, more_stuffDOT, but gives up the last DOT to allow .gh to match.
    )         # End Group 2 capture
    )         # End group 1 capture
    \.gh      # Match .gh (but do not capture)

Note that this exact regex will work on STUB>AnotherOne.gh</a>
It is the original expression minus everything up to the >.

I hope this answers your question, please don't hesitate to ask if any of it is unclear!

ragax Posted January 20, 2012

Couldn't resist posting working php code for this:

    <?php
    $string = 'STUB>AnotherOne.gh</a>';
    if (preg_match('~(?x)   # comment mode
    (         # Start group 1 capture: the whole url without .gh
    STUB>     # This is the part of the url up to >
    (         # Start Group 2 capture: this is the file name without .gh
    # On the line below, you could use (?: instead as it is not intended to be capturing
    (         # Expression "A": Zero or More times... (set by the * at the end)
    [^.<]*?   # Lazily Match characters that are neither dots nor <, expanding as needed
    \.?       # Then match one dot if available, but give it back if necessary to complete the overall match
    )*        # End Expression A that has repeated zero or more times
    # Expression A has matched a series of zero or many stuffDOT, more_stuffDOT, but gives up the last DOT to allow .gh to match.
    )         # End Group 2 capture
    )         # End group 1 capture
    \.gh      # Match .gh (but do not capture)~',
    $string, $match)) {
        echo "Match: ".$match[1]."<br />";
        echo "File: ".$match[2]."<br /><br />";
    }
    ?>

Output:

    Match: STUB>AnotherOne
    File: AnotherOne

mck.workman Posted January 20, 2012 (Author)

Got it. That makes perfect sense. If you don't mind, I have just one more for you. You introduced me to using groups with regexes, which I read a bit about and have been playing with. However, when I try to use a positive lookahead and a positive lookbehind together, they don't work, but individually they do. I haven't found anything that sheds light on why.

    //This works:
    $url = file_get_contents("http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/template/NamlServlet.jtp?macro=user_nodes&user=68583");
    $pattern1 = "/user\/SendEmail\.jtp\?type=user.+;user=\d+/";
    $pattern2 = "/(?<=\">Send Email to ).+(?=<)/";
    preg_match_all($pattern1, $url, $useremail);
    preg_match_all($pattern2, $url, $username);
    print_r($useremail);
    print_r($username);

    //This doesn't:
    $url = file_get_contents("http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/template/NamlServlet.jtp?macro=user_nodes&user=68583");
    $pattern = "/(user\/SendEmail\.jtp\?type=user&user=\d+)((?<="<Send Email to ).+(?=<))/";
    preg_match_all($pattern, $url, $userInfo);
    echo 'UserEmail: '.$userInfo[1][0]
    echo 'UserName: '.$userInfo[2][0]

ragax Posted January 20, 2012

Hi McK,

    ((?<="<Send Email to ).+(?=<))

It looks to me like the quote in (?<=" closes the pattern string. On your earlier tests, you escaped the double quote, so it worked.

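To make the quoting point concrete, here is a small sketch (not code from the thread's posters). It assumes the sample anchor McK posts later in the thread as test input and compares a double-quoted pattern string, where the literal quote has to be escaped, with a single-quoted one, where it does not. The variable names are made up for illustration.

```php
<?php
// Sample anchor taken from later in this thread.
$html = '<a href="/user/SendEmail.jtp?type=user&user=195799">Send Email to shreyes</a>';

// Double-quoted PHP string: the literal double quote must be written as \"
// or it ends the string early and PHP reports a parse error.
$escaped = "/(?<=\">Send Email to ).+(?=<)/";

// Single-quoted PHP string: the double quote needs no escaping.
$single = '/(?<=">Send Email to ).+(?=<)/';

var_dump($escaped === $single); // bool(true): both hold the same pattern

preg_match($single, $html, $m);
echo $m[0], "\n"; // shreyes
```

Writing patterns in single quotes also keeps PHP from interpreting `$` or backslash sequences inside the string, which is one reason it is a common habit for regexes.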
mck.workman Posted January 20, 2012 (Author)

Sorry, I copied it here from a regex tester where I didn't need to escape the quote, but in my php code I actually did, and it's still feeding me empty arrays.

    Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) )

    $url = file_get_contents("http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/template/NamlServlet.jtp?macro=user_nodes&user=68583");
    $pattern = "/(user\/SendEmail\.jtp\?type=user&user=\d+)((?<=\"<Send Email to ).+(?=<))/";
    preg_match_all($pattern, $url, $userInfo);
    print_r($userInfo);
    // echo('Email: '.$userInfo[1][0]);
    // echo('Name: '.$userInfo[2][0]);

ragax Posted January 20, 2012

"its still feeding me empty arrays"

Hey McK,

If that's the actual code you're running, are you sure you have the right test string? For instance, I don't see SendEmail in the string.

mck.workman Posted January 20, 2012 (Author)

The url isn't the test string. When I say file_get_contents it returns a string of the html contents of the page so that is the string it is searching.

ragax Posted January 21, 2012

Ah, yes, I should go splash some cold water on my face to wake myself up. Can you paste some of the actual text that the pattern is supposed to match? Without that, I have a hard time troubleshooting an expression.

mck.workman Posted January 21, 2012 (Author)

Sure!

    <a href="/user/SendEmail.jtp?type=user&user=195799">Send Email to shreyes</a>

ragax Posted January 21, 2012

Okay, focus on this part of your expression:

    \d+((?<="<Send Email to ).+)

After the digits (\d+), you want to match STUFF (.+) that is preceded by "<Send Email to

But there is no such stuff. After the digits, you go straight to "<Send Email

Let me explain in detail, as this is a key point of lookarounds. See, the lookbehind does not JUMP over characters. After the digits, the regex engine is standing between the 9 and the "

At this stage, if you use a lookaround, you stay PLANTED in that position between the 9 and the "

With a lookbehind, you look to the left for "<Send, and of course you're not going to find that, there are only digits. If you used a lookahead, you'd be looking to the right of that spot between 9 and ", so you'd be seeing a double quote and some stuff. And after each lookbehind or lookahead, you're still standing in the same spot!

This might make your head spin for a moment because your current understanding of lookarounds is a different paradigm. It's like these images you can see with two geometries, with the stairs either going up or going down... Once it clicks, it will be clear as day.

Ctrl + F conditionals on my Tut for more on this topic. (I'm doing a major revamp but it's not ready.)

Talk soon bro!

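The "planted in the same spot" behaviour can be checked with a small experiment. This is only an illustrative sketch under a few assumptions: it runs against the single sample anchor posted above rather than the real page, it uses `">Send Email to ` as the text appears in that sample, and the three pattern names are invented for the example.

```php
<?php
// Sample anchor from this thread.
$html = '<a href="/user/SendEmail.jtp?type=user&user=195799">Send Email to shreyes</a>';

// 1) Lookbehind right after the digits: the engine sits between "9" and the
//    closing quote and looks LEFT, where there are only digits, so it fails.
$withLookbehind = '~user=\d+(?<=">Send Email to ).+~';
var_dump(preg_match($withLookbehind, $html)); // int(0)

// 2) Lookahead from the same spot: it looks RIGHT and succeeds, but it
//    consumes nothing, so .+ still starts at the double quote.
$withLookahead = '~user=\d+(?=">Send Email to )(.+)~';
preg_match($withLookahead, $html, $m);
echo $m[1], "\n"; // ">Send Email to shreyes</a>

// 3) Matching the in-between text for real moves the position forward,
//    and capture groups pick out the two wanted pieces.
$consuming = '~user=(\d+)">Send Email to ([^<]+)~';
preg_match($consuming, $html, $m);
echo $m[1], ' ', $m[2], "\n"; // 195799 shreyes
```

The third variant is the usual way out: text that sits between two wanted pieces has to be matched, not just asserted.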
mck.workman Posted January 21, 2012 (Author)

Okay I see. That makes sense that you can't skip over a part. Thank you for the explanation.

abareplace Posted January 21, 2012

McK, may I ask what you are using the regular expression for? Are you trying to collect the email addresses for marketing purposes (i.e. spam)? I'm sorry if the question is rude.

mck.workman Posted January 21, 2012 (Author)

No. No. No. I am learning to use a software called Protege for building ontologies and would like to be able to get more involved with the Protege user community, but there is no way to tell if there are any users in my area. I was learning to use KML with google maps and thought that if people who are members of the forum could see other members tagged on google maps with a link to their email, they could contact local users in their area by clicking their email link. AND it's perfect because I don't know a lot about security, so the forum takes care of that by not letting them log in to send an email if they are not registered!

I am not a spammer. I have morals. Check out the pic attached of the website I am trying to build for this to happen. Ultimately, I would like to send what I have done to them and ask if they would be willing to put a link on their site to my site that allows users to connect with others in their area. If they say no, well, I will have learned a lot from the exercise.

No problem! You have every right to ask.

McKinnley

ragax Posted January 21, 2012

Darnit, McK, that's a disappointment. I thought I was helping you build a spam robot.

abareplace Posted January 22, 2012

McK, I'm sorry. As a geek, I'm paranoid and suspicious.

Your regex will work if you include the page address in the lookbehind:

    (?<=(user/SendEmail\.jtp\?type=user&user=\d+)">Send Email to ).+(?=<)

However, most regex engines don't support variable-length lookbehind (\d+ can have any length, from one character to infinity), so it will work only in .NET, RegexBuddy, or my tool. In PHP, you can use the usual capturing groups:

    <?php
    $url = '<a href="/user/SendEmail.jtp?type=user&user=195799">Send Email to shreyes</a>';
    // Group 1: the address; group 2: the user name, up to the next <
    $pattern = "/(user\/SendEmail\.jtp\?type=user&user=\d+)\">Send Email to (.+)(?=<)/";
    preg_match_all($pattern, $url, $userInfo);
    echo 'UserAddress: '.$userInfo[1][0] . "<br>\n";
    echo 'UserName: '.$userInfo[2][0];

Good luck with your project! It should be very useful for the Protege community.

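One thing to watch when moving from a single anchor to a whole page: a greedy `.+` followed by `(?=<)` runs to the last `<` on the line, so two links on one line would be merged into one match. The sketch below is an assumption-laden variant, not abareplace's code: it swaps `.+(?=<)` for `[^<]+` and uses `PREG_SET_ORDER` so each link comes back as one row. The second user id and name are invented for illustration, and the real page may encode the ampersand as `&amp;`, in which case the pattern would need to match that instead.

```php
<?php
// Two anchors on ONE line. The first is the sample from this thread; the
// second (id 100001, "example_user") is made up for illustration.
$html = '<a href="/user/SendEmail.jtp?type=user&user=195799">Send Email to shreyes</a> '
      . '<a href="/user/SendEmail.jtp?type=user&user=100001">Send Email to example_user</a>';

// Single-quoted string and ~ delimiter: no escaping of quotes or slashes needed.
// Group 1: the address; group 2: the user name, stopping at the next "<".
$pattern = '~(user/SendEmail\.jtp\?type=user&user=\d+)">Send Email to ([^<]+)~';

// PREG_SET_ORDER returns one array per link: [full match, address, name].
preg_match_all($pattern, $html, $users, PREG_SET_ORDER);

foreach ($users as $u) {
    echo 'UserAddress: ' . $u[1] . "\n";
    echo 'UserName: ' . $u[2] . "\n";
}
```

For anything beyond a quick scrape, an HTML parser is generally a sturdier choice than a regex, but for a page with a fixed link format this kind of pattern is often enough.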
mck.workman Posted January 22, 2012 (Author)

Thanks! No prob. It's funny. Yesterday and today I have been running into security issues: 500 errors. Apparently their servers block the file_get_contents function for personal pages. Oh well, if I don't find another way, live and learn.

Thanks for your help.
McK