jrw4, posted October 6, 2009:
If I have a string that looks like:
    <content type="xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">
or
    <content type="xhtml" xmlns:x="http://www.w3.org/1999/xhtml">
I am trying to get the value that is between xmlns: and the equal sign.
    <content type="xhtml" xmlns:[THIS IS WHAT IM LOOKING FOR]="http://www.w3.org/1999/xhtml">
So I have tried the following code:
    $string = '<content type="xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">';
    preg_match("xmlns:[\w]+\=", $string, $matches);
    var_dump($matches);
This returns null, so I am not sure how to find that. What am I doing wrong, and what should I do to find that part of the string?
nrg_alpha, posted October 6, 2009:
The problem is that your pattern is lacking delimiters, plus you are not isolating what you are looking for in your pattern. Here is an example of how I would tackle it (throwing both examples you listed into an array):
    $html = array('<content type="xhtml" xmlns:x="http://www.w3.org/1999/xhtml">', '<content type="xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">');
    foreach($html as $val){
        preg_match('#xmlns:\K[^=]+#', $val, $match);
        echo $match[0] . "<br />\n";
    }
Output:
    x
    xhtml
This way, $match[0] will only contain what is between xmlns: and =.
EDIT - In our resources page, you can read up on the 'Why delimiters?' thread, as well as delimiters in the PHP manual.
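For comparison, the same value can be pulled out without \K by wrapping it in a capturing group and reading index 1 instead of index 0. This is only a minimal sketch using the two strings from the question; extractPrefix() is a hypothetical helper name invented for illustration, not something from the thread.

```php
<?php
// Sample inputs taken from the question.
$samples = [
    '<content type="xhtml" xmlns:x="http://www.w3.org/1999/xhtml">',
    '<content type="xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">',
];

// extractPrefix() is a hypothetical helper used only for this sketch.
function extractPrefix(string $tag): ?string
{
    // '#' is the pattern delimiter; ([^=]+) captures the text
    // between "xmlns:" and the following "=".
    if (preg_match('#xmlns:([^=]+)=#', $tag, $m) === 1) {
        return $m[1]; // group 1 holds only the captured prefix
    }
    return null; // no xmlns: prefix found
}

foreach ($samples as $tag) {
    echo extractPrefix($tag), "\n";
}
```

With a capturing group the whole match (index 0) still contains "xmlns:" and "=", which is why the value has to be read from index 1; the \K version avoids that extra index.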
cags, posted October 6, 2009:
Sorry to hijack the thread slightly, but it looks like nrg_alpha has solved it anyway. What does the \K modifier do? I tried looking it up and came up with 'Named Capturing Groups', but the references I found seemed to indicate \k was a .NET syntax.
EDIT: Never mind, I found it in one of the links you provided.
nrg_alpha, posted October 6, 2009:
cags wrote:
    Sorry to hijack the thread slightly, but it looks like nrg_alpha has solved it anyway. What does the \K modifier do?
http://www.phpfreaks.com/blog/pcre-regex-spotlight-k
cags, posted October 6, 2009:
Excellent, thanks for the link. I really should get around to checking out the articles here; I've only been around a few days and have only focused on the forum.
nrg_alpha, posted October 6, 2009:
No problem. Take your time, you're doing great.
jrw4 (author), posted October 7, 2009:
Thanks for asking that, as I had the same question on \K. Now if I wanted to match something like:
    <content[ANYTHING]>
My pattern would be:
    $pattern = "/<content\K[\w]*>/";
nrg_alpha, posted October 7, 2009:
jrw4 wrote:
    Thanks for asking that, as I had the same question on \K. Now if I wanted to match something like: <content[ANYTHING]> My pattern would be: $pattern = "/<content\K[\w]*>/";
Keep in mind that in that case, $0 (or, if using preg_match, index [0] of the matches array; either one holds the text that the entire pattern matched) would be 'ANYTHING>'. (I'm assuming that the [] brackets surrounding ANYTHING aren't actually in the source string and are just there for illustrative purposes.) Note that the > is included, so chances are this is not what you would want. In this case, you have a few options. You can put the > into a lookahead assertion like so:
    $pattern = "/<content\s?\K\w*(?=>)/";
Since assertions don't consume any text, the > part is not included in the base match. Or, depending on the string's circumstances (let's assume that after '<content ' and the sequence of \w characters, it closes off with >), you might even be able to omit the > completely:
    $pattern = "/<content\s?\K\w*/";
This way, in either sample, the > character is not included in $0 (or, if using preg_match, index [0]), which cleans things up a bit.
Also note that I didn't use [\w], as \w is already a character class shorthand in and of itself, which for all intents and purposes (without delving into the topic of locales) is understood as [a-zA-Z0-9_]. So if something like \w or \d is the only thing being placed inside a character class, the surrounding character class is redundant. And finally, if stuff follows <content, you probably don't want to include the initial space in your match, so I threw in the \s? just in case.
However, I don't think I would even use \w. Instead, I would use a negated character class to grab everything up to > like so:
    $pattern = "/<content\s?\K[^>]*/";
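To see how the position of > in the pattern changes what ends up in index 0, here is a small sketch that runs three variants against one tag. The sample tag and the array labels are assumptions made up for illustration; the patterns follow the ones discussed above, with [^>]* used throughout so they all match the same attribute text.

```php
<?php
// Hypothetical sample tag, chosen only for illustration.
$tag = '<content class="whatever">';

// Three variants: > consumed, > inside a lookahead, > left out entirely.
$patterns = [
    'consumed'  => '/<content\K[^>]*>/',
    'lookahead' => '/<content\s?\K[^>]*(?=>)/',
    'omitted'   => '/<content\s?\K[^>]*/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // \K resets the start of the reported match, so index 0
    // begins after "<content" (and after the optional space, if any).
    preg_match($pattern, $tag, $m);
    $results[$label] = $m[0];
    echo $label, ': [', $m[0], "]\n";
}
```

Only the first variant keeps the leading space and the trailing > in the match; the other two report just the attribute text.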
jrw4 (author), posted October 7, 2009:
Well, I was trying along the lines of:
    $input = preg_replace("/<content\K[^\>]+/", "", $input);
Which just takes <content[ANYTHING]> and turns it into <content>, which is alright, but then I need a second line of code to remove that too:
    $input = str_replace("<content>", "", $input);
I meant to ask how to do that in one line?
nrg_alpha, posted October 7, 2009:
If I understand correctly, you are wiping out [ANYTHING] from <content[ANYTHING]> if applicable, but then you want to wipe out <content> itself? I'll provide three samples which remove various levels of the <content> tags:
    $input = <<<EOF
    Some text.
    <content class="whatever">Some content</content>
    Some more text yet again!
    <content>And yes, some more content!</content>
    EOF;

    # example1: remove <content[ANYTHING]> only!
    $input1 = preg_replace('#<content[^>]*>#i', '', $input);
    echo $input1 . "<br />\n";
    // Output: Some text. Some content</content> Some more text yet again! And yes, some more content!</content>

    # example2: remove complete content tags
    $input2 = preg_replace('#<content[^>]*>.*?</content>#is', '', $input);
    echo $input2 . "<br />\n";
    // Output: Some text. Some more text yet again!

    # example3: remove only the content tags (yet leave the text inside those tags in place)
    $input3 = preg_replace('#<content[^>]*>(.*?)</content>#is', '$1', $input);
    echo $input3 . "<br />\n";
    // Output: Some text. Some content Some more text yet again! And yes, some more content!
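The .*? in the second and third examples is lazy, which matters once a string contains more than one <content> pair. A rough sketch of the difference, using a made-up one-line input (the string and variable names are assumptions for illustration only):

```php
<?php
// Hypothetical input with two separate <content> pairs.
$input = "A <content>one</content> B <content>two</content> C";

// Lazy .*? stops at the first </content>, so each pair is removed on its own
// and the text between the pairs (" B ") survives.
$lazy = preg_replace('#<content[^>]*>.*?</content>#is', '', $input);

// Greedy .* runs on to the last </content>, so everything from the first
// opening tag to the final closing tag is removed, including " B ".
$greedy = preg_replace('#<content[^>]*>.*</content>#is', '', $input);

var_dump($lazy);
var_dump($greedy);
```

The s modifier lets the dot match newlines too, so the same patterns behave this way when the tags span several lines.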
thebadbad, posted October 7, 2009:
Lol, I was about to post this, but then you beat me to it, haha:
    Then you don't need \K at all, but just
    $input = preg_replace('~<content[^>]*>~i', '', $input);
And my advance apologies go to nrg, who is probably in the process of writing an elaborate answer (no offence; you do a great job explaining things in detail, while my answers often just aren't that long).
nrg_alpha, posted October 7, 2009:
lol no harm, no foul