Matching value between certain tags

WillyTheFish · July 30, 2010

Hey guys,

this is probably pretty simple, but I really have a problem with regex.

This is the input string:

</w:r><w:r><w:rPr><w:b/><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:b/>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="something"/>
<w:r><w:rPr><w:sz w:val="24"/>
</w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/>
</w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.End"/><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:sz w:val="24"/><w:highlight w:val="light-gray"/></w:rPr><w:t>NO MATCH</w:t>

I want to match the text between "Word.Bookmark.Start" and "Word.Bookmark.End", but not the tags in between. So the match of the regex in the example above should be:

"THIS IS THE TEXT I WANT TO EXTRACT!"

I do not want to extract the text "NO MATCH", since it is not located between "Word.Bookmark.Start" and "Word.Bookmark.End".

I wanna gather all matches in an array.

Please help! Many thanks

Philip · July 30, 2010

And why are we using regex instead of an XML parser or something of the like?

WillyTheFish · July 30, 2010

Well, I tried simplexml_load_file(), but that resulted in an empty object... to be honest, I don't know how to do it with a parser. Any suggestions?
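On the parser route: an apparently empty object from simplexml_load_file() is often a namespace effect, since print_r() does not list children that live in a prefixed namespace such as w: or aml:. A minimal sketch with DOMDocument and DOMXPath could look like the following. The wrapper element, the sample content, and the two namespace URIs (assumed here to be the Word 2003 WordprocessingML ones) are assumptions rather than details from the original file, so they should be checked against the real document.

```php
<?php
// Hypothetical, well-formed stand-in for the real file (the posted fragment
// on its own is not a complete XML document).
$xml = <<<'XML'
<w:body xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
        xmlns:aml="http://schemas.microsoft.com/aml/2001/core">
  <w:r><w:t>BEFORE</w:t></w:r>
  <aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="something"/>
  <w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r>
  <aml:annotation aml:id="2" w:type="Word.Bookmark.End"/>
  <w:r><w:t>NO MATCH</w:t></w:r>
</w:body>
XML;

// Assumed namespace URIs; copy the real ones from the root element of the file.
$wNs   = 'http://schemas.microsoft.com/office/word/2003/wordml';
$amlNs = 'http://schemas.microsoft.com/aml/2001/core';

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('w', $wNs);
$xpath->registerNamespace('aml', $amlNs);

// Walk bookmark markers and text runs in document order and keep only the
// text that appears while a bookmark is open.
$inside = false;
$texts  = array();
foreach ($xpath->query('//*[self::aml:annotation or self::w:t]') as $node) {
    if ($node->localName === 'annotation') {
        $type = $node->getAttributeNS($wNs, 'type');
        if ($type === 'Word.Bookmark.Start') {
            $inside = true;
        } elseif ($type === 'Word.Bookmark.End') {
            $inside = false;
        }
    } elseif ($inside) {
        $texts[] = $node->textContent;
    }
}

print_r($texts);
```

The single on/off flag is a simplification: nested or overlapping bookmarks would need the aml:id values to be tracked instead.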

sasa · July 31, 2010

try

<?php
$test = '</w:r><w:r><w:rPr><w:b/><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:b/>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="something"/>
<w:r><w:rPr><w:sz w:val="24"/>
</w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/>
</w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.End"/><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:sz w:val="24"/><w:highlight w:val="light-gray"/></w:rPr><w:t>NO MATCH</w:t>';
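// Capture everything between the Start and End bookmark tags.
// The lazy (.*?) stops at the nearest End marker; /s lets "." match newlines.
preg_match('/"Word\.Bookmark\.Start"[^>]*>(.*?)<[^>]*"Word\.Bookmark\.End"/s', $test, $out);
// $out[1] holds the captured markup; strip_tags() leaves only the text.
$out = strip_tags($out[1]);
print_r($out);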
?>
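One thing to watch with patterns of this shape is the quantifier inside the capture group. A greedy (.*) runs from the first Start marker to the last End marker in the input, while a lazy (.*?) stops at the nearest End. The sketch below compares the two on a hypothetical input with two bookmarks; the sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical input: two bookmarks with unrelated text in between.
$xml = '<a w:type="Word.Bookmark.Start"/><w:t>ONE</w:t><a w:type="Word.Bookmark.End"/>'
     . '<w:t>OUTSIDE</w:t>'
     . '<a w:type="Word.Bookmark.Start"/><w:t>TWO</w:t><a w:type="Word.Bookmark.End"/>';

$greedy = '/"Word\.Bookmark\.Start"[^>]*>(.*)<[^>]*"Word\.Bookmark\.End"/s';
$lazy   = '/"Word\.Bookmark\.Start"[^>]*>(.*?)<[^>]*"Word\.Bookmark\.End"/s';

preg_match_all($greedy, $xml, $g);
preg_match_all($lazy, $xml, $l);

// Greedy: a single match that swallows both bookmarks and the text between them.
print_r($g[1]);

// Lazy: one match per bookmark.
print_r($l[1]);
```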

WillyTheFish · July 31, 2010

thanks sasa, almost... this regex gives me an array with two elements:

array(2) { [0]=> string(242) ""Word.Bookmark.Start" w:name="something"/> THIS IS THE TEXT I WANT TO EXTRACT! [1]=> string(147) " THIS IS THE TEXT I WANT TO EXTRACT! " }

could you have another look at it? thank you so much!

sasa · July 31, 2010

You want the 2nd element of the array, don't you?

WillyTheFish · July 31, 2010

I tested your regex with another file, but that didn't work at all

the format is still the same though. But yeah, the second element of the array is correct!

sasa · July 31, 2010

Can you post the 2nd string?

WillyTheFish · July 31, 2010

oops, sorry, I'm stupid, forgot to print it right... actually, both elements are incorrect:

Array
(
    [0] => "Word.Bookmark.Start" w:name="something"/>
<w:r><w:rPr><w:sz w:val="24"/>
</w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/>
</w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.End"
    [1] =>
<w:r><w:rPr><w:sz w:val="24"/>
</w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/>
</w:r>
)

sasa · July 31, 2010

Did you use the strip_tags function on the 2nd element of the array?

WillyTheFish · July 31, 2010

nope, unfortunately I cannot use strip_tags... I have to solve this using regex alone, because later on I need to determine the position the data was extracted from, and I will be using the same regex twice =/
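If the position matters, preg_match() and preg_match_all() accept the PREG_OFFSET_CAPTURE flag, which returns each capture as a pair of matched text and byte offset. The sketch below is only an illustration: the sample string is hypothetical, and the pattern assumes that every bookmark contains at least one <w:t> run, otherwise the lazy parts would run on to a later run.

```php
<?php
// Hypothetical, shortened input.
$xml = '<w:r><w:t>BEFORE</w:t></w:r>'
     . '<aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="something"/>'
     . '<w:r><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r>'
     . '<aml:annotation aml:id="2" w:type="Word.Bookmark.End"/>';

// Group 1 is the content of the first <w:t> after the Start marker.
$pattern = '/"Word\.Bookmark\.Start".*?<w:t>([^<]*)<\/w:t>.*?"Word\.Bookmark\.End"/s';

$text     = null;
$offset   = null;
$replaced = null;

if (preg_match($pattern, $xml, $m, PREG_OFFSET_CAPTURE)) {
    // With PREG_OFFSET_CAPTURE every entry is array(matchedText, byteOffset).
    $text   = $m[1][0];
    $offset = $m[1][1];

    // The offset can be reused later, e.g. to swap the text in place.
    $replaced = substr_replace($xml, 'NEW TEXT', $offset, strlen($text));
}

echo $text, ' @ ', $offset, "\n";
```

The offsets are byte offsets, so they line up with substr() and strlen() rather than with the multibyte string functions.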

sasa · July 31, 2010

<?php
$test = '</w:r><w:r><w:rPr><w:b/><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:b/>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="something"/>
<w:r>  <w:rPr><w:sz w:val="24"/>   <blkah>  <blah>
</w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/>
</w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.End"/><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:sz w:val="24"/><w:highlight w:val="light-gray"/></w:rPr><w:t>NO MATCH</w:t>';
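// Skip ahead to the first tag that is directly followed by text, capture that
// text (group 1), then continue lazily to the End marker. \s already covers \n.
preg_match('/"Word\.Bookmark\.Start".*?>\s*([^<\n\s][^<\n]+)<.*?"Word\.Bookmark\.End"/s', $test, $out);
// $out[0] is the whole match, $out[1] the extracted text.
print_r($out);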
?>

WillyTheFish · August 1, 2010

ahh thank god... works fine with preg_match_all(). The first array is trash, but I can live with that; the second one is beautiful. Thanks a lot, sasa!
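For reference, with the default result ordering preg_match_all() puts the full matches in $out[0] (the "trash" array) and the contents of the first capture group in $out[1]. A minimal sketch with two bookmarks, using a hypothetical sample string:

```php
<?php
// Hypothetical input with two bookmarks and some text outside of them.
$xml = '<aml:annotation aml:id="1" w:type="Word.Bookmark.Start" w:name="first"/>'
     . '<w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:t>FIRST VALUE</w:t></w:r>'
     . '<aml:annotation aml:id="1" w:type="Word.Bookmark.End"/>'
     . '<w:r><w:t>NO MATCH</w:t></w:r>'
     . '<aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="second"/>'
     . '<w:r><w:t>SECOND VALUE</w:t></w:r>'
     . '<aml:annotation aml:id="2" w:type="Word.Bookmark.End"/>';

$pattern = '/"Word\.Bookmark\.Start".*?>\s*([^<\n\s][^<\n]+)<.*?"Word\.Bookmark\.End"/s';

preg_match_all($pattern, $xml, $out);

// $out[0]: full matches including the surrounding markup.
// $out[1]: only the captured text, one entry per bookmark.
$values = $out[1];
print_r($values);
```

Note that the capture group requires at least two characters of text, so a one-character bookmark value would be skipped by this pattern.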
