WillyTheFish, posted July 30, 2010:

Hey guys, this is probably pretty simple, but I'm really struggling with regex. This is the input string:

</w:r><w:r><w:rPr><w:b/><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:b/>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="something"/>
<w:r><w:rPr><w:sz w:val="24"/>
</w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/>
</w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.End"/><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:sz w:val="24"/><w:highlight w:val="light-gray"/></w:rPr><w:t>NO MATCH</w:t>

I want to match the text between "Word.Bookmark.Start" and "Word.Bookmark.End", but not the tags in between. So the match of the regex in the example above should be: "THIS IS THE TEXT I WANT TO EXTRACT!"

I do not want to extract the text "NO MATCH", since it is not located between "Word.Bookmark.Start" and "Word.Bookmark.End". I want to gather all matches in an array.

Please help! Many thanks

Philip, posted July 30, 2010:

And why are we using regex instead of an XML parser or something of the like?

WillyTheFish, posted July 30, 2010:

Well, I tried simplexml_load_file(), but that resulted in an empty object... To be honest, I don't know how to do it. Any suggestions?

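For the XML-parser route Philip suggests, here is a minimal sketch using PHP's DOM extension with XPath. It rests on several assumptions the thread does not confirm: that the file is a Word 2003 XML document (so the `w` and `aml` prefixes map to the namespace URIs shown), that the parser is fed well-formed markup (the snippet quoted in the first post starts mid-element and would not parse on its own), and that each bookmark's Start and End annotations are siblings. The function name `extractBookmarkTexts` and the `<root>` wrapper are made up for the example. An empty-looking SimpleXML object is often a namespace effect rather than a failed load, though that is only a guess about what happened here.

```php
<?php
// Hypothetical helper (not from the thread): returns the text of each
// bookmarked range found in a WordprocessingML 2003 fragment.
function extractBookmarkTexts(string $fragment): array
{
    // Assumed namespace URIs for Word 2003 XML.
    $wNs   = 'http://schemas.microsoft.com/office/word/2003/wordml';
    $amlNs = 'http://schemas.microsoft.com/aml/2001/core';

    // The fragment carries no namespace declarations, so wrap it in a
    // made-up <root> element that declares both prefixes.
    $xml = '<root xmlns:w="' . $wNs . '" xmlns:aml="' . $amlNs . '">'
         . $fragment . '</root>';

    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $doc = new DOMDocument();
    $loaded = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return []; // not well-formed: nothing to extract
    }

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('w', $wNs);
    $xpath->registerNamespace('aml', $amlNs);

    $texts = [];
    $starts = $xpath->query('//aml:annotation[@w:type="Word.Bookmark.Start"]');
    foreach ($starts as $start) {
        $text = '';
        // Walk the following siblings until the next End annotation
        // (or the last sibling, if no End annotation follows).
        for ($node = $start->nextSibling; $node !== null; $node = $node->nextSibling) {
            if (!($node instanceof DOMElement)) {
                continue; // skip whitespace text nodes
            }
            if ($node->localName === 'annotation'
                && $node->namespaceURI === $amlNs
                && $node->getAttributeNS($wNs, 'type') === 'Word.Bookmark.End') {
                break;
            }
            // Concatenate every <w:t> text run inside this sibling.
            foreach ($xpath->query('descendant-or-self::w:t', $node) as $run) {
                $text .= $run->textContent;
            }
        }
        $texts[] = $text;
    }
    return $texts;
}

// Usage with a well-formed, shortened version of the thread's sample.
$sample = '<aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="something"/>'
        . '<w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r>'
        . '<aml:annotation aml:id="2" w:type="Word.Bookmark.End"/>'
        . '<w:r><w:t>NO MATCH</w:t></w:r>';
print_r(extractBookmarkTexts($sample));
```

With a complete document loaded from disk, the wrapper would not be needed, since the real root element already declares its namespaces. The sketch does not pair Start and End by `aml:id`, which nested or overlapping bookmarks would require.
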
sasa, posted July 31, 2010:

Try:

<?php
$test = '</w:r><w:r><w:rPr><w:b/><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:b/>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="something"/>
<w:r><w:rPr><w:sz w:val="24"/>
</w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/>
</w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.End"/><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:sz w:val="24"/><w:highlight w:val="light-gray"/></w:rPr><w:t>NO MATCH</w:t>';

// Capture everything between the Start and End bookmark tags.
// The lazy (.*?) stops at the first End marker, so a second bookmark
// later in the document is not swallowed into the same match.
if (preg_match('/"Word\.Bookmark\.Start"[^>]*>(.*?)<[^>]*"Word\.Bookmark\.End"/s', $test, $out)) {
    // $out[1] still contains the markup between the bookmarks; strip it.
    $text = strip_tags($out[1]);
    print_r($text);
}
?>

WillyTheFish, posted July 31, 2010:

Thanks sasa, almost... This regex gives me an array with two elements:

array(2) { [0]=> string(242) ""Word.Bookmark.Start" w:name="something"/> THIS IS THE TEXT I WANT TO EXTRACT!
string(147) " THIS IS THE TEXT I WANT TO EXTRACT! " }

Could you have another look at it? Thank you so much!

sasa, posted July 31, 2010:

You want the second element of the array, don't you?

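For reference, the array layout under discussion can be seen in a toy case. This is only an illustrative sketch: the input string and the function name `captureText` are invented, and the pattern is deliberately simpler than the one in the thread. `preg_match()` always puts the whole matched text at index 0 and the first parenthesised group at index 1.

```php
<?php
// Hypothetical helper for illustration: returns the raw $matches array
// that preg_match() fills in.
function captureText(string $input): array
{
    $out = [];
    // The parentheses form capture group 1; the tags around them are
    // part of the match but not part of the group.
    preg_match('/<w:t>([^<]*)<\/w:t>/', $input, $out);
    return $out;
}

$out = captureText('<w:t>hello</w:t>');
$fullMatch = $out[0]; // the tags plus the text
$textOnly  = $out[1]; // only what the parentheses captured
echo $textOnly, PHP_EOL;
```

So index 0 is expected to contain the surrounding markup, and index 1 is the element to read when only the captured part is wanted.
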
WillyTheFish, posted July 31, 2010:

I tested your regex with another file, but that didn't work at all, even though the format is still the same. But yeah, the second element of the array is correct!

sasa, posted July 31, 2010:

Can you post the second string?

WillyTheFish, posted July 31, 2010:

Oops, sorry, I'm stupid, I forgot to print it right... Actually, both elements are incorrect:

Array (
[0] => "Word.Bookmark.Start" w:name="something"/> <w:r><w:rPr><w:sz w:val="24"/> </w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/> </w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.End"
[1] => <w:r><w:rPr><w:sz w:val="24"/> </w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/> </w:r>
)

sasa, posted July 31, 2010:

Do you use the strip_tags function on the second element of the array?

WillyTheFish, posted July 31, 2010:

Nope, unfortunately I cannot use strip_tags... I have to solve this using regex, because later on I need to determine the position where the data was extracted, and I will be using the same regex twice =/

sasa, posted July 31, 2010:

<?php
$test = '</w:r><w:r><w:rPr><w:b/><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:b/>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="something"/>
<w:r>
<w:rPr><w:sz w:val="24"/>
<blkah>
<blah>
</w:rPr><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/>
</w:r><aml:annotation aml:id="2" w:type="Word.Bookmark.End"/><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr>
<w:sz w:val="24"/></w:rPr><w:br/></w:r><w:r><w:rPr><w:sz w:val="24"/><w:highlight w:val="light-gray"/></w:rPr><w:t>NO MATCH</w:t>';

// Skip tags and whitespace after the Start marker, then capture the first
// run of plain text (no "<", no line break) that precedes the End marker.
preg_match('/"Word\.Bookmark\.Start".*?>[\n\s]*([^<\n\s][^<\n]+)<.*?"Word\.Bookmark\.End"/s', $test, $out);
// $out[0] is the whole match, $out[1] is the captured text.
//$out = strip_tags($out[1]);
print_r($out);
?>

WillyTheFish, posted August 1, 2010:

Ahh, thank god... It works fine with preg_match_all(). The first array is trash, but I can live with that; the second one is beautiful. Thanks a lot, sasa!
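
Since the stated goals were to gather all matches in an array and to know where each piece of text was found, here is one possible sketch built on `preg_match_all()` with the `PREG_SET_ORDER` and `PREG_OFFSET_CAPTURE` flags. It is not sasa's pattern: it swaps in a two-pass approach (first isolate each bookmark range, then pick out the `<w:t>` runs inside it), and the function name `findBookmarkedText` and the returned array shape are invented for the example. It assumes the text of interest always sits inside `<w:t>` elements and that bookmarks are not nested.

```php
<?php
// Hypothetical helper: returns a list of ['text' => string, 'offset' => int],
// where offset is the byte position of the text within $xml.
function findBookmarkedText(string $xml): array
{
    // Pass 1: everything between a Start annotation and the next End annotation.
    $rangePattern = '/"Word\.Bookmark\.Start"[^>]*>(.*?)<[^>]*"Word\.Bookmark\.End"/s';
    // Pass 2: the contents of each <w:t> element (but not <w:tab/> and the like).
    $textPattern  = '/<w:t(?:\s[^>]*)?>([^<]*)<\/w:t>/';

    $found = [];
    preg_match_all($rangePattern, $xml, $ranges, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
    foreach ($ranges as $range) {
        // With PREG_OFFSET_CAPTURE each group is a [string, offset] pair.
        [$inner, $base] = $range[1];
        preg_match_all($textPattern, $inner, $runs, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
        foreach ($runs as $run) {
            [$text, $offset] = $run[1];
            // Offsets from pass 2 are relative to $inner; shift them to $xml.
            $found[] = ['text' => $text, 'offset' => $base + $offset];
        }
    }
    return $found;
}

// Usage with a shortened version of the thread's sample.
$xml = '<aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="something"/> '
     . '<w:r><w:t>THIS IS THE TEXT I WANT TO EXTRACT!</w:t></w:r>'
     . '<aml:annotation aml:id="2" w:type="Word.Bookmark.End"/>'
     . '<w:r><w:t>NO MATCH</w:t></w:r>';
foreach (findBookmarkedText($xml) as $hit) {
    echo $hit['offset'], ': ', $hit['text'], PHP_EOL;
}
```

Index 0 of every match still holds the full matched text, which is why the first array looks like trash; only group 1 is read here. The offsets are byte positions, so they can be passed to `substr()` or `substr_replace()` on the same string later.
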