
regex for parsing span tags


dsaba


I need some help stripping certain text out of some span tags.

Here's a sample string:

<?php
$string = '<span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">othertext</span>hello</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blablabla</span>how</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blaijsalk2</span>are</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">superblabla</span>you</span><br />';
?>

I want to strip everything out of the string except:

<?php
$newstring = 'hello<br />how<br />are<br />you<br />';
?>

 

How can I do this with a regex, or with any other kind of string-parsing algorithm?

I'm posting this only after trying many different methods that failed, so I did try to do it myself first and have run out of ideas. Now I'm asking for the community's help and advice.

 

-thank you
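For the "any kind of string-parsing algorithm" route, here is a minimal sketch that uses no regex at all. It assumes every chunk between `<br />` separators has the same shape as the sample: an outer span whose first `</span>` closes the inner source-text span and whose last `</span>` closes the outer span. The function name `extractVisibleText` is hypothetical.

```php
<?php
// Hypothetical helper: for each <br />-separated chunk, keep only the text
// between the first </span> (inner span) and the last </span> (outer span).
function extractVisibleText(string $html): string
{
    $close = '</span>';
    $kept = array();
    foreach (explode('<br />', $html) as $piece) {
        $first = strpos($piece, $close);  // closes the inner "google-src-text" span
        $last = strrpos($piece, $close);  // closes the outer span
        if ($first === false || $first === $last) {
            // Not a nested pair (e.g. the empty chunk after the final <br />): keep as-is.
            $kept[] = trim($piece);
            continue;
        }
        $start = $first + strlen($close);
        $kept[] = trim(substr($piece, $start, $last - $start));
    }
    return implode('<br />', $kept);
}

$string = '<span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">othertext</span>hello</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blablabla</span>how</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blaijsalk2</span>are</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">superblabla</span>you</span><br />';
echo extractVisibleText($string); // hello<br />how<br />are<br />you<br />
```

Because it only looks at the first and last `</span>` in each chunk, markup such as an `<a>` tag inside the kept text passes through untouched.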


Because of the similarity of the spans, and the randomness with which you've chosen which spans to read, I'm pretty sure it's either gonna be the most complex regexp I've ever seen, or it'll be impossible. (You could maybe try to write a regexp based on the order of the spans?)

 

Are the spans that you're trying to pull text out of always in the same order, and are there always the same number of spans?  If both of those are yes, then I can write a regexp that'll pull the stuff out, but if not, then I have no idea ;p.

 

 

Edit: Fert's thing will work to grab text from all the spans, but the way you worded your question, I'm under the impression that you only want text from certain spans (in which case you could use Fert's pattern in conjunction with preg_replace and just use certain array keys of the variable that holds the matches ;p)
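A minimal sketch of the "certain array keys" idea. Fert's original post isn't shown in this thread, so the pattern below is an assumption rather than Fert's code: it uses preg_match_all to capture, lazily, the text between each inner `</span>` and the next `</span>`, then keeps only the keys listed in a hypothetical `$wanted` array.

```php
<?php
$string = '<span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">othertext</span>hello</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blablabla</span>how</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blaijsalk2</span>are</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">superblabla</span>you</span><br />';

// Lazily capture what sits between an inner </span> and the next </span>
// (the one that closes the outer span). $matches[1] gets one entry per outer span.
preg_match_all('/<\/span>(.*?)<\/span>/', $string, $matches);

// Hypothetical choice of which spans to keep, by array key (0 = first span).
$wanted = array(0, 2);
$picked = array();
foreach ($wanted as $key) {
    $picked[] = $matches[1][$key];
}
echo implode('<br />', $picked); // hello<br />are
```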


I edited Fert's code to do this:

<?php
$string = '<span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">othertext</span>hello</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blablabla</span>how</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blaijsalk2</span>are</span><br /> <span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">superblabla</span>you</span><br />';
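// Split on the <br /> separators; each piece holds one outer span.
$array = explode('<br />', $string);
$newArray = array();
foreach ($array as $value) {
    // Both groups are greedy, so group 1 backtracks to the last '>' that is
    // still followed by "...</span>"; here that leaves just the visible word in $2.
    $str = preg_replace('/<span(.*)>(.*)<\/span>/', '$2', trim($value));
    $newArray[] = $str;
}

// Rejoin with the same <br /> tag the desired output uses.
$fulltext = implode('<br />', $newArray);
echo $fulltext;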
?>

 

it gives me my desired result :)

 

-thanks, Fert!


OK, this code did what I wanted before:

$newstring = preg_replace("/<span(.*)>(.*)<\/span>/","$2", $oldstring);

 

 

However, when I try it on this string it doesn't do the job, and I don't know why.

 

here is the original string:

<?php
$string = '<span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blablabla<a href="http://64.233.179.104/translate_c?hl=en&langpair=en%7Car&u=http://www.google.com/">blablalink</a></span>hello how are you <a href="http://64.233.179.104/translate_c?hl=en&langpair=en%7Car&u=http://www.google.com/">click here</a> </span>';
?>

 

here is the new desired result:

<?php
$string = 'hello how are you <a href="http://64.233.179.104/translate_c?hl=en&langpair=en%7Car&u=http://www.google.com/">click here</a>';
?>

 

There are two spans, one nested inside the other, and, as you can see, I want the text from inside the second span. It worked before; the only difference now is that there is an <a> tag as part of the text in the second span.

 

What do I need to change in this line, which worked before?

$newstring = preg_replace("/<span(.*)>(.*)<\/span>/","$2", $oldstring);
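A quick check that shows what goes wrong, running Fert's pattern unchanged on the string above. Both capture groups are greedy, so group 1 stretches to the last `>` that is still followed by "...</span>". With the `<a>` tag present, that `>` is the one ending the final `</a>`, and group 2 is left holding only the trailing space.

```php
<?php
$string = '<span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blablabla<a href="http://64.233.179.104/translate_c?hl=en&langpair=en%7Car&u=http://www.google.com/">blablalink</a></span>hello how are you <a href="http://64.233.179.104/translate_c?hl=en&langpair=en%7Car&u=http://www.google.com/">click here</a> </span>';

// Fert's pattern, unchanged: both groups are greedy.
$pattern = '/<span(.*)>(.*)<\/span>/';
preg_match($pattern, $string, $m);

// Group 2 keeps only the space between the last </a> and the final </span>.
var_dump($m[2]); // string(1) " "

// So the whole matched string is replaced by that single space.
var_dump(preg_replace($pattern, '$2', $string)); // string(1) " "
```

In the first sample there was no `>` between the inner `</span>` and the outer `</span>`, which is why the greedy backtracking happened to stop in the right place there.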


Hmmm, I think you might be looking for:

$string = '<span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blablabla<a href="http://64.233.179.104/translate_c?hl=en&langpair=en%7Car&u=http://www.google.com/">blablalink</a></span>hello how are you <a href="http://64.233.179.104/translate_c?hl=en&langpair=en%7Car&u=http://www.google.com/">click here</a> </span>';
// Remove every opening <span ...> tag; the U modifier makes (.*) lazy,
// so each match ends at that tag's own '>'.
$string = preg_replace("/<span (.*)>/Ui", "", $string);
// Remove every closing </span> tag.
$string = preg_replace("#<\/span>#Ui", "", $string);
echo $string;

 

I think I might have misunderstood what you're trying to get from the spans though....


No, Corbin, that's not what I'm looking for.

 

Let me try to re-explain myself. Fert had the right idea, but nobody else seems to know what I'm trying to do, which makes it harder for you to help me, huh?

 

breakin' it down now:

 

here is the original string:

<?php
$originalstring= '<span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blablabla<a href="http://www.blabla.com">blablalink</a></span>hello how are you <a href="http://www.clickhere.com">click here</a> </span>';
?>

 

I want to make this string into:

<?php
$newstring = 'hello how are you <a href="http://www.clickhere.com">click here</a>';
?>

 

some observations:

1. there are two spans in the original string

2. one span is inside of the other (something like that...)

3. the text that is in the $newstring is inside of the second span

 

Fert's code:

$newstring = preg_replace("/<span(.*)>(.*)<\/span>/","$2", $originalstring);

 

worked when there weren't any <a> tags in the text I want to keep (the text inside the second span, remember; look above).

 

So I need to edit this in some way so that it also accepts the <a> tag as text to keep. This is where I NEED YOUR HELP.
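One way to make the pattern accept the `<a>` tag, sketched under the assumption that the inner span always starts with `<span class="google-src-text"` as in these samples. Instead of relying on greedy backtracking, it anchors on the inner span explicitly, skips its contents lazily up to the inner `</span>`, and captures lazily up to the next `</span>`. The function name `keepOuterSpanText` is hypothetical.

```php
<?php
// Hypothetical helper. Assumes each outer span starts with an inner
// <span class="google-src-text" ...> holding the text to throw away.
function keepOuterSpanText(string $html): string
{
    $pattern = '/\s*'                              // whitespace before the outer span
        . '<span[^>]*>'                            // outer opening tag
        . '\s*<span class="google-src-text"[^>]*>' // inner opening tag
        . '.*?<\/span>'                            // source text (may contain <a>), up to the inner </span>
        . '(.*?)'                                  // text to keep, captured lazily
        . '\s*<\/span>/s';                         // trailing space and the outer </span>
    return preg_replace($pattern, '$1', $html);
}

$originalstring = '<span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blablabla<a href="http://www.blabla.com">blablalink</a></span>hello how are you <a href="http://www.clickhere.com">click here</a> </span>';
echo keepOuterSpanText($originalstring);
// hello how are you <a href="http://www.clickhere.com">click here</a>
```

The same function also handles the first sample with `<br />` separators, since the leading `\s*` swallows the space after each `<br />`. It would stop early if the kept text itself contained a nested `</span>`.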

 

 

Now, Corbin:

Your code took everything out of BOTH spans, not just the second span. Taking stuff out of only the second span can be done: look at Fert's preg_replace, it does it. I don't know how, but it does. How can I edit it so that it also works with the <a> tag?
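If the markup gets more varied than these samples, an HTML parser may be less fragile than a regex. A minimal sketch using PHP's DOM extension (PHP 7+ assumed; the function name `keepTextWithDom` is hypothetical): it deletes every span whose class is exactly `google-src-text`, unwraps the remaining spans, and returns what is left.

```php
<?php
// Hypothetical helper: drop the "google-src-text" spans, unwrap the remaining
// spans, and return what is left as HTML. Assumes PHP 7+ with the DOM extension.
function keepTextWithDom(string $html): string
{
    $doc = new DOMDocument();
    // Silence libxml warnings about the fragment not being a full document.
    $prev = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so its children are easy to find afterwards.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);

    // 1. Remove the spans holding the source text, including any <a> inside them.
    foreach (iterator_to_array($xpath->query('//span[@class="google-src-text"]')) as $span) {
        $span->parentNode->removeChild($span);
    }

    // 2. Unwrap every remaining span: move its children in front of it, then delete it.
    foreach (iterator_to_array($xpath->query('//span')) as $span) {
        while ($span->firstChild) {
            $span->parentNode->insertBefore($span->firstChild, $span);
        }
        $span->parentNode->removeChild($span);
    }

    // 3. Serialize the wrapper's children (its "inner HTML").
    $wrapper = $xpath->query('//body/div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return trim($out);
}

$originalstring = '<span onmouseover="_tipon(this)" onmouseout="_tipoff()" style="direction: rtl; text-align: right"><span class="google-src-text" style="direction: ltr; text-align: left">blablabla<a href="http://www.blabla.com">blablalink</a></span>hello how are you <a href="http://www.clickhere.com">click here</a> </span>';
echo keepTextWithDom($originalstring);
```

DOMDocument re-serializes the markup, so self-closing tags like the `<br />` in the first sample come back as `<br>`; if the exact original text matters, the string-based approaches are closer.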

