
Extract text from nested tags


tibberous


I have some html that looks like this.

 

<body>

 

<td id="top" align="center">

<div id="topNav">Rate

PeopleMeet

PeopleBest OfMeet Jim and

James</div>

<div id="userPanel">

 

<td>

Login</td>

<td>Join<br>

HOTorNOT</td>

</body>

 

This HTML will always have matching opening and closing tags. Basically, I just need the content from the inner cells. If a tag is nested, like <div>Outer div<div>Hello</div>text</div>, then I want the inner div to get the value 'Hello', and the outer div to get the value 'Outer div text'. I'm not sure if I should use the XML functions for this, or explode(), or preg_match and preg_replace. Anyone know a good way to do this?
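
One way to get each tag's own text without regexes is to walk the markup with an event-based parser and keep a stack of open elements, so that text is always credited to the innermost open tag. The sketch below is in Python rather than PHP and is only an illustration: `OwnTextCollector`, `own_text` and `VOID_TAGS` are made-up names, and it assumes the tags really are balanced, as stated above.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they are skipped so that
# "<br>" does not leave a stray entry on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class OwnTextCollector(HTMLParser):
    """Collects, for each element, only the text that sits directly inside it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []    # open elements, innermost last: (tag, [text parts])
        self.results = []  # (tag, own_text), in the order the tags close

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, []))

    def handle_data(self, data):
        # Text belongs to whichever element is innermost right now.
        if self.stack:
            self.stack[-1][1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        # Assumption: tags are balanced, so the closing tag matches the top.
        open_tag, parts = self.stack.pop()
        text = " ".join(" ".join(parts).split())  # collapse whitespace
        self.results.append((open_tag, text))


def own_text(html):
    """Return a list of (tag, own_text) pairs for the given markup."""
    collector = OwnTextCollector()
    collector.feed(html)
    collector.close()
    return collector.results


if __name__ == "__main__":
    for tag, text in own_text("<div>Outer div<div>Hello</div>text</div>"):
        print(tag, repr(text))
```

Inner elements close first, so they appear first in the result. The same stack idea should carry over to any parser that reports start tags, end tags and text separately.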


I almost got this. Basically I am cleaning it with Tidy, cleaning it with str_replace and preg_replace, wrapping it in a root node and then parsing it as XML.

 

The only thing that is still giving me a problem is HTML entities (&lt;, &nbsp;, &gt;, etc.). I replaced &nbsp; with ' ', but I can't do the same thing for &lt; because the resulting '<' would break my XML. Is there some way I can have the PHP XML parser ignore HTML entities? I thought about replacing & with a random string, then replacing it back, but that seems inefficient and tacked together.
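
A possible alternative to the placeholder trick is to rewrite every named HTML entity as a numeric character reference before parsing. XML accepts numeric references without a DTD, so `&nbsp;` becomes `&#160;` and `&lt;` becomes `&#60;`, and the parser turns them back into characters only inside text, where they cannot be mistaken for markup. This is a Python sketch of the idea, not PHP; `entities_to_numeric` is a made-up name, and it assumes every entity is written with its trailing semicolon.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Matches named references such as "&nbsp;" or "&lt;" (semicolon required).
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_numeric(markup):
    """Rewrite named HTML entities as numeric character references."""

    def replace(match):
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            # Unknown name: leave it alone. The XML parser will still
            # reject it, which is a limitation of this sketch.
            return match.group(0)
        return "&#%d;" % codepoint

    return _NAMED_ENTITY.sub(replace, markup)


if __name__ == "__main__":
    fragment = "<td>a&nbsp;&lt;b&gt;</td>"
    # Wrap in a root node, as described in the post, then parse as XML.
    root = ET.fromstring("<root>" + entities_to_numeric(fragment) + "</root>")
    print(repr(root.find("td").text))
```

Because the substitution happens on the raw string in a single pass, there is no second "replace it back" step: the parser does the decoding itself.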

