
Extract text from nested tags


tibberous


I have some HTML that looks like this:

<body>
<td id="top" align="center">
<div id="topNav">Rate
PeopleMeet
PeopleBest OfMeet Jim and
James</div>
<div id="userPanel">
<td>
Login</td>
<td>Join<br>
HOTorNOT</td>
</body>

This HTML will always have matching opening and closing tags. Basically, I just need the content from the inner cells. If a tag is nested, like <div>Outer div<div>Hello</div>text</div>, then I want the inner div to get the value 'Hello' and the outer div to get the value 'Outer div text'. I'm not sure whether I should use the XML functions for this, or explode, or preg_match and preg_replace. Does anyone know a good way to do this?
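For what it's worth, one possible approach (not something settled on in this thread) is PHP's DOM extension rather than explode or regular expressions. The sketch below assumes the DOM extension is available; `ownText` and `collectOwnText` are hypothetical helper names made up for illustration. For each div it gathers only the text nodes that are direct children, so nested elements keep their own text.

```php
<?php
// Return the text that belongs directly to $el, skipping text inside child elements.
function ownText(DOMElement $el): string
{
    $parts = [];
    foreach ($el->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $parts[] = $text;
            }
        }
    }
    return implode(' ', $parts);
}

// Parse an HTML fragment and return the own text of every <div>, in document order.
function collectOwnText(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings about sloppy HTML quiet
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($doc->getElementsByTagName('div') as $div) {
        $result[] = ownText($div);
    }
    return $result;
}

print_r(collectOwnText('<div>Outer div<div>Hello</div>text</div>'));
```

For the nested example this should give 'Outer div text' for the outer div and 'Hello' for the inner one. The same loop would work for td or any other tag name passed to getElementsByTagName.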


I've almost got this. Basically, I am cleaning the HTML with Tidy, then with str_replace and preg_replace, wrapping it in a root node, and then parsing it as XML.

The only thing still giving me a problem is HTML entities (&lt;, &nbsp;, &gt;, etc.). I replaced &nbsp; with ' ', but I can't do the same thing for &lt; because it will break my XML. Is there some way I can have the PHP XML parser ignore HTML entities? I thought about replacing & with a random string and then replacing it back, but that seems inefficient and tacked together.
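A note on the entity problem: XML itself predefines five entities (lt, gt, amp, quot and apos), so those should already parse as long as they are written with the trailing semicolon. It is the HTML-only names such as nbsp that an XML parser rejects. A minimal sketch of one workaround follows, assuming UTF-8 input and entities that end in a semicolon; `htmlEntitiesToUtf8` is a hypothetical helper name, not a built-in. It does not deal with stray bare ampersands.

```php
<?php
// Replace HTML-only named entities (for example &nbsp;) with the UTF-8
// characters they stand for, and leave the five XML-predefined entities alone.
function htmlEntitiesToUtf8(string $markup): string
{
    $xmlPredefined = ['lt', 'gt', 'amp', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlPredefined): string {
            if (in_array($m[1], $xmlPredefined, true)) {
                return $m[0]; // already valid in XML
            }
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($char === $m[0]) {
                // Unknown name: escape the ampersand so the XML stays well-formed.
                return '&amp;' . substr($m[0], 1);
            }
            // Re-escape in case the entity decoded to markup such as '<' or '&'.
            return htmlspecialchars($char, ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $markup
    );
}

echo htmlEntitiesToUtf8('<td>Join&nbsp;now &lt;free&gt;</td>'), "\n";
```

The converted string can then be wrapped in a root node and handed to the XML parser as before, which avoids the replace-and-restore trick with a random placeholder string.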
