parsing non-well formed XML


bhikkhu

I'm new here and I've been trying to find the answer to my question without posting but I have been unsuccessful. If this has already been covered, please provide me with a link, and accept my apology for asking the same question again. As I begin using these forums more often, it won't happen.

I am trying to parse out an XML file, but the file itself is not well formed. Here is an example snippet.

[code]
<root>
   <book id="foo">
      <chapter id="1">
         <sentence id="1" />This is the 'CDATA' that I need..
         <sentence id="2" />Another sentence example...
      </chapter>
   </book>
</root>
[/code]

I need to pull out the data from the sentence, but it isn't wrapped in <sentence>data</sentence> format. The sentence element is closed immediately.

I know this is a bit of an XML question, but I can't change the XML, I have to parse it as it is, and I'm using PHP to do it.

Any help is greatly appreciated. I don't have my code in front of me, but there isn't much of it anyway, and I'm really just looking for direction.

Thanks again.

For XML to be valid, it MUST have a closing tag for each element!

That particular file is not valid XML and should not be parsed by any compliant app. Only a super-duper error-friendly one MAY still do it, but as far as I am aware (or concerned, for that matter) that example should fail hands down.

You could make it valid by parsing the content of the file, looking for elements with no closing tag, and giving them one (no entendre!!!!).

[quote]
QUOTE (ToonMariner @ Jun 19 2006, 11:35 AM)
You could make it valid by parsing the content of the file and looking for elements with no closing tag and give them one (no entendre!!!!).
[/quote]
Thanks for the reply. I thought about this... but, ironically... how do I parse it to add the closing tag?!?!

:unsure: I'm really out of luck here, aren't I?... :wink: :wink:

Thanks again. :smile:
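
For anyone landing on this thread later: as pasted, the snippet looks well formed to me, since `<sentence id="1" />` is an empty-element tag and the text after it is just a text node inside `<chapter>`. If the real file matches the snippet (that is an assumption, I have not seen the full file), one possible direction is to load it with PHP's DOM extension and, for each `<sentence>` marker, collect the text nodes that follow it until the next element. The sketch below does that. The function name `extractSentences` and the `$sample` variable are made up for illustration, and the type hints assume PHP 7 or later.

[code]
<?php
// Sample input: the snippet from the first post, nothing else assumed.
$sample = <<<'XML'
<root>
   <book id="foo">
      <chapter id="1">
         <sentence id="1" />This is the 'CDATA' that I need..
         <sentence id="2" />Another sentence example...
      </chapter>
   </book>
</root>
XML;

/**
 * Hypothetical helper: returns one ['id' => ..., 'text' => ...] entry per
 * <sentence> marker, where 'text' is the character data that follows the
 * marker up to the next element (or the end of the parent element).
 */
function extractSentences(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('The input could not be parsed as XML.');
    }

    $sentences = [];
    foreach ($doc->getElementsByTagName('sentence') as $marker) {
        $text = '';
        // Walk the siblings after the empty <sentence /> element.
        for ($node = $marker->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node instanceof DOMElement) {
                break; // the next marker (or any other element) ends this sentence
            }
            if ($node instanceof DOMText) {
                $text .= $node->data; // also covers real CDATA sections
            }
        }
        $sentences[] = [
            'id'   => $marker->getAttribute('id'),
            'text' => trim($text),
        ];
    }
    return $sentences;
}

foreach (extractSentences($sample) as $sentence) {
    echo $sentence['id'], ': ', $sentence['text'], PHP_EOL;
}
[/code]

Sentence ids restart in every chapter, so if you need them to be unique you would probably also want to record the id of `$marker->parentNode`. If the real file turns out to be broken in ways the snippet does not show, `loadXML()` will fail, and rewriting the markers into proper open/close pairs with a regular expression before parsing (as suggested above) may be the more practical route.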
