
Parsing non-well-formed XML



#1 bhikkhu


Posted 19 June 2006 - 03:03 PM

I'm new here, and I've been trying to find the answer to my question without posting, but I haven't been successful. If this has already been covered, please point me to the thread, and accept my apologies for asking the same question again. As I start using these forums more often, it won't happen again.

I am trying to parse an XML file, but the file itself is not well-formed. Here is an example snippet:

<root>
   <book id="foo">
      <chapter id="1">
         <sentence id="1" />This is the 'CDATA' that I need..
         <sentence id="2" />Another sentence example...
      </chapter>
   </book>
</root>

I need to pull out the text of each sentence, but it isn't wrapped in <sentence>data</sentence> form: each sentence element is closed immediately, and its text follows the tag.

I know this is something of an XML question, but I can't change the XML: I have to parse it as it is, and I'm using PHP to do it.
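A minimal sketch, assuming the real file has the same layout as the snippet. Note that `<sentence id="1" />` is an empty-element tag, which XML permits, so the snippet as posted should load in a standard parser; each sentence's text is then simply the text node(s) that follow the `<sentence>` element inside `<chapter>`. `extractSentences` is a hypothetical helper name; `DOMDocument::loadXML()`, `getElementsByTagName()` and `nextSibling` come from PHP's DOM extension.

```php
<?php
// Sketch: collect the text that follows each empty <sentence/> element.
// Assumes the real file has the same layout as the snippet in the question.

/**
 * Hypothetical helper: returns [sentence id => sentence text].
 */
function extractSentences(string $xml): array
{
    // Report parse problems via an exception instead of PHP warnings.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        libxml_clear_errors();
        throw new RuntimeException('Input is not well-formed XML');
    }

    $sentences = [];
    foreach ($doc->getElementsByTagName('sentence') as $sentence) {
        // Gather text nodes after this element until the next element
        // (the next <sentence/>) or the end of the parent (</chapter>).
        $text = '';
        for ($node = $sentence->nextSibling;
             $node !== null && !($node instanceof DOMElement);
             $node = $node->nextSibling) {
            if ($node instanceof DOMText) {
                $text .= $node->nodeValue;
            }
        }
        $sentences[$sentence->getAttribute('id')] = trim($text);
    }
    return $sentences;
}

$sample = <<<'XML'
<root>
   <book id="foo">
      <chapter id="1">
         <sentence id="1" />This is the 'CDATA' that I need..
         <sentence id="2" />Another sentence example...
      </chapter>
   </book>
</root>
XML;

foreach (extractSentences($sample) as $id => $text) {
    echo $id, ': ', $text, "\n";
}
```

Walking `nextSibling` rather than using a `following-sibling::text()` XPath keeps a sentence from picking up the text of a later sentence when one has no text of its own. If the real file contains genuinely broken markup elsewhere, `loadXML()` will still fail and a repair step is needed first.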

Any help is greatly appreciated. I don't have my code in front of me, but there isn't much of it anyway; I'm really just looking for direction.

Thanks again.

#2 ToonMariner


Posted 19 June 2006 - 03:35 PM

For XML to be valid, it MUST have a closing tag for each element!

That particular file is not valid XML and should not be parsed by any compliant app. Only a super-duper, error-friendly one MAY still do it, but as far as I am aware (or concerned, for that matter), that example should fail hands down.

You could make it valid by parsing the content of the file, looking for elements with no closing tag and giving them one (no entendre!!!!).

#3 bhikkhu


Posted 19 June 2006 - 03:46 PM

QUOTE (ToonMariner @ Jun 19 2006, 11:35 AM):
"You could make it valid by parsing the content of the file and looking for elements with no closing tag and give them one (no entendre!!!!)."
Thanks for the reply. I thought about this... but, ironically... how do I parse it to add the closing tag?!?!

I'm really out of luck here, aren't I? ;)

Thanks again. :)
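One way to add the closing tags, along the lines ToonMariner suggests, is a plain text substitution before the document reaches a parser. A minimal sketch, assuming each sentence's text runs from its `<sentence ... />` tag up to the next `<`, i.e. the sentence text itself contains no markup. `wrapSentences` is a hypothetical helper name; `preg_replace_callback()` and `simplexml_load_string()` are standard PHP functions.

```php
<?php
// Sketch: rewrite <sentence id="1" />text as <sentence id="1">text</sentence>
// so that SimpleXML can read each sentence's text as element content.
// Assumes the text after each empty <sentence/> tag contains no markup.

/**
 * Hypothetical helper: adds the missing closing tags with a regex.
 */
function wrapSentences(string $xml): string
{
    return preg_replace_callback(
        // $m[1] = the attributes, $m[2] = everything up to the next '<'
        '#<sentence\b([^>]*?)\s*/>([^<]*)#',
        function (array $m): string {
            $text = rtrim($m[2]);
            // Keep the original line break/indentation outside the element.
            $trailing = (string) substr($m[2], strlen($text));
            return '<sentence' . $m[1] . '>' . $text . '</sentence>' . $trailing;
        },
        $xml
    );
}

$sample = <<<'XML'
<root>
   <book id="foo">
      <chapter id="1">
         <sentence id="1" />This is the 'CDATA' that I need..
         <sentence id="2" />Another sentence example...
      </chapter>
   </book>
</root>
XML;

$doc = simplexml_load_string(wrapSentences($sample));
foreach ($doc->xpath('//sentence') as $sentence) {
    echo $sentence['id'], ': ', (string) $sentence, "\n";
}
```

A regex like this is fragile: comments, CDATA sections or attribute values containing `>` will confuse it, so it only suits files known to follow the snippet's layout. Sentences that are already written as `<sentence>text</sentence>` are left untouched, because the pattern requires the `/>` of an empty-element tag.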



