XML parsing with PHP question


sKunKbad


I got this script from w3schools. It parses XML, but I don't want it to output the character data from one of the elements. For instance, if the element "target" is included in the XML file, the script outputs "no print". Is there a simple change to the code that lets me specify that the character data of certain elements should not be output? I'd like to be able to declare multiple elements whose character data is not output.

 

The script:

<?php

//Initialize the XML parser
$parser=xml_parser_create();

//Function to use at the start of an element
function start($parser,$element_name,$element_attrs)
  {
  switch($element_name)
    {
    case "NOTE":
    echo "-- Note --<br />";
    break;
    case "TARGET":
    echo "";
    break;
    case "TO":
    echo "To: ";
    break; 
    case "FROM":
    echo "From: ";
    break; 
    case "HEADING":
    echo "Heading: ";
    break; 
    case "BODY":
    echo "Message: ";
    }
  }

//Function to use at the end of an element
function stop($parser,$element_name)
  {
  echo "<br />";
  }

//Function to use when finding character data
function char($parser,$data)
  {
  echo $data;
  }

//Specify element handler
xml_set_element_handler($parser,"start","stop");

//Specify data handler
xml_set_character_data_handler($parser,"char");

//Open XML file
$fp=fopen("test.xml","r"); //the R means "read only"

//Read data
while ($data=fread($fp,4096))
  {
  xml_parse($parser,$data,feof($fp)) or //feof tests for end of file on the file pointer
  die (sprintf("XML Error: %s at line %d", 
  xml_error_string(xml_get_error_code($parser)),
  xml_get_current_line_number($parser)));
  }

//Free the XML parser
xml_parser_free($parser);

?>

 

The XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<target>no print</target>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
<whatever>yes</whatever>
</note>


$parser=xml_parser_create();

//Function to use at the start of an element
function start($parser,$element_name,$element_attrs)
  {
  switch(strtolower($element_name))    {
    case "note":
    echo "-- Note --<br />";
    break;
    case "target":
    echo "";
    break;
    case "to":
    echo "To: ";
    break; 
    case "from":
    echo "From: ";
    break; 
    case "heading":
    echo "Heading: ";
    break; 
    case "body":
    echo "Message: ";
    }
  }

 

I may be wrong, but it might be case sensitive. If so, try the code above and see if it works for you.
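On the case question: PHP's XML parser has case folding switched on by default, so the handlers receive element names in upper case (`NOTE`, `TARGET`) however they are written in the file. The sketch below shows both settings side by side. `collectNames` is a hypothetical helper written for this illustration, not something from the thread.

```php
<?php
// Hypothetical helper: returns the element names exactly as the start
// handler receives them, with case folding switched on or off.
function collectNames(string $xml, bool $fold): array
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $fold ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$names) {
            $names[] = $name; // record the name as delivered by the parser
        },
        function ($p, $name) {
            // nothing to do when an element closes
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

print_r(collectNames('<note><to>Tove</to></note>', true));  // NOTE, TO
print_r(collectNames('<note><to>Tove</to></note>', false)); // note, to
```

With folding left on, `case "TARGET":` in the original script already matches. Lower-casing the name first, as in the reply above, works with either setting.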


Try this


$arfiles= array("basic2.xml","dataset.xml"); 

foreach($arfiles as $arFile) 
{ 
    echo "<BR>-------------------------------------------------<BR>"; 
    echo "<BR>Parsing ".$arFile."<BR>"; 
    $insXmlParser= new clsXmlParser($arFile); 
    if ($aArray = $insXmlParser->Parse()) {
        // Wrap the dump in <pre> so the array layout survives in the browser
        echo "<pre>";
        print_r($aArray);
        echo "</pre>";
        //echo LIST_CONTENTS($aArray);
    }
    echo "<BR> -------------------------------------------------<BR>"; 
} 
// Simple XML Parser  

class clsXmlParser { 

// general vars 
var $sTitle = ""; 
var $sLink = ""; 
var $sDescription = ""; 
var $arItems = array(); 
var $arsub = array(); 
var $itemCount = 0; 
var $prvTag="";   
var $uFiles = ''; 
var $xml_parser; 
var $curTag=""; 

function __construct($uFiles) 
{ 
     
    $this->uFiles = $uFiles; 
     
} 

function startElement($parser, $name, $attrs) { 
  
  $this->curTag .= "^$name";   
  //echo "start:  ".$this->curTag." <BR>"; 
  
} 

function endElement($parser, $name) 
{ 
   
  $caret_pos = strrpos($this->curTag,'^'); 
  $this->curTag = substr($this->curTag,0,$caret_pos); 
  //echo "end:  ".$this->curTag." <BR>"; 
} 

function characterData($parser, $data) {  
  
  if(trim($data) != "") 
  { 
        if(trim($this->prvTag) == "") 
              $this->prvTag=$this->curTag; 
      elseif(trim($this->prvTag) == trim($this->curTag)) 
      { 
          $this->arItems[] = $this->arsub; 
          $this->arsub = array(); 
         
        } 
   
        //find current element 
            $c_pos = strrpos($this->curTag,'^'); 
        $c_len = strlen($this->curTag); 
        $c_val = substr($this->curTag,($c_pos+1),$c_len); 

        //set data to sub array with the element name as the key 
        $this->arsub[$c_val] = $data; 
         
       
  } 

} 



function Parse() 
{ 
    $this->xml_parser = xml_parser_create(); 
         
     
    xml_set_object($this->xml_parser, $this); 
     
    xml_set_element_handler($this->xml_parser, "startElement", "endElement"); 
    xml_set_character_data_handler($this->xml_parser, "characterData"); 
     
    if (!($fp = fopen($this->uFiles,"r")))  
    { 
      die ("could not open XML for input"); 
    } 
     
    while ($data = fread($fp, 4096))  
    { 
      if (!xml_parse($this->xml_parser, $data, feof($fp))) 
      { 
        die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($this->xml_parser)), xml_get_current_line_number($this->xml_parser))); 
      } 
    } 
    xml_parser_free($this->xml_parser); 
         
    //to handle the last array element 
    if(count($this->arsub)>0) 
    { 
         $this->arItems[] = $this->arsub; 
         $this->arsub = array(); 
    } 
         
    return $this->arItems; 

} 

} 

///----------END OF CLASS 


function LIST_CONTENTS($arrayname,$tab="&nbsp;&nbsp;&nbsp;&nbsp;",$indent=0)  
    {    
    // recursively displays contents of the array and sub-arrays:  
        // This function (c) Peter Kionga-Kamau  
        // Free for unrestricted use, except sale - do not resell.  
            // use: echo LIST_CONTENTS(array $arrayname, string $tab, int $indent);  
        // $tab = string to use as a tab, $indent = number of tabs to indent result  
        $retval=$currenttab=""; 
        foreach ($arrayname as $key => $value)  
        {  
               for($i=0; $i<$indent; $i++) $currenttab .= $tab;  
            if (is_array($value))  
            {  
                $retval .= "$currenttab$key : Array: <BR>$currenttab{<BR>";  
                  $retval .= LIST_CONTENTS($value,$tab,$indent+1)."$currenttab}<BR>";  
            }  
               else $retval .= "$currenttab$key => $value<BR>";  
               $currenttab = NULL;  
        }  
        return $retval;  
    }  




frost110, that doesn't change the output. What I'm trying to achieve is to not output the character data between the <target> tags and between the <whatever> tags (and possibly more elements).

jitesh, I'm trying to learn PHP, and my code is already hard enough to understand!

 

 


You can always remove those elements before you parse the file, e.g.:

 

$file = file_get_contents('filexml.xml');
// Split around the opening tag, then around the closing tag (case-insensitive)
list($before, $after) = preg_split('/<target>/i', $file);
list(, $after) = preg_split('/<\/target>/i', $after);

$file = $before . $after;

// do the xml code here.

 

See if that works for ya.


Actually, thanks! I did some shifting around of code, and this worked:

 

while ($data=fread($fp,4096))
  {
  

  // Cut out <target>...</target>
  list($before, $after) = preg_split('/<target>/i', $data);
  list(, $after) = preg_split('/<\/target>/i', $after);
  $data = $before . $after;
  
  // Cut out <whatever>...</whatever>
  list($before, $after) = preg_split('/<whatever>/i', $data);
  list(, $after) = preg_split('/<\/whatever>/i', $after);
  $data = $before . $after;
  
  xml_parse($parser,$data,feof($fp)) or //feof tests for end-of-file on the file pointer
  die (sprintf("XML Error: %s at line %d", 
  xml_error_string(xml_get_error_code($parser)),
  xml_get_current_line_number($parser)));
  }

 

It would be great if there were some sort of array listing the elements I don't want included, but this is a good lesson for me tonight.
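One way to get such an array without editing the XML text at all is to let the handlers decide what to print. This is only a sketch under assumptions: `renderWithout` is a hypothetical helper, it returns the text instead of echoing it, and it relies on the parser's default upper-casing of element names.

```php
<?php
// Hypothetical helper: returns the character data of $xml, leaving out the
// text of every element named in $skip (and of anything nested inside one).
function renderWithout(string $xml, array $skip): string
{
    $skip = array_map('strtoupper', $skip); // names arrive upper-cased by default
    $depth = 0;  // greater than 0 while inside a skipped element
    $out = '';

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$depth, $skip) {
            // Entering a skipped element, or going deeper inside one
            if ($depth > 0 || in_array($name, $skip, true)) {
                $depth++;
            }
        },
        function ($p, $name) use (&$depth) {
            if ($depth > 0) {
                $depth--;
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$depth, &$out) {
            if ($depth === 0) {
                $out .= $data; // keep text only outside skipped elements
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $out;
}

$xml = '<note><to>Tove</to><target>no print</target><from>Jani</from><target>no print</target></note>';
echo renderWithout($xml, ['target', 'whatever']); // prints "ToveJani"
```

Because the parser itself finds the tags, repeated elements and tags that straddle a read boundary need no special handling.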


$listArr = array("target", "whatever");
while ($data=fread($fp,4096))
  {
  foreach ($listArr as $split) {
     // Remove the first <$split>...</$split> pair from this chunk
     list($before, $after) = preg_split('/<' . $split . '>/i', $data);
     list(, $after) = preg_split('/<\/' . $split . '>/i', $after);
     $data = $before . $after;
  }
  
  xml_parse($parser,$data,feof($fp)) or //feof tests for end-of-file on the file pointer
  die (sprintf("XML Error: %s at line %d", 
  xml_error_string(xml_get_error_code($parser)),
  xml_get_current_line_number($parser)));
  }


This array works. However, one thing I noticed with both the array and non-array versions is that if an element is listed more than once, there is an error, "XML Error: no element found at line 9", which is the line in the XML where the element appears the second time.

 

 

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<target>no print</target>           // <-- this is the first instance of <target/>, which is removed along with its character data "no print".
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
<to>Tove</to>
<target>no print</target>           // <-- this is the second instance of <target/>, on line 9, which would also need to be removed.
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
<whatever>yes</whatever>
</note>
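A likely explanation, sketched below: splitting on `<target>` without a limit yields three pieces when the tag appears twice, but `list($before, $after)` keeps only the first two, so everything after the second `<target>` (including `</note>`) is dropped before the parser sees it. `stripFirstPair` is a hypothetical helper that mirrors the removal step from the earlier posts, with `preg_split()` and the `/i` flag standing in for `spliti()`.

```php
<?php
// Hypothetical helper mirroring the thread's removal step.
function stripFirstPair(string $data, string $tag): string
{
    $name = preg_quote($tag, '/');

    // No limit: two opening tags produce three pieces...
    $pieces = preg_split('/<' . $name . '>/i', $data);

    // ...but only the first two are used, so the third is lost.
    $before = $pieces[0];
    $after  = $pieces[1] ?? '';

    $parts = preg_split('/<\/' . $name . '>/i', $after);
    return $before . ($parts[1] ?? '');
}

$data = '<note><to>A</to><target>x</target><from>B</from><target>y</target><body>C</body></note>';
echo stripFirstPair($data, 'target'), "\n";
// <note><to>A</to><from>B</from>
```

The result ends while `<note>` is still open, which would fit the parser complaining at the line of the second `<target>`.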


$listArr = array("target", "whatever");
while ($data=fread($fp,4096))
  {
  foreach ($listArr as $split) {
    // Repeat while an opening <$split> tag is still present
    while (preg_match('/<' . $split . '>/i', $data)) {
         list($before, $after) = preg_split('/<' . $split . '>/i', $data, 2);
         list(, $after) = preg_split('/<\/' . $split . '>/i', $after, 2);
         $data = $before . $after;
     }
  }
  
  xml_parse($parser,$data,feof($fp)) or //feof tests for end-of-file on the file pointer
  die (sprintf("XML Error: %s at line %d", 
  xml_error_string(xml_get_error_code($parser)),
  xml_get_current_line_number($parser)));
  }

 

should work.
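The same removal can also be sketched as a single regular-expression replace over the whole document. This is an alternative, not the code from the post above, and it assumes the unwanted tags carry no attributes and are not nested inside themselves. `stripElements` is a hypothetical name.

```php
<?php
// Hypothetical helper: removes every <tag>...</tag> pair named in $tags.
function stripElements(string $xml, array $tags): string
{
    foreach ($tags as $tag) {
        $name = preg_quote($tag, '~');
        // i: case-insensitive; s: "." also matches newlines;
        // ".*?" stops at the first closing tag, so each pair is removed separately.
        $xml = preg_replace("~<{$name}>.*?</{$name}>~is", '', $xml);
    }
    return $xml;
}

$xml = '<note><to>T</to><target>a</target><from>J</from><target>b</target><whatever>yes</whatever></note>';
echo stripElements($xml, ['target', 'whatever']), "\n";
// <note><to>T</to><from>J</from></note>
```

Running this on the full contents of the file, for example from `file_get_contents("test.xml")`, before a single `xml_parse()` call also avoids the case where a tag is cut in half at a 4096-byte read boundary.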


frost110,

Thanks for helping me so far. I need some time to study the code, because I'd like to understand it more before proceeding (and right now I'm at work and probably shouldn't be working on this!). My goal is to display products from Amazon on my website, and have a shopping cart, and a search. There are other scripts out there to do this, but I just want to make my own. I've wanted to become an experienced php programmer for a long time, but it just seems so hard to learn for me. How did you learn php?

thanks again,

brian


www.php.net

 

The best learning place there is. I actually started with JScript, so I had all the basics, but yeah, 100% self-taught.

All I did was search php.net for what I wanted to do, and I have yet to need to ask anyone for help of any type with PHP. Everything I need to know or do is in that manual. Seven years, and I have still never needed to ask for help, thanks to php.net =)


Well, you certainly seem to whip out the code like you know it well. Is this just a hobby for you, or part of your job?

 

I was wondering about something tonight. You have shown me how to remove elements from the output. What if the opposite were required? How would you create an array and implement outputting only the elements in the array?

 

Tonight I worked on adding style to the output. I don't know if this would ever help anybody else learn what I'm trying to learn, but here is what I got so far:

 

the xml in a file called test.xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<target>no print</target>
<from>Jani</from>
<turkey>
<heading>Reminder</heading>
</turkey>
<chicken>
<heading>Chicken Reminder</heading>
</chicken>
<body>Don't forget me this weekend!</body>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
<whatever>yes</whatever>
</note>

 

The parser with some style applied:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Title</title>
<style type="text/css">
#parsewrap {margin:0px auto; border: solid red 2px; width:500px;}
</style>
</head>

<body>

<?php
echo "<div id=\"parsewrap\" style=\"text-align:center;\">";

$parser=xml_parser_create();

function start($parser,$element_name,$element_attrs)
  {
  switch(strtolower($element_name))
    {
    case "note":
    echo "<div style=\"color:red;background-color:yellow;\">RED TEXT";
    break;
    case "to":
    echo "</div><div style=\"color:green;\">To:";
    break; 
    case "from":
    echo "</div><div style=\"color:blue;\">From: ";
    break; 
    case "heading":
    echo "</div><div style=\"color:purple;\">Heading: ";
    break; 
    case "body":
    echo "</div><div style=\"color:orange; background-color:black;\">Message: ";
    }
  }

function stop($parser,$element_name)
  {
  echo "";
  }

function char($parser,$data)
  {
  echo $data;
  }

xml_set_element_handler($parser,"start","stop");

xml_set_character_data_handler($parser,"char");

$fp=fopen("test.xml","r"); 

$listArr = array("target", "whatever", "hork", "tacos", "chicken");
while ($data=fread($fp,4096))
  {
  foreach ($listArr as $split) {
    // Repeat while an opening <$split> tag is still present
    while (preg_match('/<' . $split . '>/i', $data)) {
         list($before, $after) = preg_split('/<' . $split . '>/i', $data, 2);
         list(, $after) = preg_split('/<\/' . $split . '>/i', $after, 2);
         $data = $before . $after;
     }
  }
  
  xml_parse($parser,$data,feof($fp)) or 
  die (sprintf("XML Error: %s at line %d", 
  xml_error_string(xml_get_error_code($parser)),
  xml_get_current_line_number($parser)));
  }

xml_parser_free($parser);

?>
</div>
</div>
</body>
</html>


The adding is a bit easier. All you have to do is find something that is static, such as the </body> tag and replace it with something like this:

 

$data = str_replace("</body>", "</body>\n<newelement>datajhere</newelement>", $data);

 

But yeah, it used to be a job; now it is just a hobby.


For now, I need to try to understand the foreach construct, and list array functions in the while loop. I really want to understand and not just be a copy and paster! So, until then, I guess I'll be studying.

Thanks for your help frost110!
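For anyone studying the same two constructs, here is a small stand-alone sketch. The sample strings are made up for illustration. `list()` copies array items into variables by position, and `foreach` runs its body once per array item.

```php
<?php
// list() assigns array items to variables in order.
list($first, $second) = explode(',', 'apple,banana');
echo $first, ' / ', $second, "\n"; // apple / banana

// An empty slot skips that position, as in list(, $after) = ...
list(, $tail) = explode(',', 'apple,banana');
echo $tail, "\n"; // banana

// foreach visits each item in turn; $name holds the current one.
$seen = [];
foreach (['target', 'whatever'] as $name) {
    $seen[] = '<' . $name . '>';
}
echo implode(' ', $seen), "\n"; // <target> <whatever>
```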


OK, now I understand the code I have so far, thanks to this site and php.net. I'm working on this new code, which tries to include only the elements that are in the $listArr array, but it doesn't work...

 

<?php

$parser=xml_parser_create();

function start($parser,$element_name,$element_attrs)
  {
  switch(strtolower($element_name))
    {
    case "note":
    echo "\t<div style=\"color:red;background-color:yellow;\">\n\t\tRED TEXT";
    break;
    case "to":
    echo "\n\t</div>\n\t<div style=\"color:green;\">\n\t\tTo: ";
    break; 
    case "from":
    echo "\n\t</div>\n\t<div style=\"color:blue;\">\n\t\tFrom: ";
    break; 
    case "heading":
    echo "\n\t</div>\n\t<div style=\"color:purple;\">\n\t\tHeading: ";
    break; 
    case "body":
    echo "\n\t</div>\n\t<div style=\"color:orange; background-color:black;\">\n\t\tMessage: ";
    }
  }

function stop($parser,$element_name)
  {
  echo "";
  }

function char($parser,$content)
  {
  echo trim($content);
  }

xml_set_element_handler($parser,"start","stop");

xml_set_character_data_handler($parser,"char");

$fp=fopen("test.xml","r"); 
$listArr = array("to", "from", "heading", "body"); 
$content = "";
while ($data=fread($fp,4096)) 
  {
foreach ($listArr as $split) { 
    while (eregi($split, $data)) {
list($before, $after) = spliti("<".$split.">", $data, 2);
list($content, $after) = spliti("</".$split.">", $after, 2);
$data = $before . $after;
$content++
}
  }  
  xml_parse($parser,$data,feof($fp)) or 
  die (sprintf("XML Error: %s at line %d", 
  xml_error_string(xml_get_error_code($parser)),
  xml_get_current_line_number($parser)));
  }

xml_parser_free($parser);
?>

 

I know you gave me the advice on using something like this:

$data = str_replace("</body>", "</body>\n<newelement>datajhere</newelement>", $data);

But I'm not trying to replace anything, am I? Heck, I don't know!


To add it in there you can use that replace method, or, if you have the data line by line, you can simply add a new line.

Whichever works. I prefer the replace method as long as there will always be a static element such as body. If there is not, then simply add an extra line to $data before you attempt to parse it.
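For the opposite case raised earlier, printing only the elements named in an array, one possible sketch keeps the XML untouched and lets the character-data handler check which element is currently open. `renderOnly` is a hypothetical helper, and the one-line-per-element output format is an assumption made for the example.

```php
<?php
// Hypothetical helper: returns the text of only those elements named in
// $allow, one line per element.
function renderOnly(string $xml, array $allow): string
{
    $allow = array_map('strtoupper', $allow); // names arrive upper-cased by default
    $stack = [];  // names of the elements that are currently open
    $out = '';

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack) {
            $stack[] = $name;
        },
        function ($p, $name) use (&$stack, &$out, $allow) {
            array_pop($stack);
            if (in_array($name, $allow, true)) {
                $out .= "\n"; // finish the line for an allowed element
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$stack, &$out, $allow) {
            // Only the innermost open element decides whether text is kept.
            if ($stack !== [] && in_array(end($stack), $allow, true)) {
                $out .= $data;
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $out;
}

$xml = '<note><to>Tove</to><target>no print</target><from>Jani</from><whatever>yes</whatever></note>';
echo renderOnly($xml, ['to', 'from']);
// Tove
// Jani
```

The labels and styling from the `start` handler in the posts above could be added in the same place where the stack is pushed.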

