
[SOLVED] Parse Large XML Document


NJordan72


Hello,

 

I'm trying to edit a rather large XML document (52 MB). The following code works, but it takes over 3 hours to run. Could anyone point me toward something more efficient?

 

What I'm trying to do: I receive this large XML file from a reporting API. For each useful row of data, the API also returns several unneeded "subtotal" rows (any ROW element whose type attribute equals "subtotal"). I want to remove all of those subtotal rows and write the result to a new XML file.

 

 

The PHP:

<?php
// process the report
$xmlDoc = new DOMDocument();
$xmlDoc->load("test3.xml");
$root = $xmlDoc->documentElement;
$rows = $xmlDoc->getElementsByTagName('ROW');

for ($i = $rows->length - 1; $i >=0; $i--) {
  if($rows->item($i)->getAttribute('type') == "subtotal"){
	$rows->item($i)->parentNode->removeChild($rows->item($i));
  }
}

print $xmlDoc->saveXML();
?>

 

An XML snippet:

<?xml version="1.0" encoding="ISO-8859-9"?>
<RWResponse>
<RESPONSE>
<DATA type="hi">
<ROW>
  <COLUMN data_type="text" id="1" >Some Name</COLUMN>
  <COLUMN data_type="text" id="119886" >Another Value</COLUMN>
  <COLUMN data_type="text" id="24251" >Address</COLUMN>
  <COLUMN data_type="text" id="110191" >Pass-through</COLUMN>
  <COLUMN data_type="text" id="6" >728x90</COLUMN>
  <COLUMN data_type="text" >728x90</COLUMN>
  <COLUMN data_type="text" id="1" >United States</COLUMN>
  <COLUMN data_type="text" >Advertiser learning</COLUMN>
  <COLUMN data_type="text" >Not applicable (CPM or Dynamic)</COLUMN>
  <COLUMN data_type="text" >Linked</COLUMN>
  <COLUMN data_type="text" >Managed</COLUMN>
  <COLUMN data_type="text" >Oct 8, 2007 00:00</COLUMN>
  <COLUMN data_type="numeric" >186</COLUMN>
  <COLUMN data_type="numeric" >0</COLUMN>
  <COLUMN data_type="numeric" >0</COLUMN>
  <COLUMN data_type="money" >0.1468720390000</COLUMN>
  <COLUMN data_type="money" >0.1468720390000</COLUMN>
</ROW>
<ROW type="subtotal">
  <COLUMN data_type="text" id="1" >Some Name</COLUMN>
  <COLUMN data_type="text" id="119886" >Another Value</COLUMN>
  <COLUMN data_type="text" id="24251" >Address</COLUMN>
  <COLUMN data_type="text" id="110191" >Pass-through</COLUMN>
  <COLUMN data_type="text" id="6" >728x90</COLUMN>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text" id="1" >United States</COLUMN>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="numeric" >186</COLUMN>
  <COLUMN data_type="numeric" >0</COLUMN>
  <COLUMN data_type="numeric" >0</COLUMN>
  <COLUMN data_type="money" >0.1468720390000</COLUMN>
  <COLUMN data_type="money" >0.1468720390000</COLUMN>
</ROW>
<ROW type="subtotal">
  <COLUMN data_type="text" id="1" >Some Name</COLUMN>
  <COLUMN data_type="text" id="119886" >Another Value</COLUMN>
  <COLUMN data_type="text" id="24251" >Address</COLUMN>
  <COLUMN data_type="text" id="110191" >Pass-through</COLUMN>
  <COLUMN data_type="text" id="6" >728x90</COLUMN>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text" id="1" >United States</COLUMN>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="numeric" >186</COLUMN>
  <COLUMN data_type="numeric" >0</COLUMN>
  <COLUMN data_type="numeric" >0</COLUMN>
  <COLUMN data_type="money" >0.1468720390000</COLUMN>
  <COLUMN data_type="money" >0.1468720390000</COLUMN>
</ROW>
<ROW type="subtotal">
  <COLUMN data_type="text" id="1" >Some Name</COLUMN>
  <COLUMN data_type="text" id="119886" >Another Value</COLUMN>
  <COLUMN data_type="text" id="24251" >Address</COLUMN>
  <COLUMN data_type="text" id="110191" >Pass-through</COLUMN>
  <COLUMN data_type="text" id="6" >728x90</COLUMN>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text" id="1" >United States</COLUMN>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="text"/>
  <COLUMN data_type="numeric" >186</COLUMN>
  <COLUMN data_type="numeric" >0</COLUMN>
  <COLUMN data_type="numeric" >0</COLUMN>
  <COLUMN data_type="money" >0.1468720390000</COLUMN>
  <COLUMN data_type="money" >0.1468720390000</COLUMN>
</ROW>
</DATA>
<METADATA rows="60863" columns="17" domain="network" timeend="2007-10-08 01:00:00" timestart="2007-10-08 00:00:00"/>
</RESPONSE>
</RWResponse>

 

Thanks for any help.


For the record, I think I got it...

 

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="@* | node()">
     <xsl:copy>
       <xsl:apply-templates select="@* | node()" />
     </xsl:copy>
</xsl:template>

<xsl:template match="*[@type = 'subtotal']" />
</xsl:stylesheet> 

 

<?php
// load the XSLT stylesheet
$xsl = new DOMDocument();
$xsl->load("remove_subtotal.xsl");

// load the XML to transform
$xmlDoc = new DOMDocument();
$xmlDoc->load("test3.xml");

// create the XSLT processor
// (registerPHPFunctions() is not needed: the stylesheet calls no PHP functions)
$proc = new XSLTProcessor();

// import the stylesheet (do not overwrite $xsl with the return value)
$proc->importStylesheet($xsl);

// transform the document
$newDoc = $proc->transformToDoc($xmlDoc);

// output results
print $newDoc->saveXML();
?>
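
For comparison, here is a minimal sketch of a DOM-only alternative, in case XSLT is not available. It rests on an assumption that was not measured in this thread: the original loop is probably slow because getElementsByTagName() returns a live node list, so each item($i) call may re-walk the tree. The sketch selects the subtotal rows once with DOMXPath and then removes them. The function name removeSubtotalRows and the inline sample XML are hypothetical, for illustration only.

```php
<?php
// Hypothetical helper (not from the original posts): removes every ROW
// element whose type attribute is "subtotal" and returns the new XML.
function removeSubtotalRows(string $xml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    // Run the XPath query once; unlike getElementsByTagName(), the
    // resulting node list is not live, so it is not re-evaluated
    // as nodes are removed.
    $xpath = new DOMXPath($doc);
    $subtotals = $xpath->query('//ROW[@type="subtotal"]');

    // Copy the nodes into a plain array before modifying the tree.
    foreach (iterator_to_array($subtotals) as $row) {
        $row->parentNode->removeChild($row);
    }

    return $doc->saveXML();
}

// Tiny made-up sample; the real input would be the contents of test3.xml.
$sample = '<DATA><ROW><COLUMN>186</COLUMN></ROW>'
        . '<ROW type="subtotal"><COLUMN>186</COLUMN></ROW></DATA>';
echo removeSubtotalRows($sample);
```

For the real file, file_get_contents("test3.xml") could be passed in. Like the XSLT version, this still loads the whole document into memory.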
