NJordan72 Posted October 10, 2007

Hello, I'm trying to edit a rather large XML document (52 MB). The following code works, but it takes over 3 hours to run. I was wondering if anyone could point me in the direction of something more efficient.

What I'm trying to do: I'm receiving this large XML file from a reporting API. The API returns multiple unneeded "subtotal" rows (ROW elements whose type attribute equals "subtotal") for each useful row of data. Basically I'm trying to remove all of those subtotal rows and pipe the result into a new XML file.

The PHP:

<?php
// process the report
$xmlDoc = new DOMDocument();
$xmlDoc->load("test3.xml");
$root = $xmlDoc->documentElement;
$rows = $xmlDoc->getElementsByTagName('ROW');
// walk backwards so removals don't shift the rows still to be visited
for ($i = $rows->length - 1; $i >= 0; $i--) {
    if ($rows->item($i)->getAttribute('type') == "subtotal") {
        $rows->item($i)->parentNode->removeChild($rows->item($i));
    }
}
print $xmlDoc->saveXML();
?>

An XML snippet:

<?xml version="1.0" encoding="ISO-8859-9"?>
<RWResponse>
  <RESPONSE>
    <DATA type="hi">
      <ROW>
        <COLUMN data_type="text" id="1">Some Name</COLUMN>
        <COLUMN data_type="text" id="119886">Another Value</COLUMN>
        <COLUMN data_type="text" id="24251">Address</COLUMN>
        <COLUMN data_type="text" id="110191">Pass-through</COLUMN>
        <COLUMN data_type="text" id="6">728x90</COLUMN>
        <COLUMN data_type="text">728x90</COLUMN>
        <COLUMN data_type="text" id="1">United States</COLUMN>
        <COLUMN data_type="text">Advertiser learning</COLUMN>
        <COLUMN data_type="text">Not applicable (CPM or Dynamic)</COLUMN>
        <COLUMN data_type="text">Linked</COLUMN>
        <COLUMN data_type="text">Managed</COLUMN>
        <COLUMN data_type="text">Oct 8, 2007 00:00</COLUMN>
        <COLUMN data_type="numeric">186</COLUMN>
        <COLUMN data_type="numeric">0</COLUMN>
        <COLUMN data_type="numeric">0</COLUMN>
        <COLUMN data_type="money">0.1468720390000</COLUMN>
        <COLUMN data_type="money">0.1468720390000</COLUMN>
      </ROW>
      <ROW type="subtotal">
        <COLUMN data_type="text" id="1">Some Name</COLUMN>
        <COLUMN data_type="text" id="119886">Another Value</COLUMN>
        <COLUMN data_type="text" id="24251">Address</COLUMN>
        <COLUMN data_type="text" id="110191">Pass-through</COLUMN>
        <COLUMN data_type="text" id="6">728x90</COLUMN>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text" id="1">United States</COLUMN>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="numeric">186</COLUMN>
        <COLUMN data_type="numeric">0</COLUMN>
        <COLUMN data_type="numeric">0</COLUMN>
        <COLUMN data_type="money">0.1468720390000</COLUMN>
        <COLUMN data_type="money">0.1468720390000</COLUMN>
      </ROW>
      <ROW type="subtotal">
        <COLUMN data_type="text" id="1">Some Name</COLUMN>
        <COLUMN data_type="text" id="119886">Another Value</COLUMN>
        <COLUMN data_type="text" id="24251">Address</COLUMN>
        <COLUMN data_type="text" id="110191">Pass-through</COLUMN>
        <COLUMN data_type="text" id="6">728x90</COLUMN>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text" id="1">United States</COLUMN>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="numeric">186</COLUMN>
        <COLUMN data_type="numeric">0</COLUMN>
        <COLUMN data_type="numeric">0</COLUMN>
        <COLUMN data_type="money">0.1468720390000</COLUMN>
        <COLUMN data_type="money">0.1468720390000</COLUMN>
      </ROW>
      <ROW type="subtotal">
        <COLUMN data_type="text" id="1">Some Name</COLUMN>
        <COLUMN data_type="text" id="119886">Another Value</COLUMN>
        <COLUMN data_type="text" id="24251">Address</COLUMN>
        <COLUMN data_type="text" id="110191">Pass-through</COLUMN>
        <COLUMN data_type="text" id="6">728x90</COLUMN>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text" id="1">United States</COLUMN>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="text"/>
        <COLUMN data_type="numeric">186</COLUMN>
        <COLUMN data_type="numeric">0</COLUMN>
        <COLUMN data_type="numeric">0</COLUMN>
        <COLUMN data_type="money">0.1468720390000</COLUMN>
        <COLUMN data_type="money">0.1468720390000</COLUMN>
      </ROW>
    </DATA>
    <METADATA rows="60863" columns="17" domain="network" timeend="2007-10-08 01:00:00" timestart="2007-10-08 00:00:00"/>
  </RESPONSE>
</RWResponse>

Thanks for any help.
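As an aside, a good share of the 3-hour runtime is likely the repeated $rows->item($i) calls: getElementsByTagName() returns a live node list, so indexing into it over and over while removing nodes gets expensive on a document this size. Below is a minimal DOM-only sketch that has XPath find the subtotal rows in one pass, then removes them; the output filename test3_clean.xml is just a placeholder.

<?php
// same cleanup, but let XPath locate the subtotal rows up front
$xmlDoc = new DOMDocument();
$xmlDoc->load("test3.xml");

$xpath = new DOMXPath($xmlDoc);

// collect the nodes first so we are not removing from a list we are
// still iterating over
$toRemove = array();
foreach ($xpath->query('//ROW[@type="subtotal"]') as $row) {
    $toRemove[] = $row;
}

foreach ($toRemove as $row) {
    $row->parentNode->removeChild($row);
}

// write the cleaned document to a new file (name is a placeholder)
$xmlDoc->save("test3_clean.xml");
?>

This still loads the whole 52 MB document into memory, so it may only be a partial fix.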
effigy Posted October 10, 2007

An identity transform in XSLT would be the best (and quickest) approach.
NJordan72 (Author) Posted October 10, 2007

Quoting effigy: "An identity transform in XSLT would be the best (and quickest) approach."

Can you point me in the right direction? PHP and XML/XSLT aren't exactly my bread and butter. Thanks.
NJordan72 (Author) Posted October 10, 2007

For the record, I think I got it...

The stylesheet (remove_subtotal.xsl):

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <!-- identity template: copy every node and attribute as-is -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- empty template: elements with type="subtotal" are dropped -->
  <xsl:template match="*[@type = 'subtotal']"/>
</xsl:stylesheet>

The PHP:

<?php
// load the XSLT stylesheet
$xsl = new DOMDocument();
$xsl->load("remove_subtotal.xsl");

// load the XML to transform
$xmlDoc = new DOMDocument();
$xmlDoc->load("test3.xml");

// create the XSLT processor and import the stylesheet
$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

// transform the document
$newDoc = $proc->transformToDoc($xmlDoc);

// output the results
print $newDoc->saveXML();
?>
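Since the goal is to pipe the result into a new XML file, the transform can also write straight to disk instead of printing. A small variation on the script above, using test3_clean.xml as a placeholder output name:

<?php
// load the stylesheet and the report as above
$xsl = new DOMDocument();
$xsl->load("remove_subtotal.xsl");

$xmlDoc = new DOMDocument();
$xmlDoc->load("test3.xml");

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

// transformToUri() writes the result straight to a file and returns
// the number of bytes written, or FALSE on failure
$bytes = $proc->transformToUri($xmlDoc, "test3_clean.xml"); // placeholder output name
if ($bytes === false) {
    echo "Transformation failed\n";
}
?>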
effigy Posted October 10, 2007

Great. How long does that take to run?
NJordan72 (Author) Posted October 10, 2007

Quoting effigy: "Great. How long does that take to run?"

It brought parsing the 52 MB file down to 5 minutes from over 3 hours. A definite improvement.
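For exports that eventually outgrow DOM and XSLT entirely, a streaming pass with XMLReader/XMLWriter is another option: it never holds more than one ROW in memory at a time. This is only a rough sketch, assuming subtotal rows appear exclusively as ROW elements with type="subtotal" and using test3_clean.xml as a placeholder output name:

<?php
// Stream-filter test3.xml: copy every node except ROW elements whose
// type attribute is "subtotal".
$reader = new XMLReader();
$reader->open("test3.xml");

$writer = new XMLWriter();
$writer->openUri("test3_clean.xml");        // placeholder output filename
// readOuterXml() hands back UTF-8, so the output is declared UTF-8
$writer->startDocument("1.0", "UTF-8");

$ok = $reader->read();
while ($ok) {
    // ROW elements are handled wholesale: copy or skip the entire subtree
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "ROW") {
        if ($reader->getAttribute("type") != "subtotal") {
            $writer->writeRaw($reader->readOuterXml());
        }
        $ok = $reader->next();              // jump past the ROW's subtree
        continue;
    }

    // everything else (RWResponse, RESPONSE, DATA, METADATA, text) is copied node by node
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            $writer->startElement($reader->name);
            if ($reader->hasAttributes) {
                while ($reader->moveToNextAttribute()) {
                    $writer->writeAttribute($reader->name, $reader->value);
                }
                $reader->moveToElement();
            }
            if ($reader->isEmptyElement) {
                $writer->endElement();
            }
            break;
        case XMLReader::END_ELEMENT:
            $writer->endElement();
            break;
        case XMLReader::TEXT:
        case XMLReader::CDATA:
        case XMLReader::WHITESPACE:
        case XMLReader::SIGNIFICANT_WHITESPACE:
            $writer->text($reader->value);
            break;
    }
    $ok = $reader->read();
}

$writer->endDocument();
$reader->close();
$writer->flush();
?>

The trade-off is more bookkeeping: the wrapper elements have to be re-emitted node by node, whereas the identity transform gets that for free.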