
[SOLVED] XMLReader consuming a ton of memory


Snowmiser


Hi, first post.  ;D

I don't belong to any PHP communities; this would be my first. Normally I just hang out in the PHP channel on freenode, but there doesn't seem to be a lot of knowledge about XML going around, and Google searches have been unsuccessful.

 

Well, my question is: why would this consume 93 MB of memory parsing just a 4 MB XML file, and is there any way to optimize it?

 

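    function xml_into_assoc($xml)
    {
        $result = array();
        $i = 0;

        while ($xml->read())
        {
            switch ($xml->nodeType)
            {
                case XMLReader::ELEMENT:
                {
                    $result[$i]['name'] = $xml->name;

                    // Record emptiness first: once the cursor moves onto an
                    // attribute node, isEmptyElement no longer describes the element.
                    $isEmpty = $xml->isEmptyElement;

                    // Read attributes before recursing; after the recursive call
                    // the reader sits on the closing tag, not on this element.
                    if ($xml->hasAttributes)
                    {
                        while ($xml->moveToNextAttribute())
                        {
                            $result[$i]['attributes'][$xml->name] = $xml->value;
                        }

                        // Step back from the last attribute to the owning element.
                        $xml->moveToElement();
                    }

                    $result[$i]['value'] = $isEmpty ? '' : xml_into_assoc($xml);
                    $i++;

                    break;
                }

                case XMLReader::END_ELEMENT:
                {
                    return $result;
                }

                // TEXT and CDATA both carry character data, so they share one branch.
                case XMLReader::TEXT:
                case XMLReader::CDATA:
                {
                    $result = $xml->value;
                    break;
                }
            }
        }

        return $result;
    }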

 

PHP structures are not hugely memory efficient.  You can use memory_get_usage() to measure the usage at various points and see how it grows during the parsing.
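A minimal sketch of that kind of measurement, in case it helps. The helper name measure_memory_growth() and the sample data are made up for illustration; only memory_get_usage() comes from the suggestion above. The exact numbers will vary by PHP version and platform.

```php
<?php
// Hypothetical helper: report how many bytes a callable adds while it runs.
function measure_memory_growth(callable $build)
{
    $before = memory_get_usage();
    $data = $build();              // keep the result alive so it is counted
    $after = memory_get_usage();

    return array('bytes' => $after - $before, 'data' => $data);
}

// Build 10,000 small associative arrays, roughly the shape the parser produces.
$report = measure_memory_growth(function () {
    $rows = array();
    for ($i = 0; $i < 10000; $i++) {
        $rows[] = array('name' => 'item' . $i, 'value' => (string) $i);
    }
    return $rows;
});

printf("%d rows use about %d KB\n", count($report['data']), (int) ($report['bytes'] / 1024));
```

Calling memory_get_usage() before and after each stage of the real parser in the same way should show where the growth happens.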

 

When I'm dealing with enormous data sets, I often pack things into strings.  I regularly get savings of 80-90% of memory used when packing an array of associative arrays into an array of strings (each string can be reconstituted into an associative array when required).
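Roughly what the packing looks like, as a sketch only. The helper names pack_row() and unpack_row() are hypothetical, and json_encode() is just one possible format (serialize() would work the same way). How much memory this saves depends on the data and the PHP version, so measure it on your own input.

```php
<?php
// Hypothetical helpers: pack one record into a string, restore it on demand.
function pack_row(array $row)
{
    return json_encode($row);
}

function unpack_row($packed)
{
    return json_decode($packed, true);   // true: decode to an associative array
}

// Store every record as a string instead of a nested array.
$packed = array();
for ($i = 0; $i < 10000; $i++) {
    $packed[] = pack_row(array('name' => 'item' . $i, 'value' => (string) $i));
}

// Only the row being worked on exists as an array at any moment.
$row = unpack_row($packed[42]);
echo $row['name'], "\n";   // prints "item42"
```

The trade-off is CPU time: every access pays for a decode, so this fits data that is stored in bulk but read a row at a time.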


What I gather so far is that you're correct. It appears that consuming large amounts of memory isn't uncharacteristic of any of PHP's XML APIs. I will just parse it manually. I can't justify increasing the memory_limit for a 4 MB file.
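For anyone who would rather stay with XMLReader than parse by hand, one option is to avoid building the whole tree and handle each record as it is read. This is only a sketch: the stream_items() helper, the item tag, and the sample document are assumptions for illustration, not taken from the original file.

```php
<?php
// Hypothetical helper: yield each matching element as a small array, one at a time.
function stream_items(XMLReader $xml, $tag)
{
    while ($xml->read()) {
        if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === $tag) {
            $row = array(
                'name'       => $xml->name,
                'value'      => $xml->readString(),   // text content of this element
                'attributes' => array(),
            );

            while ($xml->moveToNextAttribute()) {
                $row['attributes'][$xml->name] = $xml->value;
            }
            $xml->moveToElement();   // return from the attributes to the element

            yield $row;
        }
    }
}

$xml = new XMLReader();
$xml->XML('<items><item id="1">a</item><item id="2">b</item></items>');

$ids = array();
foreach (stream_items($xml, 'item') as $row) {
    $ids[] = $row['attributes']['id'];   // handle the record, then let it go
}
$xml->close();
```

Because only one record is held at a time, memory use should stay roughly flat as the file grows, as long as the loop body does not accumulate the rows itself.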

Actually, PHP by design handles node types better (4x) when you access those types via array elements. As far as I understand it, PHP converts all XML documents to its own optimized array structure. So when you work with node types, even just to ask a simple question, it's better to read each node's type from the array TYPE element than to use the XML::TYPE constant, because PHP only performs the lookup when that constant is encountered, while the type is already in the array structure as $xml[$i]['TYPE']. All I am saying is that PHP hacks XML and doesn't follow the standards, so it's better to learn how they hacked the XML standard and design your application to take advantage of their hack instead of doing it the right way, because the right way will only use ridiculously large amounts of memory and leave you shaking your head wondering what you're doing wrong!

Archived

This topic is now archived and is closed to further replies.
