[SOLVED] XMLReader consuming a ton of memory


Snowmiser

Hi, first post.  ;D

I don't belong to any PHP communities. This would be my first. Normally I just hang out in the PHP channel on freenode, but there doesn't seem to be a lot of knowledge about XML going around, and Google searches have been unsuccessful.

 

Well, my question is: why would this consume 93 MB of memory parsing just a 4 MB XML file, and is there any way to optimize it?

 

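    function xml_into_assoc($xml)
    {
        $result = array();
        $i = 0;

        while ($xml->read())
        {
            switch ($xml->nodeType)
            {
                case XMLReader::ELEMENT:
                {
                    $result[$i]['name'] = $xml->name;

                    // Capture this first: the cursor leaves the element
                    // once moveToNextAttribute() is called.
                    $isEmpty = $xml->isEmptyElement;

                    // Read attributes before recursing; after the recursive
                    // call the reader sits on the END_ELEMENT node, which
                    // reports no attributes.
                    if ($xml->hasAttributes)
                    {
                        while ($xml->moveToNextAttribute())
                        {
                            $result[$i]['attributes'][$xml->name] = $xml->value;
                        }

                        $xml->moveToElement();
                    }

                    $result[$i]['value'] = $isEmpty ? '' : xml_into_assoc($xml);

                    $i++;

                    break;
                }

                case XMLReader::END_ELEMENT:
                {
                    return $result;
                }

                // TEXT deliberately shares the CDATA branch.
                case XMLReader::TEXT:
                case XMLReader::CDATA:
                {
                    $result = $xml->value;
                    break;
                }
            }
        }

        return $result;
    }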

 


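PHP arrays are not very memory efficient: every nested array and every string carries its own per-value overhead, so a tree of small arrays can take many times the size of the source file. You can use memory_get_usage() to measure the usage at various points and see how it grows during the parsing.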
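
A minimal sketch of that kind of measurement, not a definitive profiler. The helper name measure_memory() and the made-up workload are assumptions for illustration; only memory_get_usage() and memory_get_peak_usage() are built-in PHP functions.

```php
<?php
// Hypothetical helper: reports how many bytes the result of $build retains.
function measure_memory(callable $build)
{
    $before = memory_get_usage();
    $data = $build();
    $after = memory_get_usage();

    return array(
        'retained_bytes' => $after - $before,
        'peak_bytes' => memory_get_peak_usage(),
        'data' => $data,
    );
}

// Made-up workload shaped like the parser's output: many small nested arrays.
$report = measure_memory(function () {
    $rows = array();
    for ($i = 0; $i < 10000; $i++) {
        $rows[] = array(
            'name' => 'item' . $i,
            'value' => (string) $i,
            'attributes' => array('id' => (string) $i),
        );
    }
    return $rows;
});

printf(
    "retained: %.1f KB, peak: %.1f KB\n",
    $report['retained_bytes'] / 1024,
    $report['peak_bytes'] / 1024
);
```

Wrapping the call to xml_into_assoc() the same way, or printing memory_get_usage() inside its loop, should show where the growth happens.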

 

When I'm dealing with enormous data sets, I often pack things into strings. I regularly save 80-90% of the memory used by packing an array of associative arrays into an array of strings (each string can be reconstituted into an associative array when required).
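
A rough sketch of the packing idea. The post does not say which encoding is used, so serialize() and unserialize() here are an assumption, and pack_row() and unpack_row() are hypothetical helper names. Actual savings depend on the data and the PHP version, so measure before relying on it.

```php
<?php
// Hypothetical helpers; serialize() is an assumed encoding, the post names none.
function pack_row(array $row)
{
    return serialize($row);
}

function unpack_row($packed)
{
    return unserialize($packed);
}

// Made-up sample rows in the same shape the parser produces.
$rows = array(
    array('name' => 'title', 'value' => 'Example', 'attributes' => array('lang' => 'en')),
    array('name' => 'price', 'value' => '9.99', 'attributes' => array('currency' => 'USD')),
);

// Keep only the packed strings in the long-lived array.
$packed = array_map('pack_row', $rows);
unset($rows);

// Reconstitute a single row only when it is actually needed.
$second = unpack_row($packed[1]);
echo $second['name'], ' = ', $second['value'], "\n";
```

The trade-off is CPU time: every access to a row pays for an unserialize() call, so this suits data that is stored in bulk and read occasionally.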

 

 

 


Actually, PHP by design handles node types (4x) better when you access those types via array elements. It's a well-known fact that PHP converts all XML documents to its optimized array structure. So when accessing node types, even for a simple check, it's better to read each node's type from the array TYPE element than to use the XML::TYPE constant, because PHP only performs the lookup when that constant is encountered, while the type is already in the array structure as $xml[$i]['TYPE']! All I am saying is that PHP hacks XML; it doesn't follow the standards, so it is better to learn how they hacked the XML standard and design your application to take advantage of their hack instead of doing it the right way, because the right way will only use ridiculously large amounts of memory and leave you shaking your head wondering what you're doing wrong!
