
problem loading 40meg xml file with simplexml


brm5017

Recommended Posts

I got the PHP script to run with a 12 MB XML file (I deleted 75% of the file), but that's a problem: I get a feed every day with housing information, so now I'll have to manually edit this XML every day and chop it into smaller parts. Is there any way for SimpleXML to open the file line by line as it needs to? Or maybe even this idea:

 


do {
    open Media.xml as a text file;
    $variable1 = (first 100k lines of file);
    $variable2 = (next 100k lines of file);
} until end of file

then feed the variables into the SimpleXML parser in a loop. That way the server doesn't get overloaded.

Link to comment
Share on other sites

You can try this, add it to the top of the script:

ini_set('memory_limit', '50M');

 

See if you are allowed to do that, if not you can try it in a .htaccess file:

php_value memory_limit 16M

 

Or if possible you can modify the php.ini and change:

memory_limit = 16M  ; Maximum amount of memory a script may consume (16MB)
to
memory_limit = 50M  ; Maximum amount of memory a script may consume (50MB)

 

And see what that does for you. I would change it back however, if you do not do this on a regular basis.
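If you're not sure which of those three settings actually took effect (hosts often disable `ini_set` for memory_limit), you can read the live value back with `ini_get`. A minimal check:

```php
<?php
// Try to raise the limit, then read back what is actually in effect.
// ini_set() returns the old value on success, or false if the
// directive could not be changed at runtime.
$old = ini_set('memory_limit', '50M');
if ($old === false) {
    echo "ini_set was refused; memory_limit is still " . ini_get('memory_limit') . "\n";
} else {
    echo "memory_limit is now " . ini_get('memory_limit') . "\n";
}
```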

Link to comment
Share on other sites

php memory limit already set to 64M --- WEIRD!

 

This does make me realize that the core.* dumps I was seeing earlier were all 64M.

 

Does SimpleXML inflate the file when reading from it?

 

Maybe, not sure, try increasing that to 112M and see what happens.
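One way to answer the inflation question empirically is to measure memory before and after the parse. `simplexml_load_file` builds the entire document tree in memory, which typically costs several times the on-disk file size. A rough sketch (using the `Media-Trend.xml` file name from later in this thread; swap in your own path):

```php
<?php
// Rough measurement of how much memory SimpleXML uses for one file.
$before = memory_get_usage();
$xml = simplexml_load_file('Media-Trend.xml');
$after = memory_get_usage();

printf("File size: %d bytes\n", filesize('Media-Trend.xml'));
printf("Memory used by parse: %d bytes\n", $after - $before);
```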

Link to comment
Share on other sites

 

 

Where do I freaking start?

 

By filling in the blanks: what the data is, what control you have over which data is in the download or what different downloads there are, how often the data changes (is each download a completely separate set of data, only new data, all data to date...), and what you are trying to accomplish.

 

SimpleXML assumes that you have a file or a string with the XML in it (I'm not certain, but I don't think you can use it on an open stream). Something else might be a better match, such as the XMLReader class.
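For reference, here's a minimal XMLReader sketch. It pulls one node at a time off the stream instead of building the whole tree, so memory use stays flat regardless of file size. The `DATA` element name is an assumption based on the feed format shown later in this thread; change it to match the real file:

```php
<?php
// Stream through a large XML file one node at a time with XMLReader.
// 'DATA' is an assumed row element name -- adjust for the actual feed.
$reader = new XMLReader();
$reader->open('Media-Trend.xml');

$count = 0;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'DATA') {
        // $reader->readString() would give you the node's text content;
        // here we just count matching rows.
        $count++;
    }
}
$reader->close();
echo "Found $count DATA rows\n";
```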

Link to comment
Share on other sites

Ok here's a run down of what I'm trying to accomplish:

 

There's a Real Estate Listing server that stores listing information (known as the RETS server). RETS has every house that's currently for sale, along with all the details for each house. There are a few different tables, but mainly I'll be using three: PropertyDescription, PropertyMedia, and ListingAgent.

 

The only control I have when selecting what data to download is the column names. For ListingAgent and PropertyDescription, I simply select everything from their server and export to separate XML files. For the PropertyMedia table, I select the three columns I need for the database (imgURL, ListingID, PictureOrderNumber). The PropertyMedia table is tricky, because each house could have anywhere from 1 to 20 pictures. Multiply that by 30,000 houses and the amount of data grows pretty quickly.

 

As far as using SimpleXML goes, I've already had success with both the PropertyDescription and ListingAgent tables, and I've imported those into my database. Perhaps XMLReader will work with larger files?

 

Any tips with starting this?

 

 

Link to comment
Share on other sites

OK... let's try parsing line by line instead of loading the entire thing as a SimpleXML object:

 

<?php
ini_set ("display_errors", "1");
error_reporting(E_ALL);
set_time_limit(0);

$dbhost = 'localhost';
$dbuser = 'mulhol';
$dbpass = 'multeam';
$dbname = 'mulhol_listings';

$debug = true;
$table = 'listing_pictures'; //Change this to the name of the table

//Connect to database
$conn = mysql_connect($dbhost, $dbuser, $dbpass)
  or die ('Error connecting to mysql');
mysql_select_db($dbname);
print "Connected to $dbname <br>";

//Open XML File
$handle = @fopen('Media-Trend.xml', 'r');
if(!$handle)
  die('Failed to open Media-Trend.xml');
print "XML File Opened<br />";

//Loop Over Lines
$columns = null;
$count = 0;
while (!feof($handle)) {
  $line = fgets($handle, 4096);
  //Check for COLUMNS
  if(!is_array($columns)){
    if(preg_match('/<COLUMNS>(.+)<\/COLUMNS>/',$line,$matches)){
      $columns = preg_split('/\s+/',trim($matches[1]));
      //The following dynamically builds the INSERT statement from the columns
      $insert = "INSERT INTO $table (".implode(',',$columns).") VALUES (".implode(',',array_fill(0,count($columns),"'%s'")).")";
      if($debug)
        print '<table border="1"><tr><th>'.implode('</th><th>',$columns).'</th></tr>';
    }
    continue;
  }
  //Check for DATA
  if(preg_match('/<DATA>(.+)<\/DATA>/',$line,$matches)){
    $parts = preg_split('/\s+/',trim($matches[1]));
    if($debug){
      print '<tr><td>'.implode('</td><td>',$parts).'</td></tr>';
    }else{
      array_walk($parts,create_function('&$v','$v = mysql_real_escape_string($v);')); //Run all parts through mysql_real_escape_string()
      while(count($parts) < count($columns)) $parts[] = ''; //Make sure we have enough values
      array_unshift($parts,$insert); //Add the INSERT command to the beginning
      $sql = call_user_func_array('sprintf',$parts); //Put it all together
      mysql_query($sql) or die(mysql_error()); //Run the query
    }
    $count++;
  }
}
fclose($handle);
if($debug){
  print "</table>";
  print "Found $count rows to go into $table";
}else
  print "Inserted $count rows into $table";
?>

 

edit: added set_time_limit(0);

Link to comment
Share on other sites

I found this on PHP.net in the manual:

 

Can anyone help me out with implementing this?

 

 


For large files, consider using stream_get_line rather than fgets - it can make a significant difference.

$ time yes "This is a test line" | head -1000000 | php -r '$fp=fopen("php://stdin","r"); while($line=stream_get_line($fp,65535,"\n")) { 1; } fclose($fp);'

real    0m1.482s
user    0m1.616s
sys    0m0.152s

$ time yes "This is a test line" | head -1000000 | php -r '$fp=fopen("php://stdin","r"); while($line=fgets($fp,65535)) { 1; } fclose($fp);'

real    0m7.281s
user    0m7.392s
sys    0m0.136s
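To apply that to the script above, the change is a one-line swap in the read loop. `stream_get_line` takes the maximum length and the delimiter explicitly, and unlike `fgets` it strips the trailing newline for you. A sketch (the 4096 limit mirrors the `fgets` call it replaces):

```php
<?php
// Same line-by-line loop shape as the earlier script, but using
// stream_get_line() instead of fgets(). Argument order is:
// stream, max length, delimiter. The returned line does NOT
// include the trailing newline.
$handle = fopen('Media-Trend.xml', 'r');
while (($line = stream_get_line($handle, 4096, "\n")) !== false) {
    // ...same <COLUMNS>/<DATA> matching as in the script above...
}
fclose($handle);
```

Comparing against `false` explicitly matters here: the bare `while ($line = ...)` form used in the benchmark one-liner would stop early on an empty line or a line containing just "0".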

Link to comment
Share on other sites
