Jump to content

problem loading 40meg xml file with simplexml


brm5017

Recommended Posts

I'm having trouble loading my xml file. It's on the bigger side, (42 megs) but I still dont think it should be a problem for the server. Any suggestions?? Here's my code as of now...

<?php
//Removed SQL Connection variables
// This is an example opendb.php
$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die                      ('Error connecting to mysql');
mysql_select_db($dbname);

print "Connected to $dbname <br>";

$debug = False;
print "Loading XML File";

if (file_exists('Media-Trend.xml')) {
    $xml = simplexml_load_file('Media-Trend.xml');

    print_r($xml);
} else {
    exit('Failed to open Media-Trend.xml.');
}

print "XML Loaded";
$table = 'listing_pictures'; //Change this to the name of the table

//The rest shouldn't have to be changed

//Get the delimiter
$delim = chr((int)$xml->DELIMITER['value']);
if($debug)
  print "Delimiter: {$delim}[{$xml->DELIMITER['value']}]<br>";

//Get the list of Columns
$columns = explode($delim,trim($xml->COLUMNS)); //Splits column row up
//The follow dynamically builds the INSERT statement from the columns
$insert = "INSERT INTO $table (".implode(',',$columns).") VALUES (".implode(',',array_fill(0,count($columns),"'%s'")).")";

if($debug)
  print '<table border="1"><tr><th>'.implode('</th><th>',$columns).'</th></tr>';

$count = 0;
foreach($xml->DATA as $row){
  $parts = explode($delim,trim($row),count($columns)); //Split the row up
  if($debug){
    print '<tr><td>'.implode('</td><td>',$parts).'</td></tr>';
  }else{
    array_walk($parts,create_function('&$v','$v = mysql_real_escape_string($v);')); //Run all parts through mysql_real_escape_string()
    while(count($parts) < count($columns)) $parts[] = ''; //Make sure we have enough values
    array_unshift($parts,$insert); //Add the INSERT command to the beginning
    $sql = call_user_func_array('sprintf',$parts); //Put it all together
    mysql_query($sql) or die(mysql_error()); //Run the query
  }
  $count++;
}
if($debug){
  print "</table>";
  print "Found $count rows to go into $table";
}else
  print "Inserted $count rows into $table";
?>

 

And a sample of the xml:

<RETS ReplyCode="0" ReplyText="V2.3.3 590: Success">
<COUNT Records="386097" />
<DELIMITER value = "09"/>
<COLUMNS>	PropItemNumber	PropMediaURL	County	ListingID	</COLUMNS>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70056313232&LOOT=50038672093	CAMDEN	5061237	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054985758&LOOT=50038672093	CAMDEN	2075958	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054969058&LOOT=50038672093	CAMDEN	2075972	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054969061&LOOT=50038672093	CAMDEN	2076038	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054895968&LOOT=50038672093	CAMDEN	2075974	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054900820&LOOT=50038672093	CAMDEN	2076016	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054975807&LOOT=50038672093	CAMDEN	2076000	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054900822&LOOT=50038672093	CAMDEN	2076051	</DATA>

Link to comment
Share on other sites

what happens when $debug is set to true? does it display the table on the screen properly?

 

 

Nothing displays - only the print statements that I've got before the xml load (Connected to Mulhol_Listings <Br> Loading XML File)

 

I've got another print statement that runs after the xml loads that should print "XML Loaded" - that doesn't display either.

 

Think it's a problem with the server?

 

 

 

Link to comment
Share on other sites

how long does it take for the initial print statement to show and the page to stop loading?

 

did you try this:

Add the following two lines of code immediately after your first opening <?php tag to get php to help you by displaying errors -

ini_set ("display_errors", "1");
error_reporting(E_ALL);

Link to comment
Share on other sites

Yep, I just tried this with no luck. First 2 print statements are there instantly, the page takes about 15 seconds to stop loading. 

 

Code as it is now:

 

<?php
ini_set ("display_errors", "1");
error_reporting(E_ALL);

$dbhost = 'localhost';
$dbuser = 'mulhol';
$dbpass = 'multeam';
$dbname = 'mulhol_listings';

// This is an example opendb.php
$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die                      ('Error connecting to mysql');
mysql_select_db($dbname);

print "Connected to $dbname <br>";

$debug = True;
print "Loading XML File";

if (file_exists('Media-Trend.xml')) {
    $xml = simplexml_load_file('Media-Trend.xml');

} else {
    exit('Failed to open Media-Trend.xml.');
}

print "XML Loaded";
$table = 'listing_pictures'; //Change this to the name of the table

//The rest shouldn't have to be changed

//Get the delimiter
$delim = chr((int)$xml->DELIMITER['value']);
if($debug)
  print "Delimiter: {$delim}[{$xml->DELIMITER['value']}]<br>";

//Get the list of Columns
$columns = explode($delim,trim($xml->COLUMNS)); //Splits column row up
//The follow dynamically builds the INSERT statement from the columns
$insert = "INSERT INTO $table (".implode(',',$columns).") VALUES (".implode(',',array_fill(0,count($columns),"'%s'")).")";

if($debug)
  print '<table border="1"><tr><th>'.implode('</th><th>',$columns).'</th></tr>';

$count = 0;
foreach($xml->DATA as $row){
  $parts = explode($delim,trim($row),count($columns)); //Split the row up
  if($debug){
    print '<tr><td>'.implode('</td><td>',$parts).'</td></tr>';
  }else{
    array_walk($parts,create_function('&$v','$v = mysql_real_escape_string($v);')); //Run all parts through mysql_real_escape_string()
    while(count($parts) < count($columns)) $parts[] = ''; //Make sure we have enough values
    array_unshift($parts,$insert); //Add the INSERT command to the beginning
    $sql = call_user_func_array('sprintf',$parts); //Put it all together
    mysql_query($sql) or die(mysql_error()); //Run the query
  }
  $count++;
}
if($debug){
  print "</table>";
  print "Found $count rows to go into $table";
}else
  print "Inserted $count rows into $table";
?>

Link to comment
Share on other sites

shouldn't be

 

it fails before it even gets to "XML Loaded" huh? i've never dealt with core dump files before but what is the contents of one?

 

also, try starting with a smaller version of the XML file and slowly adding to it. can you narrow it down to a specific line it's failing on?

Link to comment
Share on other sites

i ran this code:

Code as it is now:

 

<?php
ini_set ("display_errors", "1");
error_reporting(E_ALL);

$dbhost = 'localhost';
$dbuser = 'mulhol';
$dbpass = 'multeam';
$dbname = 'mulhol_listings';

// This is an example opendb.php
$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die                      ('Error connecting to mysql');
mysql_select_db($dbname);

print "Connected to $dbname <br>";

$debug = True;
print "Loading XML File";

if (file_exists('Media-Trend.xml')) {
    $xml = simplexml_load_file('Media-Trend.xml');

} else {
    exit('Failed to open Media-Trend.xml.');
}

print "XML Loaded";
$table = 'listing_pictures'; //Change this to the name of the table

//The rest shouldn't have to be changed

//Get the delimiter
$delim = chr((int)$xml->DELIMITER['value']);
if($debug)
  print "Delimiter: {$delim}[{$xml->DELIMITER['value']}]<br>";

//Get the list of Columns
$columns = explode($delim,trim($xml->COLUMNS)); //Splits column row up
//The follow dynamically builds the INSERT statement from the columns
$insert = "INSERT INTO $table (".implode(',',$columns).") VALUES (".implode(',',array_fill(0,count($columns),"'%s'")).")";

if($debug)
  print '<table border="1"><tr><th>'.implode('</th><th>',$columns).'</th></tr>';

$count = 0;
foreach($xml->DATA as $row){
  $parts = explode($delim,trim($row),count($columns)); //Split the row up
  if($debug){
    print '<tr><td>'.implode('</td><td>',$parts).'</td></tr>';
  }else{
    array_walk($parts,create_function('&$v','$v = mysql_real_escape_string($v);')); //Run all parts through mysql_real_escape_string()
    while(count($parts) < count($columns)) $parts[] = ''; //Make sure we have enough values
    array_unshift($parts,$insert); //Add the INSERT command to the beginning
    $sql = call_user_func_array('sprintf',$parts); //Put it all together
    mysql_query($sql) or die(mysql_error()); //Run the query
  }
  $count++;
}
if($debug){
  print "</table>";
  print "Found $count rows to go into $table";
}else
  print "Inserted $count rows into $table";
?>

(except for the MySQL parts)

 

with this XML file:

And a sample of the xml:

<RETS ReplyCode="0" ReplyText="V2.3.3 590: Success">
<COUNT Records="386097" />
<DELIMITER value = "09"/>
<COLUMNS>	PropItemNumber	PropMediaURL	County	ListingID	</COLUMNS>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70056313232&LOOT=50038672093	CAMDEN	5061237	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054985758&LOOT=50038672093	CAMDEN	2075958	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054969058&LOOT=50038672093	CAMDEN	2075972	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054969061&LOOT=50038672093	CAMDEN	2076038	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054895968&LOOT=50038672093	CAMDEN	2075974	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054900820&LOOT=50038672093	CAMDEN	2076016	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054975807&LOOT=50038672093	CAMDEN	2076000	</DATA>
<DATA>	1	http://trend.trendrets.com:6103/platinum/getmedia?ID=70054900822&LOOT=50038672093	CAMDEN	2076051	</DATA>
</RETS>

 

without errors

Link to comment
Share on other sites

Let me ask you this, is there a way to rename the xml file as a text file - since i'm treating it pretty much as a CSV anyway?

 

By extension he means the php extension which is like an addon to say to the php language.

 

You can input a file named anything as long as it is xml, so you can try changing the filename, but yea chances are that won't help any.

Link to comment
Share on other sites

Sample of the XML:

 

<RETS ReplyCode="0" ReplyText="V2.3.3 590: Success">
<COUNT Records="386097" />
<DELIMITER value = "09"/>
<COLUMNS>    PropItemNumber    PropMediaURL    County    ListingID    </COLUMNS>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70056313232&LOOT=50038672093    CAMDEN    5061237    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054985758&LOOT=50038672093    CAMDEN    2075958    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054969058&LOOT=50038672093    CAMDEN    2075972    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054969061&LOOT=50038672093    CAMDEN    2076038    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054895968&LOOT=50038672093    CAMDEN    2075974    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054900820&LOOT=50038672093    CAMDEN    2076016    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054975807&LOOT=50038672093    CAMDEN    2076000    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054900822&LOOT=50038672093    CAMDEN    2076051    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054900043&LOOT=50038672093    CAMDEN    2076054    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054895979&LOOT=50038672093    CAMDEN    2076076    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054975816&LOOT=50038672093    CAMDEN    2076067    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054895985&LOOT=50038672093    CAMDEN    2076091    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054961986&LOOT=50038672093    CAMDEN    2076160    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054969070&LOOT=50038672093    CAMDEN    2076131    </DATA>
<DATA>    1    http://trend.trendrets.com:6103/platinum/getmedia?ID=70054900046&LOOT=50038672093    CAMDEN    2076056    </DATA>
</RETS>

Link to comment
Share on other sites

is 400 thousand rows a lot for an XML file?

 

Sort of, if your memory_limit is not set to handle a 40MB file. But the processing time on something like that is pretty massive. Especially when you are trying to insert the data into a database. This could take quite a while, I know with a 2mb file it takes a good deal of time. I cannot imagine it with a 40mb file.

 

You may be better off using the CLI interface of PHP for running this script, so the browser etc does not time out. But the memory_limit may still hold you up so you may have to alter the php.ini and up that limit.

Link to comment
Share on other sites

or a corrupt copy of the simplexml extension

 

Because there are no php errors reported and a core dump file is getting generated, it is likely that the simplexml extension is not installed or functioning correctly.

 

Have you checked your server error log files for more specific information?

 

After you get to a few 10's of thousands of lines or a few 10's or MB of data and you expect the size of the data to continue to grow, you should migrate that data to a database (which is probably what you are doing now) and let the database engine do the work of finding, sorting, and manipulating your data.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.