maghnom Posted June 7, 2014 Share Posted June 7, 2014 I'm trying to utilize a PHP script to parse a large XML file (around 450 MB) to MYSQL database into certain structure and definitions of included XML elements. The problem is that the original script uses file_get_contents and SimpleXMLElement to get it done, but the corn job executed by the server halts due to the volume of the XML file. I'm no PHP expert, so I bought this XMLSplit software and divided the XML into 17 separated XML files each at size of 30 MB, parsed them one by one using the same script. However, the output database was missing a lot of input, and I have serious doubts whether this would be the same output of the original file if left not divided automatically and parsed one by one. So, I've decided to use XMLReader with this exact PHP script to parse this big XML file, but so far I couldn't manage to simply replace the parsing code and keep other functionality intact. I'm including the script below, I'd really appreciate if someone helps me to do so. <?php set_time_limit(0); ini_set('memory_limit', '1024M'); include_once('../db.php'); include_once(DOC_ROOT.'/include/func.php'); mysql_query("TRUNCATE screenshots_list"); mysql_query("TRUNCATE pages"); mysql_query("TRUNCATE page_screenshots"); // This is the part I need help with to change into XMLReader instead of utilized function, to enable parsing of the large XML file correctly (while keeping rest of the script code as is if possible): $xmlstr = file_get_contents('t_info.xml'); $xml = new SimpleXMLElement($xmlstr); foreach ($xml->template as $item) { //print_r($item); $sql = sprintf("REPLACE INTO templates SET id = %d, state = %d, price = %d, exc_price = %d, inserted_date = '%s', update_date = '%s', downloads = %d, type_id = %d, type_name = '%s', is_flash = %d, is_adult = %d, width = '%s', author_id = %d, author_nick = '%s', package_id = %d, is_full_site = %d, is_real_size = %d, keywords = '%s', sources = '%s', description = '%s', software_required = '%s'", $item->id, $item->state, $item->price, $item->exc_price, $item->inserted_date, $item->update_date, $item->downloads, $item->template_type->type_id, $item->template_type->type_name, $item->is_flash, $item->is_adult, $item->width, $item->author->author_id, $item->author->author_nick, $item->package->package_id, $item->is_full_site, $item->is_real_size, $item->keywords, $item->sources, $item->description, $item->software_required); //echo '<br>'.$sql; mysql_query($sql); //print_r($item->screenshots_list->screenshot); foreach ($item->screenshots_list->screenshot as $scr) { $main = (!empty($scr->main_preview)) ? 1 : 0; $small = (!empty($scr->small_preview)) ? 1 : 0; insert_data($item->id, 'screenshots_list', 0, $scr->uri, $scr->filemtime, $main, $small); } foreach ($item->styles->style as $st) { insert_data($item->id, 'styles', $st->style_id, $st->style_name); } foreach ($item->categories->category as $cat) { insert_data($item->id, 'categories', $cat->category_id, $cat->category_name); } foreach ($item->sources_available_list->source as $so) { insert_data($item->id, 'sources_available_list', $so->source_id, ''); } foreach ($item->software_required_list->software as $soft) { insert_data($item->id, 'software_required_list', $soft->software_id, ''); } //print_r($item->pages->page); if (!empty($item->pages->page)) { foreach ($item->pages->page as $p) { mysql_query(sprintf("REPLACE INTO pages SET tpl_id = %d, name = '%s', id = NULL ", $item->id, $p->name)); $page_id = mysql_insert_id(); if (!empty($p->screenshots->scr)) { foreach ($p->screenshots->scr as $psc) { $href = (!empty($psc->href)) ? (string)$psc->href : ''; mysql_query(sprintf("REPLACE INTO page_screenshots SET page_id = %d, description = '%s', uri = '%s', scr_type_id = %d, width = %d, height = %d, href = '%s'", $page_id, $psc->description, $psc->uri, $psc->scr_type_id, $psc->width, $psc->height, $href)); } } } }}?> I'd appreciate your help with that... Quote Link to comment Share on other sites More sharing options...
kicken Posted June 7, 2014 Share Posted June 7, 2014 I've not used XMLReader, however one of the user comments points to a SimpleXMLReader class that provides a simple call-back style interface for parsing a large XML file. You could give that a try and see if it works out for you. As for your current file-splitting solution, so long as the file splits correctly it shouldn't cause any issues as you don't seem to have an inter-dependencies during the parsing. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.