Psycho Posted October 30, 2006 Share Posted October 30, 2006 I have an XML file which contains the data for many different movies. I am int he process of writing a script to parse the XML file to be input into a database. however, i am trying to decide which would be the most efficient point at which to input the data. Here are the two options I am looking at now.[b]1) COMPLETELY PROCESS THE XML FIRST[/b]I will process the complete XML file and put the data into separate arrays based upon the tables that the data needs to go into.here is an example of what I mean: [code]$movies[movieid][name]$movies[movieid][plot]$movies[movieid][runningtime]etc.$actors[actorid][fname]$actors[actorid][lname]$actor_movie[actorid][0] = movieid1$actor_movie[actorid][1] = movieid2etc.[/code]The benefit with this method is that I will not have to process a lot of duplicates with things such as actors, studios, etc. I will have a single entry in the $actors array for each actor and I would only have to check once to see if it already exists in the database. The downside would be that I may get some very large arrays as the XML file will contain hundreds, possibly thousands, of movie records.[b]1) INPUT RECORDS INTO DATABASE AS THEY ARE PROCESSED[/b]In this scenario I would insert records for each applicable record as they are processed. So, I process the first movie I would insert the records for that movie. Then as I process the actors for that movie I would insert the actors for that movieBut I would first have to do a query to see if the actor already exists. So, as I go through each movie if the same actor exists in 20 movies I would be doing a check 20 times to see if that actor already exists. All of these duplicative queries could be less efficient than the method above.I hope this all makes sense. Anyone have a suggestion? Quote Link to comment https://forums.phpfreaks.com/topic/25631-when-to-input-data-into-database/ Share on other sites More sharing options...
roopurt18 Posted October 30, 2006 Share Posted October 30, 2006 Why not do something like:[code]<?php$sql = "INSERT INTO TableName (Col_1, Col_2, ..., Col_n) VALUES ";$extras = Array();foreach($DataSet as $Data){ $extras[] = "({$Data['Col_1']}, {$Data['Col_2']}, ..., {$Data['Col_n'})";}$sql .= implode(", ", $extras);?>[/code]You said this is to process a very large file, it might be best to use ini_set to turn off the script time limit and the memory limit and run it from a command prompt. If this is an operation that needs to run frequently consider setting it up as a cron. Quote Link to comment https://forums.phpfreaks.com/topic/25631-when-to-input-data-into-database/#findComment-116996 Share on other sites More sharing options...
Psycho Posted October 31, 2006 Author Share Posted October 31, 2006 Thanks for the tip concerning the time limit - that will come in handy. However, the rest doesn't really answer my question.Here is a very brief example on part of what one record might contain[code]<movielist> <movie> <title>Apocalypse now</title> <runningtime>120</runningtime> <mpaarating>120</mpaarating> <genres> <genre>Action</genre> <genre>War</genre> <genre>Drama</genre> </genres> <cast> <actor> <fname>Marlon</fname> <lname>Brando</lname> </actor> <actor> <fname>Martin</fname> <lname>Sheen</lname> </actor> </cast> </movie></movielist>[/code]As you can see, one movie recor can contain several records for genre or actors. These need to go into separate tables. However, the genres listed will be repeated many times for all the other movies.So, in the example of genres I can attempt to add each genre as I process each movie record. Doing that would mean that I would be processing a lot of duplicate entries as "Drama" would be a genre used for a lot of movies and I would have to do query to check if the genre exists before adding it to the genre table.The other alternative is to create a "genre" array as I am processing the data and I could simply create a line that goes something like this [code]$genres[$genrename][] = $movieid;[/code]Then after I have processed all the movies my genres array would be something like this[code]$genres[Drama][0] = 234$genres[Drama][1] = 339$genres[Drama][2] = 98$genres[Comedy][0] = 86$genres[Comedy][1] = 291$genres[Comedy][2] = 44[/code]Then I just need to check each genre name once and then enter the values accordingly. Quote Link to comment https://forums.phpfreaks.com/topic/25631-when-to-input-data-into-database/#findComment-117039 Share on other sites More sharing options...
roopurt18 Posted November 1, 2006 Share Posted November 1, 2006 Store each piece of repeatable data in an array as you encounter it and then process the arrays. Quote Link to comment https://forums.phpfreaks.com/topic/25631-when-to-input-data-into-database/#findComment-117987 Share on other sites More sharing options...
Psycho Posted November 1, 2006 Author Share Posted November 1, 2006 That's the approach i have already taken. Thanks for the validation. Quote Link to comment https://forums.phpfreaks.com/topic/25631-when-to-input-data-into-database/#findComment-118091 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.