When to input data into Database

Psycho · October 30, 2006

I have an XML file which contains the data for many different movies. I am int he process of writing a script to parse the XML file to be input into a database. however, i am trying to decide which would be the most efficient point at which to input the data. Here are the two options I am looking at now.

[b]1) COMPLETELY PROCESS THE XML FIRST[/b]
I will process the complete XML file and put the data into separate arrays based upon the tables that the data needs to go into.

here is an example of what I mean:
[code]$movies[movieid][name]
$movies[movieid][plot]
$movies[movieid][runningtime]
etc.

$actors[actorid][fname]
$actors[actorid][lname]

$actor_movie[actorid][0] = movieid1
$actor_movie[actorid][1] = movieid2
etc.[/code]

The benefit with this method is that I will not have to process a lot of duplicates with things such as actors, studios, etc. I will have a single entry in the $actors array for each actor and I would only have to check once to see if it already exists in the database. The downside would be that I may get some very large arrays as the XML file will contain hundreds, possibly thousands, of movie records.

[b]1) INPUT RECORDS INTO DATABASE AS THEY ARE PROCESSED[/b]
In this scenario I would insert records for each applicable record as they are processed. So, I process the first movie I would insert the records for that movie. Then as I process the actors for that movie I would insert the actors for that movie

But I would first have to do a query to see if the actor already exists. So, as I go through each movie if the same actor exists in 20 movies I would be doing a check 20 times to see if that actor already exists. All of these duplicative queries could be less efficient than the method above.

I hope this all makes sense. Anyone have a suggestion?

roopurt18 · October 30, 2006

Why not do something like:

[code]<?php
$sql = "INSERT INTO TableName (Col_1, Col_2, ..., Col_n) VALUES ";
$extras = Array();
foreach($DataSet as $Data){
$extras[] = "({$Data['Col_1']}, {$Data['Col_2']}, ..., {$Data['Col_n'})";
}
$sql .= implode(", ", $extras);
?>[/code]

You said this is to process a very large file, it might be best to use ini_set to turn off the script time limit and the memory limit and run it from a command prompt. If this is an operation that needs to run frequently consider setting it up as a cron.

Psycho · October 31, 2006

Thanks for the tip concerning the time limit - that will come in handy. However, the rest doesn't really answer my question.

Here is a very brief example on part of what one record might contain
[code]<movielist>
<movie>
<title>Apocalypse now</title>
<runningtime>120</runningtime>
<mpaarating>120</mpaarating>
<genres>
<genre>Action</genre>
<genre>War</genre>
<genre>Drama</genre>
</genres>
<cast>
<actor>
<fname>Marlon</fname>
<lname>Brando</lname>
</actor>
<actor>
<fname>Martin</fname>
<lname>Sheen</lname>
</actor>
</cast>
</movie>
</movielist>[/code]

As you can see, one movie recor can contain several records for genre or actors. These need to go into separate tables. However, the genres listed will be repeated many times for all the other movies.

So, in the example of genres I can attempt to add each genre as I process each movie record. Doing that would mean that I would be processing a lot of duplicate entries as "Drama" would be a genre used for a lot of movies and I would have to do query to check if the genre exists before adding it to the genre table.

The other alternative is to create a "genre" array as I am processing the data and I could simply create a line that goes something like this [code]$genres[$genrename][] = $movieid;[/code]
Then after I have processed all the movies my genres array would be something like this
[code]$genres[Drama][0] = 234
$genres[Drama][1] = 339
$genres[Drama][2] = 98
$genres[Comedy][0] = 86
$genres[Comedy][1] = 291
$genres[Comedy][2] = 44[/code]

Then I just need to check each genre name once and then enter the values accordingly.

roopurt18 · November 1, 2006

Store each piece of repeatable data in an array as you encounter it and then process the arrays.

Psycho · November 1, 2006

That's the approach i have already taken. Thanks for the validation.

Sign In

When to input data into Database

Recommended Posts

Psycho

Link to comment

Share on other sites

roopurt18

Link to comment

Share on other sites

Psycho

Link to comment

Share on other sites

roopurt18

Link to comment

Share on other sites

Psycho

Link to comment

Share on other sites

Archived

Browse

Activity

Important Information