Jump to content

When to input data into Database


Psycho

Recommended Posts

I have an XML file which contains the data for many different movies. I am int he process of writing a script to parse the XML file to be input into a database. however, i am trying to decide which would be the most efficient point at which to input the data. Here are the two options I am looking at now.

[b]1) COMPLETELY PROCESS THE XML FIRST[/b]
I will process the complete XML file and put the data into separate arrays based upon the tables that the data needs to go into.

here is an example of what I mean:
[code]$movies[movieid][name]
$movies[movieid][plot]
$movies[movieid][runningtime]
etc.

$actors[actorid][fname]
$actors[actorid][lname]

$actor_movie[actorid][0] = movieid1
$actor_movie[actorid][1] = movieid2
etc.[/code]

The benefit with this method is that I will not have to process a lot of duplicates with things such as actors, studios, etc. I will have a single entry in the $actors array for each actor and I would only have to check once to see if it already exists in the database. The downside would be that I may get some very large arrays as the XML file will contain hundreds, possibly thousands, of movie records.

[b]1) INPUT RECORDS INTO DATABASE AS THEY ARE PROCESSED[/b]
In this scenario I would insert records for each applicable record as they are processed. So, I process the first movie I would insert the records for that movie. Then as I process the actors for that movie I would insert the actors for that movie

But I would first have to do a query to see if the actor already exists. So, as I go through each movie if the same actor exists in 20 movies I would be doing a check 20 times to see if that actor already exists. All of these duplicative queries could be less efficient than the method above.



I hope this all makes sense. Anyone have a suggestion?
Link to comment
Share on other sites

Why not do something like:

[code]<?php
$sql = "INSERT INTO TableName (Col_1, Col_2, ..., Col_n) VALUES ";
$extras = Array();
foreach($DataSet as $Data){
  $extras[] = "({$Data['Col_1']}, {$Data['Col_2']}, ..., {$Data['Col_n'})";
}
$sql .= implode(", ", $extras);
?>[/code]

You said this is to process a very large file, it might be best to use ini_set to turn off the script time limit and the memory limit and run it from a command prompt.  If this is an operation that needs to run frequently consider setting it up as a cron.
Link to comment
Share on other sites

Thanks for the tip concerning the time limit - that will come in handy. However, the rest doesn't really answer my question.

Here is a very brief example on part of what one record might contain
[code]<movielist>
<movie>
    <title>Apocalypse now</title>
    <runningtime>120</runningtime>
    <mpaarating>120</mpaarating>
    <genres>
      <genre>Action</genre>
      <genre>War</genre>
      <genre>Drama</genre>
    </genres>
    <cast>
      <actor>
        <fname>Marlon</fname>
        <lname>Brando</lname>
      </actor>
      <actor>
        <fname>Martin</fname>
        <lname>Sheen</lname>
      </actor>
    </cast>
  </movie>
</movielist>[/code]

As you can see, one movie recor can contain several records for genre or actors. These need to go into separate tables. However, the genres listed will be repeated many times for all the other movies.

So, in the example of genres I can attempt to add each genre as I process each movie record. Doing that would mean that I would be processing a lot of duplicate entries as "Drama" would be a genre used for a lot of movies and I would have to do query to check if the genre exists before adding it to the genre table.

The other alternative is to create a "genre" array as I am processing the data and I could simply create a line that goes something like this [code]$genres[$genrename][] = $movieid;[/code]
Then after I have processed all the movies my genres array would be something like this
[code]$genres[Drama][0] = 234
$genres[Drama][1] = 339
$genres[Drama][2] = 98
$genres[Comedy][0] = 86
$genres[Comedy][1] = 291
$genres[Comedy][2] = 44[/code]

Then I just need to check each genre name once and then enter the values accordingly.
Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.