guinbRo
Posted January 11, 2013 (edited)

Hey guys, I'm at the amateur level of programming right now, but I have a challenging project ahead of me. Here's a brief scope of the problem: we are collecting a lot of data, which ends up as essentially 53 columns in a CSV. The first thing I noticed was that certain portions of the data were very likely to repeat, which led me to the following MySQL schema to reduce redundancy:

TABLE Markets (relies on nothing; holds data about markets)
TABLE Contacts (relies on Markets; contacts are associated with a market)
TABLE Events (events belong to both a market and a contact)
TABLE Event Products (relies only on an Event)

Essentially my task is to automate the process, since we'll be processing thousands of rows. The script would probably need to behave like so:

1. Import the markets
2. Import the associated contacts (along with the unique market id assigned by MySQL)
3. Import the associated events (placing the unique market and contact ids assigned by MySQL)
4. Import the associated event products (placing the unique event id assigned by MySQL)

As you can see, this is probably really challenging, and I've looked around a bit for decent tutorials or reading. Sorry for the long-winded post; here are my three questions:

1. I can't accurately estimate how long a project like this should take me. In your experience, how long does a project like this take?
2. Am I going about this the right way? (Even using phpMyAdmin to import took a long time because I had to look up the appropriate ids.)
3. Do you know of any good resources or books for this?

Thanks a million!

Edited January 11, 2013 by guinbRo
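For readers who want to see the four-table layout concretely, here is a minimal sketch of the schema described above, using Python's built-in sqlite3 for illustration (the original targets MySQL, where the keys would be AUTO_INCREMENT). Every column name beyond the keys is an assumption for the example; the post does not list the actual 53 CSV columns.

```python
import sqlite3

# Sketch of the four-table schema: Markets at the top, Contacts under a
# market, Events under a market + contact, Event Products under an event.
# Non-key column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Markets (
    MarketID   INTEGER PRIMARY KEY,   -- AUTO_INCREMENT in MySQL
    MarketName TEXT NOT NULL
);
CREATE TABLE Contacts (
    ContactID   INTEGER PRIMARY KEY,
    MarketID    INTEGER NOT NULL REFERENCES Markets(MarketID),
    ContactName TEXT NOT NULL
);
CREATE TABLE Events (
    EventID   INTEGER PRIMARY KEY,
    MarketID  INTEGER NOT NULL REFERENCES Markets(MarketID),
    ContactID INTEGER NOT NULL REFERENCES Contacts(ContactID),
    EventDate TEXT
);
CREATE TABLE EventProducts (
    EventProductID INTEGER PRIMARY KEY,
    EventID        INTEGER NOT NULL REFERENCES Events(EventID),
    ProductName    TEXT
);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['Contacts', 'EventProducts', 'Events', 'Markets']
```

The foreign keys are what make the import order matter: each level can only be loaded once the level above it has its generated ids.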
DavidAM
Posted January 13, 2013

Your opening statement sounds like you have a single file of un-normalized data that you want to load into an empty normalized database, so I am answering from that perspective.

In this situation, I usually create a table matching the import file, with some added columns (in this case MarketID, ContactID, EventID, etc.) that default to NULL. Then I import the CSV file so all of my data is in one place, and use a series of SQL statements directly against the database (phpMyAdmin) to review and process the data.

First, check the market data:

SELECT DISTINCT MarketName FROM ImportData

using whatever makes the market data unique. If the result looks reasonable and returns the expected number of rows, great. If there is a problem, fix the import data and re-import (or fix it in ImportData as well), then try again. I fix the source data (if possible) in case I have to drop the import table and start over.

Once I'm happy with the data, I insert it into the final table:

INSERT INTO MarketData (column, names, here)
SELECT DISTINCT column, names, here FROM ImportData

Here is where the added columns come in. Once the rows are inserted, they have a unique ID from the AUTO_INCREMENT column of the final table, so update the import table with that ID (we will need it to do the next level, Contacts):

UPDATE ImportData
JOIN MarketData ON ImportData.MarketName = MarketData.MarketName
SET ImportData.MarketID = MarketData.ID

Then move on to the next level: Contacts.

As I do this, I save every SQL statement used, including the ones I use to review the data, along with copious comments on what each does and why. That way, if I have to start over, I can copy/paste the SQL instead of creating it again; or I can put the conversion SQL in a file and execute the script to do multiple steps quickly.
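The staging-table workflow above (review with SELECT DISTINCT, insert into the final table, copy the generated ids back into the staging rows) can be sketched end to end. This example uses Python's sqlite3 in place of MySQL, so the id copy-back uses a correlated subquery where MySQL would use the UPDATE ... JOIN form shown in the post; table and column names (ImportData, MarketData, MarketName) follow the post, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Staging table mirroring the CSV, plus an added MarketID column that
# starts out NULL and is filled in after the insert into MarketData.
conn.executescript("""
CREATE TABLE ImportData (
    MarketName  TEXT,
    ContactName TEXT,
    MarketID    INTEGER DEFAULT NULL
);
CREATE TABLE MarketData (
    ID         INTEGER PRIMARY KEY,   -- AUTO_INCREMENT in MySQL
    MarketName TEXT NOT NULL
);
""")
# Pretend these rows came from the CSV import.
conn.executemany(
    "INSERT INTO ImportData (MarketName, ContactName) VALUES (?, ?)",
    [("North", "Alice"), ("North", "Bob"), ("South", "Carol")])

# Step 1: review the distinct markets before committing to them.
distinct = conn.execute(
    "SELECT DISTINCT MarketName FROM ImportData ORDER BY MarketName").fetchall()
print(distinct)  # [('North',), ('South',)]

# Step 2: insert the unique markets into the final table.
conn.execute(
    "INSERT INTO MarketData (MarketName) "
    "SELECT DISTINCT MarketName FROM ImportData")

# Step 3: copy the generated ids back into the staging table.
# (MySQL: UPDATE ImportData JOIN MarketData ON ... SET ...)
conn.execute("""
UPDATE ImportData
SET MarketID = (SELECT ID FROM MarketData
                WHERE MarketData.MarketName = ImportData.MarketName)
""")
rows = conn.execute("SELECT MarketName, MarketID FROM ImportData").fetchall()
print(rows)
```

After step 3, every staging row carries its market's id, so the Contacts level can be loaded the same way using (MarketID, ContactName) as the distinct key.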
I would not bother writing a PHP script to handle this unless there is a significant amount of conditional updating. "You can't automate a process that doesn't exist." A guy I worked with (and for) many years ago told me that, and it makes sense. Doing this manually in SQL will be faster than trying to write a program to do it, since there will be some trial and error. Once you have the process down, and if it will have to be repeated (because the file comes to you every month, or whatever), then you can automate it, and by then you will know exactly what you are trying to accomplish. Even then, I would likely build a single SQL script file to be run against each new file rather than write a PHP script.