akphidelt2007 Posted July 18, 2015 Share Posted July 18, 2015 Alright, so I have a little project I'm trying to work on. I get handed an expenditure report every week, here are the pertinent fields. Table Name: project_data Project_Num | Employee | Expenditure_Type | Expenditure_Date | Hours | Revenue 123456 | John Doe | Labor - Straight | 2015-01-01 | 1.5 | 255.00 Now, there is no unique identifier in this data. What I want to create is a way the accounting team can import any file they want and it will reconcile the data with what is already in the database. I know how to do right outer joins with ID's to find the differences but I can't seem to figure out the most effective way to do this. Another think to keep in mind is an import can have up to 20,000 rows in it. So if I concatenate all the fields and check the strings I'm afraid it might be a little too slow. I'm thinking of creating a temporary table called: project_data_temp Storing the upload, then the next step is comparing the two tables to figure out what is different from both tables. All my searches on Google have come up with matching unique ids, whereas here there is no unique id. The only way I would know how to do it is to create a new column that concatenates these values in both tables and comparing them. Not sure how efficient that would be or if it is needed. Any help would be appreciated, thanks in advance. Quote Link to comment Share on other sites More sharing options...
requinix Posted July 19, 2015 Share Posted July 19, 2015 You have to make a decision about what constitutes a duplicate record. One that is "already in the database", as you say. I'm guessing project+employee+expenditure+date? Maybe +hours? That information combined creates uniqueness and that's what you search for. Quote Link to comment Share on other sites More sharing options...
akphidelt2007 Posted July 19, 2015 Author Share Posted July 19, 2015 You have to make a decision about what constitutes a duplicate record. One that is "already in the database", as you say. I'm guessing project+employee+expenditure+date? Maybe +hours? That information combined creates uniqueness and that's what you search for. So pretty much I have to create a new field that concatenates those values and searches for them between the two tables? I guess that makes sense as a unique identifier is necessary and as I need to make sure all those fields match the new import, that is the only way. Just sucks to have a uid that could be up to 100+ characters long. Thanks. Quote Link to comment Share on other sites More sharing options...
requinix Posted July 19, 2015 Share Posted July 19, 2015 (edited) You don't have to concatenate anything. Just search multiple columns. You can create an index on all the relevant columns to help with that... a rather large index, but would probably be worth it. Edited July 19, 2015 by requinix Quote Link to comment Share on other sites More sharing options...
akphidelt2007 Posted July 19, 2015 Author Share Posted July 19, 2015 You don't have to concatenate anything. Just search multiple columns. You can create an index on all the relevant columns to help with that... a rather large index, but would probably be worth it. I've been doing this long enough where I should know this, but I cannot figure out how this would be done with a query to find entries that do not exist between the two tables using their fields without concatenating? What would be just the basic structure of the query? All I can think of is joining the two but I do not know how I can have a join and find entries that don't match at the same time. Quote Link to comment Share on other sites More sharing options...
requinix Posted July 19, 2015 Share Posted July 19, 2015 Do an outer JOIN on the matching fields, then restrict the results to WHERE some non-nullable field IS NULL - which would only happen if the join failed. LEFT JOIN existing_table t2 ON t2.a = t1.a AND t2.b = t1.b AND... WHERE t2.some_non_nullable_column IS NULLOr you could just put a UNIQUE key on those columns, do an INSERT IGNORE, and let MySQL skip the duplicates for you. Which would be easier. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.