Jump to content

Reconciling Data from two Imports without Unique Identifier


akphidelt2007

Recommended Posts

Alright, so I have a little project I'm trying to work on. I get handed an expenditure report every week, here are the pertinent fields.

 

Table Name: project_data

 

Project_Num | Employee | Expenditure_Type | Expenditure_Date | Hours | Revenue

 

123456 | John Doe | Labor - Straight | 2015-01-01 | 1.5 | 255.00

 

 

Now, there is no unique identifier in this data. What I want to create is a way the accounting team can import any file they want and it will reconcile the data with what is already in the database. I know how to do right outer joins with ID's to find the differences but I can't seem to figure out the most effective way to do this. Another think to keep in mind is an import can have up to 20,000 rows in it. So if I concatenate all the fields and check the strings I'm afraid it might be a little too slow. 

 

I'm thinking of creating a temporary table called: project_data_temp

 

Storing the upload, then the next step is comparing the two tables to figure out what is different from both tables. All my searches on Google have come up with matching unique ids, whereas here there is no unique id. The only way I would know how to do it is to create a new column that concatenates these values in both tables and comparing them. Not sure how efficient that would be or if it is needed.

 

Any help would be appreciated, thanks in advance.

 

Link to comment
Share on other sites

You have to make a decision about what constitutes a duplicate record. One that is "already in the database", as you say. I'm guessing project+employee+expenditure+date? Maybe +hours? That information combined creates uniqueness and that's what you search for.

Link to comment
Share on other sites

You have to make a decision about what constitutes a duplicate record. One that is "already in the database", as you say. I'm guessing project+employee+expenditure+date? Maybe +hours? That information combined creates uniqueness and that's what you search for.

 

So pretty much I have to create a new field that concatenates those values and searches for them between the two tables? I guess that makes sense as a unique identifier is necessary and as I need to make sure all those fields match the new import, that is the only way. Just sucks to have a uid that could be up to 100+ characters long. Thanks.

Link to comment
Share on other sites

You don't have to concatenate anything. Just search multiple columns. You can create an index on all the relevant columns to help with that... a rather large index, but would probably be worth it.

Edited by requinix
Link to comment
Share on other sites

You don't have to concatenate anything. Just search multiple columns. You can create an index on all the relevant columns to help with that... a rather large index, but would probably be worth it.

 

I've been doing this long enough where I should know this, but I cannot figure out how this would be done with a query to find entries that do not exist between the two tables using their fields without concatenating? What would be just the basic structure of the query? All I can think of is joining the two but I do not know how I can have a join and find entries that don't match at the same time. 

Link to comment
Share on other sites

Do an outer JOIN on the matching fields, then restrict the results to WHERE some non-nullable field IS NULL - which would only happen if the join failed.

LEFT JOIN existing_table t2 ON t2.a = t1.a AND t2.b = t1.b AND...
WHERE t2.some_non_nullable_column IS NULL
Or you could just put a UNIQUE key on those columns, do an INSERT IGNORE, and let MySQL skip the duplicates for you. Which would be easier.
Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.