Reconciling Data from two Imports without Unique Identifier

akphidelt2007 · July 18, 2015

Alright, so I have a little project I'm trying to work on. I get handed an expenditure report every week, here are the pertinent fields.

Table Name: project_data

123456 | John Doe | Labor - Straight | 2015-01-01 | 1.5 | 255.00

Now, there is no unique identifier in this data. What I want to create is a way the accounting team can import any file they want and it will reconcile the data with what is already in the database. I know how to do right outer joins with ID's to find the differences but I can't seem to figure out the most effective way to do this. Another think to keep in mind is an import can have up to 20,000 rows in it. So if I concatenate all the fields and check the strings I'm afraid it might be a little too slow.

I'm thinking of creating a temporary table called: project_data_temp

Storing the upload, then the next step is comparing the two tables to figure out what is different from both tables. All my searches on Google have come up with matching unique ids, whereas here there is no unique id. The only way I would know how to do it is to create a new column that concatenates these values in both tables and comparing them. Not sure how efficient that would be or if it is needed.

Any help would be appreciated, thanks in advance.

requinix · July 19, 2015

You have to make a decision about what constitutes a duplicate record. One that is "already in the database", as you say. I'm guessing project+employee+expenditure+date? Maybe +hours? That information combined creates uniqueness and that's what you search for.

akphidelt2007 · July 19, 2015

You have to make a decision about what constitutes a duplicate record. One that is "already in the database", as you say. I'm guessing project+employee+expenditure+date? Maybe +hours? That information combined creates uniqueness and that's what you search for.

So pretty much I have to create a new field that concatenates those values and searches for them between the two tables? I guess that makes sense as a unique identifier is necessary and as I need to make sure all those fields match the new import, that is the only way. Just sucks to have a uid that could be up to 100+ characters long. Thanks.

requinix · July 19, 2015

You don't have to concatenate anything. Just search multiple columns. You can create an index on all the relevant columns to help with that... a rather large index, but would probably be worth it.

akphidelt2007 · July 19, 2015

You don't have to concatenate anything. Just search multiple columns. You can create an index on all the relevant columns to help with that... a rather large index, but would probably be worth it.

I've been doing this long enough where I should know this, but I cannot figure out how this would be done with a query to find entries that do not exist between the two tables using their fields without concatenating? What would be just the basic structure of the query? All I can think of is joining the two but I do not know how I can have a join and find entries that don't match at the same time.

requinix · July 19, 2015

Do an outer JOIN on the matching fields, then restrict the results to WHERE some non-nullable field IS NULL - which would only happen if the join failed.

LEFT JOIN existing_table t2 ON t2.a = t1.a AND t2.b = t1.b AND...

WHERE t2.some_non_nullable_column IS NULL

Or you could just put a UNIQUE key on those columns, do an INSERT IGNORE, and let MySQL skip the duplicates for you. Which would be easier.

Sign In

Reconciling Data from two Imports without Unique Identifier

Recommended Posts

akphidelt2007

Link to comment

Share on other sites

requinix

Link to comment

Share on other sites

akphidelt2007

Link to comment

Share on other sites

requinix

Link to comment

Share on other sites

akphidelt2007

Link to comment

Share on other sites

requinix

Link to comment

Share on other sites

Archived

Browse

Activity

Important Information