July 15, 2021 at 9:59 am
Good Morning,
I am looking for some advice on some work I have been given. Currently we have a small database which is based on a zip file of 5 CSV files. The CSV files get updated each week, and the new files then need to be uploaded into the database each week.
I am OK with importing the new files each week; however, we need to keep a log of what has changed in the CSVs compared to the database. The problem I have is that the files have no distinguishable primary key field. The only primary key I can determine is the entire row.
For example, there are 5 columns and all bar one will be the same; the exception is the date field, but the date cannot be used as an identifier because the same date can appear in another row.
I was looking at importing the new CSVs into temp tables and then having a process go through each one and compare it to the existing tables, so I can add an update date or some information about what changed, but I am not sure that is possible without a unique identifier.
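Something along these lines is what I had in mind; the table and column names below are made up just to show the idea (a staging table loaded from the weekly CSV, compared to the current table with EXCEPT):

-- Hypothetical names: dbo.WeeklyStaging holds this week's CSV,
-- dbo.TargetTable is the existing data; both have the same five columns.

-- Rows in this week's file that are not in the current table
SELECT ColA, ColB, ColC, ColD, EventDate
FROM   dbo.WeeklyStaging
EXCEPT
SELECT ColA, ColB, ColC, ColD, EventDate
FROM   dbo.TargetTable;

-- Rows in the current table that are missing from this week's file
SELECT ColA, ColB, ColC, ColD, EventDate
FROM   dbo.TargetTable
EXCEPT
SELECT ColA, ColB, ColC, ColD, EventDate
FROM   dbo.WeeklyStaging;

But that only tells me a row appeared or disappeared; it cannot tell me that an existing row was updated, which is where I get stuck.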
I cannot think of a way to keep track of the history, so I am looking for either ideas on how I can do this, or confirmation that I am correct in thinking that keeping track of the history is impossible.
Thank you
July 15, 2021 at 10:34 am
If you have five columns in a table and the only PK which 'works' is a combination of all five columns, there can be no updates to the data.
Why not? Because if a row comes in from the source data which is in any way different from all of the existing rows, it is a new row.
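A quick illustration of why, using throwaway table variables and made-up values. Change one column in a row and the comparison can only report one row gone and one row arrived, never "this row was updated":

-- Two snapshots of the "same" row, differing only in the date column
DECLARE @OldSnapshot TABLE (ColA int, ColB varchar(10), EventDate date);
DECLARE @NewSnapshot TABLE (ColA int, ColB varchar(10), EventDate date);

INSERT INTO @OldSnapshot VALUES (1, 'Widget', '2021-07-08');
INSERT INTO @NewSnapshot VALUES (1, 'Widget', '2021-07-15');

-- With the whole row as the key, all you can see is an insert and a delete
SELECT 'In new, not in old' AS Direction, n.*
FROM (SELECT * FROM @NewSnapshot EXCEPT SELECT * FROM @OldSnapshot) AS n;

SELECT 'In old, not in new' AS Direction, o.*
FROM (SELECT * FROM @OldSnapshot EXCEPT SELECT * FROM @NewSnapshot) AS o;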
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 15, 2021 at 11:39 am
Yep. You're stuck. Either all the values are the same, in which case no update should occur and no change tracking is needed, or the data is insert-only, and there is no way to show that value X is actually a change to a given row.
A change to the process must occur. The generating system that makes the CSVs needs to create some kind of unique key, somehow, so you can consume it. Or, you could create an artificial set of keys the next time you import the data (although, since the natural key is all the columns, this is going to be potentially problematic, because you should always use the natural key, even when you use an artificial one) and then transmit that back to the source for them to use going forward, so that changes are actually changes, not new inserts.
You really are going to have to change how the system is working to solve this.
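If you do go down the artificial-key route, a rough sketch of the table shape might look like this (table and column names are invented, adjust to your five columns). The surrogate only becomes useful once the source system starts sending it back; until then, any changed value still arrives looking like a brand-new natural key.

-- Surrogate key plus the full-row natural key enforced as unique
CREATE TABLE dbo.WeeklyData
(
    WeeklyDataID int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_WeeklyData PRIMARY KEY,
    ColA      varchar(50) NOT NULL,
    ColB      varchar(50) NOT NULL,
    ColC      varchar(50) NOT NULL,
    ColD      varchar(50) NOT NULL,
    EventDate date        NOT NULL,
    -- Keep enforcing the natural key (every column) alongside the surrogate
    CONSTRAINT UQ_WeeklyData_NaturalKey
        UNIQUE (ColA, ColB, ColC, ColD, EventDate)
);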
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning