August 16, 2012 at 5:13 am
hi all
i am getting software asset from different client machine nearly 2000 client per day,i am getting data in text file and uploading in database.my file look like this first day.
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z||Cisco WebEx Meetings
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z||Adobe AIR
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z||Adobe Flash Player 10 ActiveX
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z|20120612|Audacity 2.0
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z|20110310|AVI Media Player 1.0.1
den second day , same file with additional column
SOFT|GE-DT-0049|000FFE0E68AE||2012-08-16 12:59:45Z||Cisco WebEx Meetings
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z||Adobe AIR
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z||Adobe Flash Player 10 ActiveX
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z|20120612|Audacity 2.0
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z|20110310|AVI Media Player 1.0.1
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z|Babylon toolbar on IE
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z||Belarc Advisor 8.2
SOFT|GE-DT-0049|000FFE0E68AE|2012-08-16 12:59:45Z||CCleaner
here no primary key, i dont know how to design my data table,
first day i am inserting all the data in table, second day also inserting all data with additional rows(i means new rows).now duplication is there,already existing data is repeating in second day insertion.
please help me out.
--------------------------------------------------------
I am Learner -- SQL
August 16, 2012 at 5:24 am
If there's no primary key, there's no reliable way of preventing duplicates. The first thing to do is to import from your text file into a staging table, and clean the data up in there, so that you have each column contains the same sort of information (dates, application names, etc). Then you can look at what makes each row unique, and use an outer join or some other technique to avoid importing data into your main table that is already there.
John
August 22, 2012 at 9:23 am
Shall we understand that...
1- Your system is cumulative loading of a particular table.
2- Each day you get new rows that have to be inserted.
3- Each day you get updates to some columns of existing rows.
Is that correct?
If the answer is yes then your system should look like this:
SourceDocuments ==> StatingTable ==> CoreTable
SourceDocuments represent the files you have to load.
StatingTable has to be truncated before each "load" and should include ALL columns you want to capture from the SourceDocuments.
CoreTable represents the place you want to keep data up-to-date. As already posted you have to figure out a way to identify each row uniquely (PK) then your load process has to either INSERT or UPDATE rows depending on existance of the row or not on CoreTable.
Hope this helps.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply