How do I find out, programmatically, if I double-loaded a file?

  • I was just doing a table import task (right-click the database name > Tasks > Import Data), not knowing my boss had just loaded the same file. It did not warn me that the table already existed; it just appended the same information to the same table, doubling it. I fixed that one, but it seems that I might have done this myself in the last couple of weeks, and I'd like to find that table, and there have been a LOT of table loads.

    I'm thinking I could get the difference between tables by comparing:

    select count(*) from (select distinct * from tblname) d

    against

    select count(*) from tblname

    but how do I incorporate this into some sort of proc that will go through all the tables and let me know where the issue is? I'm swamped and don't have the time to go through each table manually.

    I have code that shows me row counts, and I have been able to eliminate a few tables from contention: they hold monthly data that should only increase modestly month to month, so no double jumps there.
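    Roughly what I have in mind, sketched as one dynamic query over every user table. This assumes a double load means exact full-row duplicates; `SELECT DISTINCT *` will error on tables with LOB/xml columns, so those would need explicit column lists:

    ```sql
    -- Sketch: compare total vs. distinct row counts for every user table.
    -- Assumes a double load produced exact full-row duplicates.
    DECLARE @sql nvarchar(max) = N'';

    SELECT @sql += N'SELECT ''' + s.name + N'.' + t.name + N''' AS table_name, '
        + N'(SELECT COUNT_BIG(*) FROM ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) + N') AS total_rows, '
        + N'(SELECT COUNT_BIG(*) FROM (SELECT DISTINCT * FROM '
        + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) + N') d) AS distinct_rows'
        + N' UNION ALL '
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id;

    -- Trim the trailing ' UNION ALL ' and run it.
    SET @sql = LEFT(@sql, LEN(@sql) - 10) + N' ORDER BY table_name;';

    -- Any table where total_rows = 2 * distinct_rows is a double-load suspect.
    EXEC (@sql);
    ```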

    Thanks.

  • This won't help you right now, but maybe in the future: you should put some sort of unique constraint on your table, which would raise an error if such a thing ever happened again.
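    For example (the table, column, and constraint names below are made up; use whatever combination naturally identifies a row in your feed):

    ```sql
    -- Illustrative only: AccountID + PeriodEnd stand in for whatever
    -- columns uniquely identify a row in the imported file.
    ALTER TABLE dbo.MonthlyImport
        ADD CONSTRAINT UQ_MonthlyImport_Account_Period
        UNIQUE (AccountID, PeriodEnd);

    -- A second load of the same file now fails with a constraint
    -- violation instead of silently doubling the table.
    ```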

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks. However, the table is created by the upload; I'm not populating existing tables. In this case, I did not know the previous table was there, likely because the table list had not been refreshed.

  • Ok, I haven't written stored procs in a while, so bear with me please.

    I'm running this, and it's telling me I need to declare @cnt1 and @cnt2.

    Is there some scope problem I'm running into within an EXEC command?

    ================================================

    alter PROCEDURE spGet_Distinct_Rec_Cnt
        -- Add the parameters for the stored procedure here
        @tblname varchar(150)
    AS
    BEGIN
        -- SET NOCOUNT ON added to prevent extra result sets from
        -- interfering with SELECT statements.
        SET NOCOUNT ON;

        declare @sql as varchar(255)
        declare @cnt1 as bigint
        declare @cnt2 as bigint

        -- Insert statements for procedure here
        set @sql = 'set @cnt1 = (select ' + @tblname + ', count(*) from ' + @tblname + ')'
        execute( @sql)

        set @sql = 'set @cnt2 = (select distinct ' + @tblname + ', count(*) from ' + @tblname + ')'
        execute(@sql)

        If @cnt1 <> @cnt2
            return -1
    END

    ==========================================================

    Msg 137, Level 15, State 1, Line 1
    Must declare the scalar variable "@cnt1".
    Msg 137, Level 15, State 1, Line 1
    Must declare the scalar variable "@cnt2".
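    The Msg 137 errors come from scope: EXEC() compiles its string as a separate batch, so @cnt1 and @cnt2 declared in the procedure are not visible inside it. One way to get values back across that boundary is sp_executesql with OUTPUT parameters. A sketch of the proc reworked that way (it also assigns the count inside the dynamic string rather than selecting the table name as a column):

    ```sql
    ALTER PROCEDURE spGet_Distinct_Rec_Cnt
        @tblname sysname
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @sql  nvarchar(max);
        DECLARE @cnt1 bigint, @cnt2 bigint;

        -- sp_executesql can bind OUTPUT parameters across the scope
        -- boundary that plain EXEC() cannot cross.
        SET @sql = N'SELECT @cnt = COUNT_BIG(*) FROM ' + QUOTENAME(@tblname) + N';';
        EXEC sp_executesql @sql, N'@cnt bigint OUTPUT', @cnt = @cnt1 OUTPUT;

        SET @sql = N'SELECT @cnt = COUNT_BIG(*) FROM '
                 + N'(SELECT DISTINCT * FROM ' + QUOTENAME(@tblname) + N') d;';
        EXEC sp_executesql @sql, N'@cnt bigint OUTPUT', @cnt = @cnt2 OUTPUT;

        IF @cnt1 <> @cnt2
            RETURN -1;   -- total and distinct counts differ: possible double load

        RETURN 0;
    END
    ```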

  • Phil Parkin (4/24/2015)


    This won't help you right now, but maybe in the future: you should put some sort of unique constraint on your table, which would raise an error if such a thing ever happened again.

    If you did your data import through code instead of using the UI, this would be a lot easier for you.
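    For instance, a scripted load can refuse to run twice. The file path, table name, and columns below are hypothetical:

    ```sql
    -- Hypothetical scripted load: abort if the target already exists,
    -- so a second run fails loudly instead of appending silently.
    IF OBJECT_ID(N'dbo.AprilFeed', N'U') IS NOT NULL
        THROW 50000, N'dbo.AprilFeed already exists - possible double load.', 1;

    CREATE TABLE dbo.AprilFeed
    (
        AccountID int           NOT NULL,
        PeriodEnd date          NOT NULL,
        Amount    decimal(18,2) NOT NULL
    );

    BULK INSERT dbo.AprilFeed
    FROM 'C:\feeds\april.csv'
    WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
    ```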

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I guess you should let me know what you mean by code.

    Unless it's really quick and dirty, there's no time to set up code. I'm just being handed a continual stream of files to import on a highly aggressive project. Everything is "get this in now" while I'm also having to develop my own projects. No repeat imports as of yet, and nothing is the same each time. Not my cup of tea.

  • DSNOSPAM (4/24/2015)


    I guess you should let me know what you mean by code.

    Unless it's really quick and dirty, there's no time to set up code. I'm just being handed a continual stream of files to import on a highly aggressive project. Everything is "get this in now" while I'm also having to develop my own projects. No repeat imports as of yet, and nothing is the same each time. Not my cup of tea.

    Then I guess you will have to pay close attention when you do your imports. 🙂 It doesn't sound like you have time to do it right, because they want it done right now.


  • no chit.

  • DSNOSPAM (4/24/2015)


    I was just doing a table import task (right-click the database name > Tasks > Import Data), not knowing my boss had just loaded the same file. It did not warn me that the table already existed; it just appended the same information to the same table, doubling it. I fixed that one, but it seems that I might have done this myself in the last couple of weeks, and I'd like to find that table, and there have been a LOT of table loads.

    I'm thinking I could get the difference between tables by comparing:

    select count(*) from (select distinct * from tblname) d

    against

    select count(*) from tblname

    but how do I incorporate this into some sort of proc that will go through all the tables and let me know where the issue is? I'm swamped and don't have the time to go through each table manually.

    I have code that shows me row counts, and I have been able to eliminate a few tables from contention: they hold monthly data that should only increase modestly month to month, so no double jumps there.

    Thanks.

    There is a way to automate this. I just need to know...

    1) What determines whether a row is a duplicate at the column level? For example, are there one or two columns that could be used to determine that two rows are for the same thing? Hopefully it's not truly all the columns, as your original post suggests, but whatever. Be honest.

    2) Are there columns in the duplicate rows that temporally identify the order in which the dupes were entered/imported?

    3) What do you want to do with the dupes once we find them?
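    Depending on those answers, the cleanup usually looks something like the following. KeyCol1, KeyCol2, and LoadDate are placeholders for whatever your answers to 1 and 2 turn out to be:

    ```sql
    -- Placeholder columns: KeyCol1/KeyCol2 define "same row",
    -- LoadDate orders the copies so the original survives.
    WITH dupes AS
    (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY KeyCol1, KeyCol2
                                  ORDER BY LoadDate) AS rn
        FROM dbo.SuspectTable
    )
    DELETE FROM dupes
    WHERE rn > 1;   -- deleting through the CTE keeps the first copy of each row
    ```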

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
