Wondering if it would be better to use UNION when performing inserts

  • A colleague of mine is going to be leaving soon. I'm taking over for him. For two years he's been working on a project, so there's only a dev/test database. He's got a C# project which he runs periodically to restart all the tables in the database. At this point I don't know how he empties the tables, but that's something I can check with him later. What I would like to ask here is if it is faster to perform a lot of INSERTS to repopulate tables, or if it would be better to perform one INSERT with several UNION statements, as I've seen done here in SQL Server Central's forums and articles. I'll produce one of the SQL scripts he uses. All the SQL scripts are like this one. (This script is a shorter one, several are much larger as they populate a lot of records into the database. Also note I'll modify the data being inserted for privacy's sake.):

    ALTER TABLE  [PW].[ProgramContract]  NOCHECK CONSTRAINT ALL

    SET IDENTITY_INSERT [PW].[Contract] ON

    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (1, 8, '15FHBFPP0115', 0.0000, '2014-07-01 00:00:00', '2016-06-30 00:00:00', 'some.person', '2015-02-11 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (2, 8, '15FHBFPP0102', 0.0000, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-02-11 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (3, 13, 'HRFY2015', 0, '2014-07-01 00:00:00', '2016-06-30 00:00:00', 'some.person', '2015-02-21 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (4, 10, 'RX2015', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-02-21 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (5, 3, 'STD-NM', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-02-23 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (7, 1, 'DPCP2015', 0, '2014-07-01 00:00:00', '2016-06-30 00:00:00', 'some.person', '2015-03-13 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (8, 4, 'WIC', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-03-19 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (14, 9, 'IM-VFC', 0.0000, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-03-26 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (15, 24, 'RX2015-OLD', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-03-26 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (18, 28, 'RX-RABIES', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-05-21 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (20, 14, 'CMS', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-05-27 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (21, 13, 'HRFY2017', 0, '2016-07-01 00:00:00', '2019-06-30 00:00:00', 'some.person', '2016-07-11 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (25, 32, 'HR-OPE', 0, '2017-01-03 00:00:00', '2020-06-30 00:00:00', 'some.person', '2017-01-03 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (30, 29, 'BHEM SUPPLIES', 0, '2017-03-14 00:00:00', '2019-06-30 00:00:00', '', '2017-03-14 00:00:00', '', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (32, 34, 'RX2019', 0, '2019-04-16 00:00:00', '2019-06-30 00:00:00', 'some.person', '2019-04-17 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (33, 13, 'HRFY2020', 0, '2019-07-01 00:00:00', '2021-06-30 00:00:00', 'some.person', '2019-07-01 00:00:00', 'some.person', NULL)
    INSERT INTO [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (36, 33, '1', 0, '2019-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2019-12-20 00:00:00', 'some.person', NULL)

    SET IDENTITY_INSERT [PW].[Contract] OFF
    ALTER TABLE [PW].[ProgramContract] WITH CHECK CHECK CONSTRAINT ALL

    Kindest Regards, Rod Connect with me on LinkedIn.

  • No need for UNION:

    INSERT [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (1, 8, '15FHBFPP0115', 0.0000, '2014-07-01 00:00:00', '2016-06-30 00:00:00', 'some.person', '2015-02-11 00:00:00', 'some.person', NULL)
    ,(2, 8, '15FHBFPP0102', 0.0000, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-02-11 00:00:00', 'some.person', NULL)
    ,(3, 13, 'HRFY2015', 0, '2014-07-01 00:00:00', '2016-06-30 00:00:00', 'some.person', '2015-02-21 00:00:00', 'some.person', NULL)
    ,(4, 10, 'RX2015', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-02-21 00:00:00', 'some.person', NULL)
    ,(5, 3, 'STD-NM', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-02-23 00:00:00', 'some.person', NULL)
    ,(7, 1, 'DPCP2015', 0, '2014-07-01 00:00:00', '2016-06-30 00:00:00', 'some.person', '2015-03-13 00:00:00', 'some.person', NULL)
    ,(8, 4, 'WIC', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-03-19 00:00:00', 'some.person', NULL)
    ,(14, 9, 'IM-VFC', 0.0000, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-03-26 00:00:00', 'some.person', NULL)
    ,(15, 24, 'RX2015-OLD', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-03-26 00:00:00', 'some.person', NULL)
    ,(18, 28, 'RX-RABIES', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-05-21 00:00:00', 'some.person', NULL)
    ,(20, 14, 'CMS', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-05-27 00:00:00', 'some.person', NULL)
    ,(21, 13, 'HRFY2017', 0, '2016-07-01 00:00:00', '2019-06-30 00:00:00', 'some.person', '2016-07-11 00:00:00', 'some.person', NULL)
    ,(25, 32, 'HR-OPE', 0, '2017-01-03 00:00:00', '2020-06-30 00:00:00', 'some.person', '2017-01-03 00:00:00', 'some.person', NULL)
    ,(30, 29, 'BHEM SUPPLIES', 0, '2017-03-14 00:00:00', '2019-06-30 00:00:00', '', '2017-03-14 00:00:00', '', NULL)
    ,(32, 34, 'RX2019', 0, '2019-04-16 00:00:00', '2019-06-30 00:00:00', 'some.person', '2019-04-17 00:00:00', 'some.person', NULL)
    ,(33, 13, 'HRFY2020', 0, '2019-07-01 00:00:00', '2021-06-30 00:00:00', 'some.person', '2019-07-01 00:00:00', 'some.person', NULL)
    ,(36, 33, '1', 0, '2019-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2019-12-20 00:00:00', 'some.person', NULL)

     

    If you have many thousands of rows to insert, you may find it works faster if you split the inserts into batches.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin wrote:

    No need for UNION:

    INSERT [PW].[Contract] ([ID],[ProgramID],[PAContractNum],[PAContractAmt],[PAContractStartDate],[PAContractExpirationDate],[CreatedBy],[CreatedOn],[UpdatedBy],[UpdatedOn]) VALUES (1, 8, '15FHBFPP0115', 0.0000, '2014-07-01 00:00:00', '2016-06-30 00:00:00', 'some.person', '2015-02-11 00:00:00', 'some.person', NULL)
    ,(2, 8, '15FHBFPP0102', 0.0000, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-02-11 00:00:00', 'some.person', NULL)
    ,(3, 13, 'HRFY2015', 0, '2014-07-01 00:00:00', '2016-06-30 00:00:00', 'some.person', '2015-02-21 00:00:00', 'some.person', NULL)
    ,(4, 10, 'RX2015', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-02-21 00:00:00', 'some.person', NULL)
    ,(5, 3, 'STD-NM', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-02-23 00:00:00', 'some.person', NULL)
    ,(7, 1, 'DPCP2015', 0, '2014-07-01 00:00:00', '2016-06-30 00:00:00', 'some.person', '2015-03-13 00:00:00', 'some.person', NULL)
    ,(8, 4, 'WIC', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-03-19 00:00:00', 'some.person', NULL)
    ,(14, 9, 'IM-VFC', 0.0000, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-03-26 00:00:00', 'some.person', NULL)
    ,(15, 24, 'RX2015-OLD', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-03-26 00:00:00', 'some.person', NULL)
    ,(18, 28, 'RX-RABIES', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-05-21 00:00:00', 'some.person', NULL)
    ,(20, 14, 'CMS', 0, '2014-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2015-05-27 00:00:00', 'some.person', NULL)
    ,(21, 13, 'HRFY2017', 0, '2016-07-01 00:00:00', '2019-06-30 00:00:00', 'some.person', '2016-07-11 00:00:00', 'some.person', NULL)
    ,(25, 32, 'HR-OPE', 0, '2017-01-03 00:00:00', '2020-06-30 00:00:00', 'some.person', '2017-01-03 00:00:00', 'some.person', NULL)
    ,(30, 29, 'BHEM SUPPLIES', 0, '2017-03-14 00:00:00', '2019-06-30 00:00:00', '', '2017-03-14 00:00:00', '', NULL)
    ,(32, 34, 'RX2019', 0, '2019-04-16 00:00:00', '2019-06-30 00:00:00', 'some.person', '2019-04-17 00:00:00', 'some.person', NULL)
    ,(33, 13, 'HRFY2020', 0, '2019-07-01 00:00:00', '2021-06-30 00:00:00', 'some.person', '2019-07-01 00:00:00', 'some.person', NULL)
    ,(36, 33, '1', 0, '2019-07-01 00:00:00', '2020-06-30 00:00:00', 'some.person', '2019-12-20 00:00:00', 'some.person', NULL)

    If you have many thousands of rows to insert, you may find it works faster if you split the inserts into batches.

    Values is the way to go - but needs breaking at 1000 rows per insert - see https://docs.microsoft.com/en-us/sql/t-sql/queries/table-value-constructor-transact-sql?view=sql-server-ver15

    it is faster than individual inserts by far.

    as you have a C# code already generating the individual inserts then it should be very easy to change it to use VALUES and implement the breakdown into chunks of less than 1k values.

  • Thanks for posting the link. I did not know about the 1000-row limit.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thank you for letting me know about that 1000 record limit in using the VALUES clause for inserts. I'll try to remember that. Other tables my colleague's code uses would exceed 1000. So, I'll have to break it into up 1000 record inserts at a time.

    Incredibly good to know. Thank you, again!

    Kindest Regards, Rod Connect with me on LinkedIn.

  • Rod at work wrote:

    What I would like to ask here is if it is faster to perform a lot of INSERTS to repopulate tables, or if it would be better to perform one INSERT with several UNION statements, as I've seen done here in SQL Server Central's forums and articles.

    Just a couple of thoughts on this...

    First, it would be interesting to know why he kills and repopulates the tables.

    Second, why doesn't the C code just keep the tables up to date?

    Third, where are you getting the scripts from?  Are they being created by the app?  If so, have the app write to a file and then do a BULK INSERT... then you don't have to worry about any limits and it'll still be nasty fast.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good questions, Jeff. Taking your questions in order, here's my answers.

    First, this is a test/development database. The schema is still in flux. Sometimes he'll make changes to the schema, then he must drop and reload the database or table. (Actually, now I'm wondering why he doesn't drop the table, instead of emptying it out. I'll have to ask him that.)

    Second, the C# code does keep the table up to date. In fact, I asked him about that just yesterday. He has the SQL script to do the work, but then ignores the SQL script to do everything in C# code. He duplicates the code there. He told me his answer, but I admit I don't understand why he has the code duplicated in two places.

    Third is a good question. What I think is happening is he has a license for TOAD on his machine. I do know that TOAD can generate the DDL code to create tables and databases. It might also generate test data. I'm guessing as I have no experience with TOAD.

    I've got a question for you about BULK INSERT. I think I've only used it once before, many years ago, so I've not had any chance to build skills at using it. Are you saying that if we use BULK INSERT we do not have to disable the constraints, then re-enable them again afterwards? And if I adopt that method, do I still do as Phil has recommended by using multiple VALUES?

    Now I want to bring up something I've only learned yesterday, while discussing with my colleague questions about this app he's worked on. He told me that no one, except for myself, has come to him asking him any questions about all the apps he's worked on for almost 20 years. He's been here a lot longer than I have, but we're both shocked at the total disregard for learning about the apps he's worked on, documentation on the apps, the state the apps are in, what repository the source code is in (hoping that it is in a repo somewhere), etc. Again, we are blindsided by practices of where we work. I just don't get how no one seems to care. Others are going to have to pick up the slack and do something to one of the dozens of apps he's worked on over the years when something will have to be changed. But it just comes across as though they'll throw whoever the unfortunate person is, into the deep end of the pool to either sink or swim.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • On the subject of BULK INSERT...

    As with anything else, "It Depends" but, no, you don't actually have to disable all sorts of constraints to bulk load data into a table using BULK INSERT.  There are some speed tricks that use minimal logging , etc, but using BULK INSERT is pretty much the same as using INSERT/SELECT/UNION ALL or INSERT/SELECT/VALUES and it will absolutely beat the tar out of the old INSERT/SELECT or INSERT?VALUES that use one insert per row.  The biggest difference is where BULK INSERT gets its data from... files.  Generally speaking, my "go to" file format is Tab Separated Values (or "TSV" for short).

    In 2017, Microsoft finally figured out that the CSV (Comma Separated Values, which also include double-quotes) format is a thing (and has been for more than 4 decades) and is still much less bloated than XML, JSON, HTML, and a whole bunch of other junk and finally made it so that BULK INSERT will finally handle true CSV according to the long standing CSV spec.

    There are the occasional oddities about BULK INSERT (it counts delimiters, not rows/lines of data) and doesn't handle ragged right (which is poorly formed "inconsistent data" that should be avoided to begin with... even XML has "well formed" rules) and it's not a panacea for every bloody format someone can pull out of their ear but, with well formed data, it's actually the fastest method there is to load external tabular data in SQL Tables.  It'll even sequester "bad rows" in a separate file for re-evaluation and repair and can change the order of columns between the input and the target table and skip columns on either side, etc, etc.

    It's also a seriously underrated and underused tool just like OPENROWSET is.  Between the two, I've not had a need to even learn how to spell SSIS except to replace it.

    BCP (Bulk Copy Program) is another tool that comes with SQL Server and it works in a very similar manner as BULK INSERT except it can also be used to pull data out of SQL Tables into files.

    BULK INSERT, BCP, and OPENROWSET all require some "brute force" knowledge of data and that's why a lot of people steer away from them.  Seems like they can't be bothered if there's no GUI to click.  Anyone that claims that they're into ETL and ELT and other slaughtering names for data movement between systems that can't actually use those 3 things, isn't really into data, IMHO.  You can do some truly amazing stuff with them considering they were first made way back before today's developers parents were married. 😀

    And they all blow XML out of the water when it comes to high performance, voluminous data transfers especially if you get into "native" data transfers with you actually pass the 4 bytes of INTs, etc, instead of all the human readable stuff that people insist on. 😀

    Of course, that's all just  my opinion and a whole lot of people will say that you no longer need to know things like data structure, etc.  I agree that most people don't need to because they're users of one sort or another and the GUI's they use will usually suffice for what they want to do... at some expense.  For example, imagine all the fun people had when Microsoft up and said "Yeah... you need to convert all of your hundreds of DTS packages so that they work with this new SSIS thing".  It wasn't all auto-magic, either.

    On the app discussion...

    The typical reason why no one has asked folks like him any questions about the apps being developed that are key to the business is that they have no clue.  Most of the general public, including leaders of companies and managers and all of the users have no clue about anything having to do with computers or IT except that they use the computer to do their job.  They don't even know the program is sometimes referred to as an "app" unless it's on their phone and then they might not have a clue that those apps are actually programs and that their phone is actually a very sophisticated computer.

    It sounds like that's what has happened where you work at.  Even management isn't aware of what they have and what they need to protect.  Do they even have an IT manager where you work at, never mind a good one?  We already know the last part of that is true... you don't have a good one if there's one at all or you wouldn't be having the observations that you're having.

    There definitely needs to be a culture change there.  The trouble is, if you try to institute such a culture change in a government agency, you're likely to end up being outsourced.  Terribly sad but true.

    Like my Mom used to say, sometimes you need to just "Do the best you can and can the rest".

     

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you, Jeff, good information

    Kindest Regards, Rod Connect with me on LinkedIn.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply