Adding INSERT INTO ... to SELECT statement causes it to run significantly longer than expected

  • Hi all, I have been having an issue with an INSERT INTO ... SELECT statement taking a disproportionately long time over just the SELECT statement itself.

    I have been struggling to find a suitable explanation for this which on the surface seems quite odd. Not having been able to find much useful via Google (though I could be searching for the wrong terms) I thought I'd try it out here based on some of the great answers that regularly appear in these columns. Rather than an explicit SQL solution I'm really just wondering if someone has encountered this before and what may be the cause behind it.

    I have what could be called an 'averagely complex' SELECT statement which by itself takes an average of 2500ms to execute and produces 550 rows with the data that I am using. If I combine this with an INSERT INTO... however, all of a sudden it stretches out to over 8000ms.

    Note that the table being inserted into is only 7 columns wide (all of plain INT type) and there are only two basic non-clustered single-column indexes.

    Given that this is an insert of a mere 550 rows, the more than tripling of execution time seemed odd so I trialled a few scenarios.

    1) The longer insert time occurs whether or not the target table is empty or already populated with (000's of) rows.

    2) I created a temporary copy of the target table, empty and with no indexes, and modified the SQL to use it. It still took 8000ms+ to insert.

    3) I modified the original SQL to use

    SELECT ... INTO <table>

    (thereby creating a new table instead of using the existing table) - it still took 8000ms+

    4) To prove insert was working OK in general, I used

    SELECT * INTO temp FROM <target table>

    to copy the populated target table physically and it took < 100 millisecs.

    5) I modified the original statement using a very basic SELECT and some literal values

    INSERT INTO <target_table> (col1, col2, ...)

    SELECT tab.column1, 2, 3, 4, 5, 6, 7 FROM Table1 AS tab

    where Table1 had several thousand rows. This insert used the same target table, just a simpler SELECT data source and also took only 100 milliseconds or so.

    Given this, it looks like it is the combination of the original SQL with the INSERT which is causing the problem, rather than a general insert performance issue. So...

    6) I compared graphical execution plans for the statement both with the INSERT and without. Apart from the icons related to the INSERT operation at the very outer level they were essentially identical. I say 'essentially' because the 1% cost attributed to the INSERT operation had a corresponding effect (no more than 1%) on a couple of the inner operations. The plan showed the insert operation taking effectively zero CPU time (0.000001 secs) and 0.01 I/O cost.

    For reference sake I have included the SQL as well as the plans for comparison. I'm sure that there are a few improvements that could be made, however at this stage the real question for me is why the INSERT is taking so long when it should really only add a few milliseconds?

    Of course, I could obviously be wrong and it may be the code that is at fault - but that doesn't seem logical in this case (insert cheeky Microsoft comment here)

    If anyone can provide some enlightenment I'd be most grateful!

    Original SQL Code

    INSERT INTO client_carer (

    shift_id, carer_id, weighting, nominal_code, preferred, grade_penalty, availability_penalty )

    SELECT DISTINCT

    s.shift_id,

    c.carer_id,

    0,

    null,

    0,

    CASE WHEN cg.rank > scg.rank THEN 1 ELSE 0 END,

    CASE WHEN ap.ranking > 1 THEN 1 ELSE 0 END

    FROMdbo.shift s

    JOINdbo.client cl on cl.client_id = s.client_id

    JOINdbo.carer_grade scg on scg.carer_grade_id = s.min_grade_id

    JOIN#availability a on a.start_date_time <= s.date_start and a.end_date_time >= s.date_finish

    JOINdbo.carer c on c.carer_id = a.carer_id

    JOINdbo.carer_grade cg on cg.carer_grade_id = c.carer_grade_id

    JOINdbo.availability_preference ap on ap.preference_id = a.preference_id

    LEFT OUTER JOIN dbo.preferred_carer cmpt on cmpt.carer_id = c.carer_id and cmpt.client_id = cl.client_id

    WHERE

    cg.rank >= scg.rank

    AND(cl.group_id is null OR

    not exists (SELECT 1 FROM dbo.carer_group cwg WHERE cwg.carer_id = c.carer_id) OR

    exists (SELECT 1 FROM dbo.carer_group cwg WHERE cwg.carer_id = c.carer_id AND cwg.group_id = cl.group_id))

    AND(cmpt.preference is null OR not cmpt.preference = 7)

    ANDnot exists (SELECT 1

    FROM dbo.visit v

    WHERE v.carer_id = c.carer_id

    AND (v.cancellation_id is null OR v.cancellation_id in (-1, 0))

    AND (s.date_start < v.date_finish AND v.date_start < s.date_finish) )

  • the 1st thing that comes to mind , are there different disks for temp DB and data DB?

  • At any way you can follows basic rules to improve performance as

    http://dba.stackexchange.com/questions/6441/cxpacket-waits-performance-tune-for-sql-server-2008

  • darrylc-ssc (2/1/2012)


    If anyone can provide some enlightenment I'd be most grateful!

    Sorry to disappoint, but there's no obvious cause I can see from the plans. You might try the test inserting into a table with a clustered index (any clustered index), but if I had the system in front of me I would be looking at the usual performance counters (especially wait stats), perhaps clearing sys.dm_os_wait_stats before the query (using DBCC SQLPERF) and looking at the content of that DMV afterward for clues. The extra time will be going somewhere, it's just a case of tracking down where.

  • johnitech.itech (2/2/2012)


    At any way you can follows basic rules to improve performance as

    http://dba.stackexchange.com/questions/6441/cxpacket-waits-performance-tune-for-sql-server-2008

    Another random link from johnitech.itech *sigh*

  • Did you ever make any progress on this? I'm running into exactly the same issue with a query today. Except mine is a bit more dramatic - the SELECT by itself takes about 4 minutes to return to screen or INTO a table. But INSERT INTO -> SELECT is taking hours. I'm at a loss, I do this all the time!

  • I see a significant amount of places where you are creating triangular joins. I'm not sure how that is affecting your insert, but I would try to remedy those and see if it helps. Are you familiar with what is causing your triangular joins JOIN#availability a on a.start_date_time <= s.date_start and a.end_date_time >= s.date_finish

    Jared
    CE - Microsoft

  • I'd be interested in seeing the table definition for the target table, if you can provide that.

    I may have missed it in your description, but did you try inserting into a temp table, then into the target table from there?

    I'm used to Insert Select taking longer than simply Select, but your case seems excessive. Inserts are required to take locks, update indexes, check constraints, validate data type and length compatability, write to the transaction log, write to the I/O cache, calculate any persisted computed columns, populate any indexed views, possibly write to audit logs of various sorts, fire any relevant triggers, et al, while a simple Select has a lot less work to do (and the Insert Select has to do all that too, of course).

    Checking wait states and types is definitely where you'll need to go next, and those will be particular to your environment. Not something that can be easily replicated by any of us. But the table definition is worth looking into as well. (I know you tried Select Into, but there's still overhead on that if the source data has things like user-defined datatypes, or CLR data types, or if the database has certain types of audit logging enabled, and so on. So, worth checking.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks very much for all your suggestions so far 🙂 but unfortunately thus far I haven't been able to make much progress on this issue as yet - sorry Patrick.

    Jared: Yes, the triangular joins aren't the best but there is a whole overlapping date range check which I've simplified down to what you see here (and it was a whole lot worse than that originally). I might have another look at this in the future but for now will have to leave that as it is. The thing is I'm sure that this (the insert delay) was not happening before for this very same statement.

    GSquared: Yes, I tried inserting to a temp table first and unfortunately it still took the extended time (though the copy into the true target table was quick). Table definition is as follows:

    CREATE TABLE [dbo].[client_carer](

    [shift_id] [int] NULL,

    [carer_id] [int] NULL,

    [weighting] [int] NULL,

    [nominal_code] [int] NULL,

    [preferred] [int] NULL,

    [grade_penalty] [int] NULL,

    [availability_penalty] [int] NULL

    ) ON [PRIMARY]

    There is a simple non-clustered index on each of shift_id and carer_id columns. Also no computed columns, user-defined or CLR datatypes, etc anywhere so that shouldn't be an issue.

    GSquared/Paul White: I've had a look at the wait stats but couldn't see anything much that would be affecting things. I'll see if I can post some results shortly but unfortunately on another task right now.

    Thanks again.

  • Is there no clustered index on this table? How about a primary key?

    Heaps often behave "strangely" compared to tables with clustered indexes. Lack of a primary key can also do unexpected things to insert performance. The definition doesn't have either item. (They're usually "synonymous", but not always.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (2/14/2012)


    Is there no clustered index on this table? How about a primary key?

    Heaps often behave "strangely" compared to tables with clustered indexes. Lack of a primary key can also do unexpected things to insert performance. The definition doesn't have either item. (They're usually "synonymous", but not always.)

    Ha! That's probably the issue. The LACK of a primary key or clustered index... I thought I read something about that at one point. I will try to find it.

    Jared
    CE - Microsoft

  • For my issue I actually did try with and without a clustered PK. It didn't actually have any impact. What DID work was simplifying what was a very complex query down as much as possible. I had both functions and sub-queries and the only thing that seemed to do the trick was seperating them out into seperate steps in the data transformation. Once simplified down the INSERT/SELECT did work, minimally logged as I hoped.

    I maintain that the SQL optimizer is actually evaluating the SELECT execution plan differently in the context of an INSERT/SELECT vs. a simple SELECT.

  • Glad to hear you solved the issue, but could you provide us with execution plans from the original statement's insert versus just the select. Its nice that you "maintain" that :-), but we would like to see some actual data on it.

    Jared
    CE - Microsoft

  • Sure - as I said, these are pretty complex transformation queries for loading a "search" mart, so there is a lot going on here. I think the file names should be self explanatory though.

  • For my original query, the plans as provided back in the first post of this thread appeared to show no effective difference between the SELECT vs. the INSERT..SELECT ...

    I also tried creating a clustered and primary key (and both) index on the table to no avail, the INSERT still takes way longer. As an aside, the target table always starts out empty before this process runs and I have tried tests both updating statistics and not with no effect.

    I'm attaching screenshots of wait stats results I received after resetting the counters immediately before running the statement and then running SQL from sys.dm_os_wait_stats immediately afterwards.

    Looking at the results is would seem that the INSERT statement is suffering less waits than the SELECT only version - strange...

    Please note that the results shown ignore various "system" wait types and is based on the query found here: http://blog.sqlauthority.com/2011/02/01/sql-server-introduction-to-wait-stats-and-wait-types-wait-type-day-1-of-28/

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply