SELECT * INTO table2 FROM table1 - Performance Impact - what do you think?

  • GilaMonster, that is a very good question. We are doing a kind of data warehousing; here we are creating a set of tables for reports, and these tables are created after running this procedure. Since we are dealing with huge data volumes, we went for SELECT * INTO. Also, I didn't mention one thing: we are doing some aggregate functions in the select query, for example SELECT c1, SUM(c2), c3, c4... INTO table2 FROM table1 GROUP BY c1, c3, c4.

    The query is using table spool operators and so on, but I am restricted from changing the logic for certain reasons.
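    A cleaned-up sketch of the pattern described above (table and column names are placeholders taken from the post, so treat this as illustrative only):

    ```sql
    -- Aggregate SELECT ... INTO; the new table is created on the fly.
    -- Aggregated columns must be aliased, or INTO cannot name them.
    SELECT   c1,
             SUM(c2) AS sum_c2,
             c3,
             c4
    INTO     table2
    FROM     table1
    GROUP BY c1, c3, c4;
    ```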

  • Daxesh Patel (6/13/2012)


    If a table holds a large amount of data, I think SELECT * INTO should be avoided due to its limitations.

    I would create the new table ahead of time with partitions (each partition on a different disk) and use INSERT INTO ... SELECT.

    I would also consider, in my tests, applying data compression to both tables.

    Be careful about this approach. Partitioning is a data management tool, not a performance enhancement tool. You could actually slow down the inserts instead of speeding them up.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I've sped up a lot of procedures recently by removing SELECT ... INTOs and changing the procedures to work directly off the base tables. Try that if it's possible and see if it's faster. With good indexing, it may very well be.

    Are you saying then that the query you want help with is not the SELECT * INTO Table2 FROM Table1 that you asked about and that many people have spent time trying to help you with?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • No, Gail Shaw, it is not like that. We are also doing both of the steps.

  • Varun R (6/13/2012)


    GilaMonster, that is a very good question. We are doing a kind of data warehousing; here we are creating a set of tables for reports, and these tables are created after running this procedure. Since we are dealing with huge data volumes, we went for SELECT * INTO. Also, I didn't mention one thing: we are doing some aggregate functions in the select query, for example SELECT c1, SUM(c2), c3, c4... INTO table2 FROM table1 GROUP BY c1, c3, c4.

    The query is using table spool operators and so on, but I am restricted from changing the logic for certain reasons.

    So, you are dropping the tables and recreating them each time? If so, why? Why not just add/update the missing/changed data? If this is a "warehouse," data should not be changing all that frequently (some may argue it should not change at all, only be inserted) except for inserts.

    Jared
    CE - Microsoft

  • This is not a pure data warehouse. This code was developed by another team; currently we are doing optimization. We can't change this logic, because it would affect the whole system as it stands. I know that what they did is wrong, but currently our focus is to improve the performance of the existing procedures.

  • Grant Fritchey (6/13/2012)


    Daxesh Patel (6/13/2012)


    If a table holds a large amount of data, I think SELECT * INTO should be avoided due to its limitations.

    I would create the new table ahead of time with partitions (each partition on a different disk) and use INSERT INTO ... SELECT.

    I would also consider, in my tests, applying data compression to both tables.

    Be careful about this approach. Partitioning is a data management tool, not a performance enhancement tool. You could actually slow down the inserts instead of speeding them up.

    I have to take my statement back; yes, partitions may slow down inserts, especially when the destination table is wipe-and-load.

    Thanks

  • How about ensuring the destination table is on a different filegroup in the database from the source, with that filegroup mapped to files on different physical disks from the source?

    Mike

  • Varun R (6/13/2012)


    Hi All,

    Data: 9 million records

    144 GB RAM

    What about the performance of SELECT * INTO table2 FROM table1 in the BULK_LOGGED recovery model?

    Here it is taking around 1 hour.

    Does anyone have any idea how to improve the query to 1-10 minutes?

    Regards,

    Varun

    I have to echo Gail's thoughts on this. Instead of copying the data, just use it. Write good queries against good indexes and you should have no problems. When it comes to properly written queries, 9 million rows just isn't that much.

    As a sidebar, if you can move "reporting" to a different server that's kept up to date by (hopefully) a SAN snapshot, then do it. You won't have to worry about copying 9-million-row tables, and you won't have to worry about a reporting user's mistaken query dragging the system down to a crawl.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Daxesh Patel (6/13/2012)


    if table has a large size data, I think SELECT * INTO should be avoided due to its limitations.

    What "limitations" are you speaking of?

    --Jeff Moden



  • I feel the best option would be to use SET ROWCOUNT with a WHILE loop to move the data in batches. This could improve the performance a bit.
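    For what it's worth, SET ROWCOUNT is deprecated for INSERT/UPDATE/DELETE, and a TOP-based loop is the usual modern equivalent. A minimal sketch, assuming the table1/table2 names from the thread and assuming c1 can identify rows not yet copied (neither assumption comes from the original posts):

    ```sql
    -- Batched copy: repeat until a partial batch signals the end.
    DECLARE @batch INT = 100000;

    WHILE 1 = 1
    BEGIN
        INSERT INTO table2 (c1, c2, c3, c4)
        SELECT TOP (@batch) s.c1, s.c2, s.c3, s.c4
        FROM   table1 AS s
        WHERE  NOT EXISTS (SELECT 1 FROM table2 AS t WHERE t.c1 = s.c1);

        IF @@ROWCOUNT < @batch BREAK;  -- fewer rows than the batch size = done
    END;
    ```

    Keeping batches modest also keeps the transaction log from ballooning the way one giant insert can.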

  • Jeff Moden (6/13/2012)


    Daxesh Patel (6/13/2012)


    if table has a large size data, I think SELECT * INTO should be avoided due to its limitations.

    What "limitations" are you speaking of?

    It has many limitations, mentioned here.

    Some of them:

    - you cannot specify a filegroup; INTO always creates the new table in the default filegroup

    - you cannot specify the data types of columns (especially computed columns in the SELECT)

    Basically, there is no control over the new table definition.

  • Daxesh Patel (6/14/2012)


    ...

    - you cannot specify the data types of columns (especially computed columns in the SELECT)

    Basically, there is no control over the new table definition.

    You can specify data types of columns:

    SELECT CAST(NULL AS TINYINT) AS Col1,
           CAST(NULL AS CHAR(10)) AS Col2,
           CAST(1 + 15/100 AS NUMERIC(10,5)) AS Col3
    INTO   NewTable;

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help

  • Daxesh Patel (6/14/2012)


    ...

    - you cannot specify a filegroup; INTO always creates the new table in the default filegroup

    ...

    It depends; sometimes (for example, in a DW serial load) the following is possible:

    ALTER DATABASE MyDB
    MODIFY FILEGROUP NewOne DEFAULT;
    GO

    SELECT CAST(NULL AS TINYINT) AS Col1,
           CAST(NULL AS CHAR(10)) AS Col2,
           CAST(1 + 15/100 AS NUMERIC(10,5)) AS Col3
    INTO   NewTable;
    GO

    ALTER DATABASE MyDB
    MODIFY FILEGROUP OldOne DEFAULT;
    GO


  • Thanks for the cover, Eugene.

    You can even specify whether a column is supposed to be NULLable through the use of ISNULL.

    About the only things that you can't do with SELECT/INTO are transfer the formulas for computed columns and automatically transfer keys/indexes. Even in the FULL recovery model, SELECT/INTO is frequently faster than creating the table first.

    In other words, "It Depends". 😀
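    A small sketch of the ISNULL trick: because ISNULL returns a non-nullable expression, SELECT ... INTO creates that column as NOT NULL (NewTable2 is an illustrative name, not from the thread):

    ```sql
    SELECT ISNULL(CAST(NULL AS INT), 0) AS Col1,  -- created NOT NULL
           CAST(NULL AS VARCHAR(20))    AS Col2   -- created NULLable
    INTO   NewTable2;
    ```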

    --Jeff Moden


