June 13, 2012 at 8:46 am
GilaMonster, that is a very good question. We are doing a kind of data warehousing; here we are creating a set of tables for reports, and these tables are created after running this procedure. Since we are dealing with a huge amount of data, we went for SELECT * INTO. Also, I didn't mention one thing: we are doing some aggregate functions in the select query, for example SELECT c1, SUM(c2), c3, c4, ... INTO table2 FROM table1 GROUP BY c1, c3, c4.
The query plan is using table spool operators and so on, but I am restricted from changing the logic for various reasons.
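For clarity, a minimal sketch of the kind of statement being described; the table and column names (table1, table2, c1-c4) are placeholders, not the actual schema:
-- Aggregate SELECT ... INTO: the destination table is created by the statement itself.
SELECT  c1,
        SUM(c2) AS total_c2,   -- aggregated measure
        c3,
        c4
INTO    dbo.table2
FROM    dbo.table1
GROUP BY c1, c3, c4;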
June 13, 2012 at 8:52 am
Daxesh Patel (6/13/2012)
If the table has a large amount of data, I think SELECT * INTO should be avoided due to its limitations. I would create the new table ahead of time with partitions (each partition on a different disk) and use INSERT INTO ... SELECT ...
I would also consider, in my test, applying data compression to both tables.
Be careful about this approach. Partitioning is a data management tool, not a performance enhancement tool. You could actually slow down the inserts instead of speeding them up.
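For reference, a minimal sketch of the approach being discussed, i.e. pre-creating the destination table on a partition scheme and loading it with INSERT ... SELECT; the partition boundaries, filegroup names, and table/column definitions are all hypothetical:
-- Hypothetical partition function and scheme; real boundaries and filegroups would differ.
CREATE PARTITION FUNCTION pf_ByC1 (INT)
    AS RANGE RIGHT FOR VALUES (1000000, 2000000, 3000000);

CREATE PARTITION SCHEME ps_ByC1
    AS PARTITION pf_ByC1 TO (FG1, FG2, FG3, FG4);   -- one filegroup per partition

-- Destination table created ahead of time instead of relying on SELECT * INTO.
CREATE TABLE dbo.table2
(
    c1       INT           NOT NULL,
    total_c2 DECIMAL(18,2) NULL,
    c3       INT           NULL,
    c4       INT           NULL
) ON ps_ByC1 (c1);

-- TABLOCK on an empty heap allows minimal logging under the bulk-logged or simple recovery model.
INSERT INTO dbo.table2 WITH (TABLOCK) (c1, total_c2, c3, c4)
SELECT c1, SUM(c2), c3, c4
FROM   dbo.table1
GROUP BY c1, c3, c4;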
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 13, 2012 at 8:54 am
I've sped up a lot of procedures recently by removing SELECT ... INTOs and changing the procedures to work directly off the base tables. Try that if it's possible, see if it's faster. With good indexing, it may very well be faster.
Are you saying then that the query you want help with is not the SELECT * INTO Table2 FROM Table1 that you asked about and that many people have spent time trying to help you with?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 13, 2012 at 8:57 am
No, Gail, it is not like that. We are also doing both of the steps.
June 13, 2012 at 9:00 am
Varun R (6/13/2012)
GilaMonster, that is a very good question. We are doing a kind of data warehousing; here we are creating a set of tables for reports, and these tables are created after running this procedure. Since we are dealing with a huge amount of data, we went for SELECT * INTO. Also, I didn't mention one thing: we are doing some aggregate functions in the select query, for example SELECT c1, SUM(c2), c3, c4, ... INTO table2 FROM table1 GROUP BY c1, c3, c4. The query plan is using table spool operators and so on, but I am restricted from changing the logic for various reasons.
So, you are dropping the tables and recreating them each time? If so, why? Why not just add/update the missing/changed data? If this is a "warehouse," the data should not be changing all that frequently except for inserts (some may argue it should not change at all, only be added to).
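A minimal sketch of that incremental approach, assuming the reporting table keeps a key (here c1) that identifies rows already loaded; the table and column names are hypothetical:
-- Insert only the rows that are not already present in the reporting table.
INSERT INTO dbo.report_table (c1, total_c2, c3, c4)
SELECT s.c1, SUM(s.c2), s.c3, s.c4
FROM   dbo.source_table AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.report_table AS r
                   WHERE  r.c1 = s.c1)
GROUP BY s.c1, s.c3, s.c4;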
Jared
CE - Microsoft
June 13, 2012 at 9:13 am
This is not a pure data warehouse. This code was developed by another team, and currently we are doing optimization. We can't change this logic, because right now it would affect the whole system. I know that what they did is wrong, but currently our focus is to improve the performance of the existing procedures.
June 13, 2012 at 9:52 am
Grant Fritchey (6/13/2012)
Daxesh Patel (6/13/2012)
If the table has a large amount of data, I think SELECT * INTO should be avoided due to its limitations. I would create the new table ahead of time with partitions (each partition on a different disk) and use INSERT INTO ... SELECT ...
I would also consider, in my test, applying data compression to both tables.
Be careful about this approach. Partitioning is a data management tool, not a performance enhancement tool. You could actually slow down the inserts instead of speeding them up.
I have to take my statement back; yes, partitions may slow down inserts, especially when the destination table is wipe-and-load.
Thanks
June 13, 2012 at 9:54 am
How about ensuring the destination table is on a different filegroup in the database from the source, and that the filegroup is mapped to files on different physical disks from the source?
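A minimal sketch of that setup; the filegroup name, file path, size, and table definition are all hypothetical:
-- Add a filegroup backed by a file on a separate physical disk.
ALTER DATABASE MyDB ADD FILEGROUP ReportFG;

ALTER DATABASE MyDB
ADD FILE (NAME = 'ReportData1', FILENAME = 'E:\SQLData\MyDB_Report1.ndf', SIZE = 10GB)
TO FILEGROUP ReportFG;

-- Create the destination table on that filegroup, then load it with INSERT ... SELECT.
CREATE TABLE dbo.table2
(
    c1 INT NOT NULL, total_c2 DECIMAL(18,2) NULL, c3 INT NULL, c4 INT NULL
) ON ReportFG;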
Mike
June 13, 2012 at 9:12 pm
Varun R (6/13/2012)
Hi all. Data: 9 million records; 144 GB RAM.
What about the performance of SELECT * INTO table2 FROM table1 in the bulk-logged recovery model?
Here it is taking around 1 hour.
Does anyone have any idea how to get the query down to 1-10 minutes?
Regards,
Varun
I have to echo Gail's thoughts on this. Instead of copying the data, just use it. Write good queries on good indexes and you should have no problems. When it comes to properly written queries, 9 million rows just isn't that much.
As a sidebar, if you can move "reporting" to a different server that's being kept up to date by (hopefully) a SAN snapshot, then do it. You won't have to worry about copying 9-million-row tables, and you won't have to worry about a reporting user making a mistake in a query and possibly dragging the system down to a crawl.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 13, 2012 at 9:14 pm
Daxesh Patel (6/13/2012)
If the table has a large amount of data, I think SELECT * INTO should be avoided due to its limitations.
What "limitations" are you speaking of?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 13, 2012 at 9:24 pm
I feel the best option would be to use the SET ROWCOUNT option with a WHILE loop to move the data in batches. This could improve the performance a bit.
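A minimal sketch of that batching idea, written here with TOP (N) in a WHILE loop (an equivalent alternative to SET ROWCOUNT, which is deprecated for INSERT/UPDATE/DELETE); the table names, key column, and batch size are all hypothetical:
-- Move rows from dbo.table1 to dbo.table2 in batches, assuming c1 identifies rows already copied.
DECLARE @BatchSize INT = 100000,
        @RowsMoved INT = 1;

WHILE @RowsMoved > 0
BEGIN
    INSERT INTO dbo.table2 (c1, c2, c3, c4)
    SELECT TOP (@BatchSize) s.c1, s.c2, s.c3, s.c4
    FROM   dbo.table1 AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.table2 AS t WHERE t.c1 = s.c1)
    ORDER BY s.c1;

    SET @RowsMoved = @@ROWCOUNT;
END;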
June 14, 2012 at 4:32 am
Jeff Moden (6/13/2012)
Daxesh Patel (6/13/2012)
If the table has a large amount of data, I think SELECT * INTO should be avoided due to its limitations.
What "limitations" are you speaking of?
It has several limitations, as mentioned here. Some of them:
- you cannot specify a filegroup; INTO will always create the new table in the default filegroup
- you cannot specify the data types of columns (especially calculated columns in the select)
Basically, there is no control over the new table definition.
June 14, 2012 at 5:07 am
Daxesh Patel (6/14/2012)
...- you cannot specify the data types of columns (especially calculated columns in the select)
Basically, there is no control over the new table definition.
You can specify data types of columns:
SELECT CAST(NULL AS TINYINT) AS Col1                 -- column type set explicitly via CAST
      ,CAST(NULL AS CHAR(10)) AS Col2
      ,CAST(1 + 15.0 / 100 AS NUMERIC(10,5)) AS Col3 -- 15.0 avoids integer division
INTO NewTable
June 14, 2012 at 5:10 am
Daxesh Patel (6/14/2012)
...- you cannot specify a filegroup; INTO will always create the new table in the default filegroup
...
It depends; sometimes (for example, in a serial DW load) the following is possible:
-- Temporarily make the target filegroup the default...
ALTER DATABASE MyDB
MODIFY FILEGROUP NewOne DEFAULT;
GO
-- ...so that SELECT ... INTO creates the new table on it...
SELECT CAST(NULL AS TINYINT) AS Col1
,CAST(NULL AS CHAR(10)) AS Col2
,CAST(1 + 15.0 / 100 AS NUMERIC(10,5)) AS Col3
INTO NewTable
GO
-- ...then switch the default filegroup back.
ALTER DATABASE MyDB
MODIFY FILEGROUP OldOne DEFAULT;
GO
June 14, 2012 at 5:57 am
Thanks for the cover, Eugene.
You can even specify whether a column is supposed to be NULLable or not through the use of ISNULL.
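A minimal sketch of that trick (the table name is a placeholder): wrapping an expression in ISNULL produces a NOT NULL column, whereas a bare CAST of NULL yields a NULLable one.
SELECT ISNULL(CAST(NULL AS INT), 0) AS Col1   -- created as INT NOT NULL
      ,CAST(NULL AS INT)            AS Col2   -- created as INT NULL
INTO dbo.NullabilityDemo;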
About the only things that you can't do with SELECT/INTO are transfer the formulas for calculated columns and automatically transfer keys/indexes. Even in the FULL recovery model, SELECT/INTO is frequently faster than creating the table first.
In other words, "It Depends". 😀
--Jeff Moden
Change is inevitable... Change for the better is not.