February 2, 2008 at 12:09 pm
Suppose you created a file with 30,000 or more lines of "insert into ... values" statements against a table in TempDB. That is, you created the table like this:
CREATE TABLE [dbo].[#t1](
[AddressID] [int] NOT NULL,
[AddressLine1] [nvarchar](60) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[AddressLine2] [nvarchar](60) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[City] [nvarchar](30) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[StateProvinceID] [int] NOT NULL,
[PostalCode] [nvarchar](15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[ModifiedDate] [datetime] NOT NULL
) ON [PRIMARY]
and then you had a 5MB file full of inserts to that table like this: insert into #t1 (AddressID, AddressLine1, AddressLine2, City, StateProvinceID, PostalCode, rowguid, ModifiedDate) values (1, '1970 Napa Ct.', null, 'Bothell', 79, '98011', '{9AADCB0D-36CF-483F-84D8-585C2D4EC6E9}', 'Jan 4 1998 12:00AM')
When you recreate the case, it doesn't matter whether the rows are the same or different. I used different rows, generated with a script that builds inserts from the AdventureWorks Person.Address table.
This script takes a very long time to execute against TempDB. The paging file is on a different drive, and TempDB is on a RAID 0+1 drive. The machine is a 64-bit Opteron box with 32GB of RAM and fast Ethernet cards.
On my home computer this script takes 1 minute 30 seconds, but on the monster company machine it takes 7 to 20 minutes to complete.
Can somebody explain, in enough architectural detail, why this is so? Take it from the TDS transmission, through the relational engine and storage engine interactions, to how TempDB is managed.
The same script run against a user database runs faster. When I run a loop inserting 30,000 rows, it also runs faster. And finally, when I run a SELECT INTO from the user database table into a tempdb table, it's amazingly fast.
I'm not asking whether this script should or should not be run, or whether it is against best practices. Rather, I'm trying to understand why it takes so long to complete on the monster machine when there are no users on it.
Your explanations are very much appreciated.
Ben Lopez
February 3, 2008 at 8:20 am
I have some new information on this issue.
I went ahead and purchased Itzik Ben-Gan's new book on T-SQL querying (this is not a plug for his book, but it has already saved me a lot of time). I read his explanation of query processing at the physical level (Chapter 2). There he provides a script to peer into the actions taken during compilation. The script queries the sys.dm_exec_query_optimizer_info view, which is new in SQL Server 2005. I ran the script and was surprised to find that the 5MB of inserts compiles into a single plan comprising 30,000 steps. So I changed the script to force each insert into its own batch by adding a GO after each insert. Now the script executes faster and consistently.
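For anyone who wants to reproduce the measurement, here is a minimal sketch of the snapshot-and-diff approach against that DMV (the #opt_before temp table name is my own choice, not from the book):
-- capture the optimizer counters before running the big batch
SELECT counter, occurrence, value
INTO #opt_before
FROM sys.dm_exec_query_optimizer_info;
-- ... run the 30,000-insert script here ...
-- show which counters moved and by how much
SELECT a.counter,
       a.occurrence - b.occurrence AS occurrence_delta
FROM sys.dm_exec_query_optimizer_info AS a
JOIN #opt_before AS b ON b.counter = a.counter
WHERE a.occurrence <> b.occurrence;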
What's better: a script that has a single execution plan with 30,000 steps, or a script that re-uses an execution plan 30,000 times? Obviously, the latter.
Anyway, there is still a puzzle. How come the giant single-batch script works fine in SQL Server 2000 but not in SQL Server 2005? In SQL Server 2005, each insert should be in its own batch to get better performance.
Ben Lopez
PS.
Thank you Itzik Ben-Gan, you are the greatest.
February 3, 2008 at 4:28 pm
The query engine can reuse the same plan 30,000 times rather than building one HUGE plan. 🙂
Also, doing all the inserts in one batch is a great way to make a really big transaction that needs a lot of logging. You could try to include some begin transaction / commit transaction every so often. The GO statements effectively did this for you as well as separating the batches.
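To make that concrete, a minimal sketch of what a chunked script could look like (the chunk size of 1,000 and the second row's values are just illustrations, not from the original file):
SET NOCOUNT ON;  -- cuts down on the per-row "1 row(s) affected" chatter
BEGIN TRANSACTION;
insert into #t1 (AddressID, AddressLine1, AddressLine2, City, StateProvinceID, PostalCode, rowguid, ModifiedDate) values (1, '1970 Napa Ct.', null, 'Bothell', 79, '98011', '{9AADCB0D-36CF-483F-84D8-585C2D4EC6E9}', 'Jan 4 1998 12:00AM');
insert into #t1 (AddressID, AddressLine1, AddressLine2, City, StateProvinceID, PostalCode, rowguid, ModifiedDate) values (2, '1 Example St.', null, 'Bothell', 79, '98011', newid(), 'Jan 4 1998 12:00AM');
-- ... the rest of this chunk of, say, 1,000 inserts ...
COMMIT TRANSACTION;
GO
-- the next chunk starts a new batch and a new transaction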
Finally, it may also be more efficient to insert the rows using
insert into myTable(x, y, z)
select '1', '2', '3'
UNION ALL
select '4', '5', '6'
UNION ALL
select '4', '5', '6'
UNION ALL
select '4', '5', '6'
rather than using an individual insert statement for each line. Is this something you're doing repeatedly or just once? You could also look at using a table lock.
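If you do go that route, the table lock can be requested right in the statement; a minimal sketch (myTable and the column names are just the placeholders from the snippet above):
-- TABLOCK takes one table-level lock instead of many row locks
insert into myTable with (TABLOCK) (x, y, z)
select '1', '2', '3'
UNION ALL
select '4', '5', '6';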
February 4, 2008 at 4:35 am
Itzik is the greatest.
It sounds as if you're doing some sort of data migration. If you can, you should look into turning this into a bulk load in some way. Look up one of these as a possibility: BULK INSERT or OPENROWSET BULK.
2005 should be able to perform this, even if it's not the perfect approach. What error are you getting?
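A minimal BULK INSERT sketch, assuming the data has been dumped to a comma-delimited file (the path and terminators are placeholders):
BULK INSERT #t1
FROM 'C:\data\addresses.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);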
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 4, 2008 at 12:53 pm
Totally in favor of using BCP over those 30,000 insert lines!!!!
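Something like this from the command line (server, file, and table names are placeholders; note that bcp needs a table it can see from its own connection, e.g. tempdb.dbo.t1 rather than #t1):
rem load a comma-delimited character-format file over a trusted connection
bcp tempdb.dbo.t1 in "C:\data\addresses.txt" -S MyServer -T -c -t ","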
* Noel
February 4, 2008 at 1:43 pm
Totally agree with the above suggestions.
If you happen to be building this 5MB file of inline commands, it might be worth just making it into an XML data file, or any form of delimited file. 30K records using a single insert statement (with the caveat listed below) should be a LOT faster than 90 seconds: more like 9 seconds if you ask me.
Caveat: if you try
INSERT INTO tableA
Select values UNION ALL
Select values UNION ALL
....
with 30,000 select statements, you're also going to force some serious slowness, just because you are making the optimizer process and merge 30,000 SELECT statements. While that works great for a FEW atomic statements - it's going to "ping" your server pretty good. I don't know where the break point is, but 30K seems to be OVER that sweet spot for my hardware....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 4, 2008 at 2:39 pm
I have now tried the UNION approach that unions each row's values and references the table in a single INSERT, as suggested by one of you. This was a small improvement. Others have suggested BCP, which I also tested. This, of course, is the best solution, and it ran in under 1 second. Amazingly fast, and of course this is what I have strongly recommended to my user.
But now that the user is out of the way, between you guys and me, there are some lingering questions.
For example, here is another pretty good solution: using a fully qualified table name, for example tempdb.dbo.t1 instead of #t1. This runs very fast, but not as fast as BCP. Does anyone know why? Is there a compile step when running BCP? Of course, I could turn on Profiler and see what's going on, but off the top of your head, what happens in terms of compilation when we run BCP or BULK INSERT, or its variation OPENROWSET?
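For clarity, the fully-qualified variant I tested looks like this (a sketch; the column definitions are condensed from the #t1 script at the top of the thread):
CREATE TABLE tempdb.dbo.t1 (
    AddressID int NOT NULL,
    AddressLine1 nvarchar(60) NOT NULL,
    AddressLine2 nvarchar(60) NULL,
    City nvarchar(30) NOT NULL,
    StateProvinceID int NOT NULL,
    PostalCode nvarchar(15) NOT NULL,
    rowguid uniqueidentifier NOT NULL,
    ModifiedDate datetime NOT NULL
);
GO
insert into tempdb.dbo.t1 (AddressID, AddressLine1, AddressLine2, City, StateProvinceID, PostalCode, rowguid, ModifiedDate) values (1, '1970 Napa Ct.', null, 'Bothell', 79, '98011', '{9AADCB0D-36CF-483F-84D8-585C2D4EC6E9}', 'Jan 4 1998 12:00AM');
-- ... and so on for the rest of the rows ...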
And also, why does the batch run so well in SQL Server 2000?
Ben Lopez
Somebody asked what the situation is. This is not a one-time deal. The user has a vendor program that generates an Excel file. They copy/paste it into the batch that they run in Query Analyzer. In addition to the INSERTs, the script contains references to this table as well as other tables.
I have not investigated XML-based solutions yet. Will we see performance similar to BCP?
February 4, 2008 at 5:35 pm
I will address the "why queries run fast/slow on 2000 and slow/fast on 2005" question: they have different optimizers. Therefore some queries will run the same, some will be faster, and some WILL be slower on 2005 compared to 2000. In reality (and IMNSHO) it turns out that, on the whole, a large percentage run the same, a smaller percentage run faster, and a pretty small percentage run slower. The problem is that we feel offended when the slower part happens! 🙂
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 4, 2008 at 7:16 pm
Yes, sir... I smile with you... And thank you for giving us your general quantification. Your categories go to the crux of the problem. I, and I'm sure the rest of the community, would like to know more about the category of queries that will run slower. Otherwise, this group will be out there, waiting to be surprised when we least expect it.
For example, in my client's case, they are running the latest and greatest machine and O/S. For this project, the client upgraded to a machine with eight AMD 8218 processors and 32GB of RAM, plus the latest EMC disk array, on Windows Server 2003 Enterprise x64 Edition and SQL Server 2005.
Now, because the optimizer has changed, some queries will not run as fast as before, even though the client has a monster machine. I'm reading up on how compilation has changed: instead of Parse-Bind-Optimize, it's now Parse-Algebrize-Optimize, and it seems to me that this was done to accommodate XML queries. Please help us understand the category of slower-running queries.
As for the client, I have already made a strong recommendation to use BCP, as somebody here recommended. I tested it, and I'm sure this approach will give the client sub-second response times. Thanks to all of you participating in this forum/thread.
Ben Lopez
February 4, 2008 at 11:33 pm
Ben -
Just a couple of thoughts based on your previous comments: it sounds like you're currently programmatically generating 30K+ individual insert statements, which will be pretty slow no matter what. You don't mention what language you're using, but you may want to look at the possibility of either using XML (e.g. creating an XML doc which can be sent to SQL Server as a single document) or using the newer bulk-insert capabilities in .NET 2.0+.
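If the XML route gets investigated, a minimal sketch of shredding a document server-side with nodes() in SQL Server 2005 (the element and attribute names are my own assumptions, not anything from the vendor file):
DECLARE @doc xml;
SET @doc = N'<rows>
  <row AddressID="1" AddressLine1="1970 Napa Ct." City="Bothell" StateProvinceID="79" PostalCode="98011"/>
</rows>';
-- one set-based insert instead of 30,000 singleton statements
INSERT INTO #t1 (AddressID, AddressLine1, AddressLine2, City, StateProvinceID, PostalCode, rowguid, ModifiedDate)
SELECT r.value('@AddressID',       'int'),
       r.value('@AddressLine1',    'nvarchar(60)'),
       NULL,
       r.value('@City',            'nvarchar(30)'),
       r.value('@StateProvinceID', 'int'),
       r.value('@PostalCode',      'nvarchar(15)'),
       NEWID(),
       GETDATE()
FROM @doc.nodes('/rows/row') AS t(r);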
Joe