June 4, 2012 at 5:54 am
I created a very big fact table on my own laptop using the following blog by Kalen Delaney:
Geek City: Build a Big Table with a Columnstore Index
Essentially it copies dbo.FactInternetSales in itself 9 times, giving a resulting table of about 30 million rows.
I ran the following simple query:
SELECT
g.EnglishCountryRegionName
,psc.EnglishProductSubcategoryName
,SalesAmount= SUM(fact.SalesAmount)
,Quantity= SUM(fact.OrderQuantity)
FROM
dbo.FactInternetSales fact
INNER JOIN
dbo.DimDate d
ON fact.OrderDateKey = d.DateKey
INNER JOIN
dbo.DimCustomer c
ON fact.CustomerKey = c.CustomerKey
INNER JOIN
dbo.DimGeography g
ON c.GeographyKey = g.GeographyKey
INNER JOIN
dbo.DimProduct p
ON fact.ProductKey = p.ProductKey
INNER JOIN
dbo.DimProductSubcategory psc
ON p.ProductSubcategoryKey = psc.ProductSubcategoryKey
INNER JOIN
dbo.DimProductCategory pc
ON psc.ProductCategoryKey = pc.ProductCategoryKey
WHEREd.CalendarYear = 2008
AND pc.EnglishProductCategoryName = 'Bikes'
GROUP BY g.EnglishCountryRegionName,psc.EnglishProductSubcategoryName
Very simple, no real whistles.
It run for about 2 minutes on this heap of 30 million rows on my 64-bit laptop.
Then I added a unique clustered index on the table with the following query:
CREATE UNIQUE CLUSTERED INDEX clus_FactInternetSalesBig ON FactInternetSalesBig(SalesOrderNumber, SalesOrderLineNumber);
It took about 30 minutes to create this index.
I ran the query again, after clearing the cache and it took an astonishing 32 minutes to complete.
I restarted the service, as most of my RAM was eaten by SQL Server and I couldn't really work with my laptop anymore (that shall teach my not to set Maximum Server Memory Limit :-))
I reran the query after the service was restarted and now it took about 6 minutes.
It just seems strange to me that adding a clustered index slows down a query. Can the sorting in a clustered index really mess up a query that much?
Anyone have an idea why this happened?
ps: I know I can speed up the query by adding indexes on the join columns, that's the next step of my testing process 😀
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 4, 2012 at 7:36 am
I don't have the bandwidth to blow out a big factinternetsales table, but note the following:
1) indexes can change a plan - and sometimes the change may be for the worst.
2) have you updated stats (with full scan)?
3) in your query, you should not join to dimdate. That is unnecessary since you can do this:
WHEREfact.OrderDateKey BETWEEN 20080000 AND 20089999
This is a very common refactor. The fastest thing you can do in SQL Server is nothing, and in this case you do nothing with DimDate.
4) have you set max memory on your instance?
5) please show actual execution plans from each run
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 4, 2012 at 8:03 am
Also, how did you free the cache? There are many ways to free caches, each with different effect.
Your simple query does not include the new table. Is that the right query?
June 4, 2012 at 8:25 am
I can't reproduce the behavior that you are describing on my SQL 2012 VM at all. As a heap it takes 36 seconds to run the query after generating the large fact table with 30 million rows. After building the clustered index it takes 26 seconds from a cold buffer poll (I issued CHECKPOINT to write any dirty pages out to disk followed by DBCC DROPCLEANBUFFERS to clear the clean buffer pages from the cache). The actual execution plans in my tests are identical for the heap and the clustered index, the only difference being a Table Scan instead of a Clustered Index Scan, but the shape and other logical operations are identical.
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]
June 4, 2012 at 2:07 pm
Thanks everyone for their responses so far.
@robert: I cleaned the caches with the following statements:
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
I'll add the CHECKPOINT command to this statements in the future.
The simple statement I provided uses indeed the normal fact table (I wrote the query for this forum at work so I used a regular AdventureWorks), but the big table is just the same fact table - same schema - but just with more rows into it.
It's not my intent to optimize the query itself. I just need a long running query on a large table for a columnstore index demo. It's just that I'm surprised that adding an index could worsen performance so much (at least on my machine).
I did not set max memory limit, but I will set it to 4Gb 'cause I'd still like to use my laptop during the testing 🙂
Newbie question: is it necessary to update statistics when the index has just been created and when you expect a scan instead of a seek?
@jonathan-2: waw, such a difference in execution time. Did you test on an actual server or just on your demo machine?
I do have only the clustered index, which doesn't help at all for the query, no other indexes to speed up things.
I'm rerunning test now with statistics on and actual execution plans on, so I'll be able to update you with more information.
Thanks again!
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 4, 2012 at 2:29 pm
I didn't experience any problems with the queries either. I ran it on my laptop with 8 CPU and 16 GB of RAM (which was running 3 other SQL instances at the same time).
June 4, 2012 at 2:32 pm
It is a VM running in VirtualBox on my laptop. The VM is configured with 2 vCPU and 4GB RAM and is running on a USB3 external drive using a OCZ Agility 3 240GB SSD.
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]
June 4, 2012 at 2:33 pm
Robert Davis (6/4/2012)
I didn't experience any problems with the queries either. I ran it on my laptop with 8 CPU and 16 GB of RAM (which was running 3 other SQL instances at the same time).
Hmmm. Is it bad that I'm running SQL Server on Windows 7 instead of Windows Server 2008R2?
Can the OS have a detrimental impact?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 4, 2012 at 2:51 pm
My laptop is on Windows 7 too. I wouldn't use it on real servers, but for testing, it should be fine.
June 4, 2012 at 3:25 pm
Allright, I reran the scripts and the results are in attachment.
The results are quite inconsistent. The first time creating the index took me about half an hour, the second time only 18 minutes.
Execution time of the query with the clustered index varies between 5 and 7 minutes.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 4, 2012 at 6:03 pm
What hardware are you using for the test? What kind of disk is this on? I would suspect that if you monitor the I/O latency and block sizes you might start to see the correlation of your problem with performance.
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]
June 4, 2012 at 11:11 pm
The hardware is just a standard Dell laptop, with 8Gb RAM and i7 Intel processor.
I'm at work now, so I'll post the details later.
I created the columnstore index and now the query runs less than one second, so the problems are very likely IO related.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 4, 2012 at 11:39 pm
Koen Verbeeck (6/4/2012)
The hardware is just a standard Dell laptop, with 8Gb RAM and i7 Intel processor.I'm at work now, so I'll post the details later.
I created the columnstore index and now the query runs less than one second, so the problems are very likely IO related.
If you are using the standard dell SATA disk, that is really the most likely cause of your differences given the information you've provided so far. SSDs really make the difference in laptop performance these days for being able to test server like workloads against SQL.
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]
June 4, 2012 at 11:40 pm
Jonathan Kehayias (6/4/2012)
Koen Verbeeck (6/4/2012)
The hardware is just a standard Dell laptop, with 8Gb RAM and i7 Intel processor.I'm at work now, so I'll post the details later.
I created the columnstore index and now the query runs less than one second, so the problems are very likely IO related.
If you are using the standard dell SATA disk, that is really the most likely cause of your differences given the information you've provided so far. SSDs really make the difference in laptop performance these days for being able to test server like workloads against SQL.
I shall quote this when I talk to my manager about a new laptop 😀
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 4, 2012 at 11:51 pm
For what you've spec'd already, I wouldn't even go after a new laptop. I've gone the path of buying my own SSD in the past and asked permission to use that in place of the rotating disk with the understanding that if I left either the SSD remained company property or it had to be wiped using the appropriate tools, just to have the performance. A consumer grade 256GB SSD is less than $1 per GB on Newegg today, so it's not really a big loss if you have to give it up given the performance difference it would make. My last 2 employers agreed to purchase the SSD even when the were around $400 for a 256GB SSD over buying a new laptop. My basic laptop for main use, and the one that I am writing this on is a Core i5 with 8GB RAM and a 256GB SSD, which is way more than most people need for SQL Server testing and even presenting. If you have a USB 3.0 port or a eSATA port, you can get phenomenal performance from an external disk for really cheap.
Note that the USB 3.0 is slower than eSATA, so I would go that route if you can. If I had a laptop with standard disk I couldn't change, I'd go the eSATA route, because you can easily get a PCMCIA card to support it.
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply