May 8, 2014 at 8:45 am
I need to do lots of filtered counts, similar to the counting done on Ebay or Amazon. eg. Cars(34500), Car Colors - Blue(3000), Red(2000), Silver(5000) ..... etc.
I have my standard count query:
select count(*), item from table
group by item
I also added a non-clustered index based on item, which on average produces the results from one million records in 150 ms.
Each different item count adds another approx 150 ms to the total.
So far at five different item counts the query time is approx 750ms to 780ms.
I need to add more counts but want to keep the query as fast as possible, is creating a SQL Indexed View for each item the only faster solution?
http://www.sqlbadpractices.com/
I found the above thread regarding SQL Indexed View performance on two million records, and it seems to offer a substantial performance increase.
Is that typically how companies like Ebay and Amazon do it?
Thanks
May 8, 2014 at 9:14 am
Try something like this:
select COUNT(*) AS Cars,
SUM(CASE WHEN Color = 'Blue' THEN 1 ELSE 0 END) AS Cars_Blue,
SUM(CASE WHEN Color = 'Red' THEN 1 ELSE 0 END) AS Cars_Red
from table
-- Gianluca Sartori
May 8, 2014 at 10:24 am
Thanks
May 8, 2014 at 10:46 am
Sadly it takes 'CPU time = 811 ms, elapsed time = 817 ms' for ONE query with nine items.
It is also a pain to type in the items names eg. States.
May 8, 2014 at 10:53 am
could you perhaps provide some sample table and data scripts to better understand?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
May 8, 2014 at 12:18 pm
something to play around with
SELECT TOP 1000000
TranID = IDENTITY(INT, 1, 1),
TypeID = 1 + CAST(Abs(Checksum(Newid()) % 999 ) AS INT),
variant = CHAR(Abs(Checksum(Newid())) % 4 + 65)
INTO TransData
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
CROSS JOIN sys.all_columns ac3
CREATE NONCLUSTERED INDEX [NIC+jls] ON [dbo].[TransData] (
[TypeID] ASC, [variant] ASC)
GO
--SELECT TypeID, variant, COUNT(variant) AS varcnt
--FROM TransData
--where typeid = 103
--GROUP BY TypeID, variant
SET STATISTICS TIME ON
SELECT typeID, COUNT(*) as cnt
, SUM(CASE WHEN variant = 'A'THEN 1 ELSE 0 END) AS varA
, SUM(CASE WHEN variant = 'B'THEN 1 ELSE 0 END) AS varB
, SUM(CASE WHEN variant = 'C'THEN 1 ELSE 0END) AS varC
, SUM(CASE WHEN variant = 'D'THEN 1 ELSE 0END) AS varD
FROM TransData
WHERE typeid = 103
GROUP BY typeid
ORDER BY typeid
SET STATISTICS TIME OFF
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
May 8, 2014 at 3:50 pm
Thanks for the code sample.
I will try to get some data, and table samples tomorrow.
May 10, 2014 at 3:15 am
In my opinion it is not optimal to hard code the data hierarchy in this way. If an attribute is added, it would mean revisiting the code!
The window functions can handle this very efficiently, especially when there is an POC index in place. For further details on the subject, I strongly recommend Itzik Ben-Gan's book Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions (Developer Reference).
I did put together a little sample, first some data and then some code 😎
USE tempdb;
GO
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = N'TBL_COUNT_SAMPLE'
AND TABLE_SCHEMA = N'dbo')
DROP TABLE dbo.TBL_COUNT_SAMPLE;
CREATE TABLE dbo.TBL_COUNT_SAMPLE
(
SAMPLE_ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED NOT NULL
,SAMPLE_NAME NVARCHAR(128) NOT NULL
,SAMPLE_CATEGORY NVARCHAR(128) NOT NULL
,SAMPLE_TYPE NVARCHAR(128) NOT NULL
,SAMPLE_SUBTYPE NVARCHAR(128) NOT NULL
,SAMPLE_COLOUR NVARCHAR(64) NOT NULL
);
INSERT INTO dbo.TBL_COUNT_SAMPLE
(
SAMPLE_NAME
,SAMPLE_CATEGORY
,SAMPLE_TYPE
,SAMPLE_SUBTYPE
,SAMPLE_COLOUR
)
SELECT
ITEM.ITEM_NAME
,CATEGORY.SAMPLE_CATEGORY
,MTYPE.SAMPLE_TYPE
,STYPE.SAMPLE_SUBTYPE
,COLOUR.SAMPLE_COLOUR
FROM (VALUES (N'Transportation'),(N'Recreation'),(N'Sport')) AS CATEGORY(SAMPLE_CATEGORY)
CROSS JOIN (VALUES (N'Car'),(N'Bicycle'),(N'Airplane'),(N'Motorcycle')) AS MTYPE(SAMPLE_TYPE)
CROSS JOIN (VALUES (N'Pedal'),(N'Petrol'),(N'Electric'),(N'HDV'),(N'Diesel')) AS STYPE(SAMPLE_SUBTYPE)
CROSS JOIN (VALUES (N'Red'),(N'Green'),(N'Blue'),(N'Yellow'),(N'Gray'),(N'Black'),(N'White')) AS COLOUR(SAMPLE_COLOUR)
CROSS JOIN (VALUES ('Abbott-Baynes Sailplanes') ,('ABC Motors') ,('ADC Aircraft') ,('Air Navigation and Engineering Company')
,('Airco') ,('The Airscrew Company') ,('Airship Industries') ,('Airspeed Ltd.') ,('AJEP') ,('Armstrong Whitworth Aircraft')
,('Arrow Aircraft Ltd.') ,('Auster') ,('Austin Motor Company') ,('Aviation Traders') ,('Avro') ,('Avro International Aerospace')
,('Beagle Aircraft') ,('William Beardmore and Company') ,('Blackburn Aircraft') ,('Boulton & Paul Ltd') ,('Boulton Paul Aircraft')
,('Bristol Aeroplane Company') ,('British Aerial Transport') ,('British Aerospace') ,('British Aircraft Company')
,('British Aircraft Corporation') ,('British Aircraft Manufacturing') ,('British and Colonial Aeroplane Company')
,('Central Aircraft Company') ,('Chilton Aircraft') ,('Chrislea Aircraft') ,('Cierva Autogiro Company')
,('Civilian Aircraft Company') ,('Clayton & Shuttleworth') ,('Comper Aircraft Company') ,('Cunliffe-Owen Aircraft')
,('Dart Aircraft') ,('De Havilland') ,('Desoutter Aircraft Company') ,('The Dunstable Sailplane Company') ,('E. D. Abbott Ltd')
,('Edgar Percival Aircraft') ,('Elliotts of Newbury') ,('English Electric') ,('Fairey Aviation Company') ,('Fane Aircraft Company')
,('Folland Aircraft') ,('Foster, Wikner Aircraft') ,('Garland Aircraft Company') ,('General Aircraft Limited')
,('Gloster Aircraft Company') ,('Gosport Aircraft Company') ,('Grahame-White') ,('Handley Page') ,('Hants and Sussex Aviation')
,('Hawker Aircraft') ,('Hawker Siddeley') ,('Heston Aircraft Company') ,('Hewlett & Blondeau') ,('Hordern-Richmond')
,('Hunting Aircraft') ,('John O. Isaacs') ,('Lakes Flying Company') ,('Lang Propellers') ,('Luton Aircraft') ,('M. B. Arpin & Co.')
,('Mann Egerton') ,('Martinsyde') ,('Miles Aircraft') ,('Moss Brothers Aircraft') ,('Napier & Son') ,('Nash & Thomson')
,('National Aircraft Factory') ,('National Aircraft Factory No. 2') ,('Nieuport & General Aircraft')
,('Norman Thompson Flight Company') ,('Parnall') ,('Parnall & Sons') ,('Phoenix Dynamo Manufacturing Company')
,('Port Victoria Marine Experimental Aircraft Depot') ,('Reid and Sigrist') ,('Rollason Aircraft and Engines')
,('Royal Aircraft Establishment') ,('Saunders-Roe') ,('Scottish Aviation') ,('Seaplane Experimental Station') ,('Siddeley-Deasy')
,('Sopwith Aviation Company') ,('Spartan Aircraft Ltd') ,('Supermarine') ,('Vickers Limited') ,('Vickers-Armstrongs')
,('Westland Aircraft') ,('Westland Helicopters') ,('J. Samuel White')
) AS ITEM(ITEM_NAME);
and then a query returning the number of types in each category
USE tempdb;
GO
/* CTE Category - Type */
;WITH CATEGORY_TYPE AS
(
SELECT
X.SAMPLE_CATEGORY
,X.SAMPLE_TYPE
FROM
(
SELECT
ROW_NUMBER() OVER
(
PARTITION BY
CS.SAMPLE_CATEGORY
,CS.SAMPLE_TYPE
ORDER BY
(SELECT NULL)
) AS CAT_TYPE_RID
,CS.SAMPLE_CATEGORY
,CS.SAMPLE_TYPE
FROM dbo.TBL_COUNT_SAMPLE CS
) AS X WHERE X.CAT_TYPE_RID = 1
)
/* Show count of Types in each Category */
SELECT
X.SAMPLE_CATEGORY
,X.CAT_TYPE_COUNT
FROM
(
SELECT
ROW_NUMBER() OVER
(
PARTITION BY
CS.SAMPLE_CATEGORY
ORDER BY
(SELECT NULL)
) AS CAT_TYPE_RID
,COUNT(CS.SAMPLE_TYPE) OVER
(
PARTITION BY
CS.SAMPLE_CATEGORY
ORDER BY
(SELECT NULL)
) AS CAT_TYPE_COUNT
,CS.SAMPLE_CATEGORY
FROM CATEGORY_TYPE CS
) AS X WHERE X.CAT_TYPE_RID = 1;
Results
SAMPLE_CATEGORY CAT_TYPE_COUNT
----------------- --------------
Recreation 4
Sport 4
Transportation 4
Statistics (IO and TIME) without an POC Index
[font="Courier New"]Table 'Worktable'. Scan count 6, logical reads 43, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TBL_COUNT_SAMPLE'. Scan count 1, logical reads 589, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 62 ms, elapsed time = 48 ms.[/font]
Add POC Index
USE [tempdb]
GO
CREATE NONCLUSTERED INDEX [NCLIDX_POC_COUNT_SAMPLE] ON [dbo].[TBL_COUNT_SAMPLE]
(
[SAMPLE_CATEGORY] ASC,
[SAMPLE_TYPE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
Statistics (IO and TIME) with an POC Index
[font="Courier New"]Table 'Worktable'. Scan count 6, logical reads 42, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TBL_COUNT_SAMPLE'. Scan count 1, logical reads 249, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 17 ms.[/font]
May 10, 2014 at 8:49 am
Eirikur,
Thanks for your suggestion and code sample.
Heavy duty stuff, my brain hurts just looking at it 🙂
Is that the same method used by Ebay and Amazon?
BTW I did create SQL Indexed Views for:
Employment Type: Full Time, Part Time, etc...
Education: Bachelors, Masters, etc...
States: VA, NY, etc....
Using the query below.
"Select count(*) from Table Group By State"
Each one takes about 2 ms to run after the Index View creation.
My main query also takes about 20ms to get each group of 10 records per page out of approx 750k records.
When i run the query and search for all 'Full-Time' jobs, the execution planner suggests adding a clustered index for each of the following: Employment Type, Education and States.
After adding the above clustered indexes, when i search for all 'Full-Time; jobs, it takes approx 0.88 seconds, to return 600k records.
I am very surprised by this as each of the queries run on their own takes a few ms, so i thought it would take about one tenth of a second not 0.88 seconds when they were all added together.
I looked at the execution planner but it is so complex i really don't know where the issues are and how to fix them.
Thanks
Ian
May 10, 2014 at 10:36 am
[font="Arial Black"]WHY[/font] are you returning 600K rows to begin with???:blink: And [font="Arial Black"]WHERE [/font]are you returning them to? :unsure:
--Jeff Moden
Change is inevitable... Change for the better is not.
May 10, 2014 at 10:50 am
My bad, the total count is 600k, records returned is only 10.
May 10, 2014 at 11:05 am
I have attached the plan, it seems like the clustered indexes would be faster if they were sql indexed views ?
May 10, 2014 at 12:02 pm
Eirikur,
I ran your sample query on my data and it took 284 ms vs my 0 ms using a SQL Indexed View.
I will research the book for some windowing function ideas.
Thanks
Ian
May 10, 2014 at 12:13 pm
Here are two more queries, somewhat self explanatory, first one is
CATEGORY-TYPE-NAME-COLOUR-COUNT(COLOUR)
and the second is
CATEGORY-TYPE-NAME-COLOUR-SUBTYPE-COUNT(SUBTYPE)
With a small set of 2000000 records, these are returning in 0.1-0.2 sec on my mediocre laptop.
😎
USE tempdb;
GO
DECLARE @CATEGORY NVARCHAR(128) = N'Sport';
DECLARE @TYPE NVARCHAR(128) = N'Airplane';
DECLARE @NAME NVARCHAR(128) = N'Sopwith Aviation Company';
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
/* */
SELECT
X.SAMPLE_CATEGORY
,X.SAMPLE_TYPE
,X.SAMPLE_NAME
,X.SAMPLE_COLOUR
,X.COLOUR_COUNT
FROM
(
SELECT
ROW_NUMBER() OVER
(
PARTITION BY
CS.SAMPLE_CATEGORY
,CS.SAMPLE_TYPE
,CS.SAMPLE_NAME
,CS.SAMPLE_COLOUR
ORDER BY
(SELECT NULL)
) AS MAINETYPE_RID
,COUNT(CS.SAMPLE_SUBTYPE) OVER
(
PARTITION BY
CS.SAMPLE_CATEGORY
,CS.SAMPLE_TYPE
,CS.SAMPLE_NAME
,CS.SAMPLE_COLOUR
ORDER BY
(SELECT NULL)
) AS COLOUR_COUNT
,CS.SAMPLE_CATEGORY
,CS.SAMPLE_TYPE
,CS.SAMPLE_NAME
,CS.SAMPLE_COLOUR
FROM dbo.TBL_COUNT_SAMPLE CS
WHERE CS.SAMPLE_CATEGORY = @CATEGORY
AND CS.SAMPLE_TYPE = @TYPE
AND CS.SAMPLE_NAME = @NAME
) AS X WHERE X.MAINETYPE_RID = 1;
USE tempdb;
GO
DECLARE @CATEGORY NVARCHAR(128) = N'Sport';
DECLARE @TYPE NVARCHAR(128) = N'Airplane';
DECLARE @NAME NVARCHAR(128) = N'Sopwith Aviation Company';
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
/* */
SELECT
X.SAMPLE_CATEGORY
,X.SAMPLE_TYPE
,X.SAMPLE_NAME
,X.SAMPLE_COLOUR
,X.SAMPLE_SUBTYPE
,X.COLOURSTYPE_COUNT
FROM
(
SELECT
ROW_NUMBER() OVER
(
PARTITION BY
CS.SAMPLE_CATEGORY
,CS.SAMPLE_TYPE
,CS.SAMPLE_NAME
,CS.SAMPLE_COLOUR
,CS.SAMPLE_SUBTYPE
ORDER BY
(SELECT NULL)
) AS MAIN_RID
,CS.SAMPLE_CATEGORY
,CS.SAMPLE_TYPE
,CS.SAMPLE_NAME
,CS.SAMPLE_COLOUR
,CS.SAMPLE_SUBTYPE
,COUNT(CS.SAMPLE_SUBTYPE) OVER
(
PARTITION BY
CS.SAMPLE_CATEGORY
,CS.SAMPLE_TYPE
,CS.SAMPLE_NAME
,CS.SAMPLE_COLOUR
,CS.SAMPLE_SUBTYPE
ORDER BY
(SELECT NULL)
) AS COLOURSTYPE_COUNT
FROM dbo.TBL_COUNT_SAMPLE CS
WHERE CS.SAMPLE_CATEGORY = @CATEGORY
AND CS.SAMPLE_TYPE = @TYPE
AND CS.SAMPLE_NAME = @NAME
) AS X WHERE X.MAIN_RID = 1;
May 10, 2014 at 1:01 pm
isuckatsql (5/10/2014)
I ran your sample query on my data and it took 284 ms vs my 0 ms using a SQL Indexed View.
Then you don't have to worry:w00t:
😎
Viewing 15 posts - 1 through 15 (of 26 total)
You must be logged in to reply to this topic. Login to reply