May 1, 2014 at 9:38 am
select top 15 count(*) as cnt, state from table
group by state
order by cnt desc
select top 15 count(*) as cnt, city from table
group by city
order by cnt desc
select top 15 count(*) as cnt, company from table
group by company
order by cnt desc
Can the above three queries be combined into one and still be fast, if so how?
What i am trying to go is an item count, by group, similar to ones Inbox in Outlook.
Thanks
May 1, 2014 at 9:52 am
check out Windowing functions. Something like:
select count(*) over (partition by city) citycount
, count(*) over (partition by state) statecount
...
May 1, 2014 at 9:56 am
Thanks
May 1, 2014 at 2:54 pm
isuckatsql (5/1/2014)
select top 15 count(*) as cnt, state from tablegroup by state
order by cnt desc
select top 15 count(*) as cnt, city from table
group by city
order by cnt desc
select top 15 count(*) as cnt, company from table
group by company
order by cnt desc
Can the above three queries be combined into one and still be fast, if so how?
What i am trying to go is an item count, by group, similar to ones Inbox in Outlook.
Thanks
I want to see your expected output. If New York City is the most populous city but California is the most populous state, how would a single output with the respective columns and counts portray that? NYC is in NY, not CA...
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 2, 2014 at 4:36 am
Kevin,
You are correct!
I am not sure on the best way to do this.
Based on the Microsoft Outlook scenario, i assume you would need one SP for each of the folders: Inbox, Sent, Junk.....etc.. which i'm sure would be horribly slow.
One of the suggestions made was to hold all the records in memory, then make the counts based on the records contained only in memory. Then update the records in memory every two hours.
Any other ideas?
Thanks
Ian
May 2, 2014 at 5:09 am
isuckatsql (5/2/2014)
Kevin,You are correct!
I am not sure on the best way to do this.
Based on the Microsoft Outlook scenario, i assume you would need one SP for each of the folders: Inbox, Sent, Junk.....etc.. which i'm sure would be horribly slow.
One of the suggestions made was to hold all the records in memory, then make the counts based on the records contained only in memory. Then update the records in memory every two hours.
Any other ideas?
Thanks
Ian
I'm not sure what this recommendation is based on...
Before SQL2014 you cannot really control what data will be hold in memory and for how long.
How would such a suggestion be implemented to work undereach and every condition? What SQL command would you use to "force" it into memory?
Or am I missing something here?
May 2, 2014 at 5:49 am
Have you thought about how the TOP(15) requirement will work? What if your user wants to see - and expand - a state node which isn't in the top 15?
Mimicing the Outlook menu system is fairly common. So long as you put a little thought into the data structure driving it, performance doesn't have to be an issue.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
May 2, 2014 at 12:56 pm
Lutz is correct, as usual. You cannot command SQL Server to "keep this table in memory all the time" like you want. However, it WILL stay in memory "all the time" if it is regularly referenced, which this type of dashboard aggregates probably will. So you do what you are suggesting - make one or more aggregate tables and refresh them on a period that is acceptable to the stake holders. To speed the refresh of said data I would keep track of the last "something" that was counted up through and use that as an indexed, seekable predicate in your query that computes the increments for your aggregates. There are various ways to accomplish this efficiently.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 2, 2014 at 2:03 pm
Is it better to do this with SQL or C# ?
I have the table running in memory via C# and the initial load takes 52 seconds, then each count, as a test all cities in CA, takes 0.03 seconds.
Thanks
May 2, 2014 at 9:50 pm
Seems quite silly (and unscalable) to me to read all the data just to read all the data iteratively. π
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 3, 2014 at 3:41 am
I agree, Kevin.
But if the OP really want to "force" the data to stay in memory, one way could be a dll that would query the aggregated data and keep the result in memory. Another dll would just return the data of the previous one.
The question is: why? If the aggregated data don't have to be accurate, why not create a tiny table that'll hold the aggregated results and refresh the table as needed?
May 3, 2014 at 5:15 am
A little trick with SIGN and the OVER clause makes this a simple query
π
USE AdventureWorks2012;
GO
;WITH MULTI_COUNTING AS
(
SELECT /* Sum the row numbers, gives distinct count */
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS PEAD_RID
,SUM(PEAD_AddressLine1 ) OVER (PARTITION BY (SELECT NULL)) AS COUNT_AddressLine1
,SUM(PEAD_City ) OVER (PARTITION BY (SELECT NULL)) AS COUNT_City
,SUM(PEAD_StateProvinceID ) OVER (PARTITION BY (SELECT NULL)) AS COUNT_StateProvinceID
,SUM(PEAD_PostalCode ) OVER (PARTITION BY (SELECT NULL)) AS COUNT_PostalCode
FROM
(
SELECT /* Row number 1 returns 1, else 0 */
1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.AddressLine1 ORDER BY (SELECT NULL))) AS PEAD_AddressLine1
,1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.City ORDER BY (SELECT NULL))) AS PEAD_City
,1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.StateProvinceID ORDER BY (SELECT NULL))) AS PEAD_StateProvinceID
,1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.PostalCode ORDER BY (SELECT NULL))) AS PEAD_PostalCode
FROM Person.Address PEAD
) AS COUNT_BASE
)
SELECT
MC.COUNT_AddressLine1
,MC.COUNT_City
,MC.COUNT_StateProvinceID
,MC.COUNT_PostalCode
FROM MULTI_COUNTING MC
WHERE MC.PEAD_RID = 1;
Results
COUNT_AddressLine1 COUNT_City COUNT_StateProvinceID COUNT_PostalCode
-------------------- -------------------- --------------------- --------------------
13567 575 74 661
May 3, 2014 at 5:45 am
But this query doesn't return the result set as requested since there's a TOP 15 involved for each group...
May 3, 2014 at 7:19 am
LutzM (5/3/2014)
But this query doesn't return the result set as requested since there's a TOP 15 involved for each group...
Ooops, the lights are on but no one's at home π
[mental note: read the full post]
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply