March 12, 2008 at 9:06 pm
Hey guys,
I have a table with about 80 columns and 400 millions records. Each columns has different responses that I need to get frequency for. I need to get counts for each response from all the columns... I have a query that does it, but it will run forever... what is the best way to do so?
My starting query:
select res, sum(cnt) from
(
select col1 res, count(*) as cnt from table1 with (nolock)
group by col1
union all
select col2 res, count(*) as cnt from table1 with (nolock)
group by col2
........................
select col80 res, count(*) as cnt from table1 with (nolock)
group by col80
)a group by res
March 12, 2008 at 9:20 pm
You might try breaking it up into smaller pieces.
For example, create a temp table, and insert the results for column 1 into that. Then as a separate query, insert the results for column 2. And so on, through each of the columns. It's a bit heavy handed, but it might break up the job enough to give the server a chance to do it.
Indexing the columns won't help much, since you need to count results from each one, and adding 80 indexes to a table will often just confuse the query optimizer. You might try it anyway, just to see if it works. You might be able to create a set of indexed views, one view for each column, and query from those instead of directly from the table, but that's a little bit of a stretch. Either of these will slow down inserts and updates and such, probably by a large margin.
How often do you have to pull this data? How heavy is the update/delete/insert traffic on this table?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 12, 2008 at 9:23 pm
Another thing you might try, if there's a lot of data being added daily, but if the older data doesn't change much, is set up a nightly batch to dump a summary of the prior day's data into a summary table. Then all you have to do is query the data so far today, and add that to the data from the prior night.
That won't help much if the older data changes frequently, or if there's no datetime type field in the data indicating the time it was created or last updated. But, if there is such a field, and if the older data is pretty static, it might be a good solution.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 12, 2008 at 9:23 pm
Thanks for the response... this table is pretty busy but can afford to run the query for couple of hours... I need to do this as a one time job for now.
March 12, 2008 at 9:59 pm
Ghanta:
what is the data type and range of the response columns?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 12, 2008 at 10:00 pm
dataType is char(1) and the values can be a-z, 0, 1, blank, nulls. Thanks.
March 12, 2008 at 11:00 pm
If the number of unique combinations of values is not large, you might start by creating an intermediate table with the rowcount for each unique combination of values across a range of columns. This may be faster, depending on the distribution of values, or it might not help at all or make it worse.
-- Reduce to unique row combinations
Insert into CountTable
select
col1 as x1,
col2 as x2,
col3 as x3,
...
col10 as x10,
RowCount = count(*)
from
MyTable
group by
col1,
col2,
col3,
...
col10
union all
select
col11 as x1,
col12 as x2,
col13 as x3,
...
col20 as x10,
RowCount = count(*)
from
MyTable
group by
col11,
col12,
col13,
...
col20
union all
... for rest of columns 21 to 80...
-- Then union to sum rowcounts
select
x1 res,
sum(RowCount) as cnt
from
CountTable
group by
x1
union all
select
x2 res,
sum(RowCount) as cnt
from
CountTable
group by
x2
union all
...
...
select
x10 res,
sum(RowCount) as cnt
from
CountTable
group by
x10
March 13, 2008 at 7:23 am
Michael Valentine Jones (3/12/2008)
If the number of unique combinations of values is not large, you might start by creating an intermediate table with the rowcount for each unique combination of values across a range of columns. This may be faster, depending on the distribution of values, or it might not help at all or make it worse.
This is the same idea that I have been working on, but I think that 5 columns is probably the safe limit and I would want to record them into an integer for better aggregate indexing. You will need a temp table to hold the intermediate stats and then re-collate that table as well
Here is that solution, except that I only included the first 10 columns (2 groups of 5), you will have to add the rest yourself (not to hard with cut and paste):
Select Res, sum(Cnt) as Cnt, ColA, ColB, ColC, ColD, ColE INTO #Stats
FROM (
Select
CASE When Col1 between 'a' and 'z' then ascii(lower(Col1)) - 95-- lowercaser letter -> 2-27
When Col1='0' or Col1='1' then ascii(Col1) - 48-- 0-1
When Col1=' ' then 28
When Col1 IS NULL then 29END
+CASE When Col2 between 'a' and 'z' then ascii(lower(Col2)) - 95-- lowercaser letter -> 2-27
When Col2='0' or Col2='1' then ascii(Col2) - 48-- 0-1
When Col2=' ' then 28
When Col2 IS NULL then 29END * 30
+CASE When Col3 between 'a' and 'z' then ascii(lower(Col3)) - 95-- lowercaser letter -> 2-27
When Col3='0' or Col3='1' then ascii(Col3) - 48-- 0-1
When Col3=' ' then 28
When Col3 IS NULL then 29END * 900
+CASE When Col4 between 'a' and 'z' then ascii(lower(Col4)) - 95-- lowercaser letter -> 2-27
When Col4='0' or Col4='1' then ascii(Col4) - 48-- 0-1
When Col4=' ' then 28
When Col4 IS NULL then 29END * 27000
+CASE When Col5 between 'a' and 'z' then ascii(lower(Col5)) - 95-- lowercaser letter -> 2-27
When Col5='0' or Col5='1' then ascii(Col5) - 48-- 0-1
When Col5=' ' then 28
When Col5 IS NULL then 29END * 810000
AS res, count_Big(*) as Cnt
, Min(Col1) as ColA, Min(Col2) as ColB, Min(Col3) as ColC, Min(Col4) as ColD, Min(Col5) as ColE
From table1 with (nolock)
Group by
CASE When Col1 between 'a' and 'z' then ascii(lower(Col1)) - 95-- lowercaser letter -> 2-27
When Col1='0' or Col1='1' then ascii(Col1) - 48-- 0-1
When Col1=' ' then 28
When Col1 IS NULL then 29END
+CASE When Col2 between 'a' and 'z' then ascii(lower(Col2)) - 95-- lowercaser letter -> 2-27
When Col2='0' or Col2='1' then ascii(Col2) - 48-- 0-1
When Col2=' ' then 28
When Col2 IS NULL then 29END * 30
+CASE When Col3 between 'a' and 'z' then ascii(lower(Col3)) - 95-- lowercaser letter -> 2-27
When Col3='0' or Col3='1' then ascii(Col3) - 48-- 0-1
When Col3=' ' then 28
When Col3 IS NULL then 29END * 900
+CASE When Col4 between 'a' and 'z' then ascii(lower(Col4)) - 95-- lowercaser letter -> 2-27
When Col4='0' or Col4='1' then ascii(Col4) - 48-- 0-1
When Col4=' ' then 28
When Col4 IS NULL then 29END * 27000
+CASE When Col5 between 'a' and 'z' then ascii(lower(Col5)) - 95-- lowercaser letter -> 2-27
When Col5='0' or Col5='1' then ascii(Col5) - 48-- 0-1
When Col5=' ' then 28
When Col5 IS NULL then 29END * 810000
UNION ALL
Select
CASE When Col6 between 'a' and 'z' then ascii(lower(Col6)) - 95-- lowercaser letter -> 2-27
When Col6='0' or Col6='1' then ascii(Col6) - 48-- 0-1
When Col6=' ' then 28
When Col6 IS NULL then 29END
+CASE When Col7 between 'a' and 'z' then ascii(lower(Col7)) - 95-- lowercaser letter -> 2-27
When Col7='0' or Col7='1' then ascii(Col7) - 48-- 0-1
When Col7=' ' then 28
When Col7 IS NULL then 29END * 30
+CASE When Col8 between 'a' and 'z' then ascii(lower(Col8)) - 95-- lowercaser letter -> 2-27
When Col8='0' or Col8='1' then ascii(Col8) - 48-- 0-1
When Col8=' ' then 28
When Col8 IS NULL then 29END * 900
+CASE When Col9 between 'a' and 'z' then ascii(lower(Col9)) - 95-- lowercaser letter -> 2-27
When Col9='0' or Col9='1' then ascii(Col9) - 48-- 0-1
When Col9=' ' then 28
When Col9 IS NULL then 29END * 27000
+CASE When Col10 between 'a' and 'z' then ascii(lower(Col10)) - 95-- lowercaser letter -> 2-27
When Col10='0' or Col10='1' then ascii(Col10) - 48-- 0-1
When Col10=' ' then 28
When Col10 IS NULL then 29END * 810000
AS res, count_Big(*) as Cnt
, Min(Col6) as ColA, Min(Col7) as ColB, Min(Col8) as ColC, Min(Col9) as ColD, Min(Col10) as ColE
From table1 with (nolock)
Group by
CASE When Col6 between 'a' and 'z' then ascii(lower(Col6)) - 95-- lowercaser letter -> 2-27
When Col6='0' or Col6='1' then ascii(Col6) - 48-- 0-1
When Col6=' ' then 28
When Col6 IS NULL then 29END
+CASE When Col7 between 'a' and 'z' then ascii(lower(Col7)) - 95-- lowercaser letter -> 2-27
When Col7='0' or Col7='1' then ascii(Col7) - 48-- 0-1
When Col7=' ' then 28
When Col7 IS NULL then 29END * 30
+CASE When Col8 between 'a' and 'z' then ascii(lower(Col8)) - 95-- lowercaser letter -> 2-27
When Col8='0' or Col8='1' then ascii(Col8) - 48-- 0-1
When Col8=' ' then 28
When Col8 IS NULL then 29END * 900
+CASE When Col9 between 'a' and 'z' then ascii(lower(Col9)) - 95-- lowercaser letter -> 2-27
When Col9='0' or Col9='1' then ascii(Col9) - 48-- 0-1
When Col9=' ' then 28
When Col9 IS NULL then 29END * 27000
+CASE When Col10 between 'a' and 'z' then ascii(lower(Col10)) - 95-- lowercaser letter -> 2-27
When Col10='0' or Col10='1' then ascii(Col10) - 48-- 0-1
When Col10=' ' then 28
When Col10 IS NULL then 29END * 810000
UNION ALL
... --(fill out to all 40 col's; 8 Selects of 5 cols each)
)
Group by Res
-- now separate & recollate the 5 columns
Select Res, sum(Cnt) From (
Select ColA as Res, sum(Cnt) as Cnt from #Stats with (nolock) Group by ColA
UNION ALL Select ColB as Res, sum(Cnt) as Cnt from #Stats with (nolock) Group by ColB
UNION ALL Select ColC as Res, sum(Cnt) as Cnt from #Stats with (nolock) Group by ColC
UNION ALL Select ColD as Res, sum(Cnt) as Cnt from #Stats with (nolock) Group by ColD
UNION ALL Select ColE as Res, sum(Cnt) as Cnt from #Stats with (nolock) Group by ColE
)
Group by Res
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply