April 8, 2010 at 4:45 am
Hello everyone,
I have a challenge where I need to count the number of rows that have valid data in them across a number of columns. Valid data in this instance is defined as not being blank for character columns, not being 0 for numeric and being more recent than the start of the century for dates.
At the moment I am using something like the second method in the following example:
IF OBJECT_ID('tempdb..#test_data_table','U') IS NOT NULL DROP TABLE #test_data_table
IF OBJECT_ID('tempdb..#test_count_table','U') IS NOT NULL DROP TABLE #test_count_table
SELECT 'Data' C1,'Data' C2,'Data' C3,1 C4 INTO #TEST_DATA_TABLE UNION ALL
SELECT 'Data','Data','Data',1 UNION ALL
SELECT 'Data','','Data',0 UNION ALL
SELECT 'Data','Data','',1 UNION ALL
SELECT '','Data','Data',0 UNION ALL
SELECT '','','Data',1 UNION ALL
SELECT '','Data','',1 UNION ALL
SELECT '','','',0
--One way (creates a result table)
SELECT
SUM(CASE WHEN C1!='' THEN 1 ELSE 0 END) C1Count,
SUM(CASE WHEN C2!='' THEN 1 ELSE 0 END) C2Count,
SUM(CASE WHEN C3!='' THEN 1 ELSE 0 END) C3Count,
SUM(CASE WHEN C4!=0 THEN 1 ELSE 0 END) C4Count
INTO #test_count_table
FROM #test_data_table
SELECT 'C1Count',C1Count
FROM #test_count_table
UNION
SELECT 'C2Count',C2Count
FROM #test_count_table
UNION
SELECT 'C3Count',C3Count
FROM #test_count_table
UNION
SELECT 'C4Count',C4Count
FROM #test_count_table
--Another way (direct)
SELECT 'C1Count',COUNT(C1)
FROM #test_data_table
WHERE C1!=''
UNION
SELECT 'C2Count',COUNT(C2)
FROM #test_data_table
WHERE C2!=''
UNION
SELECT 'C3Count',COUNT(C3)
FROM #test_data_table
WHERE C3!=''
UNION
SELECT 'C4Count',COUNT(C4)
FROM #test_data_table
WHERE C4!=0
What I am trying to find out is if there is a more efficient way to do this as while I'm currently testing on some small tables, I will have to run this on some very large ones too (10 million+ rows). I want to get away from using dynamic SQL if possible and would really like a solution where I can make it as set-based as possible.
The results need to be in the form presented by either of the above methods (column names as row labels and counts in a column)
Any suggestions?
Thanks in a advance,
Andrew
April 8, 2010 at 9:32 am
I think you have the best way to do this.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
April 8, 2010 at 9:44 am
I feared this may be the case.
:satisfied:
Thanks for the help.
April 8, 2010 at 9:50 am
I think the first method is better then the second with the unions because you are only scanning the source table once. If you have to cross tab/pivot it, check the articles on that in my signature.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
April 8, 2010 at 10:18 am
Thanks again.
I had already read most of those excellent articles, but forgotten that I had :hehe:. I've now re-read them fully.
A.
April 8, 2010 at 9:52 pm
As a side bar, I'm not sure why you want to avoid dynamic SQL for this... it's not slow when used for batch processess... just for bazillions of single row GUI hits. Since it's not public facing, it's not likely that you need to worry about SQL Injection but preventing it isn't that hard, either.
Written correctly, dynamic SQL isn't difficult to troubleshoot, either.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 9, 2010 at 1:10 am
SELECT U.Name,
U.Value
FROM (
SELECT C1Count = COUNT(ASCII(C1)),
C2Count = COUNT(ASCII(C2)),
C3Count = COUNT(ASCII(C3)),
C4Count = COUNT(NULLIF(C4, 0))
FROM #test_data_table
) S
UNPIVOT (
Value
FOR Name
IN (C1Count, C2Count, C3Count, C4Count)
) U;
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply