Help with Counts

Question

Help with Counts

Viewing 15 posts - 76 through 90 (of 121 total)

You must be logged in to reply to this topic. Login to reply

bicky1980 SSCrazy Points: 2079 More actions · Answer 1

Hi Chris

I have run the query on a sample of 100,000 records, in my results there are no duplicate telephone numbers and no duplicate datakeys, which is great! (Ran fast too)

Just need to incorporate the other fields into the query

Thanks

bicky1980 SSCrazy Points: 2079 More actions · Answer 2

The Query does however, appear to exclude some records, which I cant explain.

I had 4 records with the same datakey and same landline, but these results were excluded from the final table?

Any ideas why?

Thanks

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 3

bicky1980 (1/17/2012)
The Query does however, appear to exclude some records, which I cant explain.
I had 4 records with the same datakey and same landline, but these results were excluded from the final table?
Any ideas why?
Thanks

Run a select against your table, with a where clause like the following:

SELECT...

FROM ...

WHERE datakey = datakeyvalue OR landline = landlinevalue

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Jared SSC Guru Points: 61793 More actions · Answer 4

The boxes and balls cannot be applied/used until we know the order of operations. Maybe I missed it in this long thread :), so I'm sorry if I did... but let's take landlines first. Say my data for datakey and landline is this:

0014, 1234567890

0014, 1234567891

0015, 1234567890

0015, 1234567891

How many unique landlines is this according to what you are wanting; 1 or 2? Again, I apologize if this has been answered already, but some of the posts are pretty long individually, so I want a summation. Also, when you tell me the number, can you explain exactly why. Please don't use the word "unique" as it is non-descriptive the way you have been using it. Say something like "this only counts as 1 because we first see that a datakey has more than one landline, but that only counts as 1 because it is the same data key. The second datakey has landlines that have been seen previously in the set, so they do not count at all."

Note the word "previously"... That is important here because if we discard all duplicates as part of the set, this would be 0.

Jared
CE - Microsoft

bicky1980 SSCrazy Points: 2079 More actions · Answer 5

The datakey & landline doesnt exist in my final table

I edited the query at the end to put the results into a new table

SELECT *

into lvl

FROM Level1

UNION ALL

SELECT * FROM Level2

ORDER BY datakey, telno

Was this correct?

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 6

bicky1980 (1/17/2012)
The datakey & landline doesnt exist in my final table

That's going to happen - can you post the results of the query I posted last?

I edited the query at the end to put the results into a new table
SELECT *
into lvl
FROM Level1
UNION ALL
SELECT * FROM Level2
ORDER BY datakey, telno
Was this correct?

Personally I'd go for a temp table and you will probably need to index it for performance.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

GSquared SSC Guru Points: 260824 More actions · Answer 7

bicky1980 (1/13/2012)
GSquared (1/13/2012)
The order of the rows in a dataset doesn't really matter to set-based SQL operations on it.
I just need a clarification on this statement, "Total number of populated Landline fields (Needs to have a unique datakey as well as be unique itself - If the same landline is anywhere else in the results its needs to be excluded from the figure)"
If a landline exists multiple times in the data, and has multiple datakey values, don't count it all, or count it only one time?
E.g.:
datakey landline pairs:
1 100
2 100
Would that count as 1 or 0? Clarification of "If the same landline is anywhere else in the results its needs to be excluded from the figure". Does that mean exclude it entirely, or exclude the duplicates?
If that should count as 1, then this seems to work:
IF OBJECT_ID(N'tempdb..#test2') IS NOT NULL
DROP TABLE #test2;
CREATE TABLE #test2
(indkey NVARCHAR(2),
datakey NVARCHAR(4),
landline NVARCHAR(11),
mobile NVARCHAR(11),
email NVARCHAR(20))
INSERT INTO #test2
VALUES ('1', '0001', '01234567890', '0712345679', '1@test.co.uk'),
('2', '0001', '01234567890', '', '1@test.co.uk'),
('3', '0002', '01234567890', '', '2@test.co.uk'),
('4', '0002', '01234567891', '', '2@test.co.uk'),
('5', '0002', '', '07123456789', ''),
('6', '0003', '01234567892', '07123456791', ''),
('7', '0004', '01234567893', '07123456792', ''),
('8', '0005', '01234567894', '07123456793', '2@test.co.uk'),
('9', '0008', '01234567895', '07123456793', '9@test.co.uk') ;
SELECT 'Total' AS CountType, COUNT(*) AS Qty
FROM #test2
UNION ALL
SELECT 'Datakeys', COUNT(DISTINCT datakey)
FROM #test2
UNION ALL
SELECT 'Landlines', COUNT(DISTINCT landline)
FROM #test2
WHERE landline > ''
AND datakey > ''
UNION ALL
SELECT 'Mobiles', COUNT(DISTINCT mobile)
FROM #test2
WHERE mobile > ''
AND datakey > ''
UNION ALL
SELECT 'Emails', COUNT(DISTINCT email)
FROM #test2
WHERE email > ''
AND datakey > '';
If 0, then change the Select to this:
SELECT 'Total' AS CountType, COUNT(*) AS Qty
FROM #test2
UNION ALL
SELECT 'Datakeys', COUNT(DISTINCT datakey)
FROM #test2
UNION ALL
SELECT 'Landlines',
COUNT(DISTINCT landline)
FROM (SELECT landline
FROM #test2
WHERE landline > ''
AND datakey > ''
GROUP BY landline
HAVING COUNT(*) = 1) AS Sub
UNION ALL
SELECT 'Mobiles',
COUNT(DISTINCT mobile)
FROM (SELECT mobile
FROM #test2
WHERE mobile > ''
AND datakey > ''
GROUP BY mobile
HAVING COUNT(*) = 1) AS Sub
UNION ALL
SELECT 'Emails',
COUNT(DISTINCT email)
FROM (SELECT email
FROM #test2
WHERE email > ''
AND datakey > ''
GROUP BY email
HAVING COUNT(*) = 1) AS Sub;
(All scripts tested in SQL 2008 R2 Dev Edition.)
I would only require this to be counted as 1 record only...From your solution for this the results would be:
Total9
Datakeys6
Landlines5
Mobiles4
Emails1
The results should be:
Total9
Datakeys6
Landlines6
Mobiles5
Emails3

The results from my first query are exactly what you list as the "should be" answer.

Re-ran the first version, copy-and-paste of results is:

CountTypeQty

Total9

Datakeys6

Landlines6

Mobiles5

Emails3

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

bicky1980 SSCrazy Points: 2079 More actions · Answer 8

Hello Chris

I cant post the results due to data protection 🙁 , which is going to be difficult to try and get you to see my issue, this was why I was using a small table of 'made up' records.

bicky1980 SSCrazy Points: 2079 More actions · Answer 9

How do I copy code from the forum to paste into Management studio, without having to refomat it myself everytime?

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 10

How do I copy code from the forum to paste into Management studio, without having to refomat it myself everytime?

Paste it into a word document, then copy and paste again. I've got a word doc on my desktop specifically for this.

bicky1980 (1/17/2012)
Hello Chris
I cant post the results due to data protection 🙁 , which is going to be difficult to try and get you to see my issue, this was why I was using a small table of 'made up' records.

Change the phone numbers (probably the area codes alone will be enough), but be careful that you don't give different random numbers to dupes of the same real number!

When you get the opportunity, try GSquared's code.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

bicky1980 SSCrazy Points: 2079 More actions · Answer 11

GSquared (1/17/2012)
bicky1980 (1/13/2012)
GSquared (1/13/2012)
The order of the rows in a dataset doesn't really matter to set-based SQL operations on it.
I just need a clarification on this statement, "Total number of populated Landline fields (Needs to have a unique datakey as well as be unique itself - If the same landline is anywhere else in the results its needs to be excluded from the figure)"
If a landline exists multiple times in the data, and has multiple datakey values, don't count it all, or count it only one time?
E.g.:
datakey landline pairs:
1 100
2 100
Would that count as 1 or 0? Clarification of "If the same landline is anywhere else in the results its needs to be excluded from the figure". Does that mean exclude it entirely, or exclude the duplicates?
If that should count as 1, then this seems to work:
IF OBJECT_ID(N'tempdb..#test2') IS NOT NULL
DROP TABLE #test2;
CREATE TABLE #test2
(indkey NVARCHAR(2),
datakey NVARCHAR(4),
landline NVARCHAR(11),
mobile NVARCHAR(11),
email NVARCHAR(20))
INSERT INTO #test2
VALUES ('1', '0001', '01234567890', '0712345679', '1@test.co.uk'),
('2', '0001', '01234567890', '', '1@test.co.uk'),
('3', '0002', '01234567890', '', '2@test.co.uk'),
('4', '0002', '01234567891', '', '2@test.co.uk'),
('5', '0002', '', '07123456789', ''),
('6', '0003', '01234567892', '07123456791', ''),
('7', '0004', '01234567893', '07123456792', ''),
('8', '0005', '01234567894', '07123456793', '2@test.co.uk'),
('9', '0008', '01234567895', '07123456793', '9@test.co.uk') ;
SELECT 'Total' AS CountType, COUNT(*) AS Qty
FROM #test2
UNION ALL
SELECT 'Datakeys', COUNT(DISTINCT datakey)
FROM #test2
UNION ALL
SELECT 'Landlines', COUNT(DISTINCT landline)
FROM #test2
WHERE landline > ''
AND datakey > ''
UNION ALL
SELECT 'Mobiles', COUNT(DISTINCT mobile)
FROM #test2
WHERE mobile > ''
AND datakey > ''
UNION ALL
SELECT 'Emails', COUNT(DISTINCT email)
FROM #test2
WHERE email > ''
AND datakey > '';
If 0, then change the Select to this:
SELECT 'Total' AS CountType, COUNT(*) AS Qty
FROM #test2
UNION ALL
SELECT 'Datakeys', COUNT(DISTINCT datakey)
FROM #test2
UNION ALL
SELECT 'Landlines',
COUNT(DISTINCT landline)
FROM (SELECT landline
FROM #test2
WHERE landline > ''
AND datakey > ''
GROUP BY landline
HAVING COUNT(*) = 1) AS Sub
UNION ALL
SELECT 'Mobiles',
COUNT(DISTINCT mobile)
FROM (SELECT mobile
FROM #test2
WHERE mobile > ''
AND datakey > ''
GROUP BY mobile
HAVING COUNT(*) = 1) AS Sub
UNION ALL
SELECT 'Emails',
COUNT(DISTINCT email)
FROM (SELECT email
FROM #test2
WHERE email > ''
AND datakey > ''
GROUP BY email
HAVING COUNT(*) = 1) AS Sub;
(All scripts tested in SQL 2008 R2 Dev Edition.)
I would only require this to be counted as 1 record only...From your solution for this the results would be:
Total9
Datakeys6
Landlines5
Mobiles4
Emails1
The results should be:
Total9
Datakeys6
Landlines6
Mobiles5
Emails3
The results from my first query are exactly what you list as the "should be" answer.
Re-ran the first version, copy-and-paste of results is:
CountTypeQty
Total9
Datakeys6
Landlines6
Mobiles5
Emails3

Hello GSquared

I have run your code on my sample of approx 100,000 records but its difficult to understand why this would work, I dont see where you code ensures that if I had 2 different landlines at the same datakey, these wouldn't be counted seperately (I ran the 1st code)

Jared SSC Guru Points: 61793 More actions · Answer 12

bicky1980 (1/17/2012)I have run your code on my sample of approx 100,000 records but its difficult to understand why this would work, I dont see where you code ensures that if I had 2 different landlines at the same datakey, these wouldn't be counted seperately (I ran the 1st code)

To confirm... You are saying that if the ONLY time 2 landlines appear is with the same datakey, that should only count as 1. Correct? To shorten the data for ease of understanding:

DataKey, LandLine

1, 100

1, 200

2, 100

2, 200

2, 300

How many landlines is this? 2? And if the line with the 300 was not there... it would be 1? I have to be honest that, having a background in statistics, I am really puzzled at the question that this is trying to answer. Any chance you could appease me by phrasing the business question as something like "How many landlines are in use by different datakeys?" OR "Collapsing across landlines and then datakeys, what is the count?" Something just doesn't make sense here as an interpretation of results. I am not saying that you are wrong or anything, just trying to understand because right now I can't answer a question that I don't understand.

I have done several similar analyses and have scripts in producation that are giving a similar output. If I can understand the question and thereby the business rules a bit clearer, I can simply adapt my current script and post it for you.

Jared
CE - Microsoft

TomThomson SSC Guru Points: 104768 More actions · Answer 13

GSquared (1/17/2012)
The results from my first query are exactly what you list as the "should be" answer.
Re-ran the first version, copy-and-paste of results is:
CountTypeQty
Total9
Datakeys6
Landlines6
Mobiles5
Emails3

Yes, but there's a problem Gus; that test case doesn't cover all the requirements. The requirement is to count not landlines, but a set of datakey-landline pairs such that each landline and each datakey occurs only once. The other counts have a similar requirement. So with this test dataset

TRUNCATE TABLE #test2

INSERT INTO #test2 VALUES

('1', '0001', '01234567890', '', '1@test.co.uk')

, ('2', '0001', '01234567891', '', '1@test.co.uk')

, ('3', '0001', '01234567892', '', '2@test.co.uk') ;

the required result is 3,1,1,0,1 but your code returns 3,1,3,0,2.

Actually there is still a lot of detail missing from the requirement, which is a pity because I suspect that if we had the exact requirements we could either come up with a solution or explain that it would be sensible to derive the answers using some tool other than T-SQL.

Tom

Jared SSC Guru Points: 61793 More actions · Answer 14

L' Eomot Inversé (1/17/2012)a set of datakey-landline pairs such that each landline and each datakey occurs only once

As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...

Jared
CE - Microsoft

bicky1980 SSCrazy Points: 2079 More actions · Answer 15

SQLKnowItAll (1/17/2012)
L' Eomot Inversé (1/17/2012)a set of datakey-landline pairs such that each landline and each datakey occurs only once
As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...

The required order is dedupe datakey first, followed by landline.