January 17, 2012 at 5:21 am
Hi Chris
I have run the query on a sample of 100,000 records, in my results there are no duplicate telephone numbers and no duplicate datakeys, which is great! (Ran fast too)
Just need to incorporate the other fields into the query
Thanks
January 17, 2012 at 5:37 am
The Query does however, appear to exclude some records, which I cant explain.
I had 4 records with the same datakey and same landline, but these results were excluded from the final table?
Any ideas why?
Thanks
January 17, 2012 at 5:42 am
bicky1980 (1/17/2012)
The Query does however, appear to exclude some records, which I cant explain.I had 4 records with the same datakey and same landline, but these results were excluded from the final table?
Any ideas why?
Thanks
Run a select against your table, with a where clause like the following:
SELECT...
FROM ...
WHERE datakey = datakeyvalue OR landline = landlinevalue
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 17, 2012 at 5:53 am
The boxes and balls cannot be applied/used until we know the order of operations. Maybe I missed it in this long thread :), so I'm sorry if I did... but let's take landlines first. Say my data for datakey and landline is this:
0014, 1234567890
0014, 1234567891
0015, 1234567890
0015, 1234567891
How many unique landlines is this according to what you are wanting; 1 or 2? Again, I apologize if this has been answered already, but some of the posts are pretty long individually, so I want a summation. Also, when you tell me the number, can you explain exactly why. Please don't use the word "unique" as it is non-descriptive the way you have been using it. Say something like "this only counts as 1 because we first see that a datakey has more than one landline, but that only counts as 1 because it is the same data key. The second datakey has landlines that have been seen previously in the set, so they do not count at all."
Note the word "previously"... That is important here because if we discard all duplicates as part of the set, this would be 0.
Jared
CE - Microsoft
January 17, 2012 at 5:54 am
The datakey & landline doesnt exist in my final table
I edited the query at the end to put the results into a new table
SELECT *
into lvl
FROM Level1
UNION ALL
SELECT * FROM Level2
ORDER BY datakey, telno
Was this correct?
January 17, 2012 at 6:09 am
bicky1980 (1/17/2012)
The datakey & landline doesnt exist in my final table
That's going to happen - can you post the results of the query I posted last?
I edited the query at the end to put the results into a new table
SELECT *
into lvl
FROM Level1
UNION ALL
SELECT * FROM Level2
ORDER BY datakey, telno
Was this correct?
Personally I'd go for a temp table and you will probably need to index it for performance.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 17, 2012 at 6:29 am
bicky1980 (1/13/2012)
GSquared (1/13/2012)
The order of the rows in a dataset doesn't really matter to set-based SQL operations on it.I just need a clarification on this statement, "Total number of populated Landline fields (Needs to have a unique datakey as well as be unique itself - If the same landline is anywhere else in the results its needs to be excluded from the figure)"
If a landline exists multiple times in the data, and has multiple datakey values, don't count it all, or count it only one time?
E.g.:
datakey landline pairs:
1 100
2 100
Would that count as 1 or 0? Clarification of "If the same landline is anywhere else in the results its needs to be excluded from the figure". Does that mean exclude it entirely, or exclude the duplicates?
If that should count as 1, then this seems to work:
IF OBJECT_ID(N'tempdb..#test2') IS NOT NULL
DROP TABLE #test2;
CREATE TABLE #test2
(indkey NVARCHAR(2),
datakey NVARCHAR(4),
landline NVARCHAR(11),
mobile NVARCHAR(11),
email NVARCHAR(20))
INSERT INTO #test2
VALUES ('1', '0001', '01234567890', '0712345679', '1@test.co.uk'),
('2', '0001', '01234567890', '', '1@test.co.uk'),
('3', '0002', '01234567890', '', '2@test.co.uk'),
('4', '0002', '01234567891', '', '2@test.co.uk'),
('5', '0002', '', '07123456789', ''),
('6', '0003', '01234567892', '07123456791', ''),
('7', '0004', '01234567893', '07123456792', ''),
('8', '0005', '01234567894', '07123456793', '2@test.co.uk'),
('9', '0008', '01234567895', '07123456793', '9@test.co.uk') ;
SELECT 'Total' AS CountType, COUNT(*) AS Qty
FROM #test2
UNION ALL
SELECT 'Datakeys', COUNT(DISTINCT datakey)
FROM #test2
UNION ALL
SELECT 'Landlines', COUNT(DISTINCT landline)
FROM #test2
WHERE landline > ''
AND datakey > ''
UNION ALL
SELECT 'Mobiles', COUNT(DISTINCT mobile)
FROM #test2
WHERE mobile > ''
AND datakey > ''
UNION ALL
SELECT 'Emails', COUNT(DISTINCT email)
FROM #test2
WHERE email > ''
AND datakey > '';
If 0, then change the Select to this:
SELECT 'Total' AS CountType, COUNT(*) AS Qty
FROM #test2
UNION ALL
SELECT 'Datakeys', COUNT(DISTINCT datakey)
FROM #test2
UNION ALL
SELECT 'Landlines',
COUNT(DISTINCT landline)
FROM (SELECT landline
FROM #test2
WHERE landline > ''
AND datakey > ''
GROUP BY landline
HAVING COUNT(*) = 1) AS Sub
UNION ALL
SELECT 'Mobiles',
COUNT(DISTINCT mobile)
FROM (SELECT mobile
FROM #test2
WHERE mobile > ''
AND datakey > ''
GROUP BY mobile
HAVING COUNT(*) = 1) AS Sub
UNION ALL
SELECT 'Emails',
COUNT(DISTINCT email)
FROM (SELECT email
FROM #test2
WHERE email > ''
AND datakey > ''
GROUP BY email
HAVING COUNT(*) = 1) AS Sub;
(All scripts tested in SQL 2008 R2 Dev Edition.)
I would only require this to be counted as 1 record only...From your solution for this the results would be:
Total9
Datakeys6
Landlines5
Mobiles4
Emails1
The results should be:
Total9
Datakeys6
Landlines6
Mobiles5
Emails3
The results from my first query are exactly what you list as the "should be" answer.
Re-ran the first version, copy-and-paste of results is:
CountTypeQty
Total9
Datakeys6
Landlines6
Mobiles5
Emails3
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 17, 2012 at 6:32 am
Hello Chris
I cant post the results due to data protection 🙁 , which is going to be difficult to try and get you to see my issue, this was why I was using a small table of 'made up' records.
January 17, 2012 at 6:36 am
How do I copy code from the forum to paste into Management studio, without having to refomat it myself everytime?
January 17, 2012 at 7:31 am
How do I copy code from the forum to paste into Management studio, without having to refomat it myself everytime?
Paste it into a word document, then copy and paste again. I've got a word doc on my desktop specifically for this.
bicky1980 (1/17/2012)
Hello ChrisI cant post the results due to data protection 🙁 , which is going to be difficult to try and get you to see my issue, this was why I was using a small table of 'made up' records.
Change the phone numbers (probably the area codes alone will be enough), but be careful that you don't give different random numbers to dupes of the same real number!
When you get the opportunity, try GSquared's code.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 17, 2012 at 7:51 am
GSquared (1/17/2012)
bicky1980 (1/13/2012)
GSquared (1/13/2012)
The order of the rows in a dataset doesn't really matter to set-based SQL operations on it.I just need a clarification on this statement, "Total number of populated Landline fields (Needs to have a unique datakey as well as be unique itself - If the same landline is anywhere else in the results its needs to be excluded from the figure)"
If a landline exists multiple times in the data, and has multiple datakey values, don't count it all, or count it only one time?
E.g.:
datakey landline pairs:
1 100
2 100
Would that count as 1 or 0? Clarification of "If the same landline is anywhere else in the results its needs to be excluded from the figure". Does that mean exclude it entirely, or exclude the duplicates?
If that should count as 1, then this seems to work:
IF OBJECT_ID(N'tempdb..#test2') IS NOT NULL
DROP TABLE #test2;
CREATE TABLE #test2
(indkey NVARCHAR(2),
datakey NVARCHAR(4),
landline NVARCHAR(11),
mobile NVARCHAR(11),
email NVARCHAR(20))
INSERT INTO #test2
VALUES ('1', '0001', '01234567890', '0712345679', '1@test.co.uk'),
('2', '0001', '01234567890', '', '1@test.co.uk'),
('3', '0002', '01234567890', '', '2@test.co.uk'),
('4', '0002', '01234567891', '', '2@test.co.uk'),
('5', '0002', '', '07123456789', ''),
('6', '0003', '01234567892', '07123456791', ''),
('7', '0004', '01234567893', '07123456792', ''),
('8', '0005', '01234567894', '07123456793', '2@test.co.uk'),
('9', '0008', '01234567895', '07123456793', '9@test.co.uk') ;
SELECT 'Total' AS CountType, COUNT(*) AS Qty
FROM #test2
UNION ALL
SELECT 'Datakeys', COUNT(DISTINCT datakey)
FROM #test2
UNION ALL
SELECT 'Landlines', COUNT(DISTINCT landline)
FROM #test2
WHERE landline > ''
AND datakey > ''
UNION ALL
SELECT 'Mobiles', COUNT(DISTINCT mobile)
FROM #test2
WHERE mobile > ''
AND datakey > ''
UNION ALL
SELECT 'Emails', COUNT(DISTINCT email)
FROM #test2
WHERE email > ''
AND datakey > '';
If 0, then change the Select to this:
SELECT 'Total' AS CountType, COUNT(*) AS Qty
FROM #test2
UNION ALL
SELECT 'Datakeys', COUNT(DISTINCT datakey)
FROM #test2
UNION ALL
SELECT 'Landlines',
COUNT(DISTINCT landline)
FROM (SELECT landline
FROM #test2
WHERE landline > ''
AND datakey > ''
GROUP BY landline
HAVING COUNT(*) = 1) AS Sub
UNION ALL
SELECT 'Mobiles',
COUNT(DISTINCT mobile)
FROM (SELECT mobile
FROM #test2
WHERE mobile > ''
AND datakey > ''
GROUP BY mobile
HAVING COUNT(*) = 1) AS Sub
UNION ALL
SELECT 'Emails',
COUNT(DISTINCT email)
FROM (SELECT email
FROM #test2
WHERE email > ''
AND datakey > ''
GROUP BY email
HAVING COUNT(*) = 1) AS Sub;
(All scripts tested in SQL 2008 R2 Dev Edition.)
I would only require this to be counted as 1 record only...From your solution for this the results would be:
Total9
Datakeys6
Landlines5
Mobiles4
Emails1
The results should be:
Total9
Datakeys6
Landlines6
Mobiles5
Emails3
The results from my first query are exactly what you list as the "should be" answer.
Re-ran the first version, copy-and-paste of results is:
CountTypeQty
Total9
Datakeys6
Landlines6
Mobiles5
Emails3
Hello GSquared
I have run your code on my sample of approx 100,000 records but its difficult to understand why this would work, I dont see where you code ensures that if I had 2 different landlines at the same datakey, these wouldn't be counted seperately (I ran the 1st code)
January 17, 2012 at 8:11 am
bicky1980 (1/17/2012)I have run your code on my sample of approx 100,000 records but its difficult to understand why this would work, I dont see where you code ensures that if I had 2 different landlines at the same datakey, these wouldn't be counted seperately (I ran the 1st code)
To confirm... You are saying that if the ONLY time 2 landlines appear is with the same datakey, that should only count as 1. Correct? To shorten the data for ease of understanding:
DataKey, LandLine
1, 100
1, 200
2, 100
2, 200
2, 300
How many landlines is this? 2? And if the line with the 300 was not there... it would be 1? I have to be honest that, having a background in statistics, I am really puzzled at the question that this is trying to answer. Any chance you could appease me by phrasing the business question as something like "How many landlines are in use by different datakeys?" OR "Collapsing across landlines and then datakeys, what is the count?" Something just doesn't make sense here as an interpretation of results. I am not saying that you are wrong or anything, just trying to understand because right now I can't answer a question that I don't understand.
I have done several similar analyses and have scripts in producation that are giving a similar output. If I can understand the question and thereby the business rules a bit clearer, I can simply adapt my current script and post it for you.
Jared
CE - Microsoft
January 17, 2012 at 10:33 am
GSquared (1/17/2012)
The results from my first query are exactly what you list as the "should be" answer.
Re-ran the first version, copy-and-paste of results is:
CountTypeQty
Total9
Datakeys6
Landlines6
Mobiles5
Emails3
Yes, but there's a problem Gus; that test case doesn't cover all the requirements. The requirement is to count not landlines, but a set of datakey-landline pairs such that each landline and each datakey occurs only once. The other counts have a similar requirement. So with this test dataset
TRUNCATE TABLE #test2
INSERT INTO #test2 VALUES
('1', '0001', '01234567890', '', '1@test.co.uk')
, ('2', '0001', '01234567891', '', '1@test.co.uk')
, ('3', '0001', '01234567892', '', '2@test.co.uk') ;
the required result is 3,1,1,0,1 but your code returns 3,1,3,0,2.
Actually there is still a lot of detail missing from the requirement, which is a pity because I suspect that if we had the exact requirements we could either come up with a solution or explain that it would be sensible to derive the answers using some tool other than T-SQL.
Tom
January 17, 2012 at 10:40 am
L' Eomot Inversé (1/17/2012)a set of datakey-landline pairs such that each landline and each datakey occurs only once
As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...
Jared
CE - Microsoft
January 17, 2012 at 11:17 am
SQLKnowItAll (1/17/2012)
L' Eomot Inversé (1/17/2012)a set of datakey-landline pairs such that each landline and each datakey occurs only once
As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...
The required order is dedupe datakey first, followed by landline.
Viewing 15 posts - 76 through 90 (of 120 total)
You must be logged in to reply to this topic. Login to reply