January 12, 2012 at 7:41 am
stephen99999 (1/12/2012)
in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.Just confused of your requirements.
Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email
----- -------------------- -------------------- ------------------- -----------------
9 6 6 5 3
January 12, 2012 at 7:49 am
bicky1980 (1/12/2012)
stephen99999 (1/12/2012)
in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.Just confused of your requirements.
Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email
----- -------------------- -------------------- ------------------- -----------------
9 6 6 5 3
Isn't that what my query posted earlier returns?
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537January 12, 2012 at 7:58 am
Mark-101232 (1/12/2012)
bicky1980 (1/12/2012)
stephen99999 (1/12/2012)
in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.Just confused of your requirements.
Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email
----- -------------------- -------------------- ------------------- -----------------
9 6 6 5 3
Isn't that what my query posted earlier returns?
wondering the same. Being a Jr. dev myself, I really want to work this out. Now I am addicted to this post lol.
Stephen
January 12, 2012 at 9:04 am
stephen99999 (1/12/2012)
Mark-101232 (1/12/2012)
bicky1980 (1/12/2012)
stephen99999 (1/12/2012)
in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.Just confused of your requirements.
Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email
----- -------------------- -------------------- ------------------- -----------------
9 6 6 5 3
Isn't that what my query posted earlier returns?
wondering the same. Being a Jr. dev myself, I really want to work this out. Now I am addicted to this post lol.
Stephen
Yes the results where correct, but dont see how the logic works, doesnt appear to group the results by datakey - If you could explain how your query ensures the datakey and variable (landline, mobile or email) are unique then thats great, but I dont see it...
January 12, 2012 at 10:36 am
I see your point now. In all actuality, your prior code:
select count(distinct datakey) as Unique_Dataset
from (select *, row_number() over(partition by datakey order by case when Datakey!='' then 0 else 1 end) as pref
from test where Datakey!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
select count(distinct mobile) as Unique_Mobiles
from (select *, row_number() over(partition by datakey order by case when mobile!='' then 0 else 1 end) as pref
from test where mobile!='') z
where pref = 1
select count(distinct email) as Unique_Emails
from (select *, row_number() over(partition by datakey order by case when email!='' then 0 else 1 end) as pref
from test where email!='') z
where pref = 1
does what you are looking for. To put it all into one statement just set each select statement to an INT variable and then select them, ie., SELECT @a Total, @b-2 UniqueDataKeys, @C UniqueLandLines, etc
My question is, how did you figure out to do it this way? I have not used row_number(), over, partition by very much and really don't understand it. Can anyone elaborate? Haven't found a good blog that explains this in "idiot terms."
January 13, 2012 at 2:44 am
stephen99999 (1/12/2012)
I see your point now. In all actuality, your prior code:
select count(distinct datakey) as Unique_Dataset
from (select *, row_number() over(partition by datakey order by case when Datakey!='' then 0 else 1 end) as pref
from test where Datakey!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
select count(distinct mobile) as Unique_Mobiles
from (select *, row_number() over(partition by datakey order by case when mobile!='' then 0 else 1 end) as pref
from test where mobile!='') z
where pref = 1
select count(distinct email) as Unique_Emails
from (select *, row_number() over(partition by datakey order by case when email!='' then 0 else 1 end) as pref
from test where email!='') z
where pref = 1
does what you are looking for. To put it all into one statement just set each select statement to an INT variable and then select them, ie., SELECT @a Total, @b-2 UniqueDataKeys, @C UniqueLandLines, etc
My question is, how did you figure out to do it this way? I have not used row_number(), over, partition by very much and really don't understand it. Can anyone elaborate? Haven't found a good blog that explains this in "idiot terms."
I'm not convinced these queries work at all even though they apparently give desired results.
This one
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.
Adding unique ordering changes the results
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref
from test where landline!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref
from test where landline!='') z
where pref = 1
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537January 13, 2012 at 5:15 am
Mark-101232 (1/13/2012)
stephen99999 (1/12/2012)
I see your point now. In all actuality, your prior code:
select count(distinct datakey) as Unique_Dataset
from (select *, row_number() over(partition by datakey order by case when Datakey!='' then 0 else 1 end) as pref
from test where Datakey!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
select count(distinct mobile) as Unique_Mobiles
from (select *, row_number() over(partition by datakey order by case when mobile!='' then 0 else 1 end) as pref
from test where mobile!='') z
where pref = 1
select count(distinct email) as Unique_Emails
from (select *, row_number() over(partition by datakey order by case when email!='' then 0 else 1 end) as pref
from test where email!='') z
where pref = 1
does what you are looking for. To put it all into one statement just set each select statement to an INT variable and then select them, ie., SELECT @a Total, @b-2 UniqueDataKeys, @C UniqueLandLines, etc
My question is, how did you figure out to do it this way? I have not used row_number(), over, partition by very much and really don't understand it. Can anyone elaborate? Haven't found a good blog that explains this in "idiot terms."
I'm not convinced these queries work at all even though they apparently give desired results.
This one
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.
Adding unique ordering changes the results
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref
from test where landline!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref
from test where landline!='') z
where pref = 1
I was thinking the case would prioritise any populaed landline field over the empty ones and then perform a distinct count of them. I'm not sure but with the order by indkey, does that change the order of the landline priority?
January 13, 2012 at 6:25 am
<snip>
I'm not convinced these queries work at all even though they apparently give desired results.
This one
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.
Adding unique ordering changes the results
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref
from test where landline!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref
from test where landline!='') z
where pref = 1
I was thinking the case would prioritise any populaed landline field over the empty ones and then perform a distinct count of them. I'm not sure but with the order by indkey, does that change the order of the landline priority?
The CASE expression does prioritise the populated ones before the unpopuated, but you're excluding the unpopulated ones anyway in the WHERE clause. Effectively
select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!=''
is
select *, row_number() over(partition by datakey order by 0) as pref
from test where landline!=''
In other words, the ordering is undefined. But I still don't understand why you want to take just one landline value from each datakey group.
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537January 13, 2012 at 6:45 am
Mark-101232 (1/13/2012)
<snip>
I'm not convinced these queries work at all even though they apparently give desired results.
This one
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.
Adding unique ordering changes the results
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref
from test where landline!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref
from test where landline!='') z
where pref = 1
I was thinking the case would prioritise any populaed landline field over the empty ones and then perform a distinct count of them. I'm not sure but with the order by indkey, does that change the order of the landline priority?
The CASE expression does prioritise the populated ones before the unpopuated, but you're excluding the unpopulated ones anyway in the WHERE clause. Effectively
select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!=''
is
select *, row_number() over(partition by datakey order by 0) as pref
from test where landline!=''
In other words, the ordering is undefined. But I still don't understand why you want to take just one landline value from each datakey group.
I want to take one landline value as I can only count one landline against each unique dataset
January 13, 2012 at 6:52 am
I think the best way to do this is to just test on various sample data. Really change it up, and see if you can "break" the code. If not, then I guess it works (even if we don't fully understand it).
Stephen
January 13, 2012 at 7:02 am
Sorry, I still don't get it. Why only one landline value per datakey, surely you need to consider all landline values to count the number of distinct ones.
Here's a simple example, what result are you expecting?
truncate table test
insert into test
values ('1','0001','1111111111','','')
insert into test
values ('2','0001','9999999999','','')
insert into test
values ('3','0002','1111111111','','')
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end,indkey asc) as pref
from test where landline!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end,indkey desc) as pref
from test where landline!='') z
where pref = 1
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537January 13, 2012 at 7:29 am
OK, Yes you are right my query does not work correctly...
I have edited the data slightly using the following:
create table test2(
indkey nvarchar(2),
datakey nvarchar(4),
landline nvarchar(11),
mobile nvarchar(11),
email nvarchar(20))
insert into test2
values ('1','0001','01234567890','0712345679','1@test.co.uk')
insert into test2
values ('2','0001','01234567890','','1@test.co.uk')
insert into test2
values ('3','0002','01234567890','','2@test.co.uk')
insert into test2
values ('4','0002','01234567891','','2@test.co.uk')
insert into test2
values ('5','0002','','07123456789','')
insert into test2
values ('6','0003','01234567892','07123456791','')
insert into test2
values ('7','0004','01234567893','07123456792','')
insert into test2
values ('8','0005','01234567894','07123456793','2@test.co.uk')
insert into test2
values ('9','0008','01234567895','07123456793','9@test.co.uk')
Now my results are different, what the row_number is doing is selecting a preference that populated landline values are prioritised over empty values. 🙁 What I need the query to do is preference a populated value which isnt returned for another datakey...Probablu not making sense so I will try to demonstrate:
Run the following query on the test2 data:
select *, row_number() over(partition by datakey order by case when landline!=''
then 0 else 1 end) as pref
from test2
As you can see the preference has been assigned to populated value over empty values, but where there are 2 landline values, the query has just selected them in the default order of the file. So using these results we would be using the following:
select * from (
select *, row_number() over(partition by datakey order by case when landline!=''
then 0 else 1 end) as pref
from test2) z where pref=1
As you can see this has left us a duplicate landline value for datakey 0001 & 0002, when really I needed the query to see that the number 01234567890 has already been used for datakey 0001 so then use the next populated landline value (as long as that also does not exist in any previous records)
Does this make sense?
January 13, 2012 at 8:56 am
Hi Bicky
Some very experienced and talented folks are having real difficulty figuring out exactly what you want here, and it looks to me that you might still be figuring it out yourself. Can I suggest you take a step back and review your requirement, then return with a more complete description and consolidate that with a representative sample data set?
I suspect the solving query won't be complicated at all when it's done. It's not the answer which is causing a problem for folks, it's the question.
Cheers
ChrisM
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 13, 2012 at 11:02 am
stephen99999 (1/13/2012)
I think the best way to do this is to just test on various sample data. Really change it up, and see if you can "break" the code. If not, then I guess it works (even if we don't fully understand it).Stephen
That's a pretty dangerous approach - trusting code you don't understand to be correct after testing on a set of small samples. If this code doesn't work, how will you ever know it doesn't work, if the data combination that breaks it is so complex that it can only arise with a sample of at least 5000 rows?
Tom
January 13, 2012 at 11:05 am
ChrisM@home (1/13/2012)
Hi BickySome very experienced and talented folks are having real difficulty figuring out exactly what you want here, and it looks to me that you might still be figuring it out yourself. Can I suggest you take a step back and review your requirement, then return with a more complete description and consolidate that with a representative sample data set?
I suspect the solving query won't be complicated at all when it's done. It's not the answer which is causing a problem for folks, it's the question.
Cheers
ChrisM
I'm am not trying to figure it out myself I know exactly what I want, and what the constraints are.
I willl try to clarify exactly what I need:
I have a table which contains 40 million records. I need to provided population statistics on the following (I will put in brackets what each of the constraints are)
Total number of records (No Constraints)
Total number of Unique Datakey (Needs to be unique)
Total number of populated Landline fields (Needs to have a unique datakey as well as be unique itself - If the same landline is anywhere else in the results its needs to be excluded from the figure)
Total number of populated Mobiles (Needs to have a unique datakey as well as be unique itself - If the same mobile is anywhere else in the results its needs to be excluded from the figure)
Total number of populated Emails (Needs to have a unique datakey as well as be unique itself - If the same email is anywhere else in the results its needs to be excluded from the figure)
Now this makes sense to me, if people need any more clarification, please ask, and I will try to clarify my requirements more.
Please use the code creating test2 previous created to test any queries
Thanks All
Viewing 15 posts - 31 through 45 (of 120 total)
You must be logged in to reply to this topic. Login to reply