Help with Counts

Question

Help with Counts

Viewing 15 posts - 31 through 45 (of 120 total)

You must be logged in to reply to this topic. Login to reply

bicky1980 SSCrazy Points: 2079 More actions · Answer 1

stephen99999 (1/12/2012)
in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.
Just confused of your requirements.

Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email

----- -------------------- -------------------- ------------------- -----------------

9 6 6 5 3

Mark Cowne One Orange Chip Points: 26969 More actions · Answer 2

bicky1980 (1/12/2012)
stephen99999 (1/12/2012)
in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.
Just confused of your requirements.
Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email
----- -------------------- -------------------- ------------------- -----------------
9 6 6 5 3

Isn't that what my query posted earlier returns?

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

stephen99999 Ten Centuries Points: 1197 More actions · Answer 3

Mark-101232 (1/12/2012)
bicky1980 (1/12/2012)
stephen99999 (1/12/2012)
in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.
Just confused of your requirements.
Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email
----- -------------------- -------------------- ------------------- -----------------
9 6 6 5 3
Isn't that what my query posted earlier returns?

wondering the same. Being a Jr. dev myself, I really want to work this out. Now I am addicted to this post lol.

Stephen

bicky1980 SSCrazy Points: 2079 More actions · Answer 4

stephen99999 (1/12/2012)
Mark-101232 (1/12/2012)
bicky1980 (1/12/2012)
stephen99999 (1/12/2012)
in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.
Just confused of your requirements.
Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email
----- -------------------- -------------------- ------------------- -----------------
9 6 6 5 3
Isn't that what my query posted earlier returns?
wondering the same. Being a Jr. dev myself, I really want to work this out. Now I am addicted to this post lol.
Stephen

Yes the results where correct, but dont see how the logic works, doesnt appear to group the results by datakey - If you could explain how your query ensures the datakey and variable (landline, mobile or email) are unique then thats great, but I dont see it...

stephen99999 Ten Centuries Points: 1197 More actions · Answer 5

I see your point now. In all actuality, your prior code:

select count(distinct datakey) as Unique_Dataset

from (select *, row_number() over(partition by datakey order by case when Datakey!='' then 0 else 1 end) as pref

from test where Datakey!='') z

where pref = 1

select count(distinct landline) as Unique_landlines

from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

from test where landline!='') z

where pref = 1

select count(distinct mobile) as Unique_Mobiles

from (select *, row_number() over(partition by datakey order by case when mobile!='' then 0 else 1 end) as pref

from test where mobile!='') z

where pref = 1

select count(distinct email) as Unique_Emails

from (select *, row_number() over(partition by datakey order by case when email!='' then 0 else 1 end) as pref

from test where email!='') z

where pref = 1

does what you are looking for. To put it all into one statement just set each select statement to an INT variable and then select them, ie., SELECT @a Total, @b-2 UniqueDataKeys, @C UniqueLandLines, etc

My question is, how did you figure out to do it this way? I have not used row_number(), over, partition by very much and really don't understand it. Can anyone elaborate? Haven't found a good blog that explains this in "idiot terms."

Mark Cowne One Orange Chip Points: 26969 More actions · Answer 6

stephen99999 (1/12/2012)
I see your point now. In all actuality, your prior code:
select count(distinct datakey) as Unique_Dataset
from (select *, row_number() over(partition by datakey order by case when Datakey!='' then 0 else 1 end) as pref
from test where Datakey!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
select count(distinct mobile) as Unique_Mobiles
from (select *, row_number() over(partition by datakey order by case when mobile!='' then 0 else 1 end) as pref
from test where mobile!='') z
where pref = 1
select count(distinct email) as Unique_Emails
from (select *, row_number() over(partition by datakey order by case when email!='' then 0 else 1 end) as pref
from test where email!='') z
where pref = 1
does what you are looking for. To put it all into one statement just set each select statement to an INT variable and then select them, ie., SELECT @a Total, @b-2 UniqueDataKeys, @C UniqueLandLines, etc
My question is, how did you figure out to do it this way? I have not used row_number(), over, partition by very much and really don't understand it. Can anyone elaborate? Haven't found a good blog that explains this in "idiot terms."

I'm not convinced these queries work at all even though they apparently give desired results.

This one

select count(distinct landline) as Unique_landlines

from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

from test where landline!='') z

where pref = 1

appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.

Adding unique ordering changes the results

select count(distinct landline) as Unique_landlines

from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref

from test where landline!='') z

where pref = 1

select count(distinct landline) as Unique_landlines

from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref

from test where landline!='') z

where pref = 1

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

bicky1980 SSCrazy Points: 2079 More actions · Answer 7

Mark-101232 (1/13/2012)
stephen99999 (1/12/2012)
I see your point now. In all actuality, your prior code:
select count(distinct datakey) as Unique_Dataset
from (select *, row_number() over(partition by datakey order by case when Datakey!='' then 0 else 1 end) as pref
from test where Datakey!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
select count(distinct mobile) as Unique_Mobiles
from (select *, row_number() over(partition by datakey order by case when mobile!='' then 0 else 1 end) as pref
from test where mobile!='') z
where pref = 1
select count(distinct email) as Unique_Emails
from (select *, row_number() over(partition by datakey order by case when email!='' then 0 else 1 end) as pref
from test where email!='') z
where pref = 1
does what you are looking for. To put it all into one statement just set each select statement to an INT variable and then select them, ie., SELECT @a Total, @b-2 UniqueDataKeys, @C UniqueLandLines, etc
My question is, how did you figure out to do it this way? I have not used row_number(), over, partition by very much and really don't understand it. Can anyone elaborate? Haven't found a good blog that explains this in "idiot terms."
I'm not convinced these queries work at all even though they apparently give desired results.
This one
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.
Adding unique ordering changes the results
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref
from test where landline!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref
from test where landline!='') z
where pref = 1

I was thinking the case would prioritise any populaed landline field over the empty ones and then perform a distinct count of them. I'm not sure but with the order by indkey, does that change the order of the landline priority?

Mark Cowne One Orange Chip Points: 26969 More actions · Answer 8

<snip>
I'm not convinced these queries work at all even though they apparently give desired results.
This one
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.
Adding unique ordering changes the results
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref
from test where landline!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref
from test where landline!='') z
where pref = 1
I was thinking the case would prioritise any populaed landline field over the empty ones and then perform a distinct count of them. I'm not sure but with the order by indkey, does that change the order of the landline priority?

The CASE expression does prioritise the populated ones before the unpopuated, but you're excluding the unpopulated ones anyway in the WHERE clause. Effectively

select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

from test where landline!=''

is

select *, row_number() over(partition by datakey order by 0) as pref

from test where landline!=''

In other words, the ordering is undefined. But I still don't understand why you want to take just one landline value from each datakey group.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

bicky1980 SSCrazy Points: 2079 More actions · Answer 9

Mark-101232 (1/13/2012)
<snip>
I'm not convinced these queries work at all even though they apparently give desired results.
This one
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!='') z
where pref = 1
appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.
Adding unique ordering changes the results
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref
from test where landline!='') z
where pref = 1
select count(distinct landline) as Unique_landlines
from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref
from test where landline!='') z
where pref = 1
I was thinking the case would prioritise any populaed landline field over the empty ones and then perform a distinct count of them. I'm not sure but with the order by indkey, does that change the order of the landline priority?
The CASE expression does prioritise the populated ones before the unpopuated, but you're excluding the unpopulated ones anyway in the WHERE clause. Effectively
select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref
from test where landline!=''
is
select *, row_number() over(partition by datakey order by 0) as pref
from test where landline!=''
In other words, the ordering is undefined. But I still don't understand why you want to take just one landline value from each datakey group.

I want to take one landline value as I can only count one landline against each unique dataset

stephen99999 Ten Centuries Points: 1197 More actions · Answer 10

I think the best way to do this is to just test on various sample data. Really change it up, and see if you can "break" the code. If not, then I guess it works (even if we don't fully understand it).

Stephen

Mark Cowne One Orange Chip Points: 26969 More actions · Answer 11

Sorry, I still don't get it. Why only one landline value per datakey, surely you need to consider all landline values to count the number of distinct ones.

Here's a simple example, what result are you expecting?

truncate table test

insert into test

values ('1','0001','1111111111','','')

insert into test

values ('2','0001','9999999999','','')

insert into test

values ('3','0002','1111111111','','')

select count(distinct landline) as Unique_landlines

from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end,indkey asc) as pref

from test where landline!='') z

where pref = 1

select count(distinct landline) as Unique_landlines

from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end,indkey desc) as pref

from test where landline!='') z

where pref = 1

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

bicky1980 SSCrazy Points: 2079 More actions · Answer 12

OK, Yes you are right my query does not work correctly...

I have edited the data slightly using the following:

create table test2(

indkey nvarchar(2),

datakey nvarchar(4),

landline nvarchar(11),

mobile nvarchar(11),

email nvarchar(20))

insert into test2

values ('1','0001','01234567890','0712345679','1@test.co.uk')

insert into test2

values ('2','0001','01234567890','','1@test.co.uk')

insert into test2

values ('3','0002','01234567890','','2@test.co.uk')

insert into test2

values ('4','0002','01234567891','','2@test.co.uk')

insert into test2

values ('5','0002','','07123456789','')

insert into test2

values ('6','0003','01234567892','07123456791','')

insert into test2

values ('7','0004','01234567893','07123456792','')

insert into test2

values ('8','0005','01234567894','07123456793','2@test.co.uk')

insert into test2

values ('9','0008','01234567895','07123456793','9@test.co.uk')

Now my results are different, what the row_number is doing is selecting a preference that populated landline values are prioritised over empty values. 🙁 What I need the query to do is preference a populated value which isnt returned for another datakey...Probablu not making sense so I will try to demonstrate:

Run the following query on the test2 data:

select *, row_number() over(partition by datakey order by case when landline!=''

then 0 else 1 end) as pref

from test2

As you can see the preference has been assigned to populated value over empty values, but where there are 2 landline values, the query has just selected them in the default order of the file. So using these results we would be using the following:

select * from (

select *, row_number() over(partition by datakey order by case when landline!=''

then 0 else 1 end) as pref

from test2) z where pref=1

As you can see this has left us a duplicate landline value for datakey 0001 & 0002, when really I needed the query to see that the number 01234567890 has already been used for datakey 0001 so then use the next populated landline value (as long as that also does not exist in any previous records)

Does this make sense?

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 13

Hi Bicky

Some very experienced and talented folks are having real difficulty figuring out exactly what you want here, and it looks to me that you might still be figuring it out yourself. Can I suggest you take a step back and review your requirement, then return with a more complete description and consolidate that with a representative sample data set?

I suspect the solving query won't be complicated at all when it's done. It's not the answer which is causing a problem for folks, it's the question.

Cheers

ChrisM

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

TomThomson SSC Guru Points: 104773 More actions · Answer 14

stephen99999 (1/13/2012)
I think the best way to do this is to just test on various sample data. Really change it up, and see if you can "break" the code. If not, then I guess it works (even if we don't fully understand it).
Stephen

That's a pretty dangerous approach - trusting code you don't understand to be correct after testing on a set of small samples. If this code doesn't work, how will you ever know it doesn't work, if the data combination that breaks it is so complex that it can only arise with a sample of at least 5000 rows?

Tom

bicky1980 SSCrazy Points: 2079 More actions · Answer 15

ChrisM@home (1/13/2012)
Hi Bicky
Some very experienced and talented folks are having real difficulty figuring out exactly what you want here, and it looks to me that you might still be figuring it out yourself. Can I suggest you take a step back and review your requirement, then return with a more complete description and consolidate that with a representative sample data set?
I suspect the solving query won't be complicated at all when it's done. It's not the answer which is causing a problem for folks, it's the question.
Cheers
ChrisM

I'm am not trying to figure it out myself I know exactly what I want, and what the constraints are.

I willl try to clarify exactly what I need:

I have a table which contains 40 million records. I need to provided population statistics on the following (I will put in brackets what each of the constraints are)

Total number of records (No Constraints)

Total number of Unique Datakey (Needs to be unique)

Total number of populated Landline fields (Needs to have a unique datakey as well as be unique itself - If the same landline is anywhere else in the results its needs to be excluded from the figure)

Total number of populated Mobiles (Needs to have a unique datakey as well as be unique itself - If the same mobile is anywhere else in the results its needs to be excluded from the figure)

Total number of populated Emails (Needs to have a unique datakey as well as be unique itself - If the same email is anywhere else in the results its needs to be excluded from the figure)

Now this makes sense to me, if people need any more clarification, please ask, and I will try to clarify my requirements more.

Please use the code creating test2 previous created to test any queries

Thanks All