May 17, 2012 at 11:06 am
Hey, Steve, can you change the birth year in my profile to read 1776, please?
😀
May 17, 2012 at 11:08 am
Brandie Tarvin (5/17/2012)
Hey, Steve, can you change the birth year in my profile to read 1776, please?😀
AD or BCE?
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
May 17, 2012 at 11:09 am
Stefan Krzywicki (5/17/2012)
Steve Jones - SSC Editor (5/17/2012)
Thanks to new data mining abilities I think I read that they have the ability, to 75% or 90% or something like that, to find an individual in the States with nothing more than your Zip Code, gender, age and birthdate. I may have even specified too many parameters there.
Zip, gender, birthdate gets some crazy % identified, or closely identified. Add in one other thing and I think it gets to like 98% likelihood.
Here we go: This paper says 87%. And that's just zip code, gender and birthdate. I'd bet it hits the mid to high 90s if you add in age.
If you already have date of birth how does age give you any extra information? Or do you think Dr Sweeney says "date of birth" and means only day and month? She's a distinguished mathematician and the founder and director of Harvard's Data Privacy Lab, so sloppiness like that strikes me as unlikely.
Tom
May 17, 2012 at 11:09 am
Stefan Krzywicki (5/17/2012)
Steve Jones - SSC Editor (5/17/2012)
Thanks to new data mining abilities I think I read that they have the ability, to 75% or 90% or something like that, to find an individual in the States with nothing more than your Zip Code, gender, age and birthdate. I may have even specified too many parameters there.
Zip, gender, birthdate gets some crazy % identified, or closely identified. Add in one other thing and I think it gets to like 98% likelihood.
Here we go: This paper says 87%. And that's just zip code, gender and birthdate. I'd bet it hits the mid to high 90s if you add in age.
You have birthdate in there and you expect an increase in effectiveness if you add in age???? Umm ... think about that one for a second there.
Edit: Even if it's only month and day, not year, as per later post, based on distributions on those stats, you don't need year to nail it down to very few people in any given Zip. Even high-density Zips, like Manhattan.
Assume 1/365.25 of the population of a Zip has a birthday on any given day of the year (not fully accurate, since September has more births than other months, but it'll do for this exercise). Assume 50% for each gender (again, not fully accurate, but close enough). Assume a Zip code with a population of 10k: birthday (mm/dd) and gender narrow it down to about 14 people. There are approximately 40,000 Zip codes in the US, and approximately 313M people, so 7825 per average Zip code. If we call it 8k in the Zip code, birthday (minus year) and gender narrow it down to about 11 people.
Add in year to the birthday, and yeah, there's a very good chance that there's one person per Zip code with a given birthday and gender. Very good chance. Even given stats on twins as a percentage of population.
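For anyone who wants to rerun the back-of-envelope numbers, here is a rough Python sketch of the same estimate. It keeps the post's simplifications (uniform birthdays, a 50/50 gender split); the function name and the 80-year spread of birth years are assumptions added purely for illustration, not figures from the paper.

```python
DAYS_PER_YEAR = 365.25  # average year length, as used in the post
GENDER_SHARE = 0.5      # assumed even gender split, as in the post


def expected_matches(zip_population: float, birth_years: int = 1) -> float:
    """Expected number of people in one Zip sharing a birthday and gender.

    birth_years=1 means month/day only. A larger value spreads people
    uniformly over that many birth years (an illustrative assumption).
    """
    return zip_population * GENDER_SHARE / (DAYS_PER_YEAR * birth_years)


# 313M people over roughly 40,000 Zip codes, as in the post
average_zip = 313_000_000 / 40_000

print(average_zip)
print(expected_matches(10_000))                 # 10k Zip, month/day + gender
print(expected_matches(8_000))                  # 8k Zip, month/day + gender
print(expected_matches(8_000, birth_years=80))  # 8k Zip, full birthdate
```

With the full birthdate included, the expected count drops well below one person per Zip/gender/birthdate combination, which is the point being made above.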
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 17, 2012 at 11:14 am
Lynn Pettis (5/17/2012)
L' Eomot Inversé (5/17/2012)
Revenant (5/17/2012)
Stefan Krzywicki (5/17/2012)
Thanks to new data mining abilities I think I read that they have the ability, to 75% or 90% or something like that, to find an individual in the States with nothing more than your Zip Code, gender, age and birthdate. I may have even specified too many parameters there.
I believe that is exaggerated. Several years ago I developed a pharmacy system for Eckerd Drugs and ascertaining identity of customers was a really tough problem, even when you had a name, date of birth and address. We had two Maria Martinezes living in the same building in Austin. By a freak of coincidence, they had the same birthday and no social insurance numbers because they were living with their immigrated children and they themselves were not expected to earn any income.
But how often does it happen that when you look at a name, gender, address, and birthdate you find your data fits more than one person? Hardly ever. So that data will identify an individual almost all the time - much closer to 100% of the time than it is to 99%.
If you substitute zip for address, it will still usually identify an individual - Stefan's "70% or 90% or something like that" is in the right ballpark. And Stefan has indeed specified too many parameters, since he had age and also specified birthdate, from which age is deducible.
It's even worse in UK with postcode instead of ZIP, because most post codes cover only a small number of buildings. In Spain postcode isn't much help - my postcode there covers many thousands of dwellings.
ZIP+4 here in the USA will narrow it even closer.
I agree, Lynn. With name in the mix, you are getting close to 100 percent, but only close. In many situations -- I mentioned pharmacy -- that is not good enough. For targeted marketing it is, so as usual, "it depends."
May 17, 2012 at 11:14 am
GSquared (5/17/2012)
Stefan Krzywicki (5/17/2012)
Steve Jones - SSC Editor (5/17/2012)
Thanks to new data mining abilities I think I read that they have the ability, to 75% or 90% or something like that, to find an individual in the States with nothing more than your Zip Code, gender, age and birthdate. I may have even specified too many parameters there.
Zip, gender, birthdate gets some crazy % identified, or closely identified. Add in one other thing and I think it gets to like 98% likelihood.
Here we go: This paper says 87%. And that's just zip code, gender and birthdate. I'd bet it hits the mid to high 90s if you add in age.
You have birthdate in there and you expect an increase in effectiveness if you add in age???? Umm ... think about that one for a second there.
No, no, clearly that was a typo and I meant... phage... If you'd suffered in the great Plague of 2007 then you'd...
Yeah, I got nothin'.
May 17, 2012 at 11:15 am
Revenant (5/17/2012)
Lynn Pettis (5/17/2012)
L' Eomot Inversé (5/17/2012)
Revenant (5/17/2012)
Stefan Krzywicki (5/17/2012)
Thanks to new data mining abilities I think I read that they have the ability, to 75% or 90% or something like that, to find an individual in the States with nothing more than your Zip Code, gender, age and birthdate. I may have even specified too many parameters there.
I believe that is exaggerated. Several years ago I developed a pharmacy system for Eckerd Drugs and ascertaining identity of customers was a really tough problem, even when you had a name, date of birth and address. We had two Maria Martinezes living in the same building in Austin. By a freak of coincidence, they had the same birthday and no social insurance numbers because they were living with their immigrated children and they themselves were not expected to earn any income.
But how often does it happen that when you look at a name, gender, address, and birthdate you find your data fits more than one person? Hardly ever. So that data will identify an individual almost all the time - much closer to 100% of the time than it is to 99%.
If you substitute zip for address, it will still usually identify an individual - Stefan's "70% or 90% or something like that" is in the right ballpark. And Stefan has indeed specified too many parameters, since he had age and also specified birthdate, from which age is deducible.
It's even worse in UK with postcode instead of ZIP, because most post codes cover only a small number of buildings. In Spain postcode isn't much help - my postcode there covers many thousands of dwellings.
ZIP+4 here in the USA will narrow it even closer.
I agree, Lynn. With name in the mix, you are getting close to 100 percent, but only close. In many situations -- I mentioned pharmacy -- that is not good enough. For targeted marketing it is, so as usual, "it depends."
Yep, good enough for statistical likelihood, not good enough for a primary key.
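To make the primary-key point concrete, here is a small Python sketch that checks whether a candidate natural key is actually unique. The records, column names and the `duplicate_keys` helper are all hypothetical, loosely modelled on the two-Maria-Martinezes anecdote quoted above; the dates and addresses are made up.

```python
from collections import Counter

# Hypothetical records, invented for illustration only.
patients = [
    {"id": 1, "name": "Maria Martinez", "dob": "1940-03-02", "address": "12 Example St, Austin"},
    {"id": 2, "name": "Maria Martinez", "dob": "1940-03-02", "address": "12 Example St, Austin"},
    {"id": 3, "name": "John Smith", "dob": "1975-11-20", "address": "9 Sample Ave, Austin"},
]


def duplicate_keys(rows, columns):
    """Return the candidate-key values that occur on more than one row."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# The "natural" key collides for the two Marias; the surrogate id does not.
natural_key = ("name", "dob", "address")
collisions = duplicate_keys(patients, natural_key)
print(collisions)
print(duplicate_keys(patients, ("id",)))
```

One collision is enough to rule the combination out as a key, however rare it is statistically.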
May 17, 2012 at 11:40 am
Stefan Krzywicki (5/17/2012)
GSquared (5/17/2012)
Stefan Krzywicki (5/17/2012)
Steve Jones - SSC Editor (5/17/2012)
Thanks to new data mining abilities I think I read that they have the ability, to 75% or 90% or something like that, to find an individual in the States with nothing more than your Zip Code, gender, age and birthdate. I may have even specified too many parameters there.
Zip, gender, birthdate gets some crazy % identified, or closely identified. Add in one other thing and I think it gets to like 98% likelihood.
Here we go: This paper says 87%. And that's just zip code, gender and birthdate. I'd bet it hits the mid to high 90s if you add in age.
You have birthdate in there and you expect an increase in effectiveness if you add in age???? Umm ... think about that one for a second there.
No, no, clearly that was a typo and I meant... phage... If you'd suffered in the great Plague of 2007 then you'd...
Yeah, I got nothin'.
Oh, you fools. Don't you see what he's doing? Two words:
DBCC Timewarp!
Of course you would need both birthdate and age in order to narrow down the identity of those who have mastered the great DT. How could you possibly have forgotten that?
And I used to look up to you people. Yeesh.
May 17, 2012 at 11:46 am
Brandie Tarvin (5/17/2012)
Stefan Krzywicki (5/17/2012)
GSquared (5/17/2012)
Stefan Krzywicki (5/17/2012)
Steve Jones - SSC Editor (5/17/2012)
Thanks to new data mining abilities I think I read that they have the ability, to 75% or 90% or something like that, to find an individual in the States with nothing more than your Zip Code, gender, age and birthdate. I may have even specified too many parameters there.
Zip, gender, birthdate gets some crazy % identified, or closely identified. Add in one other thing and I think it gets to like 98% likelihood.
Here we go: This paper says 87%. And that's just zip code, gender and birthdate. I'd bet it hits the mid to high 90s if you add in age.
You have birthdate in there and you expect an increase in effectiveness if you add in age???? Umm ... think about that one for a second there.
No, no, clearly that was a typo and I meant... phage... If you'd suffered in the great Plague of 2007 then you'd...
Yeah, I got nothin'.
Oh, you fools. Don't you see what he's doing? Two words:
DBCC Timewarp!
Of course you would need both birthdate and age in order to narrow down the identity of those who have mastered the great DT. How could you possibly have forgotten that?
And I used to look up to you people. Yeesh.
Ha, I should have left it the way I wrote it originally with the great Plague of 2014, but thought it'd take too much backstory. I should have remembered where I was. :-)
May 17, 2012 at 1:13 pm
http://www.geekologie.com/2012/05/im-not-special-chart-of-how-common-a-bir.php Think it depends when your birthday is, too.
---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table: What it is and how it replaces a loop
"stewsterl 80804 (10/16/2009): I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
May 17, 2012 at 1:28 pm
jcrawf02 (5/17/2012)
http://www.geekologie.com/2012/05/im-not-special-chart-of-how-common-a-bir.php Think it depends when your birthday is, too.
Interesting. The second half of September is the most common!
May 17, 2012 at 1:46 pm
Steve Jones - SSC Editor (5/17/2012)
Smart-*** Kiwi.
Paul Randal started it. :laugh:
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 17, 2012 at 2:04 pm
Lynn Pettis (5/17/2012)
jcrawf02 (5/17/2012)
http://www.geekologie.com/2012/05/im-not-special-chart-of-how-common-a-bir.php Think it depends when your birthday is, too.
Interesting. The second half of September is the most common!
9 months after New Year's Eve parties. Yep. Well-known phenomenon.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
May 17, 2012 at 3:41 pm
GSquared (5/17/2012)
Lynn Pettis (5/17/2012)
jcrawf02 (5/17/2012)
http://www.geekologie.com/2012/05/im-not-special-chart-of-how-common-a-bir.php Think it depends when your birthday is, too.
Interesting. The second half of September is the most common!
9 months after New Year's Eve parties. Yep. Well-known phenomenon.
Where I come from, new year's eve parties are usually a bit earlier than the second half of January. 😎
Tom
May 17, 2012 at 4:27 pm
L' Eomot Inversé (5/17/2012)
GSquared (5/17/2012)
Lynn Pettis (5/17/2012)
jcrawf02 (5/17/2012)
http://www.geekologie.com/2012/05/im-not-special-chart-of-how-common-a-bir.php Think it depends when your birthday is, too.
Interesting. The second half of September is the most common!
9 months after New Year's Eve parties. Yep. Well-known phenomenon.
Where I come from, new year's eve parties are usually a bit earlier than the second half of January. 😎
I do not get your calendar calculation, Tom, I admit.