September 14, 2011 at 1:02 am
Hi All,
I have rectnly read the below article on simple-talk website.
I would like to know more about DENSITY & CARDINALITY (which i didn't get the complete understainding from this article) with some good example. I have searched on GOOGLE and don't find useful links.
I am not still clear with the below terminology.
1) Magic Density
2) Density Formula
SELECT (1.0 / COUNT(DISTINCT <ColumnName>)) FROM <TableName>
3) Relationship between Density & Cardinality
4) Formula to identify the Cardinality
5) How SQL server find the cardinality value
Correct me, If i am wrong.
If the sql server query optimizer doesn't identify the cardinality value, it applies the below formula right. If yes, is the below values are constant?
For "=", uses COUNT(*) * column Density, translating COUNT(*) * (1.0 / (COUNT (DISTINCT <column>)))
For "BETWEEN", uses 9%, translating (COUNT(*) * 9.0) / 100
For ">, >=, < and <=", uses 30%, translating (COUNT(*) * 30.0) / 100
If there are no statistics for a column and the equality operator is used, that means the SQL can’t know the density of the column, so it uses a 10 percent fixed value.
could anybody clear my doubts?
karthik
September 15, 2011 at 1:22 am
any inputs?
karthik
September 15, 2011 at 5:28 am
Too deep for me. I'll see if I can ping in someone.
September 15, 2011 at 7:00 am
1) Magic Density
This is simply the name for how SQL Server guesses when it doesn't know. The name probably comes from the programming concept of "magic numbers", which are "... a special constant used for some specific purpose. It is called magic because its value or presence is inexplicable without some additional knowledge." (http://www.wordiq.com/definition/Magic_number_%28programming%29)
In other words, "Magic Density" is just the answer to keeping the computer from spending too much time trying to figure out stats when they really can't be readily figured out.
2) Density Formula
SELECT (1.0 / COUNT(DISTINCT <ColumnName>)) FROM <TableName>
Density is the number of values in a column compared to the number of rows. If, for example, you have a table with a million rows, and all of them have the value 1 in one of the columns, the density for that column is 1, because 1.0/1 = 1. If one of the rows has 0 instead of 1, then the density is .5, because there are now two different values.
This is of limited use, because 1 zero and 999,999 ones makes for a column without much selectivity most of the time.
3) Relationship between Density & Cardinality
Cardinality is the percentage of a value in a column (essentially). So, in the example above, value "1" would have a HUGE cardinality, while value "0" would have very low cardinality. In more normal cases, high density usually results in high cardinality.
4) Formula to identify the Cardinality
It's just the percentage of rows with a certain value. If you have 20 rows with "Bob" as the value, in a 100-row table, then "Bob"'s cardinality is 20%.
5) How SQL server find the cardinality value
It stores stats for that. Number of rows per value, divided by total number of row, stored for use by the query engine.
Overall, it's all about helping the query engine figure out how many rows it should expect to have to work with. Since that determines a lot of how it will have to do the work of the query, it's an important thing, but it really just boils down to "how many rows is that"? It stores stats so that it can estimate that before it goes and starts looking.
Here's an example that I've found useful for explaining this:
Let's say you work in a big office building, and you need to figure out who owns certain cars in the parking garage. There are hundreds of cars. If you need to figure out the owners of all of the white cars, does that mean you need to walk in the front of the garage and glace at the first parking space, or does it mean you need to walk through the garage and make a list? You don't know? What if you happen to know that 40% of the cars in there are white? That'll help you plan how long this project will take, and what method to use for getting it done efficiently. That "40% are white" is cardinality. Density would be "how many different colors are there"?
What happens if you have a list of the cars at your desk, including color and owner? Viola! You can do the work very easily. That's a covering index. It still helps you to get the work done faster if you know the stats, because you can estimate whether you'll need to do any serious work or can just jot it down on a PostIt and be done. But you won't have to actually walk over to the garage and look at all the cars.
What if you also need to know license plate numbers, and the list doesn't have that, but it does include assigned parking spaces? That's going to require a "bookmark lookup", where you can use the list to narrow down the spaces you'll have to go to, but you will still have to go into the garage.
And so on from there.
I've found making it more physical like that helps to visualize the use of these things.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 15, 2011 at 10:04 am
I like that parking garage analogy Gus. Good going.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
September 15, 2011 at 10:17 am
GSquared (9/15/2011)
What if you happen to know that 40% of the cars in there are white? That'll help you plan how long this project will take, and what method to use for getting it done efficiently. That "40% are white" is cardinality.
<pedantic>
The 40% is actually the selectivity of the predicate (Where CarColour = 'white'). If there were 500 cars in total with white cars having a selectivity of 40%, the cardinality of white cars would be 200. (Cardinality from 'Cardinal Number' which is a whole number expressing a quantity)
</pedantic>
http://technet.microsoft.com/en-us/library/cc966419.aspx
Selectivity: The fraction of rows from the input set of the predicate that satisfy the predicate. More sophisticated selectivity measures are also used to estimate the number of rows produced by joins, DISTINCT, and other operators. For example, SQL Server 2005 estimates the selectivity of the predicate "Sales.SalesOrderHeader.OrderID = 43659" in the AdventureWorks database as 1/31465 = 0.00003178.
Cardinality estimate: An estimate of the size of a result set. For example, if a table T has 100,000 rows and a query contains a selection predicate of the form T.a=10, and a histogram shows that the selectivity of T.a=10 is 10%, then the cardinality estimate for the fraction of rows of T that must be considered by the query is 10% * 100,000 = 10,000.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 16, 2011 at 6:31 am
GilaMonster (9/15/2011)
GSquared (9/15/2011)
What if you happen to know that 40% of the cars in there are white? That'll help you plan how long this project will take, and what method to use for getting it done efficiently. That "40% are white" is cardinality.
<pedantic>
The 40% is actually the selectivity of the predicate (Where CarColour = 'white'). If there were 500 cars in total with white cars having a selectivity of 40%, the cardinality of white cars would be 200. (Cardinality from 'Cardinal Number' which is a whole number expressing a quantity)
</pedantic>
http://technet.microsoft.com/en-us/library/cc966419.aspx
Selectivity: The fraction of rows from the input set of the predicate that satisfy the predicate. More sophisticated selectivity measures are also used to estimate the number of rows produced by joins, DISTINCT, and other operators. For example, SQL Server 2005 estimates the selectivity of the predicate "Sales.SalesOrderHeader.OrderID = 43659" in the AdventureWorks database as 1/31465 = 0.00003178.
Cardinality estimate: An estimate of the size of a result set. For example, if a table T has 100,000 rows and a query contains a selection predicate of the form T.a=10, and a histogram shows that the selectivity of T.a=10 is 10%, then the cardinality estimate for the fraction of rows of T that must be considered by the query is 10% * 100,000 = 10,000.
Yep. Didn't quite nail that one precisely, did I?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 16, 2011 at 6:57 am
GSquared (9/16/2011)
Yep. Didn't quite nail that one precisely, did I?
Almost everyone mixes up density (measure of how unique an index is), selectivity (measure of the fraction of total rows affected by a predicate) and cardinality (measure of the number of rows processed by query/query operator)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 16, 2011 at 7:35 am
GilaMonster (9/16/2011)
GSquared (9/16/2011)
Yep. Didn't quite nail that one precisely, did I?Almost everyone mixes up density (measure of how unique an index is), selectivity (measure of the fraction of total rows affected by a predicate) and cardinality (measure of the number of rows processed by query/query operator)
Yep. I know what they are, just didn't edit before I hit "post" after changing what I was writing in mid-post.
I usually explain cardinality in that 20 rows from a million row table or 20 rows from a 20 row table is still 20 rows. Different selectivity, but same cardinality.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply