cardinality and nchar columns, query problems

  • I have a table containing address data (PK_ID, HouseNumber, StreetName, StreetType, UnitNumber, City, State, ZIP, AddressParseStatus).

    AddressParseStatus is nchar(1) and is used to indicate the value returned by our USPS validation software. It has pretty low cardinality, only 9 possible values. There are roughly 1B rows in this table.

    I hadn't really planned on running any queries using this column as a filter, but as it turns out, we need to and they will essentially not run at all. SS won't even generate an execution plan for something like:

    select count(*) from MyAddresses where AddressParseStatus = 'X'

    Obviously, I wouldn't index a column like this, but then again, I don't think that's the problem. I also don't have State (nchar(2)) indexed, but if I run the following, it returns very quickly:

    select count(*) from MyAddresses where State = 'CA'

    I'm sure this is something bonehead that I've overlooked, but it's annoying.

  • Just out of curiosity, how do some of these run? Do estimated execution plans display for anything? Not that I have any answers or anything, just a curious case especially since you can query by state!

    select count(*) from MyAddresses where State = 'CA' and AddressParseStatus = 'X'

    or

    select count(*) from (

    select * from MyAddresses where State = 'CA'

    ) t1

    where AddressParseStatus = 'X'

  • If the column AddressParseStatus comes for the first time in a query as a part of filter SQL Server automatically creates statistics for the column (if AUTO_CREATE_STATISTICS is ON, which is default). The final execution plan generation waits until the statistics generation is finishe din order to consume its result for cardinality estimation.

    ___________________________
    Do Not Optimize for Exceptions!

  • Have you considered changing those columns from nchar to char? One of them has 9 possible values and the other seems to be US States. With a billion rows it would cut storage consumption by about 2 billion bytes. That is nearly 2GB of storage recovered. That is also 2GB less for accessing the disc during these queries.

    Can you post the table definition along with indexes so we know what your table looks like?

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • TheGreenShepherd (4/12/2012)

    AddressParseStatus is nchar(1) and is used to indicate the value returned by our USPS validation software. It has pretty low cardinality, only 9 possible values. There are roughly 1B rows in this table.

    [W]e need to [filter on AddressParseStatus]l. SS won't even generate an execution plan for something like:

    select count(*) from MyAddresses where AddressParseStatus = 'X'

    Obviously, I wouldn't index a column like this.

    [emphasis added]

    Why not? If you just need a count of rows for a given status(es), an index would satisfy that query extremely well.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply