Primary key vs. NOT NULL unique key

  • Even if you use a tinyint, you are still forced into a join that wouldn't otherwise be necessary. Making no join at all is surely faster than making even a tinyint join.

    One odd note on the tinyint PK/FK relationship: when I have used it in an OLAP environment, I have been unable to get the view reader to recognize the relationship. When I change the column to smallint, the issue goes away. It's very odd, but it seemed real enough to me to move on, even though in an OLAP environment you are fighting for every byte you can remove.
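
    For anyone following the thread, here's a minimal sketch of the two designs being weighed here; the table and column names are made up for illustration:

    ```sql
    -- Design A: the natural char(2) code is the key -- no join needed to show it.
    CREATE TABLE StatesA (
        StateCode char(2)     NOT NULL PRIMARY KEY,
        StateName varchar(50) NOT NULL
    );
    CREATE TABLE CustomersA (
        CustomerID int     NOT NULL PRIMARY KEY,
        StateCode  char(2) NOT NULL REFERENCES StatesA (StateCode)
    );

    -- Design B: a tinyint surrogate key -- saves a byte per row in the big
    -- table, but any query that needs the code must join to the lookup table.
    CREATE TABLE StatesB (
        StateID   tinyint     NOT NULL PRIMARY KEY,
        StateCode char(2)     NOT NULL UNIQUE,
        StateName varchar(50) NOT NULL
    );
    CREATE TABLE CustomersB (
        CustomerID int     NOT NULL PRIMARY KEY,
        StateID    tinyint NOT NULL REFERENCES StatesB (StateID)
    );

    -- Design A reads the code straight off the row...
    SELECT CustomerID, StateCode FROM CustomersA;
    -- ...while Design B pays for the join every time.
    SELECT c.CustomerID, s.StateCode
    FROM CustomersB AS c
    JOIN StatesB AS s ON s.StateID = c.StateID;
    ```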

  • jezemine (5/1/2008)


    You might want to use a tinyint for the PK on a states table if it is being referred to from other large tables.

    One of the DBs I have to work with has a few tables of over 2 billion rows, each with a couple of different FKs that are tinyints. If these FKs were character types instead of tinyint, it would have a significant impact on the size on disk, among other things. The total row count for the entire DB is over 14 billion.

    And what do these large tables use for a PK? Bigint. 🙂

    Tinyint works for that. Its range (0 to 255 in SQL Server) is so limited that it doesn't deal well with much else, but for very small data sets that aren't going to grow (or at least not rapidly), it does work.

    I'd have to test to see whether the improved performance (no join) was worth the disk space. Even at 14 billion rows, tinyint versus char(2) is a difference of only one byte per row: 14 billion bytes, or about 14 GB. 14 GB isn't exactly a huge expense these days. Even if you keep a full week of uncompressed, nightly, full database backups, we're talking about less than 100 GB of extra backup space. Most likely, not all of the tables carry State data, which means even less than that.
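
    As a sanity check on that arithmetic, here's a back-of-the-envelope sketch; the row count is just the figure quoted above:

    ```sql
    -- One byte saved per row: tinyint (1 byte) vs. char(2) (2 bytes).
    DECLARE @rows bigint = 14000000000;              -- ~14 billion rows
    DECLARE @bytes_saved bigint = @rows * (2 - 1);   -- one byte per row
    SELECT
        @bytes_saved                          AS BytesSaved; -- 14,000,000,000
    SELECT
        @bytes_saved / 1000000000.0           AS GBSaved,    -- 14.0 decimal GB
        @bytes_saved / (1024.0 * 1024 * 1024) AS GiBSaved;   -- ~13.04 binary GiB
    ```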

    I don't know whether saving that much disk space would be worth the extra join; I'd have to test. Assuming you have run those tests and found it worthwhile, I'll take your judgment as good enough for your servers.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • If I am evaluating the soundness of a general rule, it doesn't bother me one bit that the rule would not be applicable to a 14-billion-row MegaTable. At this scale, very few general rules will apply -- you're pretty much operating on exceptions only. Such a table will not be browsed by users casually scrolling through a datatable embedded on a web page. It will be processed by an application that will probably read through the records in insertion order and produce a report, or highly condensed records to insert into a smaller table. Probably not a lot of joining going on.

    So testing a general rule on such a behemoth tells you nothing of the value of that rule in most real-world, day-to-day situations. If the test fails, it doesn't mean the rule lacks value in average use; if the test succeeds, it doesn't mean the rule would work well in average conditions.

    There is value in testing at edge conditions. But at least use an edge that we can reasonably expect to run up against. I work at a company that has expanded into international markets. The largest table we have has about 16.5 million rows. Take a round figure of roughly triple that, and we are satisfied if we get good response with a table of 50 million rows. That is our edge. Even if we had one of those 14-billion-row monsters in our database, we wouldn't design the rest of the tables to its requirements. Geez, at least I hope not.

    -- An optimist sees the glass as half full, a pessimist sees the glass as half empty, an engineer sees the glass as twice the needed capacity.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms
