PK / FK design question

  • Harveysburger (3/27/2008)


    BAM! Jeff has spoken

    gotta say although it's not the most popular answer your arguments are strong

    thanks for the input, I will meditate on that for a while and see what comments others put here in the meantime,

    Thanks for the feedback... hope this helps in your design quest. Hard part will be to get folks to do things the "normalized" way.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • IMHO Mr. Moden is absolutly correct.

    Shortly after starting my new job, I was thrown some SQL that no-one had been able to tune. I discovered that they basically used Scenario 2 - for inspection info in this case. The SQL had to retrieve ALL types of inspections - hence all of the different parents with those children mixed up in the one "overloaded" table (our nickname for Scenario 2&3). The only solution the developers could come up with were correlated subqueries to go back and get the correct "parent" information depending upon the value in the TYPE column. RBAR!

    Once I realized the design, I wrote 4 queries and UNIONED them - as if there were proper parent/child table relationships (Scenario 1). The performance gain was amazing - no more RBAR. ( Of course each query was slowed somewhat by all the useless rows in that table for each particular relationship.)

    After 1.5 years, I still can't get the developers to agree that proper table design = MUCH easier SQL and MUCH better performance, and that SQL Server is a RELATIONAL database.

    Oh well...again IMHO Mr. Moden is right on the money!

    jg

  • Thanks John G... and spot on... the hard part most certainly is getting developers and designers to think in correct set based frames of mind... Sometimes that even includes SQL Developers!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well, as I said, I am working on a schema design that was similar to the posters. I ended up creating a mock up of 2 models; one using the Super/Sub types (scenario 2) with 3 SubTypes and one using 3 individual normalized tables and a view to pull together common data (scenario 1) and I found the performance differences significant. Once I had my tables schemas in place, I ran a script to populate the tables. For the test, I went with 1 million rows of each type. This means that our SuperType table for schenario 2 had 3 million rows and each SubType table held 1 million. The normalized method (scenario 1) had 3 tables each with 1 million rows (and the view of course has 3 million).

    Once I had the tables loaded, I ran a couple of simple queries. The first query ran to get all types, all data for the last 3 days. The second query only returned data for one specific SubType. Here's a breakdown of what I found:

    KEY:SS - SuperType / SubType Model

    NV - Normalized Tables w/ UNION ALL View

    Query SS NV NV % Improvement

    All Types in last 3 days

    CPU8094 4297 47%

    Reads45108 20846 64%

    Duration372369 334678 10%

    Type 1 in last 3 days

    CPU 1422 234 84%

    Reads 17358 3472 80%

    Duration 88686 60284 32%

    I've attached my scripts for anyone who may want to see the work. You'll notice that I've duplicated the BusinessDate column in all tables. I realize this is redundant, but we are preparing to implement table partitioning and BusinessDate will be our partitioning key so it must exist in all of the tables that use the partitioning scheme. The partitioning function and scheme are not used in the scripts as I did not want to take the test out that far, but the column is still persisted into all of the tables because that is how it will be when we put this into work.

    **Disclaimer** This is a quick mock up and by no means our final solution, so please don't hammer me on design!! I was merely trying to prove a point here with a simple example.

    PS...I just noticed that you'll need a numbers table (100,000 rows) to work the data create scripts. I've attached a script to create it......

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Nicely done John... You saying you like Scenario 1 or something? 😉

    You should see how fast things run in the NV tables when you've had the time to add the proper indexes. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You should see how fast things run in the NV tables when you've had the time to add the proper indexes.

    Thanks. For my example, I've only added a single clustered index on the key. This may or may not be what we'd end up with for our solution. What I like is the fact (pointed out by you in your earlier post) that each type table can have it's own indexing that won't get in the way of the other types like in the other scenarios. I can see how this could make a big difference down the road.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I went back to get the timings from our tests. I was mistaken, this was a financial_transactions design not inspections ( this was 1.5 years ago ;)). There are 4 types in the trans table - and a parent table for each of the 4 types.

    Note that our tables are still currently in Scenario 2 design.

    Both SQL statements I tested return 3 rows - exact same result set.

    1) original: Select with possible 4 correlated subqueries (decided by CASE statement):

    reads 43million, duration 510756 ms (approx 8.5 minutes), CPU 1183625 ms

    2) Four select statements written as if each trans table was normalized and UNIONed.

    reads 3815, duration 1828 ms , CPU 266 ms

    (I could not come up with any other possiblities to get the results.)

    Moral: even if you decide to combine subtypes in the transaction table, write your SQL as if they were separate tables.

    Warning: as the tables grow, performance will begin to suffer because of all of the rows it has to go thru to get the results.

    hope this helps - from a real-world example.

    jg

  • Moral: even if you decide to combine subtypes in the transaction table, write your SQL as if they were separate tables.

    Heh... moral of the story should be that if you decide to combine transaction subtypes in the transaction table, you've done it wrong and people will forget to take your sage advice on writing the code in the effective divide and conquer manner you suggest.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • John Rowan (4/1/2008)


    You should see how fast things run in the NV tables when you've had the time to add the proper indexes.

    Thanks. For my example, I've only added a single clustered index on the key. This may or may not be what we'd end up with for our solution. What I like is the fact (pointed out by you in your earlier post) that each type table can have it's own indexing that won't get in the way of the other types like in the other scenarios. I can see how this could make a big difference down the road.

    Thanks for the feedback, John.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Outstanding set of posts on this thread, both pro and con! Thanks for the very interesting conversation and the testing that some of you have done! You guys are the best!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Mr. Moden,

    your moralizing was better than mine...:)

    jg

  • Thanks all, plenty food for thoughts in there 🙂 Thanks for the effort, especially those who put together test case and such.

    Let's assume only 3 types out of let's say 20 need data logged in a details/child table, where as the other 17 types got everything they need in the main table. And also let's assume the main table only contains fields used by ALL types such as customerID, date, amount.... And let's assume the number of rows for each type is the same. Would this affect anyone's conclusion on the design?

    I just thought at throwing this out there because every example we discuss were assuming that each type had its own child table (or parent table... depending how you look at it...)

  • Harveysburger (4/1/2008)


    Would this affect anyone's conclusion on the design?

    No... disparate transaction types should live in their own tables. Rules of normalization such as no duplicates and no nulls should prevail.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply