Storing multiple variations of the same data?

  • One of the rules of RDBMS is that we don't do display formatting in the database layer of a tiered architecture. It goes out to a presentation layer. In the case of names and addresses, you probably to get a tool to edit everything clean up your data to at least the CASS standards to keep the post office happy.

    This is what I remember... RDBMS is for data.  Let your application do the formatting.  To keep your RDBMS fast, you don't want to weigh it down with unnecessary data.  You want to keep it lean and mean.

  • aaron.reese - Monday, May 21, 2018 9:46 AM

    This is really a question of semantics, if it is necessary to have a properly conformed name or address for legal purposes, I would argue that that is not a presentation issue. This conformed data must be provided by the system,  Whether you compute it once and serve it up from a computed column for do it via a view or function at run-time is a design decision and there are valid reasons for doing both.  The lines become more blurred when trying to calculate subjective values (e.g. account delinquency).  Should the database return a scalar value for the application to assign to a banding and color code, or should the database rules apply the banding and just leave the display parameters up the application.  E.g. should delinquency be 82%, 'Severe' or 'Red'. - I would argue (almost) never Red - that is a presentation decision.  The score or banding is more subtle and really depends on the business requirements.

    @jeff, in your last statement you referred to making the database bigger than it needs to be - but who defines need.  Storage is cheap and the compute probably happens outside of business hours.  If speed of delivery is the most important thing then I say calculate the values and store them in derived columns, after all, speed of calculation is one reason you have built a data warehouse in the first place.

    Storage isn't cheap when you waste 6 times more than you actually need especially since SQL Server has to read things into memory before they can actually be used not to mention all of the hidden overhead because people will also add an index to each of those duplicated columns, which is another duplication of data that will ultimately result in 12 times more waste than actually needed.  I've also found that people defining the need don't necessarily understand either the problems or the possibilities without such waste.

    I'm not saying your wrong.  I'm just saying that would be wrong for me and I'd be compelled to fix it while also possibly increasing the performance and greatly reducing clock cycles that are dedicated only to the purpose of maintenance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 2 posts - 16 through 16 (of 16 total)

You must be logged in to reply to this topic. Login to reply