Loading Data To Normalized Tables

  • Something I've always wondered is how data is loaded into a normalized OLTP database. One example I think about is an ATM transaction, because it captures data ranging from the customer to the ATM location. If the database is designed in at least 3NF, then it would have multiple tables related to the customer to handle name, account information, etc., and the same can be said for the ATM because of its zip code, city, and state attributes. My thinking was that all data is captured in a batch and then broken out into the normalized tables. For example, all the data pertaining to the customer and ATM location is captured in one table at the time of the transaction, and then another step normalizes it using the appropriate logic. However, this doesn't seem efficient, so I'm now thinking the logic to normalize it is handled at the time of the transaction. I'm not having luck finding examples of how data is inserted once normalized tables are designed, so does anyone have a good example or reference to follow?

  • The customer would have an identifying customer number and account number, and the ATM would have an identifying ATM number.

    Therefore, the transaction would broadly only need to capture cust#, acct#, ATM#, amount, type of transaction (deposit/withdrawal/etc) and datetime.

    That data is already normalized for the transaction, since all of that data is required to identify a single transaction.

    The ATM location, for example, is available from the ATM# alone.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
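The narrow transaction row described above can be sketched with a small in-memory database. This is a minimal illustration, not the thread's actual schema: all table and column names (`Customer`, `Account`, `Atm`, `AtmTransaction`) are assumptions; the point is that the transaction stores only keys and measures, and the ATM's location is recovered by joining on the ATM number.

```python
import sqlite3

# Hypothetical schema for illustration; names are not from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Account  (AccountID  INTEGER PRIMARY KEY,
                       CustomerID INTEGER NOT NULL REFERENCES Customer);
CREATE TABLE Atm      (AtmID INTEGER PRIMARY KEY, City TEXT, State TEXT, Zip TEXT);
CREATE TABLE AtmTransaction (
    TranID    INTEGER PRIMARY KEY,
    AccountID INTEGER NOT NULL REFERENCES Account,
    AtmID     INTEGER NOT NULL REFERENCES Atm,
    TranType  TEXT    NOT NULL CHECK (TranType IN ('deposit', 'withdrawal')),
    Amount    REAL    NOT NULL,
    TranDate  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP);
""")

# Reference data already exists before any transaction happens.
conn.execute("INSERT INTO Customer VALUES (1, 'Ron')")
conn.execute("INSERT INTO Account  VALUES (10, 1)")
conn.execute("INSERT INTO Atm      VALUES (500, 'Denver', 'CO', '80202')")

# The transaction itself is a single narrow row: keys, amount, type.
conn.execute(
    "INSERT INTO AtmTransaction (AccountID, AtmID, TranType, Amount) "
    "VALUES (?, ?, ?, ?)",
    (10, 500, 'withdrawal', 60.00),
)

# The ATM location is never re-stored; it comes from joining on AtmID.
row = conn.execute("""
    SELECT c.Name, a.City, t.Amount
    FROM AtmTransaction t
    JOIN Atm a       ON a.AtmID = t.AtmID
    JOIN Account ac  ON ac.AccountID = t.AccountID
    JOIN Customer c  ON c.CustomerID = ac.CustomerID
""").fetchone()
print(row)  # ('Ron', 'Denver', 60.0)
```

No staging step is needed: the transaction arrives already normalized because everything else is reachable through its foreign keys.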

  • That makes sense. What about something like a web form where a new user enters all of their information (name, age, city, state, zip, income, ethnicity, email, etc) for whatever service they are signing up for? There doesn't appear to be any defining characteristics since the customer doesn't previously exist.

  • In the ATM case, the customer details you described belong to the customer's bank. As Scott described, the ATM transaction details are quite narrow. There are APIs/protocols that allow the ATM to get details from the customer's bank using the identification information in the card.

    In the case of a web sign-up form, it depends. The defining characteristics are the values in the form (and whatever may be filled in later). For low-security, always-accepted scenarios, it may go straight into permanent tables.

    In other cases, it may initially go into a queue or staging table, and after some sort of confirmation process (self-confirmation or internal approval), and perhaps sign-up completion with login, password, preferences, etc., go into permanent member tables.

    In such scenarios, there is a minimal set of fields that must be known to function correctly. That can be implemented by not allowing the member to be created until all such data is populated, or by allowing a member to be created as invalid/incomplete but identifying them as invalid/incomplete.
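The staging-then-promote flow above can be sketched as follows. Everything here is an assumption for illustration (`SignupStaging`, `Member`, the `Confirmed` flag); the idea is that raw form data lands in a staging table, and only confirmed rows with the minimal required fields are promoted to the permanent member table.

```python
import sqlite3

# Hypothetical staging/member tables; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SignupStaging (
    StagingID INTEGER PRIMARY KEY,
    Email     TEXT NOT NULL,
    Name      TEXT,
    Confirmed INTEGER NOT NULL DEFAULT 0);
CREATE TABLE Member (
    MemberID INTEGER PRIMARY KEY,
    Email    TEXT NOT NULL UNIQUE,
    Name     TEXT NOT NULL);
""")

# 1. Raw form data lands in staging, complete or not.
conn.execute("INSERT INTO SignupStaging (Email, Name) VALUES (?, ?)",
             ("ron@example.com", "Ron"))

# 2. The confirmation step (e.g. clicking an emailed link) marks the row.
conn.execute("UPDATE SignupStaging SET Confirmed = 1 WHERE Email = ?",
             ("ron@example.com",))

# 3. Only confirmed rows with the minimal required fields are promoted,
#    atomically, into the permanent table.
with conn:
    conn.execute("""
        INSERT INTO Member (Email, Name)
        SELECT Email, Name FROM SignupStaging
        WHERE Confirmed = 1 AND Name IS NOT NULL""")
    conn.execute("DELETE FROM SignupStaging WHERE Confirmed = 1")

members = conn.execute("SELECT Email, Name FROM Member").fetchall()
print(members)  # [('ron@example.com', 'Ron')]
```

The `WHERE Confirmed = 1 AND Name IS NOT NULL` filter is one way to implement "don't allow the member to be created until all required data is populated"; the alternative mentioned above would insert the row anyway but flag it as incomplete.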

  • RonMexico wrote:

    That makes sense. What about something like a web form where a new user enters all of their information (name, age, city, state, zip, income, ethnicity, email, etc) for whatever service they are signing up for? There doesn't appear to be any defining characteristics since the customer doesn't previously exist.

    If a new customer populates the "city" and "state" fields with the string values "Danever" and "Arizona", will the application accept those entries?

    If "yes", what's the point of allowing such rubbish into the database?

    If "no", then the form would need to look up the existing valid city names for each state, and if a matching valid combination is found, a corresponding CityID can be retrieved and submitted with the registration details instead of the names themselves.

    _____________
    Code for TallyGenerator
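The lookup described above can be sketched in a few lines. The lookup table and the `resolve_city_id` helper are hypothetical, standing in for a query against a real cities reference table: the form resolves a (city, state) pair to a CityID before submitting, and rejects combinations that don't exist.

```python
from typing import Optional

# Hypothetical reference data standing in for a Cities lookup table.
CITIES = {
    ("Denver", "Colorado"): 1001,
    ("Phoenix", "Arizona"): 1002,
}

def resolve_city_id(city: str, state: str) -> Optional[int]:
    """Return the CityID for a valid city/state combination, or None to reject."""
    return CITIES.get((city.strip().title(), state.strip().title()))

print(resolve_city_id("Denver", "Colorado"))   # 1001
print(resolve_city_id("Danever", "Arizona"))   # None -> reject the form entry
```

Only the surrogate key (the CityID) travels with the registration; the city and state names live once, in the lookup table.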

  • RonMexico wrote:

    Something I've always wondered is how data is loaded into a normalized OLTP database. One example I think about is an ATM transaction, because it captures data ranging from the customer to the ATM location. If the database is designed in at least 3NF, then it would have multiple tables related to the customer to handle name, account information, etc., and the same can be said for the ATM because of its zip code, city, and state attributes. My thinking was that all data is captured in a batch and then broken out into the normalized tables. For example, all the data pertaining to the customer and ATM location is captured in one table at the time of the transaction, and then another step normalizes it using the appropriate logic. However, this doesn't seem efficient, so I'm now thinking the logic to normalize it is handled at the time of the transaction. I'm not having luck finding examples of how data is inserted once normalized tables are designed, so does anyone have a good example or reference to follow?

    It's too general a question, imo. Notice the responses are all regarding the DDL. Insertions depend on the data model, and there's seemingly little agreement on what constitutes "a good example" of that. Also, imo, any answer depends on the method(s) of data access and the requirements of the system(s) making database request(s). For example, when the data leaves the web form, what format is it in? Probably JSON. Is the question how to send a JSON document to SQL Server and populate a normalized data structure? If so, what actually gets passed to SQL Server: JSON or SQL Server typed parameters? How/what converts JSON to SQL Server data types? It makes a big difference to how the code gets written.

    No one shall be able to drive us from the paradise that Cantor created for us.
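One answer to the "how/what converts JSON to typed values" question is that the application tier does the conversion before the database ever sees the data. A minimal sketch, with entirely hypothetical field names, converting a JSON payload into typed parameters suitable for binding to a parameterized INSERT:

```python
import json
from datetime import date

# Hypothetical JSON payload from a web form; field names are illustrative.
payload = '{"name": "Ron", "age": "42", "signup_date": "2024-05-01"}'

raw = json.loads(payload)

# Explicit conversion to the target column types catches bad data early,
# before anything is sent to the database.
params = (
    str(raw["name"]),
    int(raw["age"]),                        # JSON string -> typed int
    date.fromisoformat(raw["signup_date"])  # JSON string -> real date
)

print(params)  # ('Ron', 42, datetime.date(2024, 5, 1))
# These typed values would then be bound to a parameterized statement,
# e.g. cursor.execute("INSERT INTO ... VALUES (?, ?, ?)", params)
```

The alternative is to pass the raw JSON document to the database and shred it there (SQL Server's OPENJSON can do this); either way, something has to own the type conversion.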

  • I realize the question is very broad, and that is likely due to me not having experience loading transactional data. What I'm getting out of all the responses is that there can be, and most likely will need to be, some form of data staging in order to prep the data. That's how I envisioned it, but my concern was that the process could slow down the overall loading of the transactional data, especially if it scales to hundreds of transactions in a short period of time.

  • Ideally, one plans for how valid they need/want the data to be vs. how much they're willing to pay to achieve that. e.g., if we must have valid addresses, then don't allow invalid addresses in the first place -- use an address service to validate addresses (or even provide autofill).

    Unfortunately, many businesses will opt for the cheap and fast option, and then choose/have to spend far more time and lost productivity & revenue manually fixing such data than they would have had they implemented constraints/validation in the first place.

    Can it slow things down? Maybe. But fast unusable transactions and data aren't very valuable. And the throughput of manually fixing such data after the fact is orders of magnitude slower than up-front validation.

  • RonMexico wrote:

    I realize the question is very broad, and that is likely due to me not having experience loading transactional data. What I'm getting out of all the responses is that there can be, and most likely will need to be, some form of data staging in order to prep the data. That's how I envisioned it, but my concern was that the process could slow down the overall loading of the transactional data, especially if it scales to hundreds of transactions in a short period of time.

    Maybe.

    Simply, the application inserts values in the right places in the right order. Think of a person signing up for a web site. Things like names and addresses are entered by the user. Demographic data is fixed. Things like "hair color" come from a fixed list of choices, so there's no need to worry whether a lookup table contains the right value. User names and passwords need to be validated.

    In the case of, say, an ATM transaction, it's the same type of process. The application inserts the values into the tables in the proper order.

    This is pretty basic stuff for most developers.


    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/
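"Inserts the values in the right places in the right order" can be made concrete with foreign keys: the parent row must exist before any child row that references it, and the whole unit of work should commit or roll back together. A minimal sketch with assumed table names (`Person`, `Address`):

```python
import sqlite3

# Hypothetical parent/child tables; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person  (PersonID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Address (AddressID INTEGER PRIMARY KEY,
                      PersonID  INTEGER NOT NULL REFERENCES Person,
                      City      TEXT NOT NULL);
""")

# One unit of work: either both rows commit or neither does.
with conn:
    cur = conn.execute("INSERT INTO Person (Name) VALUES (?)", ("Ron",))
    person_id = cur.lastrowid  # key generated by the parent insert
    conn.execute("INSERT INTO Address (PersonID, City) VALUES (?, ?)",
                 (person_id, "Denver"))

result = conn.execute(
    "SELECT Name, City FROM Person JOIN Address USING (PersonID)").fetchall()
print(result)  # [('Ron', 'Denver')]
```

Reversing the two inserts would fail the foreign key check, which is exactly the "right order" constraint: parents before children.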
