normalizing SQL configurations (and a "hopeful" regarding denali)

  • Assume I have 3 SQL servers in my environment which may be used as part of a source or destination in SSIS, and that the names of the servers are server1, server2 and server3.

    Assume also that I have two SSIS packages, package1 and package2.

    Package1 uses server1 and server2.

    Package2 uses server2 and server3.

    My configuration table would contain four rows. Two would be for package1. Two would be for package2. But we have repeating information in this table: the connection string for server2 has been defined twice.

    This is bad, because it means the two rows which are both meant to define the location of the same server (server2) could get out of synch. In other words, we have violated third normal form. This kind of thing upsets me to an almost irrational degree.

    It would be nice to be able to normalize this, having the configured values as one table, and the application of those values to packages as another. When we need to use a configuration we would be joining these two tables together and adding a predicate based on the package name.

    Unfortunately, as I understand it, this is not possible because when selecting a configuration location one must select a table, one cannot select a function and pass the package name as a parameter, nor can one write a query.

    I could create a normalized schema as desired, and then create two views on top of the schema, one view for the package1 configurations, and one view for the package2 configurations. But I find the necessity to create a new object in the database for each new package somewhat inelegant.

    Does anyone have a more elegant solution? If there isn't one in 2k8, does anyone know if there will be one in 2011?

  • While it is certainly desireable to have only one reference to a server in the config table, it appears that your issue is more about strict adherence to the 3NF. Correct me if I am mistaken. Outside of school strict adherence to any NF is not the norm. Most places design to the 3NF for systems they write, but departures into other NFs are common depending on the needs. I wouldn't get overly hung up on it.

    The package configurations in Denali are effectively the same, HOWEVER, they introduce an entirely new set of methodologies and package configurations are considered legacy. However, with that said I don't think it addresses your major issue.

    CEWII

  • I'm not overly fussed with "strict" adherence to 3NF. First of all, I'm more of a BCNF man myself. Secondly, this design in SSIS is still a form of OTLT and will never really be 3NF.

    However, I am definitely concerned with the structure for the same reason that 3NF and other NF's exist: to ensure my data doesn't become inconsistent.

    Fortunately, as you say, actual businesses in the real world don't really need consistent data.

  • Seems it's more of an issue with your ETL framework than with a limitation of SSIS. You have three rows in your configuration table, each associated with a different configuration and containing the connection string to one of the servers. Package 1 uses configuration 1 and 2 while package 2 uses configuration 2 and 3. QED. Or am I missing something here?

  • The configurationfilter column would specify which packages use which rows.

    Since the configurationfilter column cannot contain two values, each package needs its own row for server2.

    I know that one solution people use is to try to come up with a "generic" filter for values used by all packages, but not all packages always use these generic values. For instance, what if I have package3 which only uses machine3? As the number of packages and package configuration values increases, the complexity of deciding what is really "generic" goes up very quickly.

    I will probably go with the one view per package solution if no better option arises.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply