How best to store data recieved with different column layouts

  • I have a design problem that I am trying to figure out. Hopefully somebody here has some insight or experience that can help.

    My database will have to store data sent to us from multiple vendors and each vendor will return the data with a potentially different set of columns.

    For example:

    Vendor A returns PK, ColA,ColB,ColC,ColD

    Vendor B returns PK, ColE,ColF,ColG,ColH

    Vendor C returns PK, ColI,ColJ,ColK,ColL

    Vendor D returns PK, ColA,ColB,ColM

    etc..

    In addition to the returned columns being different, each vendor may return between as few as 10 fields, or as many as 60+ fields.

    The tables to track the PK and the basic transaction data is pretty easy. My problem is trying to figure out how to store the additional columns returned by each of the different vendors.

    Solution 1: I could create a very wide table that contains all the columns across all of the different returned data. The problem with this is that it may result in many rows with many null columns.

    Solution 2: I could create multiple tables where each table is specific to a vendors return data format. But this will partition my data across multiple tables. So to query for and extract a subset of data I may have to perform a union of all these different tables. The result of this union will probably contain many rows with many null columns.

    This will not be data warehouse, but it will be a large database with over 25 million rows across a dozen or more differing column layouts. These returned table layouts are subject to change at any time.

    Does anybody have any ideas on how to tackle such a problem ? I guess I could use SSIS to try to standardize the returned data into a common format but I would like to preserve the original data as much as possible. Plus it may be possible that VendorA and VendorB return columns with the same name but they may contain different data, so processing/transforming the data into the same target column may not be a simple task.

    What I'm looking for is any feedback about design strategies that may be helpful for such a situation.

    This database will be used to track and store this data but we will not have applications hitting it directly. We will be extracting subsets of this data that will be used by client apps but those chunks will probably be only several million rows in size.

    Any help or suggestions will be appreciated.

  • Consider:

    1. EAV - see http://en.wikipedia.org/wiki/Entity-attribute-value_model

    2. Storing the data in XML, and using XML indexes and persisted computed columns to promote properties

    3. Consider using SQL 2K8's sparse columns (url]http://msdn.microsoft.com/en-us/library/cc280604.aspx[/url])

    Paul

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply