Architecture and Management Question - Multiple Schemas and Tables vs. a Few Tables with Thousands of Rows

  • I will be working on a plan to import data from about 1,000 sources. These sources will have different data; however, the table layout for each source will be the same. Some people are pushing for an architecture where each source gets its own schema with separate tables, which means we're looking at something like 30k+ tables, and that sounds like an awful way to go to me. Even splitting the data across multiple databases sounds ridiculous. I was thinking instead that I would have a reference table for the sources: 1,000 records, each with a unique id. I would then have 30 tables that contain the data from all sources, and each record would reference back to the reference table via that unique id. My guess is that the largest table would have maybe 1 million records, which isn't bad as long as it's referenced properly.

    Any thoughts?
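A minimal sketch of the single-set-of-tables design described above. All table and column names here are illustrative, not from the thread:

```sql
-- One reference row per source (~1,000 rows total).
CREATE TABLE dbo.DataSource
(
    SourceID   int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_DataSource PRIMARY KEY,
    SourceName varchar(100) NOT NULL
);

-- One of the ~30 shared tables; every row points back to its source.
CREATE TABLE dbo.ImportedOrder
(
    OrderID   int  NOT NULL,
    SourceID  int  NOT NULL
        CONSTRAINT FK_ImportedOrder_DataSource
        REFERENCES dbo.DataSource (SourceID),
    OrderDate date NOT NULL,
    CONSTRAINT PK_ImportedOrder PRIMARY KEY (SourceID, OrderID)
);

-- Queries for a single source just filter on SourceID.
SELECT o.OrderID, o.OrderDate
FROM dbo.ImportedOrder AS o
WHERE o.SourceID = 42;
```

Leading the primary key with SourceID clusters each source's rows together, so per-source queries and loads touch a contiguous range rather than scanning the whole table.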

  • JoshDBGuy (3/14/2014)



    The separate-tables approach would make for an absolute nightmare of dynamic SQL when it comes to batch jobs. The second option easily resolves that, but the question then becomes one of security. Will certain users need to be prevented from seeing parts of the table depending on the source? That isn't a big problem and can easily be handled by a stored procedure, but it is something you need to be aware of.

    In either case, I would definitely go with option 2, the single table with the reference-table kicker. Overall, it will make life much easier for everyone, and a million rows is nothing compared to a thousand schemas and tens of thousands of tables. Just imagine one column being added to the source data. 😉 Don't say that will never happen, because I've seen it happen way too many times to ever allow such a thing again. 😀
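If per-source visibility ever does become a requirement, a stored procedure along these lines is one way to keep callers away from the base table. This is a sketch; the table and column names are assumptions carried over from nothing in the thread:

```sql
CREATE PROCEDURE dbo.GetOrdersBySource
    @SourceID int
AS
BEGIN
    SET NOCOUNT ON;

    -- Callers are granted EXECUTE on this procedure only,
    -- not SELECT on the base table, so they see just the
    -- source they ask for.
    SELECT o.OrderID, o.OrderDate
    FROM dbo.ImportedOrder AS o
    WHERE o.SourceID = @SourceID;
END;
```

Because of ownership chaining, the caller needs no permissions on dbo.ImportedOrder itself as long as the procedure and table share an owner.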

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/14/2014)



    The separate-tables approach would make for an absolute nightmare of dynamic SQL when it comes to batch jobs. The second option easily resolves that, but the question then becomes one of security. Will certain users need to be prevented from seeing parts of the table depending on the source? That isn't a big problem and can easily be handled by a stored procedure, but it is something you need to be aware of.

    In either case, I would definitely go with option 2, the single table with the reference-table kicker. Overall, it will make life much easier for everyone, and a million rows is nothing compared to a thousand schemas and tens of thousands of tables. Just imagine one column being added to the source data. 😉 Don't say that will never happen, because I've seen it happen way too many times to ever allow such a thing again. 😀

    Security isn't a problem, the data doesn't contain private info, it's basically a staging environment.

  • JoshDBGuy (3/14/2014)



    Security isn't a problem, the data doesn't contain private info, it's basically a staging environment.

    That's a horse of a slightly different color. Will you need to do DELETEs from this table as soon as the source data has been used to update final tables?
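On the DELETE question, one common pattern for a staging table like this is to remove a source's rows in small batches once the final tables have been updated, so a single huge transaction doesn't bloat the log or hold locks. A sketch, with the same assumed names as above:

```sql
-- Delete one source's staged rows in chunks to keep each
-- transaction, and the transaction log growth, small.
DECLARE @SourceID int = 42;

WHILE 1 = 1
BEGIN
    DELETE TOP (5000)
    FROM dbo.ImportedOrder
    WHERE SourceID = @SourceID;

    -- Stop once a pass deletes nothing.
    IF @@ROWCOUNT = 0
        BREAK;
END;
```

If the whole table is emptied between loads rather than per source, TRUNCATE TABLE is cheaper still, since it is minimally logged.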

