Strategy? Performing UNIONs, etc., on 10+ dbs...

  • My question is essentially this: what would be the best way to aggregate data from across several same-schema databases, especially for SELECTs, but also for INSERTs, UPDATEs, and DELETEs?

    To explain... I build/maintain a web application for my client. This system has always kept their clients' data (call them CompanyA, CompanyB, Company[n], etc.) in separate databases. Each company's database has the same schema, so the programmers (me) just have to track which customer context the user is in, and can trust that the same code will work. This has worked just fine. Users could pull their reports, run their processes, and use all the system functions through their browsers. Peace and joy reigned in the land.

    A new requirement just came down: allow about half of our user population to see those same web pages...but now collate the data from arbitrary combinations of those company databases.

    So consider User1, who has access to the data for CompanyA and CompanyD. The web application lets them review a list of products, and a list of orders, and a list of shipments, for those two clients...but only one client at a time. They have to explicitly select which client's data they want to view. On the programming side, we just change the variable with the client-name, and User1 sees the right stuff.

    Now User1 needs to see the data for both CompanyA and CompanyD in a single list when they bring up the list of products, or the orders, or the shipments. (Luckily, our key ID fields all have a client-name-abbreviation prefix, so we won't run into collisions.)
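
    Just to make the goal concrete, here's roughly what I'm picturing for the products list (database and column names are made up; ours differ):

        -- Hypothetical combined products list for User1, assuming databases
        -- named CompanyA and CompanyD with identical schemas.
        SELECT ProductID, ProductName, UnitPrice
        FROM CompanyA.dbo.Products
        UNION ALL
        SELECT ProductID, ProductName, UnitPrice
        FROM CompanyD.dbo.Products
        -- UNION ALL is safe here because the prefixed key IDs rule out duplicates.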

    The issue is that we have about 1,000 to 1,500 different queries that hit these databases. Each database has about 40-50 tables. I am looking for suggestions on how to reach this end goal of aggregated views of the data, without just dumping all the data into one big database (causing performance issues, given the number of rows and users), and without (dear heavens) rewriting all of those queries and the code that uses them.

    Any thoughts?

    - Tom Kelleher

    kelleher-[at]-tkelleher.com

  • And to add further complication to this happy task: if we could keep the technique that performs the aggregating flexible, that would help, because they are signing on new companies all the time. We don't want to have to edit the underlying queries to tack on another UNION clause each time a new company comes aboard...
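
    For instance, rather than hand-editing the UNIONs, I'd want to regenerate them from a lookup table. A rough sketch (purely hypothetical; assumes a recent SQL Server and a dbo.Companies registry table with one row per client database):

        -- Rebuild the combined view from a registry table whenever a
        -- client is signed on or dropped. All names here are made up.
        DECLARE @sql nvarchar(max) = N'';

        SELECT @sql = @sql +
               CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END +
               N'SELECT * FROM ' + QUOTENAME(DbName) + N'.dbo.Products'
        FROM dbo.Companies
        WHERE IncludeInAggregate = 1;

        EXEC(N'CREATE OR ALTER VIEW dbo.vwAllProducts AS ' + @sql);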

    Thanks again...

  • That's a tough one. It's late and I don't have a good answer for you, but off the top of my head, I'd say aggregate in the client code. More flexible.

    Steve Jones

    sjones@sqlservercentral.com

    http://www.sqlservercentral.com/columnists/sjones

  • I think 10+ DBs is a lot to union. If you have real reasons for keeping the data separate, it's that or Steve's note about just doing it in code. It's ultimately the same amount of data either way, so you might want to reconsider putting it all in one DB.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • Folks, thanks. From a simplicity point of view, I like the notion of just pouring all the data into one database. There might be contractual issues there, though; from the start, they may have signed on their companies with a promise to keep each one's data segregated from the others'. Need to look into it. It's certainly an attractive notion: getting to the end goal, without changing our code, all in one step. On the other hand, it *does* need to stay flexible, and the selection of company databases that do/don't need to be aggregated this way is sure to change over time. We don't want to be in a situation where we have to keep adding and subtracting data based on changing contractual arrangements with these folks.

    It's a snarly problem. Thanks a bunch for your input.

    -Tom

  • Totally understand. We've separated our data here by client as well, but we have more than 200 clients, which rules out unions! Right now we run a process at night to aggregate the data for some of the other maintenance-type things we do. If you just rebuild each night, it's not bad; ours is actually a fake replication where we can make changes on either side and they sync.
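
    Per table, the nightly rebuild amounts to something like this (names are illustrative, and in practice you'd generate one INSERT per client DB from a registry table rather than hard-coding them):

        -- Hypothetical nightly rebuild of one table in the aggregated DB.
        TRUNCATE TABLE Aggregate.dbo.Products;

        INSERT INTO Aggregate.dbo.Products
        SELECT * FROM CompanyA.dbo.Products;

        INSERT INTO Aggregate.dbo.Products
        SELECT * FROM CompanyD.dbo.Products;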

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • It really does look like a Data Mart issue, but that is more of a batch process, not real time.

    I guess you could use replication to move relevant data (duplicated) from one database to others, but that could get ugly.

    I think code is the best way to stay flexible. You can more easily put result sets together.
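
    For example, you could do it server-side with dynamic SQL and a temp table (all names hypothetical; assumes a dbo.Companies table listing the client databases):

        -- Loop over the client DBs and pour each result set into one temp table.
        CREATE TABLE #AllProducts
            (ProductID varchar(20), ProductName varchar(100), UnitPrice money);

        DECLARE @db sysname, @sql nvarchar(500);
        DECLARE dbs CURSOR FOR
            SELECT DbName FROM dbo.Companies WHERE IncludeInAggregate = 1;
        OPEN dbs;
        FETCH NEXT FROM dbs INTO @db;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            SET @sql = N'SELECT ProductID, ProductName, UnitPrice FROM '
                     + QUOTENAME(@db) + N'.dbo.Products';
            INSERT INTO #AllProducts EXEC(@sql);  -- append this DB's rows
            FETCH NEXT FROM dbs INTO @db;
        END;
        CLOSE dbs;
        DEALLOCATE dbs;

        SELECT * FROM #AllProducts;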

    The other option is to use Distributed Views, but you'd need a way to determine from which database an item is coming.
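
    Something along these lines, with a literal column tagging the source (view and database names hypothetical):

        -- A distributed view that records which database each row came from.
        CREATE VIEW dbo.vwAllProducts AS
        SELECT 'CompanyA' AS SourceDb, p.* FROM CompanyA.dbo.Products AS p
        UNION ALL
        SELECT 'CompanyD' AS SourceDb, p.* FROM CompanyD.dbo.Products AS p;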

    Steve Jones

    sjones@sqlservercentral.com

    http://www.sqlservercentral.com/columnists/sjones

  • The replication idea is something that has come to mind several times. Perhaps we get the best of both that way: Our data's actual *home* is in the individual segregated databases, while all queries are based on the replicated (aggregated) database.

    I'll spend some time in the Replication forum and see what's being discussed there...

  • If the aggregated DB has to be kept up to date in real time, you could use triggers on the segregated DBs that fire on every action. That should work fine, provided there aren't too many changes to the segregated DBs.

    Also depends on the kind of actions you want to perform on the aggregated DB (insert, update, delete?).
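
    A minimal sketch of the idea, for inserts (all names hypothetical; run in each segregated DB, with UPDATE and DELETE triggers following the same pattern via the inserted/deleted tables):

        -- Hypothetical trigger in one segregated DB (here, CompanyA) that
        -- pushes new rows into the aggregated copy.
        USE CompanyA;
        GO
        CREATE TRIGGER dbo.trg_Products_Insert
        ON dbo.Products
        AFTER INSERT
        AS
        BEGIN
            SET NOCOUNT ON;
            INSERT INTO Aggregate.dbo.Products
            SELECT * FROM inserted;
        END;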

  • I've been thinking about triggers as well. That might be the best solution, in the end.

    And yes, we would want to do tons of SELECTs, a lot of UPDATEs, some INSERTs and the occasional DELETE.

    Perhaps I could just aggregate the one central table (call it the Products table), which would allow for most of the SELECTs. Updates could be done on the actual databases, and a trigger there would propagate the changes back into the aggregated table -- kind of like a bank-shot in pool. Same for INSERTs and DELETEs.

    I think that would work...but it would also essentially double the number of SQL commands to fire when the user performs a task. And I'm probably fooling myself to think I can do this to just one (or even just a few) tables, when we have over 50 of them to think about.

    Still thinking. But triggers have some promise. Gonna keep thinking on this...

