November 23, 2006 at 10:16 am
Hi SQL Gurus,
I have 2 databases which have identical schemas. They contain client data that must remain segragated.
We are running some analysis that goes across both these databases where I hit on a performance question.
In Query Analyser if I run the query
USE database3
SELECT * FROM database1.MyTable
UNION
SELECT * FROM database2.MyTable
the query takes around 300 seconds.
I also have a view which is defined as
USE database3
CREATE VIEW MyView AS
SELECT * FROM database1.MyTable
UNION
SELECT * FROM database2.MyTable
If I run the query
USE database3
SELECT * FROM MyView
the query takes 360 seconds, around a minute slower. Why is this? I would have expected the view to be in some way optimised. I certainly wouldn't have expected it to be slower. I have run some timing tests around 20 times and get consistent results.
Does anyone know why this is, or can point me to any relevant documentation?
Each table has around 10 million rows. We are using SQL 2000 Enterprise running on Windows 2000
Thanks in advance!
November 23, 2006 at 11:13 am
What's the difference in the execution plans?
November 23, 2006 at 11:32 am
Do each of the seperate queries that the union combine return values distinct from each other, if so you should use 'union all' this will just union the results, whereas 'union' alone will sort the result sets and eliminate duplicates. This sort takes a lot of processing.
November 24, 2006 at 3:12 am
UNION ALL fixes the problem! I hadn't appreciated the difference between UNION and UNION ALL. I knew I must be doing something wrong!
Thanks for very much for your help.
November 24, 2006 at 6:42 am
So what are the final run times?
November 27, 2006 at 12:02 pm
Final times were basically all the same at around the 300 seconds mark. It would be nice if it were faster, but as this is for our own internal purposes, it doesn't really matter.
Joe,
We have separate databases because each database represents the data from a different client. We process data for a large number of external clients. If any client saw another clients data, we as a company would be royally screwed! Without going into any detail about how our app works, we regularly need to import client data, process and back up, but often we will need to restore from a previous back up a day or so after processing; if all data was in the same database, any restore would overwrite any updates from other clients. Secondly, we have something in the region of a terabyte of client databases, they are much easier to manage at the file level as individual databases. We have more than one SQL Server, so small databases allows us to do load balancing when required; we can physically move a database from one machine to another if we want to. Thirdly, we have also found that as the databases grow larger, their performance degrades, so separating them out makes sense from that angle also.
We only go across the databases for our own internal auditing purposes, which is not part of the process for our clients. No data appears in 2 places (I'm not sure why you think it is) this is the whole point of using a view!
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply