April 16, 2012 at 8:22 am
I have a number of tables (around 40) containing snapshot data about 40 million plus vehicles. Each snapshot table is at a specific point in time (the end of the quarter) and is more or less identical in terms of structure (over the years extra variables have been added, but on the whole, most of the tables are identical).
Whilst most of our analysis is against single snapshots, on occasion we need to run some analysis against all the snapshots. For instance, we may need to build a new table containing all the Ford Focus cars from every single snapshot.
To achieve this we currently have two options:
a) write a long, long, long batch file repeating the same code over and over, changing only the FROM clause [drawbacks - it takes a long time to write, and changing a single line of code in one of the blocks requires fiddly changes in all the other blocks]
b) use a view to union all the tables together and query that instead [drawbacks - our tables are stored in separate database instances and cannot be indexed, plus the resulting view is something like 600 million records long by 125 columns wide, so is incredibly slow]
So, what I would like to find out is whether I can use dynamic SQL or put the SQL into a loop to run through all the tables. This would be something like:
for each table in TableList
INSERT INTO output_table
SELECT table as OriginTableName, Make, Model
FROM table
next table in TableList
Is this possible? This would mean that updating the original SQL when our client changes what they need (a very regular occurrence!) would be very simple and we would benefit from all the indexes we already have on the original tables.
Any pointers, suggestions or help will be much appreciated.
Many thanks in advance,
Daryl
April 16, 2012 at 8:37 am
Dynamic SQL should do the job, provided the required columns exist in all tables with the same names. Even if the column names differ in some tables, you may be able to record the column name in your list table and build the query from there.
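As a rough sketch of that approach (the table and column names here are assumptions taken from the example earlier in the thread, not tested code):

```sql
-- Hypothetical sketch: build one INSERT per snapshot table listed in
-- dbo.TableList (assumed to hold one table name per row), then run the lot.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql = @sql
    + N'INSERT INTO dbo.output_table (OriginTableName, Make, Model) '
    + N'SELECT ''' + REPLACE(Name, '''', '''''') + N''', Make, Model '
    + N'FROM ' + Name + N';' + NCHAR(13)
FROM dbo.TableList;

EXEC sp_executesql @sql;
```

Changing what the client needs then means editing one SELECT template rather than forty copies of it.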
April 16, 2012 at 9:14 am
Another way, using sp_msforeachtable:
create table dbo.OutputTable (OriginTableName nvarchar(500), RecordCount INT)
create table dbo.TableList (Name nvarchar (500))
insert dbo.TableList
select '[dbo].[swap]'
union select '[dbo].[products]'
union select '[dbo].[structures]'
union select '[dbo].[stagingdata]'
exec sp_msforeachtable @command1 = 'INSERT INTO dbo.OutputTable SELECT ''?'', COUNT(*) from ?'
,@whereand = 'and syso.object_id in (select object_id(Name) from dbo.TableList)'
select * from dbo.OutputTable
April 17, 2012 at 12:54 am
I've not tried it yet, but your suggestion looks perfect, Eugene. It is exactly what I hoped that someone would come up with - I can build a list of tables and it looks like sp_msforeachtable basically gives me an inbuilt looping function.
I'll try it out later this week and report back, but I am very hopeful that this will fit the bill.
EDIT
I have just built my table list and tested it with some simple queries. It works perfectly and gives me precisely what I needed. Many thanks for your help.
April 17, 2012 at 4:02 am
I'm glad it helped.
But it's worth mentioning that I would just use a normal cursor and dynamic SQL. This is exactly the kind of case where using a cursor is totally justified.
The undocumented sp_msforeachtable does exactly the same thing internally.
If you want to see how it works just execute:
sp_helptext sp_msforeachtable
Your cursor should be over SELECT TableName FROM dbo.TableList; then inside the cursor loop you can build the required SQL string and execute it.
April 17, 2012 at 4:20 am
One thing that I have noticed is that I cannot use a GROUP BY clause with sp_msforeachtable. This would be very useful for other, related queries, so perhaps I do need to look at cursors to achieve that. Now that you've given me a pointer to this, hopefully it will not take me long to learn how to achieve the same with a cursor, and to expand the number of things I can do with the query (plus write some very basic code that my less SQL Server-savvy colleagues can follow).
Thanks again.
April 18, 2012 at 1:38 am
Just for the record, I used Dynamic SQL with the TableList table of table names. As a very simple example, this looked like:
DECLARE @TableName varchar(500)
DECLARE @curTable CURSOR
DECLARE @sql NVARCHAR(1000)

-- Cursor over the list of snapshot table names
SET @curTable = CURSOR FOR
SELECT [Name] FROM Vehicles_LookupTables.dbo.AllStockTableList

OPEN @curTable
FETCH NEXT
FROM @curTable INTO @TableName

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build the per-table statement, tagging each row with its source table
    SET @sql = 'SELECT ''' + @TableName + ''', Make, sum(1) as Total FROM ' + @TableName + ' GROUP BY Make'
    PRINT @sql -- handy for checking the generated SQL
    EXEC sp_executesql @sql
    FETCH NEXT
    FROM @curTable INTO @TableName
END

CLOSE @curTable
DEALLOCATE @curTable
GO
April 18, 2012 at 7:21 am
Daryl Lloyd (4/16/2012)
...b) use a view to union all the tables together and query that instead [drawbacks - our tables are stored in separate database instances and cannot be indexed, plus the resulting view is something like 600 million records long by 125 columns wide, so is incredibly slow]
...
Don't dismiss the partitioned view option outright without giving it a try. In this case it's called a Distributed Partitioned View, because it unions tables across multiple databases, and it's been proven to work in many scenarios.
Distributed Partitioned Views / Federated Databases: Lessons Learned
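As a rough, hypothetical illustration of the idea (the database, table, and column names here are made up, not taken from the poster's setup):

```sql
-- Sketch of a partitioned view: each member table lives in its own
-- database and carries a CHECK constraint on the partitioning column,
-- so the optimizer can skip members that cannot match the predicate.
CREATE VIEW dbo.AllSnapshots
AS
SELECT * FROM SnapshotDB_2011Q4.dbo.Vehicles -- CHECK (SnapshotDate = '2011-12-31')
UNION ALL
SELECT * FROM SnapshotDB_2012Q1.dbo.Vehicles -- CHECK (SnapshotDate = '2012-03-31')
GO
```

With trusted CHECK constraints in place, a query such as SELECT Make, Model FROM dbo.AllSnapshots WHERE SnapshotDate = '2012-03-31' should touch only the matching member table rather than scanning all of them.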
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
April 18, 2012 at 7:34 am
Eric M Russell (4/18/2012)
Don't dismiss the partitioned view option outright without giving it a try. In this case it's called a Distributed Partitioned View, because it unions tables across multiple databases, and it's been proven to work in many scenarios.
Distributed Partitioned Views / Federated Databases: Lessons Learned
We make limited use of this type of partitioning, and, as you say, it certainly does work. However, I think the problem in our situation is that we have to union 20-odd tables, each containing 30 million plus records, and (for some reason that I cannot recall at the moment) we are unable to make use of any indexes. So it produces the right answer, but at a terribly slow speed. I estimate that a query against the unioned view takes roughly 5 to 10 times longer than a set of sequential queries on the separate tables.
So I suspect that this type of view is very useful, but only in the right context. Like most things in life, I suppose ;-). I will have a look at the link above, though, to learn a bit more about Distributed Partitioned Views (and perhaps I'll find something key that we've been doing wrong!).