June 21, 2012 at 10:01 am
Hi folks,
I have a question regarding a design I've been pondering. Perhaps I'm merely overanalyzing the problem, but I wouldn't mind some input.
I have a raw table that is used to collect large amounts of statistical data. Right now a web application hits this table directly, and of course it is inefficient and slow. I'm trying to redesign the process by having a job that parses through and summarizes the data and outputs it to a staging table before adding it to a small table that will be exposed to the web application. I have this working great and am seeing huge performance gains in a test environment.
My concern (and it might not be warranted) is with the updating of the final table. It needs to be refreshed every hour with the new statistics. Keep in mind as well that the web application will sometimes query the table thousands of times in a span of a few minutes. My thought was to have a bit field, say "isNew", in the destination table that defaults to true as new data comes in (the calling stored procedure always grabs data from rows where "isNew" is false). Then I delete the "old" rows and change all rows marked as new to old, making them available once again and ready for the next data load (sketched below).
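A minimal sketch of that flip, assuming a destination table named stats_summary with a bit column isNew (both names are made up purely for illustration):
-- Hypothetical sketch of the flag-flip refresh described above;
-- stats_summary and isNew are illustrative names, not an actual schema.
BEGIN TRANSACTION;
    -- Remove the rows the application has been reading
    DELETE FROM stats_summary WHERE isNew = 0;
    -- Expose the freshly loaded rows to the application
    UPDATE stats_summary SET isNew = 0 WHERE isNew = 1;
COMMIT TRANSACTION;
-- The calling procedure reads only WHERE isNew = 0, so the delete and the
-- flip are atomic from its point of view.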
The resulting table should never grow beyond 10,000 rows; hence the point of trying to keep it small, accessible and fast. Does anybody see this solution causing issues, such as a query returning no data in the small timeframe when I do the DELETE/UPDATE, or causing any kind of significant delay or wait for the application? Is there a better way? Sorry about the verbiage, folks; any comments are appreciated.
Thanks,
June 21, 2012 at 10:14 am
Assuming that you are doing this update in a stored procedure, I would do something more along the lines of this pseudo-code:
-- ... prep work
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
BEGIN TRY
    -- Empty the destination, then repopulate it from the summarized data
    TRUNCATE TABLE destination_table;
    INSERT INTO destination_table
        SELECT ... ; -- column list and source elided here
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;
    -- ... other error handling as needed
END CATCH;
END; -- end of update procedure
June 21, 2012 at 10:27 am
This actually sounds like a case for partitioning. Partition the main table, make sure the staging table is identical (indexes, constraints, columns, etc.), then to move the data from staging into main you could just add a new partition to the main table and switch that new partition with the staging table. Metadata operation, near instant.
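For what it's worth, a minimal sketch of the switch itself; the table names and partition number are assumptions, purely for illustration:
-- Hypothetical: main_stats, staging_stats and partition 2 are made-up names.
-- Both tables must have identical structure (columns, indexes, constraints)
-- and sit on the same filegroup as the target partition.
ALTER TABLE staging_stats SWITCH TO main_stats PARTITION 2;
-- No data moves; only metadata changes, which is why it's near instant.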
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 21, 2012 at 10:29 am
GilaMonster (6/21/2012)
This actually sounds like a case for partitioning. Partition the main table, make sure the staging table is identical (indexes, constraints, columns, etc.), then to move the data from staging into main you could just add a new partition to the main table and switch that new partition with the staging table. Metadata operation, near instant.
This works really fast. Only thing is, you need to be using the Enterprise Edition of SQL Server. What edition are you using?
June 21, 2012 at 11:04 am
Yes, I am using Enterprise Edition. Partitioning is not something I have experimented with yet, but that sounds like a great solution. Thanks for the idea.
June 21, 2012 at 11:06 am
Since you are using Enterprise Edition, I'd go with Gail's suggestion.
June 21, 2012 at 11:12 am
GilaMonster (6/21/2012)
This actually sounds like a case for partitioning. Partition the main table, make sure the staging table is identical (indexes, constraints, columns, etc.), then to move the data from staging into main you could just add a new partition to the main table and switch that new partition with the staging table. Metadata operation, near instant.
Just a quick question regarding this method. This loads the 'new' data into the main table quickly, but you still end up with the old data in that table. As such, it could result in unwanted data... or perhaps I read it incorrectly.
June 21, 2012 at 11:14 am
yb751 (6/21/2012)
GilaMonster (6/21/2012)
This actually sounds like a case for partitioning. Partition the main table, make sure the staging table is identical (indexes, constraints, columns, etc.), then to move the data from staging into main you could just add a new partition to the main table and switch that new partition with the staging table. Metadata operation, near instant.
Just a quick question regarding this method. This loads the 'new' data into the main table quickly, but you still end up with the old data in that table. As such, it could result in unwanted data... or perhaps I read it incorrectly.
You would also swap out the old data. This means each set of data will have to be in its own partition.
June 21, 2012 at 11:22 am
Hmmmm... just quickly reading over MSDN articles on switching, it seems the target of a switch must be empty. So then I would see it as a cycle across three tables.
1. Staging -> Holding
2. Main -> Staging
3. Holding -> Main
4. Delete Staging
or perhaps simply
1. Main -> Holding
2. Staging -> Main
3. Delete Holding
Thanks folks, I'll have to do some more reading and testing but it sounds promising.
June 21, 2012 at 11:35 am
Typically it's done like this (a sketch follows the steps):
1. Load new data into staging
2. Add a new partition to main
3. Switch the new partition in main with staging (now staging is empty)
4. Switch the partition containing the old data with staging
5. Merge the empty partition in the main table
6. Truncate staging.
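A hedged sketch of that cycle in T-SQL; the function pf_stats, scheme ps_stats, tables main_stats and staging_stats, and the hourly boundary values are all names made up for illustration:
-- Illustrative only: object names and boundary values are assumptions.
-- To switch IN, staging_stats must match main_stats exactly and carry a
-- CHECK constraint keeping its rows inside the target partition's range.
-- 1. Add a new (empty) partition to main for the incoming hour
ALTER PARTITION SCHEME ps_stats NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_stats() SPLIT RANGE ('2012-06-21T12:00:00');
-- 2. Switch the freshly loaded staging table into that empty partition
ALTER TABLE staging_stats
    SWITCH TO main_stats PARTITION $PARTITION.pf_stats('2012-06-21T11:00:00');
-- 3. Switch the partition holding the old data out into staging (now empty);
--    drop or widen staging's CHECK constraint first, since the old rows
--    fall outside it
ALTER TABLE main_stats
    SWITCH PARTITION $PARTITION.pf_stats('2012-06-21T10:00:00')
    TO staging_stats;
-- 4. Merge away the now-empty partition at the old boundary
ALTER PARTITION FUNCTION pf_stats() MERGE RANGE ('2012-06-21T10:00:00');
-- 5. Clear staging, ready for the next load
TRUNCATE TABLE staging_stats;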
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 21, 2012 at 11:42 am
Even better...that's fantastic Gila.
Thank You
June 21, 2012 at 12:43 pm
Sorry, one more question if I may.
What would be the best strategy for creating a new partition on the main table in such a way that I would not have to recreate the partition function and scheme every time I run the stored procedure?
June 21, 2012 at 1:18 pm
It's been a while since I even created a partitioned table, but based on the information provided so far I have absolutely no idea, not even a shot in the dark.
It would help if you would provide the current DDL (CREATE TABLE statement) for the table, including indexes.
June 21, 2012 at 1:22 pm
yb751 (6/21/2012)
Sorry, one more question if I may. What would be the best strategy for creating a new partition on the main table in such a way that I would not have to recreate the partition function and scheme every time I run the stored procedure?
You can't, not easily. You could 'rotate' the partitions by using a contrived control value for them, but I don't recommend it, as it defeats some of the optimizations used under the hood to keep partitioning seamless to the end user.
You nearly always end up programmatically rebuilding the partition function and scheme.
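For what it's worth, one common pattern is to roll the boundaries forward each run with SPLIT/MERGE instead of dropping and recreating anything; a hedged sketch, with pf_stats and ps_stats as assumed names:
-- Illustrative only; assumes an hourly RANGE RIGHT function pf_stats on
-- scheme ps_stats. SPLIT/MERGE RANGE accept variables, so the boundaries
-- can be computed inside the stored procedure.
DECLARE @newBoundary datetime = DATEADD(HOUR, DATEDIFF(HOUR, 0, GETDATE()), 0);
DECLARE @oldBoundary datetime = DATEADD(HOUR, -2, @newBoundary);
ALTER PARTITION SCHEME ps_stats NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_stats() SPLIT RANGE (@newBoundary);
-- ... switch staging in and the old partition out, as in Gail's steps ...
ALTER PARTITION FUNCTION pf_stats() MERGE RANGE (@oldBoundary);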
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
June 21, 2012 at 1:31 pm
Evil Kraig F (6/21/2012)
yb751 (6/21/2012)
Sorry, one more question if I may. What would be the best strategy for creating a new partition on the main table in such a way that I would not have to recreate the partition function and scheme every time I run the stored procedure?
You can't, not easily. You could 'rotate' the partitions by using a contrived control value for them, but I don't recommend it, as it defeats some of the optimizations used under the hood to keep partitioning seamless to the end user.
You nearly always end up programmatically rebuilding the partition function and scheme.
But we can give you a better answer, if you provide us with more details.