Need some help with a large data volume optimization

  • Craig,

    I've spent a good bit of time looking at the script and the execution plans. There's a fair bit to say about them at face value, but I'm reluctant to do so without a good feel for the overall process. It may be that there's something more fundamental to address.

    Any chance you could give us a fairly brief and easy to grasp summary of the data load rules? For example, I'm unclear on what it is the source files contain, whether they need to be loaded once, or on an ongoing basis, whether they contain replacement records, deltas, aggregated records...or something else.

    A more usual process is for the SSIS package to read the data from the files, match surrogate keys, perform validation, and then load an empty staging table. Your staging table appears to have lots of data in it...

    Sorry if I'm being thick here.

    Paul

  • Craig,

    When I opened the update plan, what immediately popped out at me is a missing index on the staging table with an impact of 80%!

    The suggested index for this is:

    /*
    Missing Index Details from UpdateToPortSecAgg.sqlplan
    The Query Processor estimates that implementing the following index could improve the query cost by 80.8588%.
    */
    /*
    USE [PHX_Cello]
    GO
    CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
    ON [dbo].[PortSecAgg_Staging] ([BatchGroup])
    INCLUDE ([PortfolioID],[SecurityID],[Quantity],[SecTypeCode],[Symbol],[Cusip],[UnitCost],[TotalCost],[Price],[MarketValue],[AccruedInterest],[UnrealizedGL],[IsSubAccount],[AsOfDate],[IsShortPosition])
    GO
    */

    Seeing as this update is running for > 6 hrs, this would seem like something to try out.

    Edit: Whoops, this plan is for the update in the cursor, not the initial 6 hour one. Still, it is taking about 8 minutes per batch - this index really ought to help out there.

    Edit2: The delete plan has this suggested index:

    /*
    Missing Index Details from deleteFromPortSecAgg.sqlplan
    The Query Processor estimates that implementing the following index could improve the query cost by 39.5086%.
    */
    /*
    USE [PHX_Cello]
    GO
    CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
    ON [dbo].[PortSecAgg_Staging] ([BatchGroup])
    INCLUDE ([PortfolioID],[SecurityID],[PortfolioCode],[AsOfDate],[IsShortPosition])
    GO
    */

    Or, just add the PortfolioCode column to the include of the previously suggested index.

    Edit3: and the insert has this suggested index, which would cover both of the above suggested indexes:

    /*
    Missing Index Details from InsertIntoPortSecAgg.sqlplan
    The Query Processor estimates that implementing the following index could improve the query cost by 73.0801%.
    */
    /*
    USE [PHX_Cello]
    GO
    CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
    ON [dbo].[PortSecAgg_Staging] ([BatchGroup])
    INCLUDE ([PortfolioID],[SecurityID],[PortfolioCode],[Quantity],[SecTypeCode],[Symbol],[Cusip],[UnitCost],[TotalCost],[Price],[MarketValue],[AccruedInterest],[UnrealizedGL],[IsSubAccount],[AsOfDate],[IsShortPosition])
    GO
    */

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Craig,

    Any chance of getting an execution plan on that first update (the 6 hour one)? Also, can we get updated Delete/Insert/Update plans after you implement the 3rd suggested index from my previous post?

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Paul White NZ (12/11/2010)


    Craig,

    I've spent a good bit of time looking at the script and the execution plans. There's a fair bit to say about them at face value, but I'm reluctant to do so without a good feel for the overall process. It may be that there's something more fundamental to address.

    Any chance you could give us a fairly brief and easy to grasp summary of the data load rules? For example, I'm unclear on what it is the source files contain, whether they need to be loaded once, or on an ongoing basis, whether they contain replacement records, deltas, aggregated records...or something else.

    I doubt you're being thick.

    I'll attempt the easy-to-grasp summary. I'm not sure brief will be available. 🙂

    We are integrating an existing, vendor-driven financial tracking system (name withheld to protect my arse, I haven't read our actual contract) into an online, client-visible reporting system. Said reporting system ALSO already exists, and I'm playing bridge-the-gap.

    To do this, you must understand a few things about the source system. The first and primary item is that to arrive at intelligent, aggregated results from the system, you need to go through three distinct pieces of software to get the result our clients currently see when we run reports from the source. The first is the SQL DB, the second is the application itself, and the third is yet another proprietary report-building tool sitting on top of that.

    Reverse engineering at this point is considered too complex by the powers that be. So, for client position data, I need to load results from this reporting system. They dump out load files, and I eat them. This process is the core load of those files. SecByBatch is another file I have them produce, as a point-in-time snapshot of some auxiliary data.

    The majority of the rest of the information I direct load from the SQL DB into this staging system. This staging system's purpose is to massage, clean, and associate the data before shipping to the actual reporting mechanism. We're using it to minimize the downtime of the reporting system while this data is massaged.

    So, for example, I get the Portfolio table directly from the source. That may be changing shortly, too... Found another data inconsistency.

    Now, to add to the issue, we do historical editing to correct for errors. Due to this, we will be getting yesterday's full data, and a delta for anyone who got an edit in the last 30 days. This results in me getting ~20 files a day for loading. These files, because of the limitations of the source proprietary reporting system, run for each specific day and produce a unique file.

    Please note, the data I'm fighting with is not 20 days' worth, but 360 of them. I'm optimizing for the worst-case scenario during the pre-load process. This is in case we lose a midpoint server over a weekend and I'm loading in 4 batches at a clip on Monday.

    Now, you'll notice there's PortSecAgg_staging, and PortSecAgg. PortSecAgg is the primary storage table. It contains the day to day position aggregations for a specific portfolioID/securityID/IsShort combination. _staging is the drop location for all the said text files prior to manipulation.

    The data pushed to _staging is a replacement delta for PortSecAgg. If a portfolio gets an item in any file for a specific PortfolioID on a day, we replace that day, as we cannot produce 'delete' deltas. In an effort to reduce churn, I set up an UpSert methodology. I may just make it a delete/insert methodology in the near future. It didn't give me much from a time perspective, but it'll be easier to maintain and require fewer lookbacks to _staging.
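    For illustration, the delete/insert flavour of that replacement would look something like this (just a sketch - the column list is abbreviated, and the names are the ones used elsewhere in this thread):

    -- Sketch: replace every (PortfolioID, AsOfDate) combination present in staging,
    -- rather than updating rows in place. Extend the column lists to the full set.
    DELETE psa
    FROM dbo.PortSecAgg AS psa
    WHERE EXISTS
    (
        SELECT 1
        FROM dbo.PortSecAgg_Staging AS s
        WHERE s.PortfolioID = psa.PortfolioID
        AND s.AsOfDate = psa.AsOfDate
    );

    INSERT dbo.PortSecAgg
        (AsOfDate, PortfolioID, SecurityID, IsShortPosition)
    SELECT s.AsOfDate, s.PortfolioID, s.SecurityID, s.IsShortPosition
    FROM dbo.PortSecAgg_Staging AS s;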

    I think that covers most of it.

    Of note, I've changed a portion of this process. The initial update that resets PortfolioID and SecurityID in PortSecAgg_staging is now done as a LOOKUP object in the SSIS package during the initial loads of the aggregation reports. I had not realized that the Lookup object would do parallel data lookups; I had thought they would be post-stream activities, and thus make no difference to processing time whether I did it then or after. I was incorrect, and Lutz helped me fix that mess.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • WayneS (12/11/2010)


    Craig,

    Edit: Whoops, this plan is for the update in the cursor, not the initial 6 hour one. Still, it is taking about 8 minutes per batch - this index really ought to help out there.

    Wayne, idx_PortSecAgg_staging_3 has a leading edge on BatchGroup. I had thought this would do the same. It's also only run once, to build the cursor.
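    For comparison, a catalog query along these lines (a sketch using the standard system views) lists the key and included columns of the existing indexes on the staging table, to show what idx_PortSecAgg_staging_3 is missing relative to the suggestion:

    -- Sketch: list key vs. included columns for each index on the staging table
    SELECT i.name AS index_name,
           c.name AS column_name,
           ic.key_ordinal,
           ic.is_included_column
    FROM sys.indexes AS i
    JOIN sys.index_columns AS ic
        ON ic.object_id = i.object_id AND ic.index_id = i.index_id
    JOIN sys.columns AS c
        ON c.object_id = ic.object_id AND c.column_id = ic.column_id
    WHERE i.object_id = OBJECT_ID('dbo.PortSecAgg_Staging')
    ORDER BY i.name, ic.is_included_column, ic.key_ordinal;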

    Edit2: The delete plan has this suggested index:

    Or, just add the PortfolioCode column to the include of the previously suggested index.

    I must be having the world's biggest brainfart. Where did you find that?

    It's interesting that it wants portfolio_code in that. I don't use it again after the initial updates to get portfolioID. What the devil does it think it's doing? I'll have to look into that on Monday.

    Edit3: and the insert has this suggested index, which would cover both of the above suggested indexes:

    :blink: At that point I could set my Clustered to BatchGroup and cover those three at once. My gods, is BatchGroup REALLY that intrusive? I've done stuff like this before and not had that kind of headache from it, just ends up as an attachment to the where clause.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • WayneS (12/11/2010)


    Craig,

    Any chance of getting an execution plan on that first update (the 6 hour one)? Also, can we get updated Delete/Insert/Update plans after you implement the 3rd suggested index from my previous post?

    I've removed the need for the first component via SSIS LOOKUP objects. Takes a little longer for the SSIS loading process to run, but not significantly.

    On Monday, I'll add in the index and see what we end up with. You'll probably see results Tuesday unless the system suddenly takes off and flies for me. 🙂


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Craig Farrell (12/11/2010)


    WayneS (12/11/2010)


    Craig,

    Edit: Whoops, this plan is for the update in the cursor, not the initial 6 hour one. Still, it is taking about 8 minutes per batch - this index really ought to help out there.

    Wayne, idx_PortSecAgg_staging_3 has a leading edge on BatchGroup. I had thought this would do the same. It's also only run once, to build the cursor.

    Edit2: The delete plan has this suggested index:

    Or, just add the PortfolioCode column to the include of the previously suggested index.

    I must be having the world's biggest brainfart. Where did you find that?

    Just a W-A-G here, but since you're posting in a SQL 2005 forum, I'm guessing that you're also using SSMS 2005? Well, if you were to use SSMS 2008 (like I do...), when you look at an execution plan that has missing indexes, it shows up just under the query text - and in nice green text (like remarks in code blocks) to make it really visible. When I said it popped out at me, I wasn't kidding! They made it very obvious, and I really like it. (Take a look at the attached image!)

    Also, take a look at those execution plans using a text editor (they are just plain ole XML files) - you'll see a <MissingIndexes> node, with data on what the missing index is.
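    If you'd rather not pick through the XML by hand, the missing-index DMVs expose the same information server-wide. A sketch (standard SQL 2005 DMVs; note their contents reset on instance restart):

    -- Sketch: missing index suggestions accumulated since the last restart
    SELECT d.statement AS table_name,
           d.equality_columns,
           d.inequality_columns,
           d.included_columns,
           s.avg_user_impact
    FROM sys.dm_db_missing_index_details AS d
    JOIN sys.dm_db_missing_index_groups AS g
        ON g.index_handle = d.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS s
        ON s.group_handle = g.index_group_handle
    ORDER BY s.avg_user_impact DESC;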

    It's interesting that it wants portfolio_code in that. I don't use it again after the initial updates to get portfolioID. What the devil does it think it's doing? I'll have to look into that on Monday.

    It's used in the DELETE:

    WHERE psa_s.PortfolioCode IS NULL

    And in the INSERT, it's one of the columns pulled from the SELECT.

    Edit3: and the insert has this suggested index, which would cover both of the above suggested indexes:

    :blink: At that point I could set my Clustered to BatchGroup and cover those three at once. My gods, is BatchGroup REALLY that intrusive? I've done stuff like this before and not had that kind of headache from it, just ends up as an attachment to the where clause.

    A CI on BatchGroup should work. Also note that the third suggested index will also cover the previous two (only the third suggested one is needed). I do think that the CI on BatchGroup would probably be better - try it out, and see what you get with just that as an index.
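    If you try the clustered index route, something like this should do it (a sketch - the index name is illustrative, and it assumes any existing clustered index on the staging table is dropped or rebuilt with DROP_EXISTING first):

    -- Sketch: cluster the staging table on BatchGroup so every query driven
    -- by BatchGroup is covered without extra nonclustered indexes
    CREATE CLUSTERED INDEX [CIX_PortSecAgg_Staging_BatchGroup]
    ON [dbo].[PortSecAgg_Staging] ([BatchGroup]);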

    Edit: removed img link to attachment - it's too wide and was messing up the page display in browsers.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Craig,

    Thanks for that, it helps a great deal. Just one more question before I get to the good stuff.

    The delta/replacement files you get for historical edits: do they contain aggregates for everything (all portfolios and securities) for that day, or just the corrections? The question is: does the delta/replacement file replace the data for one complete AsOfDate or do you need to merge it in with existing data for that day?

    Thanks

    Paul

    edit: Wayne, that query plan graphic really messes up the page - how about leaving it as an attachment?

  • Paul White NZ (12/11/2010)


    Craig,

    Thanks for that, it helps a great deal. Just one more question before I get to the good stuff.

    The delta/replacement files you get for historical edits: do they contain aggregates for everything (all portfolios and securities) for that day, or just the corrections? The question is: does the delta/replacement file replace the data for one complete AsOfDate or do you need to merge it in with existing data for that day?

    Thanks

    Paul

    edit: Wayne, that query plan graphic really messes up the page - how about leaving it as an attachment?

    They contain aggregates for any portfolio deemed to have a change in any one of its securities; the file reproduces all the security aggregates for that day, for that portfolio.

    It's a straight swap. I'm doing the three-way to try to get more speed out of it.

    Thanks again for looking through this, Paul.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • WayneS (12/11/2010)


    Just a W-A-G here, but since you're posting in a SQL 2005 forum, I'm guessing that you're also using SSMS 2005?

    Dead on target.

    Also, take a look at those execution plans using a text editor (they are just plain ole XML files) - you'll see a <MissingIndexes> node, with data on what the missing index is.

    One official brainfart. Thanks - I'd forgotten it showed that, probably because I usually ignore them.

    It's used in the DELETE:

    WHERE psa_s.PortfolioCode IS NULL

    In the INSERT, it's one of the columns pulled from the SELECT.

    Knew I'd left something foolish in there. Another thing to clean up. I'd forgotten to swap that out when I changed to pre-processing them to have the IDs for a direct CI linkup with PortSecAgg.

    A CI on BatchGroup should work. Also note that the third suggested index will also cover the previous two (only the third suggested one is needed). I do think that the CI on BatchGroup would probably be better - try it out, and see what you get with just that as an index.

    My concern had been the connections against PortSecAgg from _staging. I'm still finding it interesting that the BG is causing such a problem.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Paul White NZ (12/11/2010)


    edit: Wayne, that query plan graphic really messes up the page - how about leaving it as an attachment?

    Done, sorry about that!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Hi Craig,

    Looking at the execution plans you posted, there are a number of common reasons for the slow performance:

    1. Full table scans

    2. Large hash joins

    3. Large sorts

    4. Table spools

    5. Memory usage

    6. Fully-logged insert/update/delete operations

    Let's look at each of these in turn.

    Full table scans: This is probably the least serious issue. You seem to have decent hardware, and it wouldn't surprise me if you said the tables were entirely in memory. Covering indexes, as have been suggested, might speed things up slightly, but the amount of data produced is the key factor. Every scan (or potential seek) is producing somewhere around 1GB of data.

    Large hash joins: Parallel hash join scales extremely well, but hash matching 1GB of data to another 1GB of data on a four-part compound key is challenging work for any hardware. Hash join is also best when the build input is small; in your case the build and probe sides are of similar sizes.

    Large sorts: Sorts are expensive. Each of the three plans has a multi-key sort on a very large input.

    Table spools: Every plan also has an eager table spool, which stores its entire (huge) input to tempdb before replaying those rows to operators above it in the plan. These are required for 'Halloween' protection, because each plan is writing to PortSecAgg as well as reading from it.

    Memory usage: Each plan has the same (huge) overall memory grant, which suggests that SQL Server is reaching the limit of the amount of memory it is able to grant to a single query. Sorts, hashes, bitmaps, and parallelism operators (particularly repartition streams) all require a memory grant. It's worse because you are running such a high degree of parallelism (24) - each instance of, say, the parallel sort requires memory, and repartition streams requires memory proportional to the square of the DOP. You should trace the Hash and Sort Warnings event classes in Profiler while running these queries - my guess is that you will see many warnings as operators run out of memory grant and spill to tempdb.
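    As a quick live check while one of these queries runs, the memory grant DMV (available from SQL Server 2005 SP1) shows requested versus granted versus used memory per session. A sketch:

    -- Sketch: active memory grants; a big gap between requested and used memory,
    -- or sessions waiting on a grant, corroborates the pressure described above
    SELECT session_id,
           dop,
           requested_memory_kb,
           granted_memory_kb,
           used_memory_kb,
           max_used_memory_kb
    FROM sys.dm_exec_query_memory_grants;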

    Fully-logged operations: Speaks for itself I guess, but fully logging ten million wide-row deletes and inserts is always going to hurt!

    That's enough for one post; there'll be something a lot more positive in my next one 🙂

    Paul

  • Ok, so those are the problems. The approach I would take solves those problems by not performing all that work at all.

    Imagine if the only requirement was to load complete new days of aggregated data as fast as possible. The design would be very simple:

    1. Truncate the existing staging table (with just a clustered index in place)

    2. Read the input file using SSIS

    3. Use an SSIS Lookup or Merge component to assign surrogate keys

    4. Fast-load the empty clustered staging table (in clustered index order)

    5. Create an empty new partition in the destination table

    6. Switch the staging table data directly into the new partition

    So, my recommendations are:

    1. Partition PortSecAgg on AsOfDate (one day per partition)

    2. Match the schema of PortSecAgg_Staging to PortSecAgg (data types and nullability)

    3. Modify the SSIS package to produce data of the correct type and nullability

    4. Modify the SSIS package to produce IsShortPosition and IsSubAccount as bit columns

    5. Ensure the SSIS package produces data in clustered order

    6. Use the TABLOCK and ORDER FastLoadOptions of the OLEDB destination (sketched after the next paragraph)

    The objective is to achieve a minimally-logged fast load of the staging table. Creating a new partition, and switching the new data in will then be practically instantaneous. This leaves you with an empty staging table, ready for the next full-day load file.
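    For reference, the destination settings would look something like this (a sketch - the column list follows the demo tables below, and the exact property string is worth verifying against your SSIS version):

    -- OLE DB Destination (fast load) settings, sketched as notes:
    -- AccessMode      : OpenRowset Using FastLoad
    -- FastLoadOptions : TABLOCK, ORDER(AsOfDate ASC, PortfolioID ASC, SecurityID ASC, IsShortPosition ASC)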

    Let's look now at loading the delta/replacement files - the ones that contain a full set of data for one portfolio for one particular day. To handle these:

    1. Bulk-load the staging table with minimal logging in much the same way as for a full-day file.

    2. Create a new empty table, T1, that matches the schema of PortSecAgg.

    3. Switch the partition containing the target day out into T1.

    4. Use fully-logged DELETE to remove rows from T1 for the single portfolio we are loading.

    5. Use fully-logged INSERT to copy new rows from the staging table to T1.

    6. Switch T1 back into PortSecAgg.

    Steps 4 and 5 will be fully-logged, but the number of rows will be relatively small, and so should complete quickly.

    Paul

  • Demonstration of instant loading for a whole new day of data:

    SET ARITHABORT ON
    SET CONCAT_NULL_YIELDS_NULL ON
    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    SET ANSI_PADDING ON
    SET ANSI_WARNINGS ON
    SET NUMERIC_ROUNDABORT OFF
    GO

    USE master;

    -- Drop the demo database if a previous run left it behind
    IF DB_ID('Craig') IS NOT NULL
        DROP DATABASE Craig;
    GO

    CREATE DATABASE Craig;

    ALTER DATABASE Craig
    MODIFY FILE (name = 'Craig', size = 2GB, filegrowth = 256MB);

    ALTER DATABASE Craig
    MODIFY FILE (name = 'Craig_log', size = 128MB, filegrowth = 64MB);
    GO

    USE Craig;
    GO

    -- Partition function
    CREATE PARTITION FUNCTION PF (DATETIME)
    AS RANGE RIGHT
    FOR VALUES
    (
        '2010-12-01', '2010-12-02', '2010-12-03', '2010-12-04', '2010-12-05', '2010-12-06', '2010-12-07',
        '2010-12-08', '2010-12-09', '2010-12-10', '2010-12-11', '2010-12-12', '2010-12-13', '2010-12-14',
        '2010-12-15', '2010-12-16', '2010-12-17', '2010-12-18', '2010-12-19', '2010-12-20', '2010-12-21',
        '2010-12-22', '2010-12-23', '2010-12-24', '2010-12-25', '2010-12-26', '2010-12-27', '2010-12-28'
    );
    GO

    -- Partition scheme, all on the same file group
    CREATE PARTITION SCHEME PS
    AS PARTITION PF
    ALL TO ([PRIMARY]);
    GO

    -- Main table - partitioned
    CREATE TABLE dbo.PortSecAgg
    (
        AsOfDate DATETIME NOT NULL,
        PortfolioID INTEGER NOT NULL,
        SecurityID INTEGER NOT NULL,
        IsShortPosition BIT NOT NULL,
        PortfolioCode VARCHAR(10) NULL,
        Symbol VARCHAR(20) NULL,
        Quantity DECIMAL(18,4) NULL,
        SecTypeCode VARCHAR(5) NULL,
        Cusip VARCHAR(15) NULL,
        UnitCost MONEY NULL,
        TotalCost MONEY NULL,
        Price MONEY NULL,
        MarketValue MONEY NULL,
        AccruedInterest MONEY NULL,
        UnrealizedGL MONEY NULL,
        IsSubAccount BIT NULL,

        CONSTRAINT [PK dbo.PortSecAgg AsOfDate, PortfolioID, SecurityID, IsShortPosition]
        PRIMARY KEY CLUSTERED
        (
            AsOfDate ASC,
            PortfolioID ASC,
            SecurityID ASC,
            IsShortPosition ASC
        )
    )
    ON PS (AsOfDate);
    GO

    -- 1 million rows of nonsense data (takes about 7s)
    WITH Numbers (n) AS
    (
        SELECT TOP (1000000)
            ROW_NUMBER() OVER (ORDER BY (SELECT 0))
        FROM master.sys.columns C1,
             master.sys.columns C2,
             master.sys.columns C3
    )
    INSERT dbo.PortSecAgg WITH (TABLOCKX)
        (PortfolioID, SecurityID, AsOfDate, IsShortPosition)
    SELECT PortfolioID = N.n,
           SecurityID = N.n,
           AsOfDate = DATEADD(DAY, N.n % 28, '2010-12-01'),
           IsShortPosition = CONVERT(BIT, CASE WHEN RAND(CHECKSUM(NEWID())) < 0.3 THEN 1 ELSE 0 END)
    FROM Numbers N;
    GO

    -- Unpartitioned table which SSIS will load with a new day of data
    -- Created on the same file group as the partitioned table to avoid data movement
    CREATE TABLE dbo.PortSecAgg_NewDay
    (
        AsOfDate DATETIME NOT NULL,
        PortfolioID INTEGER NOT NULL,
        SecurityID INTEGER NOT NULL,
        IsShortPosition BIT NOT NULL,
        PortfolioCode VARCHAR(10) NULL,
        Symbol VARCHAR(20) NULL,
        Quantity DECIMAL(18,4) NULL,
        SecTypeCode VARCHAR(5) NULL,
        Cusip VARCHAR(15) NULL,
        UnitCost MONEY NULL,
        TotalCost MONEY NULL,
        Price MONEY NULL,
        MarketValue MONEY NULL,
        AccruedInterest MONEY NULL,
        UnrealizedGL MONEY NULL,
        IsSubAccount BIT NULL,

        CONSTRAINT [PK dbo.PortSecAgg_NewDay AsOfDate, PortfolioID, SecurityID, IsShortPosition]
        PRIMARY KEY CLUSTERED
        (
            AsOfDate ASC,
            PortfolioID ASC,
            SecurityID ASC,
            IsShortPosition ASC
        )
    )
    ON [PRIMARY];
    GO

    -- Add a constraint to the new day table so SQL Server
    -- can be sure the new data is all valid for the destination partition
    -- (the constraint is what allows the SWITCH into that partition to succeed)
    ALTER TABLE dbo.PortSecAgg_NewDay
    WITH CHECK
    ADD CONSTRAINT [CK dbo.PortSecAgg_NewDay Range]
    CHECK (AsOfDate = '2010-12-29');
    GO

    -- Simulate SSIS loading data for 29th Dec into PortSecAgg_NewDay
    -- 350,000 nonsense records (about 4s)
    WITH Numbers (n) AS
    (
        SELECT TOP (350000)
            ROW_NUMBER() OVER (ORDER BY (SELECT 0))
        FROM master.sys.columns C1,
             master.sys.columns C2,
             master.sys.columns C3
    )
    INSERT dbo.PortSecAgg_NewDay WITH (TABLOCKX)
        (PortfolioID, SecurityID, AsOfDate, IsShortPosition)
    SELECT PortfolioID = N.n,
           SecurityID = N.n,
           AsOfDate = '2010-12-29',
           IsShortPosition = CONVERT(BIT, CASE WHEN RAND(CHECKSUM(NEWID())) < 0.3 THEN 1 ELSE 0 END)
    FROM Numbers N;
    GO

    -- Show data and partition information before we load the new day
    SELECT PSA.AsOfDate,
           PartitionNumber = $PARTITION.PF(PSA.AsOfDate),
           RecordCount = COUNT_BIG(*)
    FROM dbo.PortSecAgg PSA
    GROUP BY PSA.AsOfDate
    ORDER BY PSA.AsOfDate;
    GO

    -- Set the file group to use for the new partition (instant)
    ALTER PARTITION SCHEME PS NEXT USED [PRIMARY];
    GO

    -- Create a new partition (instant)
    ALTER PARTITION FUNCTION PF()
    SPLIT RANGE ('2010-12-29');
    GO

    -- Switch the new data into the destination table (instant)
    ALTER TABLE dbo.PortSecAgg_NewDay
    SWITCH TO dbo.PortSecAgg PARTITION 30;
    GO

    -- Notice new partition (30) with 350,000 rows of data
    SELECT PSA.AsOfDate,
           PartitionNumber = $PARTITION.PF(PSA.AsOfDate),
           RecordCount = COUNT_BIG(*)
    FROM dbo.PortSecAgg PSA
    GROUP BY PSA.AsOfDate
    ORDER BY PSA.AsOfDate;
    GO

    -- Show the new day table is empty
    SELECT *
    FROM dbo.PortSecAgg_NewDay;
    GO

    Paul

  • Demonstration of handling a refresh for a particular portfolio for a day:

    (uses the same database in the state it was at the end of the last demo)

    -- ==========================
    -- Demo modifying a partition
    -- ==========================

    -- Unpartitioned table that will hold the partition to modify
    CREATE TABLE dbo.PortSecAgg_Partition
    (
        AsOfDate DATETIME NOT NULL,
        PortfolioID INTEGER NOT NULL,
        SecurityID INTEGER NOT NULL,
        IsShortPosition BIT NOT NULL,
        PortfolioCode VARCHAR(10) NULL,
        Symbol VARCHAR(20) NULL,
        Quantity DECIMAL(18,4) NULL,
        SecTypeCode VARCHAR(5) NULL,
        Cusip VARCHAR(15) NULL,
        UnitCost MONEY NULL,
        TotalCost MONEY NULL,
        Price MONEY NULL,
        MarketValue MONEY NULL,
        AccruedInterest MONEY NULL,
        UnrealizedGL MONEY NULL,
        IsSubAccount BIT NULL,

        CONSTRAINT [PK dbo.PortSecAgg_Partition AsOfDate, PortfolioID, SecurityID, IsShortPosition]
        PRIMARY KEY CLUSTERED
        (
            AsOfDate ASC,
            PortfolioID ASC,
            SecurityID ASC,
            IsShortPosition ASC
        )
    )
    ON [PRIMARY];
    GO

    -- Move data for 12 December (partition 13)
    -- This is instantaneous as usual
    ALTER TABLE dbo.PortSecAgg
    SWITCH PARTITION 13
    TO dbo.PortSecAgg_Partition;
    GO

    -- Delete some data (21,429 rows)
    -- < 1 second
    DELETE dbo.PortSecAgg_Partition WITH (TABLOCKX)
    WHERE PortfolioID < 600000;
    GO

    -- Insert some new data (a portfolio day refresh)
    -- 25,000 rows for portfolio 5001 (again < 1s)
    WITH Numbers (n) AS
    (
        SELECT TOP (25000)
            ROW_NUMBER() OVER (ORDER BY (SELECT 0))
        FROM master.sys.columns C1,
             master.sys.columns C2,
             master.sys.columns C3
    )
    INSERT dbo.PortSecAgg_Partition
        (PortfolioID, SecurityID, AsOfDate, IsShortPosition)
    SELECT 5001, N.n, '2010-12-12', 0
    FROM Numbers N;
    GO

    -- Add a table constraint (instant)
    ALTER TABLE dbo.PortSecAgg_Partition
    WITH CHECK
    ADD CONSTRAINT [CK PortSecAgg_Partition Range]
    CHECK (AsOfDate = '2010-12-12');
    GO

    -- Move the data back (instant)
    ALTER TABLE dbo.PortSecAgg_Partition
    SWITCH TO dbo.PortSecAgg PARTITION 13;
    GO

    -- Show the number of records for the portfolio we inserted
    -- in place in the PortSecAgg table
    SELECT RecordCount = COUNT_BIG(*)
    FROM dbo.PortSecAgg PSA
    WHERE PSA.AsOfDate = '2010-12-12'
    AND PSA.PortfolioID = 5001;
    GO

    -- Partition table is empty
    SELECT *
    FROM dbo.PortSecAgg_Partition;
