June 13, 2009 at 10:52 am
Ok folks... NOW you can test for performance because NOW you can generate substantial amounts of test data quite easily. The following code creates a million rows of randomized but date-ordered data in about 30 seconds on my 7-year-old desktop... it should fly on yours.
[font="Courier New"]--===== 3. Creates a TestCurrentPurchases table with a clustered primary key index on CurrentPurchaseID.
-- This table has a row for each ProductTypeID (1:1 relationship). Each row contains calculations
-- of new purchase data on old purchase data, like cost difference, time in ms since last purchase,
-- and a running total of the cost of all purchases for that product type.
--===== Conditionally drop the test table
IF OBJECT_ID('dbo.TestCurrentPurchases','U') IS NOT NULL
DROP TABLE dbo.TestCurrentPurchases
--===== Create the skeleton of the table
CREATE TABLE dbo.TestCurrentPurchases
(
CurrentPurchaseID INT IDENTITY(1,1) PRIMARY KEY,
ProductTypeID INT NOT NULL,
PurchaseDateTime DATETIME NOT NULL,
Cost MONEY NULL,
CostDifference MONEY NULL,
TimeBetweenPurchasesOfProductType INT NULL,
RunningTotalCost MONEY NULL
)
--===== Define some obviously named variables
DECLARE @StartDate DATETIME,
@EndDate DATETIME,
@Days INT
--===== Presets
SELECT @StartDate = '2008-01-01',
@EndDate = '2008-12-31',
@Days = DATEDIFF(dd,@StartDate,@EndDate) + 1
--===== Insert a million rows of data ordered by PurchaseDateTime
INSERT INTO dbo.TestCurrentPurchases
(ProductTypeID, PurchaseDateTime, Cost)
SELECT ProductTypeID, PurchaseDateTime, Cost
FROM (--==== Derived table creates the data so we can sort it later with speed
SELECT TOP 1000000
ABS(CHECKSUM(NEWID())) %20 +1 AS ProductTypeID,
RAND(CHECKSUM(NEWID())) *@Days + @StartDate AS PurchaseDateTime,
CAST(RAND(CHECKSUM(NEWID())) *100 + .01 AS DECIMAL(9,2)) AS Cost
FROM Master.dbo.SysColumns sc1
CROSS JOIN Master.dbo.SysColumns sc2
) d
ORDER BY d.PurchaseDateTime
[/font]
--Jeff Moden
Change is inevitable... Change for the better is not.
June 13, 2009 at 11:01 am
I guess the next question would be, how "real time" does the running total thingy need to be? We might be able to eliminate the need for the cursor altogether if the answer is "we don't need it to be real time, just quick when we need it".
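Just to illustrate the "quick when we need it" option, here's a minimal sketch (using the million-row test table from the previous post, and assuming the totals can simply be recomputed on demand rather than maintained row by row in a trigger):

-- Sketch: recompute the per-ProductTypeID totals on demand instead of in a trigger.
SELECT   ProductTypeID,
         MAX(PurchaseDateTime) AS LastPurchaseDateTime,
         SUM(Cost)             AS RunningTotalCost,
         COUNT(*)              AS PurchaseCount
FROM     dbo.TestCurrentPurchases
GROUP BY ProductTypeID
ORDER BY ProductTypeID;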
--Jeff Moden
Change is inevitable... Change for the better is not.
June 13, 2009 at 12:01 pm
OH... YES, I need this real-time. But by real-time I mean I need about 2500 new rows coming in per second; more would be grand. My data provider can only deliver about 2500 rows per second (including data bursts). Luckily, these arrive in blocks of (converting from prod to this hypothetical system in my head) about 200 rows per block. So, 2500 rows / 200 rows per block = 12.5 block updates and update-trigger executions needed per second. Hopefully that puts me at about 1/12.5 = 0.08s per block update/insert. 80ms per trigger update/insert isn't too fast, is it... wait, never mind, that's for synchronous/sequential execution. The jobs make this asynchronous/concurrent, which means that if every job runs on a 10-second cycle, I have 10 seconds available to complete each individual update/insert.
The old block inserts (before I created the trigger) were completing in about 350ms; since they ran concurrently with the other 50 jobs, that didn't really matter. The server could complete all 50 jobs, at 350ms each, in about half a second. And the LOCKING PROBLEM didn't pop up (and by "pop up" I mean amplify the latency 100-fold) because it was fast enough.
But I know the locking problem is there regardless. I can watch the timestamps stairstep by a fixed quantum as all 50 jobs execute concurrently, starting at the same time. Each data block's latency shouldn't increase linearly in a concurrent system on every cycle of the 50 jobs' 10-second recurrence. I would at least expect all 8 of my processor cores to be used at the same time, with at least 8 blocks of data executing NEARLY simultaneously. But that isn't the case. The latency (time-slice information) increases almost linearly from block to consecutive block, so there's got to be some resource locking that's forcing serialized execution at some level. And that's what I'm trying to find.
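A sketch of the kind of query I can use to watch the blocking chain while the 50 jobs fire (assuming SQL Server 2005 or later for the DMVs):

-- Sketch: list sessions that are currently blocked and what they are waiting on.
SELECT  wt.session_id,
        wt.blocking_session_id,
        wt.wait_type,
        wt.wait_duration_ms,
        wt.resource_description
FROM    sys.dm_os_waiting_tasks AS wt
WHERE   wt.blocking_session_id IS NOT NULL
ORDER BY wt.wait_duration_ms DESC;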
-- On to better things.
That data injector is awesome. I considered using the forum etiquette FAQ to do the same, but you beat me to it!!! After writing all of that other code, _I_ was being lazy. This is excellent!
I'm going to plug this in and convert it a bit for the test UPDATEs after the trigger is created in my mangled code (I think I'll use the attachment instead of the copy/paste). Then I'll try a few of the different configurations based on the input above. I'll grab some statistically timed numbers too, and possibly post them here.
I too am curious to see that 50 second latency in prod drop below a second again.
June 13, 2009 at 12:27 pm
In that case, I'd add the appropriate indexes to the million row test tables and try the trigger that Paul wrote.
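As a starting point, a sketch of the kind of index I mean (not necessarily the exact one Paul's trigger wants):

-- Sketch: let the trigger find the latest prior purchase per ProductTypeID without a scan.
CREATE NONCLUSTERED INDEX IX_TestCurrentPurchases_Product_Date
    ON dbo.TestCurrentPurchases (ProductTypeID, PurchaseDateTime)
    INCLUDE (Cost, RunningTotalCost);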
--Jeff Moden
Change is inevitable... Change for the better is not.
June 13, 2009 at 6:46 pm
Jeff Moden (6/13/2009)
Ok folks... NOW you can test for performance because NOW you can generate substantial amounts of test data quite easily. The following code creates a million rows of randomized but date-ordered data in about 30 seconds on my 7-year-old desktop... it should fly on yours.
Ah! Jeff "million row" Moden strikes again!!! 😀 😀 😀
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
June 13, 2009 at 7:56 pm
Aaron (6/13/2009)
PS: Why go through all of the trouble of creating this code and formatting it if it's just going to get hosed? I attached a text file with the formatting so it doesn't look as ugly as above.
BWAA-HAAA!!!! I've been bitchin' to management about that very same problem for months now. I thought they'd fix it when I said "They either seem unwilling or unable to make it work correctly," but to no avail. I think they have it set up to work nice and pretty with Firefox, but not IE.
Yes, under Firefox the SQL looks well formatted, but it has bizarre formatting under IE.
SQL = Scarcely Qualifies as a Language
June 13, 2009 at 9:24 pm
The verdict is in: the 50-second delays have now turned into 28 seconds with the FROM / JOINs fixed in the trigger.
Unless the trigger is super costly, I'm still leaning towards isolation/locking as being the issue. How do I reduce the table lock (if it exists) to a row lock in the UPDATE in the trigger above, or alter the ISOLATION LEVEL?
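For context, the two options I'm asking about would look roughly like this (a sketch only, not the actual trigger from earlier in the thread; the column math is made up just to show the hint placement, and YourDatabase is a placeholder):

-- Sketch: inside the trigger body, request row locks on the UPDATE.
-- (SQL Server may still escalate to a table lock if enough rows qualify.)
UPDATE tcp
   SET RunningTotalCost = tcp.RunningTotalCost + i.Cost
  FROM dbo.TestCurrentPurchases AS tcp WITH (ROWLOCK)
 INNER JOIN inserted AS i
    ON i.ProductTypeID = tcp.ProductTypeID;

-- Sketch: enable row versioning so readers stop blocking writers.
-- Note this does not remove writer-vs-writer blocking or deadlocks.
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;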
-- A
June 13, 2009 at 10:02 pm
Aaron,
In the simulations I have run, there are a number of small changes which need to be made to get good performance - the most important of which is a new index.
However, having reviewed your previous posts, I would like to clarify a couple of things before going into detail:
My current understanding is that there are two parts to the problem. First, you need to add differentials to an existing 50-million-row Purchases table. Secondly, you wish to maintain a real-time 5000-row table of running totals per ProductTypeID. Is this correct? I must admit that I am slightly confused about the jobs, where they source their data from, and what the need is for the OUTPUT clause...
I would really like to fix the whole problem once and for all. To that end, it would help me enormously if you could correct my current understanding where necessary, and perhaps provide a very brief and clear explanation of what you have right now, and what you need as the end result.
Given a clear and short explanation of what you have and what you want, I am confident that between us we can provide a comprehensive solution.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
June 14, 2009 at 12:26 am
Has the code for these Jobs been posted yet? If so I must have missed it. If not, then we definitely need to see it.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
June 14, 2009 at 7:11 am
Paul White has posted his understanding of the problem and I am also waiting for confirmation or revision. I have these additional questions:
Are the provided table definitions for the two tables complete? If not, please provide the complete table definitions.
Have all index definitions been provided?
Can the table definition be changed?
For the updates, does each job issue multiple single row updates or a single multiple row update? The example you provided is a single multiple row update:
UPDATE dbo.TestCurrentPurchases
   SET ProductTypeID    = Src.ProductTypeID,
       PurchaseDateTime = Src.PurchaseDateTime,
       Cost             = Src.Cost
  FROM dbo.TestCurrentPurchases Trg
 INNER JOIN
       (
       SELECT 1 [ProductTypeID], '6/2/2009 11:40:00.000' [PurchaseDateTime],  2.50 [Cost] UNION
       SELECT 2 [ProductTypeID], '6/2/2009 11:47:00.000' [PurchaseDateTime],  1.75 [Cost] UNION
       SELECT 3 [ProductTypeID], '6/2/2009 11:51:00.000' [PurchaseDateTime], 13.39 [Cost] UNION
       SELECT 4 [ProductTypeID], '6/2/2009 12:11:00.000' [PurchaseDateTime],  0.69 [Cost] UNION
       SELECT 5 [ProductTypeID], '6/2/2009 13:39:00.000' [PurchaseDateTime], 83.49 [Cost] UNION
       SELECT 6 [ProductTypeID], '6/2/2009 17:39:00.000' [PurchaseDateTime], 16.00 [Cost]
       ) Src ON Src.ProductTypeID = Trg.ProductTypeID
GO
SQL = Scarcely Qualifies as a Language
June 14, 2009 at 9:02 am
Aaron (6/13/2009)
The verdict is in: the 50-second delays have now turned into 28 seconds with the FROM / JOINs fixed in the trigger. Unless the trigger is super costly, I'm still leaning towards isolation/locking as being the issue. How do I reduce the table lock (if it exists) to a row lock in the UPDATE in the trigger above, or alter the ISOLATION LEVEL?
-- A
Uh huh... and what are the indexes that you've added and what does the 28 second code look like for sure, now? And, how many rows did you test against? The million row example or your 50 million row production data? Remember, we can't see what you see unless you provide it here.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 14, 2009 at 3:38 pm
You might want to try disabling page locking on the clustered index. From what I understand, you only have 5000 records in the table and each job updates its own rows. If SQL Server uses page locks, each job will end up locking rows that belong to other jobs. I believe that in your scenario you can get better performance if SQL Server uses row locks.
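A minimal sketch of what I mean (the index name is a placeholder, since the primary key in the test script gets a system-generated name):

-- Sketch: disallow page locks on the clustered index so SQL Server takes row locks instead.
-- PK_TestCurrentPurchases is a placeholder name; look up the real one in sys.indexes.
ALTER INDEX PK_TestCurrentPurchases
    ON dbo.TestCurrentPurchases
    SET (ALLOW_PAGE_LOCKS = OFF);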
/Michael
June 14, 2009 at 5:21 pm
I arrived at this while toggling ALLOW_ROW_LOCKS and ALLOW_PAGE_LOCKS and configuring various WITH (...) hints on the updates to use ROWLOCK, PAGLOCK, and TABLOCK... all while watching in SQL Profiler using the TSQL_Lock template.
PAGLOCK definitely exacerbated the deadlocks and Lock:Timeouts. The biggest problem is the UPDATE in the update trigger; a deadlock usually happens after that UPDATE executes.
I'm closing this thread. Either I scatter the updates a bit, disable lock escalation using trace flag 1211, or rewrite the architecture to dynamically create a "Current" table for each thread/job. My final option is to rewrite this entire solution as a multithreaded .NET app (which I want to avoid).
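For reference, a sketch of how that trace flag gets enabled (it disables lock escalation instance-wide, so it would need careful testing):

-- Sketch: disable lock escalation for the whole instance via trace flag 1211...
DBCC TRACEON (1211, -1);
-- ...and turn it back off afterwards.
DBCC TRACEOFF (1211, -1);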
Thank you all for your help.
-- Aaron
June 14, 2009 at 6:21 pm
Well that's a shame. I spent some time last night creating a solution for this which worked well. The final stage just involved two questions:
1. Is the table that contains the most recent purchase essential to the application - or can I lose it and just update the main table in-place?
2. Can each job insert multiple rows for the same product type ID in the same statement? This complicates the task a little if so, but it is still workable.
My in-progress solution updates all the running totals and deltas in a fraction of a second, so I had high hopes for it.
Oh well. :unsure:
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
June 14, 2009 at 6:23 pm
I still can't help thinking that doing the trigger logic as part of (or before) the INSERT, partitioning the table, and having a clustered index that keeps the ProductTypeIDs handled by each job physically adjacent would probably fix this problem.
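Something along these lines is the shape I have in mind for the "current" table (a sketch only; it assumes one row per ProductTypeID and that the IDENTITY key can live in a nonclustered primary key):

-- Sketch: cluster on ProductTypeID so each job's rows are physically adjacent,
-- keeping the IDENTITY column as a nonclustered primary key.
CREATE TABLE dbo.TestCurrentPurchases
(
    CurrentPurchaseID                 INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_TestCurrentPurchases PRIMARY KEY NONCLUSTERED,
    ProductTypeID                     INT NOT NULL,
    PurchaseDateTime                  DATETIME NOT NULL,
    Cost                              MONEY NULL,
    CostDifference                    MONEY NULL,
    TimeBetweenPurchasesOfProductType INT NULL,
    RunningTotalCost                  MONEY NULL
);

CREATE UNIQUE CLUSTERED INDEX CIX_TestCurrentPurchases_ProductTypeID
    ON dbo.TestCurrentPurchases (ProductTypeID);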
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]