January 25, 2015 at 9:26 am
Hi,
I am trying to build a scalable proc that does the computation logic. Currently I have 50k users, but thinking ahead, I am targeting this logic for 500k users.
Sample table schema and test data
create table Comp_Detail(IDcompDetail int identity(1,1) primary key,IDcompAFR int,CompID int default 1050,UserId bigint,
TransferAmount money, processStatus bit default 0,AmtTransferDate datetime);
--===== Create and populate the detail table on-the-fly
;WITH
cteRowSource1 AS
(
SELECT TOP 500000
N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
)
INSERT into Comp_Detail(IDcompAFR,CompID,UserId,TransferAmount)
SELECT 1000, 1050,
UserId = ISNULL(N,0)
,TransferAmount = N*10
FROM cteRowSource1
-- select * from Comp_Detail
--===== Create and populate the balance table on-the-fly
Create table User_bank(IDUserBank int identity(1,1) primary key, UserId bigint,Amount_Pend money,Amount_Available money,
LastModDt datetime);
;WITH
cteRowSource AS
(
SELECT TOP 500000
N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
)
Insert into User_bank(UserId,Amount_Pend,Amount_Available)
SELECT UserId = ISNULL(N,0)
,PendingAmount = N*10
,AvailableAmount = N*2
FROM cteRowSource
; -- select * from User_bank;
update Comp_Detail set IDcompAFR = 1001 where IDcompDetail > 10000 and IDcompDetail < 20000 ;
update Comp_Detail set IDcompAFR = 1002 where IDcompDetail > 20000 and IDcompDetail < 30000;
update Comp_Detail set IDcompAFR = 1003 where IDcompDetail > 30000 and IDcompDetail < 40000;
update Comp_Detail set IDcompAFR = 1004 where IDcompDetail > 40000 and IDcompDetail <50000;
update Comp_Detail set IDcompAFR = 1005 where IDcompDetail > 50000 and IDcompDetail < 60000;
I built my logic as below.
Declare @CompID int = 1050;
BEGIN
-- Check if any data available to be processed
IF EXISTS (
SELECT TOP 1 IDcompAFR
FROM Comp_Detail
WHERE CompID = @CompID
AND coalesce(processStatus, 0) = 0
)
BEGIN
BEGIN TRY
-- Set it so if the first UPDATE fails, we won't even start the second UPDATE. This really says, "If we're in a transaction
-- and something fails, stop processing the transaction and do a rollback if we can."
SET XACT_ABORT ON;
-- Table variable to hold the actual data. This will be used to get IDcompAFR once the balance is updated.
DECLARE @ActualData TABLE (
UserId BIGINT
,IDcompAFR BIGINT
,ProcessingAmount MONEY
);
-- Table variable to capture the affected UserIds
DECLARE @AffecedRecords TABLE (UserId BIGINT);
BEGIN TRANSACTION;
-- Get the whole data to be processed.
INSERT INTO @ActualData (
UserId
,IDcompAFR
,ProcessingAmount
)
SELECT UserId
,IDcompAFR
,ProcessingAmount = COALESCE(TransferAmount, 0)
FROM Comp_Detail
WHERE CompID = @CompID
AND coalesce(processStatus, 0) = 0
;
-- Aggregate the ProcessingAmount based on UserId
WITH AggregateData
AS (
SELECT UserId
,ProcessingAmount = SUM(COALESCE(ProcessingAmount, 0))
FROM @ActualData
GROUP BY UserId
)
--Do the Amount update and capture the UserIds that are affected.
UPDATE UB
SET UB.Amount_Available = COALESCE(UB.Amount_Available, 0) + AD.ProcessingAmount
,UB.Amount_Pend = COALESCE(UB.Amount_Pend, 0) - AD.ProcessingAmount
,LastModDt = getdate()
OUTPUT deleted.UserId
INTO @AffecedRecords(UserId)
FROM User_bank UB
INNER JOIN AggregateData AD ON UB.UserId = AD.UserId;
--===== Using the captured UserIds, get the IDcompAFR from the @ActualData table variable
-- and then update processStatus = 1,
-- meaning the OFR was processed for the trip.
UPDATE Comp_Detail
SET processStatus = 1
,AmtTransferDate = getdate()
WHERE IDcompAFR IN (
SELECT DISTINCT AD.IDcompAFR
FROM @ActualData AD
INNER JOIN @AffecedRecords AR ON (AD.UserId = AR.UserId)
)
AND processStatus = 0;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
DECLARE @ErrorMessage NVARCHAR(4000);
DECLARE @ErrorSeverity INT;
DECLARE @ErrorState INT;
SELECT @ErrorMessage = ERROR_MESSAGE()
,@ErrorSeverity = ERROR_SEVERITY()
,@ErrorState = ERROR_STATE();
IF @@TRANCOUNT > 0 -- only roll back if a transaction is actually open
ROLLBACK TRANSACTION;
RAISERROR (
@ErrorMessage
,@ErrorSeverity
,@ErrorState
);
END CATCH;
END
END
GO
The query logic takes 20+ minutes and keeps on running. I am not sure what mistake I made. Any suggestions to improve the speed of this query?
January 25, 2015 at 10:30 am
First, thank you for taking the time to build the test data. That will really help.
It would be helpful if, instead of leaving us to figure out what the overall goal of your code is, you gave us an overview of the requirements you're trying to solve.
You're using table variables to handle a couple of half-million-row tables. That, in itself, will be a performance problem because SQL Server will not create statistics on such things, nor can you directly create any useful indexes on them. Only those indexes automatically created to support unique constraints can be realized.
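For illustration, here's a minimal sketch of the alternative (assuming the same column definitions as your @ActualData table variable): a temp table gets real statistics and can carry whatever explicit indexes you like.
-- Sketch only: same columns as the original @ActualData table variable.
-- Unlike a table variable, a temp table gets statistics and explicit indexes.
CREATE TABLE #ActualData
(
UserId BIGINT
,IDcompAFR BIGINT
,ProcessingAmount MONEY
);
-- The index name here is made up; an index on UserId supports the later
-- GROUP BY and the JOIN to User_bank on that column.
CREATE NONCLUSTERED INDEX IX_ActualData_UserId ON #ActualData (UserId);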
--Jeff Moden
Change is inevitable... Change for the better is not.
January 25, 2015 at 10:44 am
Thank you, Jeff.
From your comments, on what columns should I create indexes?
Also, the requirement is: I have a process which runs once a month, takes records from the Comp_Detail table, calculates the amounts to be applied to the User_bank table for the users, and sets processStatus = 1 for the amounts that were applied.
If a table variable isn't a fit for this logic, then what should I do? It would be great if you could help me change the logic.
I even tried creating these indexes:
CREATE NONCLUSTERED INDEX CI_compID ON [dbo].[Comp_Detail] ([CompID]);
CREATE NONCLUSTERED INDEX CI_processStatus ON [dbo].[Comp_Detail] ([processStatus]);
For 100k records it took 6 seconds, 200k took 13 seconds, 300k took 29 seconds, 400k took 42 seconds, and 500k took 54 seconds.
Is there any other way I could still improve the speed?
January 25, 2015 at 1:06 pm
Something is going on here. Using your test data and your code as you posted it, the code runs in 23 seconds. Is it this test code that you posted that's taking more than 20 minutes?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 25, 2015 at 1:13 pm
Hi Jeff,
thanks for your reply.
I don't know what happened. Initially it took 23 minutes, but now 500k takes 54 seconds. Is there any way to improve this, please?
January 25, 2015 at 2:20 pm
Hi Jeff,
Quick update:
I removed the table variable and used a temp table instead. It took 31 seconds. Do you have any suggestions to speed it up further?
My latest try:
Declare @CompID int = 1050;
BEGIN
-- Check if any data available to be processed
IF EXISTS (
SELECT TOP 1 IDcompAFR
FROM Comp_Detail
WHERE CompID = @CompID
AND coalesce(processStatus, 0) = 0
)
BEGIN
BEGIN TRY
-- Set it so if the first UPDATE fails, we won't even start the second UPDATE. This really says, "If we're in a transaction
-- and something fails, stop processing the transaction and do a rollback if we can."
SET XACT_ABORT ON;
--Create a table to remember the rows we updated.
IF OBJECT_ID('tempdb..#ActualData') IS NOT NULL
BEGIN
DROP TABLE #ActualData;
END
IF OBJECT_ID('tempdb..#AffecedRecords') IS NOT NULL
BEGIN
DROP TABLE #AffecedRecords;
END
CREATE TABLE #ActualData(UserId BIGINT
,IDcompAFR BIGINT
,ProcessingAmount MONEY);
Create table #AffecedRecords (UserId BIGINT);
-- Table variable to hold the actual data (replaced by #ActualData above). This was used to get IDcompAFR once the balance was updated.
--DECLARE @ActualData TABLE (
--UserId BIGINT
--,IDcompAFR BIGINT
--,ProcessingAmount MONEY
--);
-- Table variable to capture the affected UserIds
--DECLARE @AffecedRecords TABLE (UserId BIGINT);
BEGIN TRANSACTION;
-- Get the whole data to be processed.
INSERT INTO #ActualData (
UserId
,IDcompAFR
,ProcessingAmount
)
SELECT UserId
,IDcompAFR
,ProcessingAmount = COALESCE(TransferAmount, 0)
FROM Comp_Detail
WHERE CompID = @CompID
AND coalesce(processStatus, 0) = 0
;
-- Aggregate the ProcessingAmount based on UserId
WITH AggregateData
AS (
SELECT UserId
,ProcessingAmount = SUM(COALESCE(ProcessingAmount, 0))
FROM #ActualData
GROUP BY UserId
)
--Do the balance update and capture the UserIds that are affected.
UPDATE UB
SET UB.Amount_Available = COALESCE(UB.Amount_Available, 0) + AD.ProcessingAmount
,UB.Amount_Pend = COALESCE(UB.Amount_Pend, 0) - AD.ProcessingAmount
,LastModDt = getdate()
OUTPUT deleted.UserId
INTO #AffecedRecords(UserId)
FROM User_bank UB
INNER JOIN AggregateData AD ON UB.UserId = AD.UserId;
--===== Using the captured UserIds, get the IDcompAFR from the #ActualData temp table
-- and then update processStatus = 1,
-- meaning the OFR was processed for the trip.
UPDATE Comp_Detail
SET processStatus = 1
,AmtTransferDate = getdate()
WHERE IDcompAFR IN (
SELECT DISTINCT AD.IDcompAFR
FROM #ActualData AD
INNER JOIN #AffecedRecords AR ON (AD.UserId = AR.UserId)
)
AND processStatus = 0;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
DECLARE @ErrorMessage NVARCHAR(4000);
DECLARE @ErrorSeverity INT;
DECLARE @ErrorState INT;
SELECT @ErrorMessage = ERROR_MESSAGE()
,@ErrorSeverity = ERROR_SEVERITY()
,@ErrorState = ERROR_STATE();
-- Roll back first: with XACT_ABORT ON the transaction may be doomed,
-- and no further work is allowed until it has been rolled back.
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
DROP TABLE #ActualData;
DROP TABLE #AffecedRecords;
RAISERROR (
@ErrorMessage
,@ErrorSeverity
,@ErrorState
);
END CATCH;
END
END
GO
--select * from Comp_Detail
--select * from User_bank
January 25, 2015 at 5:48 pm
I think this does basically the same thing but runs in about 6 seconds on my SQL 2012 box.
Declare @CompID int = 1050
,@ProcessingDate DATETIME = GETDATE();
BEGIN
-- Check if any data available to be processed
IF EXISTS (
SELECT 1
FROM Comp_Detail
WHERE CompID = @CompID
AND coalesce(processStatus, 0) = 0
)
BEGIN
BEGIN TRY
BEGIN TRANSACTION;
-- Aggregate the ProcessingAmount based on UserId
WITH AggregateData
AS (
SELECT UserId
,ProcessingAmount = SUM(COALESCE(TransferAmount, 0))
FROM Comp_Detail
WHERE CompID = @CompID
AND coalesce(processStatus, 0) = 0
GROUP BY UserId
)
--Do the Amount update
UPDATE UB
SET UB.Amount_Available = COALESCE(UB.Amount_Available, 0) + AD.ProcessingAmount
,UB.Amount_Pend = COALESCE(UB.Amount_Pend, 0) - AD.ProcessingAmount
,LastModDt = @ProcessingDate
FROM User_bank UB
INNER JOIN AggregateData AD ON UB.UserId = AD.UserId;
UPDATE a
SET processStatus = 1
,AmtTransferDate = @ProcessingDate
FROM Comp_Detail a
JOIN User_bank b ON a.Userid = b.Userid
WHERE CompID = @CompID AND coalesce(a.processStatus, 0) = 0 AND b.LastModDt = @ProcessingDate;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
DECLARE @ErrorMessage NVARCHAR(4000);
DECLARE @ErrorSeverity INT;
DECLARE @ErrorState INT;
SELECT @ErrorMessage = ERROR_MESSAGE()
,@ErrorSeverity = ERROR_SEVERITY()
,@ErrorState = ERROR_STATE();
IF @@TRANCOUNT > 0 -- only roll back if a transaction is actually open
ROLLBACK TRANSACTION;
RAISERROR (
@ErrorMessage
,@ErrorSeverity
,@ErrorState
);
END CATCH;
END
END
A couple of notes:
- My logic is based on what I think is going on. Your original code wouldn't complete in a period of time I was willing to wait, so I couldn't check the final results.
- This bit in the WHERE clause (both of my queries)
coalesce(a.processStatus, 0) = 0
is not SARGable, so it could be improved on by making the processStatus column NOT NULL (see the sketch after these notes).
- Indexing might also help a bit, but since 6 seconds is already quite a bit faster than the 24 I'm hearing above, maybe that will take care of it for you. Assuming my logic is correct, that is.
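If you do go the NOT NULL route, a sketch like this would do it (table and column names taken from your DDL; backfilling the NULLs is required before the ALTER will succeed):
-- Backfill any NULLs, then forbid them, so the predicate can be written
-- as a plain, SARGable processStatus = 0.
UPDATE dbo.Comp_Detail SET processStatus = 0 WHERE processStatus IS NULL;
ALTER TABLE dbo.Comp_Detail ALTER COLUMN processStatus bit NOT NULL;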
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
January 25, 2015 at 5:55 pm
According to the Execution Plan, the following INDEX would help the second query in my batch.
CREATE NONCLUSTERED INDEX IX_User_bank_UserId -- name is arbitrary; SSMS emits a placeholder here
ON [dbo].[User_bank] ([UserId])
INCLUDE ([IDUserBank],[Amount_Pend],[Amount_Available]);
Edit: But in fact with that INDEX the query ran in 15 seconds.
January 25, 2015 at 6:32 pm
Hi Dwain,
Thanks a lot for the reply and for shortening the logic. I forgot to mention that the IsProcessed (processStatus) column is NOT NULL, so I removed the COALESCE.
But when I executed the script, it took 23 seconds. This might be because I am running the script remotely. I will try this logic on my server box and post the result.
January 25, 2015 at 6:47 pm
Finally, a quick update:
My assumption helped me a little.
I ran dwain's suggestion on my production box and it ran in 10 seconds.
I tried my logic as well and it took 11 seconds.
The reason I got 28 seconds earlier is that I was running the query over a VPN. Thanks, dwain, for your help.
Any more suggestions are welcome. I would like to leave this thread open, as I am hungry to reduce the execution time further.
January 25, 2015 at 7:34 pm
KGJ-Dev (1/25/2015)
Finally, a quick update: my assumption helped me a little.
I ran dwain's suggestion on my production box and it ran in 10 seconds.
I tried my logic as well and it took 11 seconds.
The reason I got 28 seconds earlier is that I was running the query over a VPN. Thanks, dwain, for your help.
Any more suggestions are welcome. I would like to leave this thread open, as I am hungry to reduce the execution time further.
Glad to hear it helped you out, whatever the reason.
So are you saying that the work I removed was unnecessary and that my logic was correct?
BTW, the thing I did with storing off a GETDATE() for reuse is something I do quite often to ensure that, if multiple tables are to be impacted by a single transaction, they all get the same date/time applied. I like consistency like that.
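In miniature, the pattern is just this (the WHERE values are hypothetical; the tables are the ones from this thread):
-- Capture the timestamp once, before the transaction, so every table
-- touched by it records the identical date/time.
DECLARE @ProcessingDate datetime = GETDATE();
BEGIN TRANSACTION;
UPDATE dbo.User_bank SET LastModDt = @ProcessingDate WHERE UserId = 42;
UPDATE dbo.Comp_Detail SET AmtTransferDate = @ProcessingDate WHERE UserId = 42;
COMMIT TRANSACTION;
As a bonus, the saved value can then be used to find exactly the rows the transaction touched, which is what the LastModDt = @ProcessingDate predicate in my second UPDATE does.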
January 26, 2015 at 5:19 am
Hi Dwain,
Your worked example was really helpful, and of course I will be switching to your sample, as there is no need for temp tables to hold the data. I was being extra cautious in trying to get the IDcompAFR from a temp table. By the way, your solution is clean and short.
I will test it with multiple scenarios and start using your logic.
Thank you so much for your time on this.
January 26, 2015 at 1:56 pm
dwain.c (1/25/2015)
coalesce(a.processStatus, 0) = 0
is not SARGable, so that could be improved on by making the processStatus column NOT NULL.
It's much better to code it as:
(a.processStatus is null or a.processStatus = 0)
When an index is available, SQL can still do a seek for the code above.
In short, the rule is:
NEVER use ISNULL/COALESCE in a WHERE or JOIN clause; you can always code around it.
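The same trick works in a JOIN. A sketch against the tables from this thread (the commented ISNULL line shows what to avoid):
-- Example of coding around ISNULL in a JOIN: the ISNULL version would
-- block index seeks on UserId, so spell out the NULL handling instead.
SELECT cd.IDcompDetail, ub.Amount_Available
FROM dbo.Comp_Detail cd
JOIN dbo.User_bank ub
-- instead of: ON ISNULL(cd.UserId, -1) = ISNULL(ub.UserId, -1)
ON (cd.UserId = ub.UserId
OR (cd.UserId IS NULL AND ub.UserId IS NULL));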
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 26, 2015 at 4:59 pm
ScottPletcher (1/26/2015)
dwain.c (1/25/2015)
coalesce(a.processStatus, 0) = 0
is not SARGable, so that could be improved on by making the processStatus column NOT NULL.
It's much better to code it as:
(a.processStatus is null or a.processStatus = 0)
When an index is available, SQL can still do a seek for the code above.
In short, the rule is:
NEVER use ISNULL/COALESCE in a WHERE or JOIN clause; you can always code around it.
I mentioned it thinking that ultimately he'd need to resort to some kind of indexing to speed this up. I probably should have mentioned that.
January 26, 2015 at 6:11 pm
Hi Scott,
Thanks for your suggestion. I already removed the COALESCE because the IsProcessed column is NOT NULL, as I mentioned in a previous post.