October 26, 2020 at 6:56 pm
Hi All
I have a table which contains transactions made with cards. Below are the table structure and sample data:
create table CardTransactions(CardNo int, Transactionid int, Processed bit)
Insert into CardTransactions values(1,1001,0)
Insert into CardTransactions values(1,1002,0)
Insert into CardTransactions values(1,1003,0)
Insert into CardTransactions values(1,1004,0)
Insert into CardTransactions values(1,1005,0)
Insert into CardTransactions values(2,1006,0)
Insert into CardTransactions values(2,1007,0)
Insert into CardTransactions values(2,1008,0)
Insert into CardTransactions values(3,1009,0)
Insert into CardTransactions values(3,1010,0)
Insert into CardTransactions values(4,1011,0)
Insert into CardTransactions values(4,1012,0)
Insert into CardTransactions values(5,1013,0)
Insert into CardTransactions values(6,1014,0)
Insert into CardTransactions values(7,1015,0)
Now a procedure will be called from 4 sessions. Each session is expected to pick a roughly equal amount of data (not exact). Here we have 15 transactions, so each session should pick approximately 4 transactions. All of a card's transactions must stay in a single session. So here CardNo 1 has 5 transactions, so one of the sessions should get all 5 transactions, not 4. The second session might get cards 2 and 5 (3+1 = 4 transactions), the third session cards 3 and 4 (2+2 = 4), and the fourth session cards 6 and 7 (1+1 = 2).
Is there any query or function in SQL Server that can do this kind of approximate distribution?
Thanks
Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.
October 26, 2020 at 7:29 pm
NTILE(n) will divide the set into n nearly evenly sized chunks. Not sure how you would force full groups into a single "chunk" though
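One rough sketch of that idea, assuming it is acceptable to balance by card count rather than by transaction count: apply NTILE(4) to the distinct card numbers, so all of a card's transactions land in the same bucket. A card with many transactions can still skew one bucket, which is the limitation mentioned above.

```sql
-- Assign each distinct card to one of 4 buckets; every transaction
-- inherits its card's bucket, so a card is never split across sessions.
WITH Cards AS
(
    SELECT CardNo,
           NTILE(4) OVER (ORDER BY CardNo) AS SessionNo
    FROM   CardTransactions
    GROUP BY CardNo
)
SELECT ct.CardNo, ct.Transactionid, c.SessionNo
FROM   CardTransactions AS ct
JOIN   Cards AS c
       ON c.CardNo = ct.CardNo
WHERE  ct.Processed = 0;
```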
October 26, 2020 at 7:43 pm
I think you're going to need a master proc that has this logic, or SSIS. The master proc could then call the worker procedure 4 times and pass in the TransactionIDs that each process should work with. You could kick them off asynchronously, then monitor for when they complete.
For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
October 26, 2020 at 9:16 pm
You could do it by offset paging if each of the "sessions" requests a different page. Is it possible for each request to query for a different page? Otherwise, it seems like you'd have to keep track of which rows were allocated and exhaust the supply
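A sketch of the paging idea, assuming each session passes its own page number (the page size here is hypothetical). Note that plain OFFSET paging splits by row, not by card, so extra logic would be needed to keep all of a card's transactions on one page.

```sql
DECLARE @PageNo   int = 1,  -- each session would pass a different page number
        @PageSize int = 4;  -- hypothetical page size

SELECT CardNo, Transactionid
FROM   CardTransactions
WHERE  Processed = 0
ORDER BY CardNo, Transactionid
OFFSET (@PageNo - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```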
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
October 26, 2020 at 9:29 pm
Can you expand a little on what the 'sessions' are?
How should one 'session' know what the other sessions are doing (so that they don't have any overlapping or missed rows)?
How is the 'batch' of data which the four sessions are working with determined? Is this a process which will run repeatedly while transactions are being created?
What happens if that batch of data cannot be divided into four (because there are fewer than four card numbers in the batch)?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 27, 2020 at 5:27 am
Can you expand a little on what the 'sessions' are?
How should one 'session' know what the other sessions are doing (so that they don't have any overlapping or missed rows)?
How is the 'batch' of data which the four sessions are working with determined? Is this a process which will run repeatedly while transactions are being created?
What happens if that batch of data cannot be divided into four (because there are fewer than four card numbers in the batch)?
Hi Phil
This proc is called from 4 different SSMS sessions, i.e. through 4 SPIDs. (The actual implementation uses SSIS to start 4 sessions, but for simplicity we can assume we are running this proc manually by opening 4 SSMS sessions.) Whatever cards are picked by a session have all their transactions marked as Processed = 1, so that other sessions do not pick the same cards again. (The actual implementation uses TABLOCKX etc. to make sure no overlapping cards are picked.)
Batch size can be determined like this: total number of transactions / 4. So in this case, we get 15/4 ≈ 4.
The purpose of this entire exercise is to be able to run one single proc from multiple sessions to get a performance gain, as we have millions of records to process. This gives us an approx 40% performance gain when all other optimization tricks have already been tried.
Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.
October 27, 2020 at 12:28 pm
It appears you're looking to use horizontal partitioning.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
October 27, 2020 at 3:44 pm
The following code will not divide the values exactly, but since you have a large number of records, it uses the card number to split them. It guarantees that all the records for a given card number are kept in the same session.
Start by pre-dividing the transactions in the table based on CardNo:
ALTER TABLE CardTransactions
ADD SessionID AS CardNo % 4 + 1 PERSISTED;
CREATE NONCLUSTERED INDEX ix_CardTransactions_Session
ON CardTransactions (Processed, SessionID)
INCLUDE (CardNo, Transactionid);
Then each process needs to know which session it is, and pass that into the proc to update the table and return the data
CREATE PROCEDURE dbo.pr_GetNextBatch
@SessionID int -- Parameter to identify the session
, @BatchSize int = 1000 -- Parameter to limit batch sizes
AS
BEGIN
UPDATE TOP ( @BatchSize )
CardTransactions
SET Processed = 1
OUTPUT INSERTED.CardNo, INSERTED.Transactionid
WHERE Processed = 0
AND SessionID = @SessionID;
END;
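For illustration, each of the four sessions would then call the proc with its own session number (the batch size shown just matches the proc's default):

```sql
-- Session 1 runs:
EXEC dbo.pr_GetNextBatch @SessionID = 1, @BatchSize = 1000;
-- Session 2 runs:
EXEC dbo.pr_GetNextBatch @SessionID = 2, @BatchSize = 1000;
-- ...and likewise for sessions 3 and 4. Each call marks its own
-- slice of unprocessed rows and returns them via the OUTPUT clause.
```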
October 27, 2020 at 3:56 pm
@desnorton, this sounds like a good solution, well done.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 30, 2020 at 11:36 am
These schemes will fall apart if the data is too skewed. For example, if for some reason 80% of the card numbers were divisible by 4, one session would be doing 80% of the work.
A much more complex solution could be devised by getting a count of the transactions for each card number and then assigning the cards in order: the next card number goes to the session that currently has the least assigned to it. But the cure (the overhead of doing that) may be worse than the disease.
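A sketch of that idea, assuming a one-off assignment table (the table and variable names here are hypothetical): walk the cards in descending transaction-count order and give each card to the session with the smallest running total, a simple greedy bin-packing.

```sql
-- Hypothetical assignment table: one row per card.
CREATE TABLE #CardSession (CardNo int PRIMARY KEY, SessionID int NOT NULL);

DECLARE @CardNo int, @Cnt int;
DECLARE @Load TABLE (SessionID int PRIMARY KEY, Total int NOT NULL);
INSERT INTO @Load VALUES (1,0),(2,0),(3,0),(4,0);

DECLARE card_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT CardNo, COUNT(*)
    FROM   CardTransactions
    WHERE  Processed = 0
    GROUP BY CardNo
    ORDER BY COUNT(*) DESC;   -- placing the biggest cards first balances better

OPEN card_cur;
FETCH NEXT FROM card_cur INTO @CardNo, @Cnt;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Give the card to the least-loaded session so far.
    DECLARE @SessionID int =
        (SELECT TOP (1) SessionID FROM @Load ORDER BY Total, SessionID);

    INSERT INTO #CardSession VALUES (@CardNo, @SessionID);
    UPDATE @Load SET Total = Total + @Cnt WHERE SessionID = @SessionID;

    FETCH NEXT FROM card_cur INTO @CardNo, @Cnt;
END
CLOSE card_cur; DEALLOCATE card_cur;
```

As noted above, this adds a serial pre-pass over the cards, so it is only worth it if the skew is bad enough to hurt the parallel gain.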