January 15, 2016 at 12:17 am
Sergiy (1/14/2016)
Apart from that. How do you deal with concurrent updates?
If 2 or more users edit the same record in their independent instances of C++ application - which version of the record takes precedence?
OP commented that a bunch of updates is a business transaction. So the system should either have its own transaction control or rely on MS SQL transactions to address this issue (and deadlocks as well).
January 15, 2016 at 3:09 am
MMartin1 (1/14/2016)
If this is indeed the case, I am not sure why updating one field means blanking out the other, but as I mentioned, I am probably missing a thing or two of important information here.
Hi MMartin1, not sure if we are talking past each other here. It is not always easy to explain everything exactly.
I was just trying to say that a lot of the solutions, be it an SP or a temp table, don't seem to be able to handle setting a field (back) to NULL.
With SPs, you need to give all OPTIONAL parameters a default value. Usually this is done with @paramX = NULL. That way, inside the SP, you test whether the value is not null and use the param/value accordingly (say, to update a field). This works in most cases, but what if you have a couple of fields that DO allow NULL, and at times you want to update a field back to NULL? Passing NULL as a 'value' to the SP causes that input parameter to be ignored.
I would expect the same issue to arise when using a temp table?
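To make that concrete, here is a stripped-down sketch of the pattern I mean (the table, proc and column names are made-up placeholders, not our real schema):
-- hypothetical table dbo.item with a nullable [description] column
CREATE PROCEDURE dbo.item_update
    @id          bigint,
    @name        varchar(20)  = NULL,  -- optional
    @description varchar(200) = NULL   -- optional, but the column itself also allows NULL
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.item
       SET [name]        = ISNULL(@name, [name]),
           [description] = ISNULL(@description, [description])
     WHERE [id] = @id;
END
GO
-- EXEC dbo.item_update @id = 1, @description = NULL;
-- Inside the proc there is no way to tell "parameter not supplied" apart from
-- "explicitly passed as NULL", so [description] can never be set back to NULL this way.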
January 15, 2016 at 3:12 am
lnardozi 61862 (1/14/2016)
in this case the application code has processed all changes in memory and needs to (at set times) write all changes to the database.
I'm surprised no one has offered this solution. Create a user-defined table type for each table updated. Pass those tables as READONLY table-valued parameters into a proc that has MERGE statements for each user-defined type. It'll be screaming fast because there are no per-row round trips between client and server.
Hey lnardozi, cool to see a fresh idea, I will definitely look into this option! If anything, just to learn how it works. Not sure how much code change this would require, but it sounds like a fast technique. Google is my friend, but if you have any specific recommendations for good examples on this topic, I'm all ears! Thanks.
EDIT: I experimented with this and ran into the exact same problem: the fact that I can't get around working with optional parameters AND, at the same time, the need to still be able to update to NULL if required.
I am probably missing something very obvious here ... as I doubt I'm the only person ever who's run into this. 🙂
The T-SQL below shows exactly the issue that I'm trying to get past. If anyone knows of an elegant solution, I'm all ears!
-- example table
CREATE TABLE [dbo].[example_table] (
[id] [bigint] NOT NULL PRIMARY KEY,
[name] varchar(20) NOT NULL,
[description] varchar(200) NULL
);
-- example TVP
CREATE TYPE [dbo].[example_tvp] AS TABLE (
[id] [bigint] NOT NULL PRIMARY KEY,
[name] varchar(20) NOT NULL,
[description] varchar(200) NULL
);
GO
-- example merge_sp
CREATE PROCEDURE example_merge @merge_object example_tvp READONLY
AS
BEGIN
SET NOCOUNT ON;
MERGE [example_table] AS target
USING @merge_object AS [Source]
ON [target].[id] = [Source].[id]
WHEN MATCHED THEN
UPDATE SET [name] = [Source].[name],
[description] = [Source].[description]
WHEN NOT MATCHED THEN
INSERT ( id, name, description )
VALUES ( [Source].[id], [Source].[Name], [Source].[Description] );
END
GO
-- test code
DECLARE @tvp_table1 example_tvp
INSERT INTO @tvp_table1 VALUES (1,'test1','Description Test One')
INSERT INTO @tvp_table1 VALUES (2,'test2','Description Test Two')
INSERT INTO @tvp_table1 VALUES (3,'test3','Description Test Three')
INSERT INTO @tvp_table1 VALUES (4,'test4','Description Test Four')
INSERT INTO @tvp_table1 VALUES (5,'test5',null)
exec example_merge @tvp_table1
SELECT * FROM example_table
Perfect! As expected :)
Output:
1  test1  Description Test One
2  test2  Description Test Two
3  test3  Description Test Three
4  test4  Description Test Four
5  test5  NULL
-- test code2
DECLARE @tvp_table1 example_tvp
INSERT INTO @tvp_table1 VALUES (3,'test3','UPDATED Description Test Three')
INSERT INTO @tvp_table1 VALUES (6,'test6','NEW Description Test Six')
exec example_merge @tvp_table1
SELECT * FROM example_table
Beautiful! As expected :)
Output:
1  test1  Description Test One
2  test2  Description Test Two
3  test3  UPDATED Description Test Three
4  test4  Description Test Four
5  test5  NULL
6  test6  NEW Description Test Six
-- test code3 - Dangerous territory
DECLARE @tvp_table1 example_tvp
-- I want to update ONLY the name for id=3
INSERT INTO @tvp_table1 (id, [name]) VALUES (3,'test3 X')
-- I want to update the name and description for id=5, and by doing so, update the description back to NULL
INSERT INTO @tvp_table1 VALUES (5,'test5 X',null)
-- with the above version of the merge SP, it will also update the description (to null) for id=3
exec example_merge @tvp_table1
SELECT * FROM example_table
Output:
1  test1    Description Test One
2  test2    Description Test Two
3  test3 X  NULL
4  test4    Description Test Four
5  test5 X  NULL
6  test6    NEW Description Test Six
-- it will work with the below version of the merge SP, but then I can no longer update something back to NULL
GO
-- Alternative version of the merge_sp
ALTER PROCEDURE example_merge @merge_object example_tvp READONLY
AS
BEGIN
SET NOCOUNT ON;
MERGE [example_table] AS target
USING @merge_object AS [Source]
ON [target].[id] = [Source].[id]
WHEN MATCHED THEN
UPDATE SET [name] = ISNULL([Source].[name],[target].[name]),
[description] = ISNULL([Source].[description],[target].[description])
WHEN NOT MATCHED THEN
INSERT ( id, name, description )
VALUES ( [Source].[id], [Source].[Name], [Source].[Description] );
END
GO
-- test code4 - Dangerous territory again
DECLARE @tvp_table1 example_tvp
-- I want to update ONLY the name for id=1
INSERT INTO @tvp_table1 (id, [name]) VALUES (1,'test1 X')
-- I want to update ONLY the name and description for id=2, and by doing so, update the description back to NULL
INSERT INTO @tvp_table1 VALUES (2,'test2 X',null)
exec example_merge @tvp_table1
SELECT * FROM example_table
1  test1 X  Description Test One    <-- Great, the description didn't get touched
2  test2 X  Description Test Two    <-- Bummer, I couldn't reset the description to NULL
3  test3 X  NULL
4  test4    Description Test Four
5  test5 X  NULL
6  test6    NEW Description Test Six
Is there a way to build the merge SP so it would be able to handle this??
January 15, 2016 at 3:14 am
serg-52 (1/15/2016)
Sergiy (1/14/2016)
Apart from that. How do you deal with concurrent updates?
If 2 or more users edit the same record in their independent instances of C++ application - which version of the record takes precedence?
OP commented that a bunch of updates is a business transaction. So the system should either have its own transaction control or rely on MS SQL transactions to address this issue (and deadlocks as well).
Affirmative, we use DB locking as well as locking on the application level. Thanks!
January 15, 2016 at 3:30 am
The code example you posted as "finalised" excludes all NULL variables from the UPDATE altogether.
That's correct. But because I am using sp_executesql there is a workaround, and I 'could' (in special cases) pass the fields I want to set to NULL in a special parameter:
IF (@null_list IS NOT NULL) SET @SQL_cmd = @SQL_cmd + @null_list + N',';
and when calling the SP this parameter would contain: @null_list = 'fieldA = @_fieldA', which would then get handled properly.
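Roughly, the dynamic SQL then gets assembled like this. This is only a sketch; the table and field names below are placeholders for illustration, not our real schema:
-- sketch: optional fields via dynamic SQL, with @null_list forcing nullable fields into the SET list
CREATE PROCEDURE dbo.some_table_update
    @id        bigint,
    @fieldA    varchar(200)  = NULL,
    @fieldB    varchar(20)   = NULL,
    @null_list nvarchar(400) = NULL   -- e.g. N'fieldA = @_fieldA' to force fieldA to be updated (even to NULL)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @SQL_cmd nvarchar(max) = N'UPDATE dbo.some_table SET ';

    -- fields listed in @null_list are always included, even when their parameter is NULL
    IF (@null_list IS NOT NULL) SET @SQL_cmd = @SQL_cmd + @null_list + N',';

    -- ordinary optional fields are only included when a value was actually supplied
    IF (@fieldB IS NOT NULL) SET @SQL_cmd = @SQL_cmd + N'fieldB = @_fieldB,';

    -- drop the trailing comma and finish the statement (assumes at least one field is being updated)
    SET @SQL_cmd = LEFT(@SQL_cmd, LEN(@SQL_cmd) - 1) + N' WHERE id = @_id;';

    EXEC sp_executesql @SQL_cmd,
         N'@_id bigint, @_fieldA varchar(200), @_fieldB varchar(20)',
         @_id = @id, @_fieldA = @fieldA, @_fieldB = @fieldB;
END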
Special conditions can be easily implemented with CASE statement instead of ISNULL.
It took me a minute to realize what you meant, but I think you mean:
- use ISNULL everywhere
- for those fields which allow NULL: use CASE?
I don't see any overhead here.
I noticed worse performance when updating all fields, so that definitely impacts me: fields got updated unnecessarily, e.g. fields that had an index on them which then needed updating as well.
January 15, 2016 at 3:22 pm
Yes, from within the proc there is the exact same 'problem'. However, it is from outside the proc that you will notice dramatic performance improvements. Let's say you put 1000 records into each 'chunk' you update. Let us further suppose you use NULL in the traditional manner, which you have already demonstrated to work in your example. Here's where you benefit:
1. You don't have to connect 999 times from the app - that latency is eliminated.
2. The command is sent across the network 999 fewer times.
3. The table is updated 999 fewer times.
4. Each index is updated 999 fewer times.
5. There are 999 times fewer transactions.
Also, if you wanted to get all wiggy on it, you could use (.NET) PLINQ and go AsParallel() on the thing that's building the chunks for you, and build the next chunk while the first chunk is being processed. PLINQ will automatically split the work into as many threads as it takes to use all CPUs on the box. If you write a small console app that does 1000 SP calls and then does the same work using the TVP, you'll see the dramatic increase for yourself. Depending on the box, PLINQ could increase that improvement by a factor of maybe 10 or so, depending on how badass your server is.
January 17, 2016 at 4:15 am
lnardozi 61862 (1/15/2016)
Yes, from within the proc there is the exact same 'problem'. However, it is from outside the proc that you will notice dramatic performance improvements. Let's say you put 1000 records into each 'chunk' you update. Let us further suppose you use NULL in the traditional manner, which you have already demonstrated to work in your example. Here's where you benefit:
1. You don't have to connect 999 times from the app - that latency is eliminated.
2. The command is sent across the network 999 fewer times.
3. The table is updated 999 fewer times.
4. Each index is updated 999 fewer times.
5. There are 999 times fewer transactions.
Also, if you wanted to get all wiggy on it, you could use (.NET) PLINQ and go AsParallel() on the thing that's building the chunks for you, and build the next chunk while the first chunk is being processed. PLINQ will automatically split the work into as many threads as it takes to use all CPUs on the box. If you write a small console app that does 1000 SP calls and then does the same work using the TVP, you'll see the dramatic increase for yourself. Depending on the box, PLINQ could increase that improvement by a factor of maybe 10 or so, depending on how badass your server is.
Hi lnardozi, I'm very excited about this method 🙂 and I'm pushing to get some time with the dev team to create a code fork that would work with TVPs.
I definitely see there could be a great benefit from doing it this way.
Hopefully a real test with one-for-one workloads will be enough to prove its efficiency. It might even lead to revisiting the DB model which allows for NULLs here and there (and checking/fixing all the application code that deals with these NULL cases).
I agree with points 3, 4 and 5 you bring forward; however, the application has to fill the TVP, so it would still need to connect to add a new row to the TVP for each update or insert it detects/generates.
Or do you mean:
- buffer all inserts into the tvp on the application side
- one insert into the tvp (which could be quite long/large)
- one call to do the merge
For my test I can work around these NULLs; it is just weird that there doesn't seem to be an elegant solution for this within T-SQL. If NULL can be a real value, then there should have been
A) something else to define optional values with, or
B) a function that could check whether a parameter is 'set' or not.
... again, unless I am missing the obvious here. It seems weird I would be the first person to hit this constraint!
Thanks again for your feedback.
January 17, 2016 at 9:07 am
On the app side, adding a row to be processed is done by adding a row to the table-valued parameter. One common way to do that is to use a DataTable. In its simplest form:
var done = false;
var counter = 0;
DataTable tvp = GetNewTableWithSchemaOfTVP();
using (var connection = new SqlConnection(cnString))
{
    connection.Open();
    while (!done)
    {
        DataRow dr = tvp.NewRow();
        // your code to process a row goes here, fill the DataRow however you would normally
        // (and set done = true once there is nothing left to process)
        counter += ProcessRow(dr);
        tvp.Rows.Add(dr);
        if (counter % 1000 == 0)   // send a chunk of 1000 rows at a time
        {
            using (SqlCommand cmd = new SqlCommand("UpdateUsingTVP", connection))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                var pList = new SqlParameter("@tvp", SqlDbType.Structured);
                pList.TypeName = "dbo.YourTVPName";
                pList.Value = tvp;
                cmd.Parameters.Add(pList);   // attach the TVP to the command
                cmd.ExecuteNonQuery();
            }
            tvp = GetNewTableWithSchemaOfTVP();   // start a fresh chunk
        }
    }
    // send any rows still left in tvp the same way once the loop ends
}
The PLINQ stuff is left out in the interest of clarity. Your existing code performs a database update per row. The point is to intercept that process and put the parameter values you'd normally have passed to the stored procedure into the DataTable instead. That way you only have to call the stored procedure every thousandth time, get a new table and move on. HTH.
January 17, 2016 at 1:55 pm
Stijn977 (1/15/2016)
It took me a minute to realize what you meant, but I think you mean:
- use ISNULL everywhere
- for those fields which allow NULL: use CASE?
No, what I meant was :
SET ColumnName = CASE WHEN @NewValue IS NOT NULL OR <SpecialCondition> THEN @NewValue ELSE ColumnName END
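Applied to the TVP/MERGE proc posted earlier, that "special condition" can travel with the row itself. A rough sketch only; the extra type, proc and flag column names here are made up for illustration:
-- sketch: the TVP carries an explicit flag for the nullable column
CREATE TYPE [dbo].[example_tvp2] AS TABLE (
    [id]                   [bigint] NOT NULL PRIMARY KEY,
    [name]                 varchar(20) NOT NULL,
    [description]          varchar(200) NULL,
    [set_description_null] bit NOT NULL DEFAULT (0)   -- 1 = really set [description] to NULL
);
GO
CREATE PROCEDURE example_merge2 @merge_object example_tvp2 READONLY
AS
BEGIN
    SET NOCOUNT ON;
    MERGE [example_table] AS target
    USING @merge_object AS [Source]
       ON [target].[id] = [Source].[id]
    WHEN MATCHED THEN
        UPDATE SET [name] = ISNULL([Source].[name], [target].[name]),
                   [description] = CASE WHEN [Source].[set_description_null] = 1 THEN NULL
                                        WHEN [Source].[description] IS NOT NULL THEN [Source].[description]
                                        ELSE [target].[description]
                                   END
    WHEN NOT MATCHED THEN
        INSERT ( id, name, description )
        VALUES ( [Source].[id], [Source].[name], [Source].[description] );
END
Rows that only change [name] leave the flag at 0 and pass NULL for [description]; rows that really need [description] cleared set the flag to 1.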
_____________
Code for TallyGenerator
January 17, 2016 at 2:12 pm
Stijn977 (1/15/2016)
serg-52 (1/15/2016)
Sergiy (1/14/2016)
Apart from that. How do you deal with concurrent updates?
If 2 or more users edit the same record in their independent instances of C++ application - which version of the record takes precedence?
OP commented that a bunch of updates is a business transaction. So the system should either have its own transaction control or rely on MS SQL transactions to address this issue (and deadlocks as well).
Affirmative, we use DB locking as well as locking on the application level. Thanks!
I find it hard to believe you apply and hold a DB lock for the whole time between reading values from the DB, processing them in the application and submitting them back in the form of a batch of updates.
"Locking on application level" - what exactly does it mean?
I was not talking just about concurrency on saving data.
You have to make sure the data you've been modifying in one of the instances of your application is not being modified by another one. Or, even worse, being used as a base for another data modification process.
By the time that second process is finished, the dataset it returns may turn out to be invalid, because the source data has been changed by another process which has just completed saving its outcome.
Think of an Excel spreadsheet edited by 2 users: the changes which were saved first are lost. That is why they do not allow you to open a document for editing if it is already opened by somebody else.
Do you take care of this in your "disconnected edit" scenario?
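One common way to take care of it is an optimistic check on save. For example, a minimal sketch, assuming a rowversion column (here called [row_ver]) were added to the table:
-- illustration: optimistic concurrency check via a hypothetical rowversion column [row_ver]
DECLARE @id bigint = 3,
        @name varchar(20) = 'test3 Y',
        @description varchar(200) = NULL,
        @row_ver_read_earlier binary(8);

-- the application reads the version together with the data, edits disconnected...
SELECT @row_ver_read_earlier = [row_ver] FROM example_table WHERE [id] = @id;

-- ...and the later save only succeeds if nobody changed the row in between
UPDATE example_table
   SET [name] = @name,
       [description] = @description
 WHERE [id] = @id
   AND [row_ver] = @row_ver_read_earlier;

IF @@ROWCOUNT = 0
    RAISERROR('Row was changed by another user since it was read.', 16, 1);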
_____________
Code for TallyGenerator
January 18, 2016 at 2:47 am
Sergiy (1/17/2016)
Stijn977 (1/15/2016)
It took me a minute to realize what you meant, but I think you mean:
- use ISNULL everywhere
- for those fields which allow NULL: use CASE?
No, what I meant was :
SET ColumnName = CASE WHEN @NewValue IS NOT NULL OR <SpecialCondition> THEN @NewValue ELSE ColumnName END
Hi Sergiy, I see what you mean, and yes, I thought about this one too. The problem is that in some cases there might be 2 (or 3?) fields that allow NULL, meaning 3 extra flags to keep track of. Not completely undoable, of course. But then what if somebody changes a field from allow NULL to not allow NULL: I'd need to keep track of that too, etc. It is just not my preferred solution.
I think at the moment I really want to test system behaviour using TVPs, and based upon that I might try and push to get rid of the 'allow NULL' fields (one can hope).
On the blocking: the system is already in use and has 100s to 1000s of concurrent users, so the database and application level blocking seems to work fine, with only the very occasional deadlock or transaction which runs longer because it is held up by locking.
have a great week and thank you for your suggestions!
January 18, 2016 at 2:58 am
lnardozi 61862 (1/17/2016)
On the app side, adding a row to be processed is done by adding a row to the table-valued parameter. One common way to do that is to use a DataTable. In its simplest form:
var done = false;
var counter = 0;
DataTable tvp = GetNewTableWithSchemaOfTVP();
using (var connection = new SqlConnection(cnString))
{
    connection.Open();
    while (!done)
    {
        DataRow dr = tvp.NewRow();
        // your code to process a row goes here, fill the DataRow however you would normally
        // (and set done = true once there is nothing left to process)
        counter += ProcessRow(dr);
        tvp.Rows.Add(dr);
        if (counter % 1000 == 0)   // send a chunk of 1000 rows at a time
        {
            using (SqlCommand cmd = new SqlCommand("UpdateUsingTVP", connection))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                var pList = new SqlParameter("@tvp", SqlDbType.Structured);
                pList.TypeName = "dbo.YourTVPName";
                pList.Value = tvp;
                cmd.Parameters.Add(pList);   // attach the TVP to the command
                cmd.ExecuteNonQuery();
            }
            tvp = GetNewTableWithSchemaOfTVP();   // start a fresh chunk
        }
    }
    // send any rows still left in tvp the same way once the loop ends
}
The PLINQ stuff is left out in the interest of clarity. Your existing code performs a database update per row. The point is to intercept that process and put the parameter values you'd normally have passed to the stored procedure into the DataTable instead. That way you only have to call the stored procedure every thousandth time, get a new table and move on. HTH.
Hello lnardozi, thanks for taking the time to make it even clearer. I do think I understand the concept and only wish I could get the new code quicker so I could test it out (ETA not until a week or so).
I believe that in my specific case (around 50 - 200 updates/inserts per transaction), the biggest potential bottleneck is transactions waiting for another transaction to finish.
I've observed very low latency for small statements, so my initial guess is that I can build up the TVP as I work my way through the transaction (which normally still finishes in a couple of hundred milliseconds at this point) and then, at the end, call the SP to merge everything. Only at that point would the underlying table potentially suffer from locking, which hopefully should be minimal as the new data is already waiting (db) server side to be merged.