April 1, 2020 at 12:49 pm
I'm writing an update statement that uses a quantity column to determine how many rows get updated. I can accomplish this using RBAR, but performance is unusable and I can't figure out how to do this as a set operation.
I have a sales table that contains a row for each individual item. In the script there are two sales,'abc' and 'def'. They include items '123' quantity 5, '456' quantity 3 and '789' quantity 3, which is a total of 11 rows in the table.
Some items are shipped at the time of sale and others are shipped as they become available. This data arrives in the sale_order table in the form of an itemID , quantity and ship_date.
My goal is to get the ship_date from the order table and apply it to the original sale the number of times in the quantity column. sale_code 'abc' has two orders for item 123, one with quantity 1 and the other with 3. The sale should have 3 rows updated with ship_date 3/3/2020 and 1 row with 3/2/2020. Row order for the item does not matter.
The script below give a short sample data set. I included a sale_result table that shows what the result should look like, which probably better explains what I'm trying to accomplish than my description.
/*
drop table dbo.sale_order
drop table dbo.sale
drop table dbo.sale_result
*/
create table dbo.sale_order
(
id int identity primary key
,item_id int
,sale_code varchar(36)
,quantity int
,ship_date datetime
)
INSERT INTO [dbo].[sale_order] ([item_id],[sale_code],quantity,ship_date) VALUES (123,'ABC' ,1 ,'3/2/2020')
INSERT INTO [dbo].[sale_order] ([item_id],[sale_code],quantity,ship_date) VALUES (123,'ABC' ,3 ,'3/3/2020')
INSERT INTO [dbo].[sale_order] ([item_id],[sale_code],quantity,ship_date) VALUES (456,'DEF' ,3 ,'2/4/2020')
INSERT INTO [dbo].[sale_order] ([item_id],[sale_code],quantity,ship_date) VALUES (789,'DEF' ,2 ,'2/5/2020')
create table dbo.sale
(
id int identity primary key
,item_id int
,sale_code varchar(36)
,ship_date datetime
)
insert into dbo.sale ([item_id],[sale_code],[ship_date]) values(123,'ABC','3/1/2020') --one item delivered at time of sale
insert into dbo.sale ([item_id],[sale_code],[ship_date]) values(123,'ABC',NULL)
insert into dbo.sale ([item_id],[sale_code],[ship_date]) values(123,'ABC',NULL)
insert into dbo.sale ([item_id],[sale_code],[ship_date]) values(123,'ABC',NULL)
insert into dbo.sale ([item_id],[sale_code],[ship_date]) values(123,'ABC',NULL)
insert into dbo.sale ([item_id],[sale_code],[ship_date]) values(456,'DEF',NULL)
insert into dbo.sale ([item_id],[sale_code],[ship_date]) values(456,'DEF',NULL)
insert into dbo.sale ([item_id],[sale_code],[ship_date]) values(456,'DEF',NULL)
insert into dbo.sale ([item_id],[sale_code],[ship_date]) values(789,'DEF',NULL)
insert into dbo.sale ([item_id],[sale_code],[ship_date]) values(789,'DEF',NULL)
insert into dbo.sale ([item_id],[sale_code],[ship_date]) values(789,'DEF',NULL)
create table dbo.sale_result
(
id int identity primary key
,item_id int
,sale_code varchar(36)
,ship_date datetime
)
insert into dbo.sale_result ([item_id],[sale_code],[ship_date]) values(123,'ABC','3/1/2020')
insert into dbo.sale_result ([item_id],[sale_code],[ship_date]) values(123,'ABC','3/2/2020')
insert into dbo.sale_result ([item_id],[sale_code],[ship_date]) values(123,'ABC','3/3/2020')
insert into dbo.sale_result ([item_id],[sale_code],[ship_date]) values(123,'ABC','3/3/2020')
insert into dbo.sale_result ([item_id],[sale_code],[ship_date]) values(123,'ABC','3/3/2020')
insert into dbo.sale_result ([item_id],[sale_code],[ship_date]) values(456,'DEF','2/4/2020')
insert into dbo.sale_result ([item_id],[sale_code],[ship_date]) values(456,'DEF','2/4/2020')
insert into dbo.sale_result ([item_id],[sale_code],[ship_date]) values(456,'DEF','2/4/2020')
insert into dbo.sale_result ([item_id],[sale_code],[ship_date]) values(789,'DEF','2/5/2020')
insert into dbo.sale_result ([item_id],[sale_code],[ship_date]) values(789,'DEF','2/5/2020')
insert into dbo.sale_result ([item_id],[sale_code],[ship_date]) values(789,'DEF',NULL)
select * from dbo.sale
select * from dbo.sale_result
select * from dbo.sale_order
April 1, 2020 at 2:21 pm
I have an idea, instead of posting a png file showing the results, that just happens to be too small and fuzzy for me to read, try this instead.
Create a table, perhaps named ExpectedResults, insert into the table the desired results that you are expecting. This accomplishes two things: one, it shows us what you are expecting, two it gives us something to test against.
April 1, 2020 at 2:47 pm
Thanks for suggestion. I've edited the script and added a result table.
April 1, 2020 at 2:51 pm
Or just use a testing framework and write a test: https://www.sqlservercentral.com/articles/using-tsqlt-tests-to-practice-queries
April 1, 2020 at 3:30 pm
Just to clarify, presumably this is for an OLTP system? If so, wouldn't this be a parameterised update, run from a proc?
Eg
Exec update_Sale @Item_Id, @Sale_Code
?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 1, 2020 at 3:40 pm
The way this is set up isn't straightforward because there's no way to join to the duplicated rows in sale WHERE ship_date is null. I re-jiggered the script to not insert those rows. The results table is populated with 2 inserts: 1) sale table, and 2) sale_order table cross applied to a tally table. The sale table isn't needed as a "work" table.
drop table if exists #sale_order;
drop table if exists #sale;
drop table if exists #sale_result;
go
create table #sale_order(
id int identity primary key
,item_id int
,sale_code varchar(36)
,quantity int
,ship_date datetime);
go
insert #sale_order(item_id,sale_code,quantity,ship_date) values
(123,'ABC' ,1 ,'3/2/2020'),
(123,'ABC' ,3 ,'3/3/2020'),
(456,'DEF' ,3 ,'2/4/2020'),
(789,'DEF' ,2 ,'2/5/2020');
create table #sale(
id int identity primary key
,item_id int
,sale_code varchar(36)
,ship_date datetime);
go
insert #sale(item_id,sale_code,ship_date) values
(123,'ABC','3/1/2020'); --one item delivered at time of sale
create table #sale_result(
id int identity primary key
,item_id int
,sale_code varchar(36)
,ship_date datetime);
go
insert #sale_result(item_id,sale_code,ship_date)
select item_id, sale_code, ship_date from #sale
union all
select
so.item_id, so.sale_code, so.ship_date
from
#sale_order so
cross apply
dbo.tally(so.quantity) t;
select * from #sale;
select * from #sale_result;
select * from #sale_order;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
April 1, 2020 at 3:41 pm
This is for a data warehouse so it would be a batch process. Millions of orders to build it initially and then 100k daily.
April 1, 2020 at 3:54 pm
Just out of interest, can you explain how you intend to use the Sale table in the DW?
This feels like an excessive amount of de-normalisation to me.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 1, 2020 at 6:11 pm
Responding to the solution by scdecade, the null values in the sale table must be populated before the sale_order records can be applied.
April 1, 2020 at 6:43 pm
Is it correct: rows with NULL ship_date in the 'sale' table are created before the corresponding ship_date in the sale_order table? Units always ship in the quantity ordered?
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
April 1, 2020 at 6:55 pm
You're correct on the denormalization. The goal is to simplify incoming oltp data by giving the ability to update the status of a particular item (backordered, shipped, delivered, event1, event2 and so on) rather than having to break out part of the quantity to a new row. If multiple items are on one row with a quantity, then an update to any status will result in a new row. After the full life cycle, you end up with about the same amount of rows but with more processing power required and probably more storage.
April 1, 2020 at 6:55 pm
Yes, null ship_date arrives in the sale table before sale_order occurs.
April 1, 2020 at 7:44 pm
Why not update the 'sale' table directly? Do the 'sale' table and 'sale_result' table always have the same # of rows? If so, could this be implemented by making 'sale' a temporal table?
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
April 1, 2020 at 7:56 pm
Not sure what you mean by "directly", I think that's what I'm asking for.
The sale_result table is only to show you what the sale table would like if the orders were applied to it correctly. It is not part of the database model.
April 1, 2020 at 8:53 pm
drop table if exists #sale_order;
drop table if exists #sale;
drop table if exists #sale_result;
go
create table #sale_order(
id int identity primary key
,item_id int
,sale_code varchar(36)
,quantity int
,ship_date datetime);
go
insert #sale_order(item_id,sale_code,quantity,ship_date) values
(123,'ABC' ,1 ,'3/2/2020'),
(123,'ABC' ,3 ,'3/3/2020'),
(456,'DEF' ,3 ,'2/4/2020'),
(789,'DEF' ,2 ,'2/5/2020');
create table #sale(
id int identity primary key
,item_id int
,sale_code varchar(36)
,ship_date datetime);
go
insert #sale(item_id, sale_code,ship_date) values
(123,'ABC','3/1/2020'), --one item delivered at time of sale
(123,'ABC',NULL),
(123,'ABC',NULL),
(123,'ABC',NULL),
(123,'ABC',NULL),
(456,'DEF',NULL),
(456,'DEF',NULL),
(456,'DEF',NULL),
(789,'DEF',NULL),
(789,'DEF',NULL),
(789,'DEF',NULL);
declare
@id int=2;
with s_cte(id, ship_date) as (
select top(select quantity from #sale_order where id=@id)
s.id, so.ship_date
from
#sale s
join
#sale_order so on s.item_id=so.item_id
and s.sale_code=so.sale_code
and so.id=@id
where
s.ship_date is null)
update s
set
ship_date=sc.ship_date
from
#sale s
join
s_cte sc on s.id=sc.id;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply