July 17, 2017 at 3:33 am
Hi,
Because of the application issue, we are seeing partial duplicate row, how to combine duplicate row to original row.
Ex
Transactional Sales Table:
Create table Sales
(
Id int identity (1,1),
OrderID int,
ProductName varchar(50),
Date datetime default getdate(),
SalesAmount int
)
Sample Data:
Insert into dbo.Sales (OrderId,ProductName,SalesAmount)
select 1001,'Computer',50 union all
select 1002, 'Computer',0.1 union all ----- It should have been 50.1 for item 1001
select 1003 ,'KeyBoard', 15 union all
Select 1004 , 'Mouse', 8 union all
Select 1005,'Computer',80 union all
Select 1006,'computer',0.5 ----- It should have been 80.5 for item 1005
select * from Sales
-- Cross check
select ProductName,Count(ProductName) as NumberofItemsSold,Sum(SalesAmount) TotalSales FROM sales
GROUP BY ProductName
From the above query the company had sold only two computers, but there was issue with front end systems, the fraaction value of salesamount for computer product consider as two sales instead of one sale item, so Ideally we are looking to cleanse the data as follows
1001,'Computer',50.1
1003 ,'KeyBoard', 15
1004 , 'Mouse', 8
1005,'Computer',80.5
How do I combine rows to achieve the above output, It would be great if someone had dealt with this kind of issues.
Many Thanks
July 17, 2017 at 3:47 am
How do you tell that the value is duplicated, is it only is the following ID has a decimal value between >0 and <1?
If so, you declare you column, SalesAmount, as an INT and then insert a decimal value. This will result in the value 0. I this correct? If so, where is the actual decimal value stored? If not, please update your statement and check it works as you expect.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
July 17, 2017 at 4:00 am
Assuming that all rows >0 and <1 are offending rows, then this should work. Note, I do not update the OrderID, as generally reseeding your IDs is a bad idea. It also assumes that the Sales amount is a decimal, not an int:WITH NextRow AS (
SELECT *,
LEAD(SalesAmount) OVER (PARTITION BY ProductName ORDER BY OrderID) AS NextSalesAmount
FROM Sales)
UPDATE Sales
SET SalesAmount = NR.SalesAmount + NR.NextSalesAmount
FROM NextRow NR
WHERE NR.NextSalesAmount > 0 AND NR.NextSalesAmount < 1
AND NR.OrderID = Sales.OrderID;
GO
SELECT *
FROM Sales;
GO
DELETE FROM Sales
WHERE SalesAmount > 0 AND SalesAmount < 1;
GO
SELECT *
FROM Sales;
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
July 17, 2017 at 4:03 am
sorry the table creation script is bit wrong, it should have been float instead of int.
Create table Sales
(
Id int identity (1,1),
OrderID int,
ProductName varchar(50),
Date datetime default getdate(),
SalesAmount float
)
We identify its is duplicate when there the value is between 0 and 1 and also using lag function whether the previous row product name is same as current as current row.
With following query I could able to identify which rows are duplicate or not
Query to Identify Duplicates;
select case when SCAS_DW.dbo.Sales.SalesAmount between 0 and 1 and lag(ProductName,1) over (order by ORDERid) = ProductName then 'Duplicate' else 'Not Duplicate' End as Validation
,*
from Sales
But I am struggling how to join two rows as one row sum sales amount and ignore the row of duplicate.
July 17, 2017 at 4:17 am
Thom A - Monday, July 17, 2017 4:00 AMAssuming that all rows >0 and <1 are offending rows, then this should work. Note, I do not update the OrderID, as generally reseeding your IDs is a bad idea. It also assumes that the Sales amount is a decimal, not an int:WITH NextRow AS (
SELECT *,
LEAD(SalesAmount) OVER (PARTITION BY ProductName ORDER BY OrderID) AS NextSalesAmount
FROM Sales)
UPDATE Sales
SET SalesAmount = NR.SalesAmount + NR.NextSalesAmount
FROM NextRow NR
WHERE NR.NextSalesAmount > 0 AND NR.NextSalesAmount < 1
AND NR.OrderID = Sales.OrderID;
GO
SELECT *
FROM Sales;
GO
DELETE FROM Sales
WHERE SalesAmount > 0 AND SalesAmount < 1;
GO
SELECT *
FROM Sales;
you are star
July 17, 2017 at 6:46 am
Thom A - Monday, July 17, 2017 4:00 AMAssuming that all rows >0 and <1 are offending rows, then this should work. Note, I do not update the OrderID, as generally reseeding your IDs is a bad idea. It also assumes that the Sales amount is a decimal, not an int:WITH NextRow AS (
SELECT *,
LEAD(SalesAmount) OVER (PARTITION BY ProductName ORDER BY OrderID) AS NextSalesAmount
FROM Sales)
UPDATE Sales
SET SalesAmount = NR.SalesAmount + NR.NextSalesAmount
FROM NextRow NR
WHERE NR.NextSalesAmount > 0 AND NR.NextSalesAmount < 1
AND NR.OrderID = Sales.OrderID;
GO
SELECT *
FROM Sales;
GO
DELETE FROM Sales
WHERE SalesAmount > 0 AND SalesAmount < 1;
GO
SELECT *
FROM Sales;
Hi Thom,
I am glad for your post, If we want the same output using above table as view instead of updating and deleting actual table, could you refine you query and provide the answer, Before I post it now, I have tried using CTE's and subqueries but I am not able to achieve it. Could you please help me here.
Many Thanks
July 17, 2017 at 6:53 am
Sangeeth878787 - Monday, July 17, 2017 6:46 AMHi Thom,I am glad for your post, If we want the same output using above table as view instead of updating and deleting actual table, could you refine you query and provide the answer, Before I post it now, I have tried using CTE's and subqueries but I am not able to achieve it. Could you please help me here.
Many Thanks
One method (excuse my paste from VS Code):
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply