How to combine two rows as one row for transactional Data

Question

How to combine two rows as one row for transactional Data

Sangeeth Raj

Hall of Fame

Points: 3541
More actions
July 17, 2017 at 3:33 am

#402311

Hi,
Because of the application issue, we are seeing partial duplicate row, how to combine duplicate row to original row.
Ex
Transactional Sales Table:

Create table Sales
(
Id int identity (1,1),
OrderID int,
ProductName varchar(50),
Date datetime default getdate(),
SalesAmount int
)

Sample Data:
Insert into dbo.Sales (OrderId,ProductName,SalesAmount)
select 1001,'Computer',50 union all
select 1002, 'Computer',0.1 union all ----- It should have been 50.1 for item 1001
select 1003 ,'KeyBoard', 15 union all
Select 1004 , 'Mouse', 8 union all
Select 1005,'Computer',80 union all
Select 1006,'computer',0.5 ----- It should have been 80.5 for item 1005
select * from Sales
-- Cross check
select ProductName,Count(ProductName) as NumberofItemsSold,Sum(SalesAmount) TotalSales FROM sales
GROUP BY ProductName
From the above query the company had sold only two computers, but there was issue with front end systems, the fraaction value of salesamount for computer product consider as two sales instead of one sale item, so Ideally we are looking to cleanse the data as follows
1001,'Computer',50.1
1003 ,'KeyBoard', 15
1004 , 'Mouse', 8
1005,'Computer',80.5
How do I combine rows to achieve the above output, It would be great if someone had dealt with this kind of issues.
Many Thanks

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Thom A SSC Guru Points: 99134 More actions · Answer 1

How do you tell that the value is duplicated, is it only is the following ID has a decimal value between >0 and <1?

If so, you declare you column, SalesAmount, as an INT and then insert a decimal value. This will result in the value 0. I this correct? If so, where is the actual decimal value stored? If not, please update your statement and check it works as you expect.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

Thom A SSC Guru Points: 99134 More actions · Answer 2

Assuming that all rows >0 and <1 are offending rows, then this should work. Note, I do not update the OrderID, as generally reseeding your IDs is a bad idea. It also assumes that the Sales amount is a decimal, not an int:
WITH NextRow AS ( SELECT *, LEAD(SalesAmount) OVER (PARTITION BY ProductName ORDER BY OrderID) AS NextSalesAmount FROM Sales) UPDATE Sales SET SalesAmount = NR.SalesAmount + NR.NextSalesAmount FROM NextRow NR WHERE NR.NextSalesAmount > 0 AND NR.NextSalesAmount < 1 AND NR.OrderID = Sales.OrderID; GO SELECT * FROM Sales; GO DELETE FROM Sales WHERE SalesAmount > 0 AND SalesAmount < 1; GO SELECT * FROM Sales;

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

Sangeeth Raj Hall of Fame Points: 3541 More actions · Answer 3

sorry the table creation script is bit wrong, it should have been float instead of int.
Create table Sales
(
Id int identity (1,1),

OrderID int,

ProductName varchar(50),

Date datetime default getdate(),

SalesAmount float
)

We identify its is duplicate when there the value is between 0 and 1 and also using lag function whether the previous row product name is same as current as current row.

With following query I could able to identify which rows are duplicate or not

Query to Identify Duplicates;
select case when SCAS_DW.dbo.Sales.SalesAmount between 0 and 1 and lag(ProductName,1) over (order by ORDERid) = ProductName then 'Duplicate' else 'Not Duplicate' End as Validation
,*
from Sales

But I am struggling how to join two rows as one row sum sales amount and ignore the row of duplicate.

Sangeeth Raj Hall of Fame Points: 3541 More actions · Answer 4

Thom A - Monday, July 17, 2017 4:00 AM
Assuming that all rows >0 and <1 are offending rows, then this should work. Note, I do not update the OrderID, as generally reseeding your IDs is a bad idea. It also assumes that the Sales amount is a decimal, not an int:
WITH NextRow AS ( SELECT *, LEAD(SalesAmount) OVER (PARTITION BY ProductName ORDER BY OrderID) AS NextSalesAmount FROM Sales) UPDATE Sales SET SalesAmount = NR.SalesAmount + NR.NextSalesAmount FROM NextRow NR WHERE NR.NextSalesAmount > 0 AND NR.NextSalesAmount < 1 AND NR.OrderID = Sales.OrderID; GO SELECT * FROM Sales; GO DELETE FROM Sales WHERE SalesAmount > 0 AND SalesAmount < 1; GO SELECT * FROM Sales;

you are star

Sangeeth Raj Hall of Fame Points: 3541 More actions · Answer 5

Thom A - Monday, July 17, 2017 4:00 AM
Assuming that all rows >0 and <1 are offending rows, then this should work. Note, I do not update the OrderID, as generally reseeding your IDs is a bad idea. It also assumes that the Sales amount is a decimal, not an int:
WITH NextRow AS ( SELECT *, LEAD(SalesAmount) OVER (PARTITION BY ProductName ORDER BY OrderID) AS NextSalesAmount FROM Sales) UPDATE Sales SET SalesAmount = NR.SalesAmount + NR.NextSalesAmount FROM NextRow NR WHERE NR.NextSalesAmount > 0 AND NR.NextSalesAmount < 1 AND NR.OrderID = Sales.OrderID; GO SELECT * FROM Sales; GO DELETE FROM Sales WHERE SalesAmount > 0 AND SalesAmount < 1; GO SELECT * FROM Sales;

Hi Thom,

I am glad for your post, If we want the same output using above table as view instead of updating and deleting actual table, could you refine you query and provide the answer, Before I post it now, I have tried using CTE's and subqueries but I am not able to achieve it. Could you please help me here.

Many Thanks

Thom A SSC Guru Points: 99134 More actions · Answer 6

Sangeeth878787 - Monday, July 17, 2017 6:46 AM
Hi Thom,
I am glad for your post, If we want the same output using above table as view instead of updating and deleting actual table, could you refine you query and provide the answer, Before I post it now, I have tried using CTE's and subqueries but I am not able to achieve it. Could you please help me here.
Many Thanks

One method (excuse my paste from VS Code):

WITH NextRow AS (
  SELECT *,
   LEAD(SalesAmount) OVER (PARTITION BY ProductName ORDER BY OrderID) AS NextSalesAmount
  FROM Sales)
--UPDATE Sales 
--SET SalesAmount = NR.SalesAmount + NR.NextSalesAmount
SELECT NR.ID, NR.OrderID,
    NR.ProductName, NR.[Date],
    NR.SalesAmount + CASE WHEN NR.NextSalesAmount > 0 AND NR.NextSalesAmount < 1 THEN NR.NextSalesAmount ELSE 0.00 END AS SalesAmount
FROM NextRow NR
WHERE NOT (NR.SalesAmount > 0 AND NR.SalesAmount < 1)
ORDER BY OrderID ASC;

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk