August 17, 2022 at 10:12 am
Hi,
I have a scenario where my employee table has multiple IDs. The background to this scenario is that due to a change in the reporting systems employees who overlapped during the change ended up with more than one ID as in the table below
CREATE TABLE #input
(
Emp_codevarchar(300),
Employee_Namevarchar(300),
SSN varchar(300)
);
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('A00001', 'Employee Name 1', 'SSN1 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('A00002', 'Employee Name 2', 'SSN2 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('A00003', 'Employee Name 3', 'SSN3 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('A00004', 'Employee Name 4', 'SSN4 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('A00005', 'Employee Name 5', 'SSN5 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('A00006', 'Employee Name 6', 'SSN6 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('A00007', 'Employee Name 7', 'SSN7 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('B90001', 'Employee Name 1', 'SSN1 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('B90002', 'Employee Name 2', 'SSN2 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('B90003', 'Employee Name 3', 'SSN3 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('B90005', 'Employee Name 5', 'SSN5 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('B90006', 'Employee Name 6', 'SSN6 ');
INSERT INTO #input (Emp_code,Employee_Name,SSN ) VALUES ('B90007', 'Employee Name 7', 'SSN7 ');
As can be seen from the table included, Employee Name 1 has two employee codes - A00001 and B90001. However, every employee has a unique SSN. In the list above, only Employee Name 4 has one Employee Code, which means this employee didn't exist in the previous system
I am trying to reconcile both sets of employees codes based on SSN such that there is a unique employee code which is a combination of the "New System"+"Old System" (where available)
The output I am trying to achieve should look something like this
CREATE TABLE #output
(
Emp_codevarchar(300),
Employee_Namevarchar(300),
SSN varchar(300)
);
INSERT INTO #output (Emp_code,Employee_Name,SSN ) VALUES ('A00001_B90001', 'Employee Name 1', 'SSN1 ');
INSERT INTO #output (Emp_code,Employee_Name,SSN ) VALUES ('A00002_B90002', 'Employee Name 2', 'SSN2 ');
INSERT INTO #output (Emp_code,Employee_Name,SSN ) VALUES ('A00003_B90003', 'Employee Name 3', 'SSN3 ');
INSERT INTO #output (Emp_code,Employee_Name,SSN ) VALUES ('A00004', 'Employee Name 4', 'SSN4 ');
INSERT INTO #output (Emp_code,Employee_Name,SSN ) VALUES ('A00005_B90005', 'Employee Name 5', 'SSN5 ');
INSERT INTO #output (Emp_code,Employee_Name,SSN ) VALUES ('A00006_B90006', 'Employee Name 6', 'SSN6 ');
INSERT INTO #output (Emp_code,Employee_Name,SSN ) VALUES ('A00007_B90007', 'Employee Name 7', 'SSN7 ');
I attempted this using Pivot table but getting stuck in splitting the SSN into multiple employee ids
Any help would be greatly appreciated
Thanks!
August 17, 2022 at 1:00 pm
Hi ,
you can try using below query
Select Employee_Name,SSN,STRING_AGG(Emp_code,'_') within group (order by Emp_code) from #input
group by Employee_Name,SSN
August 17, 2022 at 1:47 pm
SELECT
MIN(Emp_code) + CASE WHEN COUNT(*) > 1 THEN '_' + MAX(Emp_code) ELSE '' END AS Emp_code,
MAX(Employee_Name) AS Employee_Name,
SSN
FROM #input
GROUP BY SSN
ORDER BY SSN
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 18, 2022 at 12:11 am
Thanks for that. The sample I included was just a subset of my table. The information I am working with is payroll data. The table contains employees with multiple pay dates and pay types such tax, leave, bonus, and a few other specific to the company. This adds an extra level of granularity because of which there can be multiple instances of an Employee ID
I have tried to expand my dataset a bit more in the input table below if it helps
CREATE TABLE #input
(
Employee_ID varchar(300),
SSN varchar(300),
Pay_ID varchar(300),
Date varchar(300),
AmountPaid varchar(300),
Col1 varchar(300),
Col2 varchar(300),
Col3 varchar(300),
Col4 varchar(300),
source varchar(300)
);
INSERT INTO #input (Employee_ID,SSN,Pay_ID,Date,AmountPaid,Col1,Col2,Col3,Col4,source )
VALUES
('A00001', 'SSN1', 'A00001_20170910', '10/09/2017', '412', 'RandomText1', 'RandomText2', 'RandomText3', 'RandomText4', 'SYS1 '),
('A00001', 'SSN1', 'A00001_20170910', '10/09/2017', '164', 'RandomText2', 'RandomText3', 'RandomText4', 'RandomText5', 'SYS1 '),
('A00001', 'SSN1', 'A00001_20180114', '14/01/2018', '295', 'RandomText3', 'RandomText4', 'RandomText5', 'RandomText6', 'SYS1 '),
('B00001', 'SSN1', 'B00001_20201210', '10/12/2020', '146', 'RandomText4', 'RandomText5', 'RandomText6', 'RandomText7', 'SYS1 '),
('B00001', 'SSN1', 'B00001_20201210', '10/12/2020', '183', 'RandomText5', 'RandomText6', 'RandomText7', 'RandomText8', 'SYS1 '),
('B00001', 'SSN1', 'B00001_20201210', '10/12/2020', '211', 'RandomText6', 'RandomText7', 'RandomText8', 'RandomText9', 'SYS1 '),
('A00001', 'SSN1', 'A00001_20170521', '21/05/2017', '271', 'RandomText7', 'RandomText7', 'RandomText7', 'RandomText7', 'SYS2 '),
('A00001', 'SSN1', 'A00001_20170409', '9/04/2017', '310', 'RandomText8', 'RandomText8', 'RandomText8', 'RandomText8', 'SYS2 '),
('A00001', 'SSN1', 'A00001_20170409', '9/04/2017', '449', 'RandomText9', 'RandomText9', 'RandomText9', 'RandomText9', 'SYS2 '),
('A00001', 'SSN1', 'A00001_20170312', '12/03/2017', '385', 'RandomText10', 'RandomText10', 'RandomText10', 'RandomText10', 'SYS2 '),
('A00002', 'SSN2', 'A00002_20201126', '26/11/2020', '353', 'RandomText11', 'RandomText11', 'RandomText11', 'RandomText11', 'SYS1 '),
('A00002', 'SSN2', 'A00002_20201126', '26/11/2020', '305', 'RandomText12', 'RandomText12', 'RandomText12', 'RandomText12', 'SYS1 '),
('A00002', 'SSN2', 'A00002_20201126', '26/11/2020', '481', 'RandomText13', 'RandomText13', 'RandomText13', 'RandomText13', 'SYS1 '),
('B00002', 'SSN2', 'B00002_20160612', '12/06/2016', '461', 'RandomText14', 'RandomText14', 'RandomText14', 'RandomText14', 'SYS2 '),
('B00002', 'SSN2', 'B00002_20160612', '12/06/2016', '363', 'RandomText15', 'RandomText15', 'RandomText15', 'RandomText15', 'SYS2 '),
('B00002', 'SSN2', 'B00002_20160612', '12/06/2016', '499', 'RandomText16', 'RandomText16', 'RandomText16', 'RandomText16', 'SYS2 '),
('B00002', 'SSN2', 'B00002_20160626', '26/06/2016', '362', 'RandomText17', 'RandomText17', 'RandomText17', 'RandomText17', 'SYS2 ');
When I tried the Sting_Agg code, the result I get is a column with as many occurrences of an ID separated by a delimiter, which is not what I need. The expected output should be
CREATE TABLE #output
(
Employee_IDvarchar(300),
SSNvarchar(300),
Pay_IDvarchar(300),
Datevarchar(300),
AmountPaidvarchar(300),
Col1varchar(300),
Col2varchar(300),
Col3varchar(300),
Col4varchar(300),
source varchar(300)
);
INSERT INTO #output (Employee_ID,SSN,Pay_ID,Date,AmountPaid,Col1,Col2,Col3,Col4,source )
VALUES
('A00001_B00001', 'SSN1', 'A00001_20170910', '10/09/2017', '412', 'RandomText1', 'RandomText2', 'RandomText3', 'RandomText4', 'SYS1 '),
('A00001_B00001', 'SSN1', 'A00001_20170910', '10/09/2017', '164', 'RandomText2', 'RandomText3', 'RandomText4', 'RandomText5', 'SYS1 '),
('A00001_B00001', 'SSN1', 'A00001_20180114', '14/01/2018', '295', 'RandomText3', 'RandomText4', 'RandomText5', 'RandomText6', 'SYS1 '),
('A00001_B00001', 'SSN1', 'B00001_20201210', '10/12/2020', '146', 'RandomText4', 'RandomText5', 'RandomText6', 'RandomText7', 'SYS1 '),
('A00001_B00001', 'SSN1', 'B00001_20201210', '10/12/2020', '183', 'RandomText5', 'RandomText6', 'RandomText7', 'RandomText8', 'SYS1 '),
('A00001_B00001', 'SSN1', 'B00001_20201210', '10/12/2020', '211', 'RandomText6', 'RandomText7', 'RandomText8', 'RandomText9', 'SYS1 '),
('A00001_B00001', 'SSN1', 'A00001_20170521', '21/05/2017', '271', 'RandomText7', 'RandomText7', 'RandomText7', 'RandomText7', 'SYS2 '),
('A00001_B00001', 'SSN1', 'A00001_20170409', '9/04/2017', '310', 'RandomText8', 'RandomText8', 'RandomText8', 'RandomText8', 'SYS2 '),
('A00001_B00001', 'SSN1', 'A00001_20170409', '9/04/2017', '449', 'RandomText9', 'RandomText9', 'RandomText9', 'RandomText9', 'SYS2 '),
('A00001_B00001', 'SSN1', 'A00001_20170312', '12/03/2017', '385', 'RandomText10', 'RandomText10', 'RandomText10', 'RandomText10', 'SYS2 '),
('A00002_B00002', 'SSN2', 'A00002_20201126', '26/11/2020', '353', 'RandomText11', 'RandomText11', 'RandomText11', 'RandomText11', 'SYS1 '),
('A00002_B00002', 'SSN2', 'A00002_20201126', '26/11/2020', '305', 'RandomText12', 'RandomText12', 'RandomText12', 'RandomText12', 'SYS1 '),
('A00002_B00002', 'SSN2', 'A00002_20201126', '26/11/2020', '481', 'RandomText13', 'RandomText13', 'RandomText13', 'RandomText13', 'SYS1 '),
('A00002_B00002', 'SSN2', 'B00002_20160612', '12/06/2016', '461', 'RandomText14', 'RandomText14', 'RandomText14', 'RandomText14', 'SYS2 '),
('A00002_B00002', 'SSN2', 'B00002_20160612', '12/06/2016', '363', 'RandomText15', 'RandomText15', 'RandomText15', 'RandomText15', 'SYS2 '),
('A00002_B00002', 'SSN2', 'B00002_20160612', '12/06/2016', '499', 'RandomText16', 'RandomText16', 'RandomText16', 'RandomText16', 'SYS2 '),
('A00002_B00002', 'SSN2', 'B00002_20160626', '26/06/2016', '362', 'RandomText17', 'RandomText17', 'RandomText17', 'RandomText17', 'SYS2 ');
The number of rows remains the same in the input and the output. Because an SSN can have two ids associated with, I want to combine the two individual ids coming from the different systems and make it the new Employee ID
Hope that helped
Thanks
August 18, 2022 at 9:37 pm
How about something like this?
WITH ssn_empID AS (
SELECT DISTINCT
Employee_ID, SSN
FROM #input
),
ssn_empIDagg AS (
SELECT
STRING_AGG(Employee_ID,'_') AS newEmployee_ID,
SSN
FROM ssn_empID
GROUP BY SSN
)
SELECT
se.newEmployee_ID,
i.*
FROM #input i
JOIN ssn_empIDagg se ON se.ssn = i.ssn
August 22, 2022 at 2:43 am
hmm, I tried the above code. I end up getting a number ids concatenated together. For example, if ID A00001 has multiple rows on 3 paydates and Id B00001 has multiple rows for 2 paydates, I end up getting "A00001_A00001 _A00001_B00001 _B00001"
August 22, 2022 at 3:01 am
However, every employee has a unique SSN.
For the love of all that is holy in SQL Server, please tell me that your SSNs are NOT in plain, 9 digit text! If they are, you folks really need to fix them and I mean NOW! And, no... I'm not talking about hashing... I'm talking about real encryption with a salt and a master key, etc.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 22, 2022 at 3:04 am
The column in my dataset is not actually SSN. I just used it as example in this scenario to highlight a unique column
August 22, 2022 at 3:09 am
Hopefully, that's true. The last person that told me that was at a company the I worked at temporarily. They told me that AFTER I had already seen that it was untrue.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 26, 2022 at 1:04 pm
I think I managed to solve it using two CTEs and unioning them. I then used another CTE on the combined table to remove any duplicates
August 26, 2022 at 7:30 pm
I'm pretty sure you don't need multiple CTE's to accomplish this task. Given the data supplied in the original post, the following should work (note that varchar(300) is rather wasteful for the data supplied, so I brought that down to just 20):
CREATE TABLE #input (
Employee_ID varchar(20),
SSN varchar(20),
Pay_ID varchar(20),
[Date] varchar(20),
AmountPaid varchar(20),
Col1 varchar(20),
Col2 varchar(20),
Col3 varchar(20),
Col4 varchar(20),
[source] varchar(20)
);
INSERT INTO #input (Employee_ID,SSN,Pay_ID,Date,AmountPaid,Col1,Col2,Col3,Col4,source )
VALUES
('A00001', 'SSN1', 'A00001_20170910', '10/09/2017', '412', 'RandomText1', 'RandomText2', 'RandomText3', 'RandomText4', 'SYS1 '),
('A00001', 'SSN1', 'A00001_20170910', '10/09/2017', '164', 'RandomText2', 'RandomText3', 'RandomText4', 'RandomText5', 'SYS1 '),
('A00001', 'SSN1', 'A00001_20180114', '14/01/2018', '295', 'RandomText3', 'RandomText4', 'RandomText5', 'RandomText6', 'SYS1 '),
('B00001', 'SSN1', 'B00001_20201210', '10/12/2020', '146', 'RandomText4', 'RandomText5', 'RandomText6', 'RandomText7', 'SYS1 '),
('B00001', 'SSN1', 'B00001_20201210', '10/12/2020', '183', 'RandomText5', 'RandomText6', 'RandomText7', 'RandomText8', 'SYS1 '),
('B00001', 'SSN1', 'B00001_20201210', '10/12/2020', '211', 'RandomText6', 'RandomText7', 'RandomText8', 'RandomText9', 'SYS1 '),
('A00001', 'SSN1', 'A00001_20170521', '21/05/2017', '271', 'RandomText7', 'RandomText7', 'RandomText7', 'RandomText7', 'SYS2 '),
('A00001', 'SSN1', 'A00001_20170409', '9/04/2017', '310', 'RandomText8', 'RandomText8', 'RandomText8', 'RandomText8', 'SYS2 '),
('A00001', 'SSN1', 'A00001_20170409', '9/04/2017', '449', 'RandomText9', 'RandomText9', 'RandomText9', 'RandomText9', 'SYS2 '),
('A00001', 'SSN1', 'A00001_20170312', '12/03/2017', '385', 'RandomText10', 'RandomText10', 'RandomText10', 'RandomText10', 'SYS2 '),
('A00002', 'SSN2', 'A00002_20201126', '26/11/2020', '353', 'RandomText11', 'RandomText11', 'RandomText11', 'RandomText11', 'SYS1 '),
('A00002', 'SSN2', 'A00002_20201126', '26/11/2020', '305', 'RandomText12', 'RandomText12', 'RandomText12', 'RandomText12', 'SYS1 '),
('A00002', 'SSN2', 'A00002_20201126', '26/11/2020', '481', 'RandomText13', 'RandomText13', 'RandomText13', 'RandomText13', 'SYS1 '),
('B00002', 'SSN2', 'B00002_20160612', '12/06/2016', '461', 'RandomText14', 'RandomText14', 'RandomText14', 'RandomText14', 'SYS2 '),
('B00002', 'SSN2', 'B00002_20160612', '12/06/2016', '363', 'RandomText15', 'RandomText15', 'RandomText15', 'RandomText15', 'SYS2 '),
('B00002', 'SSN2', 'B00002_20160612', '12/06/2016', '499', 'RandomText16', 'RandomText16', 'RandomText16', 'RandomText16', 'SYS2 '),
('B00002', 'SSN2', 'B00002_20160626', '26/06/2016', '362', 'RandomText17', 'RandomText17', 'RandomText17', 'RandomText17', 'SYS2 ');
WITH AGGREGATES AS (
SELECT DISTINCT I.SSN, I.Employee_ID
FROM #input AS I
)
SELECT
A.SSN,
STRING_AGG(A.Employee_ID, '_') WITHIN GROUP (ORDER BY A.Employee_ID) AS NEW_EMP_ID
FROM AGGREGATES AS A
GROUP BY A.SSN
ORDER BY A.SSN;
The results are:
SSNNEW_EMP_ID
SSN1A00001_B00001
SSN2A00002_B00002
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
August 26, 2022 at 11:57 pm
The reason I used two CTEs is because I have another dataset in which the ID are in two different columns leading to duplicates
With the sample above, I am not able to extract the complete payroll data with all the columns. Here is the code I tried
WITH AGGREGATES AS (
SELECT DISTINCT I.*
FROM #input AS I
)
SELECT
A.SSN,
STRING_AGG(A.Employee_ID, '_') WITHIN GROUP (ORDER BY A.Employee_ID) AS NEW_EMP_ID
FROM AGGREGATES AS A
GROUP BY A.SSN
ORDER BY A.SSN;
Result
SSNNEW_EMP_ID
SSN1A00001_A00001_A00001_A00001_A00001_A00001_A00001_B00001_B00001_B00001
SSN2A00002_A00002_A00002_B00002_B00002_B00002_B00002
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply