Pivot columns to get 0 /1 flag columns

Question

Post reply

Pivot columns to get 0 /1 flag columns

masterelaichi

SSC-Addicted

Points: 444

September 4, 2022 at 8:23 am

#4083732

Hi,

I have a dataset of employees with different "Paytypes" in a single column. I am trying to pivot the table on the Paytype column so that each paytype can be treated as a flag column, i,e, the only possible values in this column should be 1 or 0

Below is a sample code with the dataset

CREATE TABLE #input 
(
    employee_codevarchar(300),
    paydatevarchar(300),
    Paytype varchar(300)
);

INSERT INTO #input (employee_code,paydate,Paytype )
VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');


-----Pivot---
select * 
from #input
PIVOT
  (
    count (paytype)
    FOR paytype in (Over_time,Holiday_pay,Meal_Allowance,Travel_Allowance)

) as pivottable

When I run the above code I get values more than 1 which I think is due to the duplicate rows (as marked in the create table above). These duplicates are intentional because my actual dataset has a couple of other columns which I haven't included here

The output I am trying to get should look something like this

pivot

Any help would be much appreciated!

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246960 More actions · Answer 1

Here's a solution which does not rely on PIVOT:

SELECT i.employee_code
      ,i.paydate
      ,Over_time = MAX(IIF(i.Paytype = 'Over_time', 1, 0))
      ,Holiday_pay = MAX(IIF(i.Paytype = 'Holiday_pay', 1, 0))
      ,Meal_Allowance = MAX(IIF(i.Paytype = 'Meal_Allowance', 1, 0))
      ,Travel_Allowance = MAX(IIF(i.Paytype = 'Travel_Allowance', 1, 0))
FROM #input i
GROUP BY i.employee_code
        ,i.paydate
ORDER BY i.employee_code
        ,i.paydate;

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

masterelaichi SSC-Addicted Points: 444 More actions · Answer 2

Thanks Phil. This does the job

I modified my pivot code by replacing the count with Max. This results in a text in the flag values instead of the 1 or 0. What I eventually have to do is to combine two of the flags , say Meal and Travel allowance into one column, say Work_travel_Allowance, i.e when either Meal or Travel is a "1" then Work_travel_Allowance will be triggered as a 1

In my original code with the pivot, I have used case statements to achieve this. I can do something similar with the code you have suggested using a temp table and then working it out column by column. Is there a way to do this without using an intermediate table ?

Phil Parkin SSC Guru Points: 246960 More actions · Answer 3

Combining two columns is a piece of cake with my code, like so:

SELECT i.employee_code
      ,i.paydate
      ,Over_time = MAX(IIF(i.Paytype = 'Over_time', 1, 0))
      ,Holiday_pay = MAX(IIF(i.Paytype = 'Holiday_pay', 1, 0))
      ,WorkTravelAllowance = MAX(IIF(i.Paytype IN ( 'Meal_Allowance', 'Travel_Allowance' ), 1, 0))
FROM #input i
GROUP BY i.employee_code
        ,i.paydate
ORDER BY i.employee_code
        ,i.paydate;

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

masterelaichi SSC-Addicted Points: 444 More actions · Answer 4

Thanks for the almost instant response! It worked like a charm!

I didn't know using the IN statement in an IIF condition was even possible ( I have to admit, I have no clue what that code is doing!). Need to read up on it

jcelko212 32090 SSCrazy Eights Points: 9303 More actions · Answer 5

Please learn what a table is. First, we need to fix your non-table.

CREATE TABLE Expenses

(emp_id CHAR(10) NOT NULL,

pay_date DATE NOT NULL, -- read ISO-8601 formats

expense_type VARCHAR(30) NOT NULL

CHECK(expense_type IN ('Over_time', 'Holiday_pay', 'Meal_Allowance','Travel_Allowance'),

PRIMARY KEY (emp_id,pay_date) -- not an option!!

);

1) the table name should tell us what the set is, not how you use it

2) the schema should not have sooooo many NULLs

3) only 1960's COBOL has all-character columns; we have data types!

4) column names also have a standard

5) we have DATE data type; it is not THREE HUNDRED CHINESE Unicode characters!

6) the only format in ANSI/ISO Standards for dates ISO-8601

7) a table has to have a key to be a table! you have none and no way to get one.

8) a table, therefore, has no duplicate rows. This is not a option! Your comment tells us that your code is awful and dangerous.

9) Why did you fail to use CHECK()? SQL programmers think in a declarative language, so it is natural.

INSERT INTO Expenses (emp_id, paydate, expense_type) VALUES

('Emp_1', '2013-11-11', 'Over_time'),

('Emp_1', '2013-11-26', 'Holiday_pay'),

('Emp_1', '2013-10-12', 'Meal_Allowance'),

-- ('Emp_1', '2013-10-12', 'Meal_Allowance'), -- this is not relational!!

('Emp_2', '2017-05-12', 'Holiday_pay'),

('Emp_3', '2017-05-12', 'Holiday_pay'),

('Emp_4', '2013-07-09', 'Holiday_pay'),

('Emp_4', '2013-07-09', 'Travel_Allowance'),

('Emp_4', '2013-07-09', 'Meal_Allowance'),

('Emp_4', '2013-07-23', 'Holiday_pay'),

('Emp_4', '2013-06-08', 'Travel_Allowance');

SQL Programmers hate assembly language style bit flags, but you are using SQL as a report writer; Phil's answer will work. You might want to use CASE or SIGN and not the local spreadsheet IFF dialect.

Please post DDL and follow ANSI/ISO standards when asking for help.

Jeff Moden SSC Guru Points: 1004340 More actions · Answer 6

jcelko212 32090 wrote:

Please learn what a table is. First, we need to fix your non-table.
CREATE TABLE Expenses (emp_id CHAR(10) NOT NULL, pay_date DATE NOT NULL, -- read ISO-8601 formats expense_type VARCHAR(30) NOT NULL CHECK(expense_type IN ('Over_time', 'Holiday_pay', 'Meal_Allowance','Travel_Allowance'), PRIMARY KEY (emp_id,pay_date) -- not an option!! );

Your key is incorrect for the given data even in the table you claim is correct above.

We don't use hard-coded checks for this type of thing. It should be in a reference table and the check you're using should actually be an FK check.

Of course, that would only apply if the table were a permanent table. The op has not posted an permanent table. He's posted a working table as a Temp Table.

jcelko212 32090 wrote:

1) the table name should tell us what the set is, not how you use it

He did. As previously stated, it's a temporary working table being used for input for an example.

jcelko212 32090 wrote:

2) the schema should not have sooooo many NULLs

Again, it's not a real table. It an example that provides enough to solve the question being asked.

jcelko212 32090 wrote:

3) only 1960's COBOL has all-character columns; we have data types!

Not according to many of your previous posts where you usually insist on turning a perfectly good integer column in a VARCHAR() column just because it's being used as a surrogate key.

jcelko212 32090 wrote:

4) column names also have a standard

Unfortunately, the "standard" which you speak of costs hundreds of dollars.

jcelko212 32090 wrote:

5) we have DATE data type; it is not THREE HUNDRED CHINESE Unicode characters!

Finally, something that I mostly agree with, although the OP didn't use UNICODE.

jcelko212 32090 wrote:

6) the only format in ANSI/ISO Standards for dates ISO-8601

You regularly recommend violating those standards with your YYYY-MM-00 format recommendations. Also, be aware that YYYY-MM-DD will produce some rather unexpected results if the the current language is French or a few others.

jcelko212 32090 wrote:

7) a table has to have a key to be a table! you have none and no way to get one.

If that's true, why did you add one (an incorrect one for the given table, as previously stated) in the code in your table example?

jcelko212 32090 wrote:

8) a table, therefore, has no duplicate rows. This is not a option! Your comment tells us that your code is awful and dangerous. 9) Why did you fail to use CHECK()? SQL programmers think in a declarative language, so it is natural.

Actually, it IS an option, especially when you're trying to demonstrate an issue. You don't know where the data the op is using is coming from nor whether or not the op has any control over it.

jcelko212 32090 wrote:

SQL Programmers hate assembly language style bit flags,...

No "we" don't. They're sometimes used incorrectly and that's what we hate. They can incredibly useful when properly used.

jcelko212 32090 wrote:

... but you are using SQL as a report writer;

Which isn't far off from when you're trying to normalize data, like the OP is trying to.

jcelko212 32090 wrote:

Phil's answer will work.

Darned tootin' it will!

jcelko212 32090 wrote:

You might want to use CASE or SIGN and not the local spreadsheet IFF dialect.

The correct name of the function is "IIF", not "IFF" but, as to your suggestion to use CASE instead, why? There are only two conditions for each instance in Phil's code. There's no need to clutter-up the code with a CASE/WHEN/THEN/ELSE/END here and there is no performance advantage to using CASE instead of IIF. And IIF isn't just a "spreadsheet dialect". It's used in a whole lot of places.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

masterelaichi SSC-Addicted Points: 444 More actions · Answer 7

jcelko212 32090 wrote:

Please learn what a table is. First, we need to fix your non-table.

Thanks for your "constructive" criticism of my dummy data and code which I provided. Even though I am relatively new to SQL I think I have a very basic and general idea what a table is 🙂

Thanks for replying to jcelko212's post for me , Jeff. I couldn't have put it better !

The only thing I want to add is that I do use YYYY-mm-dd date formats in my tables. The table I have used here was just a quick and dirty one which I created using excel and a table generator which I found online which outputs a Create table sql code. The output it generates doesn't identify datatypes unfortunately

masterelaichi SSC-Addicted Points: 444 More actions · Answer 8

jcelko212's post got me thinking

I tried to change the datatype of the paydate column to date by using the alter command but ended up getting this error

"Conversion failed when converting date and/or time from character string."

drop table if exists #input

CREATE TABLE #input 
(
    employee_code varchar(300),
    paydate varchar(300),
    Paytype varchar(300)
);

INSERT INTO #input (employee_code,paydate,Paytype )
VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');

alter table #input
alter column paydate  date

However, it worked when I used the update statement like so

drop table if exists #input

CREATE TABLE #input 
(
    employee_code varchar(300),
    paydate varchar(300),
    Paytype varchar(300)
);

INSERT INTO #input (employee_code,paydate,Paytype )
VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');

 update #input
set paydate=CONVERT(date,paydate,103)

Why is it that the update works but not the alter command ? A google search led me to this link below. But I still don't understand why this is

https://www.geeksforgeeks.org/difference-between-alter-and-update-command-in-sql/

Thanks

This reply was modified 2 years, 2 months ago by masterelaichi.

Jeff Moden SSC Guru Points: 1004340 More actions · Answer 9

masterelaichi wrote:

jcelko212's post got me thinking

I tried to change the datatype of the paydate column to date by using the alter command but ended up getting this error "Conversion failed when converting date and/or time from character string."

drop table if exists #input

CREATE TABLE #input 
(
    employee_code varchar(300),
    paydate varchar(300),
    Paytype varchar(300)
);

INSERT INTO #input (employee_code,paydate,Paytype )
VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');

alter table #input
alter column paydate  date

However, it worked when I used the update statement like so

drop table if exists #input

CREATE TABLE #input 
(
    employee_code varchar(300),
    paydate varchar(300),
    Paytype varchar(300)
);

INSERT INTO #input (employee_code,paydate,Paytype )
VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');

 update #input
set paydate=CONVERT(date,paydate,103)

Why is it that the update works but not the alter command ? A google search led me to this link below. But I still don't understand why this is https://www.geeksforgeeks.org/difference-between-alter-and-update-command-in-sql/

Thanks

Like I said, this is something that Joe got right. The format of your string data is DD/MM/YYYY. What is the language that your system is set for? You can find that out easily enough by running the following...

sp_helplanguage @@Language

Once you have that, look at the 2nd column titled "dateformat"... that will tell you what the "default" format is for "punctuated dates". There's a high probability that your format will return something other than DMY and that's why you needed to CONVERT the date strings prior to inserting them into a DATE column.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

masterelaichi SSC-Addicted Points: 444 More actions · Answer 10

Jeff Moden wrote:

sp_helplanguage @@Language
Once you have that, look at the 2nd column titled "dateformat"... that will tell you what the "default" format is for "punctuated dates". There's a high probability that your format will return something other than DMY and that's why you needed to CONVERT the date strings prior to inserting them into a DATE column.

You are right. I ran that query and it says "mdy"

I tried to convert it prior to inserting like this

drop table if exists #input

CREATE TABLE #input 
(
    employee_code varchar(300),
    paydate varchar(300),
    Paytype varchar(300)
);

alter table #input
alter column paydate  date

INSERT INTO #input (employee_code,paydate,Paytype )
VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');

But I still get that date conversion error

Unless you meant conversion at the datasource ?

Phil Parkin SSC Guru Points: 246960 More actions · Answer 11

You should get into the habit of using 'YYYYMMDD' as your literal date format. This format is unambiguous and works regardless of which 'DATEFORMAT' is in place.

Regarding your question, your default date format is MDY (as you have discovered), yet you are attempting to insert dates in DMY format – so why does the error surprise you?

IF you add the statement

SET DATEFORMAT DMY;

before you attempt the INSERT, the error will go away and the INSERT runs successfully.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

Jeffrey Williams SSC Guru Points: 90303 More actions · Answer 12

Phil Parkin wrote:

IF you add the statement
SET DATEFORMAT DMY;
before you attempt the INSERT, the error will go away and the INSERT runs successfully.

Although this is an option - don't do this. Change the insert statement to use YYYYMMDD or YYYY-MM-DD (for date, datetime2 or datetimeoffset only) and then you won't have to worry about the date format setting.

INSERT INTO #input (employee_code,paydate,Paytype )
VALUES
    ('Emp_4', '20130723', 'Holiday_pay'),
    ('Emp_4', '20130723', 'Holiday_pay');

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Jeff Moden SSC Guru Points: 1004340 More actions · Answer 13

Jeffrey Williams wrote:

Phil Parkin wrote:
IF you add the statement
SET DATEFORMAT DMY;
before you attempt the INSERT, the error will go away and the INSERT runs successfully.
Although this is an option - don't do this. Change the insert statement to use YYYYMMDD or YYYY-MM-DD (for date, datetime2 or datetimeoffset only) and then you won't have to worry about the date format setting.
INSERT INTO #input (employee_code,paydate,Paytype )
VALUES
    ('Emp_4', '20130723', 'Holiday_pay'),
    ('Emp_4', '20130723', 'Holiday_pay');

It might not be an option depending on where they're getting the data from.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004340 More actions · Answer 14

masterelaichi wrote:

Unless you meant conversion at the datasource ?

As Jeffrey Williams is implying, that would be the best solution.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)