Pivot columns to get 0 /1 flag columns

  • Hi,

    I have a dataset of employees with different "Paytypes" in a single column. I am trying to pivot the table on the Paytype column so that each paytype can be treated as a flag column, i,e, the only possible values in this column should be 1 or 0

    Below is a sample code with the dataset

    CREATE TABLE #input 
    (
    employee_codevarchar(300),
    paydatevarchar(300),
    Paytype varchar(300)
    );

    INSERT INTO #input (employee_code,paydate,Paytype )
    VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');


    -----Pivot---
    select *
    from #input
    PIVOT
    (
    count (paytype)
    FOR paytype in (Over_time,Holiday_pay,Meal_Allowance,Travel_Allowance)

    ) as pivottable

    When I run the above code I get values more than 1 which I think  is due to the duplicate rows (as marked in the create table above). These duplicates are intentional because my actual dataset has a couple of other columns which I haven't included here

    The output I am trying to get should look something like this

    pivot

    Any help would be much appreciated!

  • Here's a solution which does not rely on PIVOT:

    SELECT i.employee_code
    ,i.paydate
    ,Over_time = MAX(IIF(i.Paytype = 'Over_time', 1, 0))
    ,Holiday_pay = MAX(IIF(i.Paytype = 'Holiday_pay', 1, 0))
    ,Meal_Allowance = MAX(IIF(i.Paytype = 'Meal_Allowance', 1, 0))
    ,Travel_Allowance = MAX(IIF(i.Paytype = 'Travel_Allowance', 1, 0))
    FROM #input i
    GROUP BY i.employee_code
    ,i.paydate
    ORDER BY i.employee_code
    ,i.paydate;

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks Phil. This does the job

    I modified my pivot code by replacing the count with Max. This results in a text in the flag values instead of the 1 or 0. What I eventually have to do is to combine two of the flags , say Meal and Travel allowance into one column, say Work_travel_Allowance, i.e when either Meal or Travel is a "1" then Work_travel_Allowance will be triggered as a 1

    In my original code with the pivot, I have used case statements to achieve this. I can do something similar with the code you have suggested using a temp table and then working it out column by column. Is there a way to do this without using an intermediate table ?

  • Combining two columns is a piece of cake with my code, like so:

    SELECT i.employee_code
    ,i.paydate
    ,Over_time = MAX(IIF(i.Paytype = 'Over_time', 1, 0))
    ,Holiday_pay = MAX(IIF(i.Paytype = 'Holiday_pay', 1, 0))
    ,WorkTravelAllowance = MAX(IIF(i.Paytype IN ( 'Meal_Allowance', 'Travel_Allowance' ), 1, 0))
    FROM #input i
    GROUP BY i.employee_code
    ,i.paydate
    ORDER BY i.employee_code
    ,i.paydate;

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks for the almost instant response! It worked like a charm!

    I didn't know using the IN statement in an IIF condition was even possible ( I have to admit, I have no clue what that code is doing!). Need to read up on it

  • Please learn what a table is. First, we need to fix your non-table.

    CREATE TABLE Expenses

    (emp_id CHAR(10) NOT NULL,

    pay_date DATE NOT NULL, -- read ISO-8601 formats

    expense_type VARCHAR(30) NOT NULL

    CHECK(expense_type IN ('Over_time', 'Holiday_pay', 'Meal_Allowance','Travel_Allowance'),

    PRIMARY KEY (emp_id,pay_date) -- not an option!!

    );

    1) the table name should tell us what the set is, not how you use it

    2) the schema should not have sooooo many NULLs

    3) only 1960's COBOL has all-character columns; we have data types!

    4) column names also have a standard

    5) we have DATE data type; it is not THREE HUNDRED CHINESE Unicode characters!

    6) the only format in ANSI/ISO Standards for dates ISO-8601

    7) a table has to have a key to be a table! you have none and no way to get one.

    8) a table, therefore, has no duplicate rows. This is not a option! Your comment tells us that your code is awful and dangerous.

    9) Why did you fail to use CHECK()? SQL programmers think in a declarative language, so it is natural.

    INSERT INTO Expenses (emp_id, paydate, expense_type) VALUES

    ('Emp_1', '2013-11-11', 'Over_time'),

    ('Emp_1', '2013-11-26', 'Holiday_pay'),

    ('Emp_1', '2013-10-12', 'Meal_Allowance'),

    -- ('Emp_1', '2013-10-12', 'Meal_Allowance'), -- this is not relational!!

    ('Emp_2', '2017-05-12', 'Holiday_pay'),

    ('Emp_3', '2017-05-12', 'Holiday_pay'),

    ('Emp_4', '2013-07-09', 'Holiday_pay'),

    ('Emp_4', '2013-07-09', 'Travel_Allowance'),

    ('Emp_4', '2013-07-09', 'Meal_Allowance'),

    ('Emp_4', '2013-07-23', 'Holiday_pay'),

    ('Emp_4', '2013-07-23', 'Holiday_pay'),

    ('Emp_4', '2013-06-08', 'Travel_Allowance');

    SQL Programmers hate assembly language style bit flags, but you are using SQL as a report writer; Phil's answer will work. You might want to use CASE or SIGN and not the local spreadsheet IFF dialect.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 wrote:

    Please learn what a table is. First, we need to fix your non-table.

    CREATE TABLE Expenses (emp_id CHAR(10) NOT NULL, pay_date DATE NOT NULL, -- read ISO-8601 formats expense_type VARCHAR(30) NOT NULL CHECK(expense_type IN ('Over_time', 'Holiday_pay', 'Meal_Allowance','Travel_Allowance'), PRIMARY KEY (emp_id,pay_date) -- not an option!! );

    Your key is incorrect for the given data even in the table you claim is correct above.

    We don't use hard-coded checks for this type of thing.  It should be in a reference table and the check you're using should actually be an FK check.

    Of course, that would only apply if the table were a permanent table.  The op has not posted an permanent table.  He's posted a working table as a Temp Table.

    jcelko212 32090 wrote:

    1) the table name should tell us what the set is, not how you use it

    He did.  As previously stated, it's a temporary working table being used for input for an example.

    jcelko212 32090 wrote:

    2) the schema should not have sooooo many NULLs

    Again, it's not a real table.  It an example that provides enough to solve the question being asked.

    jcelko212 32090 wrote:

    3) only 1960's COBOL has all-character columns; we have data types!

    Not according to many of your previous posts where you usually insist on turning a perfectly good integer column in a VARCHAR() column just because it's being used as a surrogate key.

    jcelko212 32090 wrote:

    4) column names also have a standard

    Unfortunately, the "standard" which you speak of costs hundreds of dollars.

    jcelko212 32090 wrote:

    5) we have DATE data type; it is not THREE HUNDRED CHINESE Unicode characters!

    Finally, something that I mostly agree with, although the OP didn't use UNICODE.

    jcelko212 32090 wrote:

    6) the only format in ANSI/ISO Standards for dates ISO-8601

    You regularly recommend violating those standards with your YYYY-MM-00 format recommendations.  Also, be aware that YYYY-MM-DD will produce some rather unexpected results if the the current language is French or a few others.

    jcelko212 32090 wrote:

    7) a table has to have a key to be a table! you have none and no way to get one.

    If that's true, why did you add one (an incorrect one for the given table, as previously stated) in the code in your table example?

    jcelko212 32090 wrote:

    8) a table, therefore, has no duplicate rows. This is not a option! Your comment tells us that your code is awful and dangerous. 9) Why did you fail to use CHECK()? SQL programmers think in a declarative language, so it is natural.

    Actually, it IS an option, especially when you're trying to demonstrate an issue.  You don't know where the data the op is using is coming from nor whether or not the op has any control over it.

    jcelko212 32090 wrote:

    SQL Programmers hate assembly language style bit flags,...

    No "we" don't.  They're sometimes used incorrectly and that's what we hate.  They can incredibly useful when properly used.

    jcelko212 32090 wrote:

    ... but you are using SQL as a report writer;

    Which isn't far off from when you're trying to normalize data, like the OP is trying to.

    jcelko212 32090 wrote:

    Phil's answer will work.

    Darned tootin' it will!

    jcelko212 32090 wrote:

    You might want to use CASE or SIGN and not the local spreadsheet IFF dialect.

    The correct name of the function is "IIF", not "IFF" but, as to your suggestion to use CASE instead, why?  There are only two conditions for each instance in Phil's code.  There's no need to clutter-up the code with a CASE/WHEN/THEN/ELSE/END here and there is no performance advantage to using CASE instead of IIF.   And IIF isn't just a "spreadsheet dialect".  It's used in a whole lot of places.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jcelko212 32090 wrote:

    Please learn what a table is. First, we need to fix your non-table.

    Thanks for your "constructive" criticism of my dummy data and code which I provided. Even though I am relatively new to SQL I think I have a very basic and general idea what a table is 🙂

    Thanks for replying to jcelko212's post for me , Jeff. I couldn't have put it better !

    The only thing I want to add is that I do use YYYY-mm-dd date formats in my tables. The table I have used here was just a quick and dirty one which I created using excel and a table generator  which I found online which outputs a Create table sql code. The output it generates doesn't identify datatypes unfortunately

  • jcelko212's  post got me thinking

    I tried to change the datatype of the paydate column to date by using the alter command but ended up getting this error

    "Conversion failed when converting date and/or time from character string."

    drop table if exists #input

    CREATE TABLE #input
    (
    employee_code varchar(300),
    paydate varchar(300),
    Paytype varchar(300)
    );

    INSERT INTO #input (employee_code,paydate,Paytype )
    VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');

    alter table #input
    alter column paydate date

    However, it worked when I used the update statement like so

    drop table if exists #input

    CREATE TABLE #input
    (
    employee_code varchar(300),
    paydate varchar(300),
    Paytype varchar(300)
    );

    INSERT INTO #input (employee_code,paydate,Paytype )
    VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');

    update #input
    set paydate=CONVERT(date,paydate,103)

    Why is it that the update works but not the alter command ? A google search led me to this link below. But I still don't understand why this is

    https://www.geeksforgeeks.org/difference-between-alter-and-update-command-in-sql/

     

    Thanks

    • This reply was modified 2 years, 2 months ago by  masterelaichi.
  • masterelaichi wrote:

    jcelko212's  post got me thinking

    I tried to change the datatype of the paydate column to date by using the alter command but ended up getting this error "Conversion failed when converting date and/or time from character string."

    drop table if exists #input

    CREATE TABLE #input
    (
    employee_code varchar(300),
    paydate varchar(300),
    Paytype varchar(300)
    );

    INSERT INTO #input (employee_code,paydate,Paytype )
    VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');

    alter table #input
    alter column paydate date

    However, it worked when I used the update statement like so

    drop table if exists #input

    CREATE TABLE #input
    (
    employee_code varchar(300),
    paydate varchar(300),
    Paytype varchar(300)
    );

    INSERT INTO #input (employee_code,paydate,Paytype )
    VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');

    update #input
    set paydate=CONVERT(date,paydate,103)

    Why is it that the update works but not the alter command ? A google search led me to this link below. But I still don't understand why this is https://www.geeksforgeeks.org/difference-between-alter-and-update-command-in-sql/

    Thanks

    Like I said, this is something that Joe got right.  The format of your string data is DD/MM/YYYY.  What is the language that your system is set for?  You can find that out easily enough by running the following...

    sp_helplanguage @@Language

    Once you have that, look at the 2nd column titled "dateformat"... that will tell you what the "default" format is for "punctuated dates".  There's a high probability that your format will return something other than DMY and that's why you needed to CONVERT the date strings prior to inserting them into a DATE column.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    sp_helplanguage @@Language

    Once you have that, look at the 2nd column titled "dateformat"... that will tell you what the "default" format is for "punctuated dates".  There's a high probability that your format will return something other than DMY and that's why you needed to CONVERT the date strings prior to inserting them into a DATE column.

    You are right. I ran that query and it says "mdy"

    I tried to convert it prior to inserting like this

    drop table if exists #input

    CREATE TABLE #input
    (
    employee_code varchar(300),
    paydate varchar(300),
    Paytype varchar(300)
    );

    alter table #input
    alter column paydate date

    INSERT INTO #input (employee_code,paydate,Paytype )
    VALUES
    ('Emp_1', '12/11/2013', 'Over_time'),
    ('Emp_1', '26/11/2013', 'Holiday_pay'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'),
    ('Emp_1', '10/12/2013', 'Meal_Allowance'), -- this is intentional
    ('Emp_2', '5/12/2017', 'Holiday_pay'),
    ('Emp_3', '5/12/2017', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Holiday_pay'),
    ('Emp_4', '9/07/2013', 'Travel_Allowance'),
    ('Emp_4', '9/07/2013', 'Meal_Allowance'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '23/07/2013', 'Holiday_pay'),
    ('Emp_4', '6/08/2013', 'Travel_Allowance');

    But I still get that date conversion error

    Unless you meant conversion at the datasource ?

     

  • You should get into the habit of using 'YYYYMMDD' as your literal date format. This format is unambiguous and works regardless of which 'DATEFORMAT' is in place.

    Regarding your question, your default date format is MDY (as you have discovered), yet you are attempting to insert dates in DMY format – so why does the error surprise you?

    IF you add the statement

    SET DATEFORMAT DMY;

    before you attempt the INSERT, the error will go away and the INSERT runs successfully.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin wrote:

    IF you add the statement

    SET DATEFORMAT DMY;

    before you attempt the INSERT, the error will go away and the INSERT runs successfully.

    Although this is an option - don't do this.  Change the insert statement to use YYYYMMDD or YYYY-MM-DD (for date, datetime2 or datetimeoffset only) and then you won't have to worry about the date format setting.

    INSERT INTO #input (employee_code,paydate,Paytype )
    VALUES
    ('Emp_4', '20130723', 'Holiday_pay'),
    ('Emp_4', '20130723', 'Holiday_pay');

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams wrote:

    Phil Parkin wrote:

    IF you add the statement

    SET DATEFORMAT DMY;

    before you attempt the INSERT, the error will go away and the INSERT runs successfully.

    Although this is an option - don't do this.  Change the insert statement to use YYYYMMDD or YYYY-MM-DD (for date, datetime2 or datetimeoffset only) and then you won't have to worry about the date format setting.

    INSERT INTO #input (employee_code,paydate,Paytype )
    VALUES
    ('Emp_4', '20130723', 'Holiday_pay'),
    ('Emp_4', '20130723', 'Holiday_pay');

    It might not be an option depending on where they're getting the data from.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • masterelaichi wrote:

    Unless you meant conversion at the datasource ?

    As Jeffrey Williams is implying, that would be the best solution.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply