query help

  • I need a query which should update the master table with status based

    on the action columns values.

    If any of the values of the tableB for action column are 1,3,4,5,6 then update the status to 'Y' for the

    SID in the tableA else the SID will be 'N'.

    Below is the sample data with expected output.

    Thanks for your help in advance..

    TableA

    SID Name

    ---- -----

    10 Andy

    11 Sam

    12 pat

    13 Mat

    14 John

    TableB

    SID action

    ---- -------

    10 1

    10 3

    10 9

    11 5

    11 6

    12 2

    12 7

    13 1

    14 10

    Expected Output:

    SID Name status

    ---- ----- -------

    10 Andy Y

    11 Sam Y

    12 pat N

    13 Mat Y

    14 John N

  • Try executing this query ;

    UPDATE a

    SET

    a.Status = CASE WHEN b.action IN(1,3,4,5,6) THEN 'Y' ELSE 'N' END

    FROM

    TableA a

    JOIN

    TableB b

    ON

    a.SID = b.SID

  • KtmGuy (1/29/2014)


    Try executing this query ;

    UPDATE a

    SET

    a.Status = CASE WHEN b.action IN(1,3,4,5,6) THEN 'Y' ELSE 'N' END

    FROM

    TableA a

    JOIN

    TableB b

    ON

    a.SID = b.SID

    Well done! Just a quick tip: you can make this much faster with a left outer join instead of the inner join.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (1/30/2014)


    KtmGuy (1/29/2014)


    Try executing this query ;

    UPDATE a

    SET

    a.Status = CASE WHEN b.action IN(1,3,4,5,6) THEN 'Y' ELSE 'N' END

    FROM

    TableA a

    JOIN

    TableB b

    ON

    a.SID = b.SID

    Well done! Just a quick tip: you can make this much faster with a left outer join instead of the inner join.

    I am not sure I follow what you mean that a left join would make this faster. Can you explain that?

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (1/31/2014)


    Alan.B (1/30/2014)


    KtmGuy (1/29/2014)


    Try executing this query ;

    UPDATE a

    SET

    a.Status = CASE WHEN b.action IN(1,3,4,5,6) THEN 'Y' ELSE 'N' END

    FROM

    TableA a

    JOIN

    TableB b

    ON

    a.SID = b.SID

    Well done! Just a quick tip: you can make this much faster with a left outer join instead of the inner join.

    I am not sure I follow what you mean that a left join would make this faster. Can you explain that?

    First, and I wanted to mention this in my original response, I don't think an inner join would guarantee the correct answer. In the OP's sample data there is a match in tableB for every SID in tableA. If that were not the case then we would lose data from the LEFT table.

    That said, I was making an assumption based the following DDL and Sample Data:

    USE tempdb

    GO

    IF OBJECT_ID('tempdb.dbo.tableB') IS NOT NULL DROP TABLE dbo.tableB;

    IF OBJECT_ID('tempdb.dbo.tableA') IS NOT NULL DROP TABLE dbo.tableA;

    CREATE TABLE dbo.tableA([SID] int primary key, name varchar(10) not null, [status] CHAR(1) null);

    CREATE TABLE dbo.tableB(g_id int identity primary key, [SID] int foreign key references dbo.tableA([SID]), [action] int not null);

    INSERT dbo.tableA([SID],name) VALUES(10,'Andy'),(11,'Sam'),(12,'pat'),(13,'Mat'),(14,'John');

    INSERT dbo.tableB([SID],action) VALUES(10,1),(10,3),(10,9),(11,5),(11,6),(12,2),(12,7),(13,1),(14,10);

    UPDATE a

    SET a.Status = CASE WHEN b.action IN(1,3,4,5,6) THEN 'Y' ELSE 'N' END

    FROM dbo.TableA a

    JOIN dbo.TableB b ON a.SID = b.SID

    UPDATE a

    SET a.Status = CASE WHEN b.action IN(1,3,4,5,6) THEN 'Y' ELSE 'N' END

    FROM dbo.TableA a

    LEFT JOIN dbo.TableB b ON a.SID = b.SID

    UPDATE a

    SET a.Status = CASE WHEN b.action IN(1,3,4,5,6) THEN 'Y' ELSE 'N' END

    FROM dbo.TableB b

    RIGHT JOIN dbo.TableA a ON a.SID = b.SID

    Based on my DDL (and if there was a matching SID in tableB for each SID in tableA) all three queries should produce the correct result but the OUTER joins produce what appears to be a better query plan based on cost...

    Below is some code based I threw together to test roughly 1,000,000 rows...

    IF OBJECT_ID('tempdb.dbo.tableB') IS NOT NULL DROP TABLE dbo.tableB;

    IF OBJECT_ID('tempdb.dbo.tableA') IS NOT NULL DROP TABLE dbo.tableA;

    IF OBJECT_ID('tempdb..#prep') IS NOT NULL DROP TABLE #prep;

    IF OBJECT_ID('tempdb..#stg') IS NOT NULL DROP TABLE #stg;

    CREATE TABLE dbo.tableA([SID] int primary key, name varchar(10) not null, [status] CHAR(1) null);

    CREATE TABLE dbo.tableB(g_id int identity primary key, [SID] int foreign key references dbo.tableA([SID]), [action] int not null);

    CREATE TABLE #prep(n int primary key, eid char(8) not null, rnd tinyint not null);

    CREATE TABLE #stg(SID_PK int not null, eid char(8) not null, g_id int not null, SID_FK int not null, action tinyint not null);

    WITH iTally(n,eid,rnd) AS

    (SELECT TOP 420000

    ROW_NUMBER() OVER (ORDER BY (SELECT ($))),

    LEFT(convert(char(36),newid()),8),

    convert(int,floor(3*rand(convert(varbinary,newid())))+2) -- rows: 2-5

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    INSERT #prep

    SELECT n,eid,rnd

    FROM iTally a;

    INSERT #stg

    SELECT

    x.n AS [SID_PK], x.eid,

    ROW_NUMBER() OVER (ORDER BY (SELECT ($))) AS g_id,

    x.n AS [SID_FK], t.c AS [action]

    FROM #prep x

    CROSS APPLY

    (SELECT TOP (x.rnd)

    convert(int,ceiling(5*rand(convert(varbinary,newid()))))

    FROM #prep) t(c)

    GROUP BY x.n, x.eid, t.c;

    INSERT dbo.tableA(SID,name)

    SELECT DISTINCT SID_PK, eid

    FROM #stg;

    INSERT dbo.tableB(SID,action)

    SELECT SID_FK, action

    FROM #stg;

    I ran this update a few times, here's a result comparison (I ran this a few times and combined the results below)

    SET STATISTICS TIME ON;

    UPDATE a

    SET a.Status = CASE WHEN b.action IN(1,3,4,5,6) THEN 'Y' ELSE 'N' END

    FROM dbo.TableA a

    JOIN dbo.TableB b ON a.SID = b.SID

    UPDATE a

    SET a.Status = CASE WHEN b.action IN(1,3,4,5,6) THEN 'Y' ELSE 'N' END

    FROM dbo.TableA a

    LEFT JOIN dbo.TableB b ON a.SID = b.SID

    UPDATE a

    SET a.Status = CASE WHEN b.action IN(1,3,4,5,6) THEN 'Y' ELSE 'N' END

    FROM dbo.TableB b

    RIGHT JOIN dbo.TableA a ON a.SID = b.SID

    SET STATISTICS TIME OFF;

    GO

    and these were the results:

    --inner

    SQL Server Execution Times:

    run1: CPU time = 6847 ms, elapsed time = 3228 ms.

    run2: CPU time = 6568 ms, elapsed time = 3172 ms.

    run3: CPU time = 3946 ms, elapsed time = 2522 ms.

    --left

    SQL Server Execution Times:

    run1: CPU time = 6054 ms, elapsed time = 2469 ms.

    run2: CPU time = 5836 ms, elapsed time = 2458 ms.

    run3: CPU time = 4619 ms, elapsed time = 1706 ms.

    --right

    SQL Server Execution Times:

    run1: CPU time = 5945 ms, elapsed time = 2691 ms.

    run2: CPU time = 5725 ms, elapsed time = 2491 ms.

    run3: CPU time = 2823 ms, elapsed time = 1453 ms.

    I'm sorry this is such a long response. I hope I have been clear. I am interested in your thoughts...

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Nice work Alan. I too suspected that an inner join could (and probably will) not produce exactly the right results. Of course in our business accuracy is the first requirement. It doesn't much matter how fast the code runs if the results are incorrect. 😀

    Interesting reading the results of your test using the different join types. Somewhat surprised the inner join was slower.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply