the TSQL code for this query...is needed.

  • Jeff Moden (2/10/2012)


    wurkenmomz (2/10/2012)


    Also, he states that using the Distinct will not get the recommended results and can be very unreliable, in these cases. He does this for a living.

    Perfect. The instructor must be correct especially since he does this for a living. The rest of us all do this in our spare time just for fun and only every other weekend and only if it's not nice outside. 😛

    Just remember that your instructor is the one that wrote the "Triangular Join" and then lookup "DISTINCT" in Books Online and see if it's really as "unreliable" as he says. 😉

    "Doing it for a living"...doesn't mean that your instructor is always correct.

    When learning a subject, you should always explore other methods / answers and question why the results are different from what you have been "taught".......you may well learn a lot more this way 🙂

    now, please if you will, consider the following code.....cut and paste into SSMS and run it

    what are your conclusions?

    --+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    --PLEASE NOTE THAT THIS WILL DELETE ANY EXG TABLES WITH SAME NAME IN database TEMPDB....please amend if required

    --+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    use [tempdb]--==== start in safe place!!!

    GO

    --====Conditionally delete table

    IF OBJECT_ID('tempdb..Vendors', 'U') IS NOT NULL DROP TABLE tempdb..Vendors ;

    --==== create a table with 10 million record ... tskes around 20 secs on a desktop

    SELECT TOP 10000000 ---- NOTE 10 MILLION rows

    VendorName = 1 + CAST(Abs(Checksum(Newid()) % 1000 ) AS INT),

    VendorCity = 1 + CAST(Abs(Checksum(Newid()) % 100 ) AS INT),

    VendorState = 1 + CAST(Abs(Checksum(Newid()) % 10 ) AS INT)

    INTO Vendors

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS JOIN sys.all_columns ac3

    GO

    --==== insert a new record that we know will be unique

    INSERT INTO [dbo].[Vendors]([VendorName], [VendorCity], [VendorState])

    SELECT 99999, 9999, 999

    GO

    -- your instructor code

    SET statistics TIME ON

    SET statistics IO ON

    SELECT VendorName

    ,VendorCity

    ,VendorState

    FROM Vendors

    WHERE VendorCity NOT IN

    (SELECT v1.VendorCity

    FROM Vendors AS v1

    JOIN Vendors AS v2

    ON (v1.VendorName <> v2.VendorName) AND

    (v1.VendorCity = v2.VendorCity AND

    v1.VendorState = v2.VendorState));

    SET statistics TIME OFF

    SET statistics IO OFF

    --==== alternative code by Jeff Moden

    SET statistics TIME ON

    SET statistics IO ON

    SELECT addr1.VendorName, addr1.VendorCity, addr1.VendorState

    FROM dbo.Vendors addr1

    INNER JOIN

    (

    SELECT VendorCity, VendorState

    FROM dbo.Vendors

    GROUP BY VendorCity, VendorState

    HAVING COUNT(DISTINCT VendorName) = 1

    ) unqaddr

    ON addr1.VendorCity = unqaddr.VendorCity

    AND addr1.VendorState = unqaddr.VendorState

    SET statistics TIME OFF

    SET statistics IO OFF

    I would now suggest that you (and your instructor) start to improve upon these results by creating some judicious index(s) that further improve the performance...

    (I appreciate that you may consider the code to be "contrived"....but hopefully it does represent a more efficient method)

    ps

    The rest of us all do this in our spare time just for fun and only every other weekend and only if it's not nice outside. 😛

    I would add that:

    if there is beer in the fridge

    or pork chops for lunch (apologies to JM)

    or the kids need playing with......yadda yadda

    ...then we just post the first piece of code that springs to mind and hope that it works

    😀

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Mmmmmm.... Porkchops marinated in beer. 😀

    Hey, Graham... here's a little tip for you to try...

    SET STATISTICS IO, TIME ON;

    ...

    some code to be measured

    ...

    SET STATISTICS IO, TIME OFF;

    I know, I know... I'm lazy. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 2 posts - 16 through 16 (of 16 total)

You must be logged in to reply to this topic. Login to reply