Duplicate rows confusion

  • Hello,

    I've this table that I pulled from sqlzoo. I loaded this data in my sql server 2016.

    name region area populationgdp

    Afghanistan South Asia 652225 26000000

    Albania Europe 28728 3200000 6656000000

    Algeria Middle East 2400000 32900000 75012000000

    Andorra Europe 468 64000

    Bangladesh South Asia 143998 15260000067144000000

    United KingdomEurope 242514 596000002022824000000

    when I do the self join "select y.name, y.region, y.population from bbc y inner join bbc x on y.region =x.region" it returns 14 rows. I completely lost my mind as why I'm getting 14 rows. If I change the join from region to name or population (just to test) I get 6 rows as expected.

    Can someone please enlighten me why I'm getting 14 rows in this join "select y.name, y.region, y.population from bbc y inner join bbc x on y.region =x.region"?

    Thanks,

  • Anil Paneru (10/6/2016)


    Hello,

    I've this table that I pulled from sqlzoo. I loaded this data in my sql server 2016.

    name region area populationgdp

    Afghanistan South Asia 652225 26000000

    Albania Europe 28728 3200000 6656000000

    Algeria Middle East 2400000 32900000 75012000000

    Andorra Europe 468 64000

    Bangladesh South Asia 143998 15260000067144000000

    United KingdomEurope 242514 596000002022824000000

    when I do the self join "select y.name, y.region, y.population from bbc y inner join bbc x on y.region =x.region" it returns 14 rows. I completely lost my mind as why I'm getting 14 rows. If I change the join from region to name or population (just to test) I get 6 rows as expected.

    Can someone please enlighten me why I'm getting 14 rows in this join "select y.name, y.region, y.population from bbc y inner join bbc x on y.region =x.region"?

    Thanks,

    Yes... it's called an "accidental Cross Join" or "Many-to-Many" join. If you join 3 rows of EUROPE to 3 rows of EUROPE, each row of the "LEFT" table will join with all 3 rows of the "RIGHT" table for a total of 9 rows. Two rows of SOUTH ASIA will join with 2 rows of SOUTH ASIA for a total of 4 rows. 1 row of MIDDLE EAST will join with 1 row of MIDDLE EAST.

    9 + 4 + 1 = 14

    If you only want 6 rows, you'll need to add NAME to the join to make the rows unique for a 1:1 join.

    Also, I don't understand why you're doing a self join to begin with. Why not just select from a single copy of the table. It will give you what you asked for in the code you posted.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    I could not ask more than this. I'm all clear with your simple explanation. Thanks a bunch!

    I'm trying to pull the data that shows the name, region and population of the smallest country in each region. and I started with that join hoping that I'll be able to compare it to find the smallest country.

    Thanks again for your wonderful explanation!

  • This should give you the results that you are looking for. However, if there are 2 countries in the same region with the same min population, then you will only get one of them. To get ALL the countries with min population, change the ROW_NUMBER() to RANK()

    SELECT T1.name, T1.region, T1.population

    FROM (

    SELECT y.name, y.region, y.population

    , rn = ROW_NUMBER() OVER (PARTITION BY y.region ORDER BY y.population)

    FROM bbc AS y

    ) As T1

    WHERE T1.rn = 1;

  • Des,

    Thank you so much for this easy solution!

    I've tried it myself for few other scenarios and it works perfectly.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply