Challenging table join problem

  • Hi All,

    I have a table with the ID (int) and ScalePoints (float) columns called Points.

    I have another table with Scale (float) and Descr(varchar) called Scales.

    I want to join these tables based on Points.ScalePoints = Scales.Scale.

    The challenge is that the Points.ScalePoints will have values like 1.2, 4.3, 5.0, 2.8, etc

    and the Scales.Scale will have values like 1.0, 2.0, 3.0, 4.0 and 5.0. This is a setting table.

    So I need to get the resultset maping where if Points.ScalePoint falls between 3 and 4 then I get the

    description of Scale.Descr belonging to Scale.Scale of 3.0. if Points.ScalePoint falls between 1 and 2 then I get the

    description of Scale.Descr belonging to Scale.Scale of 1.0

    How do I do that?

    Thanks for your time.

  • cast it to an integer when you join, like : cast(Points.ScalePoints as integer) = cast(Scales.Scale as integer)

    The probability of survival is inversely proportional to the angle of arrival.

  • Like this:

    select *

    from MyValuesTable

    inner join MyScalesTable

    on MyValuesTable.ValueFloat >= MyScalesTable.Scale

    and MyValuesTable.ValueFloat < MyScalesTable.Scale + 1 ;

    That assumes the scale range is 1 in all cases. If not, you'll need to plug in a sub-select there to get the Min value that's higher than the Scale value.

    select *

    from MyValuesTable

    inner join MyScalesTable

    on MyValuesTable.ValueFloat >= MyScalesTable.Scale

    and MyValuesTable.ValueFloat <

    (select Min(Scale)

    from MyScalesTable as Scales2

    where Scales2.Scale > MyScalesTable.Scale) ;

    (Edit for layout)

    That'll be slower, but it will work for any range in a scale.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi GSquared,

    That worked perfectly! Thank you for your time.

  • ramadesai108 (10/13/2011)


    Hi All,

    I have a table with the ID (int) and ScalePoints (float) columns called Points.

    I have another table with Scale (float) and Descr(varchar) called Scales.

    I want to join these tables based on Points.ScalePoints = Scales.Scale.

    The challenge is that the Points.ScalePoints will have values like 1.2, 4.3, 5.0, 2.8, etc

    and the Scales.Scale will have values like 1.0, 2.0, 3.0, 4.0 and 5.0. This is a setting table.

    So I need to get the resultset maping where if Points.ScalePoint falls between 3 and 4 then I get the

    description of Scale.Descr belonging to Scale.Scale of 3.0. if Points.ScalePoint falls between 1 and 2 then I get the

    description of Scale.Descr belonging to Scale.Scale of 1.0

    How do I do that?

    Thanks for your time.

    --First things first, lets build a nice play-area

    IF object_id('tempdb..#Points') IS NOT NULL

    BEGIN

    DROP TABLE #Points

    END

    IF object_id('tempdb..#Scales') IS NOT NULL

    BEGIN

    DROP TABLE #Scales

    END

    --500,000 Random rows of data

    SELECT TOP 500000 IDENTITY(INT,1,1) AS ID,

    RAND(CHECKSUM(NEWID())) * 100 AS ScalePoints

    INTO #Points

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    --Add a Primary Key

    ALTER TABLE #Points

    ADD CONSTRAINT Points_PK_ID

    PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

    --100 rows of data

    SELECT TOP 100 IDENTITY(INT,1,1) AS ID,

    (ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) - 1.0 AS Scale

    INTO #Scales

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2

    --Add a Primary Key

    ALTER TABLE #Scales

    ADD CONSTRAINT Scales_PK_ID

    PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

    --Actual Query

    SELECT *

    FROM #Points

    CROSS JOIN #Scales

    WHERE ScalePoints >= Scale

    AND ScalePoints < Scale + 1

    --EDIT--

    Bah, beaten to the punch 🙂


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Much simpler (and more efficient) to just ceiling the ranged value I think:

    select *

    from MyValuesTable

    join MyScalesTable

    on ceiling(MyValuesTable.ValueFloat) = MyScalesTable.Scale

    Cheers, Iain

  • irobertson (10/13/2011)


    Much simpler (and more efficient) to just ceiling the ranged value I think:

    select *

    from MyValuesTable

    join MyScalesTable

    on ceiling(MyValuesTable.ValueFloat) = MyScalesTable.Scale

    Cheers, Iain

    Hmmm.

    --First things first, lets build a nice play-area

    IF object_id('tempdb..#Points') IS NOT NULL

    BEGIN

    DROP TABLE #Points

    END

    IF object_id('tempdb..#Scales') IS NOT NULL

    BEGIN

    DROP TABLE #Scales

    END

    --500,000 Random rows of data

    SELECT TOP 500000 IDENTITY(INT,1,1) AS ID,

    RAND(CHECKSUM(NEWID())) * 100 AS ScalePoints

    INTO #Points

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    --Add a Primary Key

    ALTER TABLE #Points

    ADD CONSTRAINT Points_PK_ID

    PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

    --100 rows of data

    SELECT TOP 100 IDENTITY(INT,1,1) AS ID,

    (ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) - 1.0 AS Scale

    INTO #Scales

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2

    --Add a Primary Key

    ALTER TABLE #Scales

    ADD CONSTRAINT Scales_PK_ID

    PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

    PRINT '========== CROSS JOIN =========='

    SET STATISTICS TIME ON

    SELECT *

    FROM #Points

    CROSS JOIN #Scales

    WHERE ScalePoints >= Scale

    AND ScalePoints < Scale + 1

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',80)

    PRINT '========== CEILING =========='

    SET STATISTICS TIME ON

    SELECT *

    FROM #Points

    JOIN #Scales ON ceiling(#Points.ScalePoints) = #Scales.Scale

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',80)

    PRINT '========== INNER JOIN =========='

    SET STATISTICS TIME ON

    SELECT *

    FROM #Points

    INNER JOIN #Scales ON #Points.ScalePoints >= #Scales.Scale

    AND #Points.ScalePoints < #Scales.Scale + 1

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',80)

    ========== CROSS JOIN ==========

    (500000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 4851 ms, elapsed time = 7368 ms.

    ================================================================================

    ========== CEILING ==========

    (494962 row(s) affected)

    SQL Server Execution Times:

    CPU time = 546 ms, elapsed time = 4443 ms.

    ================================================================================

    ========== INNER JOIN ==========

    (500000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 4352 ms, elapsed time = 6898 ms.

    ================================================================================

    Notice the rows affected for the CEILING method.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Interesting. Cue bout of furious SQL to investigate 🙂

  • Ceiling won't get the same results as the other queries. Floor will. Try it with that.

    Ceiling is dropping any that are higher than the highest scale value. That's the reverse math than the other two methods.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Dur, stoopid. Ceilings are higher, not lower. That would be a floor. You can't fault the logic of that 🙂

    PRINT '========== CROSS JOIN =========='

    SET STATISTICS TIME ON

    SELECT *

    FROM #Points

    CROSS JOIN #Scales

    WHERE ScalePoints >= Scale

    AND ScalePoints < Scale + 1

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',80)

    PRINT '========== CEILING =========='

    SET STATISTICS TIME ON

    SELECT *

    FROM #Points

    JOIN #Scales ON floor(#Points.ScalePoints) = #Scales.Scale

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',80)

    PRINT '========== INNER JOIN =========='

    SET STATISTICS TIME ON

    SELECT *

    FROM #Points

    INNER JOIN #Scales ON #Points.ScalePoints >= #Scales.Scale

    AND #Points.ScalePoints < #Scales.Scale + 1

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',80)

    ========== CROSS JOIN ==========

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (500000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2761 ms, elapsed time = 7700 ms.

    ================================================================================

    ========== CEILING ==========

    (500000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 249 ms, elapsed time = 6338 ms.

    ================================================================================

    ========== INNER JOIN ==========

    (500000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2777 ms, elapsed time = 7805 ms.

    ================================================================================

  • irobertson (10/13/2011)


    Interesting. Cue bout of furious SQL to investigate 🙂

    Think I have it, the ceiling method is matching 0.00* to 1.0 instead of to 0.0. This means that when you get to 99.* there is nothing for the ceiling method to match to because I only included scales between 0.0 and 99.0 in the test data.

    If instead you use floor (or I add 100.0 to the test data), then we get the 500,000 rows.

    ========== FLOOR ==========

    (500000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2339 ms, elapsed time = 6331 ms.

    ================================================================================

    --First things first, lets build a nice play-area

    IF object_id('tempdb..#Points') IS NOT NULL

    BEGIN

    DROP TABLE #Points

    END

    IF object_id('tempdb..#Scales') IS NOT NULL

    BEGIN

    DROP TABLE #Scales

    END

    --500,000 Random rows of data

    SELECT TOP 500000 IDENTITY(INT,1,1) AS ID,

    RAND(CHECKSUM(NEWID())) * 100 AS ScalePoints

    INTO #Points

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    --Add a Primary Key

    ALTER TABLE #Points

    ADD CONSTRAINT Points_PK_ID

    PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

    --100 rows of data

    SELECT TOP 101 IDENTITY(INT,1,1) AS ID,

    (ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) - 1.0 AS Scale

    INTO #Scales

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2

    --Add a Primary Key

    ALTER TABLE #Scales

    ADD CONSTRAINT Scales_PK_ID

    PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

    PRINT '========== CEILING =========='

    SET STATISTICS TIME ON

    SELECT *

    FROM #Points

    JOIN #Scales ON ceiling(#Points.ScalePoints) = #Scales.Scale

    ORDER BY #Points.ScalePoints ASC

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',80)

    ========== CEILING ==========

    (500000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2308 ms, elapsed time = 5402 ms.

    ================================================================================

    The conclusion being that the CEILING method is not functionally equivalent to the other methods, however FLOOR does seem to be much more efficient than either a CROSS or an INNER JOIN using >= < +1

    --EDIT--

    Dear god, I am the slowest typist of all time. I blame having several screens all of which have actual work on except the far left one which has this forum open 🙂


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • The plan shows it all. 2 scans and a match for the floor method because all it has to do is calculate the new value then merge. With the other method, it needs to compare the Points to all the values in the Scales table and so constantly scans the worktable it creates:

    ========== CROSS JOIN ==========

    (500000 row(s) affected)

    Table 'Worktable'. Scan count 100, logical reads 2418435, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Points'. Scan count 1, logical reads 1303, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Scales'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    ================================================================================

    ========== CEILING ==========

    (500000 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Points'. Scan count 1, logical reads 1303, physical reads 1, read-ahead reads 1298, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Scales'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    ================================================================================

  • I got working with the first solution from GSquared. Thanks.

    Now i have a result as follows:

    ID myScales Scale Descr

    3 0.154 0 Poor

    2 4.0 4 Bigger

    5 5.0 5 Better

    8 3.10 3 Good

    10 2.2 2 Poor

    I need to figure out how many percentage is poor, bigger, better and good out of these result

    based on the myScales total

    How do I accomplish that with the same query?

    This is what I used as my original query to get the scales:

    select *from MyValuesTableinner join MyScalesTable on MyValuesTable.ValueFloat >= MyScalesTable.Scale and MyValuesTable.ValueFloat < MyScalesTable.Scale + 1 ;

    Thanks again

  • I enterd the data but it got rid of the space. the Id is just an int column, myScales is a float, scale is a 1 digit column and rest is the description. Sorry it is not properly readable.

    Thanks.

  • Something like this should do the trick:

    select sum(case when ScaleDesc = 'Poor' then 1 else 0 end) / count(*) as Poor

    , sum(case when ScaleDesc = 'Good' then 1 else 0 end) / count(*) as Good

    , etc...

    from #Points p

    join #Scales s

    on floor(p.ScalePoints) = s.Scale

    Use something like this if you want decimal places (this example gives 1dp):

    select sum(case when ScaleDesc = 'Poor' then 1.0 else 0.0 end) / count(*) as Poor

    , sum(case when ScaleDesc = 'Good' then 1.0 else 0.0 end) / count(*) as Good

    from #Points p

    join #Scales s

    on floor(p.ScalePoints) = s.Scale

    Edit: derp + note use of more efficient query...

    Edit again: sometimes I'm really dumb. Just ignore the first version.

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply