Challenging table join problem

Question

Post reply

Challenging table join problem

ramadesai108

SSChampion

Points: 10609
More actions
October 13, 2011 at 8:35 am

#245319

Hi All,
I have a table with the ID (int) and ScalePoints (float) columns called Points.
I have another table with Scale (float) and Descr(varchar) called Scales.
I want to join these tables based on Points.ScalePoints = Scales.Scale.
The challenge is that the Points.ScalePoints will have values like 1.2, 4.3, 5.0, 2.8, etc
and the Scales.Scale will have values like 1.0, 2.0, 3.0, 4.0 and 5.0. This is a setting table.
So I need to get the resultset maping where if Points.ScalePoint falls between 3 and 4 then I get the
description of Scale.Descr belonging to Scale.Scale of 3.0. if Points.ScalePoint falls between 1 and 2 then I get the
description of Scale.Descr belonging to Scale.Scale of 1.0
How do I do that?
Thanks for your time.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

sturner SSC-Insane Points: 22471 More actions · Answer 1

cast it to an integer when you join, like : cast(Points.ScalePoints as integer) = cast(Scales.Scale as integer)

The probability of survival is inversely proportional to the angle of arrival.

GSquared SSC Guru Points: 260824 More actions · Answer 2

Like this:

select *

from MyValuesTable

inner join MyScalesTable

on MyValuesTable.ValueFloat >= MyScalesTable.Scale

and MyValuesTable.ValueFloat < MyScalesTable.Scale + 1 ;

That assumes the scale range is 1 in all cases. If not, you'll need to plug in a sub-select there to get the Min value that's higher than the Scale value.

select *

from MyValuesTable

inner join MyScalesTable

on MyValuesTable.ValueFloat >= MyScalesTable.Scale

and MyValuesTable.ValueFloat <

(select Min(Scale)

from MyScalesTable as Scales2

where Scales2.Scale > MyScalesTable.Scale) ;

(Edit for layout)

That'll be slower, but it will work for any range in a scale.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

ramadesai108 SSChampion Points: 10609 More actions · Answer 3

Hi GSquared,

That worked perfectly! Thank you for your time.

Cadavre SSC-Forever Points: 41690 More actions · Answer 4

ramadesai108 (10/13/2011)
Hi All,
I have a table with the ID (int) and ScalePoints (float) columns called Points.
I have another table with Scale (float) and Descr(varchar) called Scales.
I want to join these tables based on Points.ScalePoints = Scales.Scale.
The challenge is that the Points.ScalePoints will have values like 1.2, 4.3, 5.0, 2.8, etc
and the Scales.Scale will have values like 1.0, 2.0, 3.0, 4.0 and 5.0. This is a setting table.
So I need to get the resultset maping where if Points.ScalePoint falls between 3 and 4 then I get the
description of Scale.Descr belonging to Scale.Scale of 3.0. if Points.ScalePoint falls between 1 and 2 then I get the
description of Scale.Descr belonging to Scale.Scale of 1.0
How do I do that?
Thanks for your time.

--First things first, lets build a nice play-area

IF object_id('tempdb..#Points') IS NOT NULL

BEGIN

DROP TABLE #Points

END

IF object_id('tempdb..#Scales') IS NOT NULL

BEGIN

DROP TABLE #Scales

END

--500,000 Random rows of data

SELECT TOP 500000 IDENTITY(INT,1,1) AS ID,

RAND(CHECKSUM(NEWID())) * 100 AS ScalePoints

INTO #Points

FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

--Add a Primary Key

ALTER TABLE #Points

ADD CONSTRAINT Points_PK_ID

PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

--100 rows of data

SELECT TOP 100 IDENTITY(INT,1,1) AS ID,

(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) - 1.0 AS Scale

INTO #Scales

FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2

--Add a Primary Key

ALTER TABLE #Scales

ADD CONSTRAINT Scales_PK_ID

PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

--Actual Query

SELECT *

FROM #Points

CROSS JOIN #Scales

WHERE ScalePoints >= Scale

AND ScalePoints < Scale + 1

--EDIT--

Bah, beaten to the punch 🙂

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

irobertson Hall of Fame Points: 3695 More actions · Answer 5

Much simpler (and more efficient) to just ceiling the ranged value I think:

select *

from MyValuesTable

join MyScalesTable

on ceiling(MyValuesTable.ValueFloat) = MyScalesTable.Scale

Cheers, Iain

Cadavre SSC-Forever Points: 41690 More actions · Answer 6

irobertson (10/13/2011)
Much simpler (and more efficient) to just ceiling the ranged value I think:
select *
from MyValuesTable
join MyScalesTable
on ceiling(MyValuesTable.ValueFloat) = MyScalesTable.Scale
Cheers, Iain

Hmmm.

--First things first, lets build a nice play-area

IF object_id('tempdb..#Points') IS NOT NULL

BEGIN

DROP TABLE #Points

END

IF object_id('tempdb..#Scales') IS NOT NULL

BEGIN

DROP TABLE #Scales

END

--500,000 Random rows of data

SELECT TOP 500000 IDENTITY(INT,1,1) AS ID,

RAND(CHECKSUM(NEWID())) * 100 AS ScalePoints

INTO #Points

FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

--Add a Primary Key

ALTER TABLE #Points

ADD CONSTRAINT Points_PK_ID

PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

--100 rows of data

SELECT TOP 100 IDENTITY(INT,1,1) AS ID,

(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) - 1.0 AS Scale

INTO #Scales

FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2

--Add a Primary Key

ALTER TABLE #Scales

ADD CONSTRAINT Scales_PK_ID

PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

PRINT '========== CROSS JOIN =========='

SET STATISTICS TIME ON

SELECT *

FROM #Points

CROSS JOIN #Scales

WHERE ScalePoints >= Scale

AND ScalePoints < Scale + 1

SET STATISTICS TIME OFF

PRINT REPLICATE('=',80)

PRINT '========== CEILING =========='

SET STATISTICS TIME ON

SELECT *

FROM #Points

JOIN #Scales ON ceiling(#Points.ScalePoints) = #Scales.Scale

SET STATISTICS TIME OFF

PRINT REPLICATE('=',80)

PRINT '========== INNER JOIN =========='

SET STATISTICS TIME ON

SELECT *

FROM #Points

INNER JOIN #Scales ON #Points.ScalePoints >= #Scales.Scale

AND #Points.ScalePoints < #Scales.Scale + 1

SET STATISTICS TIME OFF

PRINT REPLICATE('=',80)

========== CROSS JOIN ==========

(500000 row(s) affected)

SQL Server Execution Times:

CPU time = 4851 ms, elapsed time = 7368 ms.

================================================================================

========== CEILING ==========

(494962 row(s) affected)

SQL Server Execution Times:

CPU time = 546 ms, elapsed time = 4443 ms.

================================================================================

========== INNER JOIN ==========

(500000 row(s) affected)

SQL Server Execution Times:

CPU time = 4352 ms, elapsed time = 6898 ms.

================================================================================

Notice the rows affected for the CEILING method.

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

irobertson Hall of Fame Points: 3695 More actions · Answer 7

Interesting. Cue bout of furious SQL to investigate 🙂

GSquared SSC Guru Points: 260824 More actions · Answer 8

Ceiling won't get the same results as the other queries. Floor will. Try it with that.

Ceiling is dropping any that are higher than the highest scale value. That's the reverse math than the other two methods.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

irobertson Hall of Fame Points: 3695 More actions · Answer 9

Dur, stoopid. Ceilings are higher, not lower. That would be a floor. You can't fault the logic of that 🙂

PRINT '========== CROSS JOIN =========='

SET STATISTICS TIME ON

SELECT *

FROM #Points

CROSS JOIN #Scales

WHERE ScalePoints >= Scale

AND ScalePoints < Scale + 1

SET STATISTICS TIME OFF

PRINT REPLICATE('=',80)

PRINT '========== CEILING =========='

SET STATISTICS TIME ON

SELECT *

FROM #Points

JOIN #Scales ON floor(#Points.ScalePoints) = #Scales.Scale

SET STATISTICS TIME OFF

PRINT REPLICATE('=',80)

PRINT '========== INNER JOIN =========='

SET STATISTICS TIME ON

SELECT *

FROM #Points

INNER JOIN #Scales ON #Points.ScalePoints >= #Scales.Scale

AND #Points.ScalePoints < #Scales.Scale + 1

SET STATISTICS TIME OFF

PRINT REPLICATE('=',80)

========== CROSS JOIN ==========

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

(500000 row(s) affected)

SQL Server Execution Times:

CPU time = 2761 ms, elapsed time = 7700 ms.

================================================================================

========== CEILING ==========

(500000 row(s) affected)

SQL Server Execution Times:

CPU time = 249 ms, elapsed time = 6338 ms.

================================================================================

========== INNER JOIN ==========

(500000 row(s) affected)

SQL Server Execution Times:

CPU time = 2777 ms, elapsed time = 7805 ms.

================================================================================

Cadavre SSC-Forever Points: 41690 More actions · Answer 10

irobertson (10/13/2011)
Interesting. Cue bout of furious SQL to investigate 🙂

Think I have it, the ceiling method is matching 0.00* to 1.0 instead of to 0.0. This means that when you get to 99.* there is nothing for the ceiling method to match to because I only included scales between 0.0 and 99.0 in the test data.

If instead you use floor (or I add 100.0 to the test data), then we get the 500,000 rows.

========== FLOOR ==========

(500000 row(s) affected)

SQL Server Execution Times:

CPU time = 2339 ms, elapsed time = 6331 ms.

================================================================================

--First things first, lets build a nice play-area

IF object_id('tempdb..#Points') IS NOT NULL

BEGIN

DROP TABLE #Points

END

IF object_id('tempdb..#Scales') IS NOT NULL

BEGIN

DROP TABLE #Scales

END

--500,000 Random rows of data

SELECT TOP 500000 IDENTITY(INT,1,1) AS ID,

RAND(CHECKSUM(NEWID())) * 100 AS ScalePoints

INTO #Points

FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

--Add a Primary Key

ALTER TABLE #Points

ADD CONSTRAINT Points_PK_ID

PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

--100 rows of data

SELECT TOP 101 IDENTITY(INT,1,1) AS ID,

(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 1.0) - 1.0 AS Scale

INTO #Scales

FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2

--Add a Primary Key

ALTER TABLE #Scales

ADD CONSTRAINT Scales_PK_ID

PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

PRINT '========== CEILING =========='

SET STATISTICS TIME ON

SELECT *

FROM #Points

JOIN #Scales ON ceiling(#Points.ScalePoints) = #Scales.Scale

ORDER BY #Points.ScalePoints ASC

SET STATISTICS TIME OFF

PRINT REPLICATE('=',80)

========== CEILING ==========

(500000 row(s) affected)

SQL Server Execution Times:

CPU time = 2308 ms, elapsed time = 5402 ms.

================================================================================

The conclusion being that the CEILING method is not functionally equivalent to the other methods, however FLOOR does seem to be much more efficient than either a CROSS or an INNER JOIN using >= < +1

--EDIT--

Dear god, I am the slowest typist of all time. I blame having several screens all of which have actual work on except the far left one which has this forum open 🙂

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

irobertson Hall of Fame Points: 3695 More actions · Answer 11

The plan shows it all. 2 scans and a match for the floor method because all it has to do is calculate the new value then merge. With the other method, it needs to compare the Points to all the values in the Scales table and so constantly scans the worktable it creates:

========== CROSS JOIN ==========

(500000 row(s) affected)

Table 'Worktable'. Scan count 100, logical reads 2418435, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Points'. Scan count 1, logical reads 1303, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Scales'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

================================================================================

========== CEILING ==========

(500000 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Points'. Scan count 1, logical reads 1303, physical reads 1, read-ahead reads 1298, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Scales'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

================================================================================

ramadesai108 SSChampion Points: 10609 More actions · Answer 12

I got working with the first solution from GSquared. Thanks.

Now i have a result as follows:

ID myScales Scale Descr

3 0.154 0 Poor

2 4.0 4 Bigger

5 5.0 5 Better

8 3.10 3 Good

10 2.2 2 Poor

I need to figure out how many percentage is poor, bigger, better and good out of these result

based on the myScales total

How do I accomplish that with the same query?

This is what I used as my original query to get the scales:

select *from MyValuesTableinner join MyScalesTable on MyValuesTable.ValueFloat >= MyScalesTable.Scale and MyValuesTable.ValueFloat < MyScalesTable.Scale + 1 ;

Thanks again

ramadesai108 SSChampion Points: 10609 More actions · Answer 13

I enterd the data but it got rid of the space. the Id is just an int column, myScales is a float, scale is a 1 digit column and rest is the description. Sorry it is not properly readable.

Thanks.

irobertson Hall of Fame Points: 3695 More actions · Answer 14

Something like this should do the trick:

~~select sum(case when ScaleDesc = 'Poor' then 1 else 0 end) / count(*) as Poor~~

~~, sum(case when ScaleDesc = 'Good' then 1 else 0 end) / count(*) as Good~~

~~, etc...~~

~~from #Points p~~

~~join #Scales s~~

~~on floor(p.ScalePoints) = s.Scale~~

Use something like this if you want decimal places (this example gives 1dp):

select sum(case when ScaleDesc = 'Poor' then 1.0 else 0.0 end) / count(*) as Poor

, sum(case when ScaleDesc = 'Good' then 1.0 else 0.0 end) / count(*) as Good

from #Points p

join #Scales s

on floor(p.ScalePoints) = s.Scale

Edit: derp + note use of more efficient query...

Edit again: sometimes I'm really dumb. Just ignore the first version.