How to compare a list of numbers, kind of like lottery results

Question

How to compare a list of numbers, kind of like lottery results

robert.wiglesworth

SSCommitted

Points: 1790
More actions
February 5, 2015 at 10:36 am

#314388

This isn't for a project I am currently working on, but I have been curious about how you could make this comparison:
Say you have a table that has records with numbers sort of like lottery winning numbers, say:
TableWinners
num1, num2, num3, num4, num5, num6
33 52 47 23 17 28
... more records with similar structure.
Then you have another table with chosen numbers, same structure as above, TableGuesses.
How could you do the following comparisons between TableGuesses and TableWinners:
1. Compare a single record in TableGuesses to a single record in TableWinners to get a count of the number of numbers that match (kind of a typical lottery type of thing).
2. Compare a single record in TableGuessess to ALL records in TableWinners to see which record in TableWinners is the closest match to the selected record in TableGuesses.
Does that make sense?
Thanks!
July 21, 2022 at 12:18 pm

This was removed by the editor as SPAM
July 22, 2022 at 12:57 pm

This was removed by the editor as SPAM
July 27, 2022 at 7:39 pm

This was removed by the editor as SPAM

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Nevyn SSChampion Points: 14210 More actions · Answer 1

You would probably unpivot the data in both tables so that you have a key for which set of numbers it belongs to, but each winning/drawn number is its own row.

Then for the first one you join the unpivoted tables and count rows and you have a count of matches.

For the second you do the same steps for all rows of the winning numbers table, grouped by a unique identifier of the drawing with a count, and then order the results by number of matches so that you can get the row(s) with the highest matches.

robert.wiglesworth SSCommitted Points: 1790 More actions · Answer 2

Interesting...

I have never unpivoted a table, but I think I understand what it is your are saying. I will read up on unpivot to see how to make it work.

Thanks!

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 3

robert.wiglesworth (2/5/2015)
This isn't for a project I am currently working on, but I have been curious about how you could make this comparison:
Say you have a table that has records with numbers sort of like lottery winning numbers, say:
TableWinners
num1, num2, num3, num4, num5, num6
33 52 47 23 17 28
... more records with similar structure.
Then you have another table with chosen numbers, same structure as above, TableGuesses.
How could you do the following comparisons between TableGuesses and TableWinners:
1. Compare a single record in TableGuesses to a single record in TableWinners to get a count of the number of numbers that match (kind of a typical lottery type of thing).
2. Compare a single record in TableGuessess to ALL records in TableWinners to see which record in TableWinners is the closest match to the selected record in TableGuesses.
Does that make sense?
Thanks!

Normalise it out, one row per number instead of one row per column, and it's trivial. Left-join guesses to winners on number, and whatever you want to call a "strip", then aggregate by strip. If you can't change the schema then split your columns into rows using CROSS APPLY VALUES (Dwain Camps) or delimitedsplit2k8 (Jeff Moden).

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

robert.wiglesworth SSCommitted Points: 1790 More actions · Answer 4

Ok, I see what you are saying...

instead of something like this:

drawdate num1 num2 num3 num4 num5 num6

02/05/2015 55 12 45 33 22 06

Do something like this:

drawdate num

02/05/2015 55

02/05/2015 12

02/05/2015 45

02/05/2015 33

02/05/2015 22

02/05/2015 06

Yes? So that would lead to a much taller but narrow table. Is there a performance issue associated with that?

Nevyn SSChampion Points: 14210 More actions · Answer 5

Basically you are making the rows into columns, each with the identifier of the original row, and one of the winning numbers.

There is an UNPIVOT sql statement, or you can achieve the same logically using CROSS APPLY and VALUES

Some examples to get you started

CREATE TABLE #WinningNumbers

(WinningNumberID int,

num1 int,

num2 int,

num3 int,

num4 int,

num5 int,

num6 int)

INSERT #WinningNumbers (WinningNumberID,num1,num2,num3,num4,num5,num6)

SELECT 1, 12,25,27,31,40,45

--FIRST UNPIVOT

SELECT WinningNumberID, IndNumber, NumberID

FROM

(SELECT WinningNumberID,num1,num2,num3,num4,num5,num6

FROM #WinningNumbers) p

UNPIVOT

(IndNumber FOR NumberID IN (num1,num2,num3,num4,num5,num6)) AS unpiv

--UNPIVOT USING APPLY

SELECT WinningNumberID,IndNumber,NumberID

FROM #WinningNumbers

CROSS APPLY (VALUES ('num1',num1),('num2',num2),('num3',num3),('num4',num4),('num5',num5),('num6',num6)) AS x(NumberID,IndNumber)

DROP TABLE #WinningNumbers

As for whether there is a performance issue, yes, potentially. The actually apply method of unpivoting is fairly fast, but if you are doing that, and then querying the unpivoted table, then if there is a lot of data it could be slow. Note: the performance issue isn't because you are creating more rows, it is because you are scanning and transforming everything then querying it)

Of course, it could be argued that the numbers should stored in this fashion to begin with.

Alan Burstein SSC Guru Points: 61141 More actions · Answer 6

Interesting...
I have never unpivoted a table, but I think I understand what it is your are saying. I will read up on unpivot to see how to make it work.
Thanks!

Check out An Alternative (Better?) Method to UNPIVOT (SQL Spackle)[/url]

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Alan Burstein SSC Guru Points: 61141 More actions · Answer 7

Using the unpivot method in Dwain's article I came up with this solution...

(Note that I used 4 numbers for brevity)

USE SSRS_POC

GO

IF OBJECT_ID('tempdb..#tablewinners') IS NOT NULL DROP TABLE #tablewinners;

IF OBJECT_ID('tempdb..#tableguesses') IS NOT NULL DROP TABLE #tableguesses;

CREATE TABLE #tableWinners (nid int primary key, n1 int, n2 int, n3 int, n4 int);

CREATE TABLE #tableguesses (nid int primary key, n1 int, n2 int, n3 int, n4 int);

GO

INSERT #tableWinners VALUES (1,12,19,36,40);

INSERT #tableguesses

VALUES(1,11,12,34,41),--1 match (12)

(2,41,34,11,19),--1 match (19)

(3,40,4,19,12);--3 matches (12,19,40)

WITH

winners AS

( SELECT nid, n, nvalue

FROM #tableWinners

CROSS APPLY(VALUES ('n1',n1),('n2',n2),('n3',n3),('n4',n4)) x(n,nvalue)

),

Guesses AS

( SELECT nid, n, nvalue

FROM #tableGuesses

CROSS APPLY (VALUES ('n1',n1),('n2',n2),('n3',n3),('n4',n4)) x(n,nvalue)

)

SELECT

--TOP 1 WITH TIES

g.nid, count(g.nid) AS matches

FROM winners w

JOIN Guesses g ON w.nvalue = g.nvalue

GROUP BY g.nid

--ORDER BY matches DESC

DROP TABLE #tableWinners

DROP TABLE #tableguesses

This will give you how many numbers match. Uncomment the TOP 1 WITH TIES and ORDER BY clauses and it will return the closest match.

Edit: Code formatting

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Dwain Camps SSC Guru Points: 86948 More actions · Answer 8

Let's see if I can work past my blushing here and help out a little bit.

First off, Alan.B's got a great solution as it knocks off both of your specific requirements with basically a single query.

For obvious reasons, I like the CROSS APPLY VALUES approach to UNPIVOT. 🙂

Since this is an academic question according to the OP, let's look at a few perturbations of the requirement:

Suppose that your table of winners contains 5 or less numbers:

1. After unpivoting you want to find any matches in the guesses table that contain all of those numbers. This is relational division with remainder if there are less than 5 numbers in the table.

2. If your table of winners contains exactly 5 numbers and the winner must be a 100% match against the guess, this is relational division with no remainder.

In either cases of 1 and 2, if we assume that the guesses table can contain more than five numbers, this article covers both:

High Performance Relational Division [/url]

3. If your table of winners must be matched by the sequence of occurrence, that's a special case of relational division that Peter Larsson aptly named "ordered relational division."

Identifying a Subsequence in a Sequence, Part 3

4. Suppose you introduce a requirement that in the case of a tie (same count of matches), the "best match" is the one whose difference in the remaining numbers is minimized.

Now that latter is an interesting problem! Since you didn't state it as a requirement I won't try to solve it right now, but don't be surprised if someday you see an article on it!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Jeff Moden SSC Guru Points: 1004340 More actions · Answer 9

jefferyjordan26 wrote:

Hey there!
Each winning/drawn number has its own row, however you would probably unpivot the data in both tables so that you have a key for which group of numbers it belongs to. Then you connect the unpivoted tables for the first one, count the rows, and then you get a count of matches. In order to retrieve the row(s) with the most matches, you need to perform the same processes for each row of the winning numbers table for the second drawing, grouping them by a counted unique drawing identification.

That's basically what most of the posts and a lot of the code on this thread suggests. Have you some alternative code that you'd like to share?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)