August 1, 2007 at 4:45 am
Hi,
I have a table with 1000's of rows, and each row is one of 10 types.
What I want is a statement to return one row per type, but that the row is chosen at random, so each time I execute the row I'll get 10 rows, 1 per type, but each row will be different from the last time I run it
There must be a better way than inserting top 1 rows ordering by newid() into a temp table then returning the 10 rows as e result set..
Thanks!
August 1, 2007 at 4:56 am
Are you talking about PAGING?
You want 10 records at a time (one fo each type) on a "page"?
When you click "next" you want next 10 records (all again of different type)?
N 56°04'39.16"
E 12°55'05.25"
August 1, 2007 at 4:59 am
Hi, No its not for paging.
It's for showing different films but one per genre on a web page..
Thanks,
Andy
August 1, 2007 at 7:01 am
How are you else going to check that some of the films you are going to show, not already has been shown?
N 56°04'39.16"
E 12°55'05.25"
August 1, 2007 at 8:08 am
Well, each of the 10 items in each result set will have a different genre, so can't be shown twice in the same set of ten, and a film ere can only have one genre.
I'm not actually worried about whether it gets shown in successive result sets. Some genres may only have 10 films and others may have 1000. Plus anything random can potentially happen twice in a row anyway.
Hope that makes sense.
(Edited for sloppy typing )
August 1, 2007 at 9:19 am
-- Prepare sample data
CREATE TABLE #Movies
(
MovieID INT IDENTITY PRIMARY KEY,
GenreID INT
)
INSERT #Movies
(
GenreID
)
SELECT TOP 1000 1 + ABS(CHECKSUM(NEWID())) % 10
FROM master..syscolumns
-- Show the expected result
SELECT g.GenreID,
(SELECT TOP 1 m.MovieID FROM #Movies AS m WHERE m.GenreID = g.GenreID ORDER BY NEWID()) AS MovieID
FROM (
SELECT GenreID
FROM #Movies
GROUP BY GenreID
) AS g
ORDER BY g.GenreID
N 56°04'39.16"
E 12°55'05.25"
August 1, 2007 at 9:23 am
Our data in SQL Server may not be ordered based on the time to add them. If there is the ORDER clause in your queries, your retrieved data should be the same.
August 1, 2007 at 9:57 pm
Very nicely done, Peter. Nasty fast, too!
--Jeff Moden
Change is inevitable... Change for the better is not.
August 2, 2007 at 1:02 am
Thanks!
N 56°04'39.16"
E 12°55'05.25"
August 2, 2007 at 1:30 pm
A relational database doesn't really do random well. Here are a couple of more straightforward solutions. If you want truly random results from one query to another -- good luck.
CREATE TABLE #Movies ( MovieID INT IDENTITY PRIMARY KEY, GenreID INT, Title varchar(10) ); -- Insert the data in a haphazard order. The titles are in the form "Genre x, Title y" INSERT into #Movies (GenreID, Title) select 10, 'G10T1' union all select 4, 'G4T2' union all select 2, 'G2T2' union all select 1, 'G1T3' union all select 6, 'G6T1' union all select 3, 'G3T5' union all select 8, 'G8T1' union all select 3, 'G3T4' union all select 3, 'G3T2' union all select 1, 'G1T2' union all select 4, 'G4T1' union all select 3, 'G3T1' union all select 5, 'G5T1' union all select 3, 'G3T3' union all select 1, 'G1T1' union all select 2, 'G2T1' union all select 7, 'G7T1' union all select 9, 'G9T1'; -- The first method. Not random, MIN() would return the first -- and MAX() would return the last title in alphabetical order. select m1.GenreID, max(m1.Title) Title from #Movies m1 group by m1.GenreID -- The second method. More random as to the result, but the result -- would be repeatable: you would get the same titles every time. select top 1 GenreID, Title from #Movies where GenreID = 1 union select top 1 GenreID, Title from #Movies where GenreID = 2 union select top 1 GenreID, Title from #Movies where GenreID = 3 union select top 1 GenreID, Title from #Movies where GenreID = 4 union select top 1 GenreID, Title from #Movies where GenreID = 5 union select top 1 GenreID, Title from #Movies where GenreID = 6 union select top 1 GenreID, Title from #Movies where GenreID = 7 union select top 1 GenreID, Title from #Movies where GenreID = 8 union select top 1 GenreID, Title from #Movies where GenreID = 9 union select top 1 GenreID, Title from #Movies where GenreID = 10
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
August 3, 2007 at 1:42 am
Running Carr's suggestion reveals that the movies are always the same for every genre.
Why? Because there is a clustered index on MovieID. That means that TOP 1 always return the same MovieID, which defeats the purpose.
Here are the two execution plans side by side for comparison
Peso
|--Compute Scalar(DEFINE[m].[MovieID]=[m].[MovieID]))
|--Nested Loops(Left Outer Join, OUTER REFERENCES[#Movies].[GenreID]))
|--Sort(DISTINCT ORDER BY[#Movies].[GenreID] ASC))
| |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]))
|--Sort(TOP 1, ORDER BY[Expr1004] ASC))
|--Compute Scalar(DEFINE[m].[MovieID]=[m].[MovieID], [Expr1004]=newid()))
|--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8] AS [m]), WHERE[m].[GenreID]=[#Movies].[GenreID]))
Carr
|--Merge Join(Union)
|--Merge Join(Union)
| |--Merge Join(Union)
| | |--Merge Join(Union)
| | | |--Merge Join(Union)
| | | | |--Merge Join(Union)
| | | | | |--Merge Join(Union)
| | | | | | |--Merge Join(Union)
| | | | | | | |--Merge Join(Union)
| | | | | | | | |--Top(1)
| | | | | | | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=1))
| | | | | | | | |--Top(1)
| | | | | | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=2))
| | | | | | | |--Top(1)
| | | | | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=3))
| | | | | | |--Top(1)
| | | | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=4))
| | | | | |--Top(1)
| | | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=5))
| | | | |--Top(1)
| | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=6))
| | | |--Top(1)
| | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=7))
| | |--Top(1)
| | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=8))
| |--Top(1)
| |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=9))
|--Top(1)
|--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=10))
N 56°04'39.16"
E 12°55'05.25"
August 3, 2007 at 3:18 pm
Well, good grief, this wasn't even past the proof of concept stage. I don't think OP was concerned at this point about performance. He seemed more interested in seeing if it could be done at all. If either quasi-random solution (and I pointed out the fact that each listing would be the same) was good enough, then we could have worked on the performance.
Find out the direction you want to go before you start paving the road.
BTW, here's another possible solution:
Here's what I threw together. And for the purists among you, I know the performance can be improved by indexes and I didn't define the relationship between the two tables. There has to be something left to the reader to do. The trigger could be improved also, I'm sure. This is just an example, guys, not production code. (Anyone who wants polished code, contact me. My rates are reasonble. )
create table Movies ( MovieID int identity primary key, GenreID int, Title varchar(10) ); create table GenreList ( GenreID int primary key, Title varchar(10) ); go create trigger Movies_BI on Movies after insert, delete, update as begin if @@RowCount = 0 return; -- Well, just never mind then. set NoCount on; if exists (select * from deleted) begin -- We're either deleting or updating. Either way, remove the effected -- rows (if they exist) from the listing table. delete from GenreList where GenreID in (select GenreID from deleted); end if exists (select * from inserted) -- Inserting or modifying begin -- If this is an insert, there is the possibility one or more are the first insert -- for a genre. For the second or subsequent entries, the delete statement -- above will not have effected that genre's entry in the list table. So we -- have to update them. update GenreList set GenreList.Title = I.Title from GenreList L join ( select I1.GenreID, Max(I1.Title) Title from Inserted I1 group by I1.GenreID ) I on I.GenreID = L.GenreID where L.GenreID in (select GenreID from GenreList); -- All that are left are first inserts for a genre, updates to existing genres -- or updates that change the genre. In any of these instances, there will be no -- entries in the list table so we now have to insert them. insert into GenreList (GenreID, Title) select GenreID, Max(Title) from inserted i where i.GenreID not in (select GenreID from GenreList) group by GenreID; end--if -- If a movie that appears in the list table is deleted or the genre of a movie is changed, -- there remains a "gap" for the deleted/old genre because it is deleted from the listing -- table but not replenished. Here we fix that. If there are gaps, insert a replacement. -- Note: some people are uncomfortable querying a mutating table. Fine, find another way or -- make a business rule for changing the genre of a movie: first delete existing entry then insert -- entry with new genre. You might even implement this recursively within the trigger. insert into GenreList (GenreID, Title) select GenreID, Max(Title) from movies m where m.GenreID not in (select GenreID from GenreList) group by GenreID; end--Trigger go -- Insert the data in a haphazard order. The titles are in the form "Genre x, Title y" insert into Movies (GenreID, Title) select 10, 'G10T1' union all select 4, 'G4T2' union all select 2, 'G2T2' union all select 1, 'G1T3' union all select 6, 'G6T1' union all select 3, 'G3T5' union all select 8, 'G8T1' union all select 3, 'G3T4' union all select 3, 'G3T2' union all select 1, 'G1T2' union all select 4, 'G4T1' union all select 3, 'G3T1' union all select 5, 'G5T1' union all select 3, 'G3T3' union all select 1, 'G1T1' union all select 2, 'G2T1' union all select 7, 'G7T1' union all select 9, 'G9T1'; select * from GenreList; -- Create a new entry for an existing genre insert into Movies (GenreID, Title) values( 10, 'G10T2'); -- Create a new entry for a new genre insert into Movies (GenreID, Title) values( 11, 'G11T1'); -- Update the title of an entry update Movies set Title = 'G3T-One' where Title = 'G3T1'; select * from GenreList; -- Update the title and genre of an entry (new genre exists) update Movies set Title = 'G4T3', GenreID = 4 where Title = 'G3T-One'; -- Update the title and genre of an entry (new genre doesn't exist) update Movies set Title = 'G12T1', GenreID = 12 where Title = 'G4T1'; select * from GenreList; -- Update the title and genre of an entry that is NOT -- one in the list update Movies set Title = 'G3T4', GenreID = 3 where Title = 'G1T1'; select * from GenreList;
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
August 3, 2007 at 4:59 pm
A relational database doesn't really do random well... If you want truly random results from one query to another... . |
Tomm,
I just gotta ask why you say either of those things or why you think Peter's sweet bit of code doesn't do it?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 6, 2007 at 11:27 am
I didn't say random access couldn't be done, I just said that an RDMS doesn't really lend itself to random access. Am I wrong?
And there is nothing intrinsically wrong with Peter's code. It fulfills all the requirements. But it puts the work in the query.
This "rule" has decreased in importance over the years as increases in memory and cpu speeds has far outpaced disk access speeds. But it is still something I like to keep in mind because, depending on -- well, you know, many things -- it could still be relevant.
So I provided an alternate "solution." Solution is quoted because it did not quite match the requirements. It was quasi-random in that it wasn't random at all but the pattern was hidden from the user and the result set could remain static between queries. So if OP's requirements demanded a different result set every time the query is executed, there is no help for it -- it is Peter's solution (or one like it) or nothing. Generally, however, I find few people complain that they have been given too many solutions to a problem.
In thinking further about it, I have one question. I haven't had much experience with quids so I know almost nothing about the particulars of using them. But it seems to me there would be a certain amount of overhead, particularly locking, that goes into guaranteeing uniqueness. So, in a very busy application, like Amazon or Google, where this query might possibly be called millions of times in a day or so, or peak hit rates of thousands of times per second, does/could calling NewID() ten times for every query create a problem?
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
August 6, 2007 at 5:02 pm
In light of Peter's solution, I'd have to say "Yeah, you're wrong" Also, what's wrong with putting "the work in the query"? Isn't that where most of the SQL work is done?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 23 total)
You must be logged in to reply to this topic. Login to reply