December 11, 2015 at 1:00 am
Kiril,
at least your script can run a good deal faster, if you'll choose to go this way. You needn't copy the whole source table into #TBL and needn't so much scans on it.
DECLARE @CategoryId INT = 6565;
DECLARE @DS1MinValue INT= 1; --row_number() start
DECLARE @DS1MaxValue INT;
DECLARE @DS2MinValue INT= 1;
DECLARE @DS2MaxValue INT;
SELECT Id
,rnV = case when categoryId is not null then row_number() over(partition by categoryId order by (select 1)) end
,rnN = case when categoryId is null then row_number() over(partition by categoryId order by (select 1)) end
,CategoryId, SKU
INTO #TBL
FROM @t
WHERE CategoryId = @CategoryId OR categoryId IS NULL;
SELECT @DS1MaxValue= MAX(rnV),@DS2MaxValue=MAX(rnN)
FROM #TBL;
IF @DS1MaxValue IS NOT NULL
BEGIN
-- pick a random row from data set where CategoryId IS NOT NULL
SELECT * FROM #TBL WHERE rnv = ROUND(((@DS1MaxValue - @DS1MinValue) * RAND() + @DS1MinValue), 0);
END
ELSE
BEGIN
-- try pick a random row from data set where CategoryId IS NULL
SELECT * FROM #TBL WHERE rnN = ROUND(((@DS2MaxValue - @DS2MinValue) * RAND() + @DS2MinValue), 0);
END;
DROP TABLE #TBL;
December 11, 2015 at 5:40 am
kiril.lazarov.77 (12/10/2015)
TheSQLGuru (12/9/2015)
Handling gaps with my mechanism really is simple and efficient. Been there, done that, got the tshirt.But your new requirement about category makes it much more likely you will need to do the sort thing, although possibly with an exists check first. If row exists with your category, do the sort/rownumber thing for that category. If it doesn't exist do the sort/rownumber thing for null categories. As Scott said, these would hopefully all be narrow-index hits.
Hi Kevin, Many thanks for your help. So basically you're saying I should use the script I attached a couple of comments above? Any way I can make that faster? That's very slow when I run it on my db. Thanks
Sorry - I am on an international trip and pretty brain-dead right now, so I can't give detailed information or a script because it may well come out as gibberish! :hehe: But key points are:
1) don't put stuff into a temp table when you don't need to, and here I cannot see that it is necessary
2) you must do as-narrow-as-possible index seek/scan to accomplish your query
3) I still think (but cannot be certain without much more information about probabilities of hitting both a category and a gap) it would be faster to do an index seek first to see if ANY row exists with a non-null category passed in. If you get a hit, acquire the row using a full row_number random thing to get one row. If you don't get a hit, do the iterative check for a computed ID like I mentioned. If you have huge gaps of missing data this could be less efficient some of the time but likely still more efficient overall.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 2 posts - 16 through 16 (of 16 total)
You must be logged in to reply to this topic. Login to reply