August 30, 2022 at 6:08 pm
I need a random sampling of 1% of a population of last names and spoken languages in a county. The lastname does not have to correspond to the person's language, immaterial. My problem is sometimes all the last names do not appear in the results. I need a way of making sure (even if it slightly exceeds 1%) that all the names are returned. I consistently get 167 names but sometimes rarer names like 'Poindexter' fall out of the results. If I run the query again 'Poindexter' may be present but 'Hermann' drops out. Any ideas? The end data needs to look something like:
ID language lastname
1515 Spanish Heath
1620 English Edwards
1452 Korean Smith
1350 Japanese Smith
with cte1 -- to build set of languages - all five need to appear
as (select distinct language from county.reporting.dbo.2022),
cte2 -- to build set of lastnames - all 35 must appear, some can be multiples
as (select distinct lastname from county.reporting.dbo.2022),
cte3 -- build final dataset prior to sampling
as (
select * from county.reporting.dbo.2022
where exists (select * from cte1)
union
select * from county.reporting.dbo.2022
where exists (select * from cte2)
)
select top 1 PERCENT * from cte3 order by newID();
August 31, 2022 at 6:39 am
First problem:
This cte will return all rows of county.reporting.dbo.2022 whenever there are results in cte1 ! ( no where clause in exists query )
Same for the second union part
deduplicated by the fact that you are using UNION in stead of UNION ALL.
cte3 -- build final dataset prior to sampling
as (
select * from county.reporting.dbo.2022
where exists (select * from cte1)
union
select * from county.reporting.dbo.2022
where exists (select * from cte2)
)
That's the logic of 1% of a set, if all languages must be present and you determine the 1% sorting a random uniqueidentifier ?
( btw since it is a linked server reference, guess how the engine will handle such query ???!!! )
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply