March 18, 2010 at 4:55 am
Dear all:
I have to implement Stratified Random Sampling in my application which is a part of the DataWare house software. Here's the scenario:
-let’s assume that we have a sample universe of 500 records for a given dealer
-Now let’s suppose that we need a sample for analysis of 50 and we expect a response rate of 50%. On this basis we need to select 100 records to send the survey to from the 500 available
-In order to properly represent the 500 records available to use we need to ensure that each respondent has an equal chance of being selected for inclusion in our sample of 200. We do this by:
ocalculating a 1 in N sampling interval # - this equals the sample size required to which we will send a survey ( in this case 100) divided by the total sample universe available ( in this case 500)
oIn our example we have a 1 in 5 sampling interval - in other words we will take every fifth record
oIt’s important too to ensure that we choose a random start point - we therefore need to generate a random number (between 1 and 500 in our case). Whatever the number generated is , that record becomes our start point for sampling
oFor example if we generated a random number of 491 then the 491st record would be our first selected record. From that record we’d start our sample selection using the 1 in 5 sampling interval calculated for this study so the next selected record would be 496 and so on
oOnce the maximum record number in the universe has been reached then the count continues from the bottom up. So again in our example our third iteration would take us beyond 500 and the next selected record would be record number 1. The next after that would be record 6, etc until we have our 100 records.
I am aware of NewID() and TableSample and ramdom column techniques, but I am a bit lost about the approach over. I am listing down the schema of the available source tables below.
Client_Customers
ClientCustomerID Bigint, PK, Identitiy
Description nvarchar(max)
InsertDateTime DateTime
UpdateDateTime DateTime
ClientCustomerAttributes
ClientCustomerAttributeIDbigint, PK, Identity
Descriptionnvarchar(max)
ClientCustomerValues
ClientCustomerValueIDbigintPKIdentity
Valuenvarchar(max)
CilentCustomerIDbigint
ClientCustomerAttributeIDbigint
Client_Products
ClientProductID Bigint, PK, Identitiy
Description nvarchar(max)
InsertDateTime DateTime
UpdateDateTime DateTime
ClientProductAttributes
ClientProductAttributeIDbigint, PK, Identity
Descriptionnvarchar(max)
ClientProductValues
ClientProductValueIDbigintPKIdentity
Valuenvarchar(max)
CilentProductIDbigint
ClientProductAttributeIDbigint
Client_Transactions
ClientTransactionID Bigint, PK, Identitiy
Description nvarchar(max)
ClientCustomerIDBigint FK
ClientProductIDBigint FK
InsertDateTime DateTime
UpdateDateTime DateTime
ClientTransactionAttributes
ClientTransactionAttributeIDbigint, PK, Identity
Descriptionnvarchar(max)
ClientTransactionValues
ClientTransactionValueIDbigintPKIdentity
Valuenvarchar(max)
CilentTransactionIDbigint
ClientTransactionAttributeIDbigint
regards,
Kazim Raza
March 18, 2010 at 5:26 am
Is it not as simple as generating a list of numbers, starting at random 'R', then adding 'N' until enough records are selected?
If so, write something to populate a temporary table with these key numbers, then join to the real data, using the ROW_NUMBER ranking function to assign a sequential number to each data row.
Paul
March 18, 2010 at 5:36 am
Yes Paul, I know its not simple and that's the reason I have turned up here to get expert opinion to get the basic groundwork correct. As I mentioned in my post, I am seeking opinion on what approach should I adopt?
Regards,
Kazim Razas
March 18, 2010 at 5:45 am
It is reasonably simple, as I outlined in my last post.
If it doesn't make sense yet, provide a simplified example that we can work with to outline the concept.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply