t-test in SQL

  • Has anyone seen a way of implementing the stats t-test function in sql?

    I understand it is available in Excel and Oracle but not SQL Server.

    Allen

  • Nothing native as far as I know (unless you want to programme something yourself....)

    My guess would be that Microsoft would expect you to use Excel to perform the t-test based on a SQL datasource - not ideal, obviously, especially on large data sets.

    However, there are a few third party tools available. The one I've been able to find is Total SQL Statistics, although I should point out firstly that I've not tried it myself and secondly that it's about $1,000 for a single seat licence, so you definitely need to really want that functionality.

    Could it be worth posting a bit more detail about what you're trying to achieve? Is this a one-off problem or functionality to fill an ongoing gap?

    Semper in excretia, suus solum profundum variat

  • Thanks. I am trying to automate the creation of test and control groups for marketing purposes against a total population of 20,000 accounts. There are approximately 16 mailing campaigns but not all accounts are mailed in each campaign. Ideally we want a Test account to be test across all of the campaigns that account is mailed in, similarly a control account would always be a control account.

    The current way our stats guys allocate test and control is by using a randomizing function in SAS to create a ranking then checking the top 90% and bottom 10% are 'statistically equivalent' (not exactly sure what that means but they use a t-test to determine it). If the t-test output is OK then the top 90% is test and the bottom 10% control. If not the random ranking is run again followed by the t-test until we get an OK distribution.

    This is a fairly manual procedure which I would like to put in a loop in a stored proc, exiting the loop when we get an OK distribution.

    Allen

  • OK, I understand.

    It almost sounds as if you've got the tail wagging the dog a bit here. It just seems a little surprising you're being asked to use an application with very limited statistical capabilities (SQL Server) instead of a hugely statistically capable one (SAS) to perform a statistical test when you have both readily available.

    Although my SAS knowledge is VERY rudimentary, my wife happens to be a statistician who uses it day in, day out, so I checked with her. It seems it should be perfectly possible to write a recursive routine in SAS that assigns the t-test output to a variable, then calls itself if the variable's value is outside a predefined range. In other words, going only on what you've said so far, I'd suggest automating the SAS process further rather than moving to SQL Server and rewriting from scratch.

    Mind you, from my experience of companies that use SAS alongside other RDBMS's (Oracle, DB2, SQL Server etc.), whether to perform something in SAS or at DB level often seems to end up as a political rather than efficiency decision. Is that an obstacle you're up against?

    Semper in excretia, suus solum profundum variat

  • Agree with you on that however I'm OK on SQL but would not know where to start in SAS.

  • No problem. In that case, I think your only option is going to be to write your own stored procedure. I've had a look at the t-test equation (at http://en.wikipedia.org/wiki/Student's_t-test), and realised the following:

    1. Each of the constituent calculations is relatively straightforward (the most awkward being Standard Deviation, for which there is a t-sql function) as long as you understand a bit about statistics.

    2. Straightforward it may be in purely statistical terms, but I reckon it'd have me headscratching for a day or so before I could come up with any SQL in which I had any confidence.

    So it's do-able, but not pretty. Bear in mind, though, that whether the t-test output indicates statistical equivalence or not is a stats person's subjective decision, so you'll need to get them to decide a threshold they're happy with.

    Sorry I can't be more help.

    Semper in excretia, suus solum profundum variat

  • Thanks for this. I'll see if I can get some help from our stats guys to translate the equation into SQL, I should have paid more attention in maths!

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply