Using GROUP BY and COUNT together

  • Hi Guys,

    Just a small one this time.

    I have a table which contains household insurance quotations.

    As is normal with insurance, each customer can do multiple quotes for each policy; they change little things like their excess, the number of days of unoccupancy they have and so on, to get a good balance between low premium and good cover.

    In order to report on conversion rates (the percentage of customers who took out an insurance policy after quoting) we need to count up the number of quotes that have been done.

    Now if we say that each one of those individual quotes is one quote, our conversion rate will be horrifically low. As such we want to group those quotes up, so if a customer phones in to get a quote on Tuesday, thinks about it, does another quote on Wednesday and then finally takes out a policy on Thursday, this is counted as one quote and not three.

    We have a file of 'all' quotes, imaginatively named tblHhiQuotes.

    In order to get the desired 'unique' quotes we are grouping the quotes by customer surname, customer postcode and insurance cover type (e.g. contents, buildings or integrated).

    All I want to get is an integer count... I can achieve this in the following manner....

    DECLARE @count AS TABLE (one bit)

    INSERT INTO @count
    SELECT 1
    FROM tblHhiQuotes
    WHERE sGlDate <= '28 Oct 2010'
    AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
    AND cTransType = 'FQ'
    GROUP BY vSurname
        ,cPostcode
        ,cCoverType

    --output...
    SELECT COUNT(*)
    FROM @count

    This works and returns a value of 2173.

    Initially I tried SELECT COUNT(*) FROM... but this of course returns 2173 rows of small numbers (those being the number of 'duplicates' in each group).
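
    For illustration, here's a minimal sketch with made-up data (the table variable, columns and values below are invented for the example) of why that happens: COUNT(*) with GROUP BY gives one row per group, not the single figure I'm after.

    -- Toy data: three quotes from one customer for one cover type, one quote from another.
    DECLARE @demo TABLE (vSurname varchar(30), cPostcode varchar(8), cCoverType char(1))

    INSERT INTO @demo VALUES ('Ward', 'AB1 2CD', 'C')   -- Tuesday quote
    INSERT INTO @demo VALUES ('Ward', 'AB1 2CD', 'C')   -- Wednesday re-quote
    INSERT INTO @demo VALUES ('Ward', 'AB1 2CD', 'C')   -- Thursday re-quote before taking the policy
    INSERT INTO @demo VALUES ('Smith', 'EF3 4GH', 'B')

    -- Returns two rows (3 and 1), one per group - not the single answer 2.
    SELECT COUNT(*) AS QuotesInGroup
    FROM @demo
    GROUP BY vSurname, cPostcode, cCoverType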

    I'm happy that my program works and returns a value of 2173, but I'm thinking there MUST be a more efficient way of doing this without resorting to inserting records into a temporary table. I may well be being very, very stupid and forgetting some extremely simple function, but I can't think of any...

    Is there a built-in function of COUNT or GROUP BY that I'm missing here?

    Thanks!

    Ben


  • ;WITH cteSource(Data)
    AS (
        SELECT 1
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
        AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
        AND cTransType = 'FQ'
        GROUP BY vSurname,
            cPostcode,
            cCoverType
    )
    SELECT SUM(Data)
    FROM cteSource



  • That's brilliant, thanks.

    I'm surprised there isn't an alternative version of COUNT in SQL Server that returns the number of rows, like a numrows() function or whatever.
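
    (As an aside - a minimal sketch, assuming the same filter as above: @@ROWCOUNT reports how many rows the previous statement returned, so it can supply the group count without a table variable, although the grouped rows are still returned to the client first.)

    -- Run the grouped SELECT, then read @@ROWCOUNT straight afterwards.
    SELECT 1 AS Data
    FROM tblHhiQuotes
    WHERE sGlDate <= '28 Oct 2010'
    AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
    AND cTransType = 'FQ'
    GROUP BY vSurname, cPostcode, cCoverType

    SELECT @@ROWCOUNT AS UniqueQuotes   -- number of rows (i.e. groups) returned above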

    Thanks

    Ben


  • BenWard (10/28/2010)


    That's brilliant, thanks.

    I'm surprised there isn't an alternative version of COUNT in SQL Server that returns the number of rows, like a numrows() function or whatever.

    Thanks

    Like this version?

    SELECT COUNT(*)
    FROM (
        SELECT 1
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
        AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
        AND cTransType = 'FQ'
        GROUP BY vSurname
            ,cPostcode
            ,cCoverType
    ) d

    😉



  • haha yes

    which is (I think) exactly the same as

    ;WITH cteSource(Data)
    AS (
        SELECT 1
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
        AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
        AND cTransType = 'FQ'
        GROUP BY vSurname,
            cPostcode,
            cCoverType
    )
    SELECT COUNT(*)
    FROM cteSource

    isn't it?!?

    I'm sure we could spend many, many hours attempting to determine whether we can save 13 milliseconds of processing time by redoing this, that or the other here!

    Ben


  • BenWard (10/28/2010)


    haha yes

    which is (I think) exactly the same as

    ;WITH cteSource(Data)
    AS (
        SELECT 1
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
        AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
        AND cTransType = 'FQ'
        GROUP BY vSurname,
            cPostcode,
            cCoverType
    )
    SELECT COUNT(*)
    FROM cteSource

    isn't it?!?

    Heh, yep, you're absolutely right: they're functionally equivalent, though I wouldn't go so far as to say they are the same.

    I haven't got an instance to play with today - I wonder if any of the extensions to GROUP BY (e.g. WITH ROLLUP) would do the trick in a single statement?

    I'm sure we could spend many, many hours attempting to determine whether we can save 13 milliseconds of processing time by redoing this, that or the other here!

    Big topic, and a current one - 'Speed Phreakery' or something like that. Well worth a read.



  • I did try ROLLUP; that just gave extra records.

    I'll give that a read...

    Ben


  • BenWard (10/28/2010)


    That's brilliant, thanks.

    I'm surprised there isn't an alternative version of COUNT in SQL Server that returns the number of rows, like a numrows() function or whatever.

    Thanks

    You mean like COUNT(DISTINCT vSurname + cPostcode + cCoverType)?

    You may need to do conversions to get the datatypes to match.
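
    A minimal sketch of how that might look against the quotes table (the CONVERT lengths and the '|' separator are assumptions; the separator stops different surname/postcode combinations concatenating to the same string):

    SELECT COUNT(DISTINCT CONVERT(varchar(50), vSurname) + '|'
                        + CONVERT(varchar(10), cPostcode) + '|'
                        + CONVERT(varchar(20), cCoverType)) AS UniqueQuotes
    FROM tblHhiQuotes
    WHERE sGlDate <= '28 Oct 2010'
    AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
    AND cTransType = 'FQ'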

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • you

    are

    a ...

    genius!

    😀

    Ben


  • Hahaha that's too funny!

    I bet it's the same performance too.

    Ben, if you're interested and have the time, you could assess all three queries for performance using the simplest method:

    SET STATISTICS TIME ON
    SET STATISTICS IO ON

    <<statements>>

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF

    Run each statement three or four times and take the average; the first run of each statement is likely to be the slowest, because subsequent runs benefit from caching.

    If you're on a dev box on your own, I think you can use something like DBCC FREEPROCCACHE before each run, in which case a single run per statement should do.

    Measuring the performance of a statement or batch can get quite involved. The method above is quick and reliable enough to use during production coding to tie-break between functionally-equivalent methods.
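
    A minimal sketch of that harness wrapped around one of the candidate queries (the CTE version posted earlier); as noted, DBCC FREEPROCCACHE clears cached plans for the whole instance, so dev box only:

    DBCC FREEPROCCACHE              -- dev box only: start the timing with a cold plan cache

    SET STATISTICS TIME ON
    SET STATISTICS IO ON

    ;WITH cteSource(Data)
    AS (
        SELECT 1
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
        AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
        AND cTransType = 'FQ'
        GROUP BY vSurname, cPostcode, cCoverType
    )
    SELECT SUM(Data)
    FROM cteSource

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF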



  • Or maybe this, since PARTITION BY generally seems to be pretty fast in SQL Server:

    ;WITH cte1 AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY vSurname, cPostcode, cCoverType
                                  ORDER BY vSurname) AS rownum
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
        AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
        AND cTransType = 'FQ'
    )
    SELECT COUNT(*)
    FROM cte1
    WHERE rownum = 1    -- one row per surname/postcode/cover-type group

    I'd be afraid that the concatenation may be a bit slow, since string manipulation is definitely a weak point for SQL.

    Scott Pletcher, SQL Server MVP 2008-2010

  • And the results are in!

    My query was the slowest (as was expected, lol): most common speed 156ms, with a low of 141ms and a high of 196ms.

    3rd place goes to Peso, whose query most commonly executes in 141ms, with a low of 130ms and a high of 156ms.

    2nd is Scott; the PARTITION BY query can be very fast but is not reliably so - I get even results of either 125ms or 141ms, with the occasional high of 170ms.

    1st place goes to Drew, with a most common time of 125ms, a low of 110ms and a high of 125ms.

    Here is your prize:

    Thanks guys 🙂

    As always, very impressed by the great community spirit here.

    Ben


  • BenWard (10/29/2010)


    And the results are in!

    My query was the slowest (as was expected, lol): most common speed 156ms, with a low of 141ms and a high of 196ms.

    3rd place goes to Peso, whose query most commonly executes in 141ms, with a low of 130ms and a high of 156ms.

    2nd is Scott; the PARTITION BY query can be very fast but is not reliably so - I get even results of either 125ms or 141ms, with the occasional high of 170ms.

    1st place goes to Drew, with a most common time of 125ms, a low of 110ms and a high of 125ms.

    Ahem... how did the derived table compare with the CTE? 😉


  • erm....

    Msg 8155, Level 16, State 2, Line 6

    No column was specified for column 1 of 'd'.

    :Whistling:
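
    (That message is because the derived table's constant column has no name; a minimal fix, keeping everything else the same, is to alias it:)

    SELECT COUNT(*)
    FROM (
        SELECT 1 AS n               -- name the constant column so 'd' compiles
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
        AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
        AND cTransType = 'FQ'
        GROUP BY vSurname, cPostcode, cCoverType
    ) d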

    Ben


  • SELECT SUM(1) AS Items
    FROM dbo.tblHhiQuotes
    WHERE sGlDate <= '28 Oct 2010'
    AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
    AND cTransType = 'FQ'
    GROUP BY vSurname,
        cPostcode,
        cCoverType
    WITH ROLLUP
    HAVING GROUPING(vSurname) = 1


