Count(distinct columnName) Problem

  • Hi,

    I am new to T-SQL and I have the following problem to solve.  I have a table (TestInput) with following columns:

    TestInputId [numeric](10, 0) IDENTITY (1, 1) NOT NULL ,

    SerialNumber [varchar] (10) ,                                               

    EngineType [varchar] (10) ,

    ComponentId [numeric](4, 0) ,

    DefectId [numeric](4, 0) ,

    Direct [bit] ,

    DateEntered [datetime] ,

    TimeEntered [datetime]

    I need to find the number of engines (distinct SerialNumber), Number of direct runners (records where direct = 1), direct runner percentage, and date (DateEntered) for a specific date range. Grouped by DateEntered.

    Any suggestions will be helpful.

  • select count(distinct SerialNumber),sum(cast(Direct as int)),

    (sum(cast(Direct as int)) * 100) / count(distinct SerialNumber)

    from TestInput

    group by DateEntered

    Note: Direct count is all direct including non distinct Serial number !!

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Thank you for your quick reply. This is what I came up with:

    SELECT     COUNT(DISTINCT A.serialnumber) AS Direct, COUNT(DISTINCT B.SerialNumber) AS Total, COUNT(DISTINCT A.serialnumber)

                          * 100 / COUNT(DISTINCT B.SerialNumber) AS Percentage, B.DateEntered

    FROM         (SELECT DISTINCT serialnumber

                           FROM          tblTestInput

                           WHERE      direct = 1 AND DateEntered BETWEEN '5/2/2004' AND '5/31/2004') A RIGHT OUTER JOIN

                          dbo.tblTestInput B ON A.serialnumber = B.SerialNumber

    WHERE     (B.DateEntered BETWEEN '5/2/2004' AND '5/31/2004')

    GROUP BY B.DateEntered

    ORDER BY B.DateEntered

    But I like your query better.  Would you mind explaining it to me, so that I can understand what is going on during execution.

  • OK, my solution did not distinct the SerialNumbers for Directs. Your solution will give an erroneous Direct count if there are no Directs for a day.

    Try this

    SELECT SUM(Direct) AS [Direct],

    SUM(Total) AS [Total],

    (SUM(Direct) * 100) / SUM(Total) AS [Percentage],

    DateEntered

    FROM (

    SELECT 1 AS [Total],

    MAX(CAST(Direct as int)) AS [Direct],

    DateEntered

    FROM tblTestInput

    WHERE DateEntered BETWEEN '5/2/2004' AND '5/31/2004'

    GROUP BY DateEntered,SerialNumber

    ) a

    GROUP BY DateEntered

    ORDER BY DateEntered

    What this does is to firstly, as a subquery, get for each unique serialnumber per date (within date range), a 1 (for counting the total) and the max value of Direct (which will be either 0 if none or 1 if there is at least one) for counting the Directs.

    From the results of the subquery we sum (add up) the total, the directs and calc the percentage.

    This should work OK and has a neater execution plan than any other solution I tried.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Thank you very much for your solution.  The query runs beautifully.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply