Quering data from same table

  • CREATE TABLE SampleData

    (

    ID INTEGER PRIMARY KEY,

    UID INTEGER NOT NULL,

    Stage INTEGER NOT NULL,

    Status INTEGER NOT NULL,

    Date DATETIME NOT NULL

    );

    GO

    INSERT SampleData

    (ID,UID,Stage,Status,Date)

    VALUES (01, 1, 1, 2, '12 Jan 2010'),

    (02, 2, 2, 2, '13 Jan 2010'),

    (03, 1, 1, 3, '13 Jan 2010'),

    (04, 3, 2, 1, '20 Jan 2010'),

    (05, 1, 2, 2, '25 Jan 2010'),

    (06, 2, 1, 1, '25 Jan 2010'),

    (07, 1, 2, 2, '02 Feb 2010'),

    (08, 1, 1, 1, '02 Feb 2010'),

    (09, 2, 2, 3, '12 Feb 2010'),

    (10, 3, 2, 2, '12 Feb 2010'),

    (11, 2, 1, 1, '12 Feb 2010'),

    (12, 1, 2, 3, '12 Feb 2010'),

    (13, 2, 1, 2, '12 Feb 2010'),

    (14, 2, 2, 3, '22 Feb 2010');

    GO

    From table I want for given date range each user wise Stage & Status performance Where I want count for 1. stage 1 & status 3 2. stage 2 & status 2 3. stage 2 & status 3 4. stage 3 & status 3

    only right now my query is

    Select UID,

    sum( (case Stage when 1 then (case Status when 3 then 1 else 0 end ) else 0 end )) as Stage13,

    sum( (case Stage when 2 then (case Status when 2 then 1 else 0 end ) else 0 end ) ) as Stage22,

    sum( (case Stage when 2 then (case Status when 3 then 1 else 0 end ) else 0 end ) ) as Stage23,

    sum( (case Stage when 3 then (case Status when 3 then 1 else 0 end ) else 0 end ) ) as Stage33

    from SampleData where date between # and #

    group by UID

    i have morethen 1,00,00,000 datarow where performance is very slow any other method??

  • You know, the people that help out here are all un-paid volunteers. Providing the DDL scripts (CREATE TABLE, CREATE INDEX, etc.) for the tables affected, and INSERT statements to put some test data into those tables that shows your problem will go a long way in getting people to look at your issue and help you out. Please include code for what you have already tried. Don't forget to include what your expected results should be, based on the sample data provided. As a bonus to you, you will get tested code back. For more details on how to get all of this into your post, please look at the first link in my signature.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Indexed view.

  • To expand on my previously reply slightly...

    USE tempdb;

    GO

    IF OBJECT_ID(N'dbo.PreAggregatedSampleData', N'V') IS NOT NULL DROP VIEW dbo.PreAggregatedSampleData;

    IF OBJECT_ID(N'dbo.SampleData', N'U') IS NOT NULL DROP TABLE dbo.SampleData;

    GO

    CREATE TABLE dbo.SampleData

    (

    id INTEGER PRIMARY KEY,

    [user_id] INTEGER NOT NULL,

    stage INTEGER NOT NULL,

    [status] INTEGER NOT NULL,

    [date] DATETIME2 NOT NULL

    );

    GO

    INSERT dbo.SampleData

    (id ,[user_id], [stage], [status], [date])

    VALUES (01, 1, 1, 2, '12 Jan 2010'),

    (02, 2, 2, 2, '13 Jan 2010'),

    (03, 1, 1, 3, '13 Jan 2010'),

    (04, 3, 2, 1, '20 Jan 2010'),

    (05, 1, 2, 2, '25 Jan 2010'),

    (06, 2, 1, 1, '25 Jan 2010'),

    (07, 1, 2, 2, '02 Feb 2010'),

    (08, 1, 1, 1, '02 Feb 2010'),

    (09, 2, 2, 3, '12 Feb 2010'),

    (10, 3, 2, 2, '12 Feb 2010'),

    (11, 2, 1, 1, '12 Feb 2010'),

    (12, 1, 2, 3, '12 Feb 2010'),

    (13, 2, 1, 2, '12 Feb 2010'),

    (14, 2, 2, 3, '22 Feb 2010');

    GO

    CREATE VIEW dbo.PreAggregatedSampleData

    WITH SCHEMABINDING

    AS

    SELECT [date], [user_id], stage, [status], COUNT_BIG(*) AS group_row_count

    FROM dbo.SampleData

    GROUP BY

    [date], [user_id], stage, [status]

    GO

    CREATE UNIQUE CLUSTERED INDEX c

    ON dbo.PreAggregatedSampleData ([date], [user_id], [stage], [status]);

    GO

    SELECT Pivoted.[user_id],

    Pivoted.Stage11,

    Pivoted.Stage12,

    Pivoted.Stage13,

    Pivoted.Stage21,

    Pivoted.Stage22,

    Pivoted.Stage23

    FROM (

    -- Pre-aggregated data from the indexed view

    SELECT [user_id],

    column_name = 'Stage' + CONVERT(VARCHAR(2), [stage] * 10 + [status]),

    row_count = COUNT_BIG(*)

    FROM dbo.PreAggregatedSampleData SDB

    WHERE date BETWEEN '1 Jan 2010' and '1 Mar 2010'

    GROUP BY

    [user_id], [stage], [status]

    ) T

    PIVOT (

    -- Reformat into the desired columns

    SUM(row_count)

    FOR column_name IN

    (

    [Stage11],

    [Stage12],

    [Stage13],

    [Stage21],

    [Stage22],

    [Stage23]

    )

    ) Pivoted;

    GO

    IF OBJECT_ID(N'dbo.PreAggregatedSampleData', N'V') IS NOT NULL DROP VIEW dbo.PreAggregatedSampleData;

    IF OBJECT_ID(N'dbo.SampleData', N'U') IS NOT NULL DROP TABLE dbo.SampleData;

    GO

    -- END SCRIPT

    Output:

    user_idStage11Stage12Stage13Stage21Stage22Stage23

    1111NULL21

    221NULLNULL12

    3NULLNULLNULL11NULL

    Paul

  • thanks Wayne,

    I corrected my mistake..

  • Hi paul

    If i create index on same table is there any thing wrong.. insted of view index.

    Regards,

    Pritesh

  • pritesh-807803 (2/21/2010)


    If i create index on same table is there any thing wrong.. insted of view index.

    Pritesh,

    The point of an indexed view is that it automatically keeps partial aggregations up to date. The idea is that summing partial aggregations is potentially much faster than adding all the rows individually. Of course, maintaining the partial aggregations is not for free, so you need to factor that in to your considerations.

    Adding an index to the base table might not buy you much...you still need to add the rows up individually. I had rather assumed that you had already created an optimal index on the base table.

    If you are unsure what the purpose of an indexed view is compared to a normal view, start your research with the following Books Online entry: Types of Views

    This is a fairly advanced topic, so you need to do some investigation to make sure you understand the issues before making a decision.

    Paul

  • Rather than aggregating and analysing your data in a single step, run a full-speed preaggregate, then analyse the results. All that CASEing and SUMing one row at a time will slow things down. You can run the aggregate result into a temp table or you can use a derived table like this:

    SELECT UID,

    SUM( (case Stage when 1 then (case [Status] when 3 then NumRows else 0 end ) else 0 end ) ) as Stage13,

    SUM( (case Stage when 2 then (case [Status] when 2 then NumRows else 0 end ) else 0 end ) ) as Stage22,

    SUM( (case Stage when 2 then (case [Status] when 3 then NumRows else 0 end ) else 0 end ) ) as Stage23,

    SUM( (case Stage when 3 then (case [Status] when 3 then NumRows else 0 end ) else 0 end ) ) as Stage33

    FROM (

    SELECT UID, Stage, [Status], COUNT(*) AS NumRows

    FROM #SampleData

    WHERE [date] between # and #

    GROUP BY UID, Stage, [Status]

    ) d

    GROUP BY UID


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • That's nice code Chris. It represents a nice compromise between the benefits of an indexed view, and the drawbacks related to the maintenance of aggregates for that view. It ultimately won't be as fast as the indexed view approach, but it may be fast enough, and simpler, for this requirement. I like it.

    Paul

  • Paul White (2/21/2010)


    That's nice code Chris. It represents a nice compromise between the benefits of an indexed view, and the drawbacks related to the maintenance of aggregates for that view. It ultimately won't be as fast as the indexed view approach, but it may be fast enough, and simpler, for this requirement. I like it.

    Paul

    Thanks Paul, that's very kind.

    There's a very similar scenario at work, with three aggregated columns instead of two over a table of around 60M rows. It takes about a minute for an unrestricted aggregate (then about 5 seconds for the analysis) that only happens once-ish a year, the rest of the time it's say 50k rows preaggregated, processed & returned in an instant. It runs like a three-legged dog as a single-step process, taking many many times longer than the combined time of both steps run separately.

    We may well have to move to an indexed view as the table increases in size.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply