quantize ?

  • I have a very simple table:

    datetime yyyy mm dd hh mm ss server session app user

    with multiple entries every 15 minutes. Is there a way in T-SQL to ask a question like

    "how many sessions per user per server per hour" and produce a report that looks something like this:

    hours             server      sessions

    00:01 - 12:00  server1    25

    00:01 - 12:00  server2    45

    .......

    12:01 - 13:00  server1    12

    12:01 - 13:00  server2    23

  • Mark,

    Does your table already split the date components into separate columns, or did you mean to write it as:

    datetime (yyyy:mm:dd:hh:mm.ss), server, session, app user

    ll

  • I had to do something similar for a table I have that holds captured database sizes; I used the following:

    ds_capturedDateTime = DateTime column

    ds_server = Server name

    ds_createdBy = Username

    SELECT DATEPART(year, ds_capturedDateTime), DATEPART(month, ds_capturedDateTime), DATEPART(day, ds_capturedDateTime), COUNT(*) AS DateCount FROM dbo.dbsize

     GROUP BY DATEPART(year, ds_capturedDateTime), DATEPART(month, ds_capturedDateTime), DATEPART(day, ds_capturedDateTime)

     ORDER BY DATEPART(year, ds_capturedDateTime) DESC, DATEPART(month, ds_capturedDateTime) DESC, DATEPART(day, ds_capturedDateTime) DESC

    You will need to change this to GROUP BY your server name as well, so maybe along the lines of (not tested):

    SELECT ds_server, DATEPART(year, ds_capturedDateTime) AS [Year], DATEPART(month, ds_capturedDateTime) AS [Month], DATEPART(day, ds_capturedDateTime) AS [Day], DATEPART(hour, ds_capturedDateTime) AS [Hour], DATEPART(minute, ds_capturedDateTime) AS [Minute], COUNT(*) AS UserCount FROM dbo.dbsize

          GROUP BY ds_server, DATEPART(year, ds_capturedDateTime), DATEPART(month, ds_capturedDateTime), DATEPART(day, ds_capturedDateTime), DATEPART(hour, ds_capturedDateTime), DATEPART(minute, ds_capturedDateTime)

          ORDER BY DATEPART(year, ds_capturedDateTime) DESC, DATEPART(month, ds_capturedDateTime) DESC, DATEPART(day, ds_capturedDateTime) DESC, DATEPART(hour, ds_capturedDateTime) DESC, DATEPART(minute, ds_capturedDateTime) DESC, ds_server

    Sample Resultset:

    ds_server, Year, Month, Day, Hour, Minute, UserCount

    NESTA-SQL01 2006 10 11 20 0 22

    NESTA-SQL01 2006 10 10 20 0 22

    NESTA-SQL01 2006 10 9 20 0 21

    NESTA-SQL01 2006 10 8 20 0 21

    NESTA-SQL01 2006 10 7 20 0 21

    NESTA-SQL01 2006 10 6 20 0 21

    NESTA-SQL01 2006 10 5 20 0 21

    NESTA-SQL01 2006 10 4 20 0 21

    NESTA-SQL01 2006 10 3 20 0 21

    NESTA-SQL01 2006 10 2 20 0 21

    NESTA-SQL01 2006 10 1 20 0 21

    NESTA-SQL01 2006 9 30 20 0 21

    NESTA-SQL01 2006 9 29 20 0 21

    NESTA-SQL01 2006 9 28 20 0 21

    NESTA-SQL01 2006 9 27 20 0 21

    NESTA-SQL01 2006 9 26 20 0 21

    NESTA-SQL01 2006 9 25 20 0 21

    NESTA-SQL01 2006 9 24 20 0 21

    NESTA-SQL01 2006 9 23 20 0 21

    You could then make this query a view and simply pull out the columns you need, based on what you require.
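    A minimal sketch of that view (same dbo.dbsize columns as above, minutes dropped since only hours are needed; not tested):

    CREATE VIEW dbo.dbsizeView
    AS
    SELECT ds_server,
           DATEPART(year,  ds_capturedDateTime) AS [Year],
           DATEPART(month, ds_capturedDateTime) AS [Month],
           DATEPART(day,   ds_capturedDateTime) AS [Day],
           DATEPART(hour,  ds_capturedDateTime) AS [Hour],
           COUNT(*) AS ServerCount
      FROM dbo.dbsize
     GROUP BY ds_server, DATEPART(year, ds_capturedDateTime), DATEPART(month, ds_capturedDateTime),
           DATEPART(day, ds_capturedDateTime), DATEPART(hour, ds_capturedDateTime)

    The pull-out query then just filters on the view's columns, i.e.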

    SELECT ds_server, [Hour], ServerCount FROM dbsizeView

       WHERE [Hour] = 12

           AND [Day] = DATEPART(day, GETDATE())

           AND [Month] = DATEPART(month, GETDATE())

           AND [Year] = DATEPART(year, GETDATE())

    I hope this is kind of what you were after; I lost my way a bit in the middle, but I think it will do roughly what you were asking for.

    ll 

  • No, the hours, minutes, etc. are already split out.

    Aaaah... a separate view sounds smart...

    What I was wondering is whether some smart CASE statement might work... (we'll ignore the minutes issue for now 😉)

    Pseudo-SELECT follows:

    SELECT

    count(CASE

    WHEN hours BETWEEN 0 AND 1 THEN sessionid

    END) as [1],

    count(CASE

    WHEN hours BETWEEN 1 AND 2 THEN sessionid

    END) as [2]

    FROM log some where clause group by server, whatever
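    Tidied up, that approach would run as something like this (a rough sketch: log, hours and sessionid are just the placeholder names from above, the buckets are made non-overlapping, and COUNT skips the NULLs the CASE produces for other hours):

    SELECT server,
           COUNT(CASE WHEN hours = 0 THEN sessionid END) AS [00-01],
           COUNT(CASE WHEN hours = 1 THEN sessionid END) AS [01-02]
           -- ...and so on for the remaining hours...
      FROM log
     -- WHERE <your date criteria here>
     GROUP BY server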

    But I think journeyman's suggestion seems sensible; given that this is going to pile up 250,000 (small) records a day, creating summary tables/views will be essential anyway.

    Thanks All!

  • Oh no, no, no.... this is a lot easier than that... but I need you to post the CREATE statement for the table...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Add a computed column:

    = dateadd(hh, DATEPART(hh, Entrytime), dateadd(dd, datediff(dd, 0, Entrytime), 0) ) AS HourStart

    and create an index on it.

    After that you may group by that column.
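    In full, something along these lines (not tested; dbo.YourLog and Entrytime are placeholders for your actual table and datetime column, and the expression is deterministic, so the computed column can be indexed without being PERSISTED):

    ALTER TABLE dbo.YourLog
      ADD HourStart AS dateadd(hh, DATEPART(hh, Entrytime), dateadd(dd, datediff(dd, 0, Entrytime), 0))

    CREATE INDEX IX_YourLog_HourStart ON dbo.YourLog (HourStart)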

    _____________
    Code for TallyGenerator

  • Kind of where I was going with this, but it's important, I think, to include the date columns in that so that you end up with a nice, whole datetime as the computed column.

    Great minds think alike...

    --Jeff Moden



  • CREATE TABLE [citrixlog] (

     [TimeGenerated] [datetime] NULL ,

     [GeneratedYear] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

     [GeneratedDay] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

     [GeneratedMon] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

     [GeneratedHour] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

     [GeneratedMin] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

     [GeneratedSec] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

     [ServerName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

     [SessionID] [int] NULL ,

     [ClientName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

     [AppName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

     [UserName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL

    ) ON [PRIMARY]

     

  • 1) Why are all those Generated... columns varchar(255)?

    Just in case?

    Tinyint would be absolutely enough, except for Year, where smallint would be required.

    2) Drop all those useless columns.

    3) Create the one column you really need:

    GeneratedOnHour AS dateadd(hh, DATEPART(hh, TimeGenerated), dateadd(dd, datediff(dd, 0, TimeGenerated), 0) )

    4) Select COUNT(*) AS NumberOfConnections, GeneratedOnHour

    FROM citrixlog

    GROUP BY GeneratedOnHour

    _____________
    Code for TallyGenerator

  • Mark,

    Thanks for posting the table schema...

    Sorry to plague you with questions, but I'm trying to do this in a very simple fashion... I see that the TimeGenerated column is a DATETIME column... does it contain both a date and a time, or just the time?

    --Jeff Moden



  • Plague away 😉

    It contains both date and time; there is a script that churns away every 15 minutes and updates it with data from multiple servers. The questions we are trying to answer are things like: what is the avg number of sessions per user, avg/max sessions/users per server, and avg/max sessions/users per application, so I tried to gather as much data as possible (ClientName is actually probably irrelevant, but might assist us in troubleshooting).

    I am home now, but I can post some sample data Monday if that would help.

  • Help for what?

    CREATE TABLE [citrixlog] (

    [TimeGenerated] [datetime] NULL ,

    GeneratedOnHour AS (dateadd(hh, DATEPART(hh, TimeGenerated), dateadd(dd, datediff(dd, 0, TimeGenerated), 0) ) ) ,

    [SessionID] [int] NULL ,

    [ClientName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

    [AppName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

    [UserName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL

    ) ON [PRIMARY]

    GO

    Select COUNT(*) AS NumberOfConnections, GeneratedOnHour

    FROM citrixlog

    GROUP BY GeneratedOnHour
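    And for the per-server / per-user style questions, the same GROUP BY just grows (a sketch, assuming ServerName from the original table is kept):

    Select ServerName, GeneratedOnHour,
           COUNT(*) AS NumberOfConnections,
           COUNT(DISTINCT UserName) AS NumberOfUsers
    FROM citrixlog
    GROUP BY ServerName, GeneratedOnHour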

    _____________
    Code for TallyGenerator

  • Thanks Mark,

    I finally have enough information to help... with great zeal, I may add...

    First, as Serqiy suggested, you don't need all those "generated" columns.  He suggested keeping one of them for "Hour" but I'm here to tell you, you don't need any of them.  Why?  Because you have that wonderful little "TimeGenerated" column which you said contains the date AND the time. 

    Second, and this is just some advice: you assumed that you were doing something right with all of those generated columns and left out the fact that you actually had a full datetime column in your first couple of posts.  When you ask for help, you might not want to assume that you're doing anything right at all (not a slam, just a suggestion), and make sure you post the full schema for the table.  That way, folks like me can come up with alternate suggestions that just might surprise the dickens out of you.

    Third, "We don't need no stinkin' test data"   We make our own But, thanks for offering

    So, on with the show.  Whether you decide to eliminate the generated columns as suggested or not, it won't make a bit of difference in the following.

    Why don't I need test data?  Here's why... read the comments please...

    --===== Create a test table that we can play with

     CREATE TABLE dbo.JBMTestCitrixLog

            (

            TimeGenerated DATETIME NULL,

            ServerName    VARCHAR(255) NULL,

            SessionID     INT          NULL,

            ClientName    VARCHAR(255) NULL,

            AppName       VARCHAR(255) NULL,

            UserName      VARCHAR(255) NULL

            )

    --===== Populate the test table with a million rows over 30 days for performance testing

     INSERT INTO dbo.JBMTestCitrixLog

            (

            TimeGenerated,

            ServerName,

            SessionID,

            ClientName,

            AppName,

            UserName

            )

     SELECT TOP 1000000

            TimeGenerated = CAST(RAND(CAST(NEWID() AS BINARY(16)))*30+39000 AS DATETIME),

            ServerName    = 'Some Server - '+CHAR(RAND(CAST(NEWID() AS BINARY(16)))*26+65),

            SessionID     = CAST(RAND(CAST(NEWID() AS BINARY(16)))*5000+1 AS INT),

            ClientName    = 'Some Client - '+CHAR(RAND(CAST(NEWID() AS BINARY(16)))*26+65)

                                            +CHAR(RAND(CAST(NEWID() AS BINARY(16)))*26+65)

                                            +CHAR(RAND(CAST(NEWID() AS BINARY(16)))*26+65),

            AppName       = 'Some App - '   +CHAR(RAND(CAST(NEWID() AS BINARY(16)))*26+65)

                                            +CHAR(RAND(CAST(NEWID() AS BINARY(16)))*26+65)

                                            +CHAR(RAND(CAST(NEWID() AS BINARY(16)))*26+65),

            UserName      = 'Some User - '  +CHAR(RAND(CAST(NEWID() AS BINARY(16)))*26+65)

                                            +CHAR(RAND(CAST(NEWID() AS BINARY(16)))*26+65)

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    As a sidebar, you could probably learn a trick or two from the code above on how to randomly generate large volumes of test data in a hurry... the above code builds the million-row table across 30 days in about 56 seconds.

    Ok, in the following, I've tried adding primary keys and indexes... they don't really make the code run that much faster, so I don't think you need to worry about such things in this case.  If you have a Primary Key or indexes, they won't hurt, either (sidebar: every table should have a Primary Key, even if it's a very artificial IDENTITY column... a quick sketch of that follows).
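    For instance, something like this on the test table above (a sketch; name the column and constraint whatever suits your standards):

     ALTER TABLE dbo.JBMTestCitrixLog
       ADD CitrixLogID INT IDENTITY(1,1) NOT NULL
           CONSTRAINT PK_JBMTestCitrixLog PRIMARY KEY CLUSTERED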

    To give you exactly what you asked for in your first post (your output example), you can run the following simple code and your boss will probably be happy... do note that it's sorted by date and hour across multiple days and that you could add some date criteria to limit the return to a given day, week, or month quite easily...

     SELECT SessionDate = CONVERT(CHAR(11),d.SessionDate,100),

            Hours       = REPLACE(STR(d.HourSlot  ,2),' ','0') + ':00 - '

                        + REPLACE(STR(d.HourSlot+1,2),' ','0') + ':00',

            d.ServerName,

            Sessions    = COUNT(*)

       FROM (--Derived table "d" isolates and preformats the data we want

             SELECT DATEADD(dd,DATEDIFF(dd,0,TimeGenerated),0) AS SessionDate,

                    DATEPART(hh,TimeGenerated) AS HourSlot,

                    ServerName

               FROM dbo.JBMTestCitrixLog WITH (NOLOCK)

            ) d

      GROUP BY d.SessionDate, d.HourSlot, d.ServerName

      ORDER BY d.SessionDate, d.HourSlot, d.ServerName

    Without any date criteria, it returns the summary by date and hour of the day... but the damned thing still returns over 18,000 rows just for the 30 days... a little tough to analyze even if you were to dump the output into yet another table... it does, however, do it all in only about 14 seconds.  Guess we're done here, huh? Not even...

    With a little more code (including a bit of seemingly repetitious code), we can actually make a pretty useful report AND it includes totals by hour of each day, a daily/hourly average, and total for the day FOR EACH SERVER!!  AND, it includes overall daily totals for each day and a grand total, all by hour!!!

    ... here's the code... takes about 26 seconds on all one-million rows over the 30 days...

     SELECT SessionDate = CASE

                            WHEN GROUPING(d.TheDate) = 0

                            THEN CONVERT(CHAR(11),d.TheDate,100)

                            ELSE CHAR(254)+'Grand Total'

                          END,

            ServerName  = CASE

                            WHEN GROUPING(d.ServerName) =0

                            AND GROUPING(d.TheDate)=0

                            THEN d.ServerName

                            WHEN GROUPING(d.TheDate)=1

                            THEN ''

                            ELSE CHAR(222)+'Daily Total'

                          END,

            '12:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 00 THEN 1 ELSE 0 END),8),

            '01:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 01 THEN 1 ELSE 0 END),8),

            '02:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 02 THEN 1 ELSE 0 END),8),

            '03:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 03 THEN 1 ELSE 0 END),8),

            '04:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 04 THEN 1 ELSE 0 END),8),

            '05:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 05 THEN 1 ELSE 0 END),8),

            '06:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 06 THEN 1 ELSE 0 END),8),

            '07:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 07 THEN 1 ELSE 0 END),8),

            '08:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 08 THEN 1 ELSE 0 END),8),

            '09:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 09 THEN 1 ELSE 0 END),8),

            '10:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 10 THEN 1 ELSE 0 END),8),

            '11:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 11 THEN 1 ELSE 0 END),8),

            '12:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 12 THEN 1 ELSE 0 END),8),

            '01:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 13 THEN 1 ELSE 0 END),8),

            '02:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 14 THEN 1 ELSE 0 END),8),

            '03:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 15 THEN 1 ELSE 0 END),8),

            '04:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 16 THEN 1 ELSE 0 END),8),

            '05:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 17 THEN 1 ELSE 0 END),8),

            '06:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 18 THEN 1 ELSE 0 END),8),

            '07:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 19 THEN 1 ELSE 0 END),8),

            '08:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 20 THEN 1 ELSE 0 END),8),

            '09:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 21 THEN 1 ELSE 0 END),8),

            '10:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 22 THEN 1 ELSE 0 END),8),

            '11:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 23 THEN 1 ELSE 0 END),8),

            Total       = STR(COUNT(*),8),

            [Avg/Hr]    = CASE

                            WHEN GROUPING(d.TheDate) = 0

                            THEN STR(COUNT(*)/24,8)

                            ELSE 'N/A'

                          END

       FROM (--Derived table "d" isolates and preformats the data we want

             SELECT DATEADD(dd,DATEDIFF(dd,0,TimeGenerated),0) AS TheDate,

                    DATEPART(hh,TimeGenerated) AS HourSlot,

                    ServerName

               FROM dbo.JBMTestCitrixLog WITH (NOLOCK)

            ) d

      GROUP BY d.TheDate, d.ServerName WITH ROLLUP

      ORDER BY SessionDate, ServerName

    Again, you don't have to return everything... you can add date criteria (preferably to the derived table code to drastically reduce the number of records processed) to create a report for any given day (yesterday is always a favorite), the current week, the previous week, or the whole month.  With a bit of imagination on your part, you could modify the code above to generate all sorts of reports for number of users per hour, etc, etc.
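    For example, "yesterday" is just a WHERE clause inside the derived table (a sketch, untested):

             SELECT DATEADD(dd,DATEDIFF(dd,0,TimeGenerated),0) AS TheDate,
                    DATEPART(hh,TimeGenerated) AS HourSlot,
                    ServerName
               FROM dbo.JBMTestCitrixLog WITH (NOLOCK)
              WHERE TimeGenerated >= DATEADD(dd,DATEDIFF(dd,0,GETDATE()),-1) --midnight, yesterday
                AND TimeGenerated <  DATEADD(dd,DATEDIFF(dd,0,GETDATE()), 0) --midnight, today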

    Now, I have a bit of a warning for you... you will undoubtedly run into some well-intentioned folks that will be rather adamant about not using SQL Server as a reporting tool.  They'll say something incredibly sincere yet somewhat ill-informed like "You should do that in the 'Application'".  Creating a report is nothing more than simple data manipulation, and SQL Server is very, very good at that.  More folks should believe that SQL Server IS the application!

    That's when they'll play their trump card... "Oh yeah?  Well what if you need it as a web page or just want to make it pretty?  You'll need an application for that!"

    Not true... take a look at sp_MakeWebTask in Books Online (from Query Analyzer, click on [Help][Transact-SQL Help] and then follow your nose).  You can quickly make a simple HTML file using the above code.  Study a little further and you'll find that you can make a "template" where you can adjust line sizes, cell spacing, add titles/subtitles/dates, etc, etc.
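    Something as bare-bones as this will spit out an HTML page of the hourly summary (a sketch; the output path is just an example and only the basic parameters are shown... check Books Online for the template options):

     EXEC sp_makewebtask
          @outputfile   = 'C:\Temp\CitrixSessionsByHour.htm',
          @query        = 'SELECT DATEADD(dd,DATEDIFF(dd,0,TimeGenerated),0) AS SessionDate,
                                  DATEPART(hh,TimeGenerated) AS HourSlot,
                                  ServerName,
                                  COUNT(*) AS Sessions
                             FROM dbo.JBMTestCitrixLog WITH (NOLOCK)
                            GROUP BY DATEADD(dd,DATEDIFF(dd,0,TimeGenerated),0),
                                  DATEPART(hh,TimeGenerated), ServerName',
          @webpagetitle = 'Citrix Sessions by Hour'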

    And it's all done without an "application"

    If you have any more questions concerning your reporting problem using the CitrixLog table, please don't hesitate to ask.  "We're all in this together and I'm pullin' for ya" (Red Green).

     

    --Jeff Moden



  • Jeff, are you sure 26 seconds is good enough?

    Especially compared to under 1 second from a computed column?

    Not everyone is lucky enough to have 26 seconds to wait.

    I have a requirement to supply an Account Summary report from 4 million rows in 5 seconds. I mean not just return it from the database, but display it on a web page.

    So, your approach definitely would not work for me.

    _____________
    Code for TallyGenerator

  • Serqiy, my old friend... I'm all ears... there's code in the post to gen a million rows... can you show us how to get a report similar to the cross tab I posted in 5 seconds?  And no, I'm not trying to be a smart aleck... I think there's something to learn from you, if you don't mind... Thank you, Sir!

    --Jeff Moden


