Searching for date and weekday counts within Nvarchar column

  • Hi all,

    I have an NVarchar column containing assorted data including dates and times, typically in the following format

       '02-25-2017, 04:03 AM'

    I want to generate 2 result sets (or 1 if it's smarter)

    Result Set 1
    'HourOfPosting', 'PercentageOfTotal',
    -------------------------------------------------
    10-11, 4.3%
    11-12, 5.2%

    The nearest I've gotten is

    SELECT Count (*) As '10-11'
    FROM [Practice].[dbo].[NarratorPostingTimes]
    WHERE Entries LIKE '%10:% AM'

    But I don't know how to repeat the count for every hour without writing an additional SELECT for every hour.

    Result Set 2

    'DayOfPosting', 'PercentageOfTotal',
    ---------------------------------------------------
    Monday 14.2%
    Tuesday 15.7%

    Here I guess I'd have to cast the data to a DATETIME format, then extract the day of the week?  But I'm even more at sea here.

    Anyway, here's sample data from a SELECT TOP 6; as you can see, it's:

    02-28-2017, 04:03 AM

    TheNatty replied to a thread Piketted: Bottom 50% had 0 increase in 40 years

    nah i did read it.

    see more

    4 replies | 43 view(s)

    TheNatty's Avatar

  • Consumable data?

  • I have to ask, but why are you storing your data like this? I don't think the problem is really the need to try and get data about datetimes from an nvarchar column, but that the system shouldn't be set up like this, period. I would be more concerned about fixing your database structure.

    You also stated that times are "typically" stored in that format. So they could be in different formats? Could you, instead, provide proper sample data that covers all the scenarios we should expect to encounter, in a format we can use (INSERT statements), please? Without knowing every scenario we might encounter, a proposed answer may well not give you the desired output for all your eventualities.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Saturday, February 25, 2017 11:25 AM

    I have to ask, but why are you storing your data like this? I don't think the problem is really the need to try and get data about datetimes from an nvarchar column, but that the system shouldn't be set up like this, period. I would be more concerned about fixing your database structure.

    You also stated that times are "typically" stored in that format. So they could be in different formats? Could you, instead, provide proper sample data that covers all the scenarios we should expect to encounter, in a format we can use (INSERT statements), please? Without knowing every scenario we might encounter, a proposed answer may well not give you the desired output for all your eventualities.

    There's no 'instead' about the sample data - what I posted originally is what exists.  I can give you more of the same, but it will follow the same pattern: 5 rows of junk data followed by a date and time.  That isn't going to vary. 

    If you're saying I need to extract the datetime data and store it in a datetime column, I can make ANY changes necessary, and welcome suggestions detailing how to proceed.  It's why fora exist.

  • Without actually getting some useable data from you (more than one example), this is guess work. I've therefore assumed that every date time has the format of 'MM-dd-yyyy, hh:mm AM/PM'. You said they typically have this format though, so that implies there are other formats.

    I don't have a solution for you on how to create a proper time column because, as far as I know, your table has 1 column. You've made no reference to an ID field or anything. Does this work for you at all?
    CREATE TABLE #Sample (YourOnlyColumn VARCHAR(MAX));
    GO

    INSERT INTO #Sample
    VALUES
    ('02-28-2017, 11:03 PM'),
    ('TheNatty replied to a thread Piketted: Bottom 50% had 0 increase in 40 years'),
    ('nah i did read it.'),
    ('see more'),
    ('4 replies | 43 view(s)'),
    ('TheNatty''s Avatar');
    GO

    WITH Times AS (
      SELECT YourOnlyColumn AS PostDateTime
      FROM #Sample
      WHERE YourOnlyColumn LIKE '__-__-____, __:__ __')
    , TimeStrings AS(
      SELECT PostDateTime,
        SUBSTRING(PostDateTime, 7,4) +
        LEFT(PostDateTime, 2) +
        SUBSTRING(PostDateTime, 4, 2) + ' ' +   
        CASE WHEN RIGHT(PostDateTime,2) = 'PM' AND SUBSTRING(PostDateTime,13,2) != '12' THEN CAST(SUBSTRING(PostDateTime,13,2) + 12 AS VARCHAR(2))
          WHEN RIGHT(PostDateTime,2) = 'AM' AND SUBSTRING(PostDateTime,13,2) = '12' THEN '00'
          ELSE SUBSTRING(PostDateTime, 13,2)
          END + ':' +
        SUBSTRING(PostDateTime,16,2) + ':00.000' AS DateTimeString
      FROM Times)
    , ConvertTimes AS (
      SELECT CAST(DateTimeString AS datetime) AS DateTimeValue
      FROM TimeStrings)
    SELECT *
    FROM ConvertTimes;

    GO
    DROP TABLE #Sample;
    GO

    This converts the string you provided, laboriously, into a datetime. You then perform whatever you want to do on that value, like DATENAME, DATEPART, and get the MI you want.

    I don't expect the above to be quick on a big dataset.
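
    For example, here's a minimal sketch of that DATENAME/DATEPART step (the HourOfPosting, DayOfPosting and PercentageOfTotal names are just lifted from your desired output). Swap the final SELECT * above for one of these - one at a time, as the CTEs only feed a single statement:

    --Sketch only: hourly breakdown from the ConvertTimes CTE above
    SELECT HourOfPosting     = DATEPART(HOUR, DateTimeValue)
          ,PercentageOfTotal = CONVERT(decimal(5,2), COUNT(*) * 100.0 / SUM(COUNT(*)) OVER ())
    FROM ConvertTimes
    GROUP BY DATEPART(HOUR, DateTimeValue);

    --Sketch only: weekday breakdown
    SELECT DayOfPosting      = DATENAME(WEEKDAY, DateTimeValue)
          ,PercentageOfTotal = CONVERT(decimal(5,2), COUNT(*) * 100.0 / SUM(COUNT(*)) OVER ())
    FROM ConvertTimes
    GROUP BY DATENAME(WEEKDAY, DateTimeValue);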

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • just some thoughts.....


    CREATE TABLE #Sample (YourOnlyColumn VARCHAR(MAX));
    GO

    INSERT INTO #Sample
    VALUES
    ('02-28-2017, 04:03 AM'),
    ('TheNatty replied to a thread Piketted: Bottom 50% had 0 increase in 40 years'),
    ('nah i did read it.'),
    ('see more'),
    ('4 replies | 43 view(s)'),
    ('TheNatty''s Avatar');
    GO

    SELECT   TRY_CONVERT(TIME, REPLACE(YourOnlyColumn, ',', '')) AS aTIME
    FROM    #Sample
    WHERE   (TRY_CONVERT(TIME, REPLACE(YourOnlyColumn, ',', '')) IS NOT NULL)
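
    And, just as a sketch on top of that (same #Sample table and YourOnlyColumn as above, not tested against real data), bucketing by hour with the PercentageOfTotal column the OP asked for:

    SELECT   DATEPART(HOUR, TRY_CONVERT(TIME, REPLACE(YourOnlyColumn, ',', ''))) AS HourOfPosting
            ,CONVERT(DECIMAL(5,2), COUNT(*) * 100.0 / SUM(COUNT(*)) OVER ()) AS PercentageOfTotal
    FROM    #Sample
    WHERE   (TRY_CONVERT(TIME, REPLACE(YourOnlyColumn, ',', '')) IS NOT NULL)
    GROUP BY DATEPART(HOUR, TRY_CONVERT(TIME, REPLACE(YourOnlyColumn, ',', '')))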

    DROP TABLE #Sample;

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • If it were only so simple for me, J, but those American style dates don't work so well on my British PC;)

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Saturday, February 25, 2017 2:51 PM

    If it were only so simple for me, J, but those American style dates don't work so well on my British PC;)

    what errors you getting... I am Cornish...(part British unfortunately😛😛😛)

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL - Saturday, February 25, 2017 3:00 PM

    Thom A - Saturday, February 25, 2017 2:51 PM

    If it were only so simple for me, J, but those American style dates don't work so well on my British PC;)

    what errors you getting... I am Cornish...(part British unfortunately😛😛😛)

    Not so much an error as no results in the given example by the OP. I have my SQL set to British English, so a format of mm-dd-yyyy will give odd results, or fail when you have a day value of >= 13. As the OP's data has 02-28, it reads the date part as the 2nd day of the 28th month, which, as we both know, isn't going to convert 🙂
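
    For anyone hitting the same issue - one way around it, assuming the strings really are always month-first, is to pin the date-part order for the session rather than rely on the language setting. A rough sketch against J's #Sample table:

    SET DATEFORMAT mdy; --sketch only: force month-day-year interpretation for this session

    SELECT  TRY_CONVERT(DATETIME, REPLACE(YourOnlyColumn, ',', '')) AS aDateTime
    FROM    #Sample
    WHERE   TRY_CONVERT(DATETIME, REPLACE(YourOnlyColumn, ',', '')) IS NOT NULL;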

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • JaybeeSQL - Saturday, February 25, 2017 1:56 PM

    Thom A - Saturday, February 25, 2017 11:25 AM

    There's no 'instead' about the sample data - what I posted originally is what exists.  I can give you more of the same, but it will follow the same pattern: 5 rows of junk data followed by a date and time.  

    Jaybee... are your date/times always in the format you posted? ie.. NOTE the comma between "date" and "time"

    ('02-28-2017, 11:03 PM')

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Just a bit of an exercise to show what can be done, and some of the possible issues if the date format isn't always the same

    declare @tbl table
    (txtstr varchar(50)
    )
    insert into @tbl
     select ' 02-28-2017, 04:03 PM asb' as str
      -- 12345678901234567890
     union
     select ' 12-28-2017, 04:03 PM asb' as str
      -- 12345678901234567890
     union
     select '28-12-2017, 04:03 PM asb' as str
     -- -- 12345678901234567890
     union
     select ' 2-28-2017, 04:03 PM asb' as str
     -- -- 12345678901234567890
     union
     select ' 02/1/2017, 4:03 PM asb' as str
     -- -- 12345678901234567890
    select txtstr
    , dates.*
    , parts.*
    , pats.*
    from @tbl
    --where str like '%-%-%,%:%[aApP][mM]%'
    --outer apply (select replace(replace(txtstr, '-', '/'), ',' , ' ') as newtxtstr) newstr
    outer apply (select 'patindexes' as names
      , patindex('% [aApP][mM]%', txtstr) + 1 as ampmindex
      , patindex('%,%:%[aApP][mM]%', txtstr) + 2 as timeindex
      , patindex('%[0-9][0-9][-/]%[0-9][-/][0-9][0-9]%,%:%[aApP][mM]%', txtstr) as dateindex1 -- date with leading zero
      , patindex('%[0-9][-/]%[0-9_][-/][0-9][0-9]%,%:%[aApP][mM]%', txtstr) as dateindex2 -- date without leading zero

      ) as pats
    outer apply (select 'parts' as names
      ,rtrim(replace(replace(substring(txtstr, case when dateindex1 = 0 then dateindex2 else dateindex1 end, pats.timeindex - 2), '-', '/'), ',' , ' ')) as fulldate
      ,substring(txtstr, pats.timeindex, pats.ampmindex - pats.timeindex + 2) as fulltime

      ) as parts
    outer apply (select try_convert(date, parts.fulldate, 101) as month_day_year_date
      , try_convert(date, parts.fulldate, 103) as day_month_year_date
      , convert(time(0), parts.fulltime) as time
      ) as dates

    output

    txtstr                    month_day_year_date  day_month_year_date  time      names  fulldate    fulltime  names       ampmindex  timeindex  dateindex1  dateindex2
    02/1/2017, 4:03 PM asb    2017-02-01           2017-01-02           16:03:00  parts  02/1/2017   4:03 PM   patindexes  18         13         2           3
    02-28-2017, 04:03 PM asb  2017-02-28           NULL                 16:03:00  parts  02/28/2017  04:03 PM  patindexes  20         14         2           3
    12-28-2017, 04:03 PM asb  2017-12-28           NULL                 16:03:00  parts  12/28/2017  04:03 PM  patindexes  20         14         2           3
    2-28-2017, 04:03 PM asb   2017-02-28           NULL                 16:03:00  parts  2/28/2017   04:03 PM  patindexes  19         13         0           2
    28-12-2017, 04:03 PM asb  NULL                 2017-12-28           16:03:00  parts  28/12/2017  04:03 PM  patindexes  19         13         1           2

    The issue with the above is that, depending on the date format, we either do not have a valid date from the try_convert, or we get 2 different dates for the same input.

    As for your particular needs, the following would be the basis for your output:

    SELECT hourpart.header
    , Count (*)

    FROM @tbl
    outer apply (select patindex('% [aApP][mM]%', txtstr) + 1 as ampmindex
      , patindex('%,%:%[aApP][mM]%', txtstr) + 2 as timeindex
      ) as pats
    outer apply (select datepart(hour, convert(time(0), substring(txtstr, pats.timeindex, pats.ampmindex - pats.timeindex + 2))) as timehour
      ) as times
    outer apply (select *
      from (values (0, '0-1 AM') , (1, '1-2 AM') , (2, '2-3 AM') , (3, '3-4 AM') , (4, '4-5 AM') , (5, '5-6 AM')
       , (6, '6-7 AM') , (7, '7-8 AM') , (8, '8-9 AM') , (9, '9-10 AM') , (10, '10-11 AM'), (11, '11-12 AM')
       , (12, '0-1 PM'), (13, '1-2 PM'), (14, '2-3 PM'), (15, '3-4 PM') , (16, '4-5 PM') , (17, '5-6 PM')
       , (18, '6-7 PM'), (19, '7-8 PM'), (20, '8-9 PM'), (21, '9-10 PM'), (22, '10-11 PM'), (23, '11-12 PM')
      ) t(workhour, header)
      where times.timehour = t.workhour
      ) hourpart
    --Where patindex('%[0-9][-/]%[0-9_][-/][0-9][0-9]%,%:%[aApP][mM]%', txtstr) > 0
    where patindex('%,%:%[aApP][mM]%', txtstr) > 0
    group by hourpart.header

    Note the WHERE clauses above - depending on your data you only need one of them, but if there is a possibility that the second patindex picks up invalid records, then use the first instead.
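
    And if you also want the PercentageOfTotal column from the original post, one option - just a sketch on top of the same extraction, grouping on the raw hour rather than the header text - is a windowed SUM over the grouped counts:

    select times.timehour as HourOfPosting
     , cast(count(*) * 100.0 / sum(count(*)) over () as decimal(5,2)) as PercentageOfTotal
    from @tbl
    outer apply (select patindex('% [aApP][mM]%', txtstr) + 1 as ampmindex
      , patindex('%,%:%[aApP][mM]%', txtstr) + 2 as timeindex
      ) as pats
    outer apply (select datepart(hour, convert(time(0), substring(txtstr, pats.timeindex, pats.ampmindex - pats.timeindex + 2))) as timehour
      ) as times
    where patindex('%,%:%[aApP][mM]%', txtstr) > 0
    group by times.timehour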

  • frederico_fonseca - Sunday, February 26, 2017 6:00 AM

    Just a bit of an exercise to show what can be done, and some of the possible issues if the date format isn't always the same

    declare @tbl table
    (txtstr varchar(50)
    )
    insert into @tbl
     select ' 02-28-2017, 04:03 PM asb' as str
      -- 12345678901234567890
     union
     select ' 12-28-2017, 04:03 PM asb' as str
      -- 12345678901234567890
     union
     select '28-12-2017, 04:03 PM asb' as str
     -- -- 12345678901234567890
     union
     select ' 2-28-2017, 04:03 PM asb' as str
     -- -- 12345678901234567890
     union
     select ' 02/1/2017, 4:03 PM asb' as str
     -- -- 12345678901234567890

    where does 'asb' in your string come from?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

    just text added to ensure the code I was writing was picking up the correct values only, and not empty spaces afterwards

  • I tend to "over-engineer" a bit when it comes to the quality of data.  Yep... we need to know the hourly stuff but I'd also want to know how many bad rows (not of the expected type).  With that, here's a bit of a demonstration to produce the desired output plus a little bit more...

    First, we need some test data.  I never mess around with a small handful of rows because I also test for performance at the same time.  With that in mind, the following will generate a million rows of data with some good data, some bad data, and some unexpected data that's still "good enough".  As usual, details are in the comments in the code.  It also tests for indexing of the expected form of the table.

    /**********************************************************************************************************************
     Create a test table to simulate what may really happen.
     Nothing in this section is a part of the solution.  We're just building test data.
     Most of the data will follow the form of mm-dd-yyyy, hh:mi AM (or PM).
     Some of it will be a GUID to simulate bad data.
     Some of it will be in the correct form but missing the comma and the space just before the AM/PM indicator.
     This whole thing takes about 18 seconds to generate because rows 444 bytes in length have been simulated as well.
    **********************************************************************************************************************/
    --===== If the test table already exists, drop it to make reruns in SSMS easier
         IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL
       DROP TABLE #TestTable
    ;
    GO
    --===== Identify the range of dates we want to use for our test
    DECLARE  @StartDate     DATETIME = '2010'       --Inclusive, same as 2010-01-01
            ,@EndDate       DATETIME = GETDATE()    --Exclusive
            ,@NumberOfRows  INT      = 1000000
            ,@BadRows       INT      = 1000
    ;
    --===== Create a test table with simulated "other" columns
     CREATE TABLE #TestTable
            (
             RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
            ,SimColA  NCHAR(100) DEFAULT 'SimColA' --NCHAR(100) used to simply occupy the space of multiple columns
            ,SomeDate NVARCHAR(40)                 --This is your column of dates and times
            ,SimColb  NCHAR(100) DEFAULT 'SimColB' --NCHAR(100) used to simply occupy the space of multiple columns
            )
    ;
    --===== Populate the test table with random-constrained dates and times in the format they claim to be.
         -- This also uses minimal logging to save on time and disk space.
         -- The dates are in random order just to simulate worst case.
       WITH
    cteGenDates AS
    (
     SELECT TOP (@NumberOfRows) --Not to worry. Only takes 14 seconds/Million rows even though each row is 444 characters wide.
            RandomDT = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,@StartDate,@EndDate)+@StartDate
       FROM      sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    )
     INSERT INTO #TestTable WITH (TABLOCK)
            (SomeDate)
     SELECT SomeDate =   CONVERT(NVARCHAR(40),RandomDT,110) --The Date part
                        +', ' --The comma and space
                        +STUFF(REPLACE(RIGHT(CONVERT(NVARCHAR(40),RandomDT,100),7),' ','0'),6,0,' ') --The Time part
       FROM cteGenDates
     OPTION (RECOMPILE)
    ;
    --===== Add a non-Clustered Index to the table to prevent the overhead of having to look through the wide
         -- Clustered Index because this is all going to cause an index scan because of the incorrect datatype.
         -- This takes only about 2 seconds.
     CREATE INDEX By_SomeDate ON #TestTable (SomeDate ASC)
    ;
    --===== "Wound" a bunch of rows with some bad, non-date data so that we can test error handling
       WITH
    cteBadData AS
    (
     SELECT TOP (@BadRows)
             BadDate = CONVERT(NVARCHAR(40),NEWID())
            ,SomeDate
       FROM #TestTable
      ORDER BY NEWID()
    )
     UPDATE cteBadData
        SET SomeDate = BadDate
    ;
    --===== Change a bunch of rows to be nearly correct in form but missing the comma or the space
         -- between the time and the AM/PM indicator.
       WITH
    cteChangeDate AS
    (
     SELECT TOP (@BadRows)
             ChangedDate = STUFF(STUFF(SomeDate,18,1,''),11,1,'')
            ,SomeDate
       FROM #TestTable
      ORDER BY NEWID()
    )
     UPDATE cteChangeDate
        SET SomeDate = ChangedDate
    ;
    GO

    When we're writing the code to do math and unusual formatting, it's important to remember to do the same thing in code that we've been taught for application/system design;  Keep the data layer and the presentation layer separate.  This not only simplifies coding but, since formatting is expensive, the "pre-aggregation" (a term coined by good friend and fellow MVCP, Peter "Peso" Larsson) accomplished prior to any formatting really helps performance because you only need to format a very small handful of rows rather than all the data in the table.

    And, again... I "over-engineer" for safety and with the thought that I don't want something to fail... I want it to succeed AND tell me that there's some bad data involved.  Here's what I'd end up doing if I weren't allowed to fix the table or add a Persisted Computed Column to validate each date.  Again, details are in the comments.


       WITH
    ctePreValidate AS
    (--==== This not only starts the check for "date" data but it also strips out any commas.
     SELECT  SomeDate = REPLACE(SomeDate,',','')
            ,IsADate  = ISDATE(REPLACE(SomeDate,',',''))
       FROM #TestTable
    --WHERE --Note: If you want to filter on a date range, do it here.
    )
    ,
    cteValidate AS
    (--==== This fine-tunes the validation to prevent accidental forms that may evaluate to a date.
         -- For example, '2016' would evaluate to a date as 2016-01-01.
     SELECT  SomeDate
            ,IsADate  = CASE
                        WHEN IsADate = 1
                         AND SomeDate LIKE '[01][0-9]-[0-3][0-9]-[1-2][09][0-9][0-9]%'
                        THEN 1
                        ELSE 0
                        END
       FROM ctePreValidate
    )
    ,
    ctePreAggregate AS
    (--==== This preaggregates the data so that we don't have so much to work with for final formatting.
         -- This limits it all to 26 Rows instead of a million (for example) in about 4 seconds.
         -- Imagine how fast it will run on smaller date ranges.
     SELECT  HourOfPosting = CASE WHEN IsADate = 1 THEN DATEPART(hh,SomeDate) ELSE -1 END
            ,HourCount     = COUNT(*)
       FROM cteValidate
      GROUP BY CASE WHEN IsADate = 1 THEN DATEPART(hh,SomeDate) ELSE -1 END
       WITH ROLLUP
    )

    --===== This does the final formatting and the % calculation on only 26 rows.
     SELECT HourOfPosting = CASE
                            WHEN HourOfPosting >= 0
                            THEN RIGHT(100+HourOfPosting,2)+'-'+RIGHT(100+(HourOfPosting+1)%24,2)
                            WHEN HourOfPosting IS NULL
                            THEN 'Total'
                            ELSE 'BadDate'
                            END
            ,HourCount
            ,PercentageOfTotal = CONVERT(DECIMAL(6,2),HourCount*200.0/SUM(HourCount) OVER ()) --200.0 because Total is included
       FROM ctePreAggregate
      ORDER BY HourOfPosting
    ;


    The final output looks like this... The HourCount, Total, and count of bad rows give the "Bean Counters" the nice, warm fuzzies. 😉

    HourOfPosting HourCount   PercentageOfTotal
    ------------- ----------- ---------------------------------------
    00-01         41707       4.17
    01-02         41489       4.15
    02-03         41236       4.12
    03-04         41557       4.16
    04-05         41755       4.18
    05-06         41804       4.18
    06-07         42041       4.20
    07-08         41711       4.17
    08-09         41193       4.12
    09-10         41686       4.17
    10-11         41623       4.16
    11-12         41719       4.17
    12-13         42107       4.21
    13-14         41795       4.18
    14-15         41319       4.13
    15-16         41467       4.15
    16-17         41721       4.17
    17-18         41639       4.16
    18-19         41639       4.16
    19-20         41472       4.15
    20-21         41893       4.19
    21-22         41494       4.15
    22-23         41249       4.12
    23-00         41684       4.17
    BadDate       1000        0.10
    Total         1000000     100.00

    (26 row(s) affected)
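
    The weekday result set from the original post follows the exact same pattern. Just as a sketch (reusing the validation CTEs above and dropping the ROLLUP/Total row for brevity, so the percentage uses 100.0 instead of 200.0):

       WITH
    ctePreValidate AS
    (--==== Same comma-stripping and first-pass ISDATE check as above
     SELECT  SomeDate = REPLACE(SomeDate,',','')
            ,IsADate  = ISDATE(REPLACE(SomeDate,',',''))
       FROM #TestTable
    )
    ,
    cteValidate AS
    (--==== Same fine-tuned validation as above
     SELECT  SomeDate
            ,IsADate  = CASE
                        WHEN IsADate = 1
                         AND SomeDate LIKE '[01][0-9]-[0-3][0-9]-[1-2][09][0-9][0-9]%'
                        THEN 1
                        ELSE 0
                        END
       FROM ctePreValidate
    )
    ,
    ctePreAggregate AS
    (--==== Preaggregate down to (at most) 8 rows: 7 weekdays plus the bad-data bucket
     SELECT  DayOfPosting = CASE WHEN IsADate = 1 THEN DATENAME(dw,SomeDate) ELSE 'BadDate' END
            ,DayCount     = COUNT(*)
       FROM cteValidate
      GROUP BY CASE WHEN IsADate = 1 THEN DATENAME(dw,SomeDate) ELSE 'BadDate' END
    )
    --===== Final formatting and the % calculation on just those few rows
     SELECT  DayOfPosting
            ,DayCount
            ,PercentageOfTotal = CONVERT(DECIMAL(6,2),DayCount*100.0/SUM(DayCount) OVER ())
       FROM ctePreAggregate
      ORDER BY DayCount DESC
    ;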

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
