Managing a large table in SQL Server 2014

  • Hi Guys,

    I have a large table that has around 11 million rows and is growing by about 1 million rows each week. As a result, performance is starting to degrade on the Power BI reports that consume the data. We had no idea the data would get so big in such a short space of time. The performance isn't killing us yet, but we are hoping to intervene before things get out of control.

    Here is the DDL for the table...


    CREATE TABLE [dbo].[QueryStats](
      [Id] [int] IDENTITY(1,1) NOT NULL,
      [EventClass] [nvarchar](128) NULL,
      [TimeId] [int] NULL,
      [StartTime] [datetime] NULL,
      [DatabaseName] [nvarchar](256) NULL,
      [ApplicationName] [nvarchar](256) NULL,
      [LoginName] [nvarchar](256) NULL,
      [SPID] [int] NULL,
      [TextData] [nvarchar](max) NULL,
      [ServerName] [nvarchar](256) NULL,
      [Duration] [float] NULL,
      [CPU] [int] NULL,
      [Severity] [int] NULL,
      [Success] [int] NULL,
      [Reads] [bigint] NULL,
      [Writes] [bigint] NULL,
      [TraceType] [varchar](20) NULL,
    CONSTRAINT [PK_QueryStats] PRIMARY KEY CLUSTERED
    (
      [Id] ASC
    )
    )

    The table holds trace data that is generated on a report server. The TextData column is preventing me from making my clustered index a columnstore. I am considering moving the TextData column to its own table and converting the clustered index to a columnstore, but this may cause performance issues elsewhere (I would have to join on the nvarchar(max) column in order to populate QueryStats with the 'TextDataId' when inserting data); I can probably live with this, though.
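
    To illustrate, the kind of split I am considering looks roughly like this (just a sketch - QueryText and the staging table are made-up names):

    -- Hypothetical lookup table holding the distinct statement text
    CREATE TABLE [dbo].[QueryText](
      [TextDataId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
      [TextData] [nvarchar](max) NULL
    )

    -- During the load, any new text values would be inserted first...
    INSERT INTO [dbo].[QueryText] ([TextData])
    SELECT DISTINCT s.[TextData]
    FROM [dbo].[TraceStaging] AS s -- made-up staging table
    WHERE NOT EXISTS (SELECT 1 FROM [dbo].[QueryText] AS t
                      WHERE t.[TextData] = s.[TextData])

    -- ...and then QueryStats would be populated by joining the staging data
    -- back to QueryText on the nvarchar(max) column to pick up each TextDataId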

    I also looked at partitioning, but we only have about 5 months of data in there just now, so partitioning by year would be pointless.

    Apart from the inclusion of a clustered columnstore index in place of the current clustered index, does anyone have any other suggestions or tips for managing larger tables?

  • Jim-S - Monday, March 19, 2018 9:03 AM

    CONSTRAINT [PK_QueryStats] PRIMARY KEY CLUSTERED
    (
      [QueryStatsId] ASC
    )

    This works?  Where is the column QueryStatsId?  How is this table queried?

  • Lynn Pettis - Monday, March 19, 2018 9:50 AM

    This works?  Where is the column QueryStatsId?  How is this table queried?

    Apologies, I had a mistake in my DDL. Fixed now.

    The table is queried directly from Power BI using direct query mode.

  • Jim-S - Monday, March 19, 2018 9:55 AM

    Apologies, I had a mistake in my DDL. Fixed now.

    The table is queried directly from Power BI using direct query mode.

    I should have been more specific: what are the queries used by Power BI? I'm looking for the access paths being used to query the table.

  • I'm away from the office just now, but I'll run a trace in the morning and confirm exactly what queries are being run.

    What do you mean by access paths?

  • Jim-S - Monday, March 19, 2018 10:46 AM

    I'm away from the office just now, but I'll run a trace in the morning and confirm exactly what queries are being run.

    What do you mean by access paths?

    What columns are being selected, what columns are being filtered on, and, if joining to other tables, what the join criteria are and what columns are being pulled from the other tables.

  • The WHERE conditions in particular. Typically, log-type tables are (almost) always filtered on StartTime, and/or the data is summarized by StartTime periods (day, week, etc.). If so, cluster the table on StartTime first instead (followed by $IDENTITY, or just use StartTime alone; either way works).
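
    For example (a sketch only; keep whatever constraint and index names you already use):

    -- Move the PK to a nonclustered index and cluster on StartTime instead;
    -- Id is appended only to make the clustered key unique
    ALTER TABLE dbo.QueryStats DROP CONSTRAINT PK_QueryStats;
    ALTER TABLE dbo.QueryStats ADD CONSTRAINT PK_QueryStats PRIMARY KEY NONCLUSTERED (Id);
    CREATE UNIQUE CLUSTERED INDEX IX_QueryStats_StartTime ON dbo.QueryStats (StartTime, Id);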

    But that's an example of why we need to know how the table is accessed.


  • Jim-S - Monday, March 19, 2018 9:03 AM


    What is [TimeID]?
    What is it used for?

    Some obvious mistakes in the DDL statement:

    CREATE TABLE [dbo].[QueryStats](
      [Id] [int] IDENTITY(1,1) NOT NULL,
      [EventClass] [nvarchar](128) NULL,
      [TimeId] [int] NULL,
      [StartTime] [datetime] NULL,
      [DatabaseName] [nvarchar](128) NULL,    -- was nvarchar(256)
      [ApplicationName] [nvarchar](128) NULL, -- was nvarchar(256)
      [LoginName] [nvarchar](128) NULL,       -- was nvarchar(256)
      [SPID] [int] NULL,
      [TextData] [nvarchar](max) NULL,
      [ServerName] [nvarchar](128) NULL,      -- was nvarchar(256)
      [Duration] [bigint] NULL,               -- was float
      [CPU] [int] NULL,
      [Severity] [int] NULL,
      [Success] [int] NULL,
      [Reads] [bigint] NULL,
      [Writes] [bigint] NULL,
      [TraceType] [varchar](20) NULL,
    CONSTRAINT [PK_QueryStats] PRIMARY KEY NONCLUSTERED ([Id] ASC),
    CONSTRAINT [UC_QueryStats] UNIQUE CLUSTERED ([StartTime], [Id] ASC)
    )

    I guess indexing on ServerName, DatabaseName, ApplicationName would help speed up certain types of queries.
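
    For example (just a sketch; the index name and INCLUDE list are only an illustration):

    CREATE NONCLUSTERED INDEX IX_QueryStats_Server_Db_App
    ON [dbo].[QueryStats] ([ServerName], [DatabaseName], [ApplicationName])
    INCLUDE ([StartTime], [Duration], [Reads], [Writes]);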


  • If your Power BI is loading the whole table, then the following will most likely apply. Even if not, some of it may still be worthwhile doing.

    Power BI uses the VertiPaq engine, and as such some best practices should be applied; see https://www.sqlbi.com/articles/optimizing-high-cardinality-columns-in-vertipaq/ for examples.

    When loading the data into Power BI, split the datetime fields into a date portion and a time portion as separate columns.
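
    e.g. something like this (a sketch - time(0) keeps second precision, which already cuts the cardinality a lot):

    select cast(StartTime as date) as StartDate
      , cast(StartTime as time(0)) as StartTimeOfDay
    from dbo.QueryStats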

    The bigint fields should also, where possible, be split. For example, the number 123,123,123,122 could be split into three columns:

    select cast(Writes / 1000000 as bigint) as WritesHighID
      , cast((Writes / 1000) % 1000 as bigint) as WritesMidID
      , cast(Writes % 1000 as bigint) as WritesLowID
    from (select 123123123122 as Writes) t

    and then reconstructed in DAX as

    Fact[Writes] :=
    IFERROR(
      VALUES( Fact[WritesHighID] ) * 1000000 + VALUES( Fact[WritesMidID] ) * 1000 + VALUES( Fact[WritesLowID] ),
      BLANK()
    )

    If you need the TextData column in Power BI, have a look at it and, if possible, parse out what you need before loading, as this column will not compress well.
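
    For example, something like this could keep just the first keyword and a short prefix instead of the full batch text (a sketch only):

    select left(ltrim(TextData), charindex(' ', ltrim(TextData) + ' ') - 1) as StatementVerb
      , left(TextData, 200) as TextDataShort
    from dbo.QueryStats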

  • frederico_fonseca - Monday, March 19, 2018 7:07 PM

    When loading the data into Power BI, split the datetime fields into a date portion and a time portion as separate columns.

    The bigint fields should also, where possible, be split. For example, the number 123,123,123,122 could be split into three columns.

    Never a good idea.
    If a package assumes this is a good practice, the best approach would be not to use that package.

    Can you write a query to select all queries started during off-hours last night?
    And what about a query selecting all events with writes above a certain amount?

    With split columns, both queries would require table scans, which means reading the whole table into memory and applying the filtering there. Which is exactly the problem with the current state of affairs in the OP's database.
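
    To illustrate (QueryStatsSplit is a made-up table holding the split columns):

    -- With the original bigint column the filter is sargable - an index on Writes could be used:
    SELECT Id FROM dbo.QueryStats
    WHERE Writes > 500000000;

    -- With split columns the value must be reassembled row by row,
    -- so no index can help and the whole table has to be scanned:
    SELECT Id FROM dbo.QueryStatsSplit
    WHERE WritesHighID * 1000000 + WritesMidID * 1000 + WritesLowID > 500000000;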


  • Sergiy - Monday, March 19, 2018 7:55 PM

    Never a good idea.
    If a package assumes this is a good practice, the best approach would be not to use that package.

    Can you write a query to select all queries started during off-hours last night?
    And what about a query selecting all events with writes above a certain amount?

    With split columns, both queries would require table scans, which means reading the whole table into memory and applying the filtering there. Which is exactly the problem with the current state of affairs in the OP's database.

    I think his point here is that Power BI is going to scan the entire table anyway, and it's considered best practice to split the fields up in order to decrease the cardinality of a given field. Thus, with a field that is date/time down to the microsecond, the recommendation is to split the date into one field and the time into another field, then consider removing some of the precision in your time, maybe reporting to the minute, in order to decrease that cardinality. The more unique values, the slower the performance.

    Overall, you can try direct query with Power BI and see how it's essentially querying the database. While it will generate SQL queries that maybe can take advantage of the model, indexes and so forth, it's also not very smart. In most cases I've seen, there are still queries being kicked out that pull in certain fields entirely, then filter them later in Power BI. In my case, it's not too bad because I only use columnstores, but on row-based systems, not so much.

    Some of the best options to take here are to obviously consider removing fields you really don't need, then moving towards a smaller model where you can optimize the table as much as possible for those full scans. The fewer bytes, the better, essentially. After that, consider maybe rolling up the data or creating views that pre-filter the data for the Power BI reports. Similar to the above: the fewer records, the better, essentially, too.

    For example, while Power BI may still query or import the entire view, the view itself is based on a query that takes advantage of the indexes on the table. This is commonly not thought of because, with SQL Server, if you create a view on a table where an index is on MyField, SELECT MyField FROM MyTable is going to have the same execution plan as SELECT MyField FROM MyTable WHERE MyField = 1; thus, it's often skipped. But with Power BI, the filter may not ever be applied to the view, so you can build the filter into the view for Power BI to help ensure success.
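
    As a rough sketch of such a view (all names made up):

    -- Pre-aggregated view for the report to read, so Power BI pulls
    -- summary rows instead of millions of detail rows
    CREATE VIEW dbo.vQueryStatsDaily
    AS
    SELECT CAST(StartTime AS date) AS StartDate,
        TraceType,
        LoginName,
        COUNT_BIG(*) AS Events,
        SUM(Reads) AS TotalReads,
        SUM(Writes) AS TotalWrites
    FROM dbo.QueryStats
    GROUP BY CAST(StartTime AS date), TraceType, LoginName;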

    In some of the examples I work with, I mostly do an import once a day on a billion-record table with those views, but within a columnstore data warehouse. While the import is not exactly blazing fast, once loaded from those views, which summarize the data, it's extremely fast in Power BI as it's pre-loaded and ready to rock-n-roll.

  • I ran a trace while the report was refreshing and I have attached the five queries that ran against the table.

    I see that people have suggested possibly adding nonclustered indexes. I was under the impression that nonclustered indexes on large tables were not a good idea and that it would be best to use a clustered index (preferably a columnstore), which is why I was considering moving the nvarchar(max) column to its own table. Was I wrong to think this?

    ** edit - it wouldn't allow me to attach the .sql file, so here are the queries

    -- query 1
    SELECT
    TOP (1000001) [t1].[Week Ending],
    COUNT_BIG([t0].[EventClass])
    AS [a0]
    FROM
    ((select [Id],
      [EventClass],
      [TimeId],
      [StartTime],
      [DatabaseName],
      [ApplicationName],
      [LoginName],
      [SPID],
      [ServerName],
      [Duration],
      [CPU],
      [Severity],
      [Success],
      [Reads],
      [Writes],
      [TraceType]
    from [dbo].[QueryStats] as [$Table]) AS [t0]

    left outer join

    (select [_].[TimeId] as [TimeId],
      convert(date, [_].[Date]) as [Date],
      [_].[DateName] as [DateName],
      [_].[CalendarYear] as [CalendarYear],
      [_].[CalendarQuarter] as [CalendarQuarter],
      [_].[CalendarQuarterName] as [CalendarQuarterName],
      [_].[CalendarPeriod] as [CalendarPeriod],
      [_].[CalendarPeriodName] as [CalendarPeriodName],
      [_].[CalendarDayOfMonth] as [CalendarDayOfMonth],
      [_].[CalendarYearName] as [CalendarYearName],
      [_].[WeekEnding] as [Week Ending]
    from
    (
      select [TimeId],
       [Date],
       [DateName],
       [CalendarYear],
       [CalendarQuarter],
       [CalendarQuarterName],
       [CalendarPeriod],
       [CalendarPeriodName],
       [CalendarDayOfMonth],
       [CalendarYearName],
       [WeekEnding]
      from [dbo].[Time] as [$Table]
    ) as [_]) AS [t1] on
    (
    [t0].[TimeId] = [t1].[TimeId]
    )
    )

    WHERE
    (
    (
    ([t1].[Week Ending] < CAST( '20180321 00:00:00' AS datetime))
    AND
    ([t1].[Week Ending] >= CAST( '20180124 00:00:00' AS datetime))
    )
    AND
    (
    ([t0].[EventClass] IN (N'RPC:Completed',N'SQL:BatchCompleted',N'Query End'))
    )
    )

    GROUP BY [t1].[Week Ending]

    -- query 2

    SELECT
    TOP (1000001) [t2].[Region],
    (COUNT_BIG(DISTINCT [t0].[LoginName]) + MAX(CASE WHEN [t0].[LoginName] IS NULL THEN 1 ELSE 0 END))
    AS [a0]
    FROM
    (
    ((select [Id],
      [EventClass],
      [TimeId],
      [StartTime],
      [DatabaseName],
      [ApplicationName],
      [LoginName],
      [SPID],
      [ServerName],
      [Duration],
      [CPU],
      [Severity],
      [Success],
      [Reads],
      [Writes],
      [TraceType]
    from [dbo].[QueryStats] as [$Table]) AS [t0]

    left outer join

    (select [_].[TimeId] as [TimeId],
      convert(date, [_].[Date]) as [Date],
      [_].[DateName] as [DateName],
      [_].[CalendarYear] as [CalendarYear],
      [_].[CalendarQuarter] as [CalendarQuarter],
      [_].[CalendarQuarterName] as [CalendarQuarterName],
      [_].[CalendarPeriod] as [CalendarPeriod],
      [_].[CalendarPeriodName] as [CalendarPeriodName],
      [_].[CalendarDayOfMonth] as [CalendarDayOfMonth],
      [_].[CalendarYearName] as [CalendarYearName],
      [_].[WeekEnding] as [Week Ending]
    from
    (
      select [TimeId],
       [Date],
       [DateName],
       [CalendarYear],
       [CalendarQuarter],
       [CalendarQuarterName],
       [CalendarPeriod],
       [CalendarPeriodName],
       [CalendarDayOfMonth],
       [CalendarYearName],
       [WeekEnding]
      from [dbo].[Time] as [$Table]
    ) as [_]) AS [t1] on
    (
    [t0].[TimeId] = [t1].[TimeId]
    )
    )

    left outer join

    (select [$Table].[UserRegionId] as [UserRegionId],
      [$Table].[UserName] as [UserName],
      [$Table].[EmailAddress] as [EmailAddress],
      [$Table].[Region] as [Region]
    from [dbo].[UserRegion] as [$Table]) AS [t2] on
    (
    [t0].[LoginName] = [t2].[UserName]
    )
    )

    WHERE
    (
    (
    ([t1].[Date] < CAST( '20180321 00:00:00' AS datetime))
    AND
    ([t1].[Date] >= CAST( '20180219 00:00:00' AS datetime))
    )
    AND
    (
    NOT(
    (
    CHARINDEX(N'PBIUser\svc',
    COALESCE([t2].[UserName], '')
    )
    = 1)
    )
    AND
    (
    ([t2].[Region] IN ('Unknown','Americas','Asia Pacific','EMEA'))
    )
    )
    )

    GROUP BY [t2].[Region]

    -- query 3

    SELECT
    TOP (1000001) [t0].[TraceType],
    COUNT_BIG([t0].[EventClass])
    AS [a0]
    FROM
    (
    ((select [Id],
      [EventClass],
      [TimeId],
      [StartTime],
      [DatabaseName],
      [ApplicationName],
      [LoginName],
      [SPID],
      [ServerName],
      [Duration],
      [CPU],
      [Severity],
      [Success],
      [Reads],
      [Writes],
      [TraceType]
    from [dbo].[QueryStats] as [$Table]) AS [t0]

    left outer join

    (select [_].[TimeId] as [TimeId],
      convert(date, [_].[Date]) as [Date],
      [_].[DateName] as [DateName],
      [_].[CalendarYear] as [CalendarYear],
      [_].[CalendarQuarter] as [CalendarQuarter],
      [_].[CalendarQuarterName] as [CalendarQuarterName],
      [_].[CalendarPeriod] as [CalendarPeriod],
      [_].[CalendarPeriodName] as [CalendarPeriodName],
      [_].[CalendarDayOfMonth] as [CalendarDayOfMonth],
      [_].[CalendarYearName] as [CalendarYearName],
      [_].[WeekEnding] as [Week Ending]
    from
    (
      select [TimeId],
       [Date],
       [DateName],
       [CalendarYear],
       [CalendarQuarter],
       [CalendarQuarterName],
       [CalendarPeriod],
       [CalendarPeriodName],
       [CalendarDayOfMonth],
       [CalendarYearName],
       [WeekEnding]
      from [dbo].[Time] as [$Table]
    ) as [_]) AS [t1] on
    (
    [t0].[TimeId] = [t1].[TimeId]
    )
    )

    left outer join

    (select [$Table].[UserRegionId] as [UserRegionId],
      [$Table].[UserName] as [UserName],
      [$Table].[EmailAddress] as [EmailAddress],
      [$Table].[Region] as [Region]
    from [dbo].[UserRegion] as [$Table]) AS [t2] on
    (
    [t0].[LoginName] = [t2].[UserName]
    )
    )

    WHERE
    (
    (
    (
    (
    (
    (
    (
    (COALESCE([t2].[Region], '') = 'Americas')
    OR
    (COALESCE([t2].[Region], '') = 'Asia Pacific')
    )
    OR
    (COALESCE([t2].[Region], '') = 'EMEA')
    )
    OR
    (COALESCE([t2].[Region], '') = 'Unknown')
    )
    OR
    (
    [t2].[Region] IS NULL
    )
    )
    AND
    NOT(
    (
    CHARINDEX(N'PBIUser\svc',
    COALESCE([t2].[UserName], '')
    )
    = 1)
    )
    )
    AND
    ([t1].[Date] < CAST( '20180321 00:00:00' AS datetime))
    )
    AND
    (
    ([t1].[Date] >= CAST( '20180219 00:00:00' AS datetime))
    AND
    (
    ([t0].[EventClass] IN (N'RPC:Completed',N'SQL:BatchCompleted',N'Query End'))
    )
    )
    )

    GROUP BY [t0].[TraceType]

    -- query 4
    SELECT
    TOP (1000001) [t0].[TraceType],[t1].[MonthYear],[t1].[MonthYearSort],
    (COUNT_BIG(DISTINCT [t0].[LoginName]) + MAX(CASE WHEN [t0].[LoginName] IS NULL THEN 1 ELSE 0 END))
    AS [a0]
    FROM
    ((select [Id],
      [EventClass],
      [TimeId],
      [StartTime],
      [DatabaseName],
      [ApplicationName],
      [LoginName],
      [SPID],
      [ServerName],
      [Duration],
      [CPU],
      [Severity],
      [Success],
      [Reads],
      [Writes],
      [TraceType]
    from [dbo].[QueryStats] as [$Table]) AS [t0]

    left outer join

    (

    SELECT [t1].[TimeId] AS [TimeId],(
    COALESCE([t1].[CalendarPeriodName], '')
    + (N'-' +
    COALESCE(
    CAST([t1].[CalendarYear] AS VARCHAR(4000))
    , '')
    )) AS [MonthYear],
    CAST(
    (SELECT COALESCE(Op1 + Op2, Op1, Op2) FROM (SELECT
    ([t1].[CalendarYear] * 100)
    AS Op1, [t1].[CalendarPeriod] AS Op2) AS AuxTable)
    AS VARCHAR(4000))
    AS [MonthYearSort]
    FROM
    (
    (select [_].[TimeId] as [TimeId],
      convert(date, [_].[Date]) as [Date],
      [_].[DateName] as [DateName],
      [_].[CalendarYear] as [CalendarYear],
      [_].[CalendarQuarter] as [CalendarQuarter],
      [_].[CalendarQuarterName] as [CalendarQuarterName],
      [_].[CalendarPeriod] as [CalendarPeriod],
      [_].[CalendarPeriodName] as [CalendarPeriodName],
      [_].[CalendarDayOfMonth] as [CalendarDayOfMonth],
      [_].[CalendarYearName] as [CalendarYearName],
      [_].[WeekEnding] as [Week Ending]
    from
    (
      select [TimeId],
       [Date],
       [DateName],
       [CalendarYear],
       [CalendarQuarter],
       [CalendarQuarterName],
       [CalendarPeriod],
       [CalendarPeriodName],
       [CalendarDayOfMonth],
       [CalendarYearName],
       [WeekEnding]
      from [dbo].[Time] as [$Table]
    ) as [_])
    )
    AS [t1]
    )
    AS [t1] on
    (
    [t0].[TimeId] = [t1].[TimeId]
    )
    )

    GROUP BY [t0].[TraceType],[t1].[MonthYear],[t1].[MonthYearSort]

    -- query 5
        
    SELECT
    TOP (1000001) [t1].[MonthYear],[t1].[MonthYearSort],
    (COUNT_BIG(DISTINCT [t0].[LoginName]) + MAX(CASE WHEN [t0].[LoginName] IS NULL THEN 1 ELSE 0 END))
    AS [a0]
    FROM
    ((select [Id],
      [EventClass],
      [TimeId],
      [StartTime],
      [DatabaseName],
      [ApplicationName],
      [LoginName],
      [SPID],
      [ServerName],
      [Duration],
      [CPU],
      [Severity],
      [Success],
      [Reads],
      [Writes],
      [TraceType]
    from [dbo].[QueryStats] as [$Table]) AS [t0]

    left outer join

    (

    SELECT [t1].[TimeId] AS [TimeId],(
    COALESCE([t1].[CalendarPeriodName], '')
    + (N'-' +
    COALESCE(
    CAST([t1].[CalendarYear] AS VARCHAR(4000))
    , '')
    )) AS [MonthYear],
    CAST(
    (SELECT COALESCE(Op1 + Op2, Op1, Op2) FROM (SELECT
    ([t1].[CalendarYear] * 100)
    AS Op1, [t1].[CalendarPeriod] AS Op2) AS AuxTable)
    AS VARCHAR(4000))
    AS [MonthYearSort]
    FROM
    (
    (select [_].[TimeId] as [TimeId],
      convert(date, [_].[Date]) as [Date],
      [_].[DateName] as [DateName],
      [_].[CalendarYear] as [CalendarYear],
      [_].[CalendarQuarter] as [CalendarQuarter],
      [_].[CalendarQuarterName] as [CalendarQuarterName],
      [_].[CalendarPeriod] as [CalendarPeriod],
      [_].[CalendarPeriodName] as [CalendarPeriodName],
      [_].[CalendarDayOfMonth] as [CalendarDayOfMonth],
      [_].[CalendarYearName] as [CalendarYearName],
      [_].[WeekEnding] as [Week Ending]
    from
    (
      select [TimeId],
       [Date],
       [DateName],
       [CalendarYear],
       [CalendarQuarter],
       [CalendarQuarterName],
       [CalendarPeriod],
       [CalendarPeriodName],
       [CalendarDayOfMonth],
       [CalendarYearName],
       [WeekEnding]
      from [dbo].[Time] as [$Table]
    ) as [_])
    )
    AS [t1]
    )
    AS [t1] on
    (
    [t0].[TimeId] = [t1].[TimeId]
    )
    )

    GROUP BY [t1].[MonthYear],[t1].[MonthYearSort]

  • Before starting on these queries I have to ask:

    Where do they come from?

    Who or what wrote/generated these queries?

    Is there a possibility to change the way they're built?


  • Sergiy - Tuesday, March 20, 2018 3:54 AM

    Before starting on these queries I have to ask: Where do they come from? Who or what wrote/generated these queries? Is there a possibility to change the way they're built?

    They are generated by Power BI based on my report criteria such as filters etc. I am not aware of any way to rewrite them.

  • Well, I stand by my suggestion - the best long-term solution to the performance problems you have is to move away from using Power BI, because query tools which build queries like these can be described with only one word: "rubbish".

    Query 1.

    a) The long list of columns selected for [t1], especially

    convert(date, [_].[Date]) as [Date],

    does not make any sense. The only columns which are actually used are [Week Ending] and [TimeId].
    b) The LEFT OUTER JOIN is overturned by the WHERE clause, making it effectively an INNER JOIN, but not as efficient as an actual INNER JOIN would be.
    c) TimeId does not have a logical correspondence to the clustered index on table [t0], regardless of which column is used for the clustered index - [Id], as now, or [StartTime], as suggested.
    Therefore a JOIN on that field will cause a clustered index scan (aka table scan). Indexing this field would not help much, because the optimiser cannot predict the range of TimeId values coming out of [t1], so it would have to choose between {index scan + bookmark lookup} and {CI scan}. The CI scan usually wins.
    d) Using TOP (n) without ORDER BY does not make much sense, as it would return randomly selected rows.
    It looks like the only purpose of it is to prevent memory overload when the returned recordset is too big. But that may be, and must be, achieved in a different way. At the very least, a user must get an indication that there are rows which were omitted from the returned recordset.

    Assuming you have already changed the clustered index on [dbo].[QueryStats], this is how the query should look:

    SELECT
      [t1].[WeekEnding] AS [Week Ending],
      COUNT_BIG([t0].[EventClass]) AS [a0]
    FROM [dbo].[QueryStats] AS [t0]
    INNER JOIN (
      SELECT MIN([Date]) AS [FromDate],
        DATEADD(dd, DATEDIFF(dd, 0, MAX([Date])) + 1, 0) AS [ToDate],
        [WeekEnding]
      FROM [dbo].[Time]
      WHERE [WeekEnding] < CAST('20180321 00:00:00' AS datetime)
        AND [WeekEnding] >= CAST('20180124 00:00:00' AS datetime)
      GROUP BY [WeekEnding]
    ) AS [t1] ON [t0].[StartTime] >= [t1].[FromDate] AND [t0].[StartTime] < [t1].[ToDate]
    WHERE [t0].[EventClass] IN (N'RPC:Completed', N'SQL:BatchCompleted', N'Query End')
    GROUP BY [t1].[WeekEnding]

    But since you cannot rewrite the query, you may fool the instrument and make it do what you need it to do.
    For that you're going to need to figure out the full range of dates covered by the selected Week Endings, and add a filter on StartTime >= FirstDateFromTheRange and StartTime < DATEADD(dd, 1, LastDateFromTheRange).
    With a clustered index on StartTime you should not have any performance issues even on much bigger data sets.
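
    One way to do it without touching the generated queries is to point the report at a view which carries the StartTime filter (a sketch, with placeholder dates):

    CREATE VIEW dbo.QueryStats_ReportRange
    AS
    SELECT *
    FROM dbo.QueryStats
    WHERE StartTime >= '20180124' -- first date from the range
      AND StartTime < DATEADD(dd, 1, '20180320'); -- day after the last date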

    Query 2.
    Pretty much the same issue. But because the filter is against [t1].[Date] instead of [t1].[WeekEnding], the fix is even easier: just add a filter on [StartTime] within the same range as [Date].

    Query 3
    Same as Query 2.

    Query 4.
    It seems like it returns the aggregations against the whole history, as far back as it goes. It's meant to scan the whole data set by definition, so there is not much to be improved. I can only suggest applying a filter on StartTime, limiting the period covered by the report.
    To me, it's pretty stupid to recalculate aggregations for previous months from read-only data sets, which have already been calculated many times before, but the tool is as dumb as it is.

    Query 5
    The same as Query 4, with just one less item in the grouping.

