Stats histogram comparison with data in table (Windows function)

Question

Post reply

Stats histogram comparison with data in table (Windows function)

SQLServerCodeTweak

Valued Member

Points: 63

June 23, 2022 at 12:12 pm

#4053812

Hi,

I'm comparing the stats histogram with the actual data distribution from a table (for this example Person.LastName on AdventureWorks) like so:

;with e as (
select lastname, count(*) norows, ROW_NUMBER() OVER (order by (select 1)) rn
from person.person
group by lastname
having count(*) >=1
)
select    e.lastname
, e.norows
, e.rn
, h.step_number step
, h.range_rows, h.equal_rows, h.distinct_range_rows, h.average_range_rows
from e
left join sys.dm_db_stats_histogram (274100017, 2) h
on e.LastName = h.range_high_key
ORDER BY rn

which will give me something like so:

histogram

If the stats are up-to-date the range_rows will equal the sum of rows between each step.

What I'd like to do is tally the number of rows between each step in order to check they agree.

To use a windows function I'll need to update the step where it is NULL in order to partition by it.

Is there a nice way to do this. I'm thinking I'll have to put this resultset into a temp table first and update the step.

This will then allow me to use windows function partitioning on step.

By the way I'm finding the the histogram like so:

USE AdventureWorks2019

   -- find the stats histogram for the on Person.Person.LastName
    select s.name, c.Name
,'select * from sys.dm_db_stats_histogram (' + cast(s.object_id as varchar(30)) + ', '+cast(s.stats_id as varchar(10)) +'); '  ShowHistogram
from sys.stats s
left join sys.tables t
on s.object_id = t.object_id
left join  sys.stats_columns sc
on s.object_id = sc.object_id
and s.stats_id = sc.stats_id
join sys.columns c 
on  sc.object_id = c.object_id
and sc.column_id = c.column_id
where t.name = 'person'
AND c.name = 'LastName'

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

SQLServerCodeTweak Valued Member Points: 63 More actions · Answer 1

So going down the temp table route:

DROP TABLE IF EXISTS #tmptmp;

WITH e
AS (SELECT LastName,
           COUNT(*) norows,
           ROW_NUMBER() OVER (ORDER BY (SELECT 1)) rn
    FROM Person.Person
    GROUP BY LastName
    HAVING COUNT(*) >= 1)
SELECT e.LastName,
       e.norows,
       e.rn,
       h.step_number step,
       h.range_rows,
       h.equal_rows,
       h.distinct_range_rows,
       h.average_range_rows
INTO #tmptmp
FROM e
    LEFT JOIN sys.dm_db_stats_histogram(274100017, 2) h
        ON e.LastName = h.range_high_key
ORDER BY rn DESC;

gives me this:

histopre

and the only way I can think to update the step column is like so (there must be a better way):

WHILE 1 = 1
BEGIN
    ;WITH stps (rn, step)
     AS (SELECT rn,
                LAG(step) OVER (ORDER BY rn DESC) step
         FROM #tmptmp)
    UPDATE t
    SET step = stps.step
    FROM #tmptmp t
        JOIN stps
            ON t.rn = stps.rn
    WHERE t.step IS NULL
          AND stps.step IS NOT NULL;

    IF @@Rowcount <= 0
        BREAK;
END

which gives me the step column :

histopost

which now allows me to use windows functions like so:

SELECT LastName,
   norows,
       rn,
       step,
   SUM(norows) OVER (PARTITION BY step) - equal_rows CalculatedRangeRows,
   ISNULL((SUM(norows) OVER (PARTITION BY step) - equal_rows) - range_rows,0) discrepancy,
       range_rows,
       equal_rows,
       distinct_range_rows,
       average_range_rows
FROM #tmptmp
ORDER BY rn DESC;

There has to be a better way of doing the update, seriously an update in a loop?.

Is there anyone that can do a better job of this?

Recursive CTE perhaps....

Wilson254 Grasshopper Points: 12 More actions · Answer 2

I am at this point searching for some obvious plan.

MyMileStoneCard

This reply was modified 2 years, 4 months ago by Wilson254.