June 23, 2022 at 12:12 pm
Hi,
I'm comparing the stats histogram with the actual data distribution from a table (for this example Person.LastName on AdventureWorks) like so:
;with e as (
select lastname, count(*) norows, ROW_NUMBER() OVER (order by (select 1)) rn
from person.person
group by lastname
having count(*) >=1
)
select e.lastname
, e.norows
, e.rn
, h.step_number step
, h.range_rows, h.equal_rows, h.distinct_range_rows, h.average_range_rows
from e
left join sys.dm_db_stats_histogram (274100017, 2) h
on e.LastName = h.range_high_key
ORDER BY rn
which will give me something like so:
If the stats are up-to-date the range_rows will equal the sum of rows between each step.
What I'd like to do is tally the number of rows between each step in order to check they agree.
To use a windows function I'll need to update the step where it is NULL in order to partition by it.
Is there a nice way to do this. I'm thinking I'll have to put this resultset into a temp table first and update the step.
This will then allow me to use windows function partitioning on step.
By the way I'm finding the the histogram like so:
USE AdventureWorks2019
-- find the stats histogram for the on Person.Person.LastName
select s.name, c.Name
,'select * from sys.dm_db_stats_histogram (' + cast(s.object_id as varchar(30)) + ', '+cast(s.stats_id as varchar(10)) +'); ' ShowHistogram
from sys.stats s
left join sys.tables t
on s.object_id = t.object_id
left join sys.stats_columns sc
on s.object_id = sc.object_id
and s.stats_id = sc.stats_id
join sys.columns c
on sc.object_id = c.object_id
and sc.column_id = c.column_id
where t.name = 'person'
AND c.name = 'LastName'
June 24, 2022 at 9:17 am
So going down the temp table route:
DROP TABLE IF EXISTS #tmptmp;
WITH e
AS (SELECT LastName,
COUNT(*) norows,
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) rn
FROM Person.Person
GROUP BY LastName
HAVING COUNT(*) >= 1)
SELECT e.LastName,
e.norows,
e.rn,
h.step_number step,
h.range_rows,
h.equal_rows,
h.distinct_range_rows,
h.average_range_rows
INTO #tmptmp
FROM e
LEFT JOIN sys.dm_db_stats_histogram(274100017, 2) h
ON e.LastName = h.range_high_key
ORDER BY rn DESC;
gives me this:
and the only way I can think to update the step column is like so (there must be a better way):
WHILE 1 = 1
BEGIN
;WITH stps (rn, step)
AS (SELECT rn,
LAG(step) OVER (ORDER BY rn DESC) step
FROM #tmptmp)
UPDATE t
SET step = stps.step
FROM #tmptmp t
JOIN stps
ON t.rn = stps.rn
WHERE t.step IS NULL
AND stps.step IS NOT NULL;
IF @@Rowcount <= 0
BREAK;
END
which gives me the step column :
which now allows me to use windows functions like so:
SELECT LastName,
norows,
rn,
step,
SUM(norows) OVER (PARTITION BY step) - equal_rows CalculatedRangeRows,
ISNULL((SUM(norows) OVER (PARTITION BY step) - equal_rows) - range_rows,0) discrepancy,
range_rows,
equal_rows,
distinct_range_rows,
average_range_rows
FROM #tmptmp
ORDER BY rn DESC;
There has to be a better way of doing the update, seriously an update in a loop?.
Is there anyone that can do a better job of this?
Recursive CTE perhaps....
July 1, 2022 at 9:34 am
I am at this point searching for some obvious plan.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply