March 17, 2023 at 12:00 am
Comments posted to this topic are about the item Expert Performance Indexing in Azure SQL and SQL Server 2022
March 31, 2023 at 7:12 am
When it’s written “free ebook to download”, I’m expecting to be free of cost not freely able to download and pay for a ebook…
March 31, 2023 at 3:37 pm
When it’s written “free ebook to download”, I’m expecting to be free of cost not freely able to download and pay for a ebook…
I was just going to write something similar. Not a book... the comment above. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
March 31, 2023 at 3:45 pm
Apologies, put the URL in the wrong field.
March 31, 2023 at 7:44 pm
Thanks, Steve.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 3, 2024 at 2:57 pm
There is a great problem in the book, when we are talking about performance. In a HEAP - on page 3 - data is NOT ordered as inserted. We reuse space on already allocated pages if possible, or allocate a new page elsewhere in the file - maybe earlier, because of drop of another object or maybe later in the file. It is why the execution time can be high when inserting into a HEAP. The following - not optimized - small script shows the principle used for a HEAP.
USE master;
GO
DROP DATABASE IF EXISTS TestDB;
GO
CREATE DATABASE TestDB;
GO
USE TestDB;
CREATE TABLE dbo.HeapData
(
ID INTNOT NULLIDENTITY
CONSTRAINT PK_HeapDataPRIMARY KEY NONCLUSTERED,
Txt VARCHAR (1000) NOT NULL
);
CREATE TABLE dbo.t1
(
ID INT NOT NULLIDENTITY
CONSTRAINT PK_t1 PRIMARY KEY,
Txt CHAR (1000) NOT NULL
);
GO
SET NOCOUNT ON;
GO
INSERT INTO dbo.t1 (Txt) VALUES
('xxxx');
GO 5000
INSERT INTO dbo.HeapData (Txt) VALUES
(REPLICATE ('xxxxxxxxxx', 90));
GO 50
INSERT INTO dbo.t1 (Txt)
SELECT Txt
FROM dbo.t1;
GO 5
INSERT INTO dbo.HeapData (Txt) VALUES
(REPLICATE ('xxxxxxxxxx', 70));
GO 5000
SELECTFPLC.file_id, FPLC.page_id, COUNT (*)
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
GROUP BY FPLC.file_id, FPLC.page_id
ORDER BY FPLC.file_id, FPLC.page_id;
GO
DELETE
FROM dbo.HeapData
WHERE ID % 2 = 1;
CHECKPOINT;
GO
INSERT INTO dbo.HeapData (Txt) VALUES
(REPLICATE ('xxxxxxxxxx', 20));
GO 5000
SELECTFPLC.file_id, FPLC.page_id, COUNT (*)
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
GROUP BY FPLC.file_id, FPLC.page_id
ORDER BY FPLC.file_id, FPLC.page_id;
GO
DROP TABLE dbo.t1;
GO
CHECKPOINT;
WAITFOR DELAY '0:0:20';
GO
INSERT INTO dbo.HeapData (Txt)
SELECT Txt
FROM dbo.HeapData;
GO 2
SELECTFPLC.file_id, FPLC.page_id, COUNT (*)
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
GROUP BY FPLC.file_id, FPLC.page_id
ORDER BY FPLC.file_id, FPLC.page_id;
SELECT HeapData.ID, FPLC.file_id, FPLC.page_id
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
ORDER BY FPLC.file_id, FPLC.page_id;
SELECT HeapData.ID, FPLC.file_id, FPLC.page_id
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
ORDER BY HeapData.ID;
GO
-- file_id is ignored because we only have one file
-- page_id_diff shows both positive and negative values
-- the next row in ID order is both on later and earlier pages
WITH Data
AS
(
SELECT HeapData.ID, FPLC.file_id, FPLC.page_id
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
)
SELECT d1.ID, d2.ID, d1.page_id, d2.page_id, d2.page_id - d1.page_id AS page_id_diff
FROM Data AS d1 INNER JOIN Data AS d2 ONd1.ID = d2.ID - 1 OR
d1.ID = d2.ID - 2-- we delete ID % 2 = 1 rows
WHEREd1.file_id = d2.file_id AND
d1.page_id <> d2.page_id;
January 4, 2024 at 3:54 am
That's great but without the book, the uncommented code you posted doesn't explain at all what the code is supposed to be teaching.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 4, 2024 at 7:48 am
I strongly disagree. The book is Advanced/Expert, so I have to assume that it is not only read by beginners. So my code and example is just a simple example.
It is clearly stated on page 3 that data appears physically in the same order as it is inserted. This may, among other things, be the reason why I see statements such as that it is not necessary to sort data, since they are accessed in the same order as inserted, that using HEAP is fast, since data does not have to be inserted at a specific position but just lastly, .... I have a script that shows that when inserting millions of rows, using HEAP will cause double the execution time compared to a Clustered index.
Unfortunately, I also still see that in various posts it is written that data in a Clustered index is physically stored according to the Cluster Key. With a simple SELECT with table hint NOLOCK, it can be clearly seen that data in a Clustered Index is only logically ordered by Cluster Key - IAM and not Cluster Key is used to access data.
Incidentally, it is decidedly problematic when someone writes first and last in a table. Since a table always consists of unordered rows, the result is only sorted if ORDER BY is specified at the same time. It is still seen that DISTINCT or GROUP BY results in a sorted result - a serious error that will not be corrected by repeating.
If on page 3 such a serious error is brushed off, there is reason to react. And there is no need to read further.
January 4, 2024 at 5:31 pm
A few comments
First, if you have issues with the book, they ought to be reported to the author/publisher.
Second, many books, and much code, has errors. It could be early in the code or later, but an error in one place doesn't make the entire thing wrong.
Third, the clustered index does store data in physical order on pages. The pagse/extents could get out of order, which is fragmentation, but that doesn't change the physical nature of storing the data. It is stored in the clustered key order. I think this might be more a verbiage argument, but I'd be interested in a repro that shows differently. Or an article if you feel people are misinformed.
January 4, 2024 at 6:48 pm
When the book is profiled several times on your site, it must also be reasonable that it is the same place with the same readers who are warned. It is not just a small error in some code, but a serious error in terms of understanding. The first row inserted is the first row in the table. The second row inserted is the second row in the table. The third row is the third row in the table. It could be simply expressed by the fact that all rows that are inserted are inserted into already allocated pages. If there is no space, a new page is allocated. This way, misunderstandings can be avoided and wrong claims can be avoided.
It is correct that data is physically arranged within a page when a Clustered Index is defined. But not within an extent and within the files the table is stored in. If a table is stored on 20,000 pages, it can hardly be called a physical order if the first pages in Cluster Key order are page 1:245, the next is page 1 :9999, the third is page 1:27, etc. This can be easily proven by selecting data from a table where the hint NOLOCK is specified. Unfortunately, this hint is also standard for some.
But if your site cannot tolerate serious errors being pointed out, it just has to be taken into account.
January 8, 2024 at 12:05 pm
Some of the pages from my book published on Kindle.
January 8, 2024 at 8:22 pm
Third, the clustered index does store data in physical order on pages. The pagse/extents could get out of order, which is fragmentation, but that doesn't change the physical nature of storing the data. It is stored in the clustered key order. I think this might be more a verbiage argument, but I'd be interested in a repro that shows differently. Or an article if you feel people are misinformed.
After all this time, it's unfortunate to see just how many people believe in that myth. One was written quite nicely more than a decade ago, courtesy of Wayne Sheffield.
The summary at the bottom of his article says it all as a tl;dr...
The slot array is always maintained in key value order, and the rows on each page are those that should be on that page as reflected by the key value of the row. However, the physical storage of the row on that page is not necessarily maintained in key value order. Changes to data so that the data no longer fits into its original space, or inserts that are not in the key value order can cause the actual data to not be physically stored in the key order and will be stored at the beginning of the unused space at the end of the page. When the unused space is consolidated, existing rows are moved up the page, but they are not resorted.
The same is true for non-clustered indexes.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 9, 2024 at 7:37 am
This was removed by the editor as SPAM
January 9, 2024 at 8:15 am
"However, the physical storage of the row on that page is not necessarily maintained in key value order" - the only important is, that the rows only can be accessed in slot order and in this way in index key order. Data in extents and in the file can be accessed in Key order or in physical order by specifying NOLOCK. With NOLOCK SQL Server use the IAM.
January 9, 2024 at 2:07 pm
The storage on the page isn't important because you can't read a row. you can only read a page. Rows on the page can move if they are updated, but it's not a material difference in how the index performs.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply