July 19, 2010 at 1:34 pm
Hi Folks,
I'm trying to minimize the total number of logical reads against a specific table. I've tested the following three approaches and each results in the same number of logical reads against Table A. Are there some other T-SQL coding techniques I can try to reduce the number of logical reads?
Here are the queries:
-- Query1, Original query in App
SELECT COUNT(*)
FROM [Table A] where col2 in (SELECT DISTINCT col2 FROM [Table B] b, [Table A] a where a.col1=b.col1 and col3 = 196)
-- Query2, Re-write #1
CREATE TABLE #tempWrk(
col2 nvarchar(128) not null
);
CREATE INDEX ix_temp on #tempWrk (col2)
INSERT #tempWrk (col2)
SELECT DISTINCT col2 FROM [Table B] a, [Table A] b where a.col1=b.col1 and a.col3 = 196
SELECT COUNT(*)
FROM [Table A] t1
inner join
#tempWrk t2
on t1.col2 = t2.col2
DROP TABLE #tempWrk
-- Query3, Re-Write #2
;WITH X AS (
SELECT DISTINCT col2 FROM [Table B] a, [Table A] b where a.col1=b.col1 and a.col3 = 196
)
SELECT COUNT(*)
FROM
[Table A] t1
inner join
X on t1.col2 = X.col2
Before executing these queries, I had set statisics io on. The following is the output:
Query1:
Table '[Table B]'. Scan count 5, logical reads 12027, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '[Table A]'. Scan count 10, logical reads 33090, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Query2:
Table '[Table B]'. Scan count 5, logical reads 12027, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '[Table A]'. Scan count 5, logical reads 16545, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tempWrk____________________________________________________________________________________________________________00000000000F'. Scan count 0, logical reads 1625295, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(416164 row(s) affected)
(1 row(s) affected)
Table '#tempWrk____________________________________________________________________________________________________________00000000000F'. Scan count 5, logical reads 1595, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '[Table A]'. Scan count 5, logical reads 16545, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Query3:
Table '[Table B]'. Scan count 5, logical reads 12027, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '[Table A]'. Scan count 10, logical reads 33090, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The table definitions are as follows:
CREATE TABLE [dbo].[Table A](
[col1] [int] IDENTITY(1,1) NOT NULL,
[col2] [nvarchar](50) NULL,
CONSTRAINT [PK_DocInfo] PRIMARY KEY CLUSTERED ([docid] ASC)
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [idx_col2] ON [dbo].[Table A] ([col2] ASC) INCLUDE ([col1])
GO
CREATE TABLE [dbo].[Table B](
[Col3] [int] NOT NULL,
[Col4] [int] NULL,
[Col1] [int] NULL
)
CREATE NONCLUSTERED INDEX [IX_tableB] ON [dbo].[Table B] ([Col3] ASC)
From these, I conclude that each of these three methods will touch [Table A] the exact same number of times. Are there some other techniques I can try that may touch less pages? Thanks in advance!
July 19, 2010 at 3:00 pm
Wouldn't it be easier to use
SELECT COUNT(DISTINCT col2)
FROM [Table A]
WHERE col3 = 196
Or am I missing something obvious?
July 19, 2010 at 3:01 pm
I'm not sure, Lutz. I'm reading his code more thoroughly right now.
He only wants counts of values that come from Table B.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 19, 2010 at 3:05 pm
Thanks for the replies guys. Sorry if the use of ColX has confused us. Table A does not contain Col3 and Table B does not contain Col2. Hope that clears things up a little.
July 19, 2010 at 3:32 pm
Without data to test, this is a shot in the dark, but....
SELECT COUNT(DISTINCT docid) -- docid is primary key of Table A
FROM [Table A] a
JOIN [Table B] b on a.col1=b.col1
WHERE col3 = 196
The code counts distinct doc_ids to be sure there is no twice-counting of rows from Table A due to the join. It may be LESS efficient, but we won't know until you test it.
Important question: Do you have indexes built on col1 for both [Table A] and [Table B]?
(Use of col1 as an included column doesn't count.)
I'd be interested in seeing the I/O stats, Time stats, and execution plans. Also it would be nice to know the volumes we are dealing with in both your tables.
Interesting question.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 19, 2010 at 3:46 pm
The Dixie Flatline (7/19/2010)
Without data to test, this is a shot in the dark, but....
SELECT COUNT(DISTINCT docid) -- docid is primary key of Table A
FROM [Table A] a
JOIN [Table B] b on a.col1=b.col1
WHERE col3 = 196
The code counts distinct doc_ids to be sure there is no twice-counting of rows from Table A due to the join. It may be LESS efficient, but we won't know until you test it.
Important question: Do you have indexes built on col1 for both [Table A] and [Table B]?
(Use of col1 as an included column doesn't count.)
I'd be interested in seeing the I/O stats, Time stats, and execution plans. Also it would be nice to know the volumes we are dealing with in both your tables.
Interesting question.
If his initial queries are correct, this won't work. He wants all the records from A where the Col2 value is found in the query you have above. So first you're doing a join from A to B to get the possible Col2 values and then you're joining back to A to get all records associated with the list of Col2 values.
I was going to respond with some suggestions, but without reasonable sample data being provided I didn't feel like guessing.
July 19, 2010 at 3:51 pm
My reasoning was that he first produces a distinct setof values, then gets a simple count of rows from Table A which have values in that set. However the set of values is dependent on a join between A and B on col1. That join could potentially produce a result set in which rows from column A were produced twice. Doing a count distinct on the primary key would eliminate the duplicates from consideration.
But yes, I freely admitted I was shooting blind 😉
So first you're doing a join from A to B to get the possible Col2 values and then you're joining back to A to get all records associated with the list of Col2 values.
By "you", I assume that you're referring to SwedishOrr. The whole point of the code I wrote was to avoid hitting Table A twice.
It might also be helpful to all these queries to have an index on col3, because in the absence of such an index, rows from Table A have to be scanned to determine the set of rows with col3 = 196. Again, it would be nice to know the volumes involved, see the execution plans, and be sure of all the indexes.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 19, 2010 at 4:04 pm
The Dixie Flatline (7/19/2010)
By "you", I assume that you're referring to SwedishOrr. The whole point of the code I wrote was to avoid hitting Table A twice.
Yes indeed ... the "you" is the generic individual trying to solve the problem as stated by the OP. I noticed that he was going out of his way to get the Col2 values and then join back to the table so I assumed this was necessary.
July 19, 2010 at 4:14 pm
I'm not sure it is necessary because the col2 values come from Table A in the first place.
However, upon reflection, it is quite possible that other rows in Table A have those col2 values but do NOT have col1 values that would successfully join to Table B. So my solution may well return erroneous results.
Thanks for setting me straight. 🙂
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 19, 2010 at 4:16 pm
Thanks again for the replies. I've attached sample data now. Also, I made a typo in the original post. The PK of Table A is Col1.
I really appreciate the ideas folks!
Thanks.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply