Baffling results

  • I have two stored procedures that are functionally identical. I won't bother to post them as the queries themselves are irrelevant. It's the dramatic difference in behavior that is baffling.
    Query 1: A traditional query against a relational database with seven or eight joins and an order by clause on a primary key.
    Query 2: The functional equivalent but it's against a dimensional data model involving one fact table and seven dimensions.
    Both queries return the exact same information.
    Query 1 takes one minute seven seconds clock time. According to the client statistics it takes 67,120 milliseconds. With Activity Monitor turned on it is primarily CPU driven with a burst of I/O at the end.
    Query 2 takes 58 seconds clock time. According to the client statistics it takes 323 milliseconds. That's right, just over a 1/4 of a second. However, it spends the vast majority of it's time doing I/O and almost no cpu time.

    The query plan on Query 2 is clean as a whistle. It looks just like you'd expect, all Index Seeks of the clustered primary key of each dimension. The largest "cost" is 23% for any one operation. The I/O statistics show that it is only doing logical reads with the largest being 78920.

    By comparison, query 1 is performing millions and millions of logical reads with some physical reads and read-ahead reads. The largest operational "cost" in this query is 40% for a Hash Match (Inner Join).

    Both queries return roughly the same number of bytes to the client. The database got a clean bill of health from CheckDB and I defragmented all the disks.

    Does anyone have any thoughts about what query 2 might be doing for 57.5 clock seconds? I know it says it's doing I/O but it sure seems excessive based on the cost of the query plan, the client statistics and the I/O statistics.

    "Beliefs" get in the way of learning.

  • Robert Frasca - Friday, March 8, 2019 8:46 AM

    I have two stored procedures that are functionally identical. I won't bother to post them as the queries themselves are irrelevant. It's the dramatic difference in behavior that is baffling.
    Query 1: A traditional query against a relational database with seven or eight joins and an order by clause on a primary key.
    Query 2: The functional equivalent but it's against a dimensional data model involving one fact table and seven dimensions.
    Both queries return the exact same information.
    Query 1 takes one minute seven seconds clock time. According to the client statistics it takes 67,120 milliseconds. With Activity Monitor turned on it is primarily CPU driven with a burst of I/O at the end.
    Query 2 takes 58 seconds clock time. According to the client statistics it takes 323 milliseconds. That's right, just over a 1/4 of a second. However, it spends the vast majority of it's time doing I/O and almost no cpu time.

    The query plan on Query 2 is clean as a whistle. It looks just like you'd expect, all Index Seeks of the clustered primary key of each dimension. The largest "cost" is 23% for any one operation. The I/O statistics show that it is only doing logical reads with the largest being 78920.

    By comparison, query 1 is performing millions and millions of logical reads with some physical reads and read-ahead reads. The largest operational "cost" in this query is 40% for a Hash Match (Inner Join).

    Both queries return roughly the same number of bytes to the client. The database got a clean bill of health from CheckDB and I defragmented all the disks.

    Does anyone have any thoughts about what query 2 might be doing for 57.5 clock seconds? I know it says it's doing I/O but it sure seems excessive based on the cost of the query plan, the client statistics and the I/O statistics.

    You are leaving out one of the key parts, how many rows of data are being returned to SSMS?  Returning data is the great equalizer.

  • Lynn Pettis - Friday, March 8, 2019 12:18 PM

    Robert Frasca - Friday, March 8, 2019 8:46 AM

    I have two stored procedures that are functionally identical. I won't bother to post them as the queries themselves are irrelevant. It's the dramatic difference in behavior that is baffling.
    Query 1: A traditional query against a relational database with seven or eight joins and an order by clause on a primary key.
    Query 2: The functional equivalent but it's against a dimensional data model involving one fact table and seven dimensions.
    Both queries return the exact same information.
    Query 1 takes one minute seven seconds clock time. According to the client statistics it takes 67,120 milliseconds. With Activity Monitor turned on it is primarily CPU driven with a burst of I/O at the end.
    Query 2 takes 58 seconds clock time. According to the client statistics it takes 323 milliseconds. That's right, just over a 1/4 of a second. However, it spends the vast majority of it's time doing I/O and almost no cpu time.

    The query plan on Query 2 is clean as a whistle. It looks just like you'd expect, all Index Seeks of the clustered primary key of each dimension. The largest "cost" is 23% for any one operation. The I/O statistics show that it is only doing logical reads with the largest being 78920.

    By comparison, query 1 is performing millions and millions of logical reads with some physical reads and read-ahead reads. The largest operational "cost" in this query is 40% for a Hash Match (Inner Join).

    Both queries return roughly the same number of bytes to the client. The database got a clean bill of health from CheckDB and I defragmented all the disks.

    Does anyone have any thoughts about what query 2 might be doing for 57.5 clock seconds? I know it says it's doing I/O but it sure seems excessive based on the cost of the query plan, the client statistics and the I/O statistics.

    You are leaving out one of the key parts, how many rows of data are being returned to SSMS?  Returning data is the great equalizer.

    It returns a full data set of 716 but only 10 rows to the client as I'm using OFFSET for paging.

    "Beliefs" get in the way of learning.

  • Robert Frasca - Friday, March 8, 2019 12:35 PM

    Lynn Pettis - Friday, March 8, 2019 12:18 PM

    Robert Frasca - Friday, March 8, 2019 8:46 AM

    I have two stored procedures that are functionally identical. I won't bother to post them as the queries themselves are irrelevant. It's the dramatic difference in behavior that is baffling.
    Query 1: A traditional query against a relational database with seven or eight joins and an order by clause on a primary key.
    Query 2: The functional equivalent but it's against a dimensional data model involving one fact table and seven dimensions.
    Both queries return the exact same information.
    Query 1 takes one minute seven seconds clock time. According to the client statistics it takes 67,120 milliseconds. With Activity Monitor turned on it is primarily CPU driven with a burst of I/O at the end.
    Query 2 takes 58 seconds clock time. According to the client statistics it takes 323 milliseconds. That's right, just over a 1/4 of a second. However, it spends the vast majority of it's time doing I/O and almost no cpu time.

    The query plan on Query 2 is clean as a whistle. It looks just like you'd expect, all Index Seeks of the clustered primary key of each dimension. The largest "cost" is 23% for any one operation. The I/O statistics show that it is only doing logical reads with the largest being 78920.

    By comparison, query 1 is performing millions and millions of logical reads with some physical reads and read-ahead reads. The largest operational "cost" in this query is 40% for a Hash Match (Inner Join).

    Both queries return roughly the same number of bytes to the client. The database got a clean bill of health from CheckDB and I defragmented all the disks.

    Does anyone have any thoughts about what query 2 might be doing for 57.5 clock seconds? I know it says it's doing I/O but it sure seems excessive based on the cost of the query plan, the client statistics and the I/O statistics.

    You are leaving out one of the key parts, how many rows of data are being returned to SSMS?  Returning data is the great equalizer.

    It returns a full data set of 716 but only 10 rows to the client as I'm using OFFSET for paging.

    The relational version does it's own paging as it was written for SQL Server 2008.

    "Beliefs" get in the way of learning.

  • Robert Frasca - Friday, March 8, 2019 12:36 PM

    Robert Frasca - Friday, March 8, 2019 12:35 PM

    Lynn Pettis - Friday, March 8, 2019 12:18 PM

    Robert Frasca - Friday, March 8, 2019 8:46 AM

    I have two stored procedures that are functionally identical. I won't bother to post them as the queries themselves are irrelevant. It's the dramatic difference in behavior that is baffling.
    Query 1: A traditional query against a relational database with seven or eight joins and an order by clause on a primary key.
    Query 2: The functional equivalent but it's against a dimensional data model involving one fact table and seven dimensions.
    Both queries return the exact same information.
    Query 1 takes one minute seven seconds clock time. According to the client statistics it takes 67,120 milliseconds. With Activity Monitor turned on it is primarily CPU driven with a burst of I/O at the end.
    Query 2 takes 58 seconds clock time. According to the client statistics it takes 323 milliseconds. That's right, just over a 1/4 of a second. However, it spends the vast majority of it's time doing I/O and almost no cpu time.

    The query plan on Query 2 is clean as a whistle. It looks just like you'd expect, all Index Seeks of the clustered primary key of each dimension. The largest "cost" is 23% for any one operation. The I/O statistics show that it is only doing logical reads with the largest being 78920.

    By comparison, query 1 is performing millions and millions of logical reads with some physical reads and read-ahead reads. The largest operational "cost" in this query is 40% for a Hash Match (Inner Join).

    Both queries return roughly the same number of bytes to the client. The database got a clean bill of health from CheckDB and I defragmented all the disks.

    Does anyone have any thoughts about what query 2 might be doing for 57.5 clock seconds? I know it says it's doing I/O but it sure seems excessive based on the cost of the query plan, the client statistics and the I/O statistics.

    You are leaving out one of the key parts, how many rows of data are being returned to SSMS?  Returning data is the great equalizer.

    It returns a full data set of 716 but only 10 rows to the client as I'm using OFFSET for paging.

    The relational version does it's own paging as it was written for SQL Server 2008.

    Let's ask this a different way, how many bytes of data are being returned?

  • Lynn Pettis - Friday, March 8, 2019 12:40 PM

    Robert Frasca - Friday, March 8, 2019 12:36 PM

    Robert Frasca - Friday, March 8, 2019 12:35 PM

    Lynn Pettis - Friday, March 8, 2019 12:18 PM

    Robert Frasca - Friday, March 8, 2019 8:46 AM

    I have two stored procedures that are functionally identical. I won't bother to post them as the queries themselves are irrelevant. It's the dramatic difference in behavior that is baffling.
    Query 1: A traditional query against a relational database with seven or eight joins and an order by clause on a primary key.
    Query 2: The functional equivalent but it's against a dimensional data model involving one fact table and seven dimensions.
    Both queries return the exact same information.
    Query 1 takes one minute seven seconds clock time. According to the client statistics it takes 67,120 milliseconds. With Activity Monitor turned on it is primarily CPU driven with a burst of I/O at the end.
    Query 2 takes 58 seconds clock time. According to the client statistics it takes 323 milliseconds. That's right, just over a 1/4 of a second. However, it spends the vast majority of it's time doing I/O and almost no cpu time.

    The query plan on Query 2 is clean as a whistle. It looks just like you'd expect, all Index Seeks of the clustered primary key of each dimension. The largest "cost" is 23% for any one operation. The I/O statistics show that it is only doing logical reads with the largest being 78920.

    By comparison, query 1 is performing millions and millions of logical reads with some physical reads and read-ahead reads. The largest operational "cost" in this query is 40% for a Hash Match (Inner Join).

    Both queries return roughly the same number of bytes to the client. The database got a clean bill of health from CheckDB and I defragmented all the disks.

    Does anyone have any thoughts about what query 2 might be doing for 57.5 clock seconds? I know it says it's doing I/O but it sure seems excessive based on the cost of the query plan, the client statistics and the I/O statistics.

    You are leaving out one of the key parts, how many rows of data are being returned to SSMS?  Returning data is the great equalizer.

    It returns a full data set of 716 but only 10 rows to the client as I'm using OFFSET for paging.

    The relational version does it's own paging as it was written for SQL Server 2008.

    Let's ask this a different way, how many bytes of data are being returned?

    Roughly 375,000 for both of them.

    "Beliefs" get in the way of learning.

  • Robert Frasca - Friday, March 8, 2019 8:46 AM

    I have two stored procedures that are functionally identical. I won't bother to post them as the queries themselves are irrelevant. It's the dramatic difference in behavior that is baffling.
    Query 1: A traditional query against a relational database with seven or eight joins and an order by clause on a primary key.
    Query 2: The functional equivalent but it's against a dimensional data model involving one fact table and seven dimensions.
    Both queries return the exact same information.

    Just because both procedures return the same information does not mean they are functionally identical.  You described two different database structures so I'm assuming the underlining queries and tables are significantly different as well as potentially the data in them.

  • ZZartin - Friday, March 8, 2019 1:50 PM

    Robert Frasca - Friday, March 8, 2019 8:46 AM

    I have two stored procedures that are functionally identical. I won't bother to post them as the queries themselves are irrelevant. It's the dramatic difference in behavior that is baffling.
    Query 1: A traditional query against a relational database with seven or eight joins and an order by clause on a primary key.
    Query 2: The functional equivalent but it's against a dimensional data model involving one fact table and seven dimensions.
    Both queries return the exact same information.

    Just because both procedures return the same information does not mean they are functionally identical.  You described two different database structures so I'm assuming the underlining queries and tables are significantly different as well as potentially the data in them.

    That's kind of the point of my post. They are functionally the same in that they execute queries to return the same set of data. The fact that they do it differently is irrelevant to their functionality.  I was hoping the data mart would dramatically improve performance but it isn't and I can't find any evidence as to why.  All the evidence I have seems to indicate that it SHOULD be much faster.

    "Beliefs" get in the way of learning.

  • Robert Frasca - Friday, March 8, 2019 2:20 PM

    ZZartin - Friday, March 8, 2019 1:50 PM

    Robert Frasca - Friday, March 8, 2019 8:46 AM

    I have two stored procedures that are functionally identical. I won't bother to post them as the queries themselves are irrelevant. It's the dramatic difference in behavior that is baffling.
    Query 1: A traditional query against a relational database with seven or eight joins and an order by clause on a primary key.
    Query 2: The functional equivalent but it's against a dimensional data model involving one fact table and seven dimensions.
    Both queries return the exact same information.

    Just because both procedures return the same information does not mean they are functionally identical.  You described two different database structures so I'm assuming the underlining queries and tables are significantly different as well as potentially the data in them.

    That's kind of the point of my post. They are functionally the same in that they execute queries to return the same set of data. The fact that they do it differently is irrelevant to their functionality.  I was hoping the data mart would dramatically improve performance but it isn't and I can't find any evidence as to why.  All the evidence I have seems to indicate that it SHOULD be much faster.

    Have you tried the datawarehouse version without the OFFSET stuff?

  • The first thing to check is what indexes are being used to return data in each of the queries.  My guess is that there are good indexes in the transactional data and no good indexes (for the query you are running) in the data mart data.  SSMS has good tools to see the execution plans of queries, so that is a good place to start.

    https://docs.microsoft.com/en-us/sql/relational-databases/performance/display-an-actual-execution-plan?view=sql-server-2017

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply