November 22, 2016 at 9:03 am
Hello,
What parameters can have an impact on the "number of logical reads"?
UAT and PROD environments are supposed to be the same, but the "number of logical reads" is huge on PROD when I do a simple test on View_Step2 (4.7M rows):
SELECT TOP 1000 .. FROM View_Step2
Response time UAT: < 1s. 🙂
Response time PROD: about 4 min. :angry:
Here are UAT vs PROD statistics:
Here are the execution plans on UAT and PROD; they seem very similar.
View_Step2 uses View_Step1 joined with a very small table (3 rows).
View_Step1 uses a single table (about 9.4M rows).
All this is a huge mystery to me!
How could I find the difference(s) between the UAT and PROD environments that could explain the big difference in execution time for this simple query?
Thanks in advance !
Regards,
Steph.
November 22, 2016 at 9:20 am
The difference is in the cardinality, UAT has less than 1000 rows where PROD has 1.3 million rows, obviously there will be a big difference in the logical (and probably physical) reads.
😎
November 22, 2016 at 10:11 am
Eirikur Eiriksson (11/22/2016)
The difference is in the cardinality, UAT has less than 1000 rows where PROD has 1.3 million rows, obviously there will be a big difference in the logical (and probably physical) reads.😎
Where do you see that ???
The table used by View_Step1 has the same number of rows in UAT and PROD (9.4M).
Both views have the same number of rows in UAT and PROD.
Steph.
November 22, 2016 at 10:28 am
polytropic2310 (11/22/2016)
Eirikur Eiriksson (11/22/2016)
The difference is in the cardinality, UAT has less than 1000 rows where PROD has 1.3 million rows, obviously there will be a big difference in the logical (and probably physical) reads.😎
Where do you see that ???
The table used by View_Step1 has the same number of rows in UAT and PROD (9.4M).
Both views have the same number of rows in UAT and PROD.
Steph.
Here is an example from the execution plans
😎
UAT
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="1000" ActualEndOfScans="1" ActualExecutions="1" />
Prod
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="4665284" ActualEndOfScans="0" ActualExecutions="1" />
</RunTimeInformation>
Just to start, there seems to be a massive difference in the cardinality of Dataxyz_GeoLoc_Input between UAT and PROD. Have you actually checked whether UAT and PROD are in sync? One should always check and never take someone's word for it!
November 22, 2016 at 12:03 pm
The plan does show the cardinality of the table to be the same.
It's the actual rows coming out of the scan that are so wildly different.
From a quick glance it just looks like in PROD the row goal of 1000 is not getting pushed all the way down to that scan (or it is, and it's having to scan most of the table before it can produce 1000 rows), while in UAT it is.
Otherwise, there are some estimate differences between UAT and PROD (1000 vs 2000 on the Dataxyz_GeoLoc_Input table, and 1 vs 3 on itb_annee, respectively).
The PROD plan optimization also timed out, while for UAT it terminated with a "Good Enough Plan Found".
Cheers!
EDIT: Forgot to add this: I'd double-check to make sure the distribution and content are the same in both tables, in addition to the row count.
At the end of the day, it might be as painful as the scan of the heap just happening to hit 1000 qualifying rows more quickly in UAT than in PROD, based solely on quirks of how the data was allocated.
November 22, 2016 at 1:34 pm
Looking at the plan, this appears to be an example of what is actually a very common problem.
The pattern in the execution plan that causes this is basically:
(Top) <=== (Filter) <=== (Read)
In this specific case, the (Read) is the table scan on Dataxyz_GeoLoc_Input, the (Filter) is the result of the inner joins (looking at the rowcounts in all intermediate steps, I can even say that all the filtering is solely done by the join to the itb_annee table).
To understand how this can sometimes go wrong, let's consider a real-world example. I give you a huge bowl with 10 million marbles. You know that half of them are red, but they are so dirty that you can't see their color without first cleaning them. You have two ways of cleaning: you can wipe one marble at a time, which takes 1 second per marble, or you can wash the entire bowl in a washing machine, which takes 20 minutes. If I ask you to give me all the red marbles, you will choose the washing machine and then find all the red marbles in the now clean bowl. But if I ask you to give me 5 red marbles, you will probably choose instead to pick a marble, wipe it, give it to me if it's red, then continue until you have 5. Which, statistically, means that you'll probably clean 10 marbles, taking 10 seconds. After that, you will stop working; the rest of the marbles are still dirty but who cares.
This plan backfires horribly if someone has managed to arrange the marbles so that all the red ones are at the bottom of the bowl. In the most extreme case, you will have cleaned all 5 million non-red marbles before you see your first red. Which means you just spent almost two months cleaning marbles one by one before you can finally stop.
The same type of bad result can happen when it turns out that the marbles were not 50% red at all, but just 0.1% red; in that case you will also plan to clean them one by one, only to find that there are far fewer reds than you expected, so you end up cleaning thousands of marbles.
Wrapping back to your execution plans: the query has a TOP(1000) somewhere, and the plan uses the pattern above, with the (Top) operator in charge of stopping execution as soon as 1000 result rows are returned. To the far right is the table scan which is estimated to read 2000 rows - so based on the information available to the optimizer, it expects that it needs to process 2000 rows before finding 1000 matches. The statistics and other information available apparently suggest a 50% hit/miss ratio.
The actual numbers prove this wrong; in reality 4,665,284 rows (of the 9,460,720 total in the table) were read before the Top operator finally decided that it had all the data it needed and the rest of the plan could stop working. From the plan alone, it is not possible to see whether this is because the data happens to be read in such an order that all matches appear last, or whether this is because the estimation of a 50% hit/miss ratio is wrong.
In UAT, the plan shape is the same, but the numbers are slightly different. Here, the optimizer expects that it needs to read 1000 rows from the table scan, so it apparently expects a 100% hit ratio. And that estimate is correct, because the execution plan did in fact stop processing after 1000 rows. From this I can say with 99.9% certainty that the actual data in UAT is slightly different from that in PROD, and that this difference in data is the reason that all rows from the table happen to match the join to the itb_annee table.
It would be interesting to see whether the execution plan changes when you remove the TOP from the query. (I recommend also adding an INTO clause to select into a temporary table, to prevent flooding SSMS with millions of rows when you try this.)
The current plan is optimized based on the assumption that only 2000 rows (1000 in UAT) will be processed. This should result in the optimizer picking a "take a second to wash each individual marble" strategy over "using the machine to wash all at once". Without the TOP, that choice will change, and you would be able to see what type of execution plan you would get if the optimizer would be aware that it had to process over 4.6 million rows before finding 1000 matches. There might not even be a faster plan available (some queries simply do not have the equivalent of a marble-bowl washing machine), or it might still be slow; the way to find out is to run the query without TOP.
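If you want to compare the two strategies without rewriting the query, there is also a way to tell the optimizer to ignore the row goal entirely. This is a test sketch, not a recommendation to leave the hint in production: trace flag 4138 disables row-goal optimization for a statement (it requires elevated permissions), and on SQL Server 2016 SP1 and later the documented USE HINT equivalent exists.

```sql
-- Test A: current behavior, the row goal stands ("wipe marbles one by one")
SELECT TOP (1000) * FROM dbo.lv_DataXYZ_GeoLoc_Step2;

-- Test B: same query, but the optimizer plans as if all rows were needed
-- (trace flag 4138 disables row-goal optimization; QUERYTRACEON needs sysadmin)
SELECT TOP (1000) * FROM dbo.lv_DataXYZ_GeoLoc_Step2
OPTION (QUERYTRACEON 4138);

-- On SQL Server 2016 SP1+ the supported equivalent is:
-- OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));
```

Comparing the plans and runtimes of A and B on PROD would show directly whether the row goal is what hurts.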
If that does fix the performance problem, then one way to get better performance from your query would be to make that query without TOP and with an INTO clause permanent, then use a SELECT TOP ... FROM #YourTempTable afterwards.
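A minimal sketch of that workaround (the temp table name is mine; whether this beats the row-goal plan depends on your data):

```sql
-- Materialize the full view once; the optimizer now plans for all rows
-- (the "washing machine" strategy from the marble analogy)
SELECT *
INTO #GeoLoc_Step2
FROM dbo.lv_DataXYZ_GeoLoc_Step2;

-- The TOP then runs against the already-materialized rows and is trivially cheap
SELECT TOP (1000) *
FROM #GeoLoc_Step2;

DROP TABLE #GeoLoc_Step2;
```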
That being said - I also want to have a word about indexes. A "Table Scan" operator in the execution plan means that the table does not have a clustered index. For a table with so many rows, not having a clustered index is almost always a mistake. I strongly suggest adding a clustered index to this table. You might also want to add one or more nonclustered indexes to speed up processing. It is possible that you understand indexing and have valid reasons for keeping this table as a heap, but I consider it more likely that you, or whoever originally designed this database, does not have a sufficient understanding of indexes. (Proper indexing is a huge topic in itself, and I do not want to go too far off topic here; if you feel you need more help in that area, go out on the internet and start reading up on indexes.)
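As an illustration only (the right key depends on your access patterns; intra_annee is an assumption on my part, chosen because it is what the join to itb_annee filters on), a clustered index on the big heap might look like:

```sql
-- Hypothetical sketch: cluster the 9.4M-row heap so scans no longer depend
-- on allocation order, and the join filter on intra_annee can seek
CREATE CLUSTERED INDEX CIX_DataXYZ_GeoLoc_Input
    ON DataXYZ.dbo.DataXYZ_GeoLoc_Input (intra_annee, siret);
```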
November 23, 2016 at 1:39 am
I really enjoyed reading that, Hugo.
November 23, 2016 at 2:00 am
Beatrix Kiddo (11/23/2016)
I really enjoyed reading that, Hugo.
Absolutely likewise!
It's 85% of a cracking good article, and an important one also. Row goal queries can have astonishingly bad plans, at least with the old CE.
November 23, 2016 at 5:37 am
polytropic2310 (11/22/2016)
Hello,
What parameters can have an impact on the "number of logical reads"?
UAT and PROD environments are supposed to be the same, but the "number of logical reads" is huge on PROD when I do a simple test on View_Step2 (4.7M rows):
SELECT TOP 1000 .. FROM View_Step2
Response time UAT: < 1s. 🙂
Response time PROD: about 4 min. :angry:
Here are UAT vs PROD statistics:
Here are the execution plans on UAT and PROD; they seem very similar.
View_Step2 uses View_Step1 joined with a very small table (3 rows).
View_Step1 uses a single table (about 9.4M rows).
All this is a huge mystery to me!
How could I find the difference(s) between the UAT and PROD environments that could explain the big difference in execution time for this simple query?
Thanks in advance !
Regards,
Steph.
Can you post the DDL for View_Step2, View_Step1, the "very small table (3 rows)" and the "single table (about 9.4M rows)", please? Also, if there are typed XML columns or an XML index, could you post the XSD / XML index definitions for those, from both UAT and PROD?
😎
There is an XQuery-specific cast that is part of a Filter node in both execution plans, and that cast takes the output of the function xsd_cast_to_maybe_large. The interesting part is that the xsd_cast_to_maybe_large function's parameters are different in the two execution plans, hinting that there may be a different XSD for those columns or a different XML index.
November 23, 2016 at 8:28 am
Hello all,
Thanks for your answers and the nice metaphor from Hugo. 😉
But I still don't understand why there is a difference of behavior between optimizer on UAT and PROD.
I'm 99.99% sure that the content of the tables/views is the same.
As you diagnosed, the problem does seem to be linked to the TOP clause.
In PROD, I used View_Step2 to load 4.7M rows into the target table [dtbData_XYZ_GeoLoc] and it took only about 20 min., which is quite normal and similar to the time taken for loading the table in the UAT env.
In fact, View_Step2 is used to load [dtbData_XYZ_GeoLoc] (see DDL at the end of this post).
I used "SELECT TOP 1000 ... FROM View_Step2" just to test this view before loading it into the target table [dtbData_XYZ_GeoLoc].
I was surprised to see such different times to run this view on UAT (1 s.) and PROD (4 min.).
Is there a way to reset/configure the optimizer to get the same behavior on UAT and PROD?
...to say to PROD "use washing-machine to clean all the marbles" ? 😉
View Step1 DDL:
NB: "SELECT TOP 1000 ... FROM View_Step1" returns in less than 1 s.
USE [EntrepotXYZ]
GO
/****** Object: View [dbo].[lv_DataXYZ_GeoLoc_Step1] Script Date: 23/11/2016 14:53:36 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW [dbo].[lv_DataXYZ_GeoLoc_Step1]
AS
SELECT intra_annee, LEFT(siret, 9) AS intra_siren, siret + ident_int AS intra_cle_id, CAST(depcom_red + ISNULL(LTRIM(RTRIM(iris)) + REPLICATE('0', 4 - LEN(LTRIM(RTRIM(iris)))), REPLICATE('0', 4)) AS varchar(9))
AS intra_iris, CASE WHEN LEFT(depcom_red, 2) = '97' THEN LEFT(depcom_red, 3) ELSE LEFT(depcom_red, 2) END AS intra_dep, ident_org, siret, rs, cat_org, typeconst_red, nbpiece_red, construct_red,
conv_red, numconv_red, CASE WHEN datconv_red IS NULL THEN NULL WHEN LTRIM(rtrim(datconv_red)) = '' THEN NULL ELSE RIGHT(datconv_red, 4) + '-' + SUBSTRING(datconv_red, 4, 2)
+ '-' + LEFT(datconv_red, 2) END AS datconv_red, rsexpro_red, siretexpro_red, finan_red, finanautre_red, cus_red, identges_red, ident_int, ident_rep, sru_expir_red, sru_alinea_red, reg, lib_reg, dep, lib_dep,
epci, libepci, depcom_red, lib_depcom, codepostal_red, numvoie_red, indrep_red, typvoie_red, nomvoie_red, numappt_red, numboite_red, esc_red, couloir_red, etage_red, complident_red, entree_red, bat_red,
immeu_red, complgeo_red, lieudit_red, zus_red, loyerprinc_red, loyeracc_red, contrib_red, CAST('' AS xml ).value('sql:column("loymoy") cast as xs:decimal ? ', 'decimal(7,3)') AS loymoy, loyermaxapl_red,
loyermaxcus_red, locat_red, mode_red, bail_red, remlocdate_red, contreslog_red, duree_vacance, codsegpatrim_red, libsegpatrim_red, droit_red, patrimoine_red, origine_red, sortiepatrim_red, newlogt_red,
oldlogt_red, dpedate_red, dpeenergie_red, dpeserre_red, surfhab_red, modesurf_red, surfmode_red, qualacq_red, miscommercial_red, prixvente_red, prodfin_red, mes_sanscumul,
CAST('' AS xml ).value('sql:column("x") cast as xs:decimal ?', 'decimal(24,12)') AS x, CAST('' AS xml ).value('sql:column("y") cast as xs:decimal ?', 'decimal(24,12)') AS y,
CAST('' AS xml ).value('sql:column("x_l2e") cast as xs:decimal ?', 'decimal(24,12)') AS x_l2e, CAST('' AS xml ).value('sql:column("y_l2e") cast as xs:decimal ?', 'decimal(24,12)') AS y_l2e, zus, zfu, iris, nqp, qp, ril,
comazus, comazfu, comanqp, comaqp
FROM DataXYZ.dbo.DataXYZ_GeoLoc_Input
GO
Small table containing 3 rows (one per year):
USE [EntrepotXYZ]
GO
/****** Object: Table [dbo].[itb_annee] Script Date: 23/11/2016 14:55:40 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[itb_annee](
[IdExercice] [int] IDENTITY(1,1) NOT NULL,
[Annee] [int] NOT NULL,
[Cloture/Ouvert] [int] NOT NULL,
[NonGeoLoc] [int] NOT NULL,
[GeoLoc] [int] NOT NULL
) ON [PRIMARY]
GO
View Step2 DDL:
USE [EntrepotXYZ]
GO
/****** Object: View [dbo].[lv_DataXYZ_GeoLoc_Step2] Script Date: 23/11/2016 14:48:05 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW [dbo].[lv_DataXYZ_GeoLoc_Step2]
AS
SELECT dbo.lv_DataXYZ_GeoLoc_Step1.intra_annee, dbo.lv_DataXYZ_GeoLoc_Step1.intra_siren, dbo.lv_DataXYZ_GeoLoc_Step1.intra_cle_id, dbo.lv_DataXYZ_GeoLoc_Step1.intra_iris,
dbo.lv_DataXYZ_GeoLoc_Step1.intra_dep, dbo.lv_DataXYZ_GeoLoc_Step1.ident_org, dbo.lv_DataXYZ_GeoLoc_Step1.siret, dbo.lv_DataXYZ_GeoLoc_Step1.rs, dbo.lv_DataXYZ_GeoLoc_Step1.cat_org,
dbo.lv_DataXYZ_GeoLoc_Step1.typeconst_red, dbo.lv_DataXYZ_GeoLoc_Step1.nbpiece_red, dbo.lv_DataXYZ_GeoLoc_Step1.construct_red, dbo.lv_DataXYZ_GeoLoc_Step1.conv_red,
dbo.lv_DataXYZ_GeoLoc_Step1.numconv_red, CAST('' AS xml ).value('sql:column("datconv_red") cast as xs:date ?', 'date') AS datconv_red, dbo.lv_DataXYZ_GeoLoc_Step1.rsexpro_red,
dbo.lv_DataXYZ_GeoLoc_Step1.siretexpro_red, dbo.lv_DataXYZ_GeoLoc_Step1.finan_red, dbo.lv_DataXYZ_GeoLoc_Step1.finanautre_red, dbo.lv_DataXYZ_GeoLoc_Step1.cus_red,
dbo.lv_DataXYZ_GeoLoc_Step1.identges_red, dbo.lv_DataXYZ_GeoLoc_Step1.ident_int, dbo.lv_DataXYZ_GeoLoc_Step1.ident_rep, dbo.lv_DataXYZ_GeoLoc_Step1.sru_expir_red,
dbo.lv_DataXYZ_GeoLoc_Step1.sru_alinea_red, dbo.lv_DataXYZ_GeoLoc_Step1.reg, dbo.lv_DataXYZ_GeoLoc_Step1.lib_reg, dbo.lv_DataXYZ_GeoLoc_Step1.dep, dbo.lv_DataXYZ_GeoLoc_Step1.lib_dep,
dbo.lv_DataXYZ_GeoLoc_Step1.epci, dbo.lv_DataXYZ_GeoLoc_Step1.libepci, dbo.lv_DataXYZ_GeoLoc_Step1.depcom_red, dbo.lv_DataXYZ_GeoLoc_Step1.lib_depcom,
dbo.lv_DataXYZ_GeoLoc_Step1.codepostal_red, dbo.lv_DataXYZ_GeoLoc_Step1.numvoie_red, dbo.lv_DataXYZ_GeoLoc_Step1.indrep_red, dbo.lv_DataXYZ_GeoLoc_Step1.typvoie_red,
dbo.lv_DataXYZ_GeoLoc_Step1.nomvoie_red, dbo.lv_DataXYZ_GeoLoc_Step1.numappt_red, dbo.lv_DataXYZ_GeoLoc_Step1.numboite_red, dbo.lv_DataXYZ_GeoLoc_Step1.esc_red,
dbo.lv_DataXYZ_GeoLoc_Step1.couloir_red, dbo.lv_DataXYZ_GeoLoc_Step1.etage_red, dbo.lv_DataXYZ_GeoLoc_Step1.complident_red, dbo.lv_DataXYZ_GeoLoc_Step1.entree_red,
dbo.lv_DataXYZ_GeoLoc_Step1.bat_red, dbo.lv_DataXYZ_GeoLoc_Step1.immeu_red, dbo.lv_DataXYZ_GeoLoc_Step1.complgeo_red, dbo.lv_DataXYZ_GeoLoc_Step1.lieudit_red,
dbo.lv_DataXYZ_GeoLoc_Step1.zus_red, dbo.lv_DataXYZ_GeoLoc_Step1.loyerprinc_red, dbo.lv_DataXYZ_GeoLoc_Step1.loyeracc_red, dbo.lv_DataXYZ_GeoLoc_Step1.contrib_red,
dbo.lv_DataXYZ_GeoLoc_Step1.loymoy, dbo.lv_DataXYZ_GeoLoc_Step1.loyermaxapl_red, dbo.lv_DataXYZ_GeoLoc_Step1.loyermaxcus_red, dbo.lv_DataXYZ_GeoLoc_Step1.locat_red,
dbo.lv_DataXYZ_GeoLoc_Step1.mode_red, dbo.lv_DataXYZ_GeoLoc_Step1.bail_red, dbo.lv_DataXYZ_GeoLoc_Step1.remlocdate_red, dbo.lv_DataXYZ_GeoLoc_Step1.contreslog_red,
dbo.lv_DataXYZ_GeoLoc_Step1.duree_vacance, dbo.lv_DataXYZ_GeoLoc_Step1.codsegpatrim_red, dbo.lv_DataXYZ_GeoLoc_Step1.libsegpatrim_red, dbo.lv_DataXYZ_GeoLoc_Step1.droit_red,
dbo.lv_DataXYZ_GeoLoc_Step1.patrimoine_red, dbo.lv_DataXYZ_GeoLoc_Step1.origine_red, dbo.lv_DataXYZ_GeoLoc_Step1.sortiepatrim_red, dbo.lv_DataXYZ_GeoLoc_Step1.newlogt_red,
dbo.lv_DataXYZ_GeoLoc_Step1.oldlogt_red, dbo.lv_DataXYZ_GeoLoc_Step1.dpedate_red, dbo.lv_DataXYZ_GeoLoc_Step1.dpeenergie_red, dbo.lv_DataXYZ_GeoLoc_Step1.dpeserre_red,
dbo.lv_DataXYZ_GeoLoc_Step1.surfhab_red, dbo.lv_DataXYZ_GeoLoc_Step1.modesurf_red, dbo.lv_DataXYZ_GeoLoc_Step1.surfmode_red, dbo.lv_DataXYZ_GeoLoc_Step1.qualacq_red,
dbo.lv_DataXYZ_GeoLoc_Step1.miscommercial_red, dbo.lv_DataXYZ_GeoLoc_Step1.prixvente_red, dbo.lv_DataXYZ_GeoLoc_Step1.prodfin_red, dbo.lv_DataXYZ_GeoLoc_Step1.mes_sanscumul,
dbo.lv_DataXYZ_GeoLoc_Step1.x, dbo.lv_DataXYZ_GeoLoc_Step1.y, dbo.lv_DataXYZ_GeoLoc_Step1.x_l2e, dbo.lv_DataXYZ_GeoLoc_Step1.y_l2e, dbo.lv_DataXYZ_GeoLoc_Step1.zus,
dbo.lv_DataXYZ_GeoLoc_Step1.zfu, dbo.lv_DataXYZ_GeoLoc_Step1.iris, dbo.lv_DataXYZ_GeoLoc_Step1.nqp, dbo.lv_DataXYZ_GeoLoc_Step1.qp, dbo.lv_DataXYZ_GeoLoc_Step1.ril,
dbo.lv_DataXYZ_GeoLoc_Step1.comazus, dbo.lv_DataXYZ_GeoLoc_Step1.comazfu, dbo.lv_DataXYZ_GeoLoc_Step1.comanqp, dbo.lv_DataXYZ_GeoLoc_Step1.comaqp, 1 AS intra_nb_log
FROM dbo.lv_DataXYZ_GeoLoc_Step1 INNER JOIN
dbo.itb_annee ON dbo.lv_DataXYZ_GeoLoc_Step1.intra_annee = dbo.itb_annee.Annee AND dbo.itb_annee.[Cloture/Ouvert] = 1
GO
Target table loaded from View_Step2:
USE [EntrepotXYZ]
GO
/****** Object: Table [dbo].[dtbData_XYZ_GeoLoc] Script Date: 23/11/2016 15:09:17 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[dtbData_XYZ_GeoLoc](
[intra_annee] [smallint] NULL,
[intra_siren] [char](9) NULL,
[intra_cle_id] [varchar](46) NULL,
[intra_iris] [char](9) NULL,
[intra_dep] [varchar](3) NULL,
[ident_org] [char](9) NULL,
[siret] [char](14) NULL,
[rs] [varchar](128) NULL,
[cat_org] [varchar](3) NULL,
[typeconst_red] [varchar](2) NULL,
[nbpiece_red] [varchar](2) NULL,
[construct_red] [smallint] NULL,
[conv_red] [char](1) NULL,
[numconv_red] [varchar](64) NULL,
[datconv_red] [date] NULL,
[rsexpro_red] [varchar](128) NULL,
[siretexpro_red] [char](14) NULL,
[finan_red] [varchar](2) NULL,
[finanautre_red] [varchar](128) NULL,
[cus_red] [varchar](2) NULL,
[identges_red] [varchar](24) NULL,
[ident_int] [varchar](32) NULL,
[ident_rep] [char](10) NULL,
[sru_expir_red] [smallint] NULL,
[sru_alinea_red] [varchar](2) NULL,
[reg] [varchar](2) NULL,
[lib_reg] [varchar](70) NULL,
[dep] [varchar](3) NULL,
[lib_dep] [varchar](70) NULL,
[epci] [varchar](12) NULL,
[libepci] [varchar](128) NULL,
[depcom_red] [char](5) NULL,
[lib_depcom] [varchar](64) NULL,
[codepostal_red] [char](5) NULL,
[numvoie_red] [varchar](10) NULL,
[indrep_red] [varchar](10) NULL,
[typvoie_red] [varchar](20) NULL,
[nomvoie_red] [varchar](64) NULL,
[numappt_red] [varchar](10) NULL,
[numboite_red] [varchar](10) NULL,
[esc_red] [varchar](10) NULL,
[couloir_red] [varchar](10) NULL,
[etage_red] [varchar](10) NULL,
[complident_red] [varchar](128) NULL,
[entree_red] [varchar](10) NULL,
[bat_red] [varchar](64) NULL,
[immeu_red] [varchar](64) NULL,
[complgeo_red] [varchar](128) NULL,
[lieudit_red] [varchar](64) NULL,
[zus_red] [char](1) NULL,
[loyerprinc_red] [smallint] NULL,
[loyeracc_red] [smallint] NULL,
[contrib_red] [smallint] NULL,
[loymoy] [decimal](7, 3) NULL,
[loyermaxapl_red] [smallint] NULL,
[loyermaxcus_red] [smallint] NULL,
[locat_red] [smallint] NULL,
[mode_red] [varchar](2) NULL,
[bail_red] [varchar](7) NULL,
[remlocdate_red] [varchar](7) NULL,
[contreslog_red] [varchar](2) NULL,
[duree_vacance] [smallint] NULL,
[codsegpatrim_red] [varchar](32) NULL,
[libsegpatrim_red] [varchar](64) NULL,
[droit_red] [varchar](2) NULL,
[patrimoine_red] [smallint] NULL,
[origine_red] [varchar](2) NULL,
[sortiepatrim_red] [varchar](2) NULL,
[newlogt_red] [char](1) NULL,
[oldlogt_red] [char](1) NULL,
[dpedate_red] [varchar](7) NULL,
[dpeenergie_red] [varchar](2) NULL,
[dpeserre_red] [varchar](2) NULL,
[surfhab_red] [smallint] NULL,
[modesurf_red] [varchar](2) NULL,
[surfmode_red] [smallint] NULL,
[qualacq_red] [varchar](2) NULL,
[miscommercial_red] [char](1) NULL,
[prixvente_red] [int] NULL,
[prodfin_red] [int] NULL,
[mes_sanscumul] [bit] NULL,
[x] [decimal](24, 12) NULL,
[y] [decimal](24, 12) NULL,
[x_l2e] [decimal](24, 12) NULL,
[y_l2e] [decimal](24, 12) NULL,
[zus] [varchar](7) NULL,
[zfu] [varchar](7) NULL,
[iris] [char](4) NULL,
[nqp] [varchar](7) NULL,
[qp] [varchar](8) NULL,
[ril] [bit] NULL,
[comazus] [bit] NULL,
[comazfu] [bit] NULL,
[comanqp] [bit] NULL,
[comaqp] [bit] NULL,
[intra_nb_log] [int] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
November 23, 2016 at 10:48 am
polytropic2310 (11/23/2016)
I'm 99.99% sure that the content of the tables/views is the same.
Well, I can tell with 100% certainty that there are at least two differences:
1. The statistics on the data in UAT are different from those on PROD, causing the optimizer to believe that all rows will match on UAT, and that only 50% will match on PROD.
Can you check when the statistics on the tables were last updated, and what sampling rate was used?
2. The order in which the data is processed (which, for a table scan, follows the physical allocation order, which under some circumstances might more or less align with the insertion order) is such that in PROD a huge number of non-matching rows is processed before the 1000th match is found, whereas in UAT the first 1000 rows processed all turn out to be matches.
(And in the context of your query, "match" means that the "intra_annee" column matches the "Annee" column of the "itb_annee" row with [Cloture/Ouvert] = 1.)
Note that both the above differences are indeed possible while still having the exact same data in both tables. Not very likely, but possible.
BTW, have you compared both tables? With this performance difference I understand if you focus on the big table, but the actual root cause might be the small 3-row table. What happens if you do SELECT * FROM dbo.itb_annee in both environments?
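To answer point 1 concretely, the statistics metadata can be read directly; a sketch you could run in the DataXYZ database on both environments and compare (sys.dm_db_stats_properties is available from SQL Server 2008 R2 SP2 / 2012 SP1 onward):

```sql
-- When were the statistics on the big input table last updated,
-- and at what sampling rate?
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       CAST(100.0 * sp.rows_sampled / NULLIF(sp.rows, 0) AS decimal(5,2)) AS sample_pct
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp
WHERE s.[object_id] = OBJECT_ID(N'dbo.DataXYZ_GeoLoc_Input');
```

Differing last_updated dates or sampling rates between UAT and PROD would explain the different estimates even over identical data.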
November 23, 2016 at 4:53 pm
Hello Hugo,
Hugo Kornelis (11/23/2016)
Well, I can tell with 100% certainty that there are at least two differences:
1. The statistics on the data in UAT are different from those on PROD, causing the optimizer to believe that all rows will match on UAT, and that only 50% will match on PROD.
Can you check when the statistics on the tables were last updated, and what sampling rate was used?
I regenerated the statistics 2 days ago on both VMs, but I'll check again tomorrow (it's late in Europe).
I'll check point 2. tomorrow too.
I go to sleep...
Steph.
November 24, 2016 at 4:37 am
Can you post the DDL for the table DataXYZ.dbo.DataXYZ_GeoLoc_Input?
😎
The main problem is the data conversion method used in the views.
, CAST('' AS xml ).value('sql:column("x") cast as xs:decimal ?', 'decimal(24,12)') AS x
, CAST('' AS xml ).value('sql:column("y") cast as xs:decimal ?', 'decimal(24,12)') AS y
, CAST('' AS xml ).value('sql:column("x_l2e") cast as xs:decimal ?', 'decimal(24,12)') AS x_l2e
, CAST('' AS xml ).value('sql:column("y_l2e") cast as xs:decimal ?', 'decimal(24,12)') AS y_l2e
is much, much slower and more resource-intensive than
, CONVERT(decimal(24,12),x,0) AS x
, CONVERT(decimal(24,12),y,0) AS y
, CONVERT(decimal(24,12),x_l2e,0) AS x_l2e
, CONVERT(decimal(24,12),y_l2e,0) AS y_l2e
A simple comparison test using a Numbers table:
Table 'TBL_NUMBERS'. Scan count 1, logical reads 1245, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 188 ms, elapsed time = 196 ms.
--------------------------------------------------------------------
Table 'TBL_NUMBERS'. Scan count 5, logical reads 1255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 14655 ms, elapsed time = 6603 ms.
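The harness behind numbers like these was presumably along the following lines (a hypothetical reconstruction; the TBL_NUMBERS table and its column N are assumptions, and your timings will differ):

```sql
-- Compare a plain CONVERT against the XQuery-based cast over the same data.
-- Assigning to a variable discards the rows, so mostly the conversion cost is measured.
DECLARE @d decimal(24,12);
SET STATISTICS IO, TIME ON;

-- Plain conversion
SELECT @d = CONVERT(decimal(24,12), N, 0)
FROM dbo.TBL_NUMBERS;

-- XQuery-based conversion, as used in the views
SELECT @d = CAST('' AS xml).value('sql:column("N") cast as xs:decimal ?', 'decimal(24,12)')
FROM dbo.TBL_NUMBERS;

SET STATISTICS IO, TIME OFF;
```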
November 24, 2016 at 6:09 am
polytropic2310 (11/23/2016)
Hello Hugo,Hugo Kornelis (11/23/2016)
Well, I can tell with 100% certainty that there are at least two differences:
1. The statistics on the data in UAT are different from those on PROD, causing the optimizer to believe that all rows will match on UAT, and that only 50% will match on PROD.
Can you check when the statistics on the tables were last updated, and what sampling rate was used?
I've regenerated statistics 2 days ago on both VM but I'll check again tomorrow (it's late in Europe).
I'll check point 2. tomorrow too.
I go to sleep...
Steph.
Hi Steph,
When checking point 2, one other thing you might want to do is to run SELECT TOP(10000) * FROM Dataxyz_GeoLoc_Input; this should almost always result in a plan that shows you the first 10000 rows SQL Server will encounter during an IAM scan, the scan strategy normally used for a Table Scan (if the table has no clustered index).
If the data between the two databases is the same, then I expect that the SELECT TOP(10000) * will reveal that the underlying physical storage order of the data is different. And that because of whatever cause, the PROD data is stored in such an order that all non-qualifying rows come first.
@Eirikur: Good catch! I had glanced over the query sufficiently to see that there is some XML stuff going on, but had not looked at the details. But you are right, this does appear to be a needlessly complex and slow way to do a simple data conversion. I expect that changing this will alleviate most of the problems.
@Steph: Even if Eirikur's suggestion does fix all the issues, I still think you should start looking at indexes. Heap tables are almost never the right choice.
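For point 2 (allocation order), one way to quantify the check Hugo describes is to count how many of the first rows the heap scan returns would survive the join filter. A sketch only: adding the join may change the plan, so treat the result as indicative rather than exact.

```sql
-- Of the first 10000 rows an allocation-order scan returns, how many match
-- the filter that the join to itb_annee applies?
SELECT SUM(CASE WHEN a.Annee IS NOT NULL THEN 1 ELSE 0 END) AS matching_rows
FROM (SELECT TOP (10000) intra_annee
      FROM DataXYZ.dbo.DataXYZ_GeoLoc_Input) AS gi
LEFT JOIN dbo.itb_annee AS a
       ON a.Annee = gi.intra_annee
      AND a.[Cloture/Ouvert] = 1;
```

A result near 10000 on UAT but near 0 on PROD would confirm that the physical storage order differs between the two environments.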
November 24, 2016 at 10:06 am
Hugo Kornelis (11/24/2016)
polytropic2310 (11/23/2016)
Hello Hugo,Hugo Kornelis (11/23/2016)
Well, I can tell with 100% certainty that there are at least two differences:
1. The statistics on the data in UAT are different from those on PROD, causing the optimizer to believe that all rows will match on UAT, and that only 50% will match on PROD.
Can you check when the statistics on the tables were last updated, and what sampling rate was used?
I've regenerated statistics 2 days ago on both VM but I'll check again tomorrow (it's late in Europe).
I'll check point 2. tomorrow too.
I go to sleep...
Steph.
Hi Steph,
When checking point 2, one other thing you might want to do is to run SELECT TOP(10000) * FROM Dataxyz_GeoLoc_Input; this should almost always result in a plan that shows you the first 10000 rows SQL Server will encounter during an IAM scan, the scan strategy normally used for a Table Scan (if the table has no clustered index).
If the data between the two databases is the same, then I expect that the SELECT TOP(10000) * will reveal that the underlying physical storage order of the data is different. And that because of whatever cause, the PROD data is stored in such an order that all non-qualifying rows come first.
@Eirikur: Good catch! I had glanced over the query sufficiently to see that there is some XML stuff going on, but had not looked at the details. But you are right, this does appear to be a needlessly complex and slow way to do a simple data conversion. I expect that changing this will alleviate most of the problems.
@Steph: Even if Eirikur's suggestion does fix all the issues, I still think you should start looking at indexes. Heap tables are almost never the right choice.
The interesting thing is that there is a difference in the internal length check of the initial XML value, which seems to throw the optimiser way off in production; something to dig into.
😎
Another thing is the cardinality estimation; it's not often one sees a 1:1,000,000 difference :pinch: