July 7, 2011 at 10:16 am
Using SQL Server 2008 Express R2.
I have a rather annoying view that is central to my app. It's not as slow as I might expect, given how awful it is, but it's used constantly and is slow enough that I would like to do something with it, if possible. Sometimes it's actually pretty quick, sometimes it can take over a minute, even when the number of selected records is small and even when I am logged onto the server via remote desktop and execute the query directly with SSMSE, so it's not just network traffic or ODBC issues.
The main table has foreign keys into 20 auxiliary tables, 19 of which are not required. The 19 are simple lookup tables - autonumber primary key and associated text field; the required one has a few more fields and functions in the workplace as a sort of supergroup, but for the DB it's just another related table. I need to gather data from ALL 20 of the aux tables plus several text fields from the main table into a view, which the users then want to scroll around in.
I know it's a bad idea, but have had no luck weaning anyone away from it - they want to scroll up and down in the result set and that's that. It fits the way they work, they're used to it and they like it. It's always a read-only view, so at least I'm not blocking anyone with it - updates are done by selecting a record from the view and opening another form for editing, which grabs only the one selected record. I select anywhere from a few to around 20,000 records initially (from a table of 80,000), based on the single field EvidenceLetter, keyed to a user name, which 'owns' a particular letter or letters. Users then either scroll or do further filters and searches.
I've looked at all sorts of tips for speeding up queries, but none seem applicable here. I can't create an indexed view, because I use outer joins. I can't do much initial filtering, because I don't know in advance what they might want. I can't leave out any of the aux tables - all of them may have some pertinent info. Some are more likely to contain info than others, but all may have something, and users can filter or search by the looked-up values from the aux tables.
The inline functions are simply concatenations of fields, with a bit of formatting (parentheses, commas, that kind of stuff).
Are there any tricks I can play with the indexes in the aux tables to help? The primary keys are clustered, so I can't include columns. Would it speed things to have the primary key consist of both the ID and the text, so the query would get both fields with one seek? Would it make any sense to not have a primary key, but have a non-clustered, unique index on the key field, with the matching text value as an included column? Or does the primary key automatically do something like that already? Or should I leave the primary key and add another index of both ID and text? The aux tables are fairly small (several dozen to several thousand records) and not often updated, so any indexes that would improve read performance would be good and not that much of a drag for updates. Would indexes on the text fields of the aux tables help anything? The joins seem like the major slowdown, and they are done on the ID, not the text, although filters and searches address the looked-up text fields.
Can anyone please suggest things I could try, with a brief comment how and why they would help?
Pete
Here's the troublesome view, then table, view and function definitions. The execution plan for the problem view is included as a zip file, per Gail's instructions from the "How to Post Performance Problems" article - hope I got it right.
SELECT dbo.fnsFormatAkcesit(dbo.Akces.AkcesitPred
, dbo.Akces.Akcesit
, dbo.Akces.Rok) AS Akc
, dbo.fnsFormatDrEvid(dbo.Podrobnosti.EvidenceLetter
, dbo.Podrobnosti.EvidenceNumber
, dbo.Podrobnosti.EvidenceExtra) AS DrEvid
, dbo.fnsTaxonomie(dbo.TableOfGenuss.Genus
, dbo.TableOfSubGenuss.SubGenus
, dbo.TableOfSpeciess.Species
, dbo.TableOfSubSpeciess.SubSpecies
, dbo.Podrobnosti.Authors
, dbo.TableOfFamilys.Family
, dbo.TableOfOrders.[Order]
, dbo.TableOfGroups.[Group]) AS Taxonomie
, dbo.fnsStratigrafie(dbo.TableOfSystems.System
, dbo.TableOfSeriess.Series
, dbo.TableOfStages.Stage
, dbo.TableOfSubStages.SubStage
, dbo.TableOfLithographicUnits.LithographicUnit
, dbo.TableOfLithographicSubUnits.LithographicSubUnit
, dbo.TableOfZones.Zone) AS Stratigrafie
, dbo.TableOfCountrys.Cesky + N' - ' + dbo.TableOfCountrys.Country AS Kountry
, CASE WHEN Depozitar IS NULL THEN '' ELSE Depozitar END
+ CASE WHEN UlozisteDocasne IS NULL THEN '' ELSE ' {' + UlozisteDocasne + '}' END AS Lokace
, dbo.Podrobnosti.PodrobnostiAutoID, dbo.Podrobnosti.AkcesAutoID, dbo.Podrobnosti.EvidenceGroup, dbo.Podrobnosti.GroupAutoID
, dbo.Podrobnosti.OrderAutoID, dbo.Podrobnosti.FamilyAutoID, dbo.Podrobnosti.GenusAutoID, dbo.Podrobnosti.SubGenusAutoID
, dbo.Podrobnosti.SpeciesAutoID, dbo.Podrobnosti.SubSpeciesAutoID, dbo.Podrobnosti.SystemAutoID, dbo.Podrobnosti.SeriesAutoID
, dbo.Podrobnosti.StageAutoID, dbo.Podrobnosti.SubStageAutoID, dbo.Podrobnosti.LithographicUnitAutoID, dbo.Podrobnosti.LithographicSubUnitAutoID
, dbo.Podrobnosti.ZoneAutoID, dbo.Podrobnosti.CountryAutoID, dbo.Podrobnosti.DepozitarAutoID, dbo.Podrobnosti.UlozisteDocasne
, dbo.Podrobnosti.PDAutoID, dbo.Podrobnosti.ODAutoID, dbo.Podrobnosti.OriginAutoID, dbo.Podrobnosti.PocetKusu, dbo.Podrobnosti.OTHER_NO
, dbo.Podrobnosti.Inventarizace, dbo.Podrobnosti.Poznamka, dbo.Podrobnosti.Description, dbo.Podrobnosti.RockType, dbo.Podrobnosti.Preserv
, dbo.Podrobnosti.Original, dbo.Podrobnosti.Authors, dbo.Podrobnosti.Lokalita, dbo.Podrobnosti.IDNeDruheEvidence, dbo.Podrobnosti.EvidenceLetter
, dbo.Podrobnosti.EvidenceNumber, dbo.Podrobnosti.EvidenceExtra, dbo.Akces.Rok, dbo.Akces.AkcesitPred, dbo.Akces.Akcesit
FROM dbo.Podrobnosti INNER JOIN
dbo.Akces ON dbo.Podrobnosti.AkcesAutoID = dbo.Akces.AkcesAutoID LEFT OUTER JOIN
dbo.TableOfSystems ON dbo.Podrobnosti.SystemAutoID = dbo.TableOfSystems.SystemAutoID LEFT OUTER JOIN
dbo.TableOfDepozitars ON dbo.Podrobnosti.DepozitarAutoID = dbo.TableOfDepozitars.DepozitarAutoID LEFT OUTER JOIN
dbo.TableOfLithographicSubUnits ON
dbo.Podrobnosti.LithographicSubUnitAutoID = dbo.TableOfLithographicSubUnits.LithographicSubUnitAutoID LEFT OUTER JOIN
dbo.TableOfCountrys ON dbo.Podrobnosti.CountryAutoID = dbo.TableOfCountrys.CountryAutoID LEFT OUTER JOIN
dbo.TableOfLithographicUnits ON dbo.Podrobnosti.LithographicUnitAutoID = dbo.TableOfLithographicUnits.LithographicUnitAutoID LEFT OUTER JOIN
dbo.TableOfSeriess ON dbo.Podrobnosti.SeriesAutoID = dbo.TableOfSeriess.SeriesAutoID LEFT OUTER JOIN
dbo.TableOfStages ON dbo.Podrobnosti.StageAutoID = dbo.TableOfStages.StageAutoID LEFT OUTER JOIN
dbo.TableOfZones ON dbo.Podrobnosti.ZoneAutoID = dbo.TableOfZones.ZoneAutoID LEFT OUTER JOIN
dbo.TableOfSubStages ON dbo.Podrobnosti.SubStageAutoID = dbo.TableOfSubStages.SubStageAutoID LEFT OUTER JOIN
dbo.TableOfSubGenuss ON dbo.Podrobnosti.SubGenusAutoID = dbo.TableOfSubGenuss.SubGenusAutoID LEFT OUTER JOIN
dbo.TableOfSubSpeciess ON dbo.Podrobnosti.SubSpeciesAutoID = dbo.TableOfSubSpeciess.SubSpeciesAutoID LEFT OUTER JOIN
dbo.TableOfSpeciess ON dbo.Podrobnosti.SpeciesAutoID = dbo.TableOfSpeciess.SpeciesAutoID LEFT OUTER JOIN
dbo.TableOfOrders ON dbo.Podrobnosti.OrderAutoID = dbo.TableOfOrders.OrderAutoID LEFT OUTER JOIN
dbo.TableOfGenuss ON dbo.Podrobnosti.GenusAutoID = dbo.TableOfGenuss.GenusAutoID LEFT OUTER JOIN
dbo.TableOfGroups ON dbo.Podrobnosti.GroupAutoID = dbo.TableOfGroups.GroupAutoID LEFT OUTER JOIN
dbo.TableOfFamilys ON dbo.Podrobnosti.FamilyAutoID = dbo.TableOfFamilys.FamilyAutoID
USE [PaleoDataProPokusy]
GO
/****** Object: UserDefinedFunction [dbo].[fnsTaxonomie] Script Date: 07/07/2011 17:03:17 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[fnsTaxonomie] (@genus varchar(max), @subgenus varchar(max), @species varchar(max), @subspecies varchar(max), @authors varchar(max), @family varchar(max), @order varchar(max), @group varchar(max))
RETURNS Varchar(max)
WITH SCHEMABINDING
AS
BEGIN
-- Declare the return variable here
DECLARE @TxTmp Varchar(max)
-- Add the T-SQL statements to compute the return value here
SET @TxTmp = ltrim(rtrim(IsNull(@genus,'') + ' ' + IsNull(@subgenus + ' ','') + IsNull(@species,'')))
If @TxTmp <> '' SET @TxTmp = '<i>' + @TxTmp + '</i>'
SET @TxTmp = ltrim(rtrim(@TxTmp + ' ' + IsNull(@subspecies,'')))
If @authors <> ''
BEGIN If @TxTmp <> '' SET @TxTmp = @TxTmp + ' - ' + @authors ELSE SET @TxTmp = '{No taxon} - ' + @authors END
If @family <> ''
BEGIN If @TxTmp = '' SET @TxTmp = @family Else SET @TxTmp = @TxTmp + '; ' + @family END
If @order <> ''
BEGIN If @TxTmp = '' SET @TxTmp = @order Else SET @TxTmp = @TxTmp + '; ' + @order END
If @group <> ''
BEGIN If @TxTmp = '' SET @TxTmp = @group Else SET @TxTmp = @TxTmp + '; ' + @group END
Return @TxTmp
END
GO
/****** Object: UserDefinedFunction [dbo].[fnsStratigrafie] Script Date: 07/07/2011 17:03:17 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:<Author,,Name>
-- Create date: <Create Date, ,>
-- Description:<Description, ,>
-- =============================================
CREATE FUNCTION [dbo].[fnsStratigrafie](@system varchar(max), @series varchar(max), @stage varchar(max), @substage varchar(max), @LithographicUnit varchar(max), @LithographicSubUnit varchar(max), @zone varchar(max))
RETURNS varchar(max)
WITH SCHEMABINDING
AS
BEGIN
-- Declare the return variable here
DECLARE @StTmp Varchar(max)
If @system <> ''
SET @StTmp = @system
If @series <> ''
BEGIN If @StTmp <> '' SET @StTmp = @StTmp + '; ' + @series ELSE SET @StTmp = @series END
If @stage <> ''
BEGIN If @StTmp <> '' SET @StTmp = @StTmp + '; ' + @stage ELSE SET @StTmp = @stage END
If @substage <> ''
BEGIN If @StTmp <> '' SET @StTmp = @StTmp + '; ' + @substage ELSE SET @StTmp = @substage END
If @LithographicUnit <> ''
BEGIN If @StTmp <> '' SET @StTmp = @StTmp + '; ' + @LithographicUnit ELSE SET @StTmp = @LithographicUnit END
If @LithographicSubUnit <> ''
BEGIN If @StTmp <> '' SET @StTmp = @StTmp + '; ' + @LithographicSubUnit ELSE SET @StTmp = @LithographicSubUnit END
If @zone <> ''
BEGIN If @StTmp <> '' SET @StTmp = @StTmp + '; ' + @zone ELSE SET @StTmp = @zone END
Return @StTmp
END
GO
/****** Object: UserDefinedFunction [dbo].[fnsFormatDrEvid] Script Date: 07/07/2011 17:03:17 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[fnsFormatDrEvid] (@EL varchar(2), @EN int, @EE varchar(2))
RETURNS Varchar(15)
WITH SCHEMABINDING
AS
BEGIN
RETURN
(SELECT
CASE
WHEN (@EL IS NULL)
THEN null
ELSE @EL + ' ' + CONVERT(varchar(MAX), @EN) + CASE
WHEN (@EE <> '')
THEN ' (' + @EE + ')'
ELSE ''
END
END)
END
GO
/****** Object: UserDefinedFunction [dbo].[fnsFormatAkcesit] Script Date: 07/07/2011 17:03:17 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:<Author,,Name>
-- Create date: <Create Date, ,>
-- Description:<Description, ,>
-- =============================================
CREATE FUNCTION [dbo].[fnsFormatAkcesit]
(
-- Add the parameters for the function here
@AkcesPred as varchar(2), @Akces as int, @Rok as int
)
RETURNS varchar(max)
WITH SCHEMABINDING
AS
BEGIN
-- Declare the return variable here
-- Add the T-SQL statements to compute the return value here
-- Return the result of the function
RETURN (
SELECT IsNull(CASE WHEN (@AkcesPred = 'Br') THEN @AkcesPred + '-' WHEN (@AkcesPred = 'a') THEN @AkcesPred + '/' END, '') + CONVERT(varchar(MAX), @Akces)
+ '/' + CONVERT(varchar(MAX), @Rok) )
END
GO
/****** Object: Table [dbo].[EvidenceLetters] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[EvidenceLetters](
[EvidenceLetter] [nvarchar](2) NOT NULL,
[Contents] [varchar](max) NOT NULL,
[ResponsiblePersons] [varchar](max) NOT NULL,
[Comments] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_EvidenceLetters] PRIMARY KEY CLUSTERED
(
[EvidenceLetter] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Akces] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Akces](
[AkcesAutoID] [int] IDENTITY(1,1) NOT NULL,
[Rok] [smallint] NOT NULL,
[AkcesitPred] [varchar](5) NOT NULL,
[Akcesit] [int] NOT NULL,
[KusuVKatalogu] [int] NULL,
[Petrol] [bit] NULL,
[BezPoctuKusu] [bit] NULL,
[Jine] [bit] NULL,
[UzOdepsane] [bit] NULL,
[AcRozepsanoVDatabaziMK] [bit] NULL,
[DatumVytvoreni] [datetime] NULL,
[Poznamka] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_Akces] PRIMARY KEY NONCLUSTERED
(
[AkcesAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [idxAkcesCely] UNIQUE NONCLUSTERED
(
[Rok] ASC,
[AkcesitPred] ASC,
[Akcesit] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfZones] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfZones](
[ZoneAutoID] [int] IDENTITY(1,1) NOT NULL,
[Zone] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfZones_1] PRIMARY KEY CLUSTERED
(
[ZoneAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfSystems] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfSystems](
[SystemAutoID] [int] IDENTITY(1,1) NOT NULL,
[System] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfSystems] PRIMARY KEY CLUSTERED
(
[SystemAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfSubStages] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfSubStages](
[SubStageAutoID] [int] IDENTITY(1,1) NOT NULL,
[SubStage] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfSubStages] PRIMARY KEY CLUSTERED
(
[SubStageAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfSubSpeciess] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfSubSpeciess](
[SubSpeciesAutoID] [int] IDENTITY(1,1) NOT NULL,
[SubSpecies] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfSubSpeciess] PRIMARY KEY CLUSTERED
(
[SubSpeciesAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfSubGenuss] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfSubGenuss](
[SubGenusAutoID] [int] IDENTITY(1,1) NOT NULL,
[SubGenus] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfSubGenuss] PRIMARY KEY CLUSTERED
(
[SubGenusAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfStages] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfStages](
[StageAutoID] [int] IDENTITY(1,1) NOT NULL,
[Stage] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfStages] PRIMARY KEY CLUSTERED
(
[StageAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [Stage_Idx] UNIQUE NONCLUSTERED
(
[Stage] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfSpeciess] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfSpeciess](
[SpeciesAutoID] [int] IDENTITY(1,1) NOT NULL,
[Species] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfSpeciess] PRIMARY KEY CLUSTERED
(
[SpeciesAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [Species_Idx] UNIQUE NONCLUSTERED
(
[Species] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfSeriess] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfSeriess](
[SeriesAutoID] [int] IDENTITY(1,1) NOT NULL,
[Series] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfSeriess] PRIMARY KEY CLUSTERED
(
[SeriesAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfPDs] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfPDs](
[PDAutoID] [int] IDENTITY(1,1) NOT NULL,
[PD] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfPDs] PRIMARY KEY CLUSTERED
(
[PDAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfOrigins] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfOrigins](
[OriginAutoID] [int] IDENTITY(1,1) NOT NULL,
[Origin] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfOrigins] PRIMARY KEY CLUSTERED
(
[OriginAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfOrders] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfOrders](
[OrderAutoID] [int] IDENTITY(1,1) NOT NULL,
[Order] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfOrders] PRIMARY KEY CLUSTERED
(
[OrderAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfODs] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfODs](
[ODAutoID] [int] IDENTITY(1,1) NOT NULL,
[OD] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfODs] PRIMARY KEY CLUSTERED
(
[ODAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfLithographicUnits] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfLithographicUnits](
[LithographicUnitAutoID] [int] IDENTITY(1,1) NOT NULL,
[LithographicUnit] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfLithographicUnits] PRIMARY KEY CLUSTERED
(
[LithographicUnitAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfLithographicSubUnits] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfLithographicSubUnits](
[LithographicSubUnitAutoID] [int] IDENTITY(1,1) NOT NULL,
[LithographicSubUnit] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfLithographicSubUnits] PRIMARY KEY CLUSTERED
(
[LithographicSubUnitAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfGroups] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfGroups](
[GroupAutoID] [int] IDENTITY(1,1) NOT NULL,
[Group] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfGroups] PRIMARY KEY CLUSTERED
(
[GroupAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfGenuss] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfGenuss](
[GenusAutoID] [int] IDENTITY(1,1) NOT NULL,
[Genus] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfGenuss] PRIMARY KEY CLUSTERED
(
[GenusAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfFamilys] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfFamilys](
[FamilyAutoID] [int] IDENTITY(1,1) NOT NULL,
[Family] [nvarchar](255) NOT NULL,
[AltTexts] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_TableOfFamilys] PRIMARY KEY CLUSTERED
(
[FamilyAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfDepozitars] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[TableOfDepozitars](
[DepozitarAutoID] [int] IDENTITY(1,1) NOT NULL,
[Depozitar] [char](13) NOT NULL,
CONSTRAINT [PK_TableOfDepozitars] PRIMARY KEY CLUSTERED
(
[DepozitarAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[TableOfCountrys] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[TableOfCountrys](
[CountryAutoID] [int] IDENTITY(1,1) NOT NULL,
[Country] [nvarchar](255) NOT NULL,
[Cesky] [nvarchar](255) NOT NULL,
CONSTRAINT [PK_TableOfCountrys] PRIMARY KEY CLUSTERED
(
[CountryAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Table [dbo].[Podrobnosti] Script Date: 07/07/2011 17:03:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Podrobnosti](
[PodrobnostiAutoID] [int] IDENTITY(1,1) NOT NULL,
[AkcesAutoID] [int] NOT NULL,
[EvidenceLetter] [nvarchar](2) NULL,
[EvidenceNumber] [int] NULL,
[EvidenceExtra] [varchar](2) NULL,
[EvidenceGroup] [int] NULL,
[GroupAutoID] [int] NULL,
[OrderAutoID] [int] NULL,
[FamilyAutoID] [int] NULL,
[GenusAutoID] [int] NULL,
[SubGenusAutoID] [int] NULL,
[SpeciesAutoID] [int] NULL,
[SubSpeciesAutoID] [int] NULL,
[SystemAutoID] [int] NULL,
[SeriesAutoID] [int] NULL,
[StageAutoID] [int] NULL,
[SubStageAutoID] [int] NULL,
[LithographicUnitAutoID] [int] NULL,
[LithographicSubUnitAutoID] [int] NULL,
[ZoneAutoID] [int] NULL,
[CountryAutoID] [int] NULL,
[DepozitarAutoID] [int] NULL,
[PDAutoID] [int] NULL,
[ODAutoID] [int] NULL,
[OriginAutoID] [int] NULL,
[PocetKusu] [int] NULL,
[OTHER_NO] [varchar](max) NULL,
[Inventarizace] [varchar](max) NULL,
[Poznamka] [varchar](max) NULL,
[Description] [varchar](max) NULL,
[RockType] [varchar](max) NULL,
[Preserv] [varchar](max) NULL,
[UlozisteDocasne] [varchar](max) NULL,
[Original] [varchar](max) NULL,
[Authors] [varchar](max) NULL,
[Lokalita] [varchar](max) NULL,
[IDNeDruheEvidence] [varchar](max) NULL,
[ts] [timestamp] NOT NULL,
CONSTRAINT [PK_Podrobnosti] PRIMARY KEY NONCLUSTERED
(
[PodrobnostiAutoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
July 7, 2011 at 1:30 pm
My first instinct would be to create a nonclustered index on EvidenceLetter in Podrobnosti, and then populate the form using a stored procedure that takes EvidenceLetter as input and performs the query above using it (instead of having a view at all). It looks like the main table scan is a bigger bottleneck than the outer joins.
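As a rough sketch of that suggestion (the index and procedure names here are invented, and the column list is trimmed to a handful of the view's columns; only the table and column names come from the posted DDL):
CREATE NONCLUSTERED INDEX IX_Podrobnosti_EvidenceLetter
    ON dbo.Podrobnosti (EvidenceLetter);
GO
CREATE PROCEDURE dbo.uspGetPodrobnostiByLetter
    @EvidenceLetter nvarchar(2)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT p.PodrobnostiAutoID, p.EvidenceLetter, p.EvidenceNumber, g.[Group]
    FROM dbo.Podrobnosti AS p
    LEFT OUTER JOIN dbo.TableOfGroups AS g ON p.GroupAutoID = g.GroupAutoID
    -- ...remaining lookup joins as in the view...
    WHERE p.EvidenceLetter = @EvidenceLetter;  -- seek on the new index
END
GO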
By the way, are you sure that all the speed problems are on the database side, and nothing to do with loading 20,000 rows into a form?
July 7, 2011 at 1:51 pm
I'd start at the very beginning:
The massive use of VARCHAR(MAX) seems to be oversized from my point of view. Not only within the functions, but also within the table structure:
Example: TABLE [dbo].[Podrobnosti]
[OTHER_NO] [VARCHAR](MAX), [RockType] [VARCHAR](MAX) ??
What is the content of [Authors] [VARCHAR](MAX)? Hopefully, not a comma separated list of authors... If so, the table structure would probably benefit from normalization.
July 7, 2011 at 5:17 pm
Nevyn (7/7/2011)
My first instinct would be to create a nonclustered index on EvidenceLetter in Podrobnosti, and then populate the form using a stored procedure that takes EvidenceLetter as input and performs the query above using it (instead of having a view at all). It looks like the main table scan is a bigger bottleneck than the outer joins.
My apologies - the posted table definitions don't seem to be complete. I do have such an index, no idea why the script generator didn't include it. I just tried it again, for only the Podrobnosti table, and the results are the same - it shows the field layout, the primary key and field constraints, but not the indexes. The fields in this index are: EvidenceLetter (ASC), EvidenceNumber (DESC), EvidenceExtra (ASC), EvidenceGroup (ASC), which also guards against duplicate instances of evidence letter, number and extra combinations.
By the way, are you sure that all the speed problems are on the database side, and nothing to do with loading 20,000 rows into a form?
Pretty sure - as I wrote, the view is slow (sometimes, not consistently) even when I am logged directly onto the server via Remote Desktop and execute the view in SSMSE. Also, what I've read on network database traffic says that Access (my front-end) only requests the first few rows when populating a form, precisely to avoid such delays, and loads more in the background at its leisure. It seems to be the case, because when I scroll in this, after the first visible rows appear and the form begins reacting to input again, there is sometimes a small pause at intervals, presumably as more data is brought across the network.
July 7, 2011 at 5:37 pm
LutzM (7/7/2011)
I'd start at the very beginning:The massive use of VARCHAR(MAX) seems to be oversized from my point of view. Not only within the functions, but also within the table structure:
Example: TABLE [dbo].[Podrobnosti]
[OTHER_NO] [VARCHAR](MAX), [RockType] [VARCHAR](MAX) ??
I suppose it is, but I have no real way of knowing how many characters there will be in these fields. Not all that many, usually, but where do I draw the line? 20? 50? 100? Inevitably, someone will want 101 or 102. Putting a firm number seems to me to indicate some real, defined limit. The (MAX) is the same as the use of 4000 or 8000 in previous versions of SQL Server - a non-limited text field. And how do I limit it in my functions? I've tried limits in some cases and TSQL simply chops off any excess, with no warning - not exactly the behavior I'd like. And if it did issue warnings, I'd have to add special error handling to every instance of using the function.
I'm not trying to be argumentative here, but I don't see how (MAX) is causing me any problems, nor do I see an easy path to dealing with the problems that would be caused by setting length limits. Could you maybe elaborate a bit?
What is the content of [Authors] [VARCHAR](MAX)? Hopefully, not a comma separarated list of authors... If so, the table structure would probably benefit from normalization.
It is indeed such a list, but the individual author names are not often used as filter or search criteria. And a normalized approach would cause additional problems. Besides adding yet another table to this circus, the authors field is part of the taxonomic description, and the order of names is important, as is the fact that they are sometimes in quotes, parentheses or with comments. Essentially, it's a simple text field containing several words. Maybe think of it like the name of a law firm - "Shylock, Shyster, Legree and Associates". It has to be exactly that way, or someone will complain, regardless of the fact that Legree would be first in alphabetical order.
July 8, 2011 at 10:52 am
but where do I draw the line? 20? 50? 100? Inevitably, someone will want 101 or 102. Putting a firm number seems to me to indicate some real, defined limit.
You'll need to analyze each and every column to define the size it might expand to. Let's have a look at the [Authors] column: assuming a name length of 30 char on average, do you really expect more than 260 author names listed in one row in that column? If not, why not use VARCHAR(8000)?
One of the most important items to cover in database design is to define appropriate data types. VARCHAR(MAX) as an "excuse" for "I don't know" is just plain wrong from my point of view.
The (MAX) is the same as the use of 4000 or 8000 in previous versions of SQL Server - a non-limited text field.
That statement is not true.
Neither is (MAX) the same as 4000 or 8000 in previous versions, nor is it unlimited. VARCHAR(8000) and NVARCHAR(4000) are still there and are different from the (MAX) option. And the limit of VARCHAR(MAX) is around 2 gigabytes.
Advice straight from BOL:
Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes.
What do you think about the following statement:
I'm not sure if I need 100, 102 or 200 bytes for that column. Just to be safe, let's set the limit at 2,147,483,647 bytes.
Also keep in mind, you're not "safe" from needing error handling:
Since you're performing string concatenation in your functions, you still need to add the error handling - a concatenation of two varchar(max) values of 2 GB each will also not return what you might expect ;-)
July 8, 2011 at 6:31 pm
LutzM (7/8/2011)
but where do I draw the line? 20? 50? 100? Inevitably, someone will want 101 or 102. Putting a firm number seems to me to indicate some real, defined limit.
You'll need to analyze each and every column to define the size it might expand to. Let's have a look at the [Authors] column: assuming a name length of 30 char on average, do you really expect more than 260 author names listed in one row in that column? If not, why not use VARCHAR(8000)?
I scanned the existing data and found the longest string in this field so far - 59 characters. With 80,000 entries, I probably have a reasonably sized statistical sample, but I really have no idea what could come in the future. The larger the field, the less likely it is that something won't fit, but I can't just give a firm number that will always be big enough, unless I go to ridiculous extremes. Is VARCHAR(8000) really conceptually that different from VARCHAR(MAX)? More importantly, is it somehow BETTER? In both cases it's far more than anyone will ever type in by hand. I'm having trouble understanding how one ridiculously large number has significant advantages over another ridiculously large number.
If I do base it on the largest so far, what would you suggest as a safe reserve? +50%? 2x? 3x? 10x? It's a list of names of people who had a hand in determining the species. As a practical matter, it's generally one or two names, occasionally up to six. But it could theoretically be an entire team of people.
And there are other fields that are defined as MAX, for instance a comment field. That could be only a few words, but it could easily be an entire copy-and-pasted article. Should I let my users copy in a 4,000 word article, but not a 5,000 word article? Why?
One of the most important items to cover in database design is to define appropriate data types. VARCHAR(MAX) as an "excuse" for "I don't know" is just plain wrong from my point of view.
I do have my types clearly defined where I am certain of the content, and I agree, this is somewhat sloppy. However, I don't see where it's causing me any real problems, and we've wandered considerably afield from my original question. Unless you think MAX has some significant bearing on performance, could we stick to the problem I described in my original post?
The (MAX) is the same as the use of 4000 or 8000 in previous versions of SQL Server - a non-limited text field.
That statement is not true.
Neither is (MAX) the same as 4000 or 8000 in previous versions, nor is it unlimited. VARCHAR(8000) and NVARCHAR(4000) are still there and are different from the (MAX) option. And the limit of VARCHAR(MAX) is around 2 gigabytes.
I didn't mean they were equivalent, just that they were used for similar situations, before (MAX) became available - as a substitute for the 'I don't know' which you condemn.
An advice straight from BOL:
Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes.
Well, BOL is not to be dismissed lightly, but I notice they don't say 'Use varchar(max) ONLY when ...'. They say that in such a case, (MAX) is appropriate, but they don't say (or at least you don't quote them saying) that it should NOT be used in other situations.
What do you think about the following statement:
I'm not sure if I need 100, 102 or 200 bytes for that column. Just to be safe, let's set the limit at 2,147,483,647 bytes.
When you put it that way, in hard numbers, of course it sounds silly, but so does 8,000. Since this is manually typed input, nobody is going to come anywhere close to such limits, which is the whole point - certainty of enough room for any anticipated input.
Also keep in mind, you're not "safe" from needing error handling:
Since you're performing string concatenation in your functions, you still need to add the error handling - a concatenation of two varchar(max) values of 2 GB each will also not return what you might expect ;-)
Again, this is manual input. Even with every field submitted to my concat functions containing an unusually large amount of text, nobody is coming anywhere near 2GB.
July 9, 2011 at 4:32 am
...could we stick to the problem I described in my original post? ...
We still do.
The issue with VARCHAR(MAX): it's a Large Object data type, which is special in that it won't always fit into a single page (a page is approximately 8,000 bytes), whereas a VARCHAR(8000) or NVARCHAR(4000) will. Therefore, a pointer is stored instead of the original data if the value exceeds 8,000 bytes (side note: depending on the setting of "large value types out of row").
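As an aside, and only to illustrate that side note (the table name is taken from the posted DDL; whether you would actually want to change the setting is a separate question), the option is set per table like this:
EXEC sp_tableoption 'dbo.Podrobnosti', 'large value types out of row', 1;
-- 1 pushes MAX values off-row even when they would fit in the row;
-- 0 (the default) keeps values up to roughly 8,000 bytes in-row.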
There are several disadvantages to VARCHAR(MAX). The main reason I try to avoid it is that you cannot create an index on a VARCHAR(MAX) column (even though you can add it as an INCLUDED column). If you ever plan to use one of the VARCHAR(MAX) columns either in a JOIN predicate or in a WHERE clause, you're likely to end up with a table scan and usually can't do anything about it, since you can't create an index on that VARCHAR(MAX) column. The result: bad performance.
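To illustrate with columns from the posted Podrobnosti table (the index names are invented):
-- Not allowed: a varchar(max) column cannot be an index key column.
-- CREATE INDEX IX_Podrobnosti_Authors ON dbo.Podrobnosti (Authors);

-- Allowed: the max column may only ride along in the INCLUDE list.
CREATE INDEX IX_Podrobnosti_GenusAutoID
    ON dbo.Podrobnosti (GenusAutoID)
    INCLUDE (Authors);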
There are some other performance-related issues as well as front-end-related aspects: e.g. how is your application designed to display 2 GB of data in several "fields" ("field" in terms of the frontend, not the source table)? Not to mention the disk space required... I counted 32 VARCHAR(MAX) columns. One row in each table filled to the max data size would require 64 GB of disk space (+ index space). I hope you planned for it...
If you try to develop robust software, you should use a test scenario where all columns are filled with values of the max allowed size and check how your application deals with it... With VARCHAR(MAX) everywhere you'll be surprised, I guess ;-)
You might think a user won't enter that much data manually. But since there's copy and paste, it doesn't take much effort to insert several thousand bytes of data. Just imagine a user wants to copy a rather long article into the appropriate field of the frontend app and (just accidentally) hits "paste" in several other fields that will accept VARCHAR(MAX), too. Do you think that's impossible or even unlikely?
Regarding the articles you're planning to store in the db: How will you maintain the original format (text format, page numbers, maybe even graphics)? It might be a better idea to store the article as a pdf and just store the folder and file name in the db. It depends on the business case though.
And finally, coming back to the original performance issue:
In order to improve the performance of the query, we'll need covering and narrow indexes. A covering index includes all columns of a specific table used in a query. The columns used in a JOIN predicate or WHERE condition have to be part of the key columns that actually define the index, whereas additional columns used in the SELECT part of a query can be moved into the INCLUDE() section of the index.
The reason to use the INCLUDE section is to make the index more narrow, since those values will only be stored at the leaf level of the index. A narrow index can be searched faster than a wide index.
Looking at the columns needed to create covering indexes, we'll end up with a lot of VARCHAR(MAX) columns in the INCLUDE section of the index. That's "slightly" different than being "narrow". ;-)
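For the EvidenceLetter filter described earlier, a covering index on the main table might look something like this - a sketch only; the index name is invented and the INCLUDE list is shortened, while a truly covering one would have to carry every column the query touches:
CREATE NONCLUSTERED INDEX IX_Podrobnosti_EvidenceLetter_Covering
    ON dbo.Podrobnosti (EvidenceLetter)                       -- WHERE column as the key
    INCLUDE (AkcesAutoID, GenusAutoID, SpeciesAutoID,
             CountryAutoID, EvidenceNumber, EvidenceExtra);   -- SELECT/JOIN columns ride along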
And the conclusion brings me back to my first reply: start at the very beginning and fix the data type to match the business needs and the capability of the frontend app. Then start tuning the query by adding appropriate indexes.
If you're looking for more information regarding VARCHAR(MAX), I recommend a Google search for "sql server varchar max disadvantage"...
July 9, 2011 at 1:12 pm
LutzM (7/9/2011)
Ok, first off, thanks for sticking around. I've run into people in forums and newsgroups who get offended when I don't accept their blank statements as gospel. They make some derogatory comment and split, so I appreciate you taking the time to go through this.
The issue with VARCHAR(MAX): it's a Large Object data type, which is special in that it won't always fit into a single page (a page is approximately 8,000 bytes), whereas a VARCHAR(8000) or NVARCHAR(4000) will. Therefore, a pointer is stored instead of the original data if the value exceeds 8,000 bytes (side note: depending on the setting of "large value types out of row").
Only if it actually exceeds that, no? As long as the data is less than 8K, it will work like any other varchar data, according to what I've been reading.
There are several disadvantages to VARCHAR(MAX). The main reason I try to avoid it is that you cannot create an index on a VARCHAR(MAX) column (even though you can add it as an INCLUDED column).
All right, that's a serious issue, at least for some of these fields. Not in the query I'm currently tweaking, that's all IDs, but it will be in others, and is certainly a legitimate reason to avoid MAX.
There are some other performance-related issues as well as front-end-related aspects: e.g. how is your application designed to display 2 GB of data in several "fields" ("field" in terms of the frontend, not the source table)? Not to mention the disk space required... I counted 32 VARCHAR(MAX) columns. One row in each table filled to the max data size would require 64 GB of disk space (+ index space). I hope you planned for it...
Not at all. Once again, this is all manual input. Even with copy-paste, nobody is going to put anywhere close to that much stuff into anything, not even comment fields. (Is it even possible? How much can the clipboard hold?) My use of MAX was not intended to leave room for that much stuff, it was the lazy equivalent of 'I don't know', like taking the biggest truck in the lot to be on the safe side, because I don't know just how much stuff I'm going to be hauling.
If you try to develop robust software, you should use a test scenario where all columns are filled with values of the max allowed size and check how your application deals with it... With VARCHAR(MAX) everywhere you'll be surprised, I guess ;-)
I doubt it - I'm pretty near certain that lots of things would crash long before I got near the maximum. Actually, I'd be surprised if they didn't.
You might think a user won't enter that much data manually. But since there's copy and paste, it doesn't take much effort to insert several thousand bytes of data. Just imagine a user wants to copy a rather long article into the appropriate field of the frontend app and (just accidentally) hits "paste" in several other fields that will accept VARCHAR(MAX), too. Do you think that's impossible or even unlikely?
Unlikely, yes, but certainly not impossible, you're right there.
Regarding the articles you're planning to store in the db: How will you maintain the original format (text format, page numbers, maybe even graphics)? It might be a better idea to store the article as a pdf and just store the folder and file name in the db. It depends on the business case though.
No formatting of any sort - plain text. The copy-paste of articles was even a bit of a stretch. Mostly the comment fields are empty, sometimes a few words or sentences of information that don't fit into any of the other, strictly defined fields. The article was meant as an extreme case, that someone -might- want to paste some text from a professional journal or website into the comment field. But more likely they would save the entire article as an attachment to the record. (I have a separate mechanism in the application for that, not visible in the specs I posted, dealing with shared folders and filenames based on record ID number, precisely to keep potentially huge attachments out of the database.)
In order to improve the performance of the query, we'll need covering and narrow indexes. A covering index includes all columns of a specific table used in a query. The columns used in a JOIN predicate or WHERE condition have to be part of the key columns that actually define the index, whereas additional columns used in the SELECT part of a query can be moved into the INCLUDE() section of the index.
I've gone back and dropped the sizes of these fields to varchar(255). That's a magic number for Access, which I'm using as a front-end, and is what I generally use as the Access version of 'I don't know' in lookup tables, which I'll grant you is probably more reasonable than MAX. (I'm fairly new to SQL Server, in case it isn't painfully obvious.) Access has a similar restriction, where it allows longer fields (memo type) but has restrictions on their use in indexes and sorts. Up to 255 is a text field, which can do anything.
But my first try is not a resounding success - after changing the field sizes, I ran the query again. What used to take around a minute, worst case, suddenly took almost four. Is there something like a garbage collection that I should run after such a major change? I can wipe and repopulate the entire database quite easily - there is a production version running, with a stored procedure to empty the test DB and reload it from the live one, when my tests trash some data. I'm making all these changes in a test version (I've got that much sense, at least). Should I empty the test DB and do something like clear statistics or indexes before reloading the data?
The reason to use the INCLUDE section is to make the index more narrow, since those values will only be stored at the leaf level of the index. A narrow index can be searched faster than a wide index.
For this query, would it be reasonable to build an index with all 20 foreign keys? And add the associated text fields from each lookup table as INCLUDED columns? If so, does the order make any difference (most likely or least likely to have a matching record)? Or should it be in the same order as the join clauses? Does it matter at all?
I now have individual indexes for the ID fields of each lookup table, but they don't seem to be all that helpful. Four minutes for SQL Server to select a measly 80,000 records doesn't seem right.
Hm, scratch that, I just tried, 16 fields is the max for an index. And I can't include columns from the lookup tables in a table index, and I can't create an indexed view with these columns because I'm using Outer Joins. Seems like I'm back where I started.
And the conclusion brings me back to my first reply: start at the very beginning and fix the data type to match the business needs and the capability of the frontend app. Then start tuning the query by adding appropriate indexes. If you're looking for more information regarding VARCHAR(MAX), I recommend a Google search for "sql server varchar max disadvantage"...
Well, I'm trying. I did look that up, as you suggest - lots of very passionate opinions on the subject - sometimes it's hard separating the heat from the light.
July 9, 2011 at 2:27 pm
pdanes2 (7/9/2011)
Depends. It's far from that simple.
Regardless (and I'm going to be very blunt here), the frequent use of the MAX data types is an indication of poor database design. For normal data you can make a sensible decision regarding length, with input from the users of the application. If you've got a name field, 100 characters is a reasonable length to set (who has a first name or surname 100 letters long?). For addresses you can take a look at the postal rules or make a reasonable guess (my entire address is 68 characters, including building name, street, suburb, city, province, country and postal code).
Getting these kinds of details is tedious and time consuming and probably one of the harder parts of designing a database, but it should be considered.
There are other downsides to MAX data types too, regardless of whether the data is in-row or not (online index rebuilds being the immediate one that comes to mind)
What used to take around a minute, worst case, suddenly took almost four. Is there something like a garbage collection that I should run after such a major change?
No, but an index rebuild is probably a good idea.
For this query, would it be reasonable to build an index with all 20 foreign keys? And add the associated text fields from each lookup table as INCLUDED columns?
No. Typically you'd start with an index per foreign key, then the lookup table should have a clustered index on the key (if there's a primary key it's clustered by default)
That's just to start though. For indexing queries one typically starts with the columns in the where clause. Order of the key makes a major difference, order of the includes does not.
That said, if you're returning every single row from every single table, indexes are not going to help much. They're good when filtering data, but your query appears not to be doing that. If you are, please post the query that filters that view.
p.s. What's with the timestamp columns?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 9, 2011 at 2:45 pm
As you noticed, SQLServerCentral (or SSC) is slightly different than most of the other db forums. It's more a community, not just a forum. Therefore, we prefer to clarify an issue instead of getting offended for such a minor reason (not that it won't happen from time to time...).
Did you run the query from SSMS (SQL Server Management Studio) or via ACCESS?
Do the functions reflect the changed data types (including the target data type to be varchar(510) if you concatenate two varchar(255) columns) or are those still varchar(max)?
The best would be if you could post the actual execution plan of the query (use Ctrl+M to activate the actual execution plan), run the query in Management Studio, save the execution plan (right-click on the plan and save as a .sqlplan file) and post it as an attachment. This will help us find the cause of the long-running query as well as a solution for it.
Regarding the index definition: each table needs its own index def with the related columns of that specific table.
Most probably the performance can be improved significantly. But the number of returned rows is not an indicator of whether a query should be fast or not: returning just one row can take a significant amount of time if several tables with billions of rows and aggregation are involved. ;-)
July 9, 2011 at 3:48 pm
GilaMonster (7/9/2011)
Regardless (and I'm going to be very blunt here), the frequent use of the MAX data types is an indication of poor database design.
No stress, I'm not sensitive. Senseless abuse I can do without, but useful advice is always welcome, no matter how brusque. If I'm doing something dumb, I'd rather hear about it and learn than have someone worry about bruising my ego.
Getting these kinds of details is tedious and time consuming and probably one of the harder parts of designing a database, but it should be considered.
There are other downsides to MAX data types too, regardless of whether the data is in-row or not (online index rebuilds being the immediate one that comes to mind)
I agree, it was lazy, but I didn't realize that it could also cause real problems.
How do YOU deal with possible overflows of text input? Same logic in the front-end as in the database? Wait for the DB to throw an error?
What used to take around a minute, worst case, suddenly took almost four. Is there something like a garbage collection that I should run after such a major change?
No, but an index rebuild is probably a good idea.
No help - I just rebuilt all the indexes and I'm still at 3:42, almost four times what it took before I reduced my MAX definitions to 255. No other changes in the design.
For this query, would it be reasonable to build an index with all 20 foreign keys? And add the associated text fields from each lookup table as INCLUDED columns?
No. Typically you'd start with an index per foreign key, then the lookup table should have a clustered index on the key (if there's a primary key it's clustered by default)
I do have all those, but since I'm not filtering by them, it doesn't seem to help much. I thought it should help for the joins, but it doesn't seem to.
When I limit the results, say TOP 100, the result is almost instantaneous, but then I get a random set of 100. When I add sorting criteria to control which 100, it slows way down again.
I may just have to tell the users that as long as they want to see all this info AND scroll around in the results, they're simply going to have to accept delays. I'd like to at least get it back to where it was, though. A minute is a nuisance, but tolerable (in this app). Four minutes is not.
That's just to start though. For indexing queries one typically starts with the columns in the where clause. Order of the key makes a major difference, order of the includes does not.
But that's one of the issues here - I'm using outer joins and so can't index the query. Or can I? SSMSE says I can't.
That said, if you're returning every single row from every single table, indexes are not going to help much. They're good when filtering data, but your query appears not to be doing that. If you are, please post the query that filters that view.
The view is not doing any filtering, selects from the view do. I don't expose any tables to the front-end; everything is accessed via views or stored procedures. Initially, I start with either '... WHERE EvidenceLetter IN (...)' or '... WHERE AkcesAutoID = n', depending on which of two display modes the user wants, detail only or detail within group. In both modes the grouping varies wildly, ranging from 0 to tens of thousands.
p.s. What's with the timestamp columns?
I've read in several design books and tutorials that it's a good field to include, that SQL Server automatically uses it (if present) to determine whether a record was changed by someone else while it was being edited, that otherwise it has to compare the entire record. Is that not the case?
July 9, 2011 at 4:09 pm
pdanes2 (7/9/2011)
How do YOU deal with possible overflows of text input? Same logic in the front-end as in the database? Wait for the DB to throw an error?
Validation in the application. Max lengths defined in the application. Fail fast always.
That's just to start though. For indexing queries one typically starts with the columns in the where clause. Order of the key makes a major difference, order of the includes does not.
But that's one of the issues here - I'm using outer joins and so can't index the query. Or can I? SSMSE says I can't.
I wasn't talking about indexing the view.
That said, if you're returning every single row from every single table, indexes are not going to help much. They're good when filtering data, but your query appears not to be doing that. If you are, please post the query that filters that view.
The view is not doing any filtering, selects from the view do. I don't expose any tables to the front-end; everything is accessed via views or stored procedures. Initially, I start with either '... WHERE EvidenceLetter IN (...)' or '... WHERE AkcesAutoID = n', depending on which of two display modes the user wants, detail only or detail within group. In both modes the grouping varies wildly, ranging from 0 to tens of thousands.
Then the tuning efforts need to be focused on the queries, not the view definition. Trying to tune a select * from view with no where clause when all the queries have filters is almost as pointless as trying to tune select * from table when all the queries have filters.
Test the common queries. Find the ones that are unacceptably slow. Tune them.
p.s. What's with the timestamp columns?
I've read in several design books and tutorials that it's a good field to include, that SQL Server automatically uses it (if present) to determine whether a record was changed by someone else while it was being edited, that otherwise it has to compare the entire record. Is that not the case?
No.
Timestamp is great if the front end app wants to tell if the record has changed since it was read. The SQL engine does not need an additional, manually added column to do that. It's purely for ease of front-end application development.
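In other words, the pattern would be something like the following, driven entirely by the front end (the parameter names and the choice of the Poznamka column are purely for illustration):
-- @PodrobnostiAutoID, @NewPoznamka and @OriginalTs come from the form,
-- @OriginalTs being the ts value read when the record was loaded.
UPDATE dbo.Podrobnosti
SET    Poznamka = @NewPoznamka
WHERE  PodrobnostiAutoID = @PodrobnostiAutoID
  AND  ts = @OriginalTs;   -- only succeeds if nobody changed the row meanwhile

IF @@ROWCOUNT = 0
    RAISERROR('Record was modified by another user.', 16, 1);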
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 9, 2011 at 4:41 pm
LutzM (7/9/2011)
Did you run the query from SSMS (SQL Server Management Studio) or via ACCESS?
I do both as a matter of course, but today and yesterday I have been working from home, logging directly onto the server via TeamViewer. The Access app is on my work machine, which is turned off, so I have been running SSMSE directly on the server for the entire course of this discussion.
Do the functions reflect the changed data types (including the target data type to be varchar(510) if you concatenate two varchar(255) columns) or are those still varchar(max)?
I have only changed the field definitions in the underlying table, nothing else, and not even all of the fields yet. Some already have data considerably past 255, so I'm going to have to examine them in more detail before I decide where to set their limits. And add error handling for overflow to the app.
The best would be if you could post the actual execution plan of the query (use Ctrl+M to activate the actual execution plan), run the query in Management Studio, save the execution plan (right-click on the plan and save as a .sqlplan file) and post it as an attachment. This will help us find the cause of the long-running query as well as a solution for it.
OK, I've included two plans - one bare bones and one with a TOP 100 predicate.
Regarding the index definition: each table needs its own index def with the related columns of that specific table.
I'll go through it again, I've been playing with this so much the last few months that I'm not sure where everything is any more. And I've got to get better at running and organizing scripts. So far I've been doing most of the work in the GUI, but constantly having to do things like remove and restore SchemaBinding is getting tedious.
Most probably the performance can be improved significantly. But the number of returned rows is not an indicator of whether a query should be fast or not: returning just one row can take a significant amount of time if several tables with billions of rows and aggregation are involved. ;-)
I'm all ears...
July 9, 2011 at 5:42 pm
Here's what I would do:
Step 1: fix the data types used in / returned by the functions involved (e.g. [fnsFormatAkcesit] will never exceed 17 bytes, therefore the data type for the return value should be varchar(17) instead of varchar(max))
Step 2: change the CONSTRAINT [PK_Podrobnosti] on the Podrobnosti table to PRIMARY KEY CLUSTERED
Step 3: change the CONSTRAINT [PK_Akces] on the Akces table to PRIMARY KEY CLUSTERED
Step 4: add an index on dbo.Podrobnosti.AkcesAutoID (as Gail suggested: one index per foreign key reference)
Step 5: add indexes on dbo.Podrobnosti for the other foreign key references. Start with the lookup tables with the most rows (e.g. TableOfDepozitars preferred over TableOfSystems)
Step 6: if needed, add additional indexes to the lookup tables in the structure of *AutoID INCLUDE(columns_used_in_SELECT)
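Roughly, and only as an illustration of steps 2 to 6 (constraint and column names come from the posted scripts, the index names are invented, and dropping a primary key will fail while foreign keys still reference it):
-- Steps 2/3: recreate the primary keys as clustered.
ALTER TABLE dbo.Podrobnosti DROP CONSTRAINT PK_Podrobnosti;
ALTER TABLE dbo.Podrobnosti ADD CONSTRAINT PK_Podrobnosti
    PRIMARY KEY CLUSTERED (PodrobnostiAutoID);

-- Step 4: index the join column to Akces.
CREATE NONCLUSTERED INDEX IX_Podrobnosti_AkcesAutoID
    ON dbo.Podrobnosti (AkcesAutoID);

-- Step 6: lookup-table index keyed on the ID with the display text included.
CREATE NONCLUSTERED INDEX IX_TableOfGenuss_GenusAutoID
    ON dbo.TableOfGenuss (GenusAutoID) INCLUDE (Genus);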
If those steps won't help to bring the elapsed time into an acceptable range, you might have to consider using pass-through queries from your ACCESS front end. With a pass-through query you could use an inline table-valued function (AKA parameterized view) and write that function to limit the rows of dbo.Podrobnosti before the joins even start. You might even need to write two separate functions: one for EvidenceLetter IN and one for AkcesAutoID.
If you need to go down that route, I strongly recommend setting up a dedicated login that is only allowed to call those functions, since the login as well as the password are stored in the ACCESS frontend in plain text, visible to everyone who just looks a little closer...
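As a rough sketch of the EvidenceLetter variant (the function name is invented and the join list is truncated to keep the example short):
CREATE FUNCTION dbo.fntPodrobnostiByLetter (@EvidenceLetter nvarchar(2))
RETURNS TABLE
AS
RETURN
(
    SELECT p.PodrobnostiAutoID, p.EvidenceNumber, g.[Group], c.Country
    FROM dbo.Podrobnosti AS p
    LEFT OUTER JOIN dbo.TableOfGroups   AS g ON p.GroupAutoID   = g.GroupAutoID
    LEFT OUTER JOIN dbo.TableOfCountrys AS c ON p.CountryAutoID = c.CountryAutoID
    -- ...remaining lookup joins as in the original view...
    WHERE p.EvidenceLetter = @EvidenceLetter
);
GO
-- Called from an ACCESS pass-through query, e.g.:
-- SELECT * FROM dbo.fntPodrobnostiByLetter(N'B') ORDER BY EvidenceNumber;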