February 5, 2015 at 8:25 pm
Hi
I am running an update against a table with over 100,000,000 rows which takes forever. Is there a better way to write a query like this?
UPDATE [dbo].[Test_Table]
SET [Column_34] =(
CASE
WHEN [Column_2] > 25 AND [Column_2] <= 34 THEN 1
ELSE 0 END)
February 5, 2015 at 9:21 pm
jdbrown239 (2/5/2015)
HiI am running an update against a table with over 100,000,000 rows which takes forever. Is there a better way to write a query like this?
UPDATE [dbo].[Test_Table]
SET [Column_34] =(
CASE
WHEN [Column_2] > 25 AND [Column_2] <= 34 THEN 1
ELSE 0 END)
Yes. Every system has a tipping point. 1M rows might take 6 seconds. As expected, 2M rows might take just twice that (12 seconds) and 3M just 3 times that (18 seconds). But 4 million, for no apparent reason (there are reasons... just too many to talk about here, usually having to do with memory) might take 2 hours.
So, first step is to breakup such massive updates by only updating, say, 2 million rows at a time in an update loop.
Another thing that can really slow down an update is if the columns you're updating have indexes on them.
Last but certainly not least and a very common mistake if it's a "joined" update is writing an "illegal" update where the object of the update isn't in the FROM clause. Post your code and lets have a look.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 5, 2015 at 9:46 pm
Further on Jeff's answer, could you post the DDL for the table and everything that comes with it (indexes, constraints, triggers etc.)?
😎
February 6, 2015 at 8:58 am
Table DDL
CREATE TABLE [dbo].[Targus_201412_V7](
[PrimaryPhone] [varchar](10) NULL,
[CensusBlock] [varchar](15) NULL,
[City] [varchar](50) NULL,
[State] [varchar](2) NULL,
[ZIP_Code] [varchar](5) NULL,
[ZIP_Plus4] [varchar](4) NULL,
[Income] [varchar](5) NULL,
[AvgHomeValue] [varchar](5) NULL,
[MaritalStatusCode] [varchar](5) NULL,
[AgeHOH] [varchar](5) NULL,
[AgeMember2] [varchar](5) NULL,
[NumberofAdults] [varchar](5) NULL,
[NumberofChildren] [varchar](5) NULL,
[PresenceofChildren] [varchar](5) NULL,
[Age0to3] [varchar](5) NULL,
[Age4to6] [varchar](5) NULL,
[Age7to9] [varchar](5) NULL,
[Age10to12] [varchar](5) NULL,
[Age13to15] [varchar](5) NULL,
[Age16to18] [varchar](5) NULL,
[HispanicSurname] [varchar](5) NULL,
[AsianSurname] [varchar](5) NULL,
[OwnRent] [varchar](5) NULL,
[SDU] [varchar](5) NULL,
[Ported_Q314] [varchar](5) NULL,
[Port_Dt_Q314] [varchar](10) NULL,
[OCN_Q314] [varchar](4) NULL,
[Name_Q314] [varchar](50) NULL,
[TYPE_Q314] [nvarchar](30) NULL,
[Ported_Q414] [varchar](5) NULL,
[Port_Dt_Q414] [varchar](10) NULL,
[OCN_Q414] [varchar](4) NULL,
[Name_Q414] [varchar](50) NULL,
[TYPE_Q414] [nvarchar](30) NULL,
[RECODE_PHONE_TYPE_Q4] [char](60) NULL,
[METRO] [float] NULL,
[METRO_STATUS] [char](10) NULL,
[URBAN] [int] NULL,
[SUBURBAN] [int] NULL,
[RURAL] [int] NULL,
[SETTING_CHK] [int] NULL,
[KIDS0_6] [numeric](18, 0) NULL,
[KIDS7_12] [numeric](18, 0) NULL,
[KIDS13_18] [numeric](18, 0) NULL,
[CKIDS] [numeric](18, 0) NULL,
[DIFF] [numeric](18, 0) NULL,
[DIFF2] [numeric](18, 0) NULL,
[ADULTS] [numeric](18, 0) NULL,
[KIDS] [numeric](18, 0) NULL,
[AGEHOH2] [numeric](18, 0) NULL,
[INCOME0_20] [int] NULL,
[INCOME20_35] [int] NULL,
[INCOME35_50] [int] NULL,
[INCOME50_100] [int] NULL,
[INCOME100] [int] NULL,
[INCOME_CHK] [int] NULL,
[AGE18_25] [int] NULL,
[AGE26_34] [int] NULL,
[AGE35_44] [int] NULL,
[AGE45_54] [int] NULL,
[AGE55_64] [int] NULL,
[AGEHI] [int] NULL,
[AGE_CHK] [int] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
February 6, 2015 at 9:00 am
Any indexes, constraints, triggers?
😎
February 6, 2015 at 9:01 am
Index DDL
CREATE NONCLUSTERED INDEX [IX_Agehoh2] ON [dbo].[Targus_201412_V7]
(
[AGEHOH2] ASC
)
INCLUDE ( [AgeHOH]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
February 6, 2015 at 9:03 am
Good job, I'll look into this over the weekend
😎
February 6, 2015 at 9:06 am
Thanks for your help.
February 6, 2015 at 9:23 am
That statement is going to have the scan the table regardless of what you do.
The critical thing for performance is to pre-allocate (and thus pre-format) enough log space for the entire statement to run without having to add log space.
Thus, we need so see how much free log space is available in that db. Run this command:
DBCC SQLPERF ( LOGSPACE )
and review the results for the relevant db.
Since you're only updating one column, I'd hope the log space required would be ~1GB or less. So, before running the command, make sure you have unused log space of ~1GB.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 6, 2015 at 10:24 am
I have 15 GB allocated space and 98% free
February 6, 2015 at 10:28 am
jdbrown239 (2/6/2015)
I have 15 GB allocated space and 98% free
In the log file?
Yeah, that should be plenty, so the statement should run as fast it can.
Are there triggers, replication or something else involved that might be slowing it down?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 6, 2015 at 1:50 pm
What bothers me here is the fact that this a fairly large heap, makes any kind of batch/portioning effort very costly, any chance that you can change the table to a clustered index? Further on this, what is the unique key?
😎
February 8, 2015 at 4:39 pm
I will have to profile the data for uniqueness to see if a clustered index can be created.
February 9, 2015 at 1:32 am
jdbrown239 (2/8/2015)
I will have to profile the data for uniqueness to see if a clustered index can be created.
Don't get confused between a PK and a clustered index, the Clustered index doesn't need to be unique, its just a way of sorting the table into a given order.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
February 9, 2015 at 8:31 am
Jason-299789 (2/9/2015)
jdbrown239 (2/8/2015)
I will have to profile the data for uniqueness to see if a clustered index can be created.Don't get confused between a PK and a clustered index, the Clustered index doesn't need to be unique, its just a way of sorting the table into a given order.
Careful now... It's not "just a way of sorting the table into a given order".
Behind the scenes and because the clustered index is included in every non-clustered index, it's a really good idea if the Clustered Index follows the basic guidelines of being narrow, static, not null, ever increasing, and unique. Without the Unique thing, an extra column is added and if duplicates according to the clustered index appear, a uniquefier will be added to the index.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply