TempDB Keeps Filling Up After Log Truncation

  • Hi all,

    I've searched and searched for my condition, but can't seem to find a fit. We recently had a DB log file grow full, causing drive space issues. Since it wasn't necessarily a mission-critical DB, and we didn't have room for a backup, we truncated the log file and all was good. This was a user-created DB, not TempDB.

    However, since then, TempDB keeps filling up to the point that the jobs or SQL run against the DB whose log we truncated will fail. The jobs and SQL that are now failing haven't failed in a very, very long time, but again, it seems that since the space issue on the other DB, TempDB space has become an issue.

    I restarted the SQL Server instance and all space was available... no luck. We added an additional 25GB primary data file, but again, no luck, as it uses all of that up too. I've been running some SQL I found on this site that shows how much TempDB space is allocated versus free, and it deallocates just fine once the SQL stops executing. I'm not really a DBA; our only DBA passed away a couple of weeks ago and I was nominated for day-to-day tasks as needed until a replacement can be secured. I'm afraid that I may have made a rookie, detrimental mistake when truncating the log or otherwise trying to reclaim space.

    The problematic DB's log file was over 200GB, and the same for the MDF file, but it goes down to about 4GB after the shrink process, which I've used one other time since the truncation. The shrink process is a stored procedure our previous DBA put in place that sets the recovery model to SIMPLE, shrinks the files, and does a few other things; I can post it if it seems relevant.

    Most of the jobs use views from the problematic DB, so I'm wondering if corrupt views could possibly be returning more rows than they should and causing them to be cached in TempDB. I've recompiled one of them and the job is running right now, but I'm not too hopeful.

    Any direction on what else to check would be immensely appreciated!

    Thanks,

    Chris

  • I'm using the following 2 queries in conjunction with each other to actively monitor the problematic jobs:

    SELECT
        SUM(user_object_reserved_page_count) * 8 AS user_obj_kb,
        SUM(internal_object_reserved_page_count) * 8 AS internal_obj_kb,
        SUM(version_store_reserved_page_count) * 8 AS version_store_kb,
        SUM(unallocated_extent_page_count) * 8 AS freespace_kb,
        SUM(mixed_extent_page_count) * 8 AS mixedextent_kb
    FROM sys.dm_db_file_space_usage;

    SELECT es.host_name, es.login_name, es.program_name,
        st.dbid AS QueryExecContextDBID, DB_NAME(st.dbid) AS QueryExecContextDBNAME, st.objectid AS ModuleObjectId,
        SUBSTRING(st.text, er.statement_start_offset/2 + 1,
            (CASE WHEN er.statement_end_offset = -1
                  THEN LEN(CONVERT(nvarchar(max), st.text)) * 2
                  ELSE er.statement_end_offset
             END - er.statement_start_offset)/2) AS Query_Text,
        tsu.session_id, tsu.request_id, tsu.exec_context_id,
        (tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count) AS OutStanding_user_objects_page_counts,
        (tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count) AS OutStanding_internal_objects_page_counts,
        er.start_time, er.command, er.open_transaction_count, er.percent_complete, er.estimated_completion_time,
        er.cpu_time, er.total_elapsed_time, er.reads, er.writes,
        er.logical_reads, er.granted_query_memory
    FROM sys.dm_db_task_space_usage tsu
    INNER JOIN sys.dm_exec_requests er
        ON tsu.session_id = er.session_id AND tsu.request_id = er.request_id
    INNER JOIN sys.dm_exec_sessions es
        ON tsu.session_id = es.session_id
    CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) st
    WHERE (tsu.internal_objects_alloc_page_count + tsu.user_objects_alloc_page_count) > 0
    ORDER BY (tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count)
           + (tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count) DESC;

  • It would probably help if you also provided the full and complete error messages you are getting. There's not much we can do based on just what you have posted so far, since we can't see what you see.

  • I apologize... I knew I would leave something important out. The error is below:

    (Message 2528) DBCC execution completed. If DBCC printed error messages, contact your system administrator. [SQLSTATE 01000] (Message 2528) Could not allocate space for object 'dbo.SORT temporary run storage: 422216907948032' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup. [SQLSTATE 42000] (Error 1105) The statement has been terminated. [SQLSTATE 01000] (Error 3621). The step failed.
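
    In case it helps anyone following along: the error suggests checking whether tempdb's files can still grow, and that can be done with a query like the one below. This is a sketch; the commented-out ALTER is an example only, and its file name, path, and sizes are placeholders, not the actual configuration.

    -- Inspect tempdb file sizes and autogrowth settings
    SELECT name, type_desc,
           size * 8 / 1024 AS size_mb,
           CASE WHEN is_percent_growth = 1
                THEN CAST(growth AS varchar(10)) + ' percent'
                ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
           END AS growth_setting,
           max_size  -- -1 = grow until the disk is full
    FROM tempdb.sys.database_files;

    -- Example only: add another tempdb data file (placeholder name/path/sizes)
    -- ALTER DATABASE tempdb
    -- ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb\tempdev2.ndf',
    --           SIZE = 25GB, FILEGROWTH = 1GB);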

  • 1) Since systems are being taken offline and jobs are failing, I STRONGLY urge you to get a professional consultant on board to stop the bleeding and get the trains back on the track!! There are numerous good ones here on SSC.com and elsewhere.

    2) Stop shrinking the production database files!! It is a horribly bad thing to do!

    3) Make the production database(s) tlog file as big as you think it needs to be for normal operations and leave it there (see the sketch after this list). Then tempdb growth cannot prevent the tlog from growing when it needs to (a tlog that cannot grow stops all DML activity in that database).

    4) Have you identified a pattern to the tempdb growth? I.e., does it happen overnight, during a big batch job/data load/report run, etc.? Did your monitoring reveal the cause of the growth?

    5) sp_whoisactive is an AWESOME FREE tool to see what is going on right now, including tempdb usage. It is self-documented. See the 30-day blog post series for details on usage (including a differential time option).
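
    To expand on point 3: pre-sizing the log is a one-line ALTER. A minimal sketch, assuming a database named YourDB with a logical log file named YourDB_log and 50GB as an example working size (all placeholders):

    -- Pre-size the transaction log so it never needs to autogrow during normal work.
    -- Find the logical log file name first:
    --   SELECT name FROM YourDB.sys.database_files WHERE type_desc = 'LOG';
    ALTER DATABASE YourDB
    MODIFY FILE (NAME = YourDB_log, SIZE = 50GB, FILEGROWTH = 4GB);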

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • chris_barnhart (7/12/2016)


    I apologize... I knew I would leave something important out. The error is below:

    (Message 2528) DBCC execution completed. If DBCC printed error messages, contact your system administrator. [SQLSTATE 01000] (Message 2528) Could not allocate space for object 'dbo.SORT temporary run storage: 422216907948032' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup. [SQLSTATE 42000] (Error 1105) The statement has been terminated. [SQLSTATE 01000] (Error 3621). The step failed.

    I have to agree with Kevin; this could easily take more than what can be provided for free on a forum. Based on the error message, I wouldn't be surprised if there is a query out there doing a Cartesian join on two or more tables.

  • Thanks for the advice...we have a contractor on the way, but not until Monday.

    I've been monitoring the consumption using the first query; the second one shows me the query memory grants and related details.

    It seems to be things calling views on the DB so far... I'm waiting, as we speak, for a job that uses a view to return; the view was recompiled this morning as an attempt at a fix.

    Just to clarify (if it matters), I shrunk the log file, not the data file... again, not being argumentative, just trying to be as accurate as possible.

    Thanks so much

  • chris_barnhart (7/12/2016)


    Thanks for the advice...we have a contractor on the way, but not until Monday.

    I've been monitoring the consumption using the first query; the second one shows me the query memory grants and related details.

    It seems to be things calling views on the DB so far... I'm waiting, as we speak, for a job that uses a view to return; the view was recompiled this morning as an attempt at a fix.

    Just to clarify (if it matters), I shrunk the log file, not the data file... again, not being argumentative, just trying to be as accurate as possible.

    Thanks so much

    As a general guideline, shrinking any of the files that are part of the database (data file, log file, secondary data files) is a bad thing.

    If it's the log file, it's only going to end up growing again until it's as big as it needs to be (how big is that? It depends on your workload, etc.).

    The fact that the previous DBA had a recovery model change and shrink as a normal process would seem to indicate one of the following: he didn't know what the impact would be; a regular data load or delete was causing the growth; the transaction log wasn't being backed up (in the Full recovery model); or the company is too cheap to provide sufficient storage for the DB to "breathe."
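
    For reference, the sustainable alternative to shrinking in the Full recovery model is a routine log backup, which marks log space as reusable without changing the file size. A minimal sketch, with the database name and backup path as placeholders:

    -- Scheduled (e.g., via SQL Server Agent) log backup; keeps the log reusable
    -- without shrinking. Database name and path are placeholders.
    BACKUP LOG YourDB
    TO DISK = 'X:\Backups\YourDB_log.trn';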

  • chris_barnhart (7/12/2016)


    Thanks for the advice...we have a contractor on the way, but not until Monday.

    I've been monitoring the consumption using the first query; the second one shows me the query memory grants and related details.

    It seems to be things calling views on the DB so far... I'm waiting, as we speak, for a job that uses a view to return; the view was recompiled this morning as an attempt at a fix.

    Just to clarify (if it matters), I shrunk the log file, not the data file... again, not being argumentative, just trying to be as accurate as possible.

    Thanks so much

    Depending on how small the initial log file is and how it grows (fixed size vs. percentage), growth alone could easily cause issues. While SQL Server can use instant file initialization when a data file is grown, when a transaction log is grown the new space has to be fully initialized before it can be used, which pauses processing during that time. Transaction logs should be sized to handle the daily workload without the need to grow, except when extraordinary processing occurs (large ingests of data, updates of a large portion of the data, or major deletes of data).

    Also, if the database is using the Full recovery model or the Bulk-Logged model but no transaction log backups are being taken, or they are taken infrequently, the transaction log can easily use up drive space.
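
    Both conditions are quick to check; a sketch using the standard catalog view and DBCC command:

    -- Recovery model and what (if anything) is holding up log reuse
    SELECT name, recovery_model_desc, log_reuse_wait_desc
    FROM sys.databases;

    -- Log file size and percent used for every database
    DBCC SQLPERF(LOGSPACE);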

  • I believe it's the latter reason... he had backup jobs set up on the server, but not for this DB. It did in fact grow too large again after I truncated it the first time. When it filled up again, I used the same procedure he used for the "crucial" DBs. I've read quite a bit since then and understand this is a band-aid at best and that we just need to perform regular backups. I read an article called "STOP SHRINKING DB FILES" and it was very insightful, but after the fact, unfortunately.

  • Lynn Pettis (7/12/2016)


    chris_barnhart (7/12/2016)


    Thanks for the advice...we have a contractor on the way, but not until Monday.

    I've been monitoring the consumption using the first query; the second one shows me the query memory grants and related details.

    It seems to be things calling views on the DB so far... I'm waiting, as we speak, for a job that uses a view to return; the view was recompiled this morning as an attempt at a fix.

    Just to clarify (if it matters), I shrunk the log file, not the data file... again, not being argumentative, just trying to be as accurate as possible.

    Thanks so much

    Depending on how small the initial log file is and how it grows (fixed size vs. percentage), growth alone could easily cause issues. While SQL Server can use instant file initialization when a data file is grown, when a transaction log is grown the new space has to be fully initialized before it can be used, which pauses processing during that time. Transaction logs should be sized to handle the daily workload without the need to grow, except when extraordinary processing occurs (large ingests of data, updates of a large portion of the data, or major deletes of data).

    Also, if the database is using the Full recovery model or the Bulk-Logged model but no transaction log backups are being taken, or they are taken infrequently, the transaction log can easily use up drive space.

    It is in the FULL recovery model and is NOT being backed up... I will not shrink it again, but will ask for the SAN/LUN to be expanded instead, and then, when the DBA gets onsite, work out a space to back it up to.

    I guess they didn't consider it crucial since the tables are refreshed each night, but TempDB is interfering with that now... it seems to have started after I sent the log to NUL the first time. I've been chasing my tail ever since.

  • chris_barnhart (7/12/2016)


    Thanks for the advice...we have a contractor on the way, but not until Monday.

    Need a different consultant in the short-term then, IMNSHO! 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • chris_barnhart (7/12/2016)


    Lynn Pettis (7/12/2016)


    chris_barnhart (7/12/2016)


    Thanks for the advice...we have a contractor on the way, but not until Monday.

    I've been monitoring the consumption using the first query; the second one shows me the query memory grants and related details.

    It seems to be things calling views on the DB so far... I'm waiting, as we speak, for a job that uses a view to return; the view was recompiled this morning as an attempt at a fix.

    Just to clarify (if it matters), I shrunk the log file, not the data file... again, not being argumentative, just trying to be as accurate as possible.

    Thanks so much

    Depending on how small the initial log file is and how it grows (fixed size vs. percentage), growth alone could easily cause issues. While SQL Server can use instant file initialization when a data file is grown, when a transaction log is grown the new space has to be fully initialized before it can be used, which pauses processing during that time. Transaction logs should be sized to handle the daily workload without the need to grow, except when extraordinary processing occurs (large ingests of data, updates of a large portion of the data, or major deletes of data).

    Also, if the database is using the Full recovery model or the Bulk-Logged model but no transaction log backups are being taken, or they are taken infrequently, the transaction log can easily use up drive space.

    It is in the FULL recovery model and is NOT being backed up... I will not shrink it again, but will ask for the SAN/LUN to be expanded instead, and then, when the DBA gets onsite, work out a space to back it up to.

    I guess they didn't consider it crucial since the tables are refreshed each night, but TempDB is interfering with that now... it seems to have started after I sent the log to NUL the first time. I've been chasing my tail ever since.

    Based on your statement that the tables are refreshed nightly, you may consider using the Simple Recovery model for that one database. I would still run a nightly backup after the refresh.
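
    For the record, that change plus the nightly backup is only a couple of statements; a sketch with the database name and backup path as placeholders:

    -- Simple recovery: the log truncates on checkpoint, so no log backups are needed.
    ALTER DATABASE YourDB SET RECOVERY SIMPLE;

    -- Nightly full backup after the refresh (schedule via SQL Server Agent).
    BACKUP DATABASE YourDB
    TO DISK = 'X:\Backups\YourDB.bak'
    WITH INIT;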

  • I finally found the issue as a result of the testing and monitoring y'all suggested. The Cartesian join comment got me thinking: what, other than that, could cause TempDB to grow full with a SORT message being thrown? So I took one of the problematic queries, ran it late last night/early this morning, and watched it chew up upwards of 65GB before I would finally cancel it, because I didn't want to hit the 75GB allocated. I started eliminating JOINs until I found the one that caused the TempDB growth. According to everyone I asked, no changes had been made.

    I ran the query through an execution plan and saw two very expensive Heap/Sort operations. I found a copy of the very table the problematic JOIN uses in our Test environment and noticed a column was missing... in other words, a new column had been added in Production. Based on that and the new data inserted into the table (70 rows), I re-examined the query logically to see how that would play a role, and there it was: because none of the views used (the JOINs go through the views) had been updated to accommodate the hierarchy of the new column, the new rows were redundant and caused the JOIN to bring back the applicable rows 70 times what it normally would. So what was originally 50K rows returned would turn into roughly 3.5M rows, if my math is correct (or something close to that). 3M+ rows would be a lot more to cache and sort than what we had originally set up to handle (two 25GB files), and even more than the new space allocated (three 25GB files).
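
    To illustrate the fan-out with a contrived sketch (made-up table and column names, not our actual schema): once the lookup side of a join carries multiple rows per key, every matching row comes back once per duplicate.

    -- Contrived join fan-out demo; all names are made up.
    CREATE TABLE #Orders  (OrderID int, CustomerID int);
    CREATE TABLE #CustDim (CustomerID int, Region varchar(10));

    INSERT INTO #Orders  VALUES (1, 100), (2, 100);
    -- CustomerID 100 now appears 3 times because new rows were added
    -- for a new column's hierarchy and the views don't filter on it.
    INSERT INTO #CustDim VALUES (100, 'East'), (100, 'West'), (100, 'North');

    -- 2 orders x 3 dimension rows per key = 6 rows instead of 2;
    -- at 70 duplicates per key, 50K rows becomes roughly 3.5M.
    SELECT o.OrderID, c.Region
    FROM #Orders o
    INNER JOIN #CustDim c ON c.CustomerID = o.CustomerID;

    DROP TABLE #Orders; DROP TABLE #CustDim;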

    Once we dropped the new column and its associated data and re-ran the problematic query, TempDB space usage was much happier: I could see it rise, then deallocate nicely, then return the results like it used to. Checking through the system tables for objects using that table, I found about 20 views and a stored procedure, all using this table and JOINed in the same manner as my problematic query. That would explain the TempDB growth, as well as the growth of the DB itself, since it was duplicating data on refresh/insert. I've since passed this knowledge on to the person who made the change and showed the repercussions of that action and how widespread something like that can be.

    Anyway, I can't thank y'all enough for taking the time to help a wannabe DBA in distress... I can finally sleep again without fear of being paged! THANK YOU SO MUCH FOR YOUR ASSISTANCE AND TIME!!!!

  • Glad you found a root cause, and were smart enough to look for other similar situations!

    Sounds like a use case for http://www.red-gate.com/products/dlm/dlm-dashboard/ :w00t:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

