June 29, 2018 at 6:57 am
Hi,
I have an sql server 2017 enterprise and sometimes (not always), when it runs a job (the jobs he has problem with – not every day – it is t-sql) it starts running, and it doesn’t stop, but after a while it also does nothing, and if you check the statistics, even the job is running, and it didn’t finish, the server is doing nothing.
I usually restart the server and then it works (I have tried flushing the cache without restarting the server, but didn’t help).
Any ideas what I need to look for? It is only on one server (the test server runs the same jobs without any problem.)
And we are on virtual servers.
Ideas on what to do will be welcome.
Thanks
June 29, 2018 at 7:05 am
astrid 69000 - Friday, June 29, 2018 6:57 AMHi,
I have an sql server 2017 enterprise and sometimes (not always), when it runs a job (the jobs he has problem with – not every day – it is t-sql) it starts running, and it doesn’t stop, but after a while it also does nothing, and if you check the statistics, even the job is running, and it didn’t finish, the server is doing nothing.I usually restart the server and then it works (I have tried flushing the cache without restarting the server, but didn’t help).
Any ideas what I need to look for? It is only on one server (the test server runs the same jobs without any problem.)
And we are on virtual servers.
Ideas on what to do will be welcome.
Thanks
Is this for just one particular job? Can you post the t-sql from the problem job? Is there anything in the job history or the error log?
Sue
June 29, 2018 at 7:08 am
Is there any prolonged blocking?
June 29, 2018 at 7:13 am
There is no error in the log, there is not prolonger blocking.
It happen in two different jobs.
One jobs is an ssis package the basically runs merge statements.
the second job is just a count records on a table.
when we look at the job (for example running for 6 hours, something that is not supposed to run for more than 1) you see the server doing nothing just nothing.
it is so weird
June 29, 2018 at 7:26 am
I need to add that we don't have a huge server, the virtual server is 32, and we have allocated to the server 16 of those (we also run ssis and ssas on there).
can it be that the memory gets low and the server empty pages and one of the pages is the execution plan of the job?
June 29, 2018 at 7:48 am
astrid 69000 - Friday, June 29, 2018 7:26 AMI need to add that we don't have a huge server, the virtual server is 32, and we have allocated to the server 16 of those (we also run ssis and ssas on there).
can it be that the memory gets low and the server empty pages and one of the pages is the execution plan of the job?
Good thought but it wouldn't work that way. However, you are right and that really isn't much memory and memory may or may not be an issue. Do you get any memory errors being logged in the error log on that server? Or in the Windows event logs?
You originally said " if you check the statistics, even the job is running, and it didn’t finish, the server is doing nothing"
What statistics did you check and where - what query or were you checking something in Management Studio?
And how do you know the job is running - was this in Job Activity monitor or something else?
This server and the test server where you have no problems - are they pretty much the same in terms of versions, hardware, configurations, security/permissions?
Sue
June 29, 2018 at 9:25 am
Is there any maintainance jobs scheduled to that DB.
Have checked Indexes.
Try to run updatestatistics on the DB if not done already.
Thank You.
Regards,
Raghavender Chavva
June 29, 2018 at 9:59 am
Sue_H - Friday, June 29, 2018 7:48 AMastrid 69000 - Friday, June 29, 2018 7:26 AMI need to add that we don't have a huge server, the virtual server is 32, and we have allocated to the server 16 of those (we also run ssis and ssas on there).
can it be that the memory gets low and the server empty pages and one of the pages is the execution plan of the job?Good thought but it wouldn't work that way. However, you are right and that really isn't much memory and memory may or may not be an issue. Do you get any memory errors being logged in the error log on that server? Or in the Windows event logs?
You originally said " if you check the statistics, even the job is running, and it didn’t finish, the server is doing nothing"
What statistics did you check and where - what query or were you checking something in Management Studio?
And how do you know the job is running - was this in Job Activity monitor or something else?
This server and the test server where you have no problems - are they pretty much the same in terms of versions, hardware, configurations, security/permissions?Sue
We are not seeing any memory errors.
when we check the Job activity monitor the job is shown as running. Idera monitoring tools show the job as active (sometimes sleeping but I haven't yet been able to determine what that means really) when you check the memory usage and disk usage in task manager the server does not appear to be under any stress at all.
the two servers are virtually identical aside from one has 4 processors allocated to it and one has 2.
June 29, 2018 at 10:00 am
Raghavender Chavva - Friday, June 29, 2018 9:25 AMIs there any maintainance jobs scheduled to that DB.Have checked Indexes.
Try to run updatestatistics on the DB if not done already.
no maintenance jobs scheduled at that time. have checked indices for what? update statistics is performed regularly...
July 2, 2018 at 7:44 am
cford 46557 - Friday, June 29, 2018 9:59 AMSue_H - Friday, June 29, 2018 7:48 AMastrid 69000 - Friday, June 29, 2018 7:26 AMI need to add that we don't have a huge server, the virtual server is 32, and we have allocated to the server 16 of those (we also run ssis and ssas on there).
can it be that the memory gets low and the server empty pages and one of the pages is the execution plan of the job?Good thought but it wouldn't work that way. However, you are right and that really isn't much memory and memory may or may not be an issue. Do you get any memory errors being logged in the error log on that server? Or in the Windows event logs?
You originally said " if you check the statistics, even the job is running, and it didn’t finish, the server is doing nothing"
What statistics did you check and where - what query or were you checking something in Management Studio?
And how do you know the job is running - was this in Job Activity monitor or something else?
This server and the test server where you have no problems - are they pretty much the same in terms of versions, hardware, configurations, security/permissions?Sue
We are not seeing any memory errors.
when we check the Job activity monitor the job is shown as running. Idera monitoring tools show the job as active (sometimes sleeping but I haven't yet been able to determine what that means really) when you check the memory usage and disk usage in task manager the server does not appear to be under any stress at all.
the two servers are virtually identical aside from one has 4 processors allocated to it and one has 2.
What does the job do? Can you post the t-sql that just stops but keeps running?
Check the job schedule to make sure it's not continuous and check the on success and on failure steps to see what the job would do in either case.
Sue
July 3, 2018 at 6:37 am
Good morning,
Right now it is running, since I restarted the server, the problem is that it doesn't happen every day.
The job is an ssis package that calls store procedures that have merge statements.
On fail it will stop the job as we need this db loaded before we can continue and on success will go to the next step.
CREATE PROCEDURE [LoadStoreProc_sp] @LoadDate datetime
AS
DECLARE
@_LoadDate DATETIME = @LoadDate
BEGIN
SET NOCOUNT ON;
DECLARE
@SummaryOfChanges TABLE(Change VARCHAR(20));
MERGE
MyDB.dbo.MyTable
AS Target --
USING
(
SELECT
DISTINCT
CONVERT(CHAR(40), HASHBYTES('SHA1', LTRIM(RTRIM(COALESCE(NULL,'')))), 2)
AS HashKey --
, @_LoadDate
AS LoadDate
, 'MyDBOne.dbo.MyOtherTable'
AS RecordSource
, @_LoadDate
AS LoadDate
, ''
AS [ID]
UNION
SELECT
DISTINCT
HashKey --
, LoadDate
, RecordSource
, LoadDate
, [ID]
FROM
MyDBOne.dbo.MyOtherTable --
WHERE
LoadDate = @_LoadDate
)
AS Source
(
HashKey --
, LoadDate
, RecordSource
, LastSeenDate
, PartID
)
ON
(
Target.[PartID] = Source.[PartID] AND
Target.[RecordSource] = Source.RecordSource
)
WHEN MATCHED THEN
UPDATE
SET LastSeenDate = Source.LastSeenDate
WHEN NOT MATCHED THEN
INSERT
(
HashKey
, LoadDate
, RecordSource
, LastSeenDate
, [PartID]
) --
VALUES
(
Source.HashKey
, Source.LoadDate
, Source.RecordSource
, Source.LastSeenDate
, Source.[PartID]
) --
OUTPUT
$action INTO @SummaryOfChanges;
END
GO
July 3, 2018 at 1:40 pm
astrid 69000 - Tuesday, July 3, 2018 6:37 AMGood morning,
Right now it is running, since I restarted the server, the problem is that it doesn't happen every day.
The job is an ssis package that calls store procedures that have merge statements.
On fail it will stop the job as we need this db loaded before we can continue and on success will go to the next step.
CREATE PROCEDURE [LoadStoreProc_sp] @LoadDate datetime
AS
DECLARE
@_LoadDate DATETIME = @LoadDateBEGIN
SET NOCOUNT ON;
DECLARE
@SummaryOfChanges TABLE(Change VARCHAR(20));
MERGE
MyDB.dbo.MyTable
AS Target --
USING
(
SELECT
DISTINCT
CONVERT(CHAR(40), HASHBYTES('SHA1', LTRIM(RTRIM(COALESCE(NULL,'')))), 2)
AS HashKey --
, @_LoadDate
AS LoadDate
, 'MyDBOne.dbo.MyOtherTable'
AS RecordSource
, @_LoadDate
AS LoadDate
, ''
AS [ID]
UNION
SELECT
DISTINCT
HashKey --
, LoadDate
, RecordSource
, LoadDate
, [ID]
FROM
MyDBOne.dbo.MyOtherTable --
WHERE
LoadDate = @_LoadDate
)
AS Source
(
HashKey --
, LoadDate
, RecordSource
, LastSeenDate
, PartID
)
ON
(
Target.[PartID] = Source.[PartID] AND
Target.[RecordSource] = Source.RecordSource
)
WHEN MATCHED THEN
UPDATE
SET LastSeenDate = Source.LastSeenDate
WHEN NOT MATCHED THEN
INSERT
(
HashKey
, LoadDate
, RecordSource
, LastSeenDate
, [PartID]
) --
VALUES
(
Source.HashKey
, Source.LoadDate
, Source.RecordSource
, Source.LastSeenDate
, Source.[PartID]
) --
OUTPUT
$action INTO @SummaryOfChanges;
END
GO
Thanks for posting all of that.
It very well could be that it's actually the package hanging and not the job. Do you have logging enabled for the package? If not, enable that.
If the job has multiple steps, you should be able to see what step it is hung on which might also help you narrow things down. You could also add another step after the package to create some dummy table or whatever and see if the job can actually get to that step.
Sue
July 4, 2018 at 2:40 pm
Do you mean to enable the login for the the ssis package?
I will check that out.
Last night also stopped after working for 4 days straight and after I restarted the server (twice), it worked perfectly.
We did have the same issue before with a job that was calling directly a store procedure.
Maybe the servers are hunted :).
July 4, 2018 at 3:01 pm
astrid 69000 - Wednesday, July 4, 2018 2:40 PMDo you mean to enable the login for the the ssis package?
I will check that out.
Last night also stopped after working for 4 days straight and after I restarted the server (twice), it worked perfectly.
We did have the same issue before with a job that was calling directly a store procedure.
Maybe the servers are hunted :).
I think most servers are haunted - things come up with them that scare heck out of me 🙂
I meant you can enable the SSIS package itself for logging . You can configure fairly verbose logging with SSIS.
But that might not make a difference as you say you had the same issue with a job calling just a stored procedure. Intermittent problems are so fun aren't they?
Did you check the Windows event logs when this is going on? It would be good (but hard) to know when it starts doing nothing and then check the windows event logs at that time. One other odd thing that can happen is jobs can hang (rare but I've seen it before) when they hit blocking issues related to writing to the job history. You may want to check the size of the job history table and make sure you are limiting the size on the Agent properties.
You could try querying the system_health events sessions as it does capture errors. Doesn't mean the whatever is going on could be in there but it's worth checking.
Otherwise, I'm guessing you'd need to have a server side trace or extended events session to monitor the job or jobs that have this issue. You'd really want to narrow things down on that though since it will be noisy enough just waiting for the issue to happen. You can filter down to a specific job to monitor.
Sue
.
July 10, 2018 at 10:08 pm
astrid 69000 - Wednesday, July 4, 2018 2:40 PMDo you mean to enable the login for the the ssis package?
I will check that out.
Last night also stopped after working for 4 days straight and after I restarted the server (twice), it worked perfectly.
We did have the same issue before with a job that was calling directly a store procedure.
Maybe the servers are hunted :).
I too have the same issue with 2012 Server, the job runs forever without invoking the SSIS package, restarting the SSIS services and Agent Services is the only option which fixes the issue every time, tried every suggestion posted in the comments but nothing fixed it.. thought it was a bug in 2012 I don't know that it still exists in 2017.
Sorry for my bad English..
Viewing 15 posts - 1 through 15 (of 29 total)
You must be logged in to reply to this topic. Login to reply