Email Alerts if backup job is failing

  • Hi, I would want to know how do I setup an email alert notification for events of backup failures?

    My current backup is done via a 3rd party application and not triggered from SQL Agent itself. Thus the backup admin will receive alerting for any backup failures, DBAs like myself are not in the loop as backups are solely manage by Backup Admin themself.

    However there has been time where backup admin guys missed out on the backup alert and didn't perform a retry. Thus causing transaction logs to grow too large.

    I would want to create a second alert triggered by SQL server to DBA's email to alert us if the backups failed. Is this possible?

  • Where i work we also have a backup tool, we have access to the backup tool dashboards and we have access to the policies (execute,modify,delete and create) of backups of SQL, we can filter backups per server and database to see if they are going fine or if there's and issue, we can even do reporting.

    You should be notified by the backup tool when a SQL backup fails. if they don't want you receive these emails, make sure you tell your manager and if something happens, its gonna be on them, not on you.

    The only way i think you can pull that off without the tool is you gotta develop something inside SQL Server to check MSDB backup tables and check their status.

    You could also use read  the SQL Server log, you can check when was the last backup, the database that was backed up, the type of backup..

    We have a monitoring team but they are not as reliable as the DBAs for handling backup issues or they have many other things to do..

     

  • First of all, if missing a single T-Log backup causes the T-Logs to grow too large, then you folks need to sit down and come up with some extra space for the T-Logs because that kind of stuff IS GOING TO HAPPEN.

    Second, what do you mean the backup admin guys missed an alert and didn't perform a retry???  They shouldn't be performing a simple retry... the alert should tell them WHY the failure occurred and only try a retry if it was something that was basically expected.  If it wasn't expected, they should figure out what's wrong and fix it!  If it was "one of those occasional network blips", someone really needs to be calibrated because, in this day and age, there's really no excuse for such a thing.

    And, you're paying for it so DEMAND that the DBAs be added to the failed backup alerts.  Then do as Alejandro suggests above... you (should) know the backup schedule... sample the MSDB tables or (if the 3rd party software hasn't screwed everything up), read the SQL Error log at least once for every log file backup cycle.  If there are any failures, send an email to the DBAs immediately.  Include the "backup admins" on the email only if you're not trying to get them fired for being incompetent because there's absolutely NO justifiable excuse for "backup admins" to miss a failed backup... not ever. PERIOD.

    Now, here's another hint.. where are the backups being stored and have you verified that your backup files are actually being backed up to tape or stored in long term storage on separate devices somewhere?  I trust absolutely no one when it comes to things like this because I've had way too many people be wrong or flat out lie to me with the "proof" of that being the inability of the supposed "backup team" to even come close to meeting either RPO or RTO when the chips were actually down.

    With that last thought in mind, when is the last time you've done a test restore of ANY database, especially your large super high value databases?

    Like many of the heavy hitters will tell you, a "backup plan" doesn't result in a bloody thing... you need to design and frequently test a "Restore Plan".  If you have a contract with some 3rd party for RPO and RTO, TEST THEM and make sure there's an acceptable and regular requirement to do so in the contract!  If they're claiming that they don't need to and that it will "cost extra to do so", then you've hired the wrong company and should start looking for a better company.  If those people are in-house, they need a major pork-chop session in the woodshed with a high level of management and the DBAs.

    And, yeah... I DO practice what I preach... I don't have the monster size databases that some do but I do test restore my two "money makers" (~1TB and 1/2TB) every night and I restore the others once per month (sometimes more).  Obviously, those restores are to differently named databases.  Sometimes, they're to the same machine and sometimes to a different machine.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff & Alejandro,

    Thanks so much for the insights and feedbacks. I've just join this new organization less than a month only so I'm still learning some of the process flow for this organization. On my previous job I'm the one managing all servers & database backups as well as being a DBA, so I will be able to capture backup failures fix it ASAP.

    However after moving to this new position I'm handling purely the database side of things thus backups are not transparent to me. Which is why I'm thinking of adding an additional alert towards the backups so I will be aware if things are not performing correctly. Also since my previous job also uses a 3rd party tool to perform & monitor the backups there's no alerting being set up on the DB server level as well. Which is why I'm trying to figure out how can I know if these sort of backup fails.

    If the backup are triggered via SQL agent then alerting shouldn't be an issue. But for backups initiated by 3rd party tools, I wonder if setting up the alert in SQL agent will it able to capture any failures as well?

  • JasonO wrote:

    But for backups initiated by 3rd party tools, I wonder if setting up the alert in SQL agent will it able to capture any failures as well?

    Have you looked in the MSDB tables to see if the backups are being registered there so you can do Alejandro suggested?  If the backups are not to be found there, then you'll have to ask the "backup guys" where they get stored and have your SQL Server take a look at that "place" every so often to take an inventory of files.

    The big thing here is that the "backup" folks should be happy to share some information with you because "another set of eyes" on the system will make things easier for them.  Remember that the answer is always "no" until you ask.  If they want to know "why", just explain that it would give you the necessary early warning to start watch for full log file space and, possibly, to take actions for a "soft" shutdown of databases.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ...to alert us if the backups failed. Is this possible?

    No.  But you can alert yourself if a successful backup did not occur.  [There is a subtle distinction.]

    For example, adjust the max times below to match what you need and the script below will give you a list of db names that failed the backup conditions.  If you need help turning that list into an email, just let me know.

    DECLARE @max_hours_without_full_or_diff_backup decimal(4, 1)
    DECLARE @max_mins_without_log_backup int

    SET @max_hours_without_full_or_diff_backup = 24.5
    SET @max_mins_without_log_backup = 15

    SELECT
    database_name,
    MAX(CASE WHEN type = 'D' THEN backup_finish_date END) AS last_full_backup,
    MAX(CASE WHEN type = 'I' THEN backup_finish_date END) AS last_diff_backup,
    MAX(CASE WHEN type = 'L' THEN backup_finish_date END) AS last_log_backup
    FROM msdb.dbo.backupset
    GROUP BY
    database_name
    HAVING
    (MAX(CASE WHEN type = 'D' THEN backup_finish_date END) < DATEADD(MINUTE, -@max_hours_without_full_or_diff_backup * 60, GETDATE()) AND
    MAX(CASE WHEN type = 'I' THEN backup_finish_date END) < DATEADD(MINUTE, -@max_hours_without_full_or_diff_backup * 60, GETDATE())) AND
    (MAX(CASE WHEN type = 'I' THEN backup_finish_date END) < DATEADD(MINUTE, -@max_mins_without_log_backup, GETDATE()))
    ORDER BY
    database_name

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • When looking at backups at a company, I found a "full" backup was running every hour during the day, according to the MSDB tables. Digging a bit more, it seems a 3rd party tool DATTO is taking server snapshots every hour, which is somehow also registering as a SQL backup, TYPE = 'D'

    backupmediafamily.device_type = 7 (virtual) for the 3rd party tool, and I do not think is usable for a true SQL restore if needed.

    I now omit device_type = 7 when I look at backup history, so I think I only get true native SQL backups in my query. I don't know if type 7 is also a legit backup in other situations.

    • This reply was modified 3 years, 7 months ago by  homebrew01.
    • This reply was modified 3 years, 7 months ago by  homebrew01.
    • This reply was modified 3 years, 7 months ago by  homebrew01.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply