SQLServerCentral Article

Notebook Jobs in Azure Data Studio

,

Azure Data Studio (ADS) is a lightweight IDE built on Visual Studio Code. I've written a few articles on how the tool works, and this one continues the series. In this article, I want to look at the way Notebook jobs can be created in Azure Data Studio.

The other articles in this series on ADS works are listed in The Mastering Azure Data Studio Series.

You can download the tool and read those articles to learn about how to use this in writing code. I've gotten more comfortable with the tool, and while I still like SSMS, there are some features in ADS that I prefer.

SQL Agent and Notebook Jobs

In a previous article, we covered the SQL Agent management in ADS. That plugin really just mimics what SSMS has for managing jobs, albeit with a few bugs.

However, the Agent extension has a new feature: notebook jobs. If I select the SQL Agent section in the server overview (right-click a server, Manage), I see a Notebooks tab across the top. Selecting this shows me my notebook jobs.

Notebook job listing

I have none here, but let's see how this works.

Creating a Notebook Job

First, create a notebook. I've got a basic one with some of Glenn Berry's Diagnostic queries.

Notebook with queries

If I click "New Notebook Job", I see a blade on the right side.

new notebook blade

I've filled in a few things, but this contains the information about my notebook. Obviously you need a notebook first, and I've selected one I created previously. I also had to pick two databases: one for storage and one for execution. The queries in this notebook don't need a database specifically, so I've picked the same database for both. The name of the notebook gets added as the job name.

Notebook job blade filled in

I needed to pick a schedule, so I did that.

Choosing a schedule

Once this is done, I return to the blade. Before I click OK, let's look at my instance. Here is my dba database, which is where I picked to store the notebook and its execution results. Nothing really here.

Almost empty database

When I click OK to create the job, I see this in ADS. My job is listed, with the name I gave it.

I also see two new tables in my DBA database. One of these is to store the notebook template, and one stores the results from runs, which are copies of the notebooks with the cells executed.

new tables in my database

If I look at the tables, I see that the template table essentially maps a notebook (stored as nvarchar max) to a job and has the database and the last run error. The materialized table also has the run time information.

table structures

Running a Job

Obviously I set a schedule and the job will run then. However, if I right click the job in ADS, I can select run. I can also just start the job from SQL Agent (SSMS or sp_start_job).

Right click job to run

When I do this, I get a message in the lower right hand corner of ADS.

Status message

In a few seconds, I click refresh and I see more data about my job in the list. I see the last run time, next time, status, and the first bar of a graph that indicates successful runs.

Job contains last run information

If I run this multiple times, I'll see more bars. It looks like the max I see is the last 5 bars.

Bars of last 5 runs

If I click the job, I get details for this job. I can see the overview with some metadata (target, last/next run, schedule) and below that the last 5 runs.

Notebook job details

I can double click one of the runs and a notebook opens in ADS. You can see that I have execution results after the code cells. I see the datestamp in the first one.

notebook filled out with details

Below this, I see other results from the queries.

notebook query 3 results

If I look in the tables, I see my results. I see the same template, but now my multiple runs in the materialized table. I could always extract the notebook from this table if I wanted something more than the results of the last 5 runs.

table results of notebook runs.

I don't know how many runs I can see in ADS, but I ran this 12 times and got a new section (Past Runs) in the details for the job.

12 results in ADS

If I delete the job in ADS, the tables and the job definition go from SQL Server. I right click to delete.

delete job

No more tables.

the tables are missing

No more job.

Job deleted from instance

Be very careful here.

What's in the Job Definition?

If I look at the job in SSMS, it looks like any job. You can see the things I entered below, and the job is owned by me since I created it.

job properties

The Steps are more interesting. There is one, and it's a PowerShell step. There is no "notebook job" type you can choose.

job step metadata

Instead, ADS adds a bunch of PowerShell to run your notebook.

PoSh script to run the notebook

I won't detail what happens, but there are a few functions called to essentially run the notebook with  a loop through all the cells. I'm surprised this doesn't use Invoke-SqlNotebook, but what do I know.

The full script (for my job/instance) is here:

$TargetDatabase = "dba"
$JobId =  "$(ESCAPE_SQUOTE(JOBID))"
$StartTime = "$(ESCAPE_SQUOTE(STRTTM))"
$StartDate = "$(ESCAPE_SQUOTE(STRTDT))"
$JSONTable = "select * from notebooks.nb_template where job_id = $JobId"
$sqlResult = Invoke-Sqlcmd -Query $JSONTable -Database $TargetDatabase -MaxCharLength 2147483647
$FirstNotebookError = ""
function ParseTableToNotebookOutput {
    param (
        [System.Data.DataTable]
        $DataTable,
        [int]
        $CellExecutionCount
    )
    $TableHTMLText = "<table>"
    $TableSchemaFeilds = @()
    $TableHTMLText += "<tr>"
    foreach ($ColumnName in $DataTable.Columns) {
        $TableSchemaFeilds += @(@{name = $ColumnName.toString() })
        $TableHTMLText += "<th>" + $ColumnName.toString() + "</th>"
    }
    $TableHTMLText += "</tr>"
    $TableSchema = @{ }
    $TableSchema["fields"] = $TableSchemaFeilds
    $TableDataRows = @()
    foreach ($Row in $DataTable) {
        $TableDataRow = [ordered]@{ }
        $TableHTMLText += "<tr>"
        $i = 0
        foreach ($Cell in $Row.ItemArray) {
            $TableDataRow[$i.ToString()] = $Cell.toString()
            $TableHTMLText += "<td>" + $Cell.toString() + "</td>"
            $i++
        }
        $TableHTMLText += "</tr>"
        $TableDataRows += $TableDataRow
    }
    $TableDataResource = @{ }
    $TableDataResource["schema"] = $TableSchema
    $TableDataResource["data"] = $TableDataRows
    $TableData = @{ }
    $TableData["application/vnd.dataresource+json"] = $TableDataResource
    $TableData["text/html"] = $TableHTMLText
    $TableOutput = @{ }
    $TableOutput["output_type"] = "execute_result"
    $TableOutput["data"] = $TableData
    $TableOutput["metadata"] = @{ }
    $TableOutput["execution_count"] = $CellExecutionCount
    return $TableOutput
}
function ParseQueryErrorToNotebookOutput {
    param (
        $QueryError
    )
    $ErrorString = "Msg " + $QueryError.Exception.InnerException.Number +
    ", Level " + $QueryError.Exception.InnerException.Class +
    ", State " + $QueryError.Exception.InnerException.State +
    ", Line " + $QueryError.Exception.InnerException.LineNumber +
    "`r`n" + $QueryError.Exception.Message
    
    $ErrorOutput = @{ }
    $ErrorOutput["output_type"] = "error"
    $ErrorOutput["traceback"] = @()
    $ErrorOutput["evalue"] = $ErrorString
    return $ErrorOutput
}
function ParseStringToNotebookOutput {
    param (
        [System.String]
        $InputString
    )
    $StringOutputData = @{ }
    $StringOutputData["text/html"] = $InputString
    $StringOutput = @{ }
    $StringOutput["output_type"] = "display_data"
    $StringOutput["data"] = $StringOutputData
    $StringOutput["metadata"] = @{ }
    return $StringOutput
}
$TemplateNotebook = $sqlResult.notebook
$executeDatabase = $sqlResult.execute_database
try {
    $TemplateNotebookJsonObject = ConvertFrom-Json -InputObject $TemplateNotebook
}
catch {
    Throw $_.Exception
}
$DatabaseQueryHashTable = @{ }
$DatabaseQueryHashTable["Verbose"] = $true
$DatabaseQueryHashTable["ErrorVariable"] = "SqlQueryError"
$DatabaseQueryHashTable["OutputAs"] = "DataTables"
$DatabaseQueryHashTable["Database"] = $executeDatabase
$CellExcecutionCount = 1
foreach ($NotebookCell in $TemplateNotebookJsonObject.cells) {
    $NotebookCellOutputs = @()
    if ($NotebookCell.cell_type -eq "markdown" -or $NotebookCell.cell_type -eq "raw" -or $NotebookCell.source -eq "") {
        continue;
    }
    switch($NotebookCell.source.getType()){
        System.Object[] {
            $DatabaseQueryHashTable["Query"] = ($NotebookCell.source -join "`r`n" | Out-String)
        }
        String  {
            $DatabaseQueryHashTable["Query"] = $NotebookCell.source
        }
    }
    $SqlQueryExecutionTime = Measure-Command { $SqlQueryResult = @(Invoke-Sqlcmd @DatabaseQueryHashTable  4>&1) }
    $NotebookCell.execution_count = $CellExcecutionCount++
    $NotebookCellTableOutputs = @()
    if ($SqlQueryResult) {
        foreach ($SQLQueryResultElement in $SqlQueryResult) {
            switch ($SQLQueryResultElement.getType()) {
                System.Management.Automation.VerboseRecord {
                    $NotebookCellOutputs += ParseStringToNotebookOutput($SQLQueryResultElement.Message)
                }
                System.Data.DataTable {
                    $NotebookCellTableOutputs += ParseTableToNotebookOutput $SQLQueryResultElement  $CellExcecutionCount
                }
                Default { }
            }
        }
    }
    if ($SqlQueryError) {
        if(!$FirstNotebookError){
            $FirstNotebookError = $SqlQueryError.Exception.Message.Replace("'", "''")
        }
        $NotebookCellOutputs += ParseQueryErrorToNotebookOutput($SqlQueryError)
    }
    if ($SqlQueryExecutionTime) {
        $NotebookCellExcutionTimeString = "Total execution time: " + $SqlQueryExecutionTime.ToString("hh\:mm\:ss\.fff")
        $NotebookCellOutputs += ParseStringToNotebookOutput($NotebookCellExcutionTimeString)
    }
    $NotebookCellOutputs += $NotebookCellTableOutputs
    $NotebookCell.outputs = $NotebookCellOutputs
}
$result = ($TemplateNotebookJsonObject | ConvertTo-Json -Depth 100)
Write-Output $result
$result = $result.Replace("'","''")
$InsertQuery = "INSERT INTO notebooks.nb_materialized (job_id, run_time, run_date, notebook, notebook_error) VALUES ($JobID, '$StartTime', '$StartDate','$result','$FirstNotebookError')"
$SqlResult = Invoke-Sqlcmd -Query $InsertQuery -Database $TargetDatabase
$InsertQuery = "UPDATE notebooks.nb_template SET last_run_notebook_error = '$FirstNotebookError' where job_id = $JobID"
$SqlResult = Invoke-Sqlcmd -Query $InsertQuery -Database $TargetDatabase

The one thing I will note is that this script doesn't seem to work with named instances. If I run a notebook on a named instance in ADS, I see this (nothing opens if I click):

failed notebook run

If I go into SSMS and look at the history, I see this, a failure in line 6, which is the Invoke-SqlCmd step:

Failed job step details

I suspect this is failing as I don't have the same databases on this instance and the cmdlet is looking at the default instance. Or maybe something isn't escaped. I don't know that I want to debug this PoSh since anything I did to edit the job would still fail.

Summary

In this article, we looked at how to create, run, and manage notebook jobs in ADS. Notebooks are an interesting idea, and for a team, this is a nice way to get a bunch of queries executed, and then stored, with some documentation or explanation around them. I don't know how many runs you can keep, but I suspect enough to make this a nice tool for a team of DBAs.

The downside is that if you delete a job, it seems to delete history from your tables. If you delete the last job, your tables go away. That's a poor design, IMHO. Don't delete the tables or data. There also appear to be problems with named isntances.

This might be a good tool for you, but be careful, don't store stuff in msdb/master, and backup regularly.

Rate

5 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (2)

You rated this post out of 5. Change rating