Database Weekly for {date}

Problems displaying this newsletter? View online.

The Complete Weekly Roundup of SQL Server News

Hand-picked content to sharpen your professional edge

Editorial

Using Big Data to Improve Health

One of the frustrations of medical research is in getting good representative data. It should, you might suppose, be easy: you just scoop up all the medical records into a huge medical database, anonymize the data, and then just sharpen up your Python skills. Sigh; nope.

The problem with the main 'big-data' medical sets such as the IBM Watson/Truven database is that they largely represent only people with medical insurance. This doesn't always cause problems, provided all conclusions bear in mind that it is a skewed population. This is why the medical records of a country such as the UK are potentially so valuable for research: they represent an entire population. With the help of a database that provides information on this scale, we can do a great deal more to advance medical and pharmaceutical science.

What, if anything, stands in our way? The first problem is that such records can't reliably be pseudonymized or 'de-identified'. If you are researching individuals, and you know quite a bit about them, there are likely to be a couple of unusual characteristics that will allow you to match the records to the individual. Researchers have shown that it can result in a match rate of over 90%.

The second problem is a regulatory problem. Medical records in Europe are not owned by the entity that collects or stores the data but to the patient. Although few people will disagree with the intention of allowing their records to be used, many will refuse consent because there are doubts about security. If there is a breach, you can't change your medical history as you would your password. The GDPR's guidance is that confidential patient information can only be used by hospital and university researchers, medical royal colleges and pharmaceutical companies researching new treatments.

The third problem is the poor general understanding of the constraints of statistics. Statistical methods should come with the same warnings as a chain-saw. We still haven't reversed out of the statistical nightmares that presented a false '40%' conclusion about the value of statins in reducing cholesterol, certain types of which we now discover we need in spadesful for good health, unlike statins.

The fourth problem is that there is no central database for medical records. It is difficult for this to happen because of the poor "interoperability" of the data, and the inability of the many health information systems to work together to join up the many separate, and sometimes warring 'care settings' making up the NHS (National Health Service).

A fifth problem is bad data; by which I mean poor data quality, completeness and accuracy. There have been many reports of potentially incorrect codes being used to record illnesses and treatments, as well as missing or invalid identifiers, such as NHS numbers.

It is typical of projects of this sort that few, if any, problems involve database technology. We have all the analysis tools we need. The tasks we face are mainly organizational and they'll take time to resolve.

Phil Factor

Join the debate, and respond to the editorial on the forums

The Weekly News

All the headlines and interesting SQL Server information that we've collected over the past week, and sometimes even a few repeats if we think they fit.

Vendors/3rd Party Products

Learn the fundamentals of Redgate Products

Redgate invite you to look around their newly refurbished University. If you are new to their tools there are simple, easy to follow courses to get you up and running, and if you are a long-time customer you can check out the recently added courses which take you through some more technical tips, tricks and troubleshooting.

SQL Change Automation 4.0: Collaborative Database Development Across Visual Studio and SQL Server Management Studio

SQL Change Automation's Development component for developing new databases and modifying existing databases, using migrations, now integrates directly into SQL Server Management Studio as well Visual Studio. It allows teams to collaborate effectively during development, regardless of their preferred IDE, and in a way that integrates easily with common build/integration servers and release management tools.

DevOps Self-Service for Databases with SQL Clone

Phil Factor describes the freedom of being able to "self-serve" databases, during testing and development, and explains how it works with SQL Clone.

You just Build It don’t you? 12 Common Database Build Blockers

Database deployments, like the sheep of exasperated hill-farmers, often find strange and unexpected ways to self-destruct. Phil Factor describes the most common things that can go wrong, and how a reliable automated database build process can prevent messy accidents.

Are you making the most of your Redgate Tools?

Whether you are a long time Redgate customer or have recently made a purchase you can find helpful and easy to follow video courses on Redgate University. With the brand-new look and the recently added courses you will soon be getting the most from your Redgate tools.

Redgate need users looking to do CI/CD for Oracle Databases

Are you working with Oracle databases or know somebody that is? Does a company or client that you are working with also have an Oracle counterpart? If so, we want to talk to you/them about how they are currently working. We are currently running an Early Access Program (EAP) for a migrations-based solution for Oracle users. Sign up today, get the beta release and let us know your feedback!

Administration

DBA Training Plan 22: Where to Learn More

Pick your specialty, pick your learning method...

DBA Training Plan 21: Building a New SQL Server

Brent Ozar's not-so-secret SQL Server setup checklist.

SQL Server 2019: WAIT_ON_SYNC_STATISTICS_REFRESH Redux

If you’re up in the billion row range, automatic stats creation and updates might just start to hurt. On SQL Server 2019, you’ll have some evidence for when refreshes take a long time, but still nothing for when the initial creation takes a long time.

Azure Databricks and Spark

Adventures in the TCP stack: Uncovering performance regressions in the TCP SACKs vulnerability fixes

Applying Linux patches for the TCP SACKs vulnerabilities caused some regressions when running the Databricks Runtime (DBR) on the Amazon Web Services (AWS) platform. This article describes the analysis that determined the TCP stack was the source of the degradation.

Azure SQL Database

Azure SQL Database Types

A quick summary post of the various different types of Azure SQL Database.

Backup and Recovery

Accelerated Database Recovery

Accelerated Database Recovery for Azure SQL Databases and SQL Server 2019 will potentially cut the recovery time for canceled/rolled-back transactions and the time spent on database recovery.

Different ways to start a SQL Server single user mode

On the various ways to start SQL Server in single user mode, for example when you need to restore a master database or other system databases from backup.

Computing in the Cloud (Azure, Google, AWS)

Building Storage for SQL Server (and other database) Virtual Machines in the Cloud

Having seen a few improperly configured VMs lately, Joey D'Antoni explains how storage works in Azure, versus the way we traditionally did things on-premises.

Conferences, Classes, Events, and Webinars

Successfully Migrate Your Databases to Azure

OnWednesday October 2 16.00-17.00 BST/ 10.00-11.00 Central join John Sterrett, CEO of Procure SQL, to learn the difference between Azure SQL Database, SQL Managed Instances, Elastic Pools, and SQL Virtual Machines.

SQL in the City Summits

Redgate is hosting an upcoming series of exclusive Summits, in cities worldwide, to help senior data professionals deliver value quicker, while keeping their data safe. Find out more here and register for your nearest event

DMO/SMO/Powershell

Making the Power BI PowerShell cmdlets easier to use

Aaron Nelson describes one of his favorite ways to take control and make PowerShell easier to work with: Argument Completers.

PowerShell 7 Preview 4

New features and bug fixes in the latest preview of PowerShell 7, currently targeting December 2019 for a Release Candidate and January 2020 for General Availability.

How to Use Parameters in PowerShell

Every DBA should have basic PowerShell skills. In this article, Greg Moore explains how to write a PowerShell script that takes parameters.

ETL/SSIS/Azure Data Factory/Biml

ADF Data Flows: Generate models from drifted columns

When you build transformations that need to handle changing source schemas, your logic becomes tricky. This article discusses the possible approaches, their pros and cons.

Using Project Connections In SSIS

Tim Mitchell explains the basics of how project connections in SSIS can reduce the number of distinct connection managers in a single project.

Overcome Challenges of Lifting and Shifting SSIS to Azure (webinar)

Andy Leonard demonstrates the challenges of scripting SSIS Catalog Environments, Environment Variables, References, Reference Mappings, and Literals for deployment to another SSIS Catalog.

General

SQLSaturday #911 - Orlando 2019

On Sept 28, 2019, a free day of SQL Server training in Orlando, FL, US.

SQLSaturday #906 - São Paulo 2019

On Sept 28, 2019, a free day of SQL Server training in Sao Paulo, Brazil.

SQLSaturday #868 - Kharkiv 2019

On Sept 28, 2019, a free day of SQL Server training in Kharkiv, Ukraine.

SQLSaturday #904 - Madrid 2019

Sept 28, 2019, a free day of SQL Server training in Madrid, Spain.

SQLSaturday #914 - Torun 2019

Sept 28, 2019, a free day of SQL Server training in Torun, Poland.

Hardware

Mid-Range 1440P AMD Gaming PC Build Guide

Glenn Berry's build guide for a Mid-Range 1440P AMD Gaming PC ($750.00 to $1500.00 for the main components).

SQL Server Unequal NUMA CPU engagement

With physical servers using NUMA configuration - or vms using vNUMA - a number of factors can lead to uneven CPU utilization among the NUMA nodes.

MDX/DAX

Strong and weak relationships in Power BI

This article describes what weak relationships are and the differences between strong and weak relationship in Power BI and DAX.

Performance Tuning SQL Server

How to Find Top N Largest Time Gaps in Date Ranges in SQL Server with T-SQL Code

A T-SQL tool to help application programmers analyze application script performance issues based on their debug log.

How Forced Parameterization in SQL Server Affects Filtered Indexes

Forced parameterization is a valuable feature in SQL Server, reducing plan cache bloat by not introducing a new plan for every single combination of predicate literals. However, if you use filtered indexes, those indexes could be rendered useless.

SQL Server Statistics Health Reminder

A reminder that keeping your statistics up to date is one of the more critical components of maintaining good SQL Server query performance.

PowerPivot/PowerQuery/PowerBI

Compare Power BI files with Power BI Comparer tool

A Power BI Comparer-tool makes it super easy to compare all properties of 2 Power BI files.

Power BI Custom Format String Examples, Part 2: Dates And Times

Chris Webb explains some useful date and time formats that aren’t covered well by the examples built into Power BI Desktop, and highlights a few things that aren’t possible right now.

Power BI Desktop Dashboard for IIS Log Analysis

When you need to explain to a manager, or product stakeholders what the IIS web server logs say, you need visuals, or a dashboard, not command line output. Enter Power BI.

Power BI Refresh Failed – Trouble shooting

Some Power BI datasets have been successfully refreshed but there have also been some inconsistent refresh failures.

Some Methods for Calculating Quarter in DAX for Power BI

There is no DAX Quarter function in Power BI, but Reza Rad explains some easy methods for calculating it.

Preventing & Managing Refresh Failures in Power BI

How to prevent a refresh failure caused by duplicates appearing in a lookup table after the report has been built, the model has been loaded to PowerBI.com and the scheduled refresh has been set up using a gateway.

Only Get the Last Few Periods of Data into Power BI using Power Query Filtering

Power BI reports can get slow if you have huge amount of data. One of the ways to speed up the performance is to load only the part that is needed for analysis.

Reporting Services

Tuning SQL Server Reporting Services

This article will cover many areas that database administrators need to be aware of in order to properly license, recover, and tune a Reporting Services installation. These topics apply to both SQL Server Reporting Services as well as Power BI Report Server.

Security News and Issues

Banks, Arbitrary Password Restrictions and Why They Don't Matter

Tory Hunt states the case against arbitrary password restrictions on banks such as short max lengths and disallowed characters, but also why they might not matter.

T-SQL

SQL Server Stored Procedure Context Switching and Impersonation Example

Context switching refers to modifying the context in which code operates from one user to another. The same code can function differently across user contexts because database role memberships and permissions may differ between users.

Version Store Usage for ONLINE Column Operations

On investigating the cause of inexplicable version store growth, Josh the Coder discovers that when you update the values in the column being altered, during an online column operation, the tempdb version store is used to store the new values until the operation completes.

What’s Faster? COUNT(*) or COUNT(1)?

One of the biggest and undead myths in SQL is that COUNT(*) is faster than COUNT(1). Or was it that COUNT(1) is faster than COUNT(*)? Impossible to remember, because there’s really no reason at all why one should be faster than the other. But is the myth justified? Let’s measure!

A Bug With OBJECTPROPERTYEX And Column Store Indexes

A bug when you use OBJECTPROPERTYEX to get the cardinality of a clustered columnstore index.

The Hidden Modal Window Behind SSMS

Suddenly, the SSMS window will not register any mouse-clicks although it will seemingly look fine. The reason is because there is a Modal Window somewhere hidden.

Calculating invoice settlements with window functions

A fresh take at computing the time-to-payment on a large number of invoices, with multiple, overlapping, partial payments.

How to import JSON file into a SQL recordset with OPENJSON

Is it possible to read a simple JSON file using T-SQL? It certainly is.

Tools for Development

Automated blue/green database deployments

Bob Walker explains some techniques to make blue/green deployments with a database achievable and straightforward.

Virtualization and Containers/Kubernetes

Using kubectl logs to read the SQL Server Error Log in Kubernetes

When working with SQL Server running containers the Error Log is written to standard out. Kubernetes will expose that information to you via kubectl. Let’s check out how it works.

Non-Root SQL Server 2019 Containers

SQL Server 2019 containers are now safer, because they start the SQL Server process as a non-root user, by default.

Running SQL Server containers as non-root

The new SQL Server dockerfile, in the mssql-server-linux-non-root directory of the mssql-docker repository on Github, allows SQL Server containers to run as non-root. Andrew Pruski explains why this is a good thing.

This email has been sent to {email}. To be removed from this list, please click here. If you have any problems leaving the list, please contact the webmaster@sqlservercentral.com. This newsletter was sent to you because you signed up at SQLServerCentral.com. Note: This is not the SQLServerCentral.com daily newsletter list, and unsubscribing to this newsletter will not stop you receiving the SQL Server Central daily newsletters. If you want to be removed from that list, you can follow the instructions on the daily newsletter.