| The Complete Weekly Roundup of SQL Server News | Hand-picked content to sharpen your professional edge |
| Representing uncertainty in difficult times As a data person, I'm appalled at the errors made by many journalists in reporting the spread of the Covid-19 pandemic. I'm aghast at the way that they can publish fantastic nonsense that would have caused apoplexy to any peer reviewer in a scientific journal. However, what worries me the most is how uncertain data, initially provided with caveats, is eventually rendered to the public as certainty. It brings me to a more general point. How should we, as responsible data people, represent uncertainty in data? Obviously, we need to track the course of any pandemic as accurately as possible. To do this we need whatever figures are available. Once a figure is recorded, however, it takes on an undeserved aura of truth. It could have been a rough estimate, or a best effort in the light of a blaze of contradictory evidence. Most sources of the data for the incidence of covid-19 are collated and published with caveats and warnings for the unwary. They are intended for a scientific audience that understands the constraints and uncertainties. When this data reaches the press, the temptation is to compare them in different regions or countries, and use the results to further almost any political cause; a temptation that is seldom resisted. The data was usually published as a daily accumulation, from zero, to the total. When figures are uncertain, this makes sense because, if you find, in retrospect, that certain facts weren't known at the time, you merely bump the total. Easy: but if this happens, the intermediate figures are invalidated. Why wouldn't deaths, for example, be certain? The major problem is attributing the cause of death. It might seem obvious, but the great majority of these deaths occur with people having other severe underlying medical conditions, or who are in general frail health. This results in a huge variation in reporting: Belgium, for example, counts all coronavirus deaths outside hospitals in its daily statistics: deaths in care homes account for 53% of the total. Belgium's official toll also includes people suspected of having died of coronavirus, without a confirmed diagnosis. Other countries count only confirmed diagnoses, even though a post-mortem could be delayed by days. Deaths in care homes tend to be under-reported in Europe, and ignored in other countries, sometimes for political or economic reasons. What about the figures for 'confirmed cases'? A minefield according to the EU's European Centre for Disease Prevention and Control. If reported deaths are so unreliable, why not just track the course of the pandemic via the excess mortality over and above the normal? The problem is accuracy. The normal variance in the death rate is too great to be useful. The scale of deaths due to the Covid-19 pandemic are within the normal variance. Certainty is elusive. How should one represent uncertainty? We used to have a clever convention where if the data was "sketchy", the graph was in sketch form, indicating the uncertainty. Some graphing packages still allow this. We should, of course, report the variance of normally-distributed data, but data like this has a high level of bias and inherent noise. This needs to be represented, because decisions made based on bad data are likely to be bad decisions. Phil Factor Join the debate, and respond to the editorial on the forums |
The Weekly News | All the headlines and interesting SQL Server information that we've collected over the past week, and sometimes even a few repeats if we think they fit. |
Vendors/3rd Party Products |
For most application developers, it’s unthinkable to work without version control. So why is the database any different? This blog lays out 6 clear benefits to applying version control best practice to your database alongside your application. |
In the latest version of SQL Change Automation, we enable you to be more flexible in your approach to Database DevOps and combine elements of both state- and migrations- based approaches. Learn more about the new workflow. |
Learn how to use SQL Clone together with Git hooks and SQL Change Automation to branch your database in Git as easily as your code. By enabling multiple copies of the same database to exist on a single instance, the integration of these tools allows seamless database branch switching. |
Develop your skills, and get support and advice from peers and members of the Redgate Advocate team. |
AI/Machine Learning/Cognitive Services |
In this post, I create the simplest possible classifier, called ZeroR, to show that even this classifier can achieve surprisingly high values for accuracy (i.e. the ratio of correctly predicted instances)… and why this is not necessarily a good thing. |
What if I have a sprawling SQL Server estate and other folks want a report on the current state of patching? |
The harm caused by attempting to shrink tempdb files. |
Analysis Services / BI on the MS Stack |
This article from Imran Quadri Syed walks the through steps involved to implement a successful Data Quality Monitoring framework that would validate incoming data against predefined data quality rules and capture data quality metrics. |
How to give Agent jobs, such as to build a database, meaningful names in Azure DevOps. |
Azure SQL Database provides the Performance overview and Query Performance insight pages, but if you want to use tools you’re familiar with, like wait statistics, DMVs and so on, you can; they just need a few tweaks. |
Azure SQL Database is a feature rich database platform that offers a wide range of options for compute and scale. Tim Radney explains the different tiers. |
Azure SQL Managed Instance |
Use SSMS without using a VNET by enabling a public endpoint for your managed instance. |
Conferences, Classes, Events, and Webinars |
Extended Events open up the whole world of special metrics and alerting available to you within SQL Monitor. Join Microsoft MVP Grant Fritchey to discover how to extend what’s possible. |
Redgate's Chris Unwin and Chris Kerswell host Coeo’s CEO Justin Langford and Principal Consultant, Andy Jones to talk database DevOps and the benefits to your organization. |
Redgate bring you new free weekly courses from the Community circle presented by Data Platform MVPs. |
In this short post, I’ll highlight two common scenarios that lead to diagonal x-axis labels—long category names on bar charts and long date labels on line graphs—and a couple ideas to try instead. |
Figuring out how you want to analyze and report rank data can be tricky. Will you tally up which choices earned respondent’s #1 rank? Top 3? Will you weight the choices in some way? |
Database Design, Theory and Development |
In concept modeling you will be defining your business vocabulary. It’s fundamental to your business knowledge blueprint. |
ETL/SSIS/Azure Data Factory/Biml |
Migrating SSIS Packages in Azure can be a complex task. SSMS 18 comes with new features for helping us to start configuring SSIS packages to run in Azure. |
When you’re working in ADF, you can schedule the pipeline to run on a schedule and you have the option to create a new trigger or select an existing one, but they have meaningless names, so how do you know which one you want? Daniel Janik fixes this. |
HA/DR/Always On/Clustering |
When first introduced the Availability Group feature was Enterprise only. However SQL Server 2016 introduced a basic version of it to Standard edition, and there have been a few enhancements since then. |
Robert Sheldon discuses emerging trends in storage like virtual SANs, intelligent storage, computational storage and storage-class memory. |
Performance Tuning SQL Server |
Depending on the size of the table, and the query you’re tuning, sometimes it’s reasonable to have a non-clustered index on a table with the same key as the table’s clustered index. |
Aaron Bertrand creates an extended events session to track checkpoint duration for databases that may benefit from changing to indirect checkpoints |
A 'hidden cost' of overindexing, which can impact transaction log throughput. |
PowerPivot/PowerQuery/PowerBI |
How to use a Power Automate custom connector in a flow. |
Power Query is notorious for being case sensitive. Even its language is case sensitive. Often though you get data from users where they are using different cases for the same data. |
Paul Turley provides a few Power BI reports on the spread of Covid-19, showing all countries with the percent of change in non-recovered cases in the past three days. |
Power BI formulas to create a "Profit and Loss" statements |
The magic of PowerQuery enables us to reuse scalar value functions within PowerQuery, and Query Folding is supported. |
This article describes how to use conditional formatting with a DAX expression to color the rows of a table in Power BI based on the order number instead of using the alternate rows color formatting option. |
All the most needed packages for data science, statistical usage and every-day usage with R. |
When you’re doing DUI operations against tables with millions of rows, you have to be really careful about SQL Server’s estimates. |
Values blocks are a really useful little bit of code in SQL Server. Basically, they are a block of defined values that you can use pretty much like any other data set. |
Grant Fritchey runs a few tests to find out... |
It seems that SQL Server only caches Parameters when it does something non-trivial with them. |
When the optimizer doesn’t estimate the correct amount of memory for a query, either memory is wasted that could be used for other processes or some operations will spill to disk. Microsoft has added Memory Grant Feedback to help overcome this issue. In this article, Greg Larsen explains what you need to know about this new feature. |
If you update a row without actually changing its contents, does it still hurt? |
“What if we just used NVARCHAR(4000) as the default datatype parameter for any query, regardless of what datatype the table has in it – like numbers or dates?” It actually works pretty well in most situations, believe it or not. |
Kendra Little tackles Query Tuning, why we need to do it, who needs to do it, the tools we need and more. |
On using Query Store to track usage of queries over time. |
Virtualization and Containers/Kubernetes |
When I shift working locations I work on my Macbook, and I still need access to a SQL Server instance. Docker is a great solution to allow me to run SQL Server containers on my Macbook without having to set up anything. |
This email has been sent to {email}. To be removed from this list, please click here. If you have any problems leaving the list, please contact the webmaster@sqlservercentral.com. This newsletter was sent to you because you signed up at SQLServerCentral.com. Note: This is not the SQLServerCentral.com daily newsletter list, and unsubscribing to this newsletter will not stop you receiving the SQL Server Central daily newsletters. If you want to be removed from that list, you can follow the instructions on the daily newsletter. |
|
|