Problems displaying this newsletter? View online.
Database Weekly
The Complete Weekly Roundup of SQL Server News by SQLServerCentral.com
Hand-picked content to sharpen your professional edge
Editorial
 

Analyzing Breached Data

A few of you out there might be data scientists who profile data regularly. Probably a fair number of you do import/export work and learn to check data values, perhaps with counts, distincts, or other aggregates. I don't know if the performance tuners out there look at the skew of data or the details of what is in a query that needs improvement. However, all of you are likely familiar with data and trying to query it for some type of meaning.

One of the largest data breaches occurred with National Public Data. Troy Hunt analyzed the breach as a part of his work with haveIbeenpwned. The piece is an interesting analysis of the data, trying to determine both it's legitimacy as well as what is actually included in the breach. It's a fascinating read and I encourage you to look at it not just from the data analysis side, but also to be aware of what data about you is being aggregated and sold by companies.

The read is interesting as it is a bit of a detective story, digging through data in a folder, which is something I've had to do. I've had people in previous jobs just dump a bunch of data on me and ask me to load it into a database. Or a table. Often without them knowing what type of data it is, what formats, do files relate to each other? Are there multiple tables worth of data in a file? All questions I've had to ask myself (and answer), and similar to what Troy did to analyze the breach.

Data is very important to many of us, in different ways, but I'm often amazed at how few people actually understand how to organize data and ensure others can track the metadata about their data (what their data represents). I'm guessing this is why every person that gets an extract of data to load into Excel formats it in different ways.

In many cases, people want the ability to query data, but they prefer to just focus on one table that contains a lot of information. They don't want to know how to "join" data together. I think this might be the reason we see so many views in databases, and why we have views built on views. Each new client of the database needs their own view structure.

The world of data is a mess, even inside an organization. Once we start moving data between organizations, it's truly a mess. We might bemoan all the inefficiencies and work we do to move, change, and re-load data as custom, human ETL machines, but there is one great thing about this tangled web. It provides for steady, secure jobs for many of us with no end of work in sight.

Steve Jones - SSC Editor

Join the debate, and respond to the editorial on the forums

 
The Weekly News
All the headlines and interesting SQL Server information that we've collected over the past week, and sometimes even a few repeats if we think they fit.
Vendors/3rd Party Products

Connecting Easily to Data Containers from Database Tools and IDEs

Data containers can be created, destroyed and reset remarkably quickly, making them very effective for test-driven database development. Each time we reset a data container, it will have the same host, but both the port and password will be different. This article demonstrates how we can use simple PowerShell automation to smooth the process of dealing with ephemeral connection details in our IDEs and command line tasks.

AI/Machine Learning/Cognitive Services

PowerShell and AI: Create a Microsoft Word Copilot

From IT Pro - Microsoft Windows Information, Solutions, Tools

Explore a PowerShell script that allows users to interact with a Word document using natural language queries, inspired by Microsoft’s Copilot.

From Train-Test to Cross-Validation: Advancing Your Model’s Evaluation

Many beginners will initially rely on the train-test method to evaluate their models. This method is straightforward and seems to give a clear indication of how well a model performs on unseen data. However, this approach can often lead to an incomplete understanding of a model’s capabilities. In this blog, we’ll discuss why it’s important to go beyond the basic train-test split and how cross-validation can offer a more thorough evaluation of model performance.

Ask a Data Ethicist: How Can We Set Realistic Expectations About AI?

From Dataversity

One of the most important questions about using AI responsibly has very little to do with data, models, or anything technical. It has to do with the power of...

Taxonomy of Generative AI Misuse

From Schneier on Security

Interesting paper: “Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data“: Generative, multimodal artificial intelligence (GenAI) offers transformative potential across industries, but its misuse poses significant risks....

Administration of SQL Server

Introducing sp_CheckBackup: Check Your SQL Server Backups for Recoverability Issues

From StraightPath Solutions SQL Blog

Whether you are backing up your SQL Server databases with Ola Hallengren’s scripts, Maintenance Plans, or some third-party software, your backups are your lifeline for recovering your data in...

Observing Extended Events

From Scary DBA (Grant Fritchey)

Sure, you can right click on a running session for Extended Events and open the Live Data window, but are there other ways of observing what Extended Events is...

Azure Databricks, Spark and Snowflake

Navigating Databricks Compute Options for Cost-Effective and High-Performance Solutions

You can use Databricks for a vast range of applications these days. From handling streaming datasets, running Deep Learning models to populating data model fact tables with complex transformations, choosing the correct compute option can seem a lot like a stab in the dark followed by (potentially expensive) trial end error.

Some Great Queries Using the Query History

Now that we know so much about getting data from Snowflake using the query history, we should be able to get all kinds of data that will be helpful for us. In the next few queries, I want to provide some queries I use that should make your Snowflake usage a bit more useful.

Change Data Capture Pipeline Automation in Snowflake

From Simple Talk

This article offers technical guidance on automating updates and tracking changes of data examples using Snowflake’s CDC capabilities. The examples will be using healthcare type data, but the basic...

Azure SQL

Simplify development with Dev Container templates for Azure SQL Database

From Azure SQL

What are Dev Containers? A development container essentially packages up your project's development environment using the Development Container Specification (devcontainer.json). This specification enriches your container with metadata and content necessary...

Azure SQL Managed Instance

Using MI Link to transfer CLR binaries from SQL Server to Azure SQL Managed Instance

From MSDN Blogs

Previous posts discussed what CLR is, how we can import 3rd party DLLs and how we can use CLR to invoke REST APIs directly from Azure SQL MI. Today, we will touch upon another pain point that we’ve observed – transferring CLR assemblies from on-prem. to cloud; and we will do that by creating a brand-new MI link.

Native Windows principals for SQL Managed Instance | Data Exposed

From Azure SQL

This capability simplifies the migration to SQL Managed Instance and unblock the migration of legacy applications that are tied to windows logins. This feature plays a vital role for...

Cloud - AWS

Download AWS RDS SQL Server Backup

From MSSQL Tips

Learn how to create a backup of an AWS RDS for SQL...

Conferences, Classes, Events, and Webinars

PASS Summit Futures Scholarship

In celebration of their 25th anniversary in 2024, Redgate - host of PASS Summit - is thrilled to introduce the PASS Summit Futures Scholarship. This initiative aims to empower the next generation of data professionals from diverse backgrounds. Applications are open internationally to students and early-career professionals. Each of the 10 winners will be awarded one complimentary 3-day ticket to PASS Summit 2024 and US$2000 payment towards flights, accommodation and expenses. Applicants should ensure they can comply with the full Terms and Conditions before applying, and the deadline to submit applications is August 23, 2024.

PASS Summit Scholarships? Awesome!

From Simple Talk

This year, PASS is introducing scholarships to support individuals who might not typically have the opportunity to attend the PASS Data Community Summit. These scholarships are available to two...

Join Me At Data Saturday Dallas Sept 6-7

From Erik Darling Data

Spring Training This September, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning for Data Saturday Dallas. All attendees will get free access for...

Atlanta Microsoft BI Group Meeting on September 3rd (Create Code Copilots with Large Language Models)

From Prologika (Teo Lachev's Weblog)

Atlanta BI fans, please join us in person for the next meeting on Monday, September 3th at 6:30 PM ET. Your humble correspondent will show you how to use...

DMO/SMO/Powershell

PowerShell: When to use Add-Type and New-Object ?

In this blog post, I will discuss when you need to add classes with Add-Type and when you can provide classes with New-Object. I would say, let’s get started.

Feedback on dbatools Start-DbaMigration

From Callihan Data

It’s time to follow up on a post from a few months ago where I said I’d give my thoughts on Start-DbaMigration from dbatools after picking up more experience....

Handle Long PowerShell Scripts With Background Jobs

From IT Pro - Microsoft Windows Information, Solutions, Tools

PowerShell expert Brien Posey demonstrates a technique for managing long-running scripts.

Data Privacy, Compliance, and Governance

Texas Sues GM for Collecting Driving Data without Consent

From Schneier on Security

Texas is suing General Motors for collecting driver data without consent and then selling it to insurance companies: From CNN: In car models from 2015 and later, the Detroit-based car manufacturer...

4 Ways Embedded BI Improves Data Governance

From Dataversity

We live in a data-driven culture, which means that as a business leader, you probably have more data than you know what to do with. To gain control over...

Database Design, Theory and Development

T-SQL Tuesday 177: Managing database code

From Eitan Blumin

This month’s #tsql2sday is hosted by Malathi, a.k.a Mala, a.k.a diligentdba (b), and asks us about how we manage our database… Read More »T-SQL Tuesday 177: Managing database code The post T-SQL Tuesday...

DevOps and Continuous Delivery (CI/CD)

Why Every Project Needs a CI/CD Pipeline No Matter How Small

Have you ever experienced a scenario as a developer when a seemingly simple weekend bug fix throws your entire project into disarray? You push the changes, excited to resolve the issue, only to be met with a cascade of errors and a frustrated team come Monday morning. This scenario, unfortunately, is a common problem for projects that rely on manual deployments. But there’s a way to automate these processes, ensuring smoother releases, fewer errors, and a more efficient development workflow. The hack is it to use CI/CD (continuous integration/continuous delivery) pipelines to solve this problem and achieve the desired outcome.

DevOps for SQL databases

From Richard Swinbank

In August 2024's T-SQL Tuesday, Mala Mahadevan asks us to reflect on how we manage database code. Thanks for hosting, Mala!

Excel

Setup Excel as Front End Application for SQL Server

From MSSQL Tips

In this article, we walk through how to set up a simple Excel interface to work with SQL Server data to insert, update, delete, and select data.

Hardware

The AMD Ryzen 9 9950X and Ryzen 9 9900X Review: Flagship Zen 5 Soars - and Stalls

From AnAndTech

Earlier this month, AMD launched the first two desktop CPUs using their latest Zen 5 microarchitecture: the Ryzen 7 9700X and the Ryzen 5 9600X. As part of the new...

Microsoft Fabric ( Azure Synapse Analytics, OneLake, ADLS, Data Science)

Fabric Capacity Limits

From Prologika (Teo Lachev's Weblog)

Here is table that is getting more and more difficult to find as searching for Fabric capacity limits returns results about CU compute units (for the most part meaningless...

Fabric Direct Lake: Memory Utilization with Interactive Operations

From Prologika (Teo Lachev's Weblog)

As I mentioned in my Power BI and Fabric Capacities: Thinking Outside the Box, memory limits of Fabric capacities could be rather restrictive for large semantic models with imported...

Update that can help you create your own Microsoft Fabric environment

From Kevin Chant

 

Oracle/PostgreSQL/MySQL/other RDBMS

MySQL vs PostgreSQL: Which Open-Source Database is right for you?

When I joined a growing startup company as a backend developer, we were at crossroads choosing between MySQL and PostgreSQL for our backend. Our team was divided: some favored MySQL for its speed and simplicity, while others leaned towards PostgreSQL for its advanced features and robustness.

MySQL vs PostgreSQL: Which Open-Source Database is right for you?

From Simple Talk

When I joined a growing startup company as a backend developer, we were at crossroads choosing between MySQL and PostgreSQL for our backend. Our team was divided: some favored...

Performance Tuning SQL Server

Five Reasons Why Your SQL Server Is Slow Right Now

From Erik Darling Data

Five Reasons Why Your SQL Server Is Slow Right Now Thanks for watching! Going Further If this is the kind of SQL Server stuff you love learning about, you’ll...

A Little About Out Of Date Statistics In SQL Server

From Erik Darling Data

A Little About Out Of Date Statistics In SQL Server Thanks for watching! Going Further If this is the kind of SQL Server stuff you love learning about, you’ll...

PostgreSQL

Steps to Change a PostegreSQL Column Datatype when Included in a View

From MSSQL Tips

In this article, we look at the steps to change a ...

As a SQL Server DBA, Postgres Backups Surprised Me.

From Brent Ozar Unlimited

Over the last week, I’ve been working on putting together a Postgres version of the Stack Overflow database, just like the SQL Server one that I’ve distributed for almost...

Andrei Lepikhov: Probing indexes to survive data skew in Postgres

From Planet Postgres

This is the story of an unexpected challenge I encountered and a tiny but fearless response to address the Postgres optimiser underestimations caused by a data skew, miss in statistics or inconsistency between statistics...

PowerPivot/PowerQuery/PowerBI

(Livestream Replay) The Power of SVGs in Power BI - Kerry Kolosko

From Havens Consulting

DESCRIPTION Many of us learn through action an...

Product Reviews and Articles

Book Review: Deciphering Data Architectures

From The Bit Bucket

I had some clear time this morning so I read a recent book called Deciphering Data Architectures (Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data...

Data Container Revisions, Resets and Graduations

From Product learning – Redgate Software

Redgate Test Data Manager allows developers to save each new version of a database as a data container revision. After making local development changes to the container, or running...

Product Upgrades and Releases

SQL Server Diagnostic Information Queries for August 2024

From Glenn Berry

Introduction These are my SQL Server Diagnostic Information Queries for August 2024, aka my DMV Diagnostic Queries. They allow you to get a very comprehensive view of the configuration...

SQL Server 2019 Cumulative Update 28

From Glenn Berry

On August 1, 2024, Microsoft released SQL Server 2019 Cumulative Update 28. This is Build 15.0.4385.2. By Microsoft’s count, there are 12 public fixes or improvements in this CU,...

General Availability: Maintenance window support for Azure SQL Database Hyperscale named replica.

From Azure SQL

We are excited to announce that Hyperscale named replicas now support configuring a specific maintenance windows. You can now choose from predefined time slots for maintenance and setup alerts to be...

Python

PySpark: The flexibility of a loop

From Simple Talk

Pyspark has many flexible syntaxes which are not so common to other languages. One of these syntaxes is the loop format. Loops in PySpark can be used to build...

R Language

Mastering String Concatenation in R: A Comprehensive Guide

String concatenation is a fundamental operation in data manipulation and cleaning. If you are working in R, mastering string concatenation will significantly enhance your data processing capabilities. This blog post will cover different ways to concatenate strings using base R, the stringr, stringi, and glue packages.

SQL Server Security and Auditing

Making It Easy To Tell When Data Changed With Change Data Capture In SQL Server

From Erik Darling Data

Making It Easy To Tell When Data Changed With Change Data Capture In SQL Server Thanks for watching! Going Further If this is the kind of SQL Server stuff...

Security News and Issues

NIST Releases First Post-Quantum Encryption Algorithms

From Schneier on Security

From the Federal Register: After three rounds of evaluation and analysis, NIST selected four algorithms it will standardize as a result of the PQC Standardization Process. The public-key encapsulation mechanism...

T-SQL and Query Languages

Compare SQL Server XACT_STATE vs @@TRANCOUNT Functions

From MSSQL Tips

This article will compare the SQL Server XACT_STATE and @@TRANCOUNT functions and illustrate the differences with examples. The goal is to reduce the confusion between these two functions, allowing...

Query Exercise Answer: What Makes SELECT TOP 1 or SELECT MAX Different?

From Brent Ozar Unlimited

This Query Exercise was very different: I didn’t ask you to solve a particular problem. I pointed out that I’ve heard advice that SELECT MAX is faster than SELECT...

Don't Mix with Datetime

Microsoft encourages us not to use the datetime data type:

Tech News

Open Source Doesn't Mean a Trademark Free-For-All

From IT Pro - Microsoft Windows Information, Solutions, Tools

Open-source licenses give you rights to copy, and in some cases, they define patent rights and responsibilities, but when they mention trademarks at all, most of them exclude rights...

Google’s AI Search Gives Sites Dire Choice: Share Data or Die

From IT Pro - Microsoft Windows Information, Solutions, Tools

Publishers say blocking the company's AI bot could also prevent their sites from showing up in search.

Future-Proofing Your Data Strategy: A Deep Dive into Data Engineering Trends

From Dataversity

Businesses today rely heavily on data and strive to harness its full potential while navigating a constantly changing landscape. The secret is to combine smart analytics with a strong...

US Considers a Rare Antitrust Move: Breaking Up Google

From IT Pro - Microsoft Windows Information, Solutions, Tools

This action would mark Washington's first attempt to break up a company for illegal monopolization since the failed bid to dismantle Microsoft two decades ago.

4 Ways to Reduce Reliance on Bad Open Source Packages

From IT Pro - Microsoft Windows Information, Solutions, Tools

Follow these four steps to proactively identify and eliminate risks from problematic open source packages.

The Lighter Side

CodeSOD: Stored Procedures are Better

From Daily WTF

We all know that building SQL queries via string c...

 
RSS FeedTwitter
This email has been sent to {email}. To be removed from this list, please click here. If you have any problems leaving the list, please contact the webmaster@sqlservercentral.com. This newsletter was sent to you because you signed up at SQLServerCentral.com. Note: This is not the SQLServerCentral.com daily newsletter list, and unsubscribing to this newsletter will not stop you receiving the SQL Server Central daily newsletters. If you want to be removed from that list, you can follow the instructions on the daily newsletter.
©2019 Redgate Software Ltd, Newnham House, Cambridge Business Park, Cambridge, CB4 0WZ, United Kingdom. All rights reserved.
webmaster@sqlservercentral.com

 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -