Analyzing Breached Data

A few of you out there might be data scientists who profile data regularly. Probably a fair number of you do import/export work and learn to check data values, perhaps with counts, distincts, or other aggregates. I don't know if the performance tuners out there look at the skew of data or the details of what is in a query that needs improvement. However, all of you are likely familiar with data and trying to query it for some type of meaning.

One of the largest data breaches occurred with National Public Data. Troy Hunt analyzed the breach as a part of his work with Have I Been Pwned. The piece is an interesting analysis of the data, trying to determine both its legitimacy and what is actually included in the breach. It's a fascinating read, and I encourage you to look at it not just from the data analysis side, but also to be aware of what data about you is being aggregated and sold by companies.

The read is interesting as it is a bit of a detective story, digging through data in a folder, which is something I've had to do. I've had people in previous jobs just dump a bunch of data on me and ask me to load it into a database, or a table, often without them knowing what type of data it is. What formats are the files in? Do the files relate to each other? Are there multiple tables' worth of data in a file? These are all questions I've had to ask myself (and answer), and they are similar to what Troy did to analyze the breach.
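As a rough illustration of that first look at an unknown file, here is a minimal Python sketch that guesses a delimiter and then counts rows, non-empty values, and distinct values per column. It is not how Troy did his analysis. The function name, the candidate delimiters, the sample data, and the assumption that the first row holds column names are all mine.

```python
import csv
import io


def profile_delimited_text(text, candidates=",;\t|"):
    """Guess the delimiter, then count rows, non-empty values, and distinct values per column."""
    # csv.Sniffer inspects a sample of the text and picks one of the candidate delimiters.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=candidates)
    reader = csv.reader(io.StringIO(text), dialect)
    header = next(reader)  # assumption: the first row holds column names
    non_empty = {name: 0 for name in header}
    distinct = {name: set() for name in header}
    row_count = 0
    for row in reader:
        row_count += 1
        for name, value in zip(header, row):
            if value != "":
                non_empty[name] += 1
                distinct[name].add(value)
    return {
        "delimiter": dialect.delimiter,
        "rows": row_count,
        "columns": {
            name: {"non_empty": non_empty[name], "distinct": len(distinct[name])}
            for name in header
        },
    }


# Hypothetical sample standing in for a file someone dumped on you.
sample = "id|name|state\n1|Ann|CO\n2|Bob|CO\n3||TX\n"
profile = profile_delimited_text(sample)
print(profile)
```

Comparing the non-empty and distinct counts to the row count is a quick way to spot candidate keys, mostly empty columns, and low-cardinality codes before deciding how a file should be loaded.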

Data is very important to many of us, in different ways, but I'm often amazed at how few people actually understand how to organize data and ensure others can track the metadata about their data (what their data represents). I'm guessing this is why every person who gets an extract of data to load into Excel formats it in a different way.

In many cases, people want the ability to query data, but they prefer to just focus on one table that contains a lot of information. They don't want to know how to "join" data together. I think this might be the reason we see so many views in databases, and why we have views built on views. Each new client of the database needs their own view structure.
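To make the "one table" preference concrete, here is a small sketch of a view that hides a join from the person querying it. It uses Python's built-in sqlite3 module as a stand-in for a real database server, and the table, column, and view names are invented for the example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a real server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);

    -- The view does the join once, so clients can treat it as a single table.
    CREATE VIEW customer_orders AS
    SELECT c.name, o.order_id, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id;
    """
)

# A client query that never mentions a join.
rows = conn.execute(
    "SELECT name, SUM(amount) FROM customer_orders GROUP BY name ORDER BY name"
).fetchall()
print(rows)
conn.close()
```

The client only ever sees customer_orders. The cost is that each new client tends to want a slightly different shape, which is how views end up stacked on other views.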

The world of data is a mess, even inside an organization. Once we start moving data between organizations, it's truly a mess. We might bemoan all the inefficiencies and the work we do to move, change, and reload data as custom, human ETL machines, but there is one great thing about this tangled web: it provides steady, secure jobs for many of us, with no end of work in sight.
