I went to San Francisco for Small Data SF, a conference sponsored by MotherDuck. The premise of the event was that smaller sets of data are both very useful and prevalent. The manifesto speaks to me, as I am a big fan of smaller data sets. I also think that most of the time we can use less data than we think we need, especially recent data, which is often the most relevant; instead, we end up with contorted queries that try to weight new and old data differently to reflect this. Maybe the best line for me is this one:
Bigger data has an opportunity cost: Time.
I think time is a very valuable commodity and large sets of data can slow you down. There's also the chance that looking at too much data starts to blur our understanding. We may start to miss information in our dataset, or we may find people arguing about what the data means, because with so much data we can find support for any position somewhere in the vast sea of numbers, strings, and dates.
Big data also has a real cost in resources, often money. One of the examples was from the organizer, who once gave a demo on stage, querying a PB of data. That's impressive, and lots of us would want to be able to query our very-large-but-less-than-PB-sized data in minutes. However, what wasn't disclosed in the demo was that the query cost over US$5,000.
I've heard from a number of customers and speakers that most people don't have big data. Most of us have working sets in the hundreds of GB, sometimes with TB-sized archives in the same database that slow everything down. If we could easily extract the useful data, we could query those hundreds of GB much more efficiently.
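As a rough sketch of what that separation might look like, here's a minimal DuckDB example that carves a recent working set out of a larger table. The table name, column name, and 90-day cutoff are all placeholders, not anything shown at the conference.

```python
import duckdb

# Hypothetical schema: a large "events" table with a created_at timestamp.
con = duckdb.connect("analytics.duckdb")

# Keep the last 90 days in a separate working table so everyday queries
# don't have to scan years of archive rows. The names and cutoff are
# illustrative only.
con.execute("""
    CREATE OR REPLACE TABLE events_working AS
    SELECT *
    FROM events
    WHERE created_at >= now() - INTERVAL 90 DAY
""")

# Routine queries hit the small table; the full archive stays available
# for the occasional historical question.
print(con.sql("SELECT count(*) FROM events_working").fetchone())
```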
This is especially true in the era of small devices that can handle something close to a TB of data in a small form factor. With columnar systems that compress data, a TB of raw data might shrink substantially when stored as Parquet files or loaded into an analysis system like DuckDB. In that case, we might realistically search and analyze 1TB of data on a laptop.
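For a concrete sense of the workflow, here's a small sketch of writing data out as compressed Parquet and querying it with DuckDB on a laptop. The file and column names are assumptions for illustration, not a specific dataset from the talk.

```python
import duckdb

con = duckdb.connect()

# Columnar Parquet with zstd compression often shrinks raw data dramatically,
# which is what makes the "1 TB on a laptop" idea plausible.
con.execute("""
    COPY (SELECT * FROM 'raw_events.csv')
    TO 'events.parquet' (FORMAT PARQUET, COMPRESSION ZSTD)
""")

# DuckDB reads only the columns a query touches, so an aggregation over the
# Parquet file stays fast even on modest hardware.
result = con.sql("""
    SELECT date_trunc('day', created_at) AS day, count(*) AS events
    FROM 'events.parquet'
    GROUP BY day
    ORDER BY day
""")
print(result.fetchall())
```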
I know that big data is relative, but many of us face challenges with data sizes and query performance. I know lots of you embrace the challenge and see working with TB (or larger) systems as a badge of honor. I also know the reality is that most of us struggle to separate our archive data from current working data in our systems. However, if we could, would most of you want to work with smaller data sets or do you enjoy large ones? I know which way I lean.