Problems displaying this newsletter? View online.
SQL Server Central
Featured Contents
Question of the Day
 
The Voice of the DBA
 

Big Data or Small Data

I went to San Francisco for Small Data SF, a conference sponsored by MotherDuck. The premise of the event was that smaller sets of data are both very useful and prevalent. The manifesto speaks to me, as I am certainly a big fan of smaller sets of data. I also think that most of the time we can use less data than we think we need, especially if it's recent data: recent data is often more relevant, yet we end up with contorted queries that try to weight new and old data differently to reflect this. Maybe the best line for me is this one:

Bigger data has an opportunity cost: Time.

I think time is a very valuable commodity, and large sets of data can slow you down. There's also the chance that looking at too much data starts to blur the lines of understanding. We may start to miss information in our dataset, or we may find people arguing over what the data means, because with so much data we can find support for almost any position somewhere in the vast sea of numbers, strings, and dates.

Big data also has a real cost in resources, often money. One example came from the organizer, who once gave a demo on stage querying a PB of data. That's impressive, and lots of us would want to be able to query our very-large-but-less-than-PB-sized data in minutes. However, what wasn't disclosed in the demo was that the query cost over US$5,000.

I've heard from a number of customers and speakers that most people don't have big data. Most of us have working sets of data in the hundreds of GB, sometimes with TB-sized archives in the same database that slow everything down. If we could easily extract the useful data, we could query those hundreds of GB more efficiently.

This is especially true in the era of small devices that can handle something close to a TB of data in a small form factor. With some of the columnar systems that compress data, a TB of raw data might be substantially compressed in Parquet files or an analysis system like DuckDB. In that case, we might realistically search and analyze 1TB of data on a laptop.
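As a rough sketch of what that looks like in practice (a hypothetical example with made-up file and column names, using DuckDB's SQL dialect rather than T-SQL), a laptop-scale analysis can be a single query over compressed Parquet files:

-- Hypothetical sketch: aggregate compressed Parquet files in place with DuckDB.
-- The file pattern and column names are invented for illustration.
SELECT order_year,
       COUNT(*)        AS order_count,
       SUM(line_total) AS revenue
FROM read_parquet('sales_*.parquet')
GROUP BY order_year
ORDER BY order_year;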

I know that big data is relative, but many of us face challenges with data sizes and query performance. I know lots of you embrace the challenge and see working with TB (or larger) systems as a badge of honor. I also know the reality is that most of us struggle to separate our archive data from current working data in our systems. However, if we could, would most of you want to work with smaller data sets or do you enjoy large ones? I know which way I lean.

Steve Jones - SSC Editor

Join the debate, and respond to today's editorial on the forums

 
 
 Featured Contents
Stairway to Synapse Analytics

Level 4 of the Stairway to Synapse Analytics: Create and Use a Delta Lake

Sucharita Das from SQLServerCentral

Introduction: In Level 1 of this series, I discussed Synapse Analytics basics and the steps for creating the Synapse Workspace. In Level 2, we analyzed Data Lake files using the Serverless SQL Pool. In Level 3, we analyzed Data Lake files using the Spark Pool. In Levels 4 and 5, I will discuss the Delta […]

Technical Article

Why a Self-Join Requires Halloween Protection

Additional Articles from SQLServerCentral

I was asked recently why Halloween Protection was needed for data modification statements that include a self-join of the target table. This gives me a chance to explain, while also covering some interesting product bug history from the SQL Server 7 and 2000 days.
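For a bit of context (this is only a minimal sketch of the kind of statement in question, with hypothetical table and column names, not an example taken from the article), a self-joined update might look like this:

-- Illustrative only: the target table also appears as a source via a self-join,
-- which is where Halloween Protection (often an eager spool in the plan) comes in.
UPDATE e
SET    e.Salary = e.Salary * 1.10
FROM   dbo.Employees AS e
       JOIN dbo.Employees AS mgr
         ON mgr.EmployeeID = e.ManagerID
WHERE  mgr.Department = N'Sales';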

Technical Article

Save the date for PASS Summit 2025!

Additional Articles from PASS

PASS Data Community Summit will return to Seattle next year! Save the date for this incredible in-person event for global data professionals, which will take place at Summit, Seattle Convention Center, from November 17-21, 2025!

Blog Post

From the SQL Server Central Blogs - Ways to land data into Fabric OneLake

James Serra from James Serra's Blog

Microsoft Fabric is rapidly gaining popularity as a unified data platform, leveraging OneLake as its central data storage hub for all Fabric-integrated products. A variety of tools and methods...

Microsoft Power Platform Up and Running: Learn to Analyze Data, Create Solutions, Automate Processes, and Develop Virtual Agents with Low Code Programming

Site Owners from SQLServerCentral

The book begins with the basics, explaining what low-code and no-code are and showing how to maximize efficiency in creating business applications for one's organization. Next, the book describes the Microsoft Power Platform's foundation, as well as all its components and services. Readers start with practical exercises right away, beginning with provisioning a Power Platform environment. The book then delves deeper into Power Platform components such as Power Apps, Power Automate, Power BI, and Power Virtual Agents. Towards the end, the book walks through practical exercises for each feature or service, in which you will gradually build a small business solution for a fictitious organization, Project Wizards, Inc.

 

 Question of the Day

Today's question (by Steve Jones - SSC Editor):

 

Function Defaults

I have created this function in SQL Server 2022:
CREATE FUNCTION dbo.AddInt (@one INT, @two INT = 1) RETURNS INT
AS
BEGIN
    RETURN @one + @two
END
How can I call this and invoke the default value for @two?

Think you know the answer? Click here, and find out if you are right.

 

 

 Yesterday's Question of the Day (by Steve Jones - SSC Editor)

The Random Rows

If I run this code in SQL Server 2019 on Adventureworks, how many random values are returned?

SELECT TOP (10)
       be.BusinessEntityID
     , RAND (BusinessEntityID)
FROM Person.BusinessEntity AS be;

Answer: 10

Explanation: There are 10 different random values returned. This table has BusinessEntityID as the PK, so the value is different for every row. For RAND(), the same seed value returns the same result, but in this case each row supplies a different seed, so we get different RAND() values. Ref: RAND - https://learn.microsoft.com/en-us/sql/t-sql/functions/rand-transact-sql?view=sql-server-ver16
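If you want to see the seed behavior for yourself (a small sketch, not part of the original question), compare repeated and differing seeds:

-- The same explicit seed always returns the same value; a different seed gives a different one.
SELECT RAND(100) AS seed_100_a,
       RAND(100) AS seed_100_b,  -- same value as seed_100_a
       RAND(200) AS seed_200;    -- different value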

Discuss this question and answer on the forums

 

 

 

Database Pros Who Need Your Help

Here are a few of the new posts today on the forums. To see more, visit the forums.


SQL Server 2016 - Administration
Query is filling Tempdb - I have a query which is filling TEMPDB. I need suggestions on how to modify this query to minimize the TEMPDB space it fills: update #TempShow1 set Name = case when a2.ID is NULL then a1.name else a2.name end, Address = case when a2.ID is NULL then a1.Address else a2.Address end, State = case when […]
NC index creation duration question - Hey All, My setup is as follows: Table with Clustered Index on ID col. Size 13 GB.  200+ fields, many LOB cols. Single non-clustered index with key CaseNumber and 2 include columns. Size 200 MB. Server performance was good, normal, i.e. hardware doesn't explain the issue. Scenario: I attempted to add a new Non-clustered index […]
SQL Server 2016 - Development and T-SQL
Script is filling Tempdb - I have a query which is filling TEMPDB. I need suggestions on how to modify this query to minimize the TEMPDB space it fills: update #TempShow1 set Name = case when a2.ID is NULL then a1.name else a2.name end, Address = case when a2.ID is NULL then a1.Address else a2.Address end, State = case when […]
Trigger After Update to Update the Updated Table - Okay, I can easily see how to update a table field after an Insert, but do I use the system Inserted table when creating a trigger to update a field in the table that was just updated, and do I use AFTER UPDATE or FOR UPDATE? CREATE TRIGGER [dbo].[TG_ResolvedDate_I] ON [dbo].[Issues] AFTER UPDATE AS BEGIN SET […]
Group Delete - I have 2 tables from which I need to clean up old costkeys, based upon a driver table. The driver table is [CostKeyDeletes] and is used as input to the cstdetails cleanup. The costkeys vary in record counts, so I need a batchsize limit. I would like to see logging of the process showing which CostKey it is working […]
SQL Server 2012 - T-SQL
Group records during capture - I have a table which is being written to by another application. A few things of note: I can't alter the schema or upgrade the SQL version at this time. I need some help in a couple of areas. The first is to grab the last 15 minutes of information from this table. The […]
SQL Server 2019 - Administration
Copy a large table from one table to another - Hi, I have a large table with 75 columns and 1.1 billion rows. I want to know the fastest way to copy from one (non-partitioned) table to another (partitioned) table. Has anyone tried this? If yes, how long did it take to load the data? I tried an SSIS package with multiple threads, select-insert in batches, and OPTION MAXDOP. […]
Reporting Services
Migrating from SSRS 2008 (SP Integrated) to 2022 (Native) - Has anyone done a migration from SharePoint-integrated SSRS to native? I'm not finding any specific documentation for this use case. We were able to get the reports migrated via the RS utility and ssrs_migration script provided by Microsoft, but the subscriptions and report history didn't come across. From the Microsoft documentation it seems we're […]
Integration Services
Date data type not recognized with oledb type provider - Hello, we recently made the following observation working with VS2022 combined with the SSIS extension 'SQL Server Integration Services Projects 2022' (version 1.5). When creating a connection manager, the driver of choice is set to SQLOLEDB.1. With this type of connection string, the SQL datatype 'date' is not recognized and is set to WSTR(10), be […]
Suggestions
Peer review SQL Central articles - Hi, I would like to contribute to SQLServerCentral by doing peer reviews. As there are many articles that need to be reviewed on a daily basis, I assume this site needs some reviewers. May I know the process to become a reviewer and start contributing?
SQLServerCentral.com Website Issues
Topic marked as spam - by mistake? - I received the following notification a few hours ago. It seems legit but has been marked as spam – should it be resurrected? harrylune wrote: Do you have any advice on how to set up git repositories for the things indicated in the title? I'm particularly interested in hearing from anyone who have experience with […]
Working with Oracle
CPU Performance Advice in SQL Server 2005 - I am having problems with CPU performance on SQL Server 2005. CPU spikes frequently to 90-100% for no apparent reason. I have checked the running jobs and queries but found nothing unusual. The server has about 30 concurrent connections and the database is about 5GB. - Check long queries and optimize them. - Use SQL […]
Third Party Products
ESP Scheduling Tool - Has anyone ever used Broadcom's ESP Scheduling tool? I have questions regarding it and the use of password-protected SSIS packages if anyone can help.
SQL Server 2022 - Development
Getting blockage on Update Statement - Hi, I need your help. I have an SP that runs an UPDATE statement on a table. 2-3 processes are updating this table at the same time, and I'm getting frequent blocking. How can we prevent the blocking? Will using SET XACT_ABORT ON help? Or can setting the isolation level help? SET TRANSACTION ISOLATION LEVEL READ […]
Which selection criteria is more efficient. - Is it more efficient to search on an int than a nullable DateTime? I have a table where I can choose to search on an int field in the where clause or a nullable datetime field looking for nulls in the where clause. Both methods return the same resultset. While I am most likely nitpicking, […]
 

 

RSS Feed | Twitter

This email has been sent to {email}. To be removed from this list, please click here. If you have any problems leaving the list, please contact webmaster@sqlservercentral.com. This newsletter was sent to you because you signed up at SQLServerCentral.com.
©2019 Redgate Software Ltd, Newnham House, Cambridge Business Park, Cambridge, CB4 0WZ, United Kingdom. All rights reserved.
webmaster@sqlservercentral.com

 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -