SQL Server Central
Featured Contents
Question of the Day
Featured Script
The Voice of the DBA
 

Separate Reads

Recently I was watching a presentation on how to scale performance in your SQL Server environment and one of the suggestions was setting up Availability Groups (AGs) and having read-intent connections that would query the secondary and not the primary. It's not a bad idea, and the SQL Native Client (and other drivers) support this and make it easy to implement.

The pattern of using multiple connections in an application, one for reads and one for writes, has been suggested often. However, in practice, I've rarely seen this work. Apparently having a connection variable, named dbConn, for writes and a second one, named dbConnReadOnly, for reads is too complex for most developers or teams.
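
For anyone who hasn't tried the pattern, here is a minimal sketch in Python with pyodbc, assuming an Availability Group listener and read-only routing are already in place; the server, database, and table names are placeholders, not from any real environment.

import pyodbc

# Writes go to the primary replica through the AG listener (placeholder names).
db_conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=agListener;DATABASE=Sales;Trusted_Connection=yes;"
)

# Reads declare read intent, so read-only routing can send them to a readable secondary.
db_conn_read_only = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=agListener;DATABASE=Sales;Trusted_Connection=yes;"
    "ApplicationIntent=ReadOnly;"
)

# The calling code only has to pick the right variable.
cursor = db_conn.cursor()
cursor.execute("UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = ?", 42)
db_conn.commit()

cursor = db_conn_read_only.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.Orders WHERE Status = 'Shipped'")
print(cursor.fetchone()[0])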

Or maybe the idea of having to pick the right access point is a human problem? I've seen no shortage of problems in restaurants when we have specific "in" and "out" doors. Lots of people go through the wrong one and we end up with plates and food on the floor. Even broken noses or fingers at times. Perhaps I shouldn't pick on software developers too much.

How many of you use two connections from your apps? That is, do you think about reads and writes as separate connections? Even if you read and write from the same database, this can be a nice practice that future-proofs your code. It's a small change, but it gives you room to grow if you later add a read replica for analytics or reporting.

Of course, you could take this too far, with different connections for different "services", aiming for a microservice-style architecture. We could have dbUser for user work and dbOrders for the business side, plus other connections for other services. I wouldn't do that, as I think many of us would get confused, and we'd often be doing two different types of service work in the same code. If I need something from a customer to write an order, do I have two connections in my method? I could, but I bet lots of developers would try to re-use a single one.

Plus, if developers get into trouble with two connections, what will they do with five or more? There are plenty of ORMs that might not even support this, or, if they do, don't make it easy to code.

I've always liked the idea of separating reads and writes, but maybe the better solution is using one connection whenever we have simple CRUD work and another one for any sort of complex querying or reporting. That would make more sense as I suspect many of us will eventually offload reporting or analytics in some way to another system. A Delta Lakehouse of some sort seems likely if the current trend takes hold in more organizations.
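
If CRUD and reporting did split that way, the routing could be as simple as a small helper that hands back a connection for the kind of work at hand. A rough sketch with made-up server and database names, assuming the reporting copy (replica, warehouse, or lakehouse SQL endpoint) is reachable under its own address:

import pyodbc

# Placeholder connection strings: one OLTP database for simple CRUD,
# one reporting copy for heavier querying.
_CONN_STRINGS = {
    "crud": "DRIVER={ODBC Driver 17 for SQL Server};SERVER=oltpServer;DATABASE=Sales;Trusted_Connection=yes;",
    "reporting": "DRIVER={ODBC Driver 17 for SQL Server};SERVER=reportServer;DATABASE=SalesReporting;Trusted_Connection=yes;",
}

def get_connection(workload="crud"):
    """Return a connection for the given workload: 'crud' or 'reporting'."""
    return pyodbc.connect(_CONN_STRINGS[workload])

# CRUD work stays on the OLTP connection...
app_conn = get_connection("crud")
# ...while anything report-shaped goes elsewhere, so it can be moved
# to another system later without touching the calling code.
report_conn = get_connection("reporting")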

Let me know today if you use (or have tried) different connections for reads and writes.

Steve Jones - SSC Editor

Join the debate, and respond to today's editorial on the forums

 
 Featured Contents
SQLServerCentral Article

Ingesting Data From REST API endpoints: Data Engineering with Fabric

John Miner from SQLServerCentral

In this article on Fabric we will examine how to get data from a REST API.

External Article

Relationships with Microsoft Fabric GraphQL API

Additional Articles from MSSQLTips.com

Learn how to join tables together when using the GraphQL API with Microsoft Fabric in this step-by-step article.

From the SQL Server Central Blogs - The query processor ran out of internal resources and could not produce a query plan – When You’ve Managed to Confuse SQL With a Crazy IN Clause.

david.fowler 42596 from SQL Undercover

Recently I received a cry for help over Teams. The issue was that an application was throwing up the following SQL error, The query processor ran out of internal...

Blog Post

From the SQL Server Central Blogs - Extending a SQL Server Failover Cluster Across Regions in Google Cloud Platform (GCP)

david.bermingham from Clustering for Mere Mortals

I was the principal author of this SIOS whitepaper, which describes how to build a 2-node SQL Server cluster in Google Cloud Platform (GCP) spanning multiple zones. Today, I’ll...

Practical Database Auditing for Microsoft SQL Server and Azure SQL: Troubleshooting, Regulatory Compliance, and Governance

Site Owners from SQLServerCentral

Know how to track changes and key events in your SQL Server databases in support of application troubleshooting, regulatory compliance, and governance. This book shows how to use key features in SQL Server, such as SQL Server Audit and Extended Events, to track schema changes, permission changes, and changes to your data. You'll even learn how to track queries run against specific tables in a database.

 

 Question of the Day

Today's question (by Steve Jones - SSC Editor):

 

The Hash Join I

What are the two inputs to a hash join operation in SQL Server called? (choose 2)

Think you know the answer? Click here, and find out if you are right.

 

 

 Yesterday's Question of the Day (by Steve Jones - SSC Editor)

Comparing Images

I am building an ETL process between these tables in a SQL Server 2022 database set to compatibility level 160:

CREATE TABLE Image_Staging
( imageid     INT NOT NULL CONSTRAINT Image_StagingPK PRIMARY KEY
, imagestatus TINYINT
, imagebinary IMAGE);
GO

CREATE TABLE Images
( imageid       INT NOT NULL CONSTRAINT ImagesPK PRIMARY KEY
, imagestatus   TINYINT
, imagemodified DATETIME
, imagebinary   IMAGE);
GO

I want to run this query to check whether the images being loaded already exist. This will help me decide whether I need to insert or update an image. What happens with this query?

SELECT i.imageid
FROM
  dbo.Image_Staging AS ist
  INNER JOIN dbo.Images AS i
    ON ist.imagebinary = i.imagebinary;

Answer: I get this error: Msg 402, Level 16, State 1, Line 19

Explanation: The ntext, text, and image data types cannot be compared or joined directly. Ref: Joins - https://learn.microsoft.com/en-us/sql/relational-databases/performance/joins?view=sql-server-ver16
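
If you hit this in your own ETL work, one workaround (not part of the question itself) is to convert both sides to varbinary(max) before comparing. A rough sketch run through pyodbc, with a placeholder connection string; the CAST-based join is standard T-SQL, but the surrounding Python is illustrative only:

import pyodbc

# Placeholder connection string, not from the original question.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your_server;DATABASE=your_db;Trusted_Connection=yes;"
)

# Casting the deprecated image columns to varbinary(max) makes the
# equality comparison, and therefore the join, legal.
sql = """
SELECT i.imageid
FROM dbo.Image_Staging AS ist
    INNER JOIN dbo.Images AS i
        ON CAST(ist.imagebinary AS varbinary(max)) = CAST(i.imagebinary AS varbinary(max));
"""

for row in conn.cursor().execute(sql):
    print(row.imageid)

On large images, hashing both sides first (for example with HASHBYTES over the varbinary(max) values) is usually cheaper than comparing the full binary contents.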

Discuss this question and answer on the forums

 

Featured Script

Real-Time SQL Server to BigQuery Streaming ETL using CDC

rahul.sarabu from SQLServerCentral

CDC Changes: The script queries the CDC tables in SQL Server to retrieve the changes (inserts, updates, deletes) since the last sync. Each change is processed with a mapped operation type (INSERT, UPDATE, DELETE).
Real-Time Streaming to BigQuery: The captured changes are streamed directly to BigQuery using its real-time insert_rows_json method, avoiding the need for batch uploads via Google Cloud Storage.
Tracking Last Sync Time: The script tracks the last synchronization time and updates it after every successful sync, ensuring no data is missed.
Low Latency: By continuously querying the CDC tables and streaming the changes, the script achieves near real-time data synchronization.

import time
import pyodbc
from google.cloud import bigquery
from datetime import datetime, timedelta

# SQL Server Connection Configuration
sql_conn_str = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=your_server_name;DATABASE=your_db_name;UID=your_username;PWD=your_password"

# BigQuery Configuration
bq_client = bigquery.Client()
bq_table_id = "your_project_id.your_dataset.your_table"

# BigQuery schema for the target table (for reference; insert_rows_json does not take a schema)
schema = [
    bigquery.SchemaField("id", "INTEGER"),
    bigquery.SchemaField("column1", "STRING"),
    bigquery.SchemaField("column2", "STRING"),
    bigquery.SchemaField("operation_type", "STRING"),  # To track inserts, updates, deletes
    bigquery.SchemaField("timestamp", "TIMESTAMP")
]

# Step 1: Retrieve changes from CDC in SQL Server
def fetch_cdc_changes(last_sync_time):
    conn = pyodbc.connect(sql_conn_str)
    cursor = conn.cursor()

    # Query the CDC table for changes since the last sync (inserts/updates/deletes).
    # The datetime is mapped to an LSN so it can be compared against __$start_lsn.
    query = """
        SELECT __$operation, id, column1, column2, __$start_lsn, __$seqval
        FROM cdc.your_table_CT
        WHERE __$start_lsn > sys.fn_cdc_map_time_to_lsn('largest less than or equal', ?)
        ORDER BY __$start_lsn ASC
    """
    cursor.execute(query, last_sync_time)
    changes = cursor.fetchall()

    cursor.close()
    conn.close()

    return changes

# Step 2: Stream data to BigQuery
def stream_to_bigquery(rows):
    errors = bq_client.insert_rows_json(bq_table_id, rows)  # Stream rows to BigQuery
    if errors:
        print(f"Errors occurred: {errors}")
    else:
        print(f"Successfully streamed {len(rows)} rows to BigQuery.")

# Helper: Map CDC __$operation codes to human-readable operations
# (1 = delete, 2 = insert, 3 = update before-image, 4 = update after-image)
def map_operation(op_code):
    operation_map = {1: 'DELETE', 2: 'INSERT', 3: 'UPDATE_BEFORE', 4: 'UPDATE'}
    return operation_map.get(op_code, 'UNKNOWN')

# Main ETL Process
def real_time_etl(last_sync_time):
    # Fetch changes from SQL Server
    changes = fetch_cdc_changes(last_sync_time)
    rows_to_insert = []

    for change in changes:
        operation_type = map_operation(change[0])  # Map CDC operation codes
        if operation_type == 'UPDATE_BEFORE':
            continue  # Skip the pre-update image; only stream the post-update row
        rows_to_insert.append({
            "id": change[1],
            "column1": change[2],
            "column2": change[3],
            "operation_type": operation_type,
            "timestamp": datetime.utcnow().isoformat()  # ISO string, so the row is JSON-serializable
        })

    if rows_to_insert:
        # Stream the changes to BigQuery in real time
        stream_to_bigquery(rows_to_insert)

if __name__ == "__main__":
    # Step 3: Track the last synchronization time
    # For the first run, initialize last_sync_time with a past date or store it externally (e.g., in a database)
    last_sync_time = datetime.utcnow() - timedelta(minutes=5)  # Example: sync changes from the last 5 minutes

    # Real-time ETL process
    while True:
        real_time_etl(last_sync_time)
        # Update last_sync_time after each successful sync (could be saved externally)
        last_sync_time = datetime.utcnow()

        # Sleep for a short interval to approximate near-real-time processing (e.g., every minute)
        time.sleep(60)

More »

 

Database Pros Who Need Your Help

Here are a few of the new posts today on the forums. To see more, visit the forums.


SQL Server 2017 - Administration
Manual Distribution Database Cleanup - Hi All. I have a reasonably straightforward transactional replication setup. I had two databases on Server a, that are replicated to server b for reporting purposes. All was going well until Halloween (yeah, I know) when one of the databases encountered file corruption. The database remained running but the backups failed, it truly was my […]
SQL Server 2017 - Development
Deduplicating rows by choosing the row with the shortest string - Hello T-SQL experts I have a table containing team codes and descriptions. Unfortunately, many of the team codes are duplicated, often with different descriptions. I want to deduplicate the table (in T-SQL) by selecting the shortest description (in number of characters) for each team code. How do I do this in a non-manual fashion? I […]
SQL Server 2016 - Development and T-SQL
query help - I need some help to optimize this query as it runs in many threads in an over night, and it pegs CPU. I can't change schema or field attributes as that is not in my control. The attvalue is a multipurpose field and believed to be some of my bottle neck. I have 32gig of […]
SQL Server 2019 - Development
Error converting data type varchar to numeric - issue resolved.
Two queries or one? - The setup: 3 tables: core_users, core_roles, and core_user_roles.  core users is the "base" table, core roles is a "look up" table, and core_user_roles contains 1-to-many foreign key references.  To fix an oversight from when this was originally coded... the job spec says we need to insert any missing roles for a user if that user […]
Best Clustered Index Configuration for Partitioned Multi-Tenant Table with RLS - I’m working with a SQL Server database that supports a multi-tenant application. We have row-level security based on a TENANTID column, which restricts data access by tenant. Here’s the setup: All tables include an identity column named ID that is currently the first key in the clustered index (defined as (ID, TENANTID)). The tables are […]
Error loading multiple CSV files in SSIS - Hi everyone I have a bunch of CSV files that I need to bulk insert into a table.  I saw this video on YouTube and implemented it.  It works for some files but not for others. Basically, the approach from the video is to use a multi flat file connection manager to handle loading multiple […]
Reporting Services
Does SSRS Report with Stored Procedure execute SQL string? - I declare @Where based on the input parameter in the stored procedure. Set @SQL = 'Select * from Table1 ' + @where EXECUTE(@SQL) I created a dataset in SSRS to run the stored procedure. It returned no fields. Does SSRS Report with Stored Procedure execute SQL string?  Thanks.  
Integration Services
SSIS Package Consuming Excessive Disk Space on Local Drive (C:) - Hello everyone, I’ve developed an SSIS process to compare two relatively large tables. When I run the package from Visual Studio, the processing takes up space on my local PC's hard drive and continues until the drive is full (C:). Does anyone know why this happens? Thanks in advance for your feedback.
Building Pivot from SSIS with growing list of Month Columns - Good afternoon, I have an SSIS package which loads raw records into a sheet of an Excel file. I now need a second sheet with a Pivot (for these raw records) where the columns are of Months. But obviously with time, the list of months will continuously grow. My understanding is that for purposes of […]
Unable to Pivot data in MS Excel for the (Output) file of SSIS package - Good Morning, I have built an SSIS package which produces Excel File Output. In the course of the Run, the package itself makes a copy of the Template Excel file before populating it with the new data. One of the fields is called 'Count' (it is actually stored as a Varchar in the SQL Table). […]
SQL Server 2022 - Administration
SQL Server Constrained Delegation - Hi, hoping someone can help. We're in the process of migrating to a new SQL instance and bulk load / insert from SMB share isn't working in SMSS or via SQL agent job I'm reasonably confident its Kerberos delegation as I see ANONYMOUS in the file server audit log with constrained delegation and when I […]
Unencrypted connections in Always on Availability Group - Hi, In my Always On Availability environment, I am seeing two encrypt_option values as FALSE in the DMV sys.dm_exec_connections. This is causing issues in the database vulnerability scan. Please note that an SSL certificate is already applied, and the Force Encryption option is set to "Yes." How can I resolve this issue?  
SQL Server 2022 - Development
Get Sum and Last month sum - I want to get the sum and I want to get the sum of only last month for VenNum by FiYr and DocDte.  I am having trouble adding the sum of only the previous month. Select count(*) as cnt, T1.FiYr, T1.DocDte, T1.VenNum, Sum(T1.DocCurrAmt) as DocCurrAmt From WalgreensCnlySapPaidHistory.[dbo].[CnlySapRawPaidHistory] T1 Where DocTypeDesc Like '%Scan Base%' and T1.VenNum […]
Closest to ProcDate - Not sure I have this 2nd left join correct. I need to grab the closest #Deptvalues.DeptDate and #Depvalues.Department within 1 day (before or after) of #ProcValues.ProcDate for each row. Each PatID and ProcDate combination in the #ProcValues table should look at the #Deptvalues table and create this connection. The desired outcome is enclosed. Appreciate any […]
 

 

RSS Feed | Twitter

This email has been sent to {email}. To be removed from this list, please click here. If you have any problems leaving the list, please contact webmaster@sqlservercentral.com. This newsletter was sent to you because you signed up at SQLServerCentral.com.
©2019 Redgate Software Ltd, Newnham House, Cambridge Business Park, Cambridge, CB4 0WZ, United Kingdom. All rights reserved.
webmaster@sqlservercentral.com

 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -