Improving Performance In Spark Using Partitions


In this blog post we show how to optimize a Spark job by partitioning its data correctly. To demonstrate this we use the public College Scorecard dataset, which contains key data points for colleges across the United States. With this dataset we will compute the average student fees by state.
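Before digging into the partitioning details, here is a minimal sketch of the computation in Scala, assuming Spark 2.x or later. The file path is a placeholder, and the column names STABBR (state abbreviation) and TUITIONFEE_IN (in-state tuition and fees) are assumptions based on the public College Scorecard data dictionary, so substitute whichever fee column your download actually contains.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

object AverageFeesByState {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AverageFeesByState")
      .getOrCreate()

    // Load the College Scorecard CSV; the path is a placeholder.
    val scorecard = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/college-scorecard.csv")

    // Repartition on the grouping key so rows for the same state land in
    // the same partition; the groupBy below can then aggregate without a
    // second shuffle, since the data is already hash-partitioned on STABBR.
    val byState = scorecard.repartition(col("STABBR"))

    // TUITIONFEE_IN (in-state tuition and fees) stands in for "student fees".
    val avgFees = byState
      .groupBy("STABBR")
      .agg(avg("TUITIONFEE_IN").alias("avg_in_state_fee"))

    avgFees.show(51) // 50 states plus DC
    spark.stop()
  }
}
```

Because state is a low-cardinality key, it can also help to pass an explicit partition count, e.g. repartition(51, col("STABBR")), so Spark does not spread a handful of states across the default 200 shuffle partitions and leave most of them nearly empty.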


2019-04-12
