Apache Spark

Technical Article

Improving Performance In Spark Using Partitions

  • DatabaseWeekly

This blog post shows how to optimize a Spark job by partitioning the data correctly. To demonstrate, we use the public College Score Card dataset, which contains several key data points for colleges across the United States, and compute the average student fees by state.
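To make the idea concrete, here is a minimal conceptual sketch in plain Python, not Spark code and not the post's own implementation. It assumes hypothetical field names (`state`, `fees`) and made-up sample values. It shows why partitioning by the grouping key helps: once every row for a state sits in the same partition, each partition can compute its averages independently, with no data exchanged between partitions.

```python
from collections import defaultdict
from zlib import crc32


def partition_by_key(rows, key, num_partitions):
    """Hash-partition rows so all rows sharing a key land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 gives a hash that is stable across runs, unlike built-in hash() on str
        index = crc32(row[key].encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def average_per_partition(partition, key, value):
    """Average `value` per `key` using only the rows inside one partition."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for row in partition:
        totals[row[key]][0] += row[value]
        totals[row[key]][1] += 1
    return {k: total / count for k, (total, count) in totals.items()}


# Hypothetical sample rows; the field names and figures are illustrative only.
rows = [
    {"state": "CA", "fees": 12000.0},
    {"state": "NY", "fees": 15000.0},
    {"state": "CA", "fees": 8000.0},
    {"state": "TX", "fees": 9000.0},
    {"state": "NY", "fees": 17000.0},
]

partitions = partition_by_key(rows, "state", num_partitions=3)

# Each partition is processed on its own; results are merged without conflicts
# because no state can appear in more than one partition.
averages = {}
for partition in partitions:
    averages.update(average_per_partition(partition, "state", "fees"))

print(averages)
```

In Spark itself, the analogous step would typically be a repartition of the DataFrame on the grouping column before aggregating; whether that pays off depends on data size and skew, so treat this as an illustration of the principle rather than a tuning recipe.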

2019-04-12
