Aggregation over multiple columns

Question

Post reply

Aggregation over multiple columns

technicx101

Valued Member

Points: 65
More actions
November 16, 2008 at 7:03 pm

#193864

Hi,
I have a table with 33 columns of Skill types e.g. Skill1, Skill2, ..etc. A particular skill, say, "C#" can be found in any of the columns. Basically this table captures employee skill details. I would like to have a query that will list down the entire list of skills along with count of each type across all employees. i.e. "How many resources do I have with C# skills? How many do I have with C++ skills and so on.
I am a newbie to SQL and still groping the dark. Any help will be greatly appreciated
John

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Bevan Keighley SSCommitted Points: 1825 More actions · Answer 1

Hi John,

Ideally you need to redesign so that the skills , and SkillsPerPerson are stored in separate tables...

Person

--------

PersonID

PersonName....

Skills

-----

SkillID

SkillName....

PersonSkills

----------

PersonID

SkillID

Designing the tables in this way will mean the database design will not need to be updated when a new skill comes along. It will also make query design VERY easy. This is best practice and is called Normalisation.

But, as I dont know your environment I may be jumping to conclusions. Here is one way to solve your problem using your current design:

Table design based on your information and some dummy data....

create table #Skills (

EmployeeID int identity(1,1),

skillA varchar(128),

skillB varchar(128),

skillC varchar(128),

skillD varchar(128) )

truncate table #skills

insert #Skills(skillA, skillB, skillC, skillD)

select 'C#', 'VB', null, null

union select 'SQL', 'AJAX', 'VB', null

union select 'C++', 'C#', 'VB', null

union select 'ASP', 'ASP.NET', 'C++', 'Javascript'

And the query....

with cteSkills(Skill)

as

(select distinct skillA as Skill from #skills

union

select distinct skillB as Skill from #skills

union

select distinct skillC as Skill from #skills

union

select distinct skillD as Skill from #skills)

select Skill , count(Skill) as SkilledPeople

from cteSkills inner join #skills on (Skill=SkillA) or (Skill=SkillB) or (Skill=SkillC) or (Skill=SkillD)

where Skill is not null group by Skill

order by count(Skill) desc

You will need to add more columns to the query for each database column.

B

sgmunson SSC Guru Points: 110651 More actions · Answer 2

You might want to look up UNPIVOT in BOL, which could be used similarly to the following:

;WITH CTE_SKILLS AS (

SELECT DISTINCT EMPLOYEE_ID, SKILL

FROM SKILL_TABLE

UNPIVOT(SKILL FOR SKILL_TYPE IN (

SKILL1,SKILL2,SKILL3,SKILL4,SKILL5,SKILL6,SKILL7,SKILL8,

SKILL9,SKILL10,SKILL11,SKILL12,SKILL13,SKILL14,SKILL15,SKILL16,

SKILL17,SKILL18,SKILL19,SKILL20,SKILL21,SKILL22,SKILL23,SKILL24,

SKILL25,SKILL26,SKILL27,SKILL28,SKILL29,SKILL30,SKILL31,SKILL32,

SKILL33)) AS UPVT

)

SELECT SKILL, COUNT(EMPLOYEE_ID) AS EMP_COUNT

FROM CTE_SKILLS

GROUP BY SKILL

While this will get past the poor database design, how it performs will be a question mark until you test it. I have to agree that this database needs to be re-designed.

Steve

(aka smunson)

:):):)

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

tfifield SSCrazy Eights Points: 9655 More actions · Answer 3

What I would do is write some code that normalizes the data so that each person/skill set is in a separate row. Then deal with the data in this normalized fashion.

It's very risky to write a lot of code that un-pivots the the data on the fly. You can write a considerable amount of SQL code that assumes - let's say - that there are only 4 skill sets that can be populated for the person. People can tell you that there will never, under any circumstances, ever be more than 4. Well, 6 months or a year later someone will come up with a good reason for there to be 5th and then a 6th skill set. I've dealt with this type of thing for years and it almost always happens.

If the majority of your code deals with normalized data, then the only code you have to change when the new column is added is that code that pumps the data into the normalized tables - everything else is the same.

Todd Fifield

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

technicx101 (11/16/2008)
Hi,
I have a table with 33 columns of Skill types e.g. Skill1, Skill2, ..etc. A particular skill, say, "C#" can be found in any of the columns. Basically this table captures employee skill details. I would like to have a query that will list down the entire list of skills along with count of each type across all employees. i.e. "How many resources do I have with C# skills? How many do I have with C++ skills and so on.
I am a newbie to SQL and still groping the dark. Any help will be greatly appreciated
John

You've been dealt a great dis-service with a denormalized table like that. An "unpivot" is certainly in order, as others have suggested. BUT, it's impossible to help because you've neither provided table structure nor example data. Please see the link in my signature below.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)