SQL for calculating count max based on probablility

Question

SQL for calculating count max based on probablility

Col -490185

Grasshopper

Points: 14
More actions
September 14, 2007 at 7:57 am

#118566

Hi All,
we have to deal with a situation where the company has decided to reorganise departments and realocate users to them. For business reasons we need to retain our existing department naming convention so we have to map our departments to the new department codes - also the mapping has to be one to one. We have a table mapping users to our department and the external table has a table mapping users to the new departments. We would like to map our departments to the new department codes based on probabilities(i.e. map departments based on where most users are assigned). For example, in below table we would look to map our Dept 1 to DeptCode 200. Apologies if this does not make a lot of sense, still trying to get my head around it myself.
Anyone have an idea of what sort of aggregation sql I would need to achieve this?
Table A
UserId DeptName
1001 Dept 1
1002 Dept 1
1003 Dept 1
1004 Dept 1
1005 Dept 2
1006
Dept 2

Table B
NoUserIds Department
4 Dept 1
2 Dept 2

Table C (external table)
UserId DeptCode
1001 200
1002 201
1003 200
1004 202
1005 203
1006 203

Thanks,
Col

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 1

So the deal is to start numbering the deptmt with the highest # users at 200, then on from there as the #users descends? And this is also a oneshot deal I assume?

I'd do it simplest possible.

create table #x ( rank int identity not null, dept varchar(10) not null, users int not null )

insert #x ( dept, users )

select deptname, count(*)

from #a

group by deptname

order by count(*) desc

alter table #x add newDptmt int null

update #x set newDptmt = 199 + rank

select * from #x

rank dept users newDptmt

----------- ---------- ----------- -----------

1 Dept 1 4 200

2 Dept 2 2 201

(2 row(s) affected)

Now you know which dept - newdeptmt combinations there is, so it's easy to map from here.

/Kenneth

David McFarland SSChampion Points: 11815 More actions · Answer 2

Are you mapping Dept 1 to DeptCode 200 because 2 employees that were in Dept 1 are in 200, while only 1 each are in 201 and 202? If so, then what do you map to DeptCode 201 and 202, using your example. There are several issues I can see if this is the plan, including how to break ties, etc.

ETA: Kenneth came up with an entirely different assumption from your OP than I did, so some clarification would definitely help.

Col -490185 Grasshopper Points: 14 More actions · Answer 3

Hi Kenneth/ David,

thanks for your prompt replies.

David, to clarify things, this is a one off operation to try and automate the mapping process. All unmapped departments will be manually mapped later. So we are simply saying, based on probabilities, here are the likely mappings, in the sample given - Dept 1 = Dept Code 200.

Kenneth, thanks for the solution. The only difficulty is that the new dept code are not necessarily consecutive so don't think the adding the ranking value to a set value, 199, will work as outlined. Need to be able to get the ranking as you outlined and then map back the dept name of the dept code with highest ranking.

Thanks again for help

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 4

ummm... I don't quite get it...

Are you now saying that depending on the actual count of emps between different departments, you could end up with 'new' names like 200 and then 202 (skipping 201) etc...?

If so, what are the rules for that then?

(in all honesty it seems a bit silly )

/Kenneth

Col -490185 Grasshopper Points: 14 More actions · Answer 5

It's quite straight forward actually. Based on the sample I gave, 4 of the users are assigned to Dept 1. In the new structure 2 of these 4 are assigned to Dept Code 200 and 1 each to 201 and 202. So we map 200 to Dept 1 based on the fact that we can currently only do a 1 to 1 join.

I'm not looking to justify the business logic behind this as I realise it does not make much sense and in fairness it is likely to change quite soon. What I am seeking is whether this is possible in SQL and if so how it can be done.

Thanks for your help,

Col

Vladan SSC-Insane Points: 22072 More actions · Answer 6

I'll try to repeat what I have understood... correct me where I'm wrong.

You have a table "old" that lists all employees, and to which department they belong.

You have a table "new" that lists all employees, and to which department they will belong in future.

Department codes in OLD and NEW are different, employee codes are the same in both tables.

For each new department, you want to find out, from which old department it will receive most employees. Where there is no tie on the first rank, link the new and old departments. Each of the departments (both new and old) can only be linked to one department from the other group.

Unclear situations will be handled manually.

Is that it?

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 7

Ok.. I think it's slowly sinking in... Didn't read that closely in the example data given..

The table C - that one already exists? Is that it?

So the emps are already assigned to their 'new' departmentcodes?

And, the 'problem' is that you can only use one of the possible many new codes to map against the old code, and that 'one-new' should be the one with the highest number emps assigned?

Does all tables (A, B and C) already exists, or is the problem how to generate one or more of them..?

I'm still not quite sure as to what the actual question is...?

/Kenneth

Col -490185 Grasshopper Points: 14 More actions · Answer 8

Yes you're both close to the mark now!

To clarify further. Tables A and C exist, table B we created to clarify the situation, i.e. how many users are assigned to each dept in table A - this could easily be done with SQL so simply there for display purposes.

Where more than 1 user is assigned to a dept in Table A, I need to find the Dept code in Table C where most of these users are assigned to, i.e. is there a way in SQL to calculate the dept code in Table C where most users from the same dept in Table A are assigned to? In the sample give this would mean most users from Dept1 are assigned to dept code 200.

I'm thinking this should not be too difficult using Max and Count functions but for some reason can come up with query to do this.

Thanks for help

Vladan SSC-Insane Points: 22072 More actions · Answer 9

Try this for starters:

SELECT a.deptname, c.deptname, COUNT(*)

FROM TableA a

JOIN TableC c ON c.userid=a.userid

ORDER BY 1, 3 DESC

It does not solve the fact that once a department is linked, it can't be linked again, so situations where both dept200 and dept222 get most employees from former dept01 have to be handled manually... but if the number of departments is not too high, this could be enough to give you the info you need and you'd do the linking manually.

UserId	DeptName
1001	Dept 1
1002	Dept 1
1003	Dept 1
1004	Dept 1
1005	Dept 2
1006	Dept 2

UserId	DeptCode
1001	200
1002	201
1003	200
1004	202
1005	203
1006	203

NoUserIds	Department
4	Dept 1
2	Dept 2