SQL for calculating count max based on probablility

  • Hi All,

    we have to deal with a situation where the company has decided to reorganise departments and realocate users to them. For business reasons we need to retain our existing department naming convention so we have to map our departments to the new department codes - also the mapping has to be one to one. We have a table mapping users to our department and the external table has a table mapping users to the new departments. We would like to map our departments to the new department codes based on probabilities(i.e. map departments based on where most users are assigned). For example, in below table we would look to map our Dept 1 to DeptCode 200. Apologies if this does not make a lot of sense, still trying to get my head around it myself.

    Anyone have an idea of what sort of aggregation sql I would need to achieve this?

    Table A

    UserIdDeptName
    1001Dept 1
    1002Dept 1
    1003Dept 1
    1004Dept 1
    1005Dept 2
    1006

    Dept 2

     

    Table B

    NoUserIdsDepartment
    4Dept 1
    2Dept 2

     

    Table C (external table)

    UserIdDeptCode
    1001200
    1002201
    1003200
    1004202
    1005203
    1006203

     

    Thanks,

    Col

  • So the deal is to start numbering the deptmt with the highest # users at 200, then on from there as the #users descends? And this is also a oneshot deal I assume?

    I'd do it simplest possible.

    create table #x ( rank int identity not null, dept varchar(10) not null, users int not null )

    insert #x ( dept, users )

    select deptname, count(*)

    from  #a

    group by deptname

    order by count(*) desc

    alter table #x add newDptmt int null

    update #x set newDptmt = 199 + rank

    select * from #x

    rank        dept       users       newDptmt   

    ----------- ---------- ----------- -----------

    1           Dept 1     4           200

    2           Dept 2     2           201

    (2 row(s) affected)

    Now you know which dept - newdeptmt combinations there is, so it's easy to map from here.

    /Kenneth

  • Are you mapping Dept 1 to DeptCode 200 because 2 employees that were in Dept 1 are in 200, while only 1 each are in 201 and 202? If so, then what do you map to DeptCode 201 and 202, using your example. There are several issues I can see if this is the plan, including how to break ties, etc.

    ETA: Kenneth came up with an entirely different assumption from your OP than I did, so some clarification would definitely help.

     

     

  • Hi Kenneth/ David,

     

    thanks for your prompt replies.

    David, to clarify things, this is a one off operation to try and automate the mapping process. All unmapped departments will be manually mapped later. So we are simply saying, based on probabilities, here are the likely mappings, in the sample given - Dept 1 = Dept Code 200.

    Kenneth, thanks for the solution. The only difficulty is that the new dept code are not necessarily consecutive so don't think the adding the ranking value to a set value, 199, will work as outlined. Need to be able to get the ranking as you outlined and then map back the dept name of the dept code with highest ranking.

     

    Thanks again for help

  • ummm... I don't quite get it...

    Are you now saying that depending on the actual count of emps between different departments, you could end up with 'new' names like 200 and then 202 (skipping 201) etc...?

    If so, what are the rules for that then?

    (in all honesty it seems a bit silly )

    /Kenneth

  • It's quite straight forward actually. Based on the sample I gave, 4 of the users are assigned to Dept 1. In the new structure 2 of these 4 are assigned to Dept Code 200 and 1 each to 201 and 202. So we map 200 to Dept 1 based on the fact that we can currently only do a 1 to 1 join.

    I'm not looking to justify the business logic behind this as I realise it does not make much sense and in fairness it is likely to change quite soon. What I am seeking is whether this is possible in SQL and if so how it can be done.

     

    Thanks for your help,

    Col

  • I'll try to repeat what I have understood... correct me where I'm wrong.

    You have a table "old" that lists all employees, and to which department they belong.

    You have a table "new" that lists all employees, and to which department they will belong in future.

    Department codes in OLD and NEW are different, employee codes are the same in both tables.

    For each new department, you want to find out, from which old department it will receive most employees. Where there is no tie on the first rank, link the new and old departments. Each of the departments (both new and old) can only be linked to one department from the other group.

    Unclear situations will be handled manually.

    Is that it?

  • Ok.. I think it's slowly sinking in... Didn't read that closely in the example data given..

    The table C - that one already exists? Is that it?

    So the emps are already assigned to their 'new' departmentcodes?

    And, the 'problem' is that you can only use one of the possible many new codes to map against the old code, and that 'one-new' should be the one with the highest number emps assigned?

    Does all tables (A, B and C) already exists, or is the problem how to generate one or more of them..?

    I'm still not quite sure as to what the actual question is...?

    /Kenneth

  • Yes you're both close to the mark now!

    To clarify further. Tables A and C exist, table B we created to clarify the situation, i.e. how many users are assigned to each dept in table A - this could easily be done with SQL so simply there for display purposes.

    Where more than 1 user is assigned to a dept in Table A, I need to find the Dept code in Table C where most of these users are assigned to, i.e. is there a way in SQL to calculate the dept code in Table C where most users from the same dept in Table A are assigned to? In the sample give this would mean most users from Dept1 are assigned to dept code 200.

     

    I'm thinking this should not be too difficult using Max and Count functions but for some reason can come up with query to do this.

     

    Thanks for help

  • Try this for starters:

    SELECT a.deptname, c.deptname, COUNT(*)

    FROM TableA a

    JOIN TableC c ON c.userid=a.userid

    ORDER BY 1, 3 DESC

    It does not solve the fact that once a department is linked, it can't be linked again, so situations where both dept200 and dept222 get most employees from former dept01 have to be handled manually... but if the number of departments is not too high, this could be enough to give you the info you need and you'd do the linking manually.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply