Mapping group by

  • Hello guys,

    is there any tricky way to catch mapping of records in group by command? I need to keep mapping of record from input table to inserted table after group by. The mapping should be directly in query (e.g. somehow use OUTPUT clause). I can't run the same query due to performance reason. Please see sample below.

    Thanks

    table t

    idcol

    ------------------

    1A

    2B

    3B

    4B

    5B

    6B

    7B

    8C

    result table

    r

    idcol

    -----------------

    1A

    2B

    8C

    required mapping:

    beforeafter

    11

    22

    32

    42

    52

    62

    72

    88

    Script for generating sample:

    drop table t

    drop table r

    go

    create table t

    (

    id int IDENTITY(1,1) PRIMARY KEY,

    col NVARCHAR(100) NOT NULL

    )

    go

    create table r

    (

    id int PRIMARY KEY,

    col NVARCHAR(100) NOT NULL

    )

    go

    insert t (col)

    VALUES('A'),('B'),('B'),('B'),('B'),('B'),('B'),('C')

    insert r (id, col)

    SELECT

    MIN(id) AS id,

    col

    from t

    group by col

    select * from t

    select * from r

  • I don't know if I understood correctly what you're after.

    Give this a try:

    DECLARE @t TABLE

    (

    id INT NOT NULL,

    col NVARCHAR(100) NOT NULL

    )

    INSERT INTO @T

    VALUES(1,'A'),(2,'B'),(3,'B'),(4,'B'),(5,'B'),(6,'B'),(7,'B'),(8,'C')

    ;WITH groupedResults AS (

    SELECT MIN(id) AS id, col

    FROM @t

    GROUP BY col

    )

    UPDATE A

    SET id = B.id

    OUTPUT INSERTED.id AS id, INSERTED.col

    FROM @t AS A

    INNER JOIN groupedResults AS B

    ON A.col = B.col

    -- Gianluca Sartori

  • Thanks SSCrazy, this is good idea. But the point is that I have one big insert where is the aggregation and group by.

    I have to get that mapping into one INSERT command. Your idea is UPDATE command source joined to result which is the second command and additional performance.

    The table is big, let's say about 500k rows after "group by" the target table will be about 400k rows.

  • You mentioned two tables, but I can't understand how they come into play.

    Can you post:

    a) sample initial data in your tables

    b) your desidered output in both tables

    -- Gianluca Sartori

  • Please see my initial post:

    source: table t

    target: table r

    operation:

    insert r (id, col)

    SELECT

    MIN(id) AS id,

    col

    from t

    group by col

    The mapping required:

    required mapping:

    beforeafter

    1 1

    2 2

    3 2

    4 2

    5 2

    6 2

    7 2

    8 8

  • This is exactly what I don't get.

    What do you mean with "mapping"? Should the minimum ID for each "col" be stored in an additional column in table "t"? Do you want to output this datum to a result set to be consumed on the app side?

    -- Gianluca Sartori

  • Yes, the mapping need to be stored in table.

    Mapping Table:

    Map_t_to_r

    --------------------------

    t_IDr_ID

    1.......1

    2.......2

    3.......2

    4.......2

    5.......2

    6.......2

    7.......2

    8.......8

    (Sorry I used dots instead spaces to distinguish columns. Trailing spaces or tabs don't work in this form)

    I wanted to use OUTPUT for that but this is not trivial and SELECT doesn't have OUTPUT clause.

  • From what I can gather, for each entry in t, you want to have a result of t.id, and the minimum t.id where t.col = inserted.col - correct?

    In order to accomplish this in the OUTPUT clause, you would need to be able to perform a sub-query in the output clause - which you can't do.

    What is the source of the data being inserted into t? What you'll have to do is probably join to that to get the minimum id for that col.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • You can't group results of an OUTPUT clause without storing intermediate results into a table.

    In this case, it would be the same thing as re-issuing the group by query:

    DECLARE @t TABLE

    (

    id INT NOT NULL PRIMARY KEY,

    col NVARCHAR(100) NOT NULL

    )

    INSERT INTO @T

    VALUES(1,'A'),(2,'B'),(3,'B'),(4,'B'),(5,'B'),(6,'B'),(7,'B'),(8,'C')

    DECLARE @Map_t_to_r TABLE

    (

    id INT NOT NULL,

    groupId INT NOT NULL

    )

    DECLARE @r TABLE

    (

    id int PRIMARY KEY,

    col NVARCHAR(100) NOT NULL

    )

    ;WITH groupedResults AS (

    SELECT MIN(id) AS id, col

    FROM @t

    GROUP BY col

    )

    INSERT INTO @Map_t_to_r

    SELECT A.id, B.id

    FROM @t AS A

    INNER JOIN groupedResults AS B

    ON A.col = B.col

    INSERT INTO @r

    SELECT MIN(id) AS id, col

    FROM @t

    GROUP BY col

    -- Gianluca Sartori

  • Please see my comments in brackets.

    From what I can gather, for each entry in t, you want to have a result of t.id, and the minimum t.id where t.col = inserted.col - correct?

    {Yes}

    In order to accomplish this in the OUTPUT clause, you would need to be able to perform a sub-query in the output clause - which you can't do.

    {Seems that yes, I'm looking for some tricky solution 🙂 }

    What is the source of the data being inserted into t? What you'll have to do is probably join to that to get the minimum id for that col.

    {Data in t table are incoming data in persistent table.

    The process is ETL. Because GROUP BY will change the granularity we need to have track what have been grouped in mapping. This will allow us build reporting.

    I know we can build additional queries to get that mapping. Actually, the insert query is not so simple as described here and tables are big. So that will add ETL duration time overhead}

  • Well, it looks to me like you are going to have to get your mapping by an aggregate query against the base table (or, if it is being pushed into "t" as a staging table, off of that). There is only so much that the OUTPUT clause can do, and a subquery in it is definitely out - it raises an error.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply