Mapping group by

Question

Post reply

Mapping group by

riskaone

Grasshopper

Points: 17
More actions
November 2, 2010 at 5:00 am

#231188

Hello guys,
is there any tricky way to catch mapping of records in group by command? I need to keep mapping of record from input table to inserted table after group by. The mapping should be directly in query (e.g. somehow use OUTPUT clause). I can't run the same query due to performance reason. Please see sample below.
Thanks
table t
idcol
------------------
1A
2B
3B
4B
5B
6B
7B
8C
result table
r
idcol
-----------------
1A
2B
8C
required mapping:
beforeafter
11
22
32
42
52
62
72
88
Script for generating sample:
drop table t
drop table r
go
create table t
(
id int IDENTITY(1,1) PRIMARY KEY,
col NVARCHAR(100) NOT NULL
)
go
create table r
(
id int PRIMARY KEY,
col NVARCHAR(100) NOT NULL
)
go
insert t (col)
VALUES('A'),('B'),('B'),('B'),('B'),('B'),('B'),('C')
insert r (id, col)
SELECT
MIN(id) AS id,
col
from t
group by col
select * from t
select * from r

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

spaghettidba SSC Guru Points: 105732 More actions · Answer 1

I don't know if I understood correctly what you're after.

Give this a try:

DECLARE @t TABLE

(

id INT NOT NULL,

col NVARCHAR(100) NOT NULL

)

INSERT INTO @T

VALUES(1,'A'),(2,'B'),(3,'B'),(4,'B'),(5,'B'),(6,'B'),(7,'B'),(8,'C')

;WITH groupedResults AS (

SELECT MIN(id) AS id, col

FROM @t

GROUP BY col

)

UPDATE A

SET id = B.id

OUTPUT INSERTED.id AS id, INSERTED.col

FROM @t AS A

INNER JOIN groupedResults AS B

ON A.col = B.col

-- Gianluca Sartori

riskaone Grasshopper Points: 17 More actions · Answer 2

Thanks SSCrazy, this is good idea. But the point is that I have one big insert where is the aggregation and group by.

I have to get that mapping into one INSERT command. Your idea is UPDATE command source joined to result which is the second command and additional performance.

The table is big, let's say about 500k rows after "group by" the target table will be about 400k rows.

spaghettidba SSC Guru Points: 105732 More actions · Answer 3

You mentioned two tables, but I can't understand how they come into play.

Can you post:

a) sample initial data in your tables

b) your desidered output in both tables

-- Gianluca Sartori

riskaone Grasshopper Points: 17 More actions · Answer 4

Please see my initial post:

source: table t

target: table r

operation:

insert r (id, col)

SELECT

MIN(id) AS id,

col

from t

group by col

The mapping required:

required mapping:

beforeafter

1 1

2 2

3 2

4 2

5 2

6 2

7 2

8 8

spaghettidba SSC Guru Points: 105732 More actions · Answer 5

This is exactly what I don't get.

What do you mean with "mapping"? Should the minimum ID for each "col" be stored in an additional column in table "t"? Do you want to output this datum to a result set to be consumed on the app side?

-- Gianluca Sartori

riskaone Grasshopper Points: 17 More actions · Answer 6

Yes, the mapping need to be stored in table.

Mapping Table:

Map_t_to_r

--------------------------

t_IDr_ID

1.......1

2.......2

3.......2

4.......2

5.......2

6.......2

7.......2

8.......8

(Sorry I used dots instead spaces to distinguish columns. Trailing spaces or tabs don't work in this form)

I wanted to use OUTPUT for that but this is not trivial and SELECT doesn't have OUTPUT clause.

WayneS SSC Guru Points: 95460 More actions · Answer 7

From what I can gather, for each entry in t, you want to have a result of t.id, and the minimum t.id where t.col = inserted.col - correct?

In order to accomplish this in the OUTPUT clause, you would need to be able to perform a sub-query in the output clause - which you can't do.

What is the source of the data being inserted into t? What you'll have to do is probably join to that to get the minimum id for that col.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

spaghettidba SSC Guru Points: 105732 More actions · Answer 8

You can't group results of an OUTPUT clause without storing intermediate results into a table.

In this case, it would be the same thing as re-issuing the group by query:

DECLARE @t TABLE

(

id INT NOT NULL PRIMARY KEY,

col NVARCHAR(100) NOT NULL

)

INSERT INTO @T

VALUES(1,'A'),(2,'B'),(3,'B'),(4,'B'),(5,'B'),(6,'B'),(7,'B'),(8,'C')

DECLARE @Map_t_to_r TABLE

(

id INT NOT NULL,

groupId INT NOT NULL

)

DECLARE @r TABLE

(

id int PRIMARY KEY,

col NVARCHAR(100) NOT NULL

)

;WITH groupedResults AS (

SELECT MIN(id) AS id, col

FROM @t

GROUP BY col

)

INSERT INTO @Map_t_to_r

SELECT A.id, B.id

FROM @t AS A

INNER JOIN groupedResults AS B

ON A.col = B.col

INSERT INTO @r

SELECT MIN(id) AS id, col

FROM @t

GROUP BY col

-- Gianluca Sartori

riskaone Grasshopper Points: 17 More actions · Answer 9

Please see my comments in brackets.

From what I can gather, for each entry in t, you want to have a result of t.id, and the minimum t.id where t.col = inserted.col - correct?

{Yes}

In order to accomplish this in the OUTPUT clause, you would need to be able to perform a sub-query in the output clause - which you can't do.

{Seems that yes, I'm looking for some tricky solution 🙂 }

What is the source of the data being inserted into t? What you'll have to do is probably join to that to get the minimum id for that col.

{Data in t table are incoming data in persistent table.

The process is ETL. Because GROUP BY will change the granularity we need to have track what have been grouped in mapping. This will allow us build reporting.

I know we can build additional queries to get that mapping. Actually, the insert query is not so simple as described here and tables are big. So that will add ETL duration time overhead}

WayneS SSC Guru Points: 95460 More actions · Answer 10

Well, it looks to me like you are going to have to get your mapping by an aggregate query against the base table (or, if it is being pushed into "t" as a staging table, off of that). There is only so much that the OUTPUT clause can do, and a subquery in it is definitely out - it raises an error.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2