November 2, 2010 at 5:00 am
Hello guys,
is there any tricky way to catch mapping of records in group by command? I need to keep mapping of record from input table to inserted table after group by. The mapping should be directly in query (e.g. somehow use OUTPUT clause). I can't run the same query due to performance reason. Please see sample below.
Thanks
table t
idcol
------------------
1A
2B
3B
4B
5B
6B
7B
8C
result table
r
idcol
-----------------
1A
2B
8C
required mapping:
beforeafter
11
22
32
42
52
62
72
88
Script for generating sample:
drop table t
drop table r
go
create table t
(
id int IDENTITY(1,1) PRIMARY KEY,
col NVARCHAR(100) NOT NULL
)
go
create table r
(
id int PRIMARY KEY,
col NVARCHAR(100) NOT NULL
)
go
insert t (col)
VALUES('A'),('B'),('B'),('B'),('B'),('B'),('B'),('C')
insert r (id, col)
SELECT
MIN(id) AS id,
col
from t
group by col
select * from t
select * from r
November 2, 2010 at 5:14 am
I don't know if I understood correctly what you're after.
Give this a try:
DECLARE @t TABLE
(
id INT NOT NULL,
col NVARCHAR(100) NOT NULL
)
INSERT INTO @T
VALUES(1,'A'),(2,'B'),(3,'B'),(4,'B'),(5,'B'),(6,'B'),(7,'B'),(8,'C')
;WITH groupedResults AS (
SELECT MIN(id) AS id, col
FROM @t
GROUP BY col
)
UPDATE A
SET id = B.id
OUTPUT INSERTED.id AS id, INSERTED.col
FROM @t AS A
INNER JOIN groupedResults AS B
ON A.col = B.col
-- Gianluca Sartori
November 2, 2010 at 5:41 am
Thanks SSCrazy, this is good idea. But the point is that I have one big insert where is the aggregation and group by.
I have to get that mapping into one INSERT command. Your idea is UPDATE command source joined to result which is the second command and additional performance.
The table is big, let's say about 500k rows after "group by" the target table will be about 400k rows.
November 2, 2010 at 6:50 am
You mentioned two tables, but I can't understand how they come into play.
Can you post:
a) sample initial data in your tables
b) your desidered output in both tables
-- Gianluca Sartori
November 2, 2010 at 6:58 am
Please see my initial post:
source: table t
target: table r
operation:
insert r (id, col)
SELECT
MIN(id) AS id,
col
from t
group by col
The mapping required:
required mapping:
beforeafter
1 1
2 2
3 2
4 2
5 2
6 2
7 2
8 8
November 2, 2010 at 7:09 am
This is exactly what I don't get.
What do you mean with "mapping"? Should the minimum ID for each "col" be stored in an additional column in table "t"? Do you want to output this datum to a result set to be consumed on the app side?
-- Gianluca Sartori
November 2, 2010 at 7:16 am
Yes, the mapping need to be stored in table.
Mapping Table:
Map_t_to_r
--------------------------
t_IDr_ID
1.......1
2.......2
3.......2
4.......2
5.......2
6.......2
7.......2
8.......8
(Sorry I used dots instead spaces to distinguish columns. Trailing spaces or tabs don't work in this form)
I wanted to use OUTPUT for that but this is not trivial and SELECT doesn't have OUTPUT clause.
November 2, 2010 at 7:27 am
From what I can gather, for each entry in t, you want to have a result of t.id, and the minimum t.id where t.col = inserted.col - correct?
In order to accomplish this in the OUTPUT clause, you would need to be able to perform a sub-query in the output clause - which you can't do.
What is the source of the data being inserted into t? What you'll have to do is probably join to that to get the minimum id for that col.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 2, 2010 at 7:40 am
You can't group results of an OUTPUT clause without storing intermediate results into a table.
In this case, it would be the same thing as re-issuing the group by query:
DECLARE @t TABLE
(
id INT NOT NULL PRIMARY KEY,
col NVARCHAR(100) NOT NULL
)
INSERT INTO @T
VALUES(1,'A'),(2,'B'),(3,'B'),(4,'B'),(5,'B'),(6,'B'),(7,'B'),(8,'C')
DECLARE @Map_t_to_r TABLE
(
id INT NOT NULL,
groupId INT NOT NULL
)
DECLARE @r TABLE
(
id int PRIMARY KEY,
col NVARCHAR(100) NOT NULL
)
;WITH groupedResults AS (
SELECT MIN(id) AS id, col
FROM @t
GROUP BY col
)
INSERT INTO @Map_t_to_r
SELECT A.id, B.id
FROM @t AS A
INNER JOIN groupedResults AS B
ON A.col = B.col
INSERT INTO @r
SELECT MIN(id) AS id, col
FROM @t
GROUP BY col
-- Gianluca Sartori
November 2, 2010 at 7:44 am
Please see my comments in brackets.
From what I can gather, for each entry in t, you want to have a result of t.id, and the minimum t.id where t.col = inserted.col - correct?
{Yes}
In order to accomplish this in the OUTPUT clause, you would need to be able to perform a sub-query in the output clause - which you can't do.
{Seems that yes, I'm looking for some tricky solution 🙂 }
What is the source of the data being inserted into t? What you'll have to do is probably join to that to get the minimum id for that col.
{Data in t table are incoming data in persistent table.
The process is ETL. Because GROUP BY will change the granularity we need to have track what have been grouped in mapping. This will allow us build reporting.
I know we can build additional queries to get that mapping. Actually, the insert query is not so simple as described here and tables are big. So that will add ETL duration time overhead}
November 2, 2010 at 8:22 am
Well, it looks to me like you are going to have to get your mapping by an aggregate query against the base table (or, if it is being pushed into "t" as a staging table, off of that). There is only so much that the OUTPUT clause can do, and a subquery in it is definitely out - it raises an error.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply