Aggregation

Question

BSavoie

SSCertifiable

Points: 6309
January 22, 2010 at 3:13 pm

#136660

I'm confused about how to aggregate a value in SSIS. Let's say I have a task that simply exports the employees from the DB to a flat file. I also have to keep track of MAX(EmployeeId) so that the next time I do this, I start with only the new employees that have been added since the last time.
How/where do I handle that aggregate? I just want a way to get that value into a variable. My first instinct was to throw an Aggregate on the data flow, but I don't think that's right. That will get executed once for every record in the source, which is not what I want. I could also just drop an Execute SQL Task on the control flow and handle it that way, but technically speaking, that's an entirely different query. It's not actually based on the Employees in the data flow.
Maybe a better way to ask the question is: how do I get a MAX value from a data source into a user variable?

Phil Parkin SSC Guru Points: 247202 · Answer 1

Use an Execute SQL Task, as you suggest. It can put the result of a single-valued T-SQL query directly into a variable (say MaxID).

Then use that returned value in the source query for your data flow (pseudo-code: SELECT f1, f2, ... FROM EMPLOYEE WHERE ID <= User::MaxID). That way you know that things are consistent ...

...but, getting to the crux of the matter, how does knowing the MaxID now (for the current extract) help you when the job next runs?
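
The two steps described above can be sketched in plain T-SQL. This is only an illustration under stated assumptions, not a package definition: the table #Employee, its columns, the sample rows, and the @LastMaxID variable (standing in for a high-water mark saved by the previous run) are all hypothetical.

```sql
-- Hypothetical sample table; names and rows are for illustration only.
CREATE TABLE #Employee (EmployeeId int PRIMARY KEY, LastName varchar(50));
INSERT INTO #Employee (EmployeeId, LastName)
VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark'), (4, 'Davis');

-- Stands in for a package variable holding the high-water mark
-- saved by the previous run (assumed here to be 2).
DECLARE @LastMaxID int = 2;

-- Stands in for the User::MaxID package variable.
DECLARE @MaxID int;

-- Step 1: the single-valued query an Execute SQL Task would run.
-- ISNULL guards against an empty table, where MAX returns NULL.
SELECT @MaxID = ISNULL(MAX(EmployeeId), @LastMaxID) FROM #Employee;

-- Step 2: the data flow source query, bounded on both sides so the
-- extract is consistent with the value captured in step 1.
SELECT EmployeeId, LastName
FROM #Employee
WHERE EmployeeId > @LastMaxID
  AND EmployeeId <= @MaxID;

DROP TABLE #Employee;
```

In a package, step 1 would typically be a plain SELECT with the task's ResultSet set to "Single row" and the result mapped to the variable, and the two bounds in step 2 would be ? parameter markers in an OLE DB Source mapped to the package variables.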

BSavoie SSCertifiable Points: 6309 · Answer 2

Thanks Phil, this is really not about employees at all. I just thought that might make my example a little easier to explain. This is really about exporting data to a vendor once per month. Each month, I have to make sure I don't send anything that was sent the previous month, so I keep track of the MAX record id every month and use that as a starting point the next month.

I'll give your example a try. So anything I put in the data flow gets executed once for each record. Do I understand that correctly?

Phil Parkin SSC Guru Points: 247202 · Answer 3

January 23, 2010 at 12:37 pm

#1107799

Correct.

Jeffrey Williams SSC Guru Points: 90351 · Answer 4

BSavoie (1/23/2010)
Thanks Phil, this is really not about employees at all. I just thought that might make my example a little easier to explain. This is really about exporting data to a vendor once per month. Each month, I have to make sure I don't send anything that was sent the previous month, so I keep track of the MAX record id every month and use that as a starting point the next month.
I'll give your example a try. So anything I put in the data flow gets executed once for each record. Do I understand that correctly?

I think you need to consider this a bit more. In your example, what happens if you send the data for Employee A last month and that employee is terminated this month? Since the record has already been sent, you won't be sending it again.

In other words, what about updates to the system that need to be updated in the downstream systems? How are you going to identify those?

Find the column that identifies the last updated date for that entity (or creation date). Then use that date to filter for any records that have been modified since the last time you extracted the data.
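
A minimal sketch of that date-based filter follows. It assumes the source table carries a ModifiedDate column and that the end of the previous extract window is stored somewhere durable; the tables #Employee and #ExtractLog, their columns, and the sample rows are hypothetical.

```sql
-- Hypothetical tables; names and rows are for illustration only.
CREATE TABLE #Employee (
    EmployeeId   int PRIMARY KEY,
    LastName     varchar(50),
    ModifiedDate datetime NOT NULL
);
CREATE TABLE #ExtractLog (LastExtractDate datetime NOT NULL);

INSERT INTO #Employee (EmployeeId, LastName, ModifiedDate)
VALUES (1, 'Adams', '20100105'),
       (2, 'Baker', '20100120'),
       (3, 'Clark', '20100210');

-- End of the previous extract window (assumed value).
INSERT INTO #ExtractLog (LastExtractDate) VALUES ('20100115');

DECLARE @LastExtract datetime, @ThisExtract datetime;
SELECT @LastExtract = LastExtractDate FROM #ExtractLog;
SET @ThisExtract = GETDATE();

-- Rows created or changed since the last extract, up to a fixed
-- cut-off so rows modified during the run are picked up next time.
SELECT EmployeeId, LastName, ModifiedDate
FROM #Employee
WHERE ModifiedDate > @LastExtract
  AND ModifiedDate <= @ThisExtract;

-- Record the cut-off as the starting point for the next run.
UPDATE #ExtractLog SET LastExtractDate = @ThisExtract;

DROP TABLE #Employee;
DROP TABLE #ExtractLog;
```

Capturing the cut-off once and reusing it in both the filter and the log update keeps the windows contiguous, so no row falls between two runs.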

BSavoie SSCertifiable Points: 6309 · Answer 5

Thanks Jeffrey. I was just using Employees as an example to make things easier. What I'm really sending is all the Guest Reservations for guests who have stayed at a hotel over the last month. So it's not really a volatile entity like an employee. That was probably a bad example. Thanks for the feedback!

Tim Curtin Say Hey Kid Points: 665 · Answer 6

If it's as simple as a date range, why not use the following query? Or, if you're looking to calculate reservations over time, group by month.

SELECT *
FROM Reservations
WHERE ReservationDate BETWEEN @StartDate AND @EndDate

Or, for a rolling 12-month trend:
SELECT YEAR(ReservationDate) AS ResYear, MONTH(ReservationDate) AS ResMonth, COUNT(*) AS ResCount
FROM Reservations
WHERE ReservationDate > DATEADD(yy, -1, GETDATE())
GROUP BY YEAR(ReservationDate), MONTH(ReservationDate)
ORDER BY YEAR(ReservationDate), MONTH(ReservationDate)
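
For a monthly export, the date-range query above needs values for @StartDate and @EndDate. One possible way to derive them for the previous calendar month is sketched below. It assumes ReservationDate is a datetime that may carry a time component, which is why it uses a half-open range rather than BETWEEN (BETWEEN includes both endpoints). The #Reservations table, its columns, and the sample rows are hypothetical.

```sql
-- Hypothetical sample table; names and rows are for illustration only.
CREATE TABLE #Reservations (
    ReservationId   int PRIMARY KEY,
    ReservationDate datetime NOT NULL
);
INSERT INTO #Reservations (ReservationId, ReservationDate)
VALUES (1, DATEADD(month, -1, GETDATE())),  -- falls in the previous month
       (2, GETDATE()),                      -- falls in the current month
       (3, DATEADD(month, -3, GETDATE()));  -- older than the window

DECLARE @StartDate datetime, @EndDate datetime;

-- First day of the previous calendar month, at midnight.
SET @StartDate = DATEADD(month, DATEDIFF(month, 0, GETDATE()) - 1, 0);

-- First day of the current month, used as an exclusive upper bound.
SET @EndDate = DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0);

-- Half-open range: includes @StartDate, excludes @EndDate.
SELECT ReservationId, ReservationDate
FROM #Reservations
WHERE ReservationDate >= @StartDate
  AND ReservationDate < @EndDate;

DROP TABLE #Reservations;
```

With a half-open range, consecutive monthly runs neither overlap nor leave a gap at the month boundary.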

Phil Parkin SSC Guru Points: 247202 · Answer 7

Tim Curtin - Thursday, February 22, 2018 6:28 AM
If it's as simple as a date range, why not use the following query? Or, if you're looking to calculate reservations over time, group by month.
SELECT *
FROM Reservations
WHERE ReservationDate BETWEEN @StartDate AND @EndDate
Or, for a rolling 12-month trend:
SELECT YEAR(ReservationDate) AS ResYear, MONTH(ReservationDate) AS ResMonth, COUNT(*) AS ResCount
FROM Reservations
WHERE ReservationDate > DATEADD(yy, -1, GETDATE())
GROUP BY YEAR(ReservationDate), MONTH(ReservationDate)
ORDER BY YEAR(ReservationDate), MONTH(ReservationDate)

Note that you are responding to a thread which is 8 years old 🙂