January 22, 2010 at 3:13 pm
I'm confused about how to aggregate a value in SSIS. Let's say I have a task that simply exports the employees from the DB to a flat file. I also have to keep track of MAX(EmployeeId) so that the next time I do this, I start with all the new employees that have been added since the last time.
How/where do I handle that aggregate? I just want a way to get that value in a variable. My first instinct was to throw an aggregate on the data flow, but I don't think that's right. That will get executed once for every record in the source. That's not what I want. I could also just drop a SQL Task on the control flow and handle it that way, but technically speaking, that's an entirely different query. It's not actually based on the Employees in the data flow.
Maybe a better way to ask the question is how do I get a MAX value from a data source into a user variable.
January 23, 2010 at 2:18 am
Use an Execute SQL task, as you suggest. It can put the result of a single-valued T-SQL query directly into a variable (say MaxID).
Then use that returned value in your source query for the dataflow (pseudo-code: SELECT f1, f2, ... FROM EMPLOYEE WHERE ID <= User::MaxID). That way you know that things are consistent ...
...but, getting to the crux of the matter, how does knowing the MaxID now (for the current extract) help you when the job next runs?
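A minimal sketch of those two steps, using Python's sqlite3 module as a stand-in for the SSIS pieces (this is not SSIS itself). The Employee table, its columns, and the sample rows are assumptions for illustration. In a package with an OLE DB source, the ? placeholder would typically be mapped to User::MaxID.

```python
import sqlite3

def make_employee_db():
    """Build an in-memory stand-in for the source database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT)")
    conn.executemany(
        "INSERT INTO Employee (ID, Name) VALUES (?, ?)",
        [(1, "Ann"), (2, "Bob"), (3, "Cy")],
    )
    return conn

def capture_max_id(conn):
    """Step 1: the single-valued query whose result would land in User::MaxID."""
    return conn.execute("SELECT MAX(ID) FROM Employee").fetchone()[0]

def extract_bounded(conn, max_id):
    """Step 2: the source query, bounded by the value captured in step 1."""
    return conn.execute(
        "SELECT ID, Name FROM Employee WHERE ID <= ? ORDER BY ID", (max_id,)
    ).fetchall()
```

Because the extract is bounded by the captured value, a row inserted between step 1 and step 2 is left for the next run instead of being exported without being counted.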
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
January 23, 2010 at 12:28 pm
Thanks Phil, this is really not about employees at all. I just thought that might make my example a little easier to explain. This is really about exporting data to a vendor once per month. Each month, I have to make sure I don't send anything that was sent the previous month, so I keep track of the MAX record id every month and use that as a starting point the next month.
I'll give your example a try. So anything I put in the data flow gets executed once for each record. Do I understand that correctly?
January 23, 2010 at 12:37 pm
Correct.
January 23, 2010 at 2:08 pm
BSavoie (1/23/2010)
Thanks Phil, this is really not about employees at all. I just thought that might make my example a little easier to explain. This is really about exporting data to a vendor once per month. Each month, I have to make sure I don't send anything that was sent the previous month, so I keep track of the MAX record id every month and use that as a starting point the next month.
I'll give your example a try. So anything I put in the data flow gets executed once for each record. Do I understand that correctly?
I think you need to consider this a bit more. In your example, what happens if you send the data for Employee A last month and that employee is terminated this month? Since the record has already been sent, you won't be sending it again.
In other words, what about updates to the system that need to be updated in the downstream systems? How are you going to identify those?
Find the column that identifies the last updated date for that entity (or creation date). Then use that date to filter for any records that have been modified since the last time you extracted the data.
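A small sketch of that idea, with Python's sqlite3 as a stand-in for the real source. The ModifiedDate column, the Status column, and the sample rows are assumptions; your table may call the audit column something else.

```python
import sqlite3

def make_audit_db():
    """In-memory table with a hypothetical last-modified column (ISO-8601 text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Status TEXT, ModifiedDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO Employee (ID, Status, ModifiedDate) VALUES (?, ?, ?)",
        [
            (1, "Terminated", "2010-02-10 09:00:00"),  # sent in January, changed since
            (2, "Active", "2010-01-12 14:30:00"),      # sent in January, unchanged
            (3, "Active", "2010-02-03 11:15:00"),      # created in February
        ],
    )
    return conn

def changed_since(conn, last_extract):
    """Return rows created or modified after the previous extract time."""
    # ISO-8601 strings sort chronologically, so a plain comparison works here.
    return conn.execute(
        "SELECT ID, Status FROM Employee WHERE ModifiedDate > ? ORDER BY ID",
        (last_extract,),
    ).fetchall()
```

With a last extract time of 2010-01-31 23:59:59, this picks up both the terminated employee (ID 1) and the new one (ID 3). A filter on ID alone would only find ID 3.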
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
January 23, 2010 at 5:29 pm
Thanks Jeffrey. I was just using Employees as an example to make things easier. Actually, what I'm really sending is all the Guest Reservations for guests who have stayed at a hotel over the last month. So, it's not really a volatile entity like an employee. That was probably a bad example. Thanks for the feedback!
February 22, 2018 at 6:28 am
If it's as simple as a date range, why not use the following query? Or, if you're looking to count reservations over time, group by month.
SELECT *
FROM Reservations
WHERE ReservationDate BETWEEN @StartDate AND @EndDate
Or, for a rolling 12-month trend:
SELECT YEAR(ReservationDate) AS ResYear, MONTH(ReservationDate) AS ResMonth, COUNT(*) AS ResCount
FROM Reservations
WHERE ReservationDate > DATEADD(yy, -1, GETDATE())
GROUP BY YEAR(ReservationDate), MONTH(ReservationDate)
ORDER BY YEAR(ReservationDate), MONTH(ReservationDate)
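A runnable sketch of the rolling 12-month count, using Python's sqlite3 in place of SQL Server. SQLite's date() and strftime() functions stand in for the T-SQL date functions, and the sample rows and the as_of parameter are assumptions made so the result is repeatable. The point it illustrates is that the cutoff has to be a date one year before the reference date, so that it can be compared against ReservationDate.

```python
import sqlite3

def make_reservations_db():
    """In-memory Reservations table with hypothetical rows (dates as ISO text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Reservations (ID INTEGER PRIMARY KEY, ReservationDate TEXT)")
    conn.executemany(
        "INSERT INTO Reservations (ReservationDate) VALUES (?)",
        [("2016-12-31",), ("2017-02-22",), ("2017-03-01",), ("2017-03-15",), ("2018-01-10",)],
    )
    return conn

def monthly_counts(conn, as_of):
    """Count reservations per month in the year leading up to as_of."""
    return conn.execute(
        """
        SELECT CAST(strftime('%Y', ReservationDate) AS INTEGER) AS ResYear,
               CAST(strftime('%m', ReservationDate) AS INTEGER) AS ResMonth,
               COUNT(*) AS ResCount
        FROM Reservations
        WHERE ReservationDate > date(?, '-1 year')  -- cutoff is a date, not a number
        GROUP BY ResYear, ResMonth
        ORDER BY ResYear, ResMonth
        """,
        (as_of,),
    ).fetchall()
```

Passing the reference date in as a parameter (rather than calling the current date inside the query) also makes the export easy to re-run for a past month.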
February 22, 2018 at 6:42 am
Tim Curtin - Thursday, February 22, 2018 6:28 AM
If its as simple as a date range, why not put the following query? Or if you're looking to calc reservations over time, then group by month,
SELECT *
FROM Reservations
WHERE ReservationDate BETWEEN @StartDate AND @EndDate
Or, Rolling 12 months Trend.
SELECT YEAR(ReservationDate) AS ResYear, MONTH(ReservationDate) AS ResMonth, COUNT(*) As ResCount
FROM Reservations
WHERE ReservationDate > DATEDIFF(yy, -1,GETDATE())
GROUP BY YEAR(ReservationDate), Month(ReservationDate)
ORDER BY YEAR(ReservationDate), Month(ReservationDate)
Note that you are responding to a thread which is 8 years old 🙂