March 31, 2010 at 12:54 pm
OK experts, I need some help. I will try to explain this as best I can while also keeping it short.
We have a set of rows imported from one table to another. Easy so far, right? The records look something like this:
Account Meter Date Usage
1234 1234 02/01/10 500
1234 1234 01/01/10 600
1234 1234 12/01/09 400
So the data being brought over consists of the last 12 reads. More than one read can happen in a month, so this may not represent the last 12 months exactly.
Here is where the challenge comes in! In the destination there are month buckets for usage. The starting position for these buckets is the month of the last read, so in the example above we would fill bucket 2 with Feb. and bucket 1 with Jan. We would then wrap around to bucket 12 and fill it with December of the previous year, so it would look like this:
Account Meter BKT1 BKT2 ...........BKT12
1234 1234 600 500 400
Since more than one reading can exist in the same month, I cannot simply use the month as the bucket number. In the old DTS package this was accomplished by looping through variables to fill the buckets, skipping the output until all 12 readings had been read, and then writing the variables out to the destination. This works great and I have no problem with the way it was done in DTS.
My question for the experts, though, is twofold. First, is there an equivalent to DTSTransformStat_SkipInsert in SSIS? If there is, I can keep this in a script task, but three hours of Googling that command has not turned up an SSIS result.
Second, if this cannot be done in a script task, how in the world can I accomplish it? I have been wracking my brain on this for two days now and have tried many failed approaches.
The only working solution I have come up with is to handle the entire operation in an SSIS script, reading all data into an array and then outputting after each account. This, however, is horribly ugly, and performance is terrible.
Dan
If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.
March 31, 2010 at 3:28 pm
Your example data shows 1 entry per month. Can you give me an example of what your data would look like when there are multiple entries in a given month? Please include an example of how your final data set should look with the new sample data showing the multi-read scenario.
March 31, 2010 at 3:42 pm
This is what the source data could look like. Notice there are two readings each in February and June, and no reading in September.
010012080 00068276 2010-02-26 00:00:00.000 440.00000
010012080 00068276 2010-02-01 00:00:00.000 650.00000
010012080 00068276 2010-01-04 00:00:00.000 589.00000
010012080 00068276 2009-12-01 00:00:00.000 713.00000
010012080 00068276 2009-11-02 00:00:00.000 996.00000
010012080 00068276 2009-10-01 00:00:00.000 949.00000
010012080 00068276 2009-08-28 00:00:00.000 1145.00000
010012080 00068276 2009-07-29 00:00:00.000 882.00000
010012080 00068276 2009-06-30 00:00:00.000 761.00000
010012080 00068276 2009-06-01 00:00:00.000 472.00000
010012080 00068276 2009-05-01 00:00:00.000 577.00000
010012080 00068276 2009-04-01 00:00:00.000 524.00000
This source would need to go to an output that looks like this:
account meter bkt1 bkt2 bkt3 bkt4 bkt5 bkt6 bkt7 bkt8 bkt9 bkt10 bkt11 bkt12
010012080 00068276 650 440 524 577 472 761 882 1145 949 996 713 589
Dan
If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.
March 31, 2010 at 4:06 pm
Well, I won't get into normalization bashing here, but in my opinion you should store the data in some sort of 'Readings' table and create a view to pivot it out into the 12 buckets.
If you plan to keep going the route you are going and store the data in a pivoted table, you could use the example statement I've created below inside the Source adapter of your data flow.
DECLARE @Table TABLE (Account varchar(25) , Meter varchar(25), ReadDate datetime, Usage int)
INSERT INTO @Table
SELECT '010012080', '00068276', '2010-02-26 00:00:00.000', 440.00000 UNION ALL
SELECT '010012080', '00068276', '2010-02-01 00:00:00.000', 650.00000 UNION ALL
SELECT '010012080', '00068276', '2010-01-04 00:00:00.000', 589.00000 UNION ALL
SELECT '010012080', '00068276', '2009-12-01 00:00:00.000', 713.00000 UNION ALL
SELECT '010012080', '00068276', '2009-11-02 00:00:00.000', 996.00000 UNION ALL
SELECT '010012080', '00068276', '2009-10-01 00:00:00.000', 949.00000 UNION ALL
SELECT '010012080', '00068276', '2009-08-28 00:00:00.000', 1145.00000 UNION ALL
SELECT '010012080', '00068276', '2009-07-29 00:00:00.000', 882.00000 UNION ALL
SELECT '010012080', '00068276', '2009-06-30 00:00:00.000', 761.00000 UNION ALL
SELECT '010012080', '00068276', '2009-06-01 00:00:00.000', 472.00000 UNION ALL
SELECT '010012080', '00068276', '2009-05-01 00:00:00.000', 577.00000 UNION ALL
SELECT '010012080', '00068276', '2009-04-01 00:00:00.000', 524.00000
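-- Pivot into 12 buckets: ROW_NUMBER() orders the reads oldest to newest,
-- so bucket 1 holds the oldest read.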
SELECT Account,
Meter,
SUM(CASE BucketNumber WHEN 1 THEN Usage END) as BKT1,
SUM(CASE BucketNumber WHEN 2 THEN Usage END) as BKT2,
SUM(CASE BucketNumber WHEN 3 THEN Usage END) as BKT3,
SUM(CASE BucketNumber WHEN 4 THEN Usage END) as BKT4,
SUM(CASE BucketNumber WHEN 5 THEN Usage END) as BKT5,
SUM(CASE BucketNumber WHEN 6 THEN Usage END) as BKT6,
SUM(CASE BucketNumber WHEN 7 THEN Usage END) as BKT7,
SUM(CASE BucketNumber WHEN 8 THEN Usage END) as BKT8,
SUM(CASE BucketNumber WHEN 9 THEN Usage END) as BKT9,
SUM(CASE BucketNumber WHEN 10 THEN Usage END) as BKT10,
SUM(CASE BucketNumber WHEN 11 THEN Usage END) as BKT11,
SUM(CASE BucketNumber WHEN 12 THEN Usage END) as BKT12
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY Account, Meter ORDER BY ReadDate) as BucketNumber,
t.*
FROM @Table t
) t
GROUP BY Account, Meter
April 1, 2010 at 7:22 am
John,
Thanks for the script. Unfortunately it will not work for what I need, since the bucket starting position is determined by the month of the last reading. So in the example I gave, the first reading should be the one on 2/26/10, and that would go in bucket 2. From there you would count backwards, wrapping around to 12 after bucket 1.
I completely understand what you are saying about changing the table structure, but unfortunately the destination table is on a server I do not manage. I cannot change how or why they set it up the way they did; all I can do is try to find a way to live with it. We have quite a few legacy systems that I am dealing with similar issues on.
I do think I have a working solution now, though. I am using a script task to basically output the row using variables and a loop. Each pass I fill the appropriate variable, and if the record is not complete I pass a column out as a flag. The next step is a conditional split in which the record is thrown out if the flag is not set to true. This allows me to build the record over 12 reads and then output the completed record to the destination.
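To put that counting rule into T-SQL terms, here is a minimal sketch of just the wrap-around arithmetic (the month and offset values below are assumptions taken from the sample data, not from any posted script):

-- Minimal sketch of the wrap-around rule: start at the month of the most
-- recent reading and count backwards, wrapping from bucket 1 around to 12.
DECLARE @LastMonth int, @StepsBack int
SET @LastMonth = 2   -- most recent reading is 2/26/10, so month 2
SET @StepsBack = 2   -- readings earlier than the most recent (here 1/04/10)
SELECT ((@LastMonth - @StepsBack - 1) % 12 + 12) % 12 + 1 AS Bucket
-- Returns 12: 2/26/10 -> bucket 2, 2/01/10 -> bucket 1, 1/04/10 -> bucket 12

The double modulo keeps the result in 1 through 12 even when the subtraction goes negative, which is exactly the wrap from bucket 1 back around to bucket 12.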
Dan
If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.
April 1, 2010 at 10:00 am
Can you just swap the ordering around? This seems much simpler:
DECLARE @Table TABLE (Account varchar(25) , Meter varchar(25), ReadDate datetime, Usage int)
INSERT INTO @Table
SELECT '010012080', '00068276', '2010-02-26 00:00:00.000', 440.00000 UNION ALL
SELECT '010012080', '00068276', '2010-02-01 00:00:00.000', 650.00000 UNION ALL
SELECT '010012080', '00068276', '2010-01-04 00:00:00.000', 589.00000 UNION ALL
SELECT '010012080', '00068276', '2009-12-01 00:00:00.000', 713.00000 UNION ALL
SELECT '010012080', '00068276', '2009-11-02 00:00:00.000', 996.00000 UNION ALL
SELECT '010012080', '00068276', '2009-10-01 00:00:00.000', 949.00000 UNION ALL
SELECT '010012080', '00068276', '2009-08-28 00:00:00.000', 1145.00000 UNION ALL
SELECT '010012080', '00068276', '2009-07-29 00:00:00.000', 882.00000 UNION ALL
SELECT '010012080', '00068276', '2009-06-30 00:00:00.000', 761.00000 UNION ALL
SELECT '010012080', '00068276', '2009-06-01 00:00:00.000', 472.00000 UNION ALL
SELECT '010012080', '00068276', '2009-05-01 00:00:00.000', 577.00000 UNION ALL
SELECT '010012080', '00068276', '2009-04-01 00:00:00.000', 524.00000
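-- Same pivot, rotated two positions so the most recent reading lands in bucket 2.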
SELECT Account,
Meter,
SUM(CASE BucketNumber WHEN 11 THEN Usage END) as BKT1,
SUM(CASE BucketNumber WHEN 12 THEN Usage END) as BKT2,
SUM(CASE BucketNumber WHEN 1 THEN Usage END) as BKT3,
SUM(CASE BucketNumber WHEN 2 THEN Usage END) as BKT4,
SUM(CASE BucketNumber WHEN 3 THEN Usage END) as BKT5,
SUM(CASE BucketNumber WHEN 4 THEN Usage END) as BKT6,
SUM(CASE BucketNumber WHEN 5 THEN Usage END) as BKT7,
SUM(CASE BucketNumber WHEN 6 THEN Usage END) as BKT8,
SUM(CASE BucketNumber WHEN 7 THEN Usage END) as BKT9,
SUM(CASE BucketNumber WHEN 8 THEN Usage END) as BKT10,
SUM(CASE BucketNumber WHEN 9 THEN Usage END) as BKT11,
SUM(CASE BucketNumber WHEN 10 THEN Usage END) as BKT12
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY Account, Meter ORDER BY ReadDate) as BucketNumber,
t.*
FROM @Table t
) t
GROUP BY Account, Meter
April 1, 2010 at 10:03 am
The problem is that the starting bucket is a moving target. For example, since today is the first of April, the starting bucket for any account read today would be bucket 4, but some of the accounts will start in bucket 3 or bucket 1. Each account can have a different starting point within the same dataset.
Dan
If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.
April 1, 2010 at 10:19 am
Dan.Humphries (4/1/2010)
John, thanks for the script. Unfortunately it will not work for what I need, since the bucket starting position is determined by the month of the last reading. So in the example I gave, the first reading should be the one on 2/26/10, and that would go in bucket 2. From there you would count backwards, wrapping around to 12 after bucket 1.
Just curious: your sample data shows two readings in Feb. and none in March. Wouldn't the second Feb. reading go in bucket 3?
April 1, 2010 at 10:26 am
Assuming any form of logic applied to the destination DB, I would tend to agree with you, but no: in fact the starting bucket number is the month of the first reading in the list (the most recent one). So the last reading in Feb. gets the starting bucket of 2, and the earlier Feb. reading goes in bucket 1. Jan. actually ends up in bucket 12, if you can follow that logic.
Dan
If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.
April 1, 2010 at 11:33 am
Well, this works to catch the moving target. You may stick with what you have, but this would still work inside an OLE DB Source, provided the source DB is a SQL Server system.
DECLARE @Table TABLE (Account varchar(25) , Meter varchar(25), ReadDate datetime, Usage int)
INSERT INTO @Table
SELECT '010012080', '00068276', '2010-02-26 00:00:00.000', 440.00000 UNION ALL
SELECT '010012080', '00068276', '2010-02-01 00:00:00.000', 650.00000 UNION ALL
SELECT '010012080', '00068276', '2010-01-04 00:00:00.000', 589.00000 UNION ALL
SELECT '010012080', '00068276', '2009-12-01 00:00:00.000', 713.00000 UNION ALL
SELECT '010012080', '00068276', '2009-11-02 00:00:00.000', 996.00000 UNION ALL
SELECT '010012080', '00068276', '2009-10-01 00:00:00.000', 949.00000 UNION ALL
SELECT '010012080', '00068276', '2009-08-28 00:00:00.000', 1145.00000 UNION ALL
SELECT '010012080', '00068276', '2009-07-29 00:00:00.000', 882.00000 UNION ALL
SELECT '010012080', '00068276', '2009-06-30 00:00:00.000', 761.00000 UNION ALL
SELECT '010012080', '00068276', '2009-06-01 00:00:00.000', 472.00000 UNION ALL
SELECT '010012080', '00068276', '2009-05-01 00:00:00.000', 577.00000 UNION ALL
SELECT '010012080', '00068276', '2009-04-01 00:00:00.000', 524.00000
DECLARE @CurrentDate datetime,
@LastReadingDate datetime,
@LastReadingBucket int,
@LastReadingRawOrder int,
@ShiftValue int
SET @CurrentDate = '2010-03-31'
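-- Find the most recent reading on or before @CurrentDate; its month is the starting bucket.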
SELECT @LastReadingDate = MAX(ReadDate),
@LastReadingBucket = MONTH(MAX(ReadDate))
FROM @Table
WHERE ReadDate <= @CurrentDate
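-- Locate that reading's position when rows are ordered by month and day (year deliberately ignored).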
SELECT @LastReadingRawOrder = RawOrder
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY Account, Meter ORDER BY MONTH(ReadDate), DAY(ReadDate)) as RawOrder,
t.*
FROM @Table t
) t
WHERE ReadDate = @LastReadingDate
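-- The difference between that raw position and the month number is the rotation to apply.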
SET @ShiftValue = @LastReadingRawOrder - @LastReadingBucket
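-- Pivot into 12 buckets, folding bucket 0 back around to 12.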
SELECT Account,
Meter,
SUM(CASE BucketNumber WHEN 1 THEN Usage END) as BKT1,
SUM(CASE BucketNumber WHEN 2 THEN Usage END) as BKT2,
SUM(CASE BucketNumber WHEN 3 THEN Usage END) as BKT3,
SUM(CASE BucketNumber WHEN 4 THEN Usage END) as BKT4,
SUM(CASE BucketNumber WHEN 5 THEN Usage END) as BKT5,
SUM(CASE BucketNumber WHEN 6 THEN Usage END) as BKT6,
SUM(CASE BucketNumber WHEN 7 THEN Usage END) as BKT7,
SUM(CASE BucketNumber WHEN 8 THEN Usage END) as BKT8,
SUM(CASE BucketNumber WHEN 9 THEN Usage END) as BKT9,
SUM(CASE BucketNumber WHEN 10 THEN Usage END) as BKT10,
SUM(CASE BucketNumber WHEN 11 THEN Usage END) as BKT11,
SUM(CASE BucketNumber WHEN 12 THEN Usage END) as BKT12
FROM (
SELECT CASE BucketNumber WHEN 0 THEN 12 ELSE BucketNumber END as BucketNumber,
t.Account,
t.Meter,
t.ReadDate,
t.Usage
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY Account, Meter ORDER BY MONTH(ReadDate), DAY(ReadDate)) as RawOrder,
t.*,
ROW_NUMBER() OVER(PARTITION BY Account, Meter ORDER BY MONTH(ReadDate), DAY(ReadDate)) - @ShiftValue as BucketNumber
FROM @Table t
) t
) t
GROUP BY Account, Meter
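One caveat: @ShiftValue is computed once for the whole set, and the CASE only folds bucket 0 back to 12, so a source where each account has its own starting month would need a per-account shift. A sketch of the same idea with a windowed MAX and a full modulo wrap (an assumption, run against the same @Table; only the BucketNumber derivation changes, and the 12-bucket pivot above stays as-is) could look like this:

-- Sketch: per-account/meter anchor month and a full modulo wrap, so any
-- shift lands in buckets 1 through 12. Feed this to the same 12-bucket pivot.
SELECT Account, Meter, ReadDate, Usage,
       ((MONTH(MAX(ReadDate) OVER(PARTITION BY Account, Meter))
         - ROW_NUMBER() OVER(PARTITION BY Account, Meter ORDER BY ReadDate DESC)
        ) % 12 + 12) % 12 + 1 AS BucketNumber
FROM @Table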
April 1, 2010 at 12:06 pm
That, I must say, is pure genius! I will definitely give it a try.
The only potential hang-up I can think of is that the source is SQL Server, but it is SQL Server 2000. Does 2000 support ROW_NUMBER? If not, I can always run it on a 2005 box with a linked server to the 2000 box.
Either way, I love it. Huge thank you!! That should run in about a tenth of the time my convoluted way currently takes.
Dan
If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.
April 1, 2010 at 12:20 pm
April 1, 2010 at 12:23 pm
No, it is also 2000, but the SSIS package will definitely be running on a 2005 box, so my thought is to just use the 2005 box as the source with a linked server set up. That would allow me to use the ROW_NUMBER function while pulling from the 2000 box.
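A sketch of that idea, with placeholder names for the linked server and source table (SQL2000SRV and MeterDB.dbo.Readings are assumptions): ROW_NUMBER() is evaluated on the 2005 side, so the 2000 box only has to return raw rows.

-- Hypothetical linked-server pull; SQL2000SRV and MeterDB.dbo.Readings are
-- placeholder names. ROW_NUMBER() runs locally on the 2005 instance.
SELECT ROW_NUMBER() OVER(PARTITION BY r.Account, r.Meter ORDER BY r.ReadDate) as BucketNumber,
       r.Account, r.Meter, r.ReadDate, r.Usage
FROM SQL2000SRV.MeterDB.dbo.Readings as r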
Dan
If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.
April 1, 2010 at 12:28 pm
Yeah, that may work if you set up a DB on your SSIS server to use for this. You could also do this in two data flows: use the first to get the raw data and insert it into a staging table on your SSIS box (assuming you have the DB engine installed), and use the second data flow to grab the rows using the technique we've discussed.
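For the staging route, a minimal table on the SSIS box (names are placeholders) could be as simple as:

-- Hypothetical staging table: loaded by the first data flow, then read back
-- and pivoted by the second.
CREATE TABLE dbo.ReadingsStaging (
    Account  varchar(25)   NOT NULL,
    Meter    varchar(25)   NOT NULL,
    ReadDate datetime      NOT NULL,
    Usage    decimal(12,5) NOT NULL
)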
April 1, 2010 at 12:30 pm
Yes, we do have active databases on the Integration Services server boxes.
Thank you so much again.
Dan
If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.