December 27, 2018 at 8:45 am
Performing transformation in the SELECT is OK, but staging the data to a table and then performing UPDATE(s) is a waste of time and disk space.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
December 27, 2018 at 8:50 am
below86 - Thursday, December 27, 2018 8:28 AM
Jeff Moden - Friday, December 21, 2018 2:06 PM
Of course, there are those of us that don't use SSIS for any of this. 😀
Sounds like heaven, Jeff. 🙂 Got any openings? Maybe I could just stop by for a day or two, maybe a week, to try and soak up some of that knowledge. 🙂 I don't mind using SSIS for Extract and Load, but using the tools in SSIS to do the Transform just isn't something I want to do. Now I'll set up an 'Execute SQL Task' and put all the SQL in it to do any transformation.
Many years ago, at my prior job, we used a different ETL tool that had a lot of the Transformation built into it. When we switched to SSIS, it was a huge pain to get all of that moved into SSIS. Then, several years later, we had to upgrade to a newer version of SSIS, and not all of the transformation converted over; another huge pain. I guess that's why I'm hesitant to build any transformation in SSIS. We are using 2008 SSIS now, and there has been a project sitting out there for over two years for us to move to 2016 SSIS. Who knows when that will happen and what pains will come with it.
Thanks to everyone for taking the time to read and answer my question.
I tend to use SSIS only as a means to load the data itself and to control the flow and dependencies; all my packages are built automatically using Biml/homegrown C# (using EzApi or the SSIS object model directly).
All transforms are done in T-SQL, either through direct SQL or stored procs. The few exceptions are transforms done in C# for particular cases (COBOL-sourced or multi-record input files, or even capitalization of proper names, which is not as easily done in SQL).
December 27, 2018 at 9:15 am
Eric M Russell - Thursday, December 27, 2018 8:45 AM
Performing transformation in the SELECT is OK, but staging the data to a table and then performing UPDATE(s) is a waste of time and disk space.
Last I checked, disk space is cheap. And I don't UPDATE the data: the data is loaded to a work table, usually with all fields defined as VARCHAR(500), sometimes bigger. Then SQL is used to cleanse, validate, and transform the data before putting it into the final table.
We take a pipe-delimited text file with thousands, if not hundreds of thousands, of rows and put that into the work table, then validate. Doing the transform in SQL makes it so much easier to find a field/record with, say, an invalid date than using SSIS to transform and having that data flow step blow up when trying to convert the data. Finding the step it blows up in is easy; finding the record, or maybe even the field, can be a huge pain.
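A minimal sketch of that work-table pattern (table and column names here are hypothetical, and TRY_CONVERT assumes SQL Server 2012+):

-- Everything lands as wide VARCHAR first, so the load itself can't fail on bad data
CREATE TABLE #WORK_FILE
(
    POLICY_NBR VARCHAR(500) NULL,
    EFF_DATE   VARCHAR(500) NULL  -- arrives as MMDDYYYY text
);

-- Validation pass: list the exact rows (and field) that won't convert
SELECT W.POLICY_NBR, W.EFF_DATE
FROM #WORK_FILE AS W
WHERE TRY_CONVERT(DATE, CONCAT(RIGHT(W.EFF_DATE, 4), LEFT(W.EFF_DATE, 4)), 112) IS NULL;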
IMHO, many ways around the barn. 🙂
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
December 27, 2018 at 10:58 am
below86 - Thursday, December 27, 2018 9:15 AM
Eric M Russell - Thursday, December 27, 2018 8:45 AM
Performing transformation in the SELECT is OK, but staging the data to a table and then performing UPDATE(s) is a waste of time and disk space.
Last I checked, disk space is cheap. And I don't UPDATE the data: the data is loaded to a work table, usually with all fields defined as VARCHAR(500), sometimes bigger. Then SQL is used to cleanse, validate, and transform the data before putting it into the final table.
We take a pipe-delimited text file with thousands, if not hundreds of thousands, of rows and put that into the work table, then validate. Doing the transform in SQL makes it so much easier to find a field/record with, say, an invalid date than using SSIS to transform and having that data flow step blow up when trying to convert the data. Finding the step it blows up in is easy; finding the record, or maybe even the field, can be a huge pain.
IMHO, many ways around the barn. 🙂
In SSIS, you can use a Data Conversion transformation for this. You can configure its Error Output to redirect failed rows into a separate staging table, while the remaining valid rows are inserted into the destination table.
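For comparison, the same redirect idea expressed in set-based T-SQL might look like this (hypothetical table names; TRY_CONVERT assumes SQL Server 2012+):

-- Valid rows flow to the destination...
INSERT INTO dbo.FINAL_TABLE (EFF_DATE)
SELECT TRY_CONVERT(DATE, CONCAT(RIGHT(W.EFF_DATE, 4), LEFT(W.EFF_DATE, 4)), 112)
FROM dbo.WORK_FILE AS W
WHERE TRY_CONVERT(DATE, CONCAT(RIGHT(W.EFF_DATE, 4), LEFT(W.EFF_DATE, 4)), 112) IS NOT NULL;

-- ...and failed rows are redirected to an error table instead
INSERT INTO dbo.ERROR_ROWS (EFF_DATE)
SELECT W.EFF_DATE
FROM dbo.WORK_FILE AS W
WHERE TRY_CONVERT(DATE, CONCAT(RIGHT(W.EFF_DATE, 4), LEFT(W.EFF_DATE, 4)), 112) IS NULL;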
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
December 27, 2018 at 1:52 pm
To paraphrase your quote, Eric: "The SSIS universe is complicated, but your SSIS life is only as complicated as you choose it to be."
I choose an uncomplicated SSIS life. 😀
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
December 27, 2018 at 2:25 pm
below86 - Thursday, December 27, 2018 8:28 AM
Jeff Moden - Friday, December 21, 2018 2:06 PM
Of course, there are those of us that don't use SSIS for any of this. 😀
Sounds like heaven, Jeff. 🙂 Got any openings? Maybe I could just stop by for a day or two, maybe a week, to try and soak up some of that knowledge. 🙂 I don't mind using SSIS for Extract and Load, but using the tools in SSIS to do the Transform just isn't something I want to do. Now I'll set up an 'Execute SQL Task' and put all the SQL in it to do any transformation.
Many years ago, at my prior job, we used a different ETL tool that had a lot of the Transformation built into it. When we switched to SSIS, it was a huge pain to get all of that moved into SSIS. Then, several years later, we had to upgrade to a newer version of SSIS, and not all of the transformation converted over; another huge pain. I guess that's why I'm hesitant to build any transformation in SSIS. We are using 2008 SSIS now, and there has been a project sitting out there for over two years for us to move to 2016 SSIS. Who knows when that will happen and what pains will come with it.
Thanks to everyone for taking the time to read and answer my question.
😀 It would be my pleasure but most "experts" simply won't allow for the methods I use because they've been brainwashed into thinking those methods are a security risk. Someday I may just have to provide them with a user name and password as a test to give them the opportunity to prove to themselves that they should shut up. :hehe:
--Jeff Moden
Change is inevitable... Change for the better is not.
February 7, 2019 at 2:52 pm
Jeff Moden - Thursday, December 20, 2018 10:36 PM
Since you're using SQL Server 2012, DATEFROMPARTS with substrings to isolate the parts might be the fastest.
Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that.
The answer is always "No" unless you ask. If they do say "No", then explain to them how it would help them seem much more intelligent to potential new customers because they'd be using the coveted ISO 8601 format that is recognized worldwide. 😉
Well, I asked, and it was still a No.
Exact response was: "As for as the question regarding the date format, since this is a 'standard' spec used with numerous clients, we cannot modify any data elements."
I just wonder if their other clients are doing the same conversion I'll have to do. Oh well, you win some, you lose some.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
February 7, 2019 at 8:01 pm
below86 - Thursday, February 7, 2019 2:52 PM
Jeff Moden - Thursday, December 20, 2018 10:36 PM
Since you're using SQL Server 2012, DATEFROMPARTS with substrings to isolate the parts might be the fastest.
Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that.
The answer is always "No" unless you ask. If they do say "No", then explain to them how it would help them seem much more intelligent to potential new customers because they'd be using the coveted ISO 8601 format that is recognized worldwide. 😉
Well, I asked, and it was still a No.
Exact response was: "As for as the question regarding the date format, since this is a 'standard' spec used with numerous clients, we cannot modify any data elements."
I just wonder if their other clients are doing the same conversion I'll have to do. Oh well, you win some, you lose some.
Man... tough row to hoe. They perpetuate a problem by claiming they have a lot of users that use the format, when the only reason the users adopted it is likely that they were given no choice to begin with. 😀 I got the same thing when the company I was working for was importing files provided by Double-Click.net and Level3. Man, what a train wreck for data. Double-Click.net was sending us data that looked like it had been exported from Excel, and Level3 sent formatted Excel files. I'm sure the target audience for both companies was spreadsheet users, but we were processing thousands of files per day.
We got through it, though. The original system took 45 minutes just to get ONE file ready for import (and, obviously, we were falling seriously behind). When we got done with it (all done in T-SQL instead of the junk the original system was running it through), we were importing, validating, and distributing the data at a rate of 8-10 files every 2 minutes.
That does bring me to a related subject... the original system used DTS (they were "stuck" on SQL Server 2000) for everything. They wanted to keep the part that found the files and did the file logging, so we scrapped everything else and just had DTS call stored procedures to do everything, including the initial import, the validation, transformations, and final distribution of data. You could do the same if you needed to.
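A rough sketch of what one of those procedures could look like, for anyone wanting to try the same split (the names are made up for illustration, and it assumes a pre-built dbo.STAGE_RAW staging table; the package just EXECs the proc):

CREATE PROCEDURE dbo.Import_PipeDelimited_File
    @FilePath NVARCHAR(260)
AS
BEGIN
    -- BULK INSERT requires a literal file path, hence the dynamic SQL
    DECLARE @sql NVARCHAR(MAX) = N'
        BULK INSERT dbo.STAGE_RAW
        FROM ''' + @FilePath + N'''
        WITH (FIELDTERMINATOR = ''|'', ROWTERMINATOR = ''\n'');';
    EXEC (@sql);
    -- validation, transformation, and distribution would follow,
    -- either here or in separate procedures called by the same package
END;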
--Jeff Moden
Change is inevitable... Change for the better is not.
February 13, 2019 at 6:44 am
Jeff Moden - Thursday, December 27, 2018 2:25 PM😀 It would be my pleasure but most "experts" simply won't allow for the methods I use because they've been brainwashed into thinking those methods are a security risk. Someday I may just have to provide them with a user name and password as a test to give them the opportunity to prove to themselves that they should shut up. :hehe:
I can't begin to tell you how often I would have liked to be able to exercise the option to let someone prove to themselves that they need to shut up... Shame that the universe simply would not provide me the time, and still allow me to get paid.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
February 22, 2019 at 2:33 pm
Just getting back to working with this, so I thought I would test this out again.
My first run on our Prod server had these results:
I then ran it again on Prod with these results (I had changed TTDC_CHAR from CHAR(8) to CHAR(500)):
I then ran it on our Dev server (tried a different version of DATEFROMPARTS):
I then ran it again on Dev, with a change (DECLARE @CHAR_BUCKET CHAR(500) = '';) and a different version of CONCAT:
Results are all over the place; I'm not really sure DATEFROMPARTS is still the best option. Here is the SQL I tested with:
DECLARE @SAMPLE_SIZE INT = 1000000;
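-- Work table: TTDC_CHAR holds each test date as an MMDDYYYY string (space-padded once the column became CHAR(500))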
CREATE TABLE #TBL_NON_ISO_DATE_CONVERTION
(
TTDC_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_NON_ISO_DATE_CONVERTION_TTDC_ID PRIMARY KEY CLUSTERED,
TTDC_CHAR CHAR(500) NOT NULL
);
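-- Build @SAMPLE_SIZE rows of random dates; style 112 gives YYYYMMDD, which the insert below rearranges to MMDDYYYY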
WITH T(N) AS
( SELECT N
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N)),
NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9),
DATE_STRINGS AS
(
SELECT CONVERT(VARCHAR(8),DATEADD(DAY,ABS(CHECKSUM(NEWID())) % 100000,0),112) AS STR_DATE
FROM NUMS NM
)
INSERT INTO #TBL_NON_ISO_DATE_CONVERTION(TTDC_CHAR)
SELECT CONCAT(RIGHT(DS.STR_DATE,4),LEFT(DS.STR_DATE,4))
FROM DATE_STRINGS AS DS
;
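-- Each test is bracketed by a pair of identical timer rows; duration per label = MAX(T_TS) - MIN(T_TS)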
DECLARE @timer TABLE (T_TEXT VARCHAR(30) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT(SYSDATETIME()));
DECLARE @INT_BUCKET INT = 0;
DECLARE @TINYINT_BUCKET TINYINT = 0;
DECLARE @DATE_BUCKET DATE = CONVERT(DATE, GETDATE(),0);
DECLARE @CHAR_BUCKET CHAR(500) = '';
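-- Results are assigned to throwaway variables so client display time doesn't skew the timings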
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN');
SELECT @CHAR_BUCKET = TC.TTDC_CHAR
FROM #TBL_NON_ISO_DATE_CONVERTION AS TC;
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN');
INSERT INTO @timer(T_TEXT) VALUES('CAST STUFF STUFF');
SELECT @DATE_BUCKET = CAST(STUFF(STUFF(TC.TTDC_CHAR, 3, 0, '/'), 6, 0, '/') AS DATE)
FROM #TBL_NON_ISO_DATE_CONVERTION AS TC;
INSERT INTO @timer(T_TEXT) VALUES('CAST STUFF STUFF');
INSERT INTO @timer(T_TEXT) VALUES('CAST RIGHT LEFT TRIM');
SELECT @DATE_BUCKET = CAST(RIGHT(RTRIM(TC.TTDC_CHAR), 4) + LEFT(LTRIM(TC.TTDC_CHAR), 4) AS DATE)
FROM #TBL_NON_ISO_DATE_CONVERTION AS TC;
INSERT INTO @timer(T_TEXT) VALUES('CAST RIGHT LEFT TRIM');
INSERT INTO @timer(T_TEXT) VALUES('CONCAT RIGHT LEFT');
-- RTRIM matters here: with a space-padded CHAR(500), RIGHT() alone returns blanks
SELECT @DATE_BUCKET = CONVERT(DATE,CONCAT(RIGHT(RTRIM(TC.TTDC_CHAR),4),LEFT(TC.TTDC_CHAR,4)),112)
FROM #TBL_NON_ISO_DATE_CONVERTION AS TC;
INSERT INTO @timer(T_TEXT) VALUES('CONCAT RIGHT LEFT');
INSERT INTO @timer(T_TEXT) VALUES('CONCAT RIGHT LEFT2');
SELECT @DATE_BUCKET = CONCAT(RIGHT(RTRIM(TC.TTDC_CHAR),4),LEFT(TC.TTDC_CHAR,4)) -- relies on implicit string-to-DATE conversion; RTRIM again for the padding
FROM #TBL_NON_ISO_DATE_CONVERTION AS TC;
INSERT INTO @timer(T_TEXT) VALUES('CONCAT RIGHT LEFT2');
INSERT INTO @timer(T_TEXT) VALUES('DATEFROMPARTS');
SELECT @DATE_BUCKET = DATEFROMPARTS(
CONVERT(INT,SUBSTRING(TC.TTDC_CHAR,5,4),0)
,CONVERT(INT,SUBSTRING(TC.TTDC_CHAR,1,2),0)
,CONVERT(INT,SUBSTRING(TC.TTDC_CHAR,3,2),0)
)
FROM #TBL_NON_ISO_DATE_CONVERTION AS TC;
INSERT INTO @timer(T_TEXT) VALUES('DATEFROMPARTS');
INSERT INTO @timer(T_TEXT) VALUES('DATEFROMPARTS2');
SELECT @DATE_BUCKET = DATEFROMPARTS(
SUBSTRING(TC.TTDC_CHAR,5,4)
,SUBSTRING(TC.TTDC_CHAR,1,2)
,SUBSTRING(TC.TTDC_CHAR,3,2)
)
FROM #TBL_NON_ISO_DATE_CONVERTION AS TC;
INSERT INTO @timer(T_TEXT) VALUES('DATEFROMPARTS2');
INSERT INTO @timer(T_TEXT) VALUES('Substring');
SELECT @DATE_BUCKET = CONCAT(SUBSTRING(TC.TTDC_CHAR,5,4), SUBSTRING(TC.TTDC_CHAR,1,4))
FROM #TBL_NON_ISO_DATE_CONVERTION AS TC;
INSERT INTO @timer(T_TEXT) VALUES('Substring');
SELECT T.T_TEXT,
DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
FROM @timer T
GROUP BY T.T_TEXT
ORDER BY DURATION;
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
February 25, 2019 at 1:54 pm
New week, new results. I ran the SQL on our Dev and Prod servers. At this point 'DATEFROMPARTS2' is my winner, so I'll start with that and see what happens.
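For reference, the winning 'DATEFROMPARTS2' pattern with a validity guard might look something like this in the final load (a sketch only: DATEFROMPARTS throws on out-of-range or non-numeric parts, so TRY_CONVERT screens the bad rows to NULL first):

SELECT CASE
           WHEN TRY_CONVERT(DATE, CONCAT(SUBSTRING(TC.TTDC_CHAR,5,4),
                                         SUBSTRING(TC.TTDC_CHAR,1,4)), 112) IS NULL
           THEN NULL
           ELSE DATEFROMPARTS( SUBSTRING(TC.TTDC_CHAR,5,4)
                              ,SUBSTRING(TC.TTDC_CHAR,1,2)
                              ,SUBSTRING(TC.TTDC_CHAR,3,2))
       END AS CONVERTED_DATE
FROM #TBL_NON_ISO_DATE_CONVERTION AS TC;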
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.