January 4, 2011 at 1:39 pm
I've got a flat file that was intended to be loaded into SAS, but we don't have SAS and need to analyze it somehow, so our first stab will be SSAS (edit: notice the difference, SAS vs. SSAS). Since the data is already pivoted, it seems ripe for loading into a cube.
As far as I can tell,
1. SSAS has no limit on the number of columns a table can have (unlike MSSQL's 1024-column limit).
2. SSIS appears to be the only way to load flat file data directly into SSAS.
3. I don't understand exactly what the Data Mining Model Training task in SSIS does, or whether I should use it.
Questions I have are:
(1) Do my assumptions make sense? I've not used SSAS specifically and my previous experiences with SSIS were not that great.
(2) What is the right way to load flat file data into a cube? Do I have to do some workaround, like first loading ~800 columns into each of two SQL Server tables and then building the cube off those?
January 5, 2011 at 3:24 am
First of all, forget about the data mining task. It has nothing to do with your requirements.
You must first load the data from the flat file into a relational database (preferably SQL Server, of course), since you cannot load data directly into SSAS. You do this with SSIS, and preferably you do some data cleansing in your SSIS package as well.
Then create an SSAS project and add a data source pointing to your database. In the data source view you can select the tables and columns that you want. After this you can create your dimensions and cubes.
So, better crank up your SSIS and SSAS skills 🙂
If that seems like a hassle since your data has already been pivoted, you can try loading it directly into Excel and doing your analysis there.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 13, 2011 at 2:06 pm
Thanks for the data mining task tip.
I ended up being able to avoid SSIS entirely, and I didn't see any usefulness in it for this task.
I just did BULK INSERT with the filename and no format file, and once the data was loaded into a staging table, I used a Tally Table and a pivot to split the columns. This gave me precise control and the ability to pivot the data so that ~800 columns were in one table and ~800 columns were in another, both under the 1024-column limit. From there I could normalize to my heart's content by unpivoting both tables.
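For anyone who lands on this later, here is a rough sketch of the approach. The file path, table names, and column names are made up for illustration, the column lists are elided (the real file had ~1600 columns), and it assumes a numbers table dbo.Tally(N) with enough rows to cover the longest line:

-- Stage each raw line whole: '\0' as the field terminator stops BULK INSERT
-- from splitting on the real delimiter, so the entire line lands in one column.
CREATE TABLE dbo.RawLines (Line VARCHAR(MAX));

BULK INSERT dbo.RawLines
FROM 'C:\data\flatfile.txt'
WITH (FIELDTERMINATOR = '\0', ROWTERMINATOR = '\n');

-- Add a row key after the load (avoids needing a format file for an identity column).
ALTER TABLE dbo.RawLines ADD LineID INT IDENTITY(1,1) NOT NULL;

-- Tally Table split: one (LineID, ItemNumber, Item) row per comma-delimited value.
WITH Split AS (
    SELECT r.LineID,
           ItemNumber = ROW_NUMBER() OVER (PARTITION BY r.LineID ORDER BY t.N),
           Item       = SUBSTRING(r.Line, t.N, CHARINDEX(',', r.Line + ',', t.N) - t.N)
    FROM dbo.RawLines r
    JOIN dbo.Tally t
      ON t.N <= LEN(r.Line) + 1
     AND SUBSTRING(',' + r.Line, t.N, 1) = ','   -- each t.N marks the start of a value
)
-- Pivot the first ~800 values into one table, safely under the 1024-column limit.
SELECT LineID,
       [1] AS Col0001, [2] AS Col0002 /* ... through [800] */
INTO dbo.WideLeft
FROM Split
PIVOT (MAX(Item) FOR ItemNumber IN ([1], [2] /* ... [800] */)) AS p;

-- Repeat with [801] ... [1600] into dbo.WideRight, then UNPIVOT each half to normalize.

Everything is plain T-SQL, so you can inspect the intermediate results at every step, which is exactly the control I wanted.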
As for trying to use SSIS, it doesn't help that one of the books I own on SSIS says things like:
"By the time that SQL Server 2000 was released, DTS had a strong following of DBAs and developers. Microsoft included in the release new features like the Dynamic Properties task to help you dynamically alter the package at runtime. It also had extended logging and broke a transformation into many phases, called the multiphase data pump. Usability studies still showed that at this point developers had to create elaborate scripts to extend DTS to do what they wanted. For example, if you wanted DTS to conditionally load data based on the existence of a file, you would have to use the ActiveX Script task and VBScript to dynamically do this. The problem here was that most DBAs didn't have this type of scripting experience.
It sort of sucks as a developer when you open up a book you bought on Amazon and the first thing it says is a red flag indicating you are not its primary audience. In other words, I read this paragraph as, "DBAs aren't software engineers, and need hand holding." I think that lets the reader down, it's a cop out, and the authors were completely lazy in their justification. The harder question to answer is: what sort of scripts did DBAs have to write, why were they so time-consuming, and were they actually so uniform that they fit canned patterns deserving automation? That question isn't insulting to DBAs or annoying to developers.
January 14, 2011 at 12:04 am
johnzabroski (1/13/2011)
As for trying to use SSIS, it doesn't help that one of the books I own on SSIS says things like: "By the time that SQL Server 2000 was released, DTS had a strong following of DBAs and developers. Microsoft included in the release new features like the Dynamic Properties task to help you dynamically alter the package at runtime. It also had extended logging and broke a transformation into many phases, called the multiphase data pump. Usability studies still showed that at this point developers had to create elaborate scripts to extend DTS to do what they wanted. For example, if you wanted DTS to conditionally load data based on the existence of a file, you would have to use the ActiveX Script task and VBScript to dynamically do this. The problem here was that most DBAs didn't have this type of scripting experience."
It sort of sucks as a developer when you open up a book you bought on Amazon and the first thing it says is a red flag indicating you are not its primary audience. In other words, I read this paragraph as, "DBAs aren't software engineers, and need hand holding." I think that lets the reader down, it's a cop out, and the authors were completely lazy in their justification. The harder question to answer is: what sort of scripts did DBAs have to write, why were they so time-consuming, and were they actually so uniform that they fit canned patterns deserving automation? That question isn't insulting to DBAs or annoying to developers.
They are talking about DTS, the predecessor of SSIS. The line about DBAs not having scripting experience is about ActiveX and VBScript, and I think that is largely correct. Most DBAs are well versed in T-SQL and maybe PowerShell, but those other two languages are something else entirely.
SSIS is intended as more of an enterprise ETL solution, so DTS shouldn't really be compared with SSIS.
Developers (in the sense of people coding applications in .NET or another language) can certainly work with SSIS, and they are in familiar territory since development happens in Visual Studio, but SSIS requires another way of thinking: it is mostly a graphical tool rather than pure coding.
Now, to answer the hard question 🙂
DBAs, the core audience of DTS, had to write ActiveX and VBScript scripts - languages that many of them were not comfortable with - in order to do certain tasks in DTS. Those tasks can now easily be done in SSIS without much coding effort.
I think that paragraph was intended to convince people to abandon DTS and embrace SSIS.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP