Building BI from Scratch!

  • Hey guys

    I need some expert advice in building up our BI infrastructure from scratch, the raw data come from the 3rd party every week and we need to push this data into the SQL server after cleanup process. We have to apply some business rules before pushing the data into the database, and only relevant data needs to be stored that adhere reporting.

    So I am wondering, do we have to use another programming language( VB.net, ASP, C#, Python etc) to write our business logic or we can do this all within SQL server. Don't know much about this piece, What kinds of criteria should help drive the decision about whether to put application logic in the database or in the application code.

    Also, please suggest some BI tools( Qlickview, Tableau) that is easy to work with SQL Server.

    Thanks a lot in advance.

    Cheers

    A

  • The entire ETL process can be done using SSIS. For OLAP cubes, SSAS, and for reporting and dashboards, SSRS.

    I'd reccomend this book, which coveres everything you described end to end at a beginner to intermediate level:

    Microsoft SQL Server 2014 Business Intelligence Development Beginners Guide Paperback - by Reza Rad

    http://www.amazon.com/Microsoft-Business-Intelligence-Development-Beginners/dp/1849688885

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Can't really answer all your questions...

    If you have something other than a free version of SQL Server, use SSIS to clean your data prior to writing it to your database.

    Tools... kind of depends on what your users like, to some degree. If your users are Excel experts, you could use it.

    If you want to build a quick prototype, you could build a data model in Excel or PowerBI Desktop - they both use DAX under the covers, so Rob Collie's book would be a good investment.

  • Thanks for the response...

    I’VE have been advised to stay away from the stored procedures meaning the business logic should NOT be stored in stored procedures, except where really really necessary as it lacks OO functionality.

  • singhamitpal (4/19/2016)


    Thanks for the response...

    I’VE have been advised to stay away from the stored procedures meaning the business logic should NOT be stored in stored procedures, except where really really necessary as it lacks OO functionality.

    Was the person doing the advising a C# programmer, by any chance?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (4/19/2016)


    singhamitpal (4/19/2016)


    Thanks for the response...

    I’VE have been advised to stay away from the stored procedures meaning the business logic should NOT be stored in stored procedures, except where really really necessary as it lacks OO functionality.

    Was the person doing the advising a C# programmer, by any chance?

    I suspect he/she probably was.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • singhamitpal (4/19/2016)


    Thanks for the response...

    I’VE have been advised to stay away from the stored procedures meaning the business logic should NOT be stored in stored procedures, except where really really necessary as it lacks OO functionality.

    Where have you been advised that business logic should exist?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • I have spoke to some people who claim themselves to be SQL Guru's. Anyways, below is what I found...

    1.Very few developers will be able to create a consistent stored procedure interface that works easily across applications. Usually this is because certain assumptions are made of that calling application

    2.Same goes for documenting all of those stored procedures

    3.Database servers are generally bottlenecked enough as it is. Putting the unnecessary load on them just further narrows this bottleneck. Intricate load balancing and beefy hardware will be required for anything with a decent amount of traffic

    4.SQL is just barely a programming language. I once had the pleasure of maintaining a scripting engine written as a T-SQL stored procedure. It was slow, nearly impossible to understand, and took days to implement what would have been a trivial extension in most languages

    5.What happens when you have a client that needs their database to run a different SQL server? You'll basically have to start from scratch -- You're very tied to your database. Same goes for when Microsoft decides to deprecate a few functions you use a couple hundred times across your stored procedures

    6.Source control is extremely difficult to do properly with stored procedures, more so when you have a lot of them

    7.Databases are hard to keep in sync. What about when you have a conflict of some sort between 2 developers that are working in the database at the same time? They'll be overwriting each others code not really aware of it, depending on your "development database" setup

    8.The tools are definitely less easy to work with, no matter which database engine you use.

  • Gee.... this last post would warrant a holy war and a thread on its own.

    Regarding the original post.

    Before deciding on a ETL strategy you need to focus on what are the requirements of the ETL process.

    things to consider

    Number of sources

    type of sources - Flat files, Soap requests, direct database extracts etc

    Volume of data per source - and consider if always full loads or incremental loads

    Data validation

    Data cleaning - what needs to be cleaned and how - this is business rule

    Data enhancement - what needs to be enhanced and how - this is business rule

    SLA to deliver the final data - this may have a significant impact on the tool to use

    And...

    is this to be a in-house solution or a tool to supply to other clients which may choose a different relational database vendor?

    Application logic - some of the above may also determine whether you need a totally database agnostic layer or if you need to "bite the bullet" and stick with a tool that attaches you to a particular database vendor.

    From a ETL perspective the application logic is normally, but not always, quite simple when compared to a full user application but without full details of what needs to be done and volumes its hard to say if you should do some or all of the logic on the database side (using SQL) or on the application side.

    Regarding BI tools to use - again you need to decide if you are going to be agnostic of the database or not - there are plenty of BI tools on the market and you already mentioned 2 of them. Mainly targeting SQL Server you could also use Power BI (Desktop or otherwise)

    Now to your last post a few comments of my own.

    1.Very few developers will be able to create a consistent stored procedure interface that works easily across applications. Usually this is because certain assumptions are made of that calling application

    Same can be said to any developer in any other language - if the design is badly done then the interface with the other applications will be poor - again many C# developers for example make assumptions about the applications that will call the objects they design.

    If someone is developing a stored procedure to be called by an application then whoever is doing the SP needs to discuss with the application developer and agree the design and interface needs.

    2.Same goes for documenting all of those stored procedures

    Hum... as if ALL C# developers also document their code.

    lack of documentation of code is common to developers in ANY language - stating this about SQL developers is just lame.

    I've seen lots of code done by different developers, including some here on this forum, that was extremely well documented including examples of how to call each proc and its different options.

    3.Database servers are generally bottlenecked enough as it is. Putting the unnecessary load on them just further narrows this bottleneck. Intricate load balancing and beefy hardware will be required for anything with a decent amount of traffic.

    Servers can be bottlenecked regardless of being a database server or not - load balancing is also used for application layers, and I have seem more instances where this was used for application layer than database layer. Not an excuse not to do ETL on the database server - and take in consideration that in many heavy ETL setups there is a database server just to do the ETL process (even know of a company with a farm of ETL servers)

    4.SQL is just barely a programming language. I once had the pleasure of maintaining a scripting engine written as a T-SQL stored procedure. It was slow, nearly impossible to understand, and took days to implement what would have been a trivial extension in most languages

    Again many of us have seen C# and other code that was extremely complex and could have been easily done in T-SQL. One should use the correct tool for the task at hand, and sometimes SQL is that tool.

    5.What happens when you have a client that needs their database to run a different SQL server? You'll basically have to start from scratch -- You're very tied to your database. Same goes for when Microsoft decides to deprecate a few functions you use a couple hundred times across your stored procedures

    This can also be said about any solution you pick to do the ETL. For example you decide to code in C# using LINQ - what happens when support for LINQ is removed or changed and your application crashes? What if the client decides to move to Unix with Oracle or SQL Server and your code fails to work if not running on a Windows machine? or you decide to use a third party control and the supplier stops support of that tool?

    6.Source control is extremely difficult to do properly with stored procedures, more so when you have a lot of them

    Why is it difficult? its just a text file as any .cs or .java or .js script? what makes it harder to control?

    7.Databases are hard to keep in sync. What about when you have a conflict of some sort between 2 developers that are working in the database at the same time? They'll be overwriting each others code not really aware of it, depending on your "development database" setup

    Hum... if you design your sync process properly they are as easy as it can be - regarding the conflicts key word here is "setup" the databases environments so conflicts wont happen - same way you can have code conflicts if you developing web applications and all developers need to deploy their code to the same IIS server. again this is just a excuse for not setting up the environments properly.

    8.The tools are definitely less easy to work with, no matter which database engine you use.

    no real comment on this one - some tools are easy some are difficult regardless of what the tool is used for

  • singhamitpal (4/19/2016)


    I have spoke to some people who claim themselves to be SQL Guru's. Anyways, below is what I found...

    1.Very few developers will be able to create a consistent stored procedure interface that works easily across applications. Usually this is because certain assumptions are made of that calling application

    2.Same goes for documenting all of those stored procedures

    3.Database servers are generally bottlenecked enough as it is. Putting the unnecessary load on them just further narrows this bottleneck. Intricate load balancing and beefy hardware will be required for anything with a decent amount of traffic

    4.SQL is just barely a programming language. I once had the pleasure of maintaining a scripting engine written as a T-SQL stored procedure. It was slow, nearly impossible to understand, and took days to implement what would have been a trivial extension in most languages

    5.What happens when you have a client that needs their database to run a different SQL server? You'll basically have to start from scratch -- You're very tied to your database. Same goes for when Microsoft decides to deprecate a few functions you use a couple hundred times across your stored procedures

    6.Source control is extremely difficult to do properly with stored procedures, more so when you have a lot of them

    7.Databases are hard to keep in sync. What about when you have a conflict of some sort between 2 developers that are working in the database at the same time? They'll be overwriting each others code not really aware of it, depending on your "development database" setup

    8.The tools are definitely less easy to work with, no matter which database engine you use.

    I find it difficult to believe that any real 'SQL Guru' would say such things about the very product they have invested years of their time in. Your statements are all generalisations which refer to scenarios which are possible, but most unlikely under the stewardship of a 'SQL Guru'.

    Database servers are generally bottlenecked enough as it is

    Not in my experience.

    I once had the pleasure of maintaining a scripting engine written as a T-SQL stored procedure

    I wouldn't write an UPDATE query in Assembly language either. That doesn't make Assembly language bad. Use the right tool for the job. T-SQL is a database manipulation language, not a 'scripting engine' (whatever you mean by that).

    What happens when you have a client that needs their database to run a different SQL server?

    This is a valid design decision. If you need your solution to run on lots of different database platforms, you should minimise reliance on platform-specific features.

    Same goes for when Microsoft decides to deprecate a few functions you use a couple hundred times across your stored procedures

    Can't remember this ever being a problem, and I've been using SQL Server since 2001.

    Source control is extremely difficult to do properly with stored procedures, more so when you have a lot of them

    This is just plain wrong. Stored proc source control is a piece of cake, with all of the tooling now available in Visual Studio.

    What about when you have a conflict of some sort between 2 developers that are working in the database at the same time?

    Correct use of source control avoids this.

    The tools are definitely less easy to work with, no matter which database engine you use.

    Less easy than what?

    Overall, I find it strange that you started the thread with a neutral, open question and then weighed in with a whole raft of opinionated anti-SQL Server statements afterwards. Is there a hidden agenda here?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • singhamitpal (4/19/2016)


    Thanks for the response...

    I’VE have been advised to stay away from the stored procedures meaning the business logic should NOT be stored in stored procedures, except where really really necessary as it lacks OO functionality.

    Within the context of ETL, meta-data driven business rules (ie: user configurable and versioned reference tables) are what you want. In my opinion, the ETL process should not be functionally dependent on the application or middle tier. The application should simply be a tool for meta-data configuration, dashboards, and reporting.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • singhamitpal (4/19/2016)


    I have spoke to some people who claim themselves to be SQL Guru's. Anyways, below is what I found...

    1.Very few developers will be able to create a consistent stored procedure interface that works easily across applications. Usually this is because certain assumptions are made of that calling application

    2.Same goes for documenting all of those stored procedures

    3.Database servers are generally bottlenecked enough as it is. Putting the unnecessary load on them just further narrows this bottleneck. Intricate load balancing and beefy hardware will be required for anything with a decent amount of traffic

    4.SQL is just barely a programming language. I once had the pleasure of maintaining a scripting engine written as a T-SQL stored procedure. It was slow, nearly impossible to understand, and took days to implement what would have been a trivial extension in most languages

    5.What happens when you have a client that needs their database to run a different SQL server? You'll basically have to start from scratch -- You're very tied to your database. Same goes for when Microsoft decides to deprecate a few functions you use a couple hundred times across your stored procedures

    6.Source control is extremely difficult to do properly with stored procedures, more so when you have a lot of them

    7.Databases are hard to keep in sync. What about when you have a conflict of some sort between 2 developers that are working in the database at the same time? They'll be overwriting each others code not really aware of it, depending on your "development database" setup

    8.The tools are definitely less easy to work with, no matter which database engine you use.

    None of the arguments you've presented above would apply to SQL Server or relational database implementation in general. If your organization struggles to consistently write or docmuent stored procedures, or if your database can't be versioned, scaled across multiple servers, kept in sync, or are generally perceived as a bottleneck, then you need to fire your developers and replace them with folks who know proper database architecture and design. SQL suffer from the same limitation as HTML or english: there are many who use it daily but there are few who understand how to properly apply the language. I personally do things with SQL Server that are broadly considered impossible, and if I can do it, then so can anyone else. I know I probably come across as a SQL bigot.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • singhamitpal (4/19/2016)


    I have spoke to some people who claim themselves to be SQL Guru's. Anyways, below is what I found...

    1.Very few developers will be able to create a consistent stored procedure interface that works easily across applications. Usually this is because certain assumptions are made of that calling application

    2.Same goes for documenting all of those stored procedures

    3.Database servers are generally bottlenecked enough as it is. Putting the unnecessary load on them just further narrows this bottleneck. Intricate load balancing and beefy hardware will be required for anything with a decent amount of traffic

    4.SQL is just barely a programming language. I once had the pleasure of maintaining a scripting engine written as a T-SQL stored procedure. It was slow, nearly impossible to understand, and took days to implement what would have been a trivial extension in most languages

    5.What happens when you have a client that needs their database to run a different SQL server? You'll basically have to start from scratch -- You're very tied to your database. Same goes for when Microsoft decides to deprecate a few functions you use a couple hundred times across your stored procedures

    6.Source control is extremely difficult to do properly with stored procedures, more so when you have a lot of them

    7.Databases are hard to keep in sync. What about when you have a conflict of some sort between 2 developers that are working in the database at the same time? They'll be overwriting each others code not really aware of it, depending on your "development database" setup

    8.The tools are definitely less easy to work with, no matter which database engine you use.

    Sounds like you got bad advice from people who don't know much about modern database systems.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Thanks everyone for your responses, my intentions was not to offend anyone here. Just needed a direction in picking the right path. Thanks for sharing your knowledge & expertise. Appreciate the time out to answer all my questions.

    Cheers,

    A

  • I've built BI solutions from the ground up for years. One thing I'd like to point out relevant to the previous conversation is that the project I've been on for the last year has involved re-writing of many ETL processes in SSIS that were originally done in C#. These processes went from taking hours to minutes and are much easier to maintain and understand in SSIS. Here's what I'd recommend for a new BI build:

    1. Create packages to move data from source to source similar staging tables or ODS database in SQL Server.

    2. Develop dimensional model that fit your analytics/reporting requirements.

    3. Create package(s) to move data from staging to your target model. This is where you put the business logic. The main difference between OLTP models and OLAP type dimensional models is where the business logic lives. In a traditional OLTP system that's been normalized, the business rules are enforced by the database design, schema, constraints, etc. In a dimensional model, there are still relationships between the tables (mainly for the query optimizer's benefit), but the bulk of your business rules are implementing in the ETL layer. The Transform portion of the ETL acronym represents the application of your business logic/rules on the data before it hits your analytics platform. You want the bulk of your data manipulations to happen in ETL as opposed to in queries, views, or cubes that hit your reporting/analytics model.

    4. Create analytics/reporting off model. Here's where you decide how you want to manager reporting/analytics/data mining. The sky's the limit as to what/how you want to do this, but since you're using SQL Server, I'd recommend looking into the entire Microsoft BI Stack offering. Get familiar with PowerPivot, SSAS, SSRS, SharePoint, PerformancePoint, Power BI. Your final solution will likely consist of two or more items from this list.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply