Building BI from Scratch!

  • singhamitpal (4/19/2016)


    I have spoken to some people who claim to be SQL gurus. Anyway, below is what I found...

    1. Very few developers will be able to create a consistent stored procedure interface that works easily across applications. Usually this is because certain assumptions are made about the calling application.

    2. The same goes for documenting all of those stored procedures.

    3. Database servers are generally bottlenecked enough as it is. Putting unnecessary load on them just narrows this bottleneck further. Intricate load balancing and beefy hardware will be required for anything with a decent amount of traffic.

    4. SQL is just barely a programming language. I once had the pleasure of maintaining a scripting engine written as a T-SQL stored procedure. It was slow, nearly impossible to understand, and took days to implement what would have been a trivial extension in most languages.

    5. What happens when you have a client that needs their database to run on a different SQL server? You'll basically have to start from scratch -- you're very tied to your database. The same goes for when Microsoft decides to deprecate a few functions you use a couple hundred times across your stored procedures.

    6. Source control is extremely difficult to do properly with stored procedures, more so when you have a lot of them.

    7. Databases are hard to keep in sync. What about when you have a conflict of some sort between two developers who are working in the database at the same time? They'll be overwriting each other's code without really being aware of it, depending on your "development database" setup.

    8. The tools are definitely harder to work with, no matter which database engine you use.

    Just to summarize, no SQL "guru" that I know would come to such conclusions. This sounds like a person who doesn't actually know much about databases or the front end.

    Shifting gears, what really needs to happen is that the folks on both sides of the fence need to cool their jets and start working with each other. I used to be a Front End Developer, went through being a Database Developer, have been in various management positions including Development Manager and Director of MIS, and have finally settled with my first love... large databases. I've seen it all, and everyone needs to understand what the other folks do and start working together instead of all this damned self-serving posturing. I've never tolerated such arrogance and silly quibbling from either side in any of my shops.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • "Database servers are generally bottlenecked enough as it is ..."

    That is an all-encompassing statement, which has its limits.

    Forcing DB servers to disgorge the entire contents of a large table and relying on the client to pick out the few rows of interest, instead of requiring the server to return only those rows, *was* a sure way to bring the whole server-network-client system to a crawl.

    I remember the bad old days when Crystal Reports in a client-server app did not allow passing parameters up to SQL Server so that the server could return only the few records of interest from a "large" table; CR would download the entire table and then filter out the unwanted rows itself (mostly all of them, downloaded just to be discarded). Even for a table containing a lowly 10,000 rows, the whole setup required *minutes* to generate a report, leaving the user to conclude that the system was "frozen" and unusable. A modernized version of CR that included proper parameter passing cut the report generation delay to mere seconds.
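    To make the contrast concrete, here is a minimal Python sketch of the two query patterns described above. The pyodbc connection string and the Orders table are hypothetical stand-ins, not anything from this thread.

        import pyodbc

        # Hypothetical connection; adjust the DSN/connection string for your server.
        conn = pyodbc.connect("DSN=MyServer;Trusted_Connection=yes;")
        cursor = conn.cursor()

        # The "old Crystal Reports" pattern: drag the whole table across the
        # network and filter on the client -- most rows are fetched only to
        # be discarded.
        all_rows = cursor.execute("SELECT OrderID, CustomerID, Total FROM Orders").fetchall()
        wanted = [row for row in all_rows if row.CustomerID == 42]

        # The parameterized pattern: let the server return only the rows of interest.
        wanted = cursor.execute(
            "SELECT OrderID, CustomerID, Total FROM Orders WHERE CustomerID = ?", 42
        ).fetchall()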

    Having said that, C# includes very efficient dictionary structures that can be used to process data way faster than the SQL Server can, if the network can handle the traffic load.
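    As a rough illustration of that point, here is a sketch in Python, with a dict standing in for the C# Dictionary being described: build the lookup once in client memory, and each probe is then O(1) instead of a round trip to the server. All names and data here are invented for the example.

        # (customer_id, name) and (order_id, customer_id) pairs, made up for the demo.
        customers = [(1, "Alice"), (2, "Bob"), (3, "Carol")]
        orders = [(101, 2), (102, 1), (103, 2)]

        # Build the lookup once: O(n) construction, O(1) per probe afterwards.
        name_by_id = {cust_id: name for cust_id, name in customers}

        for order_id, cust_id in orders:
            print(order_id, name_by_id.get(cust_id, "<unknown>"))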

    As in most things in life, "it depends" is a more cautious approach to devising a properly-working system.

  • singhamitpal (4/20/2016)


    Thanks everyone for your responses; my intention was not to offend anyone here. I just needed some direction in picking the right path. Thanks for sharing your knowledge & expertise, and I appreciate you taking the time to answer all my questions.

    Cheers,

    A

    I don't think you offended anyone. People on this forum are just passionate about SQL, which is more powerful than many people give it credit for.

    "I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (4/21/2016)


    singhamitpal (4/20/2016)


    Thanks everyone for your responses; my intention was not to offend anyone here. I just needed some direction in picking the right path. Thanks for sharing your knowledge & expertise, and I appreciate you taking the time to answer all my questions.

    Cheers,

    A

    I don't think you offended anyone. People on this forum are just passionate about SQL, which is more powerful than many people give it credit for.

    Agree.



    Alvin Ramard
    Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help

  • j-1064772 (4/21/2016)


    Having said that, C# includes very efficient dictionary structures that can be used to process data way faster than the SQL Server can, if the network can handle the traffic load.

    Would you care to elaborate on this?

    The way I'm reading it is that you believe that C# code is faster at processing data than SQL Server.

    Always? Or in certain circumstances?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • The C# bit I got from a colleague at the office who has considerably more experience than I do.

    He was referring to specific cases involving a dictionary structure, so that is already not an absolute "C# is always better" claim.

    It will take me a while to get into more detail. Like a day or so.

  • Phil Parkin (4/21/2016)


    j-1064772 (4/21/2016)


    Having said that, C# includes very efficient dictionary structures that can be used to process data way faster than the SQL Server can, if the network can handle the traffic load.

    Would you care to elaborate on this?

    The way I'm reading it is that you believe that C# code is faster at processing data than SQL Server.

    Always? Or in certain circumstances?

    I guess it depends on what we mean by "processing data". Mass updating very large datasets is not what SQL Server excels at, regardless of the database recovery model. There have been situations in the past where my ETL staging consisted of C++ programs performing transforms on GB-sized text files with fixed-width records, and only the final post-transform step involved bulk inserting the records into SQL Server. Doing this type of bulk staging outside SQL Server minimizes transaction logging, which is the real performance killer for any RDBMS.
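    As a hedged sketch of that staging pattern, the Python below transforms a fixed-width file outside the database and leaves SQL Server just one bulk load to do at the end. The field widths, file names, and table name are all hypothetical.

        # (start, end) character offsets for each column in the fixed-width file.
        FIELD_WIDTHS = [(0, 10), (10, 30), (30, 38)]

        with open("staging_in.txt") as src, open("staging_out.csv", "w") as dst:
            for line in src:
                fields = [line[start:end].strip() for start, end in FIELD_WIDTHS]
                # ...apply the transforms here, outside the database engine...
                dst.write(",".join(fields) + "\n")

        # The only database step is then a single, minimally-logged load, e.g. in T-SQL:
        #   BULK INSERT dbo.Staging FROM 'staging_out.csv'
        #   WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);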

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • j-1064772 (4/21/2016)


    The C# bit I got from a colleague at the office who has considerably more experience than I do.

    He was referring to specific cases involving a dictionary structure, so that is already not an absolute "C# is always better" claim.

    It will take me a while to get into more detail. Like a day or so.

    It's OK, I know that there will be certain cases where C# performs better.

    I wanted to check whether you were being absolute, just in case. Could have been fun 🙂

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Just to argue the other side of the fence here since this is a community of amazing data professionals. 😀

    I am a data architect who builds and manages a large data warehouse for analytical reporting. I frequently create small applications that pull data from third parties and ingest that data into a data warehouse, from which it eventually flows into a data mart for end-user reporting.

    It's always interesting to hear the debates about where the logic should live. Personally, I think people are too harsh on one another and too controlling, rather than just looking at the balance. Yes, all logic living in one place and cataloged nicely is going to be easier to manage and maintain. But sometimes doing that means one system may be doing all the work where other systems could help.

    I typically explain this as having a team of people building a house. If all my workers are just feeding wood to the one person actually building the house, it may take a lot longer to get that house built than if I had, say, three people working on different parts of the house, one person feeding wood, and another ordering more wood to be used.

    That's why I like to distribute the process across a couple of systems based on the roles I want them to play on that team. Python, in my particular case, helps me prep the data that I know is likely not going to change. This is the working data that most of us get from, say, the OLTP database we are pushing into our warehouse. Using Python for tasks like the following also helps me augment the ETL so that processing stays fast as the data scales (see the sketch after this list):

    Examples

      Transforming data from JSON

      Dynamically pivoting data

      Flattening out the data

      Validation (e.g.: data was received)

      Distributed processing (SQL Server has no distributed processing)

      Advanced algorithmic processing
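    Here is a small, self-contained Python sketch of two of the items above -- flattening nested JSON and a simple dynamic pivot -- using only the standard library. The payload shape is invented for the example.

        import json
        from collections import defaultdict

        payload = json.loads(
            '{"customer": {"id": 7, "name": "Acme"},'
            ' "orders": [{"month": "Jan", "total": 10},'
            ' {"month": "Feb", "total": 15}]}'
        )

        # Flatten: one row per order, with customer attributes repeated onto each row.
        rows = [
            {"customer_id": payload["customer"]["id"],
             "customer_name": payload["customer"]["name"],
             "month": order["month"],
             "total": order["total"]}
            for order in payload["orders"]
        ]

        # Dynamic pivot: months become keys/columns without knowing them in advance.
        pivot = defaultdict(dict)
        for row in rows:
            pivot[row["customer_id"]][row["month"]] = row["total"]

        print(dict(pivot))  # {7: {'Jan': 10, 'Feb': 15}}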

    Once that's done, it makes a lot of sense for the critical business logic to be handled by other systems like the database. But of course, not having all of that under the control of one person is the con. Not having one person do everything is the pitfall we always face, because as teams we have built walls that don't allow us to work together seamlessly, such as the age-old dev-versus-DBA silliness we always find ourselves getting into.

    You will not see me crying if someone hands me a piece of data that is already processed and ready to go. You will only see me cry when we can't work together to reprocess the data as a team. If that's the only thing stopping you from allowing multiple systems to help process the data, then it's not the system that's the problem. It's the people.

  • Phil Parkin (4/21/2016)


    j-1064772 (4/21/2016)


    Having said that, C# includes very efficient dictionary structures that can be used to process data way faster than the SQL Server can, if the network can handle the traffic load.

    Would you care to elaborate on this?

    The way I'm reading it is that you believe that C# code is faster at processing data than SQL Server.

    Always? Or in certain circumstances?

    TBH, you can script up the world in SSIS or create some very complex SQL scripts, functions and whatnot, and it's going to be horrid. Sometimes it's easier, more efficient, and yes, even faster to do this outside of your database or ETL tools, unless they can be integrated the way so many are doing today with Python, R, and so forth.

    For example, consider having to take an array of numbers ranging from 5 characters to 1,000 characters in length and sort the digits of each number from highest to lowest, working through the positions from left to right, with each sequential sort step logged to a table until all numbers have been sorted. An algorithm developed outside the ETL, in a language much better suited to the task, may be a better approach to applying the business logic, with your database ingesting only the final results.
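    One plausible reading of that example, sketched in Python: for each number (a string of 5 to 1,000 digit characters), selection-sort its digits from highest to lowest, working left to right, and log every intermediate step. The function name is made up, and the "log table" is a plain list standing in for a real table.

        def sort_and_log(number: str) -> list:
            digits = list(number)
            log = []
            for i in range(len(digits)):
                # Swap the highest remaining digit into position i.
                j = max(range(i, len(digits)), key=lambda k: digits[k])
                digits[i], digits[j] = digits[j], digits[i]
                log.append("".join(digits))  # one log row per sequential sort step
            return log

        for step in sort_and_log("40213"):
            print(step)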

  • singhamitpal (4/19/2016)


    Also, please suggest some BI tools (QlikView, Tableau) that are easy to use with SQL Server.

    I personally have enjoyed MicroStrategy Server and Tableau Server. Both of their communities are pretty strong. Power BI is gaining a lot of good features too, if you want to stick close to Microsoft. We went with MicroStrategy because it allows us to provide reporting through a web portal and also allows MOLAP-style cubes to be developed that can be cached for faster reporting. Tableau now does something similar with caching, I think using Redis.

    At the end of the day, they all have their pros and cons. The most important focus for most shops is picking something that lets you centralize your reporting, much as you would with a database for your data, so someone can manage it while reducing the risk to your business.

  • About BI tools:

    I've run SSRS and have been working with Tableau for 6 months. They are (and are intended to be) very different products. Tableau wins hands down for interactive data visualizations that can be developed rapidly. I haven't worked much lately with Power BI, but it seems to get more powerful every day.

    SSRS, IMO, is fantastic for generating operational reports that are closely coupled to the database, where everyone agrees what the report is displaying.

    Tableau is much better for data discovery and exploring relationships in your data, for learning what your data can tell you about your business.

    One aspect of Tableau that I cannot overstate: they have done a stellar job of bringing BI out of the IT shop and out to the end users -- the people who know the data best. They have outstanding support and a vast user community, with public visualizations you can download and play with to replicate using your own data. I would think seriously about who you anticipate doing most of your BI report/visualization work.

    Good luck!

    Rich

  • Shifting gears a bit, it doesn't matter how effective any BI is if the term "BI" is an oxymoron when it comes to management.

    I worked for a company in the early days of PCs and had interfaced with their mainframes to make life easier for downloads. I also set up the charts and graphs that they used to generate manually. To make a much longer story shorter, I saw a trend and took it up the chain of command, even to the GM. The trend (and related predictions as to contract wins) said that we'd have to lay half the people off in a given month and year.

    No one listened because I wasn't a qualified CPA or data analyst. Why, I didn't even have a degree. What could I possibly know?

    Two years later, in the month and year that I predicted, the first wave of layoffs started. If I hadn't been raising a family at the time, I'd have quit just on principle. It was so preventable, but none of the people who were supposed to be doing the analysis ever gave any bad news.

    Like Granny used to say, "Figures can lie and liars figure".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
