July 29, 2013 at 9:51 pm
Comments posted to this topic are about the item Flying high on the Big Data hot-air
Best wishes,
Phil Factor
July 30, 2013 at 1:03 am
I agree that we need to manage the industry's enthusiasm. However, the article is vague, inconclusive and uninformative.
July 30, 2013 at 1:54 am
For me personally, the part about integration of SQL Server and the R language was an eye opener.
Thanks a lot for that!
Wim
July 30, 2013 at 4:20 am
Sorry about that. Actually it was an editorial. I'm due to write a series of articles on the subject of Data Science to give details of statistical techniques for data professionals.
Best wishes,
Phil Factor
July 30, 2013 at 4:25 am
@Phil Factor
Where will these articles about data science be published?
I am quite interested.
July 30, 2013 at 4:27 am
@wim.bekkens
Thanks for that.
For a simple example, take a look at this series that is now coming out on Simple-talk. It walks you through an example application that involves using R to report KPIs in a SQL Server database.
Creating a Business Intelligence Dashboard with R and ASP.NET MVC: Part 1
Creating a Business Intelligence Dashboard with R and ASP.NET MVC: Part 2
Best wishes,
Phil Factor
July 30, 2013 at 4:29 am
@wim.bekkens
SQLServerCentral, as a Stairway to Data Science. I'm working on it.
Best wishes,
Phil Factor
July 30, 2013 at 5:45 am
Phil - I thought it was a great article. 100% with you.
There is still a massive need for good, concise summaries of existing systems. I am continually disappointed by many members of the standard professions. They seem unable to see the advantages that could accrue to them if they could just tighten up their processes and constantly monitor the summary statistics they should be tracking.
cloudydatablog.net
July 30, 2013 at 6:04 am
Be it article or editorial, it's still a good summary of the "third wave," so to speak, of the database world. Right now the role of the data scientist is pretty specialized, but businesses are looking for ways to obtain that function without necessarily hiring someone with the academic credentials. They're looking for tools that enable employees who may not have all the formal training of a data scientist to perform data analysis. Imagine something akin to BIDS that can create flows for data analysis.
This is both a good and bad thing. It opens up more career opportunities for people who have a knack for analyzing data who may have little to no interest in being a full-fledged DBA. On the other hand, there are fundamentals of statistics and the associated inferential process (to say nothing of good data management) that simply must be understood to analyze data correctly. It isn't terribly difficult to calculate a standard deviation, a confidence interval or a p-value (in fact R makes them pathetically easy), but you really can't evaluate your results without a firm understanding of what those values are telling you about your data.
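To illustrate just how little code those calculations take, here is a rough sketch in Python rather than R, using only the standard library. The sample values and the `summarize` helper are made up for illustration, and it assumes a normal (z) approximation in place of a proper t-test, so treat it as a sketch rather than a recommended method:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize(sample, mu0, confidence=0.95):
    """Return mean, sample SD, an approximate CI and a two-sided p-value.

    Assumption: uses a normal (z) approximation, which is only
    reasonable for fairly large samples; a t-test would be the usual
    choice for small ones.
    """
    n = len(sample)
    m = mean(sample)
    s = stdev(sample)            # sample standard deviation (n - 1)
    se = s / sqrt(n)             # standard error of the mean

    # Critical value for the requested two-sided confidence level.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (m - z_crit * se, m + z_crit * se)

    # Two-sided p-value for H0: the true mean equals mu0.
    z = (m - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {"mean": m, "sd": s, "ci": ci, "p_value": p_value}


if __name__ == "__main__":
    # Hypothetical measurements, purely for illustration.
    readings = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
    print(summarize(readings, mu0=12.0))
```

Getting the numbers out is the easy part; deciding whether a normal approximation, the sample size, or the null hypothesis is even appropriate is where the understanding comes in.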
In addition to Phil's excellent comments, I would also suggest that even database professionals who aren't necessarily involved in the analysis end of things should at minimum be conversant with these fundamentals, not just to understand what the heck the data whiz-kid is talking about, but also to be able to better design and administer the back-end data systems needed for analysis.
I, too, look forward to the additional material on this subject.
____________
Just my $0.02 from over here in the cheap seats of the peanut gallery - please adjust for inflation and/or your local currency.
July 30, 2013 at 6:36 am
Some time back (too long, alas) I presented a method for running R functions within the database engine (SQL). Alas, this was done with Postgres and the PL/R extension. I could find none for SQL Server.
Joe Conway's PL/R is relatively little code, and it attaches to the Postgres engine through its embedded C/C++ user-defined function extension (at the SQL function level it's identified as the R language, but to Postgres it's really just C).
At the time, and ever since, I've been puzzled why other engines don't provide something similar. Joe created PL/R pretty much in his spare time. Not to denigrate the effort, by any means, only to point out that a full-blown development team isn't needed to provide such a hook to a database engine. Where are the others?
Recently, Oracle has done so. It's kind of buried, at least to those of us not Oracle hounds. Here's a link: https://blogs.oracle.com/R/entry/analyzing_big_data_using_the1
You'll need to visit, since they've disabled copy/paste. Proprietary software!!
Come on Steve, get with it!! (The Steve in Redmond, just to be clear).
July 30, 2013 at 7:13 am
Looking forward to the Stairway series. As an "accidental" DBA who loves this field, I am always looking to learn more.
July 30, 2013 at 8:00 am
I am not a huge fan of random big data projects for useless, over-caffeinated marketing ideals.
I do like projects like a certain Health Industry project started in 2006 that has become invaluable and very relevant with our current reforms.
http://www.advisory.com/Technology/Crimson
I would dare to think that other industries, consumers, and regulators could find a use for similar collections of data and its analysis.
Anything worth doing is worth doing right since most things done wrong are worthless.
July 30, 2013 at 8:25 am
Phil Factor (7/30/2013)
@wim.bekkens Thanks for that.
For a simple example, take a look at this series that is now coming out on Simple-talk. It walks you through an example application that involves using R to report KPIs in a SQL Server database.
Creating a Business Intelligence Dashboard with R and ASP.NET MVC: Part 1
Creating a Business Intelligence Dashboard with R and ASP.NET MVC: Part 2
Phil,
R was cool, but unless you are working at the NIS, isn't it dated?
When compared to some of the newer and highly maintained Graphical Stat Display tools such as the free Sigma Plot MySystat http://www.systat.com/MystatProducts.aspx R feels like an amber screen from the '80s.
Don't get me wrong. It was awesome and I used it.
Now there are IMHO better tools that require less heavy lifting.
If anyone is interested in a free large-scale database solution used by Netflix, Twitter, eBay, reddit, Cisco, etc., see http://cassandra.apache.org/
July 30, 2013 at 9:07 am
@rod At Work
The R programming language is a GNU open-source project designed for statistical analysis of data. It supports a huge number of extensions in user-created packages that give it a bewildering versatility.
See http://en.wikipedia.org/wiki/R_programming_language
Although I can sympathise with PHYData DBA, I reckon that it is important to take R seriously, whatever else you use as well. Not only is it valuable purely for analysis of variance and factor analysis, but it is widely used in universities for the teaching of parametric statistics. There are a huge number of resources, books, samples and videos around. It isn't going away!
Best wishes,
Phil Factor