Analyzing Formula 1 results in R

  • Comments posted to this topic are about the item Analyzing Formula 1 results in R

  • Thomas Hütter - Monday, August 20, 2018 11:28 PM

    Comments posted to this topic are about the item Analyzing Formula 1 results in R

    Nice article, Thomas.
    I must admit that I was expecting/hoping to see something different, though.
    I was hoping that you would 
    a) Import the data into a SQL Server table
    b) Use sp_execute_external_script to run the various R scripts against the imported data.

    I'd be particularly interested to know whether the plotting options available in R Studio can somehow be used on data held in SQL Server, without the need for moving the data out of SQL Server.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin - Tuesday, August 21, 2018 6:00 AM

    Nice article, Thomas.
    I must admit that I was expecting/hoping to see something different, though.
    I was hoping that you would 
    a) Import the data into a SQL Server table
    b) Use sp_execute_external_script to run the various R scripts against the imported data.

    I'd be particularly interested to know whether the plotting options available in R Studio can somehow be used on data held in SQL Server, without the need for moving the data out of SQL Server.

    I'm out of practice Phil, but I'm sure it can, you used to be able to use an ODBC connection to do just that. the package I was using for that went bust and I haven't found a new one that works for me with my version yet, but I don't play with it very much.

    -------------------------------------------------------------------------------------------------------------------------------------
    Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses

  • jonathan.crawford - Tuesday, August 21, 2018 7:39 AM

    Phil Parkin - Tuesday, August 21, 2018 6:00 AM

    Nice article, Thomas.
    I must admit that I was expecting/hoping to see something different, though.
    I was hoping that you would 
    a) Import the data into a SQL Server table
    b) Use sp_execute_external_script to run the various R scripts against the imported data.

    I'd be particularly interested to know whether the plotting options available in R Studio can somehow be used on data held in SQL Server, without the need for moving the data out of SQL Server.

    I'm out of practice Phil, but I'm sure it can, you used to be able to use an ODBC connection to do just that. the package I was using for that went bust and I haven't found a new one that works for me with my version yet, but I don't play with it very much.

    As far as I know, the ODBC connection from R Studio to SQL Server facilitates the import of SQL Server data into R Studio prior to running R scripts against that data.

    But my knowledge is just at beginner level and I may be wrong. If I am wrong, and R Studio can query SQL Server data directly, it almost makes redundant the work which MS did to integrate R into SQL Server.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin - Tuesday, August 21, 2018 6:00 AM

    Nice article, Thomas.
    I must admit that I was expecting/hoping to see something different, though.
    I was hoping that you would 
    a) Import the data into a SQL Server table
    b) Use sp_execute_external_script to run the various R scripts against the imported data.

    I'd be particularly interested to know whether the plotting options available in R Studio can somehow be used on data held in SQL Server, without the need for moving the data out of SQL Server.

    Thanks Phil, and I apologize for raising false hopes there. I use RStudio for 99% of my R stuff, for three reasons:

    1. the use of "sp_execute_external_script" in SQL Server Management Studio is a bit clumsy for my taste,
    2. I don't use R functionality in any production environment so far, so I haven't had the need to handle "big" data,
    3. for private purposes and for sidekick projects, I use a Mac (which doesn't natively run the full-featured SQL Server).
    Do I understand you correctly: you would like to see my example code run inside SSMS, including the plot output?

  • Hi Thomas,

    Very nice article, thank you.
    I just wanted to add my voice to the others who wanted to see all of this done from within SSMS, i.e. making use of the R integration.

  • Thomas Hütter - Tuesday, August 21, 2018 8:16 AM

    Phil Parkin - Tuesday, August 21, 2018 6:00 AM

    Nice article, Thomas.
    I must admit that I was expecting/hoping to see something different, though.
    I was hoping that you would 
    a) Import the data into a SQL Server table
    b) Use sp_execute_external_script to run the various R scripts against the imported data.

    I'd be particularly interested to know whether the plotting options available in R Studio can somehow be used on data held in SQL Server, without the need for moving the data out of SQL Server.

    Thanks Phil, and I apologize for raising false hopes there. I use RStudio for 99% of my R stuff, for three reasons:

    1. the use of "sp_execute_external_script" in SQL Server Management Studio is a bit clumsy for my taste,
    2. I don't use R functionality in any production environment so far, so I haven't had the need to handle "big" data,
    3. for private purposes and for sidekick projects, I use a Mac (which doesn't natively run the full-featured SQL Server).
    Do I understand you correctly: you would like to see my example code run inside SSMS, including the plot output?

    It would be the icing on the cake for your article, yes. But I do not think that it can be done, based on my current rudimentary knowledge.

    If there is any way at all of being able to run R scripts against data stored natively in SQL Server & then plotting the results, that would be a huge win. Without importing, of course.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • OK folks, please take into account that I am in a different time zone, and have to finish my day work first. 😀
    I'll get back to you with a SQL script (or probably two) in the evening (European time...).

  • Maybe a second article?

  • Thomas Hütter - Tuesday, August 21, 2018 8:30 AM

    OK folks, please take into account that I am in a different time zone, and have to finish my day work first. 😀
    I'll get back to you with a SQL script (or probably two) in the evening (European time...).

    Thomas, please don't feel the need to rush things ... I did not intend to appear at all demanding.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Ha ha, no worries - I was prepared to answer some questions today. And rest assured, I enjoy what I am doing here. 😉

    OK, so to make my example work in SSMS, and answer Phil's questions, I produced two (+ one) SQL scripts. I had to rename them to .txt to upload them here, so you'll probably have to rename them back to the .sql extension before using them. And there are some caveats: you have to install the needed R packages into your SQL Server R service first (use my installpkg script). But in order to accomplish that, you have to have write permissions to the library folder, in my case of a standard installation, that is "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\R_SERVICES\library". I'll leave it up to you to sort that out.

    Then, you can run f1getdata in SSMS. If all is well, this will produce a SQL table called f1table, that holds the brushed up long version of the formula 1 data. Check if you have 189 records in there. The INSERT INTO statement on the SQL side takes the output of the "EXEC sp_execute_external_script" call. That is the way to persist data from your R script into SQL Server. (As I ran into some character encoding issues in the second script, I implemented a workaround converting umlauts - that's the two UPDATE statements at the bottom of the first script. )

    The f1plotdata script needed some additions to the R code. This is because I do not know a feasible, easy way to show R plots within SSMS. I prepare the two plots to be saved in separate .png files in the Temp directory. I hope it is easy to spot the principle how to embed a SELECT statement into a variable and then use it as input to the R script by assigning it to the @input_data_1 parameter. After running this script, you should find Plot1.png and Plot2.png in your C:\Temp folder.

    I hope I could shed some light on how R scripts work in SSMS...?

  • Phil Parkin - Tuesday, August 21, 2018 8:26 AM

    Thomas Hütter - Tuesday, August 21, 2018 8:16 AM

    Phil Parkin - Tuesday, August 21, 2018 6:00 AM

    Nice article, Thomas.
    I must admit that I was expecting/hoping to see something different, though.
    I was hoping that you would 
    a) Import the data into a SQL Server table
    b) Use sp_execute_external_script to run the various R scripts against the imported data.

    I'd be particularly interested to know whether the plotting options available in R Studio can somehow be used on data held in SQL Server, without the need for moving the data out of SQL Server.

    Thanks Phil, and I apologize for raising false hopes there. I use RStudio for 99% of my R stuff, for three reasons:

    1. the use of "sp_execute_external_script" in SQL Server Management Studio is a bit clumsy for my taste,
    2. I don't use R functionality in any production environment so far, so I haven't had the need to handle "big" data,
    3. for private purposes and for sidekick projects, I use a Mac (which doesn't natively run the full-featured SQL Server).
    Do I understand you correctly: you would like to see my example code run inside SSMS, including the plot output?

    It would be the icing on the cake for your article, yes. But I do not think that it can be done, based on my current rudimentary knowledge.

    If there is any way at all of being able to run R scripts against data stored natively in SQL Server & then plotting the results, that would be a huge win. Without importing, of course.

    Ideally, what you would do is run the R scripts on the data in SQL Server in order to extract the coefficients (the model) to a physical table. Then you can combine this with the data in SQL Server via a view and plot the data directly. Far as I know, Azure DB nor say, Azure ML has this ability.

    And just for clarity, you don't just do this because it's "Big Data". You do this so you can make dynamic forecasting more fluid on the source data as well allow you to repeat the process in a more consistent fashion (i.e.: move from ad-hoc R analysis to enterprise).

    You also would move away from ggplot and move more into something like Power BI or Tableau. RStudio is only good for script/adhoc analysis.

  • xsevensinzx - Tuesday, August 21, 2018 5:33 PM

    RStudio is only good for script/adhoc analysis.

    Thanks for confirming - that's exactly what my article is about: giving a small, yet reproducible example for data analysis using R.
    No model, no predictions, no data that would be too big for my MacBook's RAM. 😉

  • Be good to see a second article doing this in SQL Server 😉

    If any of you would like to see it,  encourage Thomas to write more.

  • Good work! I am quite an enthusiast of both F1 and SQL Server. Guess which one pays the bills. My company does use R, yet I have not had the opportunity to work myself into an R project. Looking forward to future articles.

    I happened to be quite pleased that the topic was in fact based around Formula 1. Then, a bit disappointed that you did not somehow uncover share a mountain of F1 telemetry data to play with.

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply