Automating a daily download from GitHub

  • Does anyone have any experience automating a daily download of CSV files from a GitHub repo? Any guidance and pointers would be welcome as I begin work on a project to download and import CSV files from a GitHub repo into a SQL Server 2017 database that I have been tasked with creating to support an R application reporting on COVID-19.

    Please note that my current exposure to GitHub (and Git in general) is severely limited, and only through a GUI application, so pointers and ideas are welcome.

  • The git command line would be the way to go. I have limited experience, so I can't really give you good details, but have a look at https://docs.gitlab.com/ee/gitlab-basics/start-using-git.html

    From a brief reading, you will need the following.

    To change to that branch locally:

    git checkout branchname

    To get the latest changes from the remote branch into the local one:

    git pull remote branchname
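    If the job ends up being driven from a script, the two commands above could be wrapped roughly like this. It is only a sketch: it assumes Python is available on the server and that git is on the PATH, and the names update_commands and update_clone, the "origin" default and the example path are made up for illustration.

```python
import subprocess


def update_commands(repo_dir, branch, remote="origin"):
    """Build the two git invocations: switch to the branch, then pull it."""
    # "git -C <dir>" runs git as if it had been started inside <dir>.
    base = ["git", "-C", str(repo_dir)]
    return [
        base + ["checkout", branch],
        base + ["pull", remote, branch],
    ]


def update_clone(repo_dir, branch, remote="origin", run=subprocess.run):
    """Run both commands in order; check=True raises if git exits non-zero."""
    for cmd in update_commands(repo_dir, branch, remote):
        run(cmd, check=True)
```

    Calling update_clone(r"C:\data\repo", "master") would then run the checkout followed by the pull. The run parameter is only there so the function can be exercised without touching a real repo.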

  • You'll have to clone the GitHub repo locally first. In the process of doing that, you'll also need to work out how your script will authenticate with GitHub (haven't used it for a few years, so can't help with specifics – maybe SSH).

    From there, Frederico's commands should be a good start.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Well, the GitHub repo I need to access is a public repo, so I am able to connect without having to authenticate. The R guys here are doing some kind of screen scraping/scripting process. I am not sure whether they automated the process or run it manually.

     

  • Even easier with a public repo. Can't see you having any trouble at all getting this automated, but post back if you do and we can probably knock up a script for you.


  • If I run into a problem I would welcome the help with a script. Right now I would just like a push in the right direction to see if I can figure it out. I learn more by doing, but would like to avoid spinning my wheels too much looking down the wrong rabbit holes.

  • OK, here's a nudge.

    1. Download and install Git for Windows (https://git-scm.com/download/win) (default installation options are fine)
    2. In GitHub, find the repo you are interested in and click on the clone or download button and copy the URL (eg, https://github.com/github/covid-19-repo-data.git)
    3. Open the Git CMD command shell window and navigate to the folder where you would like to create a local clone of the public repo
    4. Enter the following command

    git clone https://github.com/github/covid-19-repo-data.git

    This should create a local clone of the public repo, in the folder you have selected. All other Git commands are available to you now.
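    For an unattended job, the clone step usually needs to be safe to run more than once. A minimal sketch of that idea, assuming Python on the server and git on the PATH (ensure_clone and the destination folder are hypothetical names, not part of Git):

```python
import subprocess
from pathlib import Path


def ensure_clone(url, dest, run=subprocess.run):
    """Clone url into dest unless dest already holds a git working copy.

    Returns True if a clone was run, False if a clone was already present.
    """
    dest = Path(dest)
    if (dest / ".git").is_dir():
        return False  # already cloned; a later pull will refresh it
    run(["git", "clone", url, str(dest)], check=True)
    return True
```

    For example, ensure_clone("https://github.com/github/covid-19-repo-data.git", r"C:\data\covid-19-repo-data") clones on the first run and does nothing on later runs.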


  • Phil Parkin wrote:

    OK, here's a nudge.

    1. Download and install Git for Windows (https://git-scm.com/download/win) (default installation options are fine)
    2. In GitHub, find the repo you are interested in and click on the clone or download button and copy the URL (eg, https://github.com/github/covid-19-repo-data.git)
    3. Open the Git CMD command shell window and navigate to the folder where you would like to create a local clone of the public repo
    4. Enter the following command

    git clone https://github.com/github/covid-19-repo-data.git

    This should create a local clone of the public repo, in the folder you have selected. All other Git commands are available to you now.

    I may have been making this harder than it actually is (my normal MO, by the way). I think I was coming around to what you were saying.

    Basically: install Git on our server, create a local clone of the repo we are pulling our data from, schedule a daily command-line job to pull new/updated files, and use that local repo as the source for importing the CSV files.

    Am I on the right track?
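    The plan above could be sketched as follows. This assumes Python on the server and git on the PATH; every function name here is made up for illustration, and load_rows is a placeholder for whatever does the SQL Server import, which is deliberately left out. Scheduling is also left to something like Windows Task Scheduler or a SQL Server Agent job.

```python
import csv
import subprocess
from pathlib import Path


def pull_latest(repo_dir, run=subprocess.run):
    """Refresh the local clone; --ff-only refuses to merge if history diverged."""
    run(["git", "-C", str(repo_dir), "pull", "--ff-only"], check=True)


def csv_files(repo_dir):
    """List every .csv under repo_dir, skipping git's own .git folder."""
    root = Path(repo_dir)
    return sorted(
        p for p in root.rglob("*.csv")
        if ".git" not in p.relative_to(root).parts
    )


def read_rows(path):
    """Read one CSV into a list of dicts keyed by the header row."""
    # utf-8-sig also copes with files that start with a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def daily_job(repo_dir, load_rows, run=subprocess.run):
    """Pull, then hand each file's rows to load_rows (the database step)."""
    pull_latest(repo_dir, run=run)
    for path in csv_files(repo_dir):
        load_rows(path, read_rows(path))
```

    This version re-reads every CSV on each run, which is the simplest thing that could work. Importing only new or changed files would be a later refinement.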

     

  • Am I on the right track?

    Bang on!

    Possible future refinement: if it's a large repo, you can limit the checkout so that you only get the folder(s) you are interested in (Git calls this a sparse checkout).
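    One way to do that refinement is a partial clone combined with a sparse checkout. The sketch below only builds and runs the two git commands; it assumes Python on the server and a reasonably recent Git (the sparse-checkout command appeared in Git 2.25), and the function names and the "data" folder in the usage note are hypothetical.

```python
import subprocess


def sparse_clone_commands(url, dest, folders):
    """Build a partial, sparse clone limited to the given folders."""
    return [
        # --filter=blob:none defers downloading file contents until needed;
        # --sparse starts with only the files in the repo root checked out.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Then widen the working tree to just the folders we care about.
        ["git", "-C", dest, "sparse-checkout", "set", *folders],
    ]


def sparse_clone(url, dest, folders, run=subprocess.run):
    """Run both commands in order; check=True raises if git exits non-zero."""
    for cmd in sparse_clone_commands(url, dest, folders):
        run(cmd, check=True)
```

    For example, sparse_clone(url, "repo", ["data"]) would leave only the root files and the data folder in the working copy. Later pulls keep respecting that selection.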


  • I really need to learn to start with the KISS Principle, but sometimes it is hard for this old dog to learn better habits.

    Thank you Phil for keeping me from spinning my wheels more than I needed to on this.

     

  • Lynn Pettis wrote:

    I really need to learn to start with the KISS Principle, but sometimes it is hard for this old dog to learn better habits.

    Thank you Phil for keeping me from spinning my wheels more than I needed to on this.

    Happy to help.

