Automating a daily download from GitHub

Question

Lynn Pettis

SSC Guru

Points: 442467
March 30, 2020 at 7:33 pm

#3739082

Does anyone have any experience automating a daily download of CSV files from a GitHub repo? Any guidance and pointers would be welcome as I begin work on a project to download and import CSV files from a GitHub repo into a SQL Server 2017 database that I have been tasked with creating to support an R application reporting on COVID-19.
Please note that my current exposure to GitHub (and Git in general) is severely limited, and limited to using a GUI application, so pointers and ideas are welcome.

frederico_fonseca SSCoach Points: 16197 · Answer 1

The Git command line would be the way to go. I have limited experience, so I can't really give you good details, but have a look at https://docs.gitlab.com/ee/gitlab-basics/start-using-git.html

From a brief reading, you will need:

git checkout branchname - to change to that branch locally

git pull remote branchname - to get the latest changes from the remote branch into the local one

Phil Parkin SSC Guru Points: 247039 · Answer 2

You'll have to clone the GitHub repo locally first. In the process of doing that, you'll also need to work out how your script will authenticate with GitHub (haven't used it for a few years, so can't help with specifics – maybe SSH).

From there, Frederico's commands should be a good start.

Lynn Pettis SSC Guru Points: 442467 · Answer 3

Well, the GitHub repo I need to access is a public repo, so I am able to connect without having to authenticate. The R guys here are doing some kind of screen scraping/scripting process. I'm not sure whether they automated the process or run it manually.

Phil Parkin SSC Guru Points: 247039 · Answer 4

Even easier with a public repo. Can't see you having any trouble at all getting this automated, but post back if you do and we can probably knock up a script for you.

Lynn Pettis SSC Guru Points: 442467 · Answer 5

If I run into a problem I would welcome the help with a script. Right now I would just like a push in the right direction to see if I can figure it out. I learn more by doing, but would like to avoid spinning my wheels too much looking down the wrong rabbit holes.

Phil Parkin SSC Guru Points: 247039 · Answer 6

OK, here's a nudge.

Download and install Git for Windows (https://git-scm.com/download/win) (default installation options are fine)
In GitHub, find the repo you are interested in, click on the "Clone or download" button and copy the URL (e.g., https://github.com/github/covid-19-repo-data.git)
Open the Git CMD command shell window and navigate to the folder where you would like to create a local clone of the public repo
Enter the following command:

git clone https://github.com/github/covid-19-repo-data.git

This should create a local clone of the public repo, in the folder you have selected. All other Git commands are available to you now.
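
For anyone who would rather script this step than type it, here is a minimal Python sketch, assuming Git is installed and on the PATH. It clones the repo only when no local clone exists yet. The function names are illustrative and not something from this thread; the URL is simply the example given above.

```python
import subprocess
from pathlib import Path
from typing import List

# Example URL quoted in the thread; substitute the repo you actually need.
REPO_URL = "https://github.com/github/covid-19-repo-data.git"


def clone_command(repo_url: str, dest: Path) -> List[str]:
    """Build the `git clone` command for a first-time download."""
    return ["git", "clone", repo_url, str(dest)]


def ensure_clone(repo_url: str, dest: Path) -> bool:
    """Clone repo_url into dest unless a clone already exists there.

    Returns True if a clone was performed, False if one was already present.
    """
    if (dest / ".git").exists():
        return False
    # check=True raises CalledProcessError if git exits with an error.
    subprocess.run(clone_command(repo_url, dest), check=True)
    return True
```

Calling ensure_clone(REPO_URL, Path("covid-19-repo-data")) from the folder chosen above should give the same result as the manual command, and it does nothing if the clone is already there, so it is safe to re-run. The folder name is a hypothetical choice.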

Lynn Pettis SSC Guru Points: 442467 · Answer 7

Phil Parkin wrote:

OK, here's a nudge.

Download and install Git for Windows (https://git-scm.com/download/win) (default installation options are fine)
In GitHub, find the repo you are interested in, click on the "Clone or download" button and copy the URL (e.g., https://github.com/github/covid-19-repo-data.git)
Open the Git CMD command shell window and navigate to the folder where you would like to create a local clone of the public repo
Enter the following command:

git clone https://github.com/github/covid-19-repo-data.git

This should create a local clone of the public repo, in the folder you have selected. All other Git commands are available to you now.

I may have been making this harder than it actually is (my normal MO, by the way). I think I was coming around to what you were saying.

Basically: install Git on our server, create a local copy of the repo from which we are pulling our data, schedule a daily command-line job to pull new/updated files, and use that local repo as the source to import data from the CSV files.

Am I on the right track?
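
The workflow described above could be sketched in Python roughly as follows, assuming Git is on the PATH and a local clone already exists. The names pull_command, csv_files and daily_refresh are hypothetical, and the actual load into SQL Server is left out because the thread does not cover it.

```python
import subprocess
from pathlib import Path
from typing import List


def pull_command(repo_dir: Path) -> List[str]:
    """Build a fast-forward-only pull for the clone in repo_dir."""
    # -C runs git as if it had been started in repo_dir; --ff-only refuses
    # to create merge commits, which suits a read-only local copy.
    return ["git", "-C", str(repo_dir), "pull", "--ff-only"]


def csv_files(repo_dir: Path) -> List[Path]:
    """Return every .csv file under repo_dir, sorted, skipping .git."""
    return sorted(
        p for p in repo_dir.rglob("*.csv")
        if ".git" not in p.relative_to(repo_dir).parts
    )


def daily_refresh(repo_dir: Path) -> List[Path]:
    """Pull the latest commits, then list the CSV files to import."""
    # check=True raises CalledProcessError if the pull fails, so a broken
    # download is not silently followed by an import of stale files.
    subprocess.run(pull_command(repo_dir), check=True)
    return csv_files(repo_dir)
```

A scheduler such as Windows Task Scheduler or a SQL Server Agent job (either is an assumption here) could run a script that calls daily_refresh once a day and hands the returned paths to whatever import process you choose.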

Phil Parkin SSC Guru Points: 247039 · Answer 8

Am I on the right track?

Bang on!

Possible future refinement: if it's a large repo, you can optimise the PULL such that you only get the folder(s) you are interested in.
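
One way to do this, assuming Git 2.25 or later, is a partial clone combined with sparse checkout. The post does not say which technique was meant, so treat this as one option and not the intended one. The sketch below only builds and runs the two Git commands; the function names and the folder name "data" in the usage note are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def sparse_clone_commands(
    repo_url: str, dest: Path, folders: Sequence[str]
) -> List[List[str]]:
    """Build the commands for a clone limited to the given folders."""
    return [
        # --filter=blob:none defers downloading file contents until needed;
        # --sparse starts with only the top-level files checked out.
        ["git", "clone", "--filter=blob:none", "--sparse", repo_url, str(dest)],
        # Restrict the working tree to the listed folders.
        ["git", "-C", str(dest), "sparse-checkout", "set", *folders],
    ]


def sparse_clone(repo_url: str, dest: Path, folders: Sequence[str]) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in sparse_clone_commands(repo_url, dest, folders):
        subprocess.run(cmd, check=True)
```

For example, sparse_clone(url, Path("covid-19-repo-data"), ["data"]) would check out only a folder named data. Later pulls in that clone should then fetch file contents only for the selected folders.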

Lynn Pettis SSC Guru Points: 442467 · Answer 9

I really need to learn to start with the KISS Principle, but sometimes it is hard for this old dog to learn better habits.

Thank you Phil for keeping me from spinning my wheels more than I needed to on this.

Phil Parkin SSC Guru Points: 247039 · Answer 10

Lynn Pettis wrote:

I really need to learn to start with the KISS Principle, but sometimes it is hard for this old dog to learn better habits.
Thank you Phil for keeping me from spinning my wheels more than I needed to on this.

Happy to help.