March 30, 2020 at 7:33 pm
Does anyone have any experience automating a daily download of CSV files from a GitHub repo? Any guidance and pointers would be welcome as I begin work on a project to download CSV files from a GitHub repo and import them into a SQL Server 2017 database, which I have been tasked with creating to support an R application that reports on COVID-19.
Please note that my current exposure to GitHub (and Git in general) is severely limited, and has been through a GUI application only, so pointers and ideas are welcome.
March 30, 2020 at 8:12 pm
The git command line would be the way to go. I have limited experience, so I can't really give you good details, but have a look at https://docs.gitlab.com/ee/gitlab-basics/start-using-git.html
From a brief reading, you will need:
git checkout branchname - to switch to that branch locally
git pull remote branchname - to get the latest changes from the remote branch into the local one (the remote is usually named origin)
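Those commands can be wrapped in a small script that a scheduler runs once a day. This is a minimal sketch, assuming Bash (Git for Windows includes Git Bash) plus a repo URL and local folder you supply; `sync_repo` is a hypothetical helper name, not part of Git. It clones on the first run and updates the existing clone on later runs, so no separate checkout is needed if you stay on the repo's default branch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone a repo the first time, then update it on later runs.
# Usage: sync_repo <repo-url> <local-folder>

sync_repo() {
  local url="$1" dir="$2"
  if [ -d "$dir/.git" ]; then
    # Existing clone: fetch and merge the tracked branch, but only as a
    # fast-forward. Nothing is ever committed locally, so this should always
    # succeed; if it doesn't, the job fails instead of creating a merge commit.
    git -C "$dir" pull --ff-only
  else
    # First run: create the local clone (checks out the default branch).
    git clone "$url" "$dir"
  fi
}
```

For example, `sync_repo https://github.com/github/covid-19-repo-data.git /c/data/covid-19-repo-data`, where the local path is only an illustration. Using `--ff-only` is a deliberate choice for a read-only mirror: any unexpected local divergence surfaces as an error in the job log rather than being silently merged.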
March 30, 2020 at 8:26 pm
You'll have to clone the GitHub repo locally first. In the process of doing that, you'll also need to work out how your script will authenticate with GitHub (I haven't used it for a few years, so can't help with specifics; maybe SSH).
From there, Frederico's commands should be a good start.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 30, 2020 at 8:38 pm
Well, the GitHub repo I need to access is public, so I am able to connect without having to authenticate. The R guys here are using some kind of screen-scraping/scripting process. I'm not sure whether they have automated it or run it manually.
March 30, 2020 at 8:45 pm
Even easier with a public repo. Can't see you having any trouble at all getting this automated, but post back if you do and we can probably knock up a script for you.
March 31, 2020 at 3:52 am
If I run into a problem I would welcome help with a script, but right now I would just like a push in the right direction to see if I can figure it out myself. I learn more by doing, but would like to avoid spinning my wheels too much going down the wrong rabbit holes.
March 31, 2020 at 2:14 pm
OK, here's a nudge.
git clone https://github.com/github/covid-19-repo-data.git
This should create a local clone of the public repo, in the folder you have selected. All other Git commands are available to you now.
March 31, 2020 at 5:19 pm
OK, here's a nudge.
- Download and install Git for Windows (https://git-scm.com/download/win) (default installation options are fine)
- In GitHub, find the repo you are interested in, click the Clone or download button and copy the URL (e.g., https://github.com/github/covid-19-repo-data.git)
- Open the Git CMD command shell window and navigate to the folder where you would like to create a local clone of the public repo
- Enter the following command:
git clone https://github.com/github/covid-19-repo-data.git
This should create a local clone of the public repo, in the folder you have selected. All other Git commands are available to you now.
I may have been making this harder than it actually is (my normal MO, by the way). I think I was coming around to what you were saying.
Basically: install Git on our server, create a local clone of the repo we are pulling our data from, schedule a daily command-line job to pull new/updated files, and use that local repo as the source for importing data from the CSV files.
Am I on the right track?
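For the import step of that plan, one option is to have the daily job report which CSV files the pull actually added or changed, so only those need reloading. This is a minimal sketch under the same assumptions (Bash, a clone that already exists); `pull_and_list_changed_csv` is a hypothetical name. It records `HEAD` before pulling and then asks `git diff` for the `*.csv` paths that differ between the old and new commits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the latest changes, then print the CSV files that
# the pull added or modified (one path per line, relative to the repo root).
# Renamed and deleted files are ignored in this sketch.
# Usage: pull_and_list_changed_csv <local-clone-folder>

pull_and_list_changed_csv() {
  local dir="$1" before
  # Remember where the local branch pointed before pulling.
  before=$(git -C "$dir" rev-parse HEAD) || return
  # Send pull's own messages to stderr so stdout carries only the file list.
  git -C "$dir" pull --ff-only --quiet >&2 || return
  # A = added, M = modified; the pathspec limits output to *.csv files,
  # including those in subfolders.
  git -C "$dir" diff --name-only --diff-filter=AM "$before" HEAD -- '*.csv'
}
```

The printed paths could then drive whatever loader you choose (for example `BULK INSERT` or an SSIS package), and the daily trigger could be Windows Task Scheduler or a SQL Server Agent job; neither is part of this sketch.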
March 31, 2020 at 5:23 pm
Bang on!
Possible future refinement: if it's a large repo, you can optimise the pull so that you only get the folder(s) you are interested in.
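One way to do that refinement is Git's sparse checkout, which limits the working tree to the folders you name. This is a minimal sketch, assuming Git 2.25 or later (where `git clone --sparse` and `git sparse-checkout` were introduced); `sparse_clone` and any folder names are hypothetical. Sparse checkout controls which files are written to disk, but the clone still downloads the repo's history, so for a very large repo it may also be worth looking at a shallow (`--depth 1`) or partial (`--filter=blob:none`) clone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone a repo but check out only the named folder(s).
# Requires Git 2.25+.
# Usage: sparse_clone <repo-url> <local-folder> <repo-folder> [<repo-folder>...]

sparse_clone() {
  local url="$1" dir="$2"
  shift 2
  # --sparse starts the working tree with only the files in the repo root.
  git clone --quiet --sparse "$url" "$dir" || return
  # Restrict the working tree to the requested folder(s).
  git -C "$dir" sparse-checkout set "$@"
}
```

Later `git pull` runs in a sparse clone keep honouring the same folder list, so a daily pull job like the one discussed above can run against it unchanged.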
March 31, 2020 at 5:48 pm
I really need to learn to start with the KISS Principle, but sometimes it is hard for this old dog to learn better habits.
Thank you Phil for keeping me from spinning my wheels more than I needed to on this.
March 31, 2020 at 10:02 pm
Happy to help.