Topic 7: Introduction to using Git and GitHub with RStudio

1 Git…? GitHub…?

1.1 What is Git?

Git is a version control system. It helps you:

  • Keep track of every change you make to your files
  • Go back to older versions if something breaks
  • Collaborate with others without overwriting each other’s work

Really, Git is a command line tool. That is, it was originally designed to be used by typing text commands into the command line. However, RStudio has some Git integration that gives us a point-and-click interface and makes the whole thing more friendly.

1.2 What is GitHub?

GitHub is a website that hosts your Git repositories online. It lets you:

  • Store your project backups in the cloud
  • Share your work or collaborate with others
  • Access your code from anywhere

So, in short, if Git is the version control tool, GitHub is like Google Drive for your Git repositories.

1.3 Using Git and GitHub

The following guide will walk you through the process of setting up a repo on GitHub.

The only prerequisite for this is that you have already signed up for a GitHub account. If you have not already created one, do so now at https://github.com/signup.

GitHub is an online platform for storing code or other text-based projects in repositories (repos). Each week you will create a new repository for your work in the tutorials. We will go through the process of setting up a repo step by step.

This is the process you should follow every week in your pair programming tutorials. Choose one person who will do this, on their own computer, and work through the steps together. Only do this on that one person’s computer. Don’t worry, all members of the team will get all work at the end, and all individuals will get a chance to do these steps on their own computer in future weeks.

Git has a whole vocabulary of its own. There is a Git glossary available that you can keep open as you work through these notes.

2 GitHub

2.1 Creating a new repo in GitHub

  • Sign into your GitHub account here https://github.com.
  • Navigate to the repositories tab in your GitHub account.
Figure 1: GitHub Repositories tab
  • Click the big green button to create a new repository.
Figure 2: Click “New” to begin creating a repo

This will take you to the repo initialization page.

2.2 Initializing your repo

You can ignore the “template” section. If you are a keen bean, you can explore this later. It will allow you to create a template repo for all future weeks based on this week’s repo. But we will not go over it here.

  • Give your repo a meaningful name.
    • tutorial-repo sounds good, right? 🤨…?
Figure 3: Naming your repo. Don’t forget to give a description.
  • Also fill in the Description.
    • It says “optional”, but fill it in now for benefits later.
    • More detail = more benefit.

If we follow the defaults, we will create a completely blank repo. That’s fine, and sometimes may be what you want. But let’s make GitHub do some of the tedious bits for us (Figure 4).

  • Tick the add README file box.
  • Click the add .gitignore drop-down and find your language of choice (if you need a clue, it’s R).
  • We can leave the license as none for now.
    • But you should totally consider adding a license for personal projects, and definitely for any future research projects.
Figure 4: Add a readme - whatever you wrote in your description above will be placed here automatically. Also, select the .gitignore file appropriate for R.

And click the big green button at the bottom.

Figure 5: Click “Create repository”

2.3 Preparing to “clone” the repository

After clicking the green button, you will be taken to a new page. This is your new repo. However, so far it is only accessible from GitHub. To make use of it, we need to obtain a copy of it on our local machine. The process for doing this is called cloning.

  • Locate the next big green button - it says Code - and click.
  • You will see a web URL with a copy icon next to it. Click the copy icon.

3 The terminal 😨

3.1 Don’t panic - only one command

Decide on a good place to save all your course resources. It is up to you where you want these to live, but make sure it makes sense to you, e.g.:

  • Desktop = bad
  • Somewhere in Documents = better

See Figure 23 at the end for a suggested set-up. You can create a new directory (folder) using whichever file manager you like, e.g. Windows Explorer, Mac Finder, terminal, etc.

  • Here I created a folder in Documents called data-science-tutorials.
  • I then right-clicked inside the folder in Windows Explorer and clicked Open in terminal
  • …and typed git clone, followed by a space, then pasted the web URL copied from GitHub
Figure 7: Paste the repo URL into your terminal. The repo will be cloned in whichever directory you are in when you do this, so make sure you are in the directory you want to be in!

Press enter to perform the clone. Once it completes, you will be shown a short summary. What it says is not important, but go ahead and read it as an optional exercise; see what you can glean.

Figure 8: Pressing enter will instruct Git to download the repo.

Success! And that’s the only interaction with the terminal we need.
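For reference, the one terminal step can be sketched roughly as below. So that it runs without a network connection or a GitHub account, this sketch uses a local, empty stand-in repo instead of GitHub (Git may warn that the repo is empty). The folder names and the URL in the comment are placeholders, not the ones you must use.

```shell
#!/bin/sh
set -e

# Assumption: a throwaway folder stands in for your Documents folder.
workdir="$(mktemp -d)"

# Stand-in for the repo on GitHub, so this sketch runs offline.
# In the tutorial you already created the real one in section 2.
git init -q --bare "$workdir/github-stand-in.git"

# Move into the folder where the clone should live.
mkdir "$workdir/data-science-tutorials"
cd "$workdir/data-science-tutorials"

# The one command: "git clone", a space, then the repo address.
# With GitHub it would look like this (hypothetical URL):
#   git clone https://github.com/your-username/tutorial-repo.git
git clone -q "$workdir/github-stand-in.git" tutorial-repo

# The clone is a new folder containing a hidden .git folder.
ls -a tutorial-repo
```

Whichever directory you are in when you run git clone is where the new repo folder appears, which is why the cd step comes first.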


4 RStudio

At this point we now have a repo on the GitHub cloud storage platform.
And we have cloned (copied) it to our local computer.
Now we want to write our R code for the repo inside an RStudio project.
This will be mostly familiar, but follow along anyway, because there may be a few points that are slightly different.

4.1 Setting Up the project in RStudio

  • Create a new project
    • Either File > New project
    • Or click the New project drop-down (Figure 9)
Figure 9: The Project drop-down menu can be used for creating new projects.
  • In the menu that opens, select Existing Directory.
Figure 10: We already have a project directory. It just has no project in it. So select “Existing Directory” for creating your project.

Remember, we cloned our repo somewhere, possibly in Documents/

  • Use the Browse button to find your repo.
  • When you have found it click Create Project.
Figure 11: Use the Browse button to find the location of your cloned repo.

4.2 Check out our new Git tracked R project

Open the Files pane, and you will see a few things that are different from the usual state of a new project.

  • A .git folder
    • Don’t mess with this!
    • If you’re one of the aforementioned keen beans, by all means, have a sniff around it, but know that most mortals have little need to tamper with anything in there.
  • A README.md file
    • This is the readme file from GitHub
    • You can edit this if you like - it’s just a text file and it uses markdown syntax
  • A .gitignore file
    • We will take a closer look at this shortly.
Figure 12: Your project already contains the files you used to initialize it in GitHub

4.3 RStudio-Git integration

Now that we are using Git with RStudio, we should be able to find a Git pane (Figure 13). By default, I think it appears in the top right panel of RStudio. It may be somewhere else, so if you don’t see it, check the other panels.

The desired outcome here is:

  1. Identify which files we want Git to track
    • We do not have to track all files.
  2. Mark them as files of interest
    • Called “adding them to the index”.
    • This is the point at which Git notices changes to files - it has started tracking them.
  3. Commit them
    • It is only now that changes to those files become part of the permanent recorded history.
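If you are curious what those three steps look like as Git commands, here is a minimal sketch. It assumes a fresh throwaway repo rather than your cloned one, and the file names, name and email address are placeholders. RStudio's check-boxes and Commit button do roughly the same job for you.

```shell
#!/bin/sh
set -e

# Assumption: a fresh throwaway repo stands in for your cloned tutorial repo.
repo="$(mktemp -d)/tutorial-repo"
git init -q "$repo"
cd "$repo"

# Placeholder identity for this repo only; Git needs one to record commits.
git config user.name "Example Student"
git config user.email "student@example.com"

# Two new files; the names are illustrative only.
echo "# tutorial-repo" > README.md
echo "x <- 1" > analysis.R

# 1. Identify: "??" marks files Git is not tracking yet.
git status --short

# 2. Mark as files of interest (add to the index); "A" now replaces "??".
git add README.md analysis.R
git status --short

# 3. Commit: record the added files in the project history.
git commit -q -m "Add README and first analysis script"
git log --oneline
```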

Once you have found the Git pane, you should see two files listed, each with a yellow question mark in a yellow square (untracked) and a check-box next to it.

This is how we tell Git to track any changes to files. We can choose which files we want Git to track by ticking the check-box.

In this case we do want to track both files, so tick both check-boxes. The yellow question marks (untracked) should become green squares with an A inside (added).

Figure 13: The Git pane only appears if you are in a Git tracked directory. So you may have never seen it before. It can usually be found in the top right panel of RStudio.
  • Then click the Commit button.
  • Add a commit message
    • No need to be laconic here.
    • Detail is good. Make it meaningful.
    • Click Commit.
Figure 14: Add a descriptive commit message. More detail is good here, especially as your project matures. It will allow you to make sense of historical commits, like if you… need to find that time… you did that thing…
  • You will then be given a short summary of the commit
  • Click close
  • Check out the Git pane again - it should now be empty?
Figure 15: If your commit succeeds you will be given a short summary. If it fails, you may need to call in some support.

4.4 Let’s change something

  • Find the .gitignore file in your Files pane, and click on it.
  • Add two lines at the end of the file.
    • .DS_Store
    • thumbs.db

No matter what computer you are using, add both lines. Your .gitignore file should now look like Figure 16

Figure 16: .DS_Store files often crop up in repos. They are Mac-only files and they are a nuisance for your repo. Add them to your .gitignore every time. Windows has a similar file (thumbs.db), but it seems to be less pervasive.
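To see what those two lines do, here is a small sketch that can be run in a terminal. It assumes a throwaway repo with no .gitignore of its own (yours came from the GitHub R template), and notes.R is just an illustrative file name.

```shell
#!/bin/sh
set -e

# Assumption: a throwaway repo, so nothing in your real project is touched.
repo="$(mktemp -d)/tutorial-repo"
git init -q "$repo"
cd "$repo"

# Append the two patterns to the end of .gitignore.
printf '%s\n' ".DS_Store" "thumbs.db" >> .gitignore

# Create a stand-in for the file macOS leaves behind, plus a normal file.
touch .DS_Store notes.R

# .DS_Store is not listed; only .gitignore and notes.R show as untracked.
git status --short

# Ask Git directly whether a path is ignored, and by which rule.
git check-ignore -v .DS_Store
```

Ignored files never show up in the Git pane, so there is no risk of committing them by accident.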

4.5 Commit cycle

Once again, take a look at the Git pane. We should see that the .gitignore file has appeared again. This time, instead of the yellow question mark (untracked), it has a blue square with an M inside (modified).

This is Git saying

You know that file you asked me to track? Well it’s changed

Go through the process again of adding the file, writing a commit message, and committing the changes.

We have just completed the “commit cycle”. This is the most basic and important workflow to learn in Git. Committing is a bit like saving - it’s something you should do often!
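The same cycle, sketched as terminal commands for the curious. This assumes a throwaway repo that already has one committed .gitignore; the starting contents, name and email address are placeholders.

```shell
#!/bin/sh
set -e

# Assumption: a throwaway repo with one committed .gitignore stands in for yours.
repo="$(mktemp -d)/tutorial-repo"
git init -q "$repo"
cd "$repo"
git config user.name "Example Student"
git config user.email "student@example.com"
echo ".Rhistory" > .gitignore
git add .gitignore
git commit -q -m "Add .gitignore for R"

# Work and save: edit a file Git is already tracking.
printf '%s\n' ".DS_Store" "thumbs.db" >> .gitignore

# Git reports the tracked file as modified ("M").
git status --short

# Add, then commit with a message that says what changed.
git add .gitignore
git commit -q -m "Ignore .DS_Store and thumbs.db system files"

# Two commits now sit in the history, newest first.
git log --oneline
```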

Figure 17: Get used to the Git commit cycle. Work > Save > Commit > back to work. And do it often. But note that one step we have covered is not shown here. Can you remember?

4.6 Push: Let’s share our work with the world

Make sure you have committed all the changes you need, so that your team mates will get the most up-to-date version. This is the same as Figure 14, but with an extra step: click Push.

Figure 18: Another commit message. Do not use the same commit message as shown here. A commit message needs to describe the changes that happened. What might be more meaningful here?
Figure 19: The summary you see after a push tells you that you have pushed the changes on HEAD (your local version) to main (the remote version).

But what does that actually do? It sends (pushes) our work to the GitHub cloud platform. And now, because our repos are public, the whole world can access them!

Pushing allows a break from the commit cycle, in that once you push, your work is now backed up remotely, as well as saved locally.

Commit cycle with the added Push step
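As a rough sketch of what the Push button does, the commands below commit once and then push. To keep it runnable offline, a local empty repo stands in for GitHub (Git may warn that it cloned an empty repo); the folder names and identity are placeholders.

```shell
#!/bin/sh
set -e

workdir="$(mktemp -d)"

# Stand-in for the repo on GitHub, so the sketch runs offline.
git init -q --bare "$workdir/github-stand-in.git"

# Local clone, as made in section 3.
git clone -q "$workdir/github-stand-in.git" "$workdir/tutorial-repo"
cd "$workdir/tutorial-repo"
git config user.name "Example Student"
git config user.email "student@example.com"

# One pass through the commit cycle.
echo "# tutorial-repo" > README.md
git add README.md
git commit -q -m "Add README describing the tutorial repo"

# Push: send the commits on the current branch (HEAD) to the remote.
# "origin" is the name Git gives to the place you cloned from.
git push -q origin HEAD

# List what the remote now holds; the commit ID matches the local one.
git ls-remote origin
git rev-parse HEAD
```

Until the push, the commit exists only on your machine. After it, the remote copy and the local copy point at the same commit.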

5 Forking: How do I get the work?

Maybe the world doesn’t want our work… But our team mates probably do. The process for getting it is similar to the clone operation we did earlier. With a clone we copy the remote repo to our own machine, and our local copy keeps a link back to the original (Git calls this link a remote, and names it origin by default). This means that if you make any changes locally, when you push, those changes will be sent to the original repo.

An alternative is to create a “fork”. Doing this means we still make a copy of the repo, but instead of copying to our local machine, we can create our own copy in our own GitHub account. We can then clone our own copy of the repo, and any changes we push will update the copy in our own account, instead of the original.

So, to create a fork we navigate to the original repo, and find the fork button. Note that anyone can fork any public repo… you can go and fork the ggplot2 repo if you like!

Figure 20: The fork button in the GitHub navbar. Located just above the green Code button we use to clone a repo.

Clicking the fork button will take us to a screen very similar to the one we saw when we created our repo earlier. We need to choose an “owner” and give the fork a name.

  • Change the owner to yourself, if it does not already show your name.
  • Name your fork
    • You can choose any name you like
    • The default is to use the same name as the original, which is fine.

Changing the name might be more useful if you are forking a project that you intend to develop in a divergent direction from the original.

Figure 21: Choose a name for the fork. The same name as the original repo is fine.

There is a checkbox asking if you want to “copy the main branch only”. By default it is ticked, and usually that is what you will want. Then click the green Create fork button at the bottom of the page.

Figure 22: Green “create fork button”. Click it!

5.1 Cloning our fork

Now that we have our own copy on our own GitHub account, we can get it onto our local machine by following the same clone process outlined in sections 2.3 to 3.1.
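The difference between pushing to a fork and pushing to the original can be sketched offline as below. Forking is a GitHub feature, not a Git command, so this sketch assumes a second local copy as a stand-in for clicking Fork. All folder names and identities are placeholders.

```shell
#!/bin/sh
set -e

workdir="$(mktemp -d)"

# Stand-in for the original repo on GitHub, seeded with one commit.
git init -q "$workdir/seed"
cd "$workdir/seed"
git config user.name "Example Owner"
git config user.email "owner@example.com"
echo "# tutorial-repo" > README.md
git add README.md
git commit -q -m "Initial commit"
git clone -q --bare "$workdir/seed" "$workdir/original.git"

# Stand-in for clicking Fork: a second remote copy that belongs to you.
git clone -q --bare "$workdir/original.git" "$workdir/my-fork.git"

# Clone the fork; its "origin" remote points at the fork, not the original.
git clone -q "$workdir/my-fork.git" "$workdir/local-copy"
cd "$workdir/local-copy"
git remote -v

# Commit and push: the change lands in the fork only.
git config user.name "Example Student"
git config user.email "student@example.com"
echo "My notes" >> README.md
git add README.md
git commit -q -m "Add my notes to README"
git push -q origin HEAD

# Count commits in each copy: the fork has moved ahead, the original has not.
git --git-dir="$workdir/my-fork.git" rev-list --count --all
git --git-dir="$workdir/original.git" rev-list --count --all
```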

6 Final set-up tips

Here is a suggested directory structure to think about. You do not have to do this, but it is clear, simple, and easy to compare with your team mates.

Figure 23: A possible setup for your work, and the workflow for a pair-programming session.