Tutorial: GitHub for Data Scientists without the Terminal

Git and GitHub are indispensable tools for anyone analysing data, developing software or disseminating results. Originally designed for software engineers, GitHub is now widely used in many disciplines, especially by researchers in academia. Using a source code hosting platform such as GitHub to store your code alongside detailed project documentation is a huge step towards ensuring that research is reproducible. It also makes it easier for others to build upon the work you have already done, which leads to more efficient use of research time and may well increase your citation count.

Learning Git and GitHub can be a daunting task, especially if you’re not familiar with, or used to working with, the command line (a.k.a. the terminal). With this in mind, we created a new introductory tutorial, aimed at data scientists using R, titled:

GitHub for Data Scientists without the terminal

We provide step-by-step instructions and detailed screenshots to guide you along the way. You will learn about:

  1. Installing Git
  2. Signing up for a GitHub account and a Hello World tutorial
  3. Installing GitHub Desktop
  4. Version controlling R code using an example of PCA
  5. Creating a branch, pull request and merge
  6. An introduction to Git functionality in RStudio
  7. Creating and publishing an R Markdown document
  8. Creating an online CV

It is now not uncommon for employers to prioritize your GitHub portfolio over your CV. This tutorial demonstrates how simple it is to get up and running with GitHub. In addition to having an easy-to-use interface, GitHub lets you create websites and host dynamic documents with little effort. We encourage you to adopt this workflow, whether you work in industry or academia, to showcase your work, increase efficiency and ensure reproducibility.