A Git guide

There are a huge number of introductions and tutorials about git on the internet. Some good examples:

And there are some handy manpages, too:

man gittutorial
man gittutorial-2
man gitworkflows

2. Example of how development could be done

DataSHIELD code is stored on GitHub. GitHub has its own useful guides for development and for contributing to open-source projects, including more information about 'pull requests', which are one way of contributing to DataSHIELD development.

Some basic setup

First, ensure git knows who you are:

git config --global user.name "Your Name"
git config --global user.email "me@example.com"
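
To confirm what git has recorded, you can read the values back. The sketch below is only an illustration: it works in a throwaway directory and sets the values without --global, so they apply to that one repository and your real settings are left alone. The name and email are the placeholders used above.

```shell
#!/bin/sh
set -e

# Throwaway repository so the real global settings are left alone.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global the setting applies to this repository only.
git config user.name "Your Name"
git config user.email "me@example.com"

# Read the values back to confirm what git will record in commits.
git config --get user.name
git config --get user.email
```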

1. Pull from the repository

Make sure you are up to date with the latest changes (and remember you may want to pull all the repositories). The first time, you need to clone the repository, giving its web address; git records that address as the remote called origin. Subsequent updates can be done with pull.

git clone https://github.com/....
git pull --rebase origin

Cloning creates a new directory, named after the repository, inside the directory from which the command is run. Pulling must be run from inside that cloned directory.
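
If you want to try the clone-then-pull cycle without touching the real repository, a local stand-in works. This is a minimal sketch under stated assumptions: shared.git is a hypothetical bare repository playing the part of GitHub, seed plays a colleague's copy, mine is your copy, and notes.txt is an invented file.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the shared GitHub repository (hypothetical).
git init -q --bare shared.git
git --git-dir=shared.git symbolic-ref HEAD refs/heads/dev

# A colleague's copy publishes the first commit on dev.
git clone -q shared.git seed
cd seed
git config user.name "A Colleague"
git config user.email "colleague@example.com"
git symbolic-ref HEAD refs/heads/dev
echo "first" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q origin dev
cd ..

# First time: clone creates a new directory (here called mine).
git clone -q shared.git mine

# The colleague publishes another change in the meantime.
cd seed
echo "second" >> notes.txt
git commit -q -am "Extend notes"
git push -q origin dev
cd ..

# Later updates: run pull from inside the existing clone.
cd mine
git config user.name "Your Name"
git config user.email "me@example.com"
git pull -q --rebase origin
cat notes.txt
```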

2. Create a private branch off a public branch.

By default you will be on whichever branch you were last working on. You can check what branches exist by running the following in the copy of the repository on your computer:

git branch -a

You'll see that this lists both local and remote branches. The remote branches belong to origin.

We shouldn't work on master. Instead switch to dev:

git checkout dev

Note that this assumes there is a dev branch. If dev doesn't exist, you can create it and switch to it in one command:

git checkout -b dev

In fact, working on dev might get messy. So create a new branch that is a copy of dev, that you can safely make a mess of:

git checkout -b messydev
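
The branch sequence above can be rehearsed in a scratch repository. The following is a sketch only: the repository lives in a temporary directory, and the README file and commit message are invented for the example.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "me@example.com"

# Name the initial branch master, whatever the local default is.
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Create dev from master, then messydev from dev.
git checkout -q -b dev
git checkout -q -b messydev

# The asterisk marks the branch you are on.
git branch
```

Both new branches start out pointing at the same commit; they only diverge once you commit on one of them.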

3. Regularly commit your work to this private branch.

Now make all the changes you want and test that they work. When you finish some part of what you're doing, or you want to 'save' your edits up to that point, you can save (that is, 'commit') the change. But first, you might want to check exactly which files you've changed:

git status

Or, to see exactly which parts of the files you've changed:

git diff HEAD

You don't have to commit everything at once, if you've made multiple changes that are not logically related to each other. To pick the files you want (i.e. to 'stage' them):

git add <file1> <file2>

Or, to stage every change in the repository at once:

git add --all

If you want to see only the changes you have staged (i.e. run `git add` on), then simply pass the staged flag:

git diff HEAD --staged

This can be very useful when using the commands below to stage only some of the changes within a file.

If there are multiple, logically independent changes within a single file then you can split those by:

git add --patch
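
To see how staging separates unrelated changes, here is a small sketch at the level of whole files (the interactive --patch prompt is left out). It assumes a scratch repository in a temporary directory; a.txt and b.txt are invented file names.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "me@example.com"

echo "one" > a.txt
echo "one" > b.txt
git add a.txt b.txt
git commit -q -m "Add a.txt and b.txt"

# Two unrelated edits.
echo "two" >> a.txt
echo "two" >> b.txt

# Stage only the change to a.txt.
git add a.txt

# Staged changes: what the next commit will contain.
git diff --staged --name-only
# Unstaged changes: still only in the working directory.
git diff --name-only
```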

Then, to commit those files you've added:

git commit -s -m '<explanation of changes>'

The '-s' option adds a "Signed-off-by: Your Name <your email>" line at the bottom of the commit message.

The '-m' flag supplies the message that describes the commit. If you leave it out, git opens your editor so that you can write the message there.

If you want to stage every change to files that git already tracks and commit them in one step (new, untracked files still need `git add` first), you can:

git commit -am '<commit message>'

And if you want to edit the message you wrote for the last commit you made:

git commit --amend
  • Note, to amend the messages of older commits you can use git rebase -i, see below.
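
As a sketch of the commit and amend commands together, the example below makes a commit with a typo in its message and then corrects it. It assumes a scratch repository; report.txt and both messages are invented. Because --amend replaces the last commit, it is best kept to commits you have not yet pushed.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "me@example.com"

echo "draft" > report.txt
git add report.txt

# -s appends a Signed-off-by line built from user.name and user.email.
git commit -q -s -m "Add reprot"

# Replace the message of the last commit; passing -s again keeps the sign-off.
git commit -q --amend -s -m "Add report"

# Show the full message of the (single) resulting commit.
git log -1 --format=%B
```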

4. Once your code is perfect, clean up its history.

You may have made lots of changes and lots of commits on your messydev branch; and so it will be messy. To see just how messy:

git log

If you want more detail about the changes in each commit:

git log -p

If you want less detail and just a summary then:

git log --pretty=oneline

or, for a really pretty summary:

git log --pretty=format:"%C(yellow)%h%Cred%d %Creset%s%Cblue [%cn]" --decorate

You might want to re-order some of these commits, or squash a series of commits together into one, more logical commit:

git rebase -i HEAD~3

This selects the last three commits; you can then choose how to manipulate them. You can specify any number of commits counting backwards from HEAD. However, don't manipulate any of the commits from the dev branch, just those that you have added on your messydev branch. Rebasing is a way of changing the history of what has been done.

Alternatively, you can specify a particular commit to rebase to. So if commit 054966b is the oldest of the last five commits, then you can select all five with:

git rebase -i 054966b^

Note: The caret '^' at the end of the commit refers to that commit's parent, which means the commit itself is included in the rebase. If you omit the caret, then only the commits made after this one will be included in the rebase.
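
An interactive rebase onto some commit lists the commits in the range from that commit (exclusive) up to HEAD, so you can preview the selection with git log before rewriting anything. The sketch below assumes a scratch repository with five invented commits; the variable target is a hypothetical stand-in for a hash such as 054966b.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "me@example.com"

# Five small commits: "Change 1" through "Change 5".
for n in 1 2 3 4 5; do
    echo "$n" >> log.txt
    git add log.txt
    git commit -q -m "Change $n"
done

# What 'git rebase -i HEAD~3' would offer: the last three commits.
git log --format=%s HEAD~3..HEAD

# Pick the commit for "Change 3" as the target.
target=$(git rev-parse HEAD~2)

# With the caret the base is the target's parent, so the target is included.
git log --format=%s "$target^..HEAD"

# Without the caret only the commits after the target are included.
git log --format=%s "$target..HEAD"
```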

5. Merge the cleaned-up branch back into the public branch.

First, switch back to the dev branch:

git checkout dev

Before the merge, you might want to check there have not been other changes made to dev upstream:

git pull origin dev

The simplest way to do the merge is:

git merge messydev

You can also merge in different ways. For example, you can squash all the changes you made on your private messydev branch into a single commit and merge just that. Note that --squash only stages the combined changes; run git commit afterwards to record them:

git merge --squash messydev

Or you can ensure that the merge creates a new commit keeping a record of the merge in the commit history:

git merge --no-ff messydev
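
To compare the resulting histories, you can try both styles on scratch copies of dev. This is a sketch under stated assumptions: the repository is temporary, try-squash and try-no-ff are hypothetical branch names used only so that dev itself stays untouched, and the file and messages are invented.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "me@example.com"

# Start on dev with one commit.
git symbolic-ref HEAD refs/heads/dev
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"

# Two work-in-progress commits on the private branch.
git checkout -q -b messydev
echo "step 1" >> file.txt
git commit -q -am "WIP 1"
echo "step 2" >> file.txt
git commit -q -am "WIP 2"

# Squash merge: the changes are staged, then recorded as one new commit.
git checkout -q -b try-squash dev
git merge -q --squash messydev
git commit -q -m "Add steps 1 and 2"

# No-fast-forward merge: both WIP commits are kept, plus a merge commit.
git checkout -q -b try-no-ff dev
git merge -q --no-ff -m "Merge messydev into dev" messydev

git log --oneline --graph try-squash try-no-ff
```

Both branches end up with identical file contents; they differ only in how much of the private history is preserved.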

6. Push to the shared repository and delete your private branch.

If you cloned the git repositories, then by default you will have a remote defined (origin). You can view this by:

git remote -v

You may or may not have permission to 'push' your changes back to this remote repository. If you do, you can push the dev branch like this:

git checkout dev
git push origin dev

Even if you have permission, never push to the master branch.

If you don't have permission to push to the remote repository, then you can either:

  • Push to your own personal GitHub repository (if you don't have one, 'fork' the DataSHIELD repository you're working on) and submit a 'pull request' (i.e. a request for the DataSHIELD team to pull in your changes).
  • Create patches for your changes, and email these to the DataSHIELD team. For example, using git format-patch and git send-email.
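
For the patch route, the sketch below shows git format-patch writing one patch file per commit that is on messydev but not on dev. It assumes a scratch repository; the file, the commit messages and the patches directory are invented. Sending the files with git send-email is not shown because it depends on your mail setup.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Your Name"
git config user.email "me@example.com"

git symbolic-ref HEAD refs/heads/dev
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"

git checkout -q -b messydev
echo "step one" >> file.txt
git commit -q -am "Describe step one"
echo "step two" >> file.txt
git commit -q -am "Describe step two"

# Write the patches into ./patches; git prints the name of each file created.
git format-patch -o patches dev..messydev
```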

After you have pushed your changes, delete your private branch:

git branch -d messydev
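
The whole merge, push and clean-up sequence can be rehearsed against a local stand-in for the shared repository. This is a minimal sketch, assuming a hypothetical bare repository called shared.git in a temporary directory, with an invented file and commit messages.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the shared GitHub repository (hypothetical).
git init -q --bare shared.git
git clone -q shared.git mine
cd mine
git config user.name "Your Name"
git config user.email "me@example.com"

# Publish an initial dev branch.
git symbolic-ref HEAD refs/heads/dev
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
git push -q origin dev

# Do some work on the private branch.
git checkout -q -b messydev
echo "feature" >> file.txt
git commit -q -am "Add feature"

# Merge into dev and publish.
git checkout -q dev
git merge -q messydev
git push -q origin dev

# -d only deletes a branch whose commits are already merged.
git branch -d messydev
git branch -a
```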

7. Let someone else do the housekeeping

Alternatively, push your private branch to GitHub and let someone else tidy up the history, manage the merging of the branches, and then delete your private branch from the shared repository on GitHub. Afterwards you can delete your local copy; git will not delete the branch you are currently on, so switch to another branch first.

git checkout messydev
git push origin messydev
git checkout dev
git branch -D messydev