Git Basics: Version Control for Research Workflows

What is Version Control?

Version control is a system that tracks and manages changes to files over time. It allows you to:

  • Track history: See what changed, when, and why
  • Revert changes: Go back to previous versions if something breaks
  • Collaborate: Multiple people can work on the same project simultaneously
  • Understand context: Commit messages document the "why" behind changes

Imagine you're writing a paper and you save versions like: paper_v1.docx, paper_v1_revised.docx, paper_v1_final.docx, paper_v1_FINAL_ACTUAL.docx. Version control automates this and makes it sane.

Why version control matters (source: xkcd #1296)

Distributed Version Control with Git

Git is a distributed version control system, meaning:

  • Every developer has a complete copy of the project history on their machine
  • No single point of failure (unlike centralized systems)
  • You can work offline and sync later
  • Multiple developers can work on the same codebase without overwriting each other's changes

Key Concepts

Repository (Repo): A folder containing your project and its entire history.

Commit: A snapshot of your project at a point in time, with a message explaining what changed.

Branch: A parallel version of your code. You can work on features independently without touching the main branch.

Remote: A version of your repository hosted elsewhere (e.g., GitHub).

Push/Pull: Push sends your local commits to the remote; pull fetches commits from the remote and merges them into your current branch.
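The remote, push, and pull concepts can be tried without a hosting account. The sketch below is a minimal illustration that assumes a local bare repository can stand in for a hosted remote such as GitHub; the directory names (remote.git, alice, bob) are hypothetical.

```shell
# A bare repository plays the role of the remote (assumption: mktemp is available).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"

# First clone: make a commit and push it to the remote.
git clone -q "$work/remote.git" "$work/alice"
cd "$work/alice"
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'
echo 'print("Hello, World!")' > script.py
git add script.py
git commit -q -m 'Initial commit: add hello world script'
git push -q origin HEAD            # send local commits to the remote

# Second clone: starts with the pushed commit.
git clone -q "$work/remote.git" "$work/bob"

# The first clone publishes another change...
cd "$work/alice"
echo 'print("Goodbye!")' >> script.py
git commit -q -am 'feat: add goodbye line'
git push -q origin HEAD

# ...and the second clone receives it with pull (fetch + merge).
cd "$work/bob"
git pull -q
cat script.py
```

Both clones hold the full history, which is what "distributed" means in practice: either one could recreate the remote if it were lost.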

Setting Up Git

Installation

Linux/macOS:

brew install git        # macOS
sudo apt install git    # Ubuntu/Debian

Windows: Download from git-scm.com

Your First Repository

Initialize a Local Repository

mkdir my_project
cd my_project
git init

Initial Configuration

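git branch -M main                                   # Rename the current branch to main
git config user.name 'Tutorial User'                 # Name recorded in each commit
git config user.email 'tutorial@example.com'         # Email recorded in each commit
git config --list | grep -E 'user.name|user.email'   # Confirm the settings

The git init command above created a hidden .git folder containing all version control information. Without --global, git config stores these settings in that folder, so they apply to this repository only.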
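To see where a setting is stored, the following sketch reads the value back from the repository-level configuration. It assumes a throwaway repository created with mktemp; the --show-origin option requires Git 2.8 or newer.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'         # written to .git/config

local_name=$(git config --local --get user.name)
echo "repository-level user.name: $local_name"

# Show which file the effective value comes from.
git config --show-origin --get user.name
```

Adding --global to the git config command would write to your per-user configuration file instead, which applies to every repository that does not override the value.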

Add and Commit Files

# Create a file
echo 'print("Hello, World!")' > script.py

# Check status
git status

# Stage the file for commit
git add script.py
git status
git commit -m 'Initial commit: add hello world script'

Understanding the Workflow

Working Directory  →  Staging Area  →  Repository
   (your files)      (git add)       (git commit)
  1. Working Directory: Your actual files
  2. Staging Area: Files you've marked for commit
  3. Repository: Committed snapshots stored in .git/
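The three areas can be observed directly with git status --short, which prints a two-character code per file. The following is a minimal sketch in a throwaway repository (the mktemp directory and variable names are illustrative).

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'

echo 'print("Hello, World!")' > script.py
s_untracked=$(git status --short)    # "?? script.py": exists only in the working directory

git add script.py
s_staged=$(git status --short)       # "A  script.py": recorded in the staging area

git commit -q -m 'Initial commit: add hello world script'
s_clean=$(git status --short)        # empty: the snapshot is stored in the repository

echo 'print("Goodbye!")' >> script.py
s_modified=$(git status --short)     # " M script.py": modified, not yet staged

printf '[%s]\n' "$s_untracked" "$s_staged" "$s_clean" "$s_modified"
```

The left column of the code describes the staging area and the right column describes the working directory, so "A " and " M" tell you where a change currently lives.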

The three states of Git (source: git-scm.com)

Viewing History

git log                           # full history: hash, author, date, message
git log --oneline                 # one line per commit
git log --graph --oneline --all   # text graph of all branches

Understanding Commits

Each commit has:

  • Hash: Unique identifier (e.g., a1b2c3d)
  • Author: Who made the change
  • Date: When it was committed
  • Message: Description of changes
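These fields can be printed individually with git log format placeholders. A small sketch, assuming a throwaway repository with a single commit:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'
echo 'print("Hello, World!")' > script.py
git add script.py
git commit -q -m 'Initial commit: add hello world script'

# %h = abbreviated hash, %an = author name, %ad = author date, %s = message subject
git log -1 --date=short --format='hash:    %h%nauthor:  %an%ndate:    %ad%nmessage: %s'

author=$(git log -1 --format=%an)
subject=$(git log -1 --format=%s)
```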

A good commit message follows this structure:

type(scope): brief summary

Detailed explanation of why this change was made.
What problem does it solve?

Fixes #123

Types: feat, fix, docs, style, refactor, test, chore

Example:

feat(calculations): add distance calculation function

Implements the Haversine formula for calculating great-circle
distances between two points on a sphere given their longitudes
and latitudes. This is useful for geographic calculations.

Fixes #15
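A message with a summary, body, and footer can be written from the command line by passing -m more than once; Git joins the values as separate paragraphs. The sketch below assumes a throwaway repository, and geo.py is a hypothetical placeholder file.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'

echo 'def haversine(lat1, lon1, lat2, lon2): ...' > geo.py   # placeholder content
git add geo.py
git commit -q \
  -m 'feat(calculations): add distance calculation function' \
  -m 'Implements the Haversine formula for great-circle distances.' \
  -m 'Fixes #15'

subject=$(git log -1 --format=%s)   # first paragraph: the summary line
body=$(git log -1 --format=%b)      # remaining paragraphs: the explanation and footer
echo "$subject"
echo "$body"
```

For longer explanations, running git commit without -m opens your editor, which is usually more comfortable than chaining flags.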

Making Changes

Modifying Files

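echo 'def add(a, b):' > math_ops.py && echo '    return a + b' >> math_ops.py
echo 'def subtract(a, b):' > utils.py && echo '    return a - b' >> utils.py
git status                # both new files are untracked
git add math_ops.py
git status                # math_ops.py is staged; utils.py is still untracked
git diff                  # unstaged changes to tracked files (none yet)
git diff --staged         # staged changes: the new math_ops.py
echo 'print("Math operations module")' >> math_ops.py
git diff math_ops.py      # the appended line, not yet staged
git add .                 # stage everything, including utils.py
git commit -m 'feat(math): add math operations and utils modules'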

Reverting Changes

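echo 'This is an unwanted change' >> script.py
git status
git diff script.py           # shows the unwanted line
git checkout -- script.py    # discard the unstaged change (this cannot be undone)
cat script.py                # back to the committed version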
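Newer Git releases (2.23 and later) also provide git restore for the same job. The sketch below is one way to discard both an unstaged and a staged edit with it, assuming a throwaway repository; on older versions, git checkout -- and git reset HEAD remain the equivalents.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'
echo 'print("Hello, World!")' > script.py
git add script.py
git commit -q -m 'Initial commit: add hello world script'

# Case 1: the edit is only in the working directory.
echo 'This is an unwanted change' >> script.py
git restore script.py              # discard it

# Case 2: the edit was already staged.
echo 'This is an unwanted change' >> script.py
git add script.py
git restore --staged script.py     # unstage; the edit stays in the working directory
git restore script.py              # then discard it

cat script.py
```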

Branching

Branches allow parallel development without interfering with your main code.

Branching strategy: develop features on separate branches, merge to main (source: nobledesktop.com)

Create and Switch Branches

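git branch                              # list branches; * marks the current one
git branch feature/new-calculation      # create a branch (does not switch to it)
git branch
git checkout feature/new-calculation    # switch to the new branch
git branch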
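Creating a branch does not copy any files: a branch is a name that points at a commit. The sketch below, run in a throwaway repository with illustrative branch names, shows that a new branch and main resolve to the same commit, and that git checkout -b creates and switches in one step.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'
echo 'print("Hello, World!")' > script.py
git add script.py
git commit -q -m 'Initial commit: add hello world script'
git branch -M main

git branch feature/new-calculation
main_tip=$(git rev-parse main)
feature_tip=$(git rev-parse feature/new-calculation)
echo "main:    $main_tip"
echo "feature: $feature_tip"              # same hash as main

git checkout -q -b feature/another-idea   # create and switch in one step
current=$(git rev-parse --abbrev-ref HEAD)
echo "now on:  $current"
```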

Branch Naming Conventions

Good branch names are descriptive and follow patterns:

  • feature/user-authentication: new features
  • fix/memory-leak: bug fixes
  • docs/api-reference: documentation
  • refactor/database-layer: code reorganization
  • experiment/ml-model-v2: exploratory work
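If a team wants to enforce such a convention, a small check can run before a branch is created. The function below is a hypothetical helper, not part of Git: it accepts only the prefixes listed above and then asks git check-ref-format whether Git itself considers the name valid.

```shell
# Hypothetical helper: returns 0 for an acceptable branch name, non-zero otherwise.
valid_branch_name() {
  case "$1" in
    feature/?*|fix/?*|docs/?*|refactor/?*|experiment/?*) ;;   # known prefix plus a description
    *) return 1 ;;
  esac
  # Reject names Git does not allow (spaces, "..", trailing ".lock", and so on).
  git check-ref-format --branch "$1" >/dev/null 2>&1
}

for name in feature/user-authentication fix/memory-leak my-branch 'feature/bad name'; do
  if valid_branch_name "$name"; then
    echo "ok:       $name"
  else
    echo "rejected: $name"
  fi
done
```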

Merging Branches

When your feature is ready:

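echo 'def multiply(a, b):' > advanced_math.py && echo '    return a * b' >> advanced_math.py
git add advanced_math.py
git commit -m 'feat(math): add multiply function'
git log --oneline
git checkout main                # switch to the branch that will receive the changes
git branch
git log --oneline | head -3      # main does not have the multiply commit yet
# main has no new commits, so this merge fast-forwards; the -m message is used only when a merge commit is created
git merge feature/new-calculation -m 'Merge feature/new-calculation into main'
ls -1 *.py
git log --oneline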
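When main has not moved since the branch was created, Git simply advances main to the branch tip (a fast-forward) and records no merge commit. If you prefer the history to show that a feature was merged, --no-ff forces a merge commit. A minimal sketch in a throwaway repository:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'
echo 'print("Hello, World!")' > script.py
git add script.py
git commit -q -m 'Initial commit: add hello world script'
git branch -M main

git checkout -q -b feature/new-calculation
echo 'def multiply(a, b):' > advanced_math.py && echo '    return a * b' >> advanced_math.py
git add advanced_math.py
git commit -q -m 'feat(math): add multiply function'

git checkout -q main
# --no-ff creates a merge commit even though a fast-forward would be possible.
git merge -q --no-ff feature/new-calculation -m 'Merge feature/new-calculation into main'
git log --graph --oneline

parents=$(git log -1 --format=%P | wc -w)   # a merge commit has two parents
merge_subject=$(git log -1 --format=%s)
```

Whether to prefer fast-forward or explicit merge commits is a team convention; both leave the same files on main.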

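Merge vs Rebase (for advanced users):

Merging joins two histories and creates a merge commit when they have diverged. Rebasing replays your branch's commits on top of another branch, which keeps the history linear.

git checkout -b feature/division        # create and switch in one step
echo 'def divide(a, b):' > division.py && echo '    return a / b if b != 0 else "Error"' >> division.py
git add division.py && git commit -m 'feat(math): add divide function'
git log --oneline -5
git rebase main                         # replay this branch on top of main (already up to date here)
git log --oneline -5
git checkout main
git merge feature/division --ff-only    # refuse to merge unless a fast-forward is possible
git log --oneline --graph -10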
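Rebase only changes anything when the two branches have diverged. The following sketch sets up that situation in a throwaway repository (file names are illustrative): main gains a commit after the feature branch was created, and the rebase then moves the feature commit on top of it.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'
echo 'print("Hello, World!")' > script.py
git add script.py
git commit -q -m 'Initial commit: add hello world script'
git branch -M main

git checkout -q -b feature/division
echo 'def divide(a, b):' > division.py && echo '    return a / b' >> division.py
git add division.py && git commit -q -m 'feat(math): add divide function'

# Meanwhile, main gains its own commit, so the histories diverge.
git checkout -q main
echo '# Math utilities' > README.md
git add README.md && git commit -q -m 'docs: add README'

git checkout -q feature/division
git rebase -q main                       # replay the feature commit on top of main
git log --oneline --graph                # a single straight line of three commits

git checkout -q main
git merge -q --ff-only feature/division  # now a plain fast-forward
```

Rebasing rewrites the replayed commits (they get new hashes), so it is safest on branches that nobody else has pulled.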

Advanced Topics (Intermediate to Expert)

Powerful Log Viewing

Understand your project history better:

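git log --graph --oneline --all --decorate    # branch graph with branch and tag labels
git log --oneline --author='Tutorial User'    # commits by one author
git log --oneline | head -5                   # the five most recent commits
git log -p -1 | head -20                      # the latest commit with its diff
git log --stat | head -25                     # files changed per commit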

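Undoing Things (Advanced)

Before undoing anything, inspect the history to find the commit or file version you want to return to:

git reflog | head -10            # recent positions of HEAD, including commits no branch points to
git blame script.py | head -3    # which commit and author last changed each line
git show HEAD | head -30         # the latest commit: metadata and diff
git show HEAD:script.py          # script.py as stored in the latest commit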
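The reflog is what makes most mistakes recoverable. As a hedged illustration, the sketch below drops a commit with a hard reset in a throwaway repository and then uses the reflog entry HEAD@{1} (where HEAD pointed before the reset) to bring it back.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'
echo 'print("Hello, World!")' > script.py
git add script.py
git commit -q -m 'Initial commit: add hello world script'
echo 'def add(a, b):' > math_ops.py && echo '    return a + b' >> math_ops.py
git add math_ops.py && git commit -q -m 'feat(math): add math operations'
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1        # "accidentally" drop the last commit
ls                                # math_ops.py is gone

git reflog -3                     # the dropped commit is still listed
git reset -q --hard 'HEAD@{1}'    # return to where HEAD was before the reset
ls                                # math_ops.py is back
```

Reflog entries are local to your clone and expire eventually, so this is a safety net rather than a backup.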

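Advanced Branching

git reset moves the current branch to a different commit, which is how you undo commits that have not been pushed yet:

echo 'test feature 1' > feature1.py
git add feature1.py && git commit -m 'test commit to reset'
git log --oneline | head -2
git reset --soft HEAD~1        # undo the commit; feature1.py stays staged
git status
git reset HEAD feature1.py     # unstage the file; it becomes untracked again
git status
rm feature1.py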
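git reset has three commonly used modes that differ in how much they undo. The sketch below repeats the same commit three times in a throwaway repository and records what git status --short reports after each mode.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'
echo 'print("Hello, World!")' > script.py
git add script.py
git commit -q -m 'Initial commit: add hello world script'

echo 'test feature 1' > feature1.py
git add feature1.py && git commit -q -m 'test commit to reset'
git reset -q --soft HEAD~1
after_soft=$(git status --short)     # "A  feature1.py": commit undone, file still staged

git commit -q -m 'test commit to reset'
git reset -q --mixed HEAD~1          # --mixed is the default mode
after_mixed=$(git status --short)    # "?? feature1.py": commit undone, file unstaged

git add feature1.py && git commit -q -m 'test commit to reset'
git reset -q --hard HEAD~1
after_hard=$(git status --short)     # empty: commit undone and the file removed

printf '[%s]\n' "$after_soft" "$after_mixed" "$after_hard"
```

Only --hard touches your working directory, so it is the one to use with care.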

Stashing Work in Progress

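echo 'work in progress' > wip.py
git add wip.py           # by default, stash ignores untracked files, so stage it first
git stash                # shelve the change and return to a clean working directory
git status
git stash list
git stash pop            # reapply the change and drop it from the stash list
git status               # wip.py is back and still staged
git reset HEAD wip.py    # unstage it so it does not end up in the next commit
rm wip.py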
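Two options make stashing more convenient: -u includes untracked files, so nothing needs to be staged first, and -m attaches a description. A sketch assuming Git 2.13 or newer (for git stash push) and a throwaway repository:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'
echo 'print("Hello, World!")' > script.py
git add script.py
git commit -q -m 'Initial commit: add hello world script'

echo 'print("experiment")' >> script.py     # tracked file, modified
echo 'work in progress' > wip.py            # untracked file

git stash push -q -u -m 'wip: experiment with greeting'
clean=$(git status --short)                 # empty: both changes are shelved
stashes=$(git stash list)
echo "$stashes"

git stash pop -q                            # both changes return
git status --short
```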

Cherry-Picking Specific Commits

git checkout -b feature/power
echo 'def power(a, b):' > power.py && echo '    return a ** b' >> power.py
git add power.py && git commit -m 'feat(math): add power function'
echo 'def square(a):' > square.py && echo '    return a ** 2' >> square.py
git add square.py && git commit -m 'feat(math): add square function'
git log --oneline | head -3
git checkout main
git log --oneline | head -3
# Look up the hash of the 'add power function' commit and apply only that commit to main
git cherry-pick $(git log feature/power --oneline | grep 'add power function' | cut -d' ' -f1)
git log --oneline | head -4
ls -1 *.py | grep power    # power.py arrived on main; square.py was not picked
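The hash lookup can also be done by git log itself rather than with grep and cut. The sketch below, in a throwaway repository, uses --grep to filter by commit message and --format=%H to print only the hash; treat it as one possible approach.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'
echo 'print("Hello, World!")' > script.py
git add script.py
git commit -q -m 'Initial commit: add hello world script'
git branch -M main

git checkout -q -b feature/power
echo 'def power(a, b):' > power.py && echo '    return a ** b' >> power.py
git add power.py && git commit -q -m 'feat(math): add power function'
echo 'def square(a):' > square.py && echo '    return a ** 2' >> square.py
git add square.py && git commit -q -m 'feat(math): add square function'

git checkout -q main
# --grep matches the commit message, %H prints the full hash, -n 1 keeps one match.
target=$(git log feature/power --grep='add power function' --format=%H -n 1)
git cherry-pick "$target" >/dev/null
ls
```

The cherry-picked commit on main is a new commit with its own hash; it carries the same change and message as the original.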

.gitignore Best Practices

cat > .gitignore << 'EOF'
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
.venv

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Build/Distribution
build/
dist/
*.egg-info/
.eggs/

# Testing
.pytest_cache/
.coverage
htmlcov/

# Secrets
.env
.secrets
secrets.json
EOF
cat .gitignore
git add .gitignore && git commit -m 'chore: add .gitignore file'
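To confirm that a pattern does what you expect, git check-ignore reports whether a path is ignored and, with -v, which rule matched. A short sketch using a reduced set of the patterns above in a throwaway repository:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
printf '%s\n' '__pycache__/' '*.py[cod]' '.env' 'venv/' > .gitignore

touch .env script.py module.pyc

# -v prints the source file, line number, and pattern that matched each path.
git check-ignore -v .env module.pyc

# Exit status tells scripts whether a path is ignored (0) or not (1).
if git check-ignore -q script.py; then
  echo 'script.py is ignored'
else
  echo 'script.py is tracked normally'
fi

git status --short    # ignored files are not listed
```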

Best Practices

  1. Commit Often: Small, logical commits are easier to understand and revert if needed
  2. Write Clear Messages: Explain the "why", not just the "what"
  3. Never Commit Secrets: Use .gitignore for sensitive files
  4. Branch for Features: Keep main stable; develop features on branches
  5. Review Before Committing: Use git diff to verify changes
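One caveat for practice 3: .gitignore only affects files that Git is not already tracking. If a sensitive file was committed earlier, it has to be removed from the index as well. The sketch below shows one way to do that in a throwaway repository; the .env content is a placeholder.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Tutorial User'
git config user.email 'tutorial@example.com'

echo 'API_KEY=placeholder' > .env            # hypothetical secrets file
git add .env && git commit -q -m 'chore: add config'   # committed by mistake

echo '.env' > .gitignore
git status --short                           # .env is still tracked despite the new rule

git rm -q --cached .env                      # stop tracking it, keep the local file
git add .gitignore
git commit -q -m 'chore: stop tracking .env'
git ls-files                                 # only .gitignore remains tracked
```

The earlier commit still contains the file, so any real credential that was committed should be treated as exposed and replaced.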

Next Steps

You now understand Git fundamentals! Next, we'll explore GitHub for collaboration and hosting your repositories online.

Resources