meet the commit

commits never change

once you've made a commit, it's set in stone:

the files in it never change
its diff never changes
its history never changes
the message/author never change

commit hashes

commits never change because their ID is calculated from their contents

Illustration of a box, labelled "sha1 hash".

On the left of the box, with arrows flowing into it:

every file
parent(s)
message
author
timestamp

On the right of the box, coming out of it:
3530a42...
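Here's a tiny Python sketch of that box, using only `hashlib`. It assumes git's default SHA-1 object format, where git hashes a short header (`<type> <size>\0`) followed by the object's contents. `object_id` is a made-up helper name, and the commit text in the demo is simplified, not a complete real commit.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """Hypothetical helper: the SHA-1 ID git would give an object of this type."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# The empty file gets the same blob ID in every repository.
print(object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Change one character of a (simplified) commit and its ID changes completely.
commit_a = b"tree 22b920\nauthor Julia 1697682215 -0500\n\nfix bug\n"
commit_b = b"tree 22b920\nauthor Julia 1697682215 -0500\n\nfix bugs\n"
print(object_id("commit", commit_a) != object_id("commit", commit_b))  # True
```

This is why a commit can't change: any edit to its files, parents, message, author or timestamp would give it a different ID, so it would be a different commit.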

you can think of commits as a pile of diffs

Illustration of a stack of boxes. The one on the bottom is labelled "START", and the three boxes above it are each labelled "diff". The top box is also labelled "current".

if you combine all the diffs together, you'll get the current state of the project!

(not how Git works, but a VERY useful way to think about commits!)
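To make the "pile of diffs" mental model concrete, here's a toy Python sketch. It's a model only (git doesn't store commits this way): each "diff" is a hypothetical dict mapping a filename to its new contents, or to `None` if the file was deleted, and replaying them in order from START rebuilds the current state.

```python
from typing import Optional

# Toy "diff" (hypothetical format): filename -> new contents, or None = deleted.
Diff = dict[str, Optional[str]]


def replay(diffs: list[Diff]) -> dict[str, str]:
    """Apply each diff in order, starting from an empty project (START)."""
    files: dict[str, str] = {}
    for diff in diffs:
        for name, contents in diff.items():
            if contents is None:
                files.pop(name, None)  # file deleted in this commit
            else:
                files[name] = contents  # file added or changed
    return files


history = [
    {"README.md": "hello\n"},
    {"cats.py": "print('meow')\n"},
    {"README.md": "hello, cats\n", "old.txt": None},
]
print(replay(history))
```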

you can also think of commits as a pile of snapshots

Illustration of a stack of boxes. The bottom one is labelled "START", and all the ones above it are labelled "snapshot". The top box is also labelled "current".

this is how Git is implemented!

Illustration of two stick people talking. One is bald and looks nonplussed, the other has short curly hair and is smiling.

person 1: is git saving a NEW COPY, EVERY TIME??

person 2: not quite! it has some tricks!

(on the next page!)

diffs are calculated from snapshots

Illustration of two boxes labelled "snapshot", connected with a line.

the diff is the difference between a commit and its parent

smiling stick figure with short curly hair: hey what's the diff for 1b8e29?

git, represented by a box with a smiley face: ooh let me calculate that REALLY FAST!
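Since git stores snapshots, a diff is something it computes when you ask for it. Here's a rough Python sketch of that idea using the standard library's `difflib.unified_diff`. The snapshot dicts and the `diff_commits` helper are hypothetical, and `difflib` uses a different algorithm than git's default (Myers), so the output can differ in the details.

```python
import difflib


def diff_commits(parent: dict[str, str], commit: dict[str, str]) -> str:
    """Hypothetical helper: compare two snapshots (filename -> contents)."""
    out: list[str] = []
    for name in sorted(parent.keys() | commit.keys()):
        old = parent.get(name, "").splitlines(keepends=True)
        new = commit.get(name, "").splitlines(keepends=True)
        if old != new:
            out.extend(difflib.unified_diff(old, new, f"a/{name}", f"b/{name}"))
    return "".join(out)


parent = {"cats.py": "print('meow')\n"}
commit = {"cats.py": "print('meow')\nprint('purr')\n"}
print(diff_commits(parent, commit))
```

Nothing about the diff is stored: run it again on the same two snapshots and you get the same answer.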

things git can do with a commit

(file icon) get the files in the commit (like git checkout)
(plus and minus signs) calculate the diff from its parent (like git show)
(two converging arrows) merge it with another commit (like git merge)
(series of dots connected by lines) look at its parents, grandparents, etc. (like git log)

inside the commit

you can see for yourself how git is storing your files!

You just need one command: git cat-file -p

First, get a commit ID. You can get one from git log

1. read the commit

$ git cat-file -p 3530a4
tree 22b920
parent 56cfdc
author Julia 1697682215 -0500
committer Julia 1697682215 -0500

commit message goes here

22b920 is the directory ID

I just use git cat-file for fun and learning, never to get things done
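The commit you get back is plain text: a few `key value` header lines, a blank line, then the message. A minimal Python sketch of pulling it apart (`parse_commit` is a hypothetical name, and it ignores multi-line headers such as `gpgsig`):

```python
def parse_commit(text: str) -> tuple[dict[str, list[str]], str]:
    """Hypothetical helper: split `git cat-file -p` commit output."""
    header_text, _, message = text.partition("\n\n")
    headers: dict[str, list[str]] = {}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        # A key can repeat: merge commits have one "parent" line per parent.
        headers.setdefault(key, []).append(value)
    return headers, message


raw = """tree 22b920
parent 56cfdc
author Julia 1697682215 -0500
committer Julia 1697682215 -0500

commit message goes here
"""
headers, message = parse_commit(raw)
print(headers["tree"], headers["parent"], message.strip())
```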

2. read the directory

$ git cat-file -p 22b920
100644 blob 4fffb2 .gitignore
100644 blob e351d9 404.html
100644 blob cab416 Cargo.toml
100644 blob fe442d hello.html
040000 tree 9de29f src

(fe442d is a file ID)
(IDs are actually 40 characters)

3. read a file


$ git cat-file -p fe442d
<!DOCTYPE html>
<html lang="en">
<body>
<h1>Hello!</h1>
</body>
</html>

4. and we're done!

fe442d is the sha1 hash of the contents of the file (plus a short header). It's called a "blob ID". Commit and tree IDs are hashes too.

Using a hash to identify each file is how git avoids duplication: if the file's contents don't change, the hash won't change, so git doesn't need to store a new version!
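Here's a toy content-addressed store in Python showing why an unchanged file costs nothing extra. It's an illustration, not git's storage code: `ObjectStore` is a hypothetical class, though it hashes with the same `blob <size>\0` header git uses.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store (hypothetical): objects keyed by their hash."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def add_blob(self, content: bytes) -> str:
        header = f"blob {len(content)}\0".encode()
        oid = hashlib.sha1(header + content).hexdigest()
        # Same contents -> same ID, so storing it again is a no-op.
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()
v1 = store.add_blob(b"print('hello')\n")
v2 = store.add_blob(b"print('hello')\n")  # unchanged file in the next commit
v3 = store.add_blob(b"print('hello!')\n")  # edited file
print(v1 == v2, len(store.objects))  # True 2
```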

the diff algorithm

git is CONSTANTLY showing you diffs

smiling stick figure with short curly hair: git show COMMIT_ID

git, represented by a box with a smiley face: here's the diff!

and it makes it seem like git thinks in terms of diffs

have you ever noticed your git diffs don't make sense?

git: deleted... added...

person: but I didn't DELETE that file, I MOVED it

in git, moving a file is the same as deleting the old one and adding the new one

git mv old.py new.py

is the same as

cp old.py new.py
git rm old.py
git add new.py

git is just guessing about your intentions

person: git mv old.py new.py
git commit

git: well the OLD version has old.py and the NEW version has new.py and they have the same contents... so I guess you moved it
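A toy Python version of that guess: if a path disappears and a new path appears with identical contents, call it a rename. This is a simplification: git compares how *similar* files are (by default, files that are at least 50% similar count as a rename; see `git diff -M`), while this sketch only spots exact matches. `guess_renames` is a hypothetical name.

```python
def guess_renames(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str]]:
    """Hypothetical helper: pair deleted paths with added paths with identical contents."""
    deleted = {path: old[path] for path in old.keys() - new.keys()}
    added = {path: new[path] for path in new.keys() - old.keys()}
    renames = []
    for old_path, contents in sorted(deleted.items()):
        for new_path in sorted(added):
            if added[new_path] == contents:
                renames.append((old_path, new_path))
                del added[new_path]  # each added file matches at most once
                break
    return renames


before = {"old.py": "print('hi')\n", "README": "docs\n"}
after = {"new.py": "print('hi')\n", "README": "docs\n"}
print(guess_renames(before, after))  # [('old.py', 'new.py')]
```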

diff is an algorithm

the algorithm:

takes 2 versions of the code
compares them
tries to summarize it in a human readable way

(but it doesn't always do a great job)
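Here's one classic way to do those three steps, sketched in Python: find the longest common subsequence (LCS) of lines with dynamic programming, then report every other line as deleted (`-`) or added (`+`). It's a teaching sketch, not git's implementation: git's default is the Myers algorithm, with `--patience` and `--histogram` as alternatives.

```python
def line_diff(old: list[str], new: list[str]) -> list[str]:
    """Return lines prefixed with ' ' (kept), '-' (deleted) or '+' (added)."""
    # lcs[i][j] = length of the LCS of old[i:] and new[j:]
    lcs = [[0] * (len(new) + 1) for _ in range(len(old) + 1)]
    for i in range(len(old) - 1, -1, -1):
        for j in range(len(new) - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table, keeping lines that are part of the LCS.
    out, i, j = [], 0, 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            out.append(" " + old[i])
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append("-" + old[i])
            i += 1
        else:
            out.append("+" + new[j])
            j += 1
    out += ["-" + line for line in old[i:]]
    out += ["+" + line for line in new[j:]]
    return out


print("\n".join(line_diff(["a", "b", "c"], ["a", "c", "d"])))
```

When several equally short diffs exist, the tie-breaking choices (like the `>=` above) decide which one you see, which is part of why different algorithms can display the same change differently.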

git has many diff algorithms

person: I've been trying out histogram because I don't like how the default algorithm displays the diff when I rearrange code

how to try it out:

git diff --histogram

the staging area

git has a 2-stage commit process

tell git what you want to stage (git add, git rm, git mv, etc.)
make the commit with git commit

Diagram showing two boxes, labelled "untracked files" and "unstaged changes". They converge into a box labelled "stage" via git add. They then flow into a box labelled "committed", which has a heart and smiley face beside it, via git commit.
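A toy Python model of the two stages (all names are hypothetical): the working directory, the staging area, and the list of commits are each made of filename → contents dicts. `add` copies a file into the staging area, and `commit` snapshots whatever is staged, ignoring anything in the working directory that wasn't added.

```python
class ToyRepo:
    """Hypothetical model of git's 2-stage commit: working dir -> stage -> commits."""

    def __init__(self) -> None:
        self.working: dict[str, str] = {}  # files on disk
        self.stage: dict[str, str] = {}  # the staging area / index
        self.commits: list[dict[str, str]] = []  # snapshots, oldest first

    def add(self, name: str) -> None:
        self.stage[name] = self.working[name]  # like `git add name`

    def commit(self) -> None:
        self.commits.append(dict(self.stage))  # like `git commit`


repo = ToyRepo()
repo.working["cats.py"] = "print('meow')\n"
repo.working["debug.txt"] = "random debugging notes\n"
repo.add("cats.py")
repo.commit()
print(repo.commits[-1])
```

Like git's real index, the toy stage isn't emptied after a commit: it keeps every tracked file, so the next commit starts from the same snapshot.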

git uses 3 terms interchangeably for the staging area

staged (like --staged)
cache (like --cached)
index (like --keep-index)

it's total chaos but they're all the same thing

tiny illustration of a sad stick figure with curly hair: why

tip: you can use `git add -p` to commit only certain parts of a file

person: I only want to commit my actual changes, not all the random debugging code I put in

gotcha: `git diff` only shows unstaged changes

You can use:

git diff HEAD to see ALL changes you haven't committed yet
git diff --cached to see staged changes

gotcha: `git commit -a` doesn't automatically add new files

person: I CONSTANTLY forget to add new files and then get confused about why they didn't get committed
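Another toy Python model (every name here is hypothetical, and it ignores deleted files) shows which two snapshots each `git diff` variant compares, and why `git commit -a` skips new files: it only re-stages files that are already tracked, meaning already in the staging area.

```python
from collections.abc import Iterable


def changed(a: dict[str, str], b: dict[str, str], paths: Iterable[str]) -> set[str]:
    """Paths whose contents differ between snapshots a and b."""
    return {p for p in paths if a.get(p) != b.get(p)}


def commit_all(stage: dict[str, str], working: dict[str, str]) -> dict[str, str]:
    """Like `git commit -a`: re-stage tracked files only, then snapshot."""
    return {name: working[name] for name in stage if name in working}


head = {"cats.py": "v1\n", "dogs.py": "v1\n"}  # latest commit
stage = {"cats.py": "v2\n", "dogs.py": "v1\n"}  # cats.py was `git add`ed
working = {"cats.py": "v2\n", "dogs.py": "v2\n", "birds.py": "new\n"}
tracked = stage.keys()  # untracked files (birds.py) never show up in git diff

print(changed(stage, working, tracked))  # git diff: unstaged changes
print(changed(head, stage, tracked))  # git diff --cached: staged changes
print(changed(head, working, tracked))  # git diff HEAD: everything uncommitted
print(commit_all(stage, working))  # birds.py is left out!
```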

meet the branch

theoretically you could use git without branches

You could keep track of your commit IDs manually:

Illustration of a smiling stick figure with medium-length straight hair.

person: hmm, what was I working on? oh yes, a38b997!

But most people use branches.

every branch has 3 things

a name (like main)
a latest commit (like 2e9ffc)
a reflog of how that branch has evolved over time (page 26)

Branches also sometimes have a corresponding remote branch which they "track"

branches are core to how git stores your work

If your commits are "lost" (not on a branch) (page 13):

(sad face) git's garbage collection will eventually delete them
(sad face) they'll become incredibly difficult to find

the only difference between the main branch and any other branch is how you treat them

For example: it's common to never commit to main directly, and instead commit to other branches which you merge into main when you're done.

all changes to a branch are recorded in its reflog

The reflog records every rebase, amended commit, pull, merge, reset, commit, etc. You can look at the reflog like this:

git reflog BRANCHNAME

reflog stands for "reference log" (not "re-flog") (smiley face)
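A minimal Python sketch of a branch that records every move in its reflog. The `Branch` class and its `move` method are hypothetical; the point is that each entry keeps the before/after commit IDs, so even a commit that a `git reset` removed from the branch is still listed.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Hypothetical model: a name, a latest commit, and a reflog."""

    name: str
    commit: str  # latest commit ID
    reflog: list[tuple[str, str, str]] = field(default_factory=list)

    def move(self, new_commit: str, reason: str) -> None:
        """Point the branch at a new commit and record (before, after, reason)."""
        self.reflog.append((self.commit, new_commit, reason))
        self.commit = new_commit


main = Branch("main", "2e9ffc")
main.move("a38b99", "commit: add cats")
main.move("2e9ffc", "reset: moving to HEAD^")
print(main.commit)  # 2e9ffc
print(main.reflog[0][1])  # a38b99, still findable via the reflog
```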

git will let you do literally anything with a branch

when you push/pull a branch, the local branch name doesn't have to match the remote branch name
you can remove commits from a branch with git reset

Git often won't protect you from messing up your branch!

what's a branch?

You can think about a Git branch in 3 different ways.

Each of the three ways is illustrated with a diagram of a vertical line divided up into four nodes, labelled “main”. A diagonal line with three nodes is coming off the second node from the bottom, labelled “armadillo”.

way 1: just the commits that “branch” off

This is how I usually think about branches: armadillo branches off main

In this diagram, the part that branches off is highlighted in red, and labelled, "I think of the armadillo branch as these two commits."

How this shows up in git:

Git DOESN'T KNOW that armadillo is branched off of main: for all it knows, main could be branched off of armadillo! You need to tell it when you merge or rebase, for example:

git checkout main
git merge armadillo
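Way 1 can be computed: the armadillo commits are the ones reachable from armadillo but not from main (that's what `git log main..armadillo` shows). A Python sketch over a hypothetical commit graph, where `parents` maps each made-up commit ID to its parent IDs:

```python
def ancestors(parents: dict[str, list[str]], start: str) -> set[str]:
    """Every commit reachable from `start` by following parent links."""
    seen, todo = set(), [start]
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(parents[commit])
    return seen


# m1 <- m2 <- m3 <- m4 (main), and m2 <- a1 <- a2 (armadillo)
parents = {"m1": [], "m2": ["m1"], "m3": ["m2"], "m4": ["m3"],
           "a1": ["m2"], "a2": ["a1"]}
print(sorted(ancestors(parents, "a2") - ancestors(parents, "m4")))  # ['a1', 'a2']
print(sorted(ancestors(parents, "m4") - ancestors(parents, "a2")))  # ['m3', 'm4']
```

The second line is the same calculation the other way around, which is why git can't tell on its own which branch "branched off" which.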

way 2: every previous commit

Even though git doesn't treat the main branch in any special way, I think of main differently from other branches.

In this diagram, the "main" vertical line is highlighted red, and labelled, "I think of my main branch as these 4 commits"

How this shows up in git:

It's what git log BRANCHNAME shows you! How git log main works:

Diagram of three dots in a vertical line. The top one is labelled main (start here). The two below it are each labelled "parent"
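Roughly how `git log main` finds its commits, as a Python sketch over a hypothetical `parents` dict: start at the branch's latest commit and keep following parents, visiting each commit once. Real `git log` also sorts what it finds (by default, roughly newest first by date); this sketch just lists commits breadth-first.

```python
from collections import deque


def log(parents: dict[str, list[str]], branch_tip: str) -> list[str]:
    """List the branch tip and all its ancestors, each exactly once."""
    order, seen, todo = [], set(), deque([branch_tip])
    while todo:
        commit = todo.popleft()  # breadth-first: closest commits come first
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        todo.extend(parents[commit])  # merge commits have 2+ parents
    return order


parents = {"c3": ["c2"], "c2": ["c1"], "c1": []}
print(log(parents, "c3"))  # ['c3', 'c2', 'c1']
```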

way 3: just the commit at the end

This is how branches are actually implemented in git.

In this diagram, the dot at the end of the armadillo branch is red, and labelled "the latest commit on the branch".

How this shows up in git:

It's how branches are stored internally: a branch is fundamentally a name for a commit ID.

.git/refs/heads/main a276f62

(main is the branch name, a276f62 is the ID of the latest commit on the branch)

knowing where you are

many git disasters are caused by accidentally running a command while on the wrong branch…

Illustration of a stick figure with a neutral expression.

person: git commit

person, thinking: UGH I didn’t mean to do that on main

… or by forgetting you’re in the middle of a multistep operation

smiling stick figure with curly hair: la la la just writing code

same person, now distressed and surrounded by exclamation marks: OMG I FORGOT I WAS IN THE MIDDLE OF A MERGE CONFLICT

I always keep track of 2 things

am I on a branch, or am I in detached HEAD state? (page 12)
am I in the middle of some kind of multistep operation? (rebase, merge, bisect, etc.)

I keep my current branch in my shell prompt

~/work/homepage (main) $

to me it’s as important as knowing what directory I’m in

git comes with a script to do this in bash/zsh called git-prompt.sh, but there are tons of ways to get this info (run git status a lot! use a GUI! use a different shell prompt!)

decoder ring for the default git shell prompt

(main)

on a branch, everything is normal

((2e832b3...)) ((v1.0.13))

the double brackets (( )) mean detached HEAD state. this prompt can only happen if you explicitly git checkout a commit/tag/remote-tracking branch

(main|CHERRY-PICK) (main|REBASE 1/1) (main|MERGING) (main|BISECTING)

in the middle of a cherry-pick/rebase/merge/bisect
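The prompt script works this out by looking for marker files inside `.git`. Here's a simplified Python sketch of that check. The file names are ones git really creates during these operations (`MERGE_HEAD`, `CHERRY_PICK_HEAD`, `REVERT_HEAD`, `BISECT_LOG`, and the `rebase-merge`/`rebase-apply` directories, the latter also used by `git am`), but `current_operation` is a hypothetical helper and the real `git-prompt.sh` handles more cases.

```python
import tempfile
from pathlib import Path
from typing import Optional


def current_operation(git_dir: Path) -> Optional[str]:
    """Guess which multistep operation is in progress, from marker files in .git."""
    if (git_dir / "rebase-merge").is_dir() or (git_dir / "rebase-apply").is_dir():
        return "rebase"
    markers = {
        "MERGE_HEAD": "merge",
        "CHERRY_PICK_HEAD": "cherry-pick",
        "REVERT_HEAD": "revert",
        "BISECT_LOG": "bisect",
    }
    for filename, operation in markers.items():
        if (git_dir / filename).exists():
            return operation
    return None  # nothing in progress


# Demo on a fake, temporary .git directory rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    print(current_operation(git_dir))  # None
    (git_dir / "MERGE_HEAD").write_text("2e832b3\n")
    print(current_operation(git_dir))  # merge
```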

detached HEAD state

how git knows what your current branch is: `.git/HEAD`

.git/HEAD is a file where git stores either:

a branch name: the current branch
a commit ID: this means you don't have a current branch. git calls this "detached HEAD state"
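A minimal Python sketch of reading HEAD, assuming a plain `.git` directory: if `.git/HEAD` starts with `ref: `, follow it to the branch file and read the commit ID there; otherwise HEAD is itself a commit ID and you're detached. (Real git may also keep branches in `.git/packed-refs` after garbage collection; this sketch ignores that.) `resolve_head` is a hypothetical name, and the demo builds a fake `.git` in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_head(git_dir: Path) -> tuple[Optional[str], str]:
    """Return (current branch, or None if detached, and the commit ID)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]  # e.g. "refs/heads/main"
        commit = (git_dir / ref).read_text().strip()
        return ref.removeprefix("refs/heads/"), commit
    return None, head  # detached HEAD: no current branch


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text("a276f62\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    print(resolve_head(git_dir))  # ('main', 'a276f62')
    (git_dir / "HEAD").write_text("a3ffab9\n")
    print(resolve_head(git_dir))  # (None, 'a3ffab9')
```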

by itself, `.git/HEAD` being a commit ID is okay

Illustration of a smiling stick figure with short curly hair.

person: it's a great way to look at an old version of your code!

I don't do it often, but it's super useful!

git does it internally during a rebase!

the only problem is that new commits you make can get "lost" (page 13)

Illustration of five dots in a vertical stack, connected by lines. The top dot is labelled "main" and the bottom dot is labelled "HEAD". There is a dotted line branching off from "HEAD". The dot at the end of the dotted line is labelled "new commit will go here. danger! it won't be on any branch!"

ways you can end up in detached HEAD state

You will end up in detached HEAD state if you checkout:

a tag
$ git checkout v1.3
a remote-tracking branch
$ git checkout origin/main
a commit ID
$ git checkout a3ffab9

if you accidentally create commits in detached HEAD state, it's SUPER easy to avoid losing them

just create a new branch!

git checkout -b oops

(you can also create a branch with git switch -c if you prefer)

git has a little language for referring to commits

the current commit: HEAD
the previous commit: HEAD^
3 commits ago: HEAD^^^
3 commits ago: HEAD~3

The full documentation is at:
man gitrevisions
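A small Python sketch of why `HEAD^^^` and `HEAD~3` land on the same commit: each `^` (and each step of `~N`) follows the *first* parent. It only handles `^` and `~N` after `HEAD`, using a hypothetical `parents` dict; the real syntax has much more (for example, `HEAD^2` means the second parent of a merge, not "2 commits ago"), so this sketch rejects anything else.

```python
import re


def resolve(rev: str, head: str, parents: dict[str, list[str]]) -> str:
    """Resolve HEAD, HEAD^, HEAD^^^, HEAD~3, ... by following first parents."""
    if not rev.startswith("HEAD"):
        raise ValueError(f"this sketch only handles HEAD-relative names, not {rev!r}")
    suffix = rev[len("HEAD"):]
    if not re.fullmatch(r"(?:\^|~\d*)*", suffix):
        raise ValueError(f"unsupported syntax (e.g. HEAD^2): {rev!r}")
    commit = head
    for caret, count in re.findall(r"(\^)|~(\d*)", suffix):
        steps = 1 if caret else int(count or 1)
        for _ in range(steps):
            commit = parents[commit][0]  # ^ and ~ both follow the FIRST parent
    return commit


parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"]}
print(resolve("HEAD^^^", "c4", parents), resolve("HEAD~3", "c4", parents))  # c1 c1
```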

references

git often uses the term “reference” in error messages

$ git switch asdf
fatal: invalid reference: asdf
$ git push
To github.com:jvns/int-exposed
 ! [rejected] main -> main
error: failed to push some refs to 'github.com:jvns/int-exposed'

“ref” and “reference” mean the same thing

Illustration of a tiny worried-looking stick person with a thought bubble reading “!”

“reference” often just means “branch”

in those two error messages, you can replace “reference” with “branch”

fatal: invalid branch: asdf
error: failed to push some branches to 'github.com:jvns/int-exposed'

in my experience, it’s:

94% “branch”
3% “tag”
3% “HEAD”
0.01% something else

it’s an umbrella term

Illustration of git, represented by a box with a smiley face

git, thinking: “well, I COULD check if the thing we failed to push is a branch or tag or what, and customize the error message based on that….”

git, thinking: “seems complicated, let’s just print out “reference””

sad person: “why?”

reference: the definition

References are files. They're almost all in .git/refs.

Here’s a list of every type of git reference that I have ever used:

HEAD: .git/HEAD
branches: .git/refs/heads/$BRANCH
tags: .git/refs/tags/$TAG
remote-tracking branches: .git/refs/remotes/$REMOTE/$BRANCH
stash: .git/refs/stash

all of these files except HEAD contain a commit ID, but the way that commit ID is used depends on what type of reference it is

(stash is a weird reference: when you run git stash, git creates a "temporary" commit. Git stores the commits you have stashed in the stash's reflog: .git/logs/refs/stash)

git’s garbage collection starts with references

the algorithm is:

find all references, and every commit in every reference’s reflog
find every commit in the history of any of those commits
delete every commit that wasn’t found
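Those three steps are a classic mark-and-sweep garbage collector. A Python sketch under simplifying assumptions: a hypothetical `parents` dict stands in for every commit git has, the refs and reflogs are made up, and there's no grace period (real `git gc` keeps unreachable commits around for a while before deleting them).

```python
def garbage_collect(parents: dict[str, list[str]], refs: dict[str, str],
                    reflogs: dict[str, list[str]]) -> set[str]:
    """Return the commits that would be deleted (unreachable from every root)."""
    # 1. roots: every reference, plus every commit in every reflog
    roots = set(refs.values())
    for entries in reflogs.values():
        roots.update(entries)
    # 2. mark: everything in the history of a root
    reachable, todo = set(), list(roots)
    while todo:
        commit = todo.pop()
        if commit not in reachable:
            reachable.add(commit)
            todo.extend(parents[commit])
    # 3. sweep: whatever wasn't found
    return set(parents) - reachable


parents = {"c1": [], "c2": ["c1"], "old": ["c1"], "orphan": ["c1"]}
refs = {"refs/heads/main": "c2"}
reflogs = {"refs/heads/main": ["c1", "old", "c2"]}  # "old" was amended away
print(garbage_collect(parents, refs, reflogs))  # {'orphan'}
```

The amended-away commit `old` survives because it's still in the reflog; only `orphan`, which nothing points to, gets swept.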

git's garbage collection won't delete commits for at least 90 days by default.

lost commits

commits in git are usually saved forever

But even if git still has your commits, they're not always easy to find.

Some ways commits get "lost":

git commit --amend
git rebase
deleting an unmerged branch
git stash drop

the three levels of losing commits

annoying: the commit isn't in the history of any branch/tag, but it's relatively easy to find
nightmare: you need to search every single commit to find it
disaster: it's been deleted

how commits can get lost: `git commit --amend`

before:
Diagram of two boxes side by side, labelled "main branch". The one on the left is labelled "parent". The one on the right is labelled "fix color buug" (typo!).

after:
The same diagram as above, but the initial two boxes are now labelled "Now it's "lost"!". Also branching off of "parent" is a third box, labelled "fix color bug". That branch is now labelled "main branch".

how commits can get lost: `git rebase`

before:
Two boxes side-by-side, connected by a line. These are labelled "main branch". Also branching off of the leftmost box are two further boxes, one labelled with a heart, and one with a star. These are labelled "feature branch".

after:
An initial box with two lines of boxes coming off of it. The topmost line of boxes is a blank box, followed by a heart, then a star. The blank box is labelled "main branch". The heart and star boxes are labelled "feature branch". The lower line of boxes has a heart and a star, highlighted in red and labelled "now these two are "lost"!".

how commits can get lost: `git stash drop`

before:
Three boxes in a horizontal row. The left two boxes are blank. The middle box is labelled "main branch". The rightmost box has a star, and is labelled "stashed commit".

after:
The same diagram as above, but now the rightmost box is labelled "now it's "lost"!".

stash is the only way I've seen the "nightmare" situation happen.

you can find lost commits

I find it very comforting to know that git keeps my lost commits around. How to find them:

annoying: use the reflog (page 26)
nightmare: use git fsck
disaster: impossible (but this has never happened to me)

inside git

Here's an overview of the main parts of the .git folder! Don't worry if you don't understand all this yet. We'll get to it.

HEAD

HEAD is a tiny file that just contains the name of your current branch

.git/HEAD
ref: refs/heads/main

HEAD can also be a commit ID, that’s called “detached HEAD state”

branches

a branch is stored as a tiny file that just contains 1 commit ID. It’s stored in a folder called refs/heads.

.git/refs/heads/main

75bbae4 (actually 40 characters)

tags are in refs/tags, the stash is in refs/stash

commit

a commit is a small file containing its parent(s), message, tree, and author

.git/objects/75/bbae4

tree c4e6559
parent 037ab87
author Julia 1697682215
committer Julia 1697682215

commit message goes here

the files in .git/objects/ are compressed (with zlib), so the best way to see objects is with git cat-file -p HASH
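Here's a minimal Python sketch of what `git cat-file` does for a *loose* object: the file at `.git/objects/<first 2 hex chars>/<remaining 38>` is zlib-compressed, and decompressing it gives a `<type> <size>\0` header followed by the contents. `read_object` is a hypothetical helper, and the demo writes its own fake object into a temporary directory instead of assuming a real repository. (Objects already packed into `.git/objects/pack` need different code.)

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def read_object(git_dir: Path, oid: str) -> tuple[str, bytes]:
    """Decompress a loose object and split off its type/size header."""
    raw = zlib.decompress((git_dir / "objects" / oid[:2] / oid[2:]).read_bytes())
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("corrupt object: size doesn't match")
    return obj_type, content


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    content = b'print("hello world!!!!")\n'
    stored = f"blob {len(content)}\0".encode() + content
    oid = hashlib.sha1(stored).hexdigest()
    path = git_dir / "objects" / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True)
    path.write_bytes(zlib.compress(stored))
    print(read_object(git_dir, oid))  # ('blob', b'print("hello world!!!!")\n')
```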

regular commits have 1 parent, merge commits have 2+ parents

trees

trees are small files that list the permissions, type, ID, and name of every file in a directory. The files in it are called “blobs”

.git/objects/c4/e6559

100644 blob e351d93 404.html
100644 blob cab4165 hello.py
040000 tree 9de29f7 lib

if you recognize 644 and 755 as unix permissions, beware: git's permissions are super restricted. Only 644 and 755 are allowed

blobs

blobs are the files that contain your actual code

.git/objects/ca/b4165
print("hello world!!!!")

storing a new blob with every change can get big, so git gc periodically packs them for efficiency in .git/objects/pack

reflog

the reflog stores the history of every branch, tag, and HEAD

.git/logs/refs/heads/main

2028ee0 c1f9a4c (before/after commit IDs)
Julia Evans (user)
1683751582 (timestamp)
commit: no ligatures in code (log message)

each line of the reflog has:

before/after commit IDs
user
timestamp
log message
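On disk, each reflog line is plain text: `<before> <after> <name> <<email>> <timestamp> <timezone>`, then a tab, then the log message. A Python sketch that splits one line into those fields. `parse_reflog_line` is a hypothetical helper, and the email and timezone in the sample line are made up (real commit IDs would also be 40 characters).

```python
import re


def parse_reflog_line(line: str) -> dict[str, str]:
    """Split one line of .git/logs/refs/heads/<branch> into its fields."""
    info, _, message = line.rstrip("\n").partition("\t")
    match = re.fullmatch(r"(\w+) (\w+) (.*) <(.*)> (\d+) ([+-]\d{4})", info)
    if match is None:
        raise ValueError(f"unexpected reflog line: {line!r}")
    before, after, user, email, timestamp, tz = match.groups()
    return {"before": before, "after": after, "user": user, "email": email,
            "timestamp": timestamp, "tz": tz, "message": message}


line = ("2028ee0 c1f9a4c Julia Evans <julia@example.com> 1683751582 -0400"
        "\tcommit: no ligatures in code\n")
entry = parse_reflog_line(line)
print(entry["after"], entry["message"])  # c1f9a4c commit: no ligatures in code
```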

remote-tracking branches

remote-tracking branches store the most recently seen commit ID for a remote branch

.git/refs/remotes/origin/main a9bbcae

when git status says “you’re up to date with origin/main”, it’s just looking at this. More on page 23.

`.git/config`

.git/config is a config file for the repository. it’s where git stores the configuration for your remotes (and other local config settings.)

.git/config

[remote "origin"]
    url = git@github.com:jvns/int-exposed
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
    remote = origin
    merge = refs/heads/main

git has local and global settings: the local settings are here and the global ones are in ~/.gitconfig

hooks

hooks are optional scripts that you can set up to run (eg before a commit) to do anything you want

.git/hooks/pre-commit

#!/bin/bash
any-commands-you-want

the staging area

the staging area stores files when you’re preparing to commit

.git/index (binary file)

the index is one of the only things in git that doesn't have a plain text format. You can see its contents with: git ls-files --stage (though in practice I just use git status)

meet the merge

merging is a huge thing in git

But the terminology around merging is a bit confusing:

git merge
isn't the only way to combine branches: you can also use git rebase!
merge conflicts (surrounded by sad faces)
can happen if you do any of these:
- git merge
- git rebase
- git cherry-pick
- git revert
- git stash pop
merge commits
are only created by git merge

Illustration of two stick figures talking, one is bald and looks unhappy, the other has curly hair and is smiling.

person 1: ... and what the heck is "fast forward"?

person 2: let's talk about it!

there are 3 situations when combining branches

easy: no divergence ("fast-forward")
Diagram of a box with a heart in it, labelled "main". Branching off it in a horizontal line, are three boxes with a star, a hash symbol, and a squiggle. The squiggle box is labelled "panda".
git merge moves the main branch forward to where the panda branch is, like this:
Same diagram as above, except now the squiggle box is labelled "main" as well as "panda".
harder: diverged branches, no conflicts
Diagram of two boxes in a horizontal line, one with a heart, and one with a star. Branching off of the star box are two boxes, one with a hash symbol and one with a spiral. These two boxes are labelled "editing different code".
you have to decide whether to merge or rebase, but it'll succeed
hardest: diverged branches with merge conflicts
The same diagram as above, except now the two final boxes are labelled "editing the same code", and there is a sad stick figure standing beside it.
you have to decide whether to merge or rebase, AND fix a merge conflict

git merge checks for these 3 situations in order

1. is this the "easy" situation?
if yes, fast forward!
if no, go to step 2
2. run the merge. Is there a merge conflict?
if no, done!
if yes, go to step 3
3. tell you to manually resolve the conflict
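The "easy" check boils down to one question: is the current branch's commit an ancestor of the commit being merged? (git exposes this check directly as `git merge-base --is-ancestor`.) A Python sketch of that first step, using a hypothetical `parents` dict and made-up commit names:

```python
def is_ancestor(parents: dict[str, list[str]], maybe_ancestor: str, commit: str) -> bool:
    """True if `maybe_ancestor` is in the history of `commit`."""
    seen, todo = set(), [commit]
    while todo:
        current = todo.pop()
        if current == maybe_ancestor:
            return True
        if current not in seen:
            seen.add(current)
            todo.extend(parents[current])
    return False


def merge_kind(parents: dict[str, list[str]], main: str, other: str) -> str:
    if is_ancestor(parents, other, main):
        return "already up to date"  # nothing new to merge
    if is_ancestor(parents, main, other):
        return "fast-forward"  # just move main to `other`
    return "real merge"  # diverged: combine the changes (may conflict)


# heart <- star <- hash <- squiggle (panda); main is still at heart
parents = {"heart": [], "star": ["heart"], "hash": ["star"], "squiggle": ["hash"]}
print(merge_kind(parents, "heart", "squiggle"))  # fast-forward
```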

git pull needs to combine branches too

git pull
will ONLY fast forward (easy mode) by default. If it can't, it'll ask you to specify if you want to rebase or merge.
git pull --rebase
runs git rebase
git pull --no-rebase
runs git merge

combining diverged branches

there are 3 options for combining branches

merge
rebase
squash

for example, let’s say we’re combining these 2 branches:

Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of one box with a hash symbol, and branch 2, which consists of a box with a spiral, followed by a box with a squiggle.


git rebase
Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a box with a spiral, then a box with a squiggle. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branch 2 is made up of dotted lines and labelled “lost”.
git merge
Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branches 1 and 2 both lead into a new box, with a diamond.
git merge --squash
Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a new box containing both a squiggle and a spiral. Branch 2 consists of a box with a spiral, followed by a box with a squiggle.

all 3 methods result in the EXACT SAME FILES

some differences are:

the diff git shows you for the final commit
the commit ids
the specific flavour of suffering the method causes
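One way to see why the files match but the commit IDs don't: a rebase replays each of your commits' changes on top of the other branch, creating brand-new commits. A toy Python sketch under big simplifying assumptions: commits are snapshots, each "change" replaces whole files (no conflicts), and `toy_commit_id` is a hypothetical stand-in that hashes the snapshot plus the parent's ID, not git's real commit format.

```python
import hashlib
import json


def toy_commit_id(files: dict[str, str], parent: str) -> str:
    """Hypothetical commit ID: a hash of the snapshot plus the parent's ID."""
    data = json.dumps({"files": files, "parent": parent}, sort_keys=True)
    return hashlib.sha1(data.encode()).hexdigest()[:7]


def rebase_onto(changes: list[dict[str, str]], files: dict[str, str],
                parent: str) -> tuple[dict[str, str], list[str]]:
    """Replay each change on top of a base commit, one new commit per change."""
    files, new_ids = dict(files), []
    for change in changes:
        files.update(change)  # apply this commit's change to the new base
        parent = toy_commit_id(files, parent)  # a brand-new commit
        new_ids.append(parent)
    return files, new_ids


base = {"heart.txt": "<3\n"}
my_changes = [{"spiral.txt": "@\n"}, {"squiggle.txt": "~\n"}]  # my branch
main = {**base, "hash.txt": "#\n"}  # main moved on with its own commit
main_id = toy_commit_id(main, "base")

_, original_ids = rebase_onto(my_changes, base, "base")  # my commits before
rebased_files, rebased_ids = rebase_onto(my_changes, main, main_id)  # after
merged_files = {**main, **my_changes[0], **my_changes[1]}  # what a merge gives

print(rebased_files == merged_files)  # True: exact same files
print(set(original_ids) & set(rebased_ids))  # set(): every commit ID changed
```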

rebase

pro: you can keep your git history simple:

Diagram: a git history that is just a series of boxes in a straight line.

pain:

harder to learn [sad face]
harder to undo [sad face]
easier to mess up [sad face]

(I love rebase though!)

merge

pro: if you mess something up, the original commits are still in your branch’s history

pain: when I look at histories like this I feel dread [sad face]

Diagram: a complicated git history with a number of different branches.

squash

pro: have 20 messy commits? nobody needs to know!

And it’s pretty simple to use.

pain: “ugh, someone squashed their 3000-line branch into 1 commit” [sad face]

merge conflicts (three sad faces)

merge conflicts happen because both branches edited the same lines of code

An illustration showing a merge algorithm, represented by a box with a mischievous expression. It has a thought bubble with three sequences of symbols. One reads dot, triangle, circle with top half filled in. This one has arrows coming out from it pointing to the two other sequences of symbols: one is plus sign, triangle, circle with top half filled in, the second is dot, triangle, circle with right half filled in. The merge algorithm is generating a sequence of: plus sign AND dot, with question marks around it, triangle, circle with right half filled in.

some ways to resolve merge conflicts

edit the weird text file by hand
often the easiest way!
use a dedicated merge conflict tool
you can configure git so that git mergetool opens conflicts in your favourite tool. I like meld on Linux.
abort the merge and rewrite the code you were merging from scratch
might be easier if there was a big refactor! You can do this with git merge --abort
if the conflict is in an autogenerated file, delete and regenerate it
great for package-lock.json in node!
go have a conversation with the other person about what to do

the weird text file

Git merge conflicts are confusing because they're not displayed in a consistent way:

(sad face) the code from the branch you started on is:

at the top if you merged
at the bottom if you rebased

(sad face) git often won't give you the branch name that the code comes from

Tiny illustration of a sad stick figure saying "why?"

<<<<<<< HEAD
def parse(input):
    return input.split("\n")
||||||| b9447fc
def parse(input):
    return input.split("\n\n")
=======
def parse(text):
    return text.split("\n\n")
>>>>>>> a29b3cf

the part between "<<<<<<< HEAD" and "|||||||" is the top

the part after the "=======" is the bottom

the part between "|||||||" and "=======" is the original (configure merge.conflictstyle diff3 to get this)
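Here's a toy Python sketch of where those markers come from, under a big simplifying assumption: all three versions have the same number of lines, so line *i* of each can be compared directly. For each line, if only one side changed it, that side wins; if both sides changed it differently, you get a conflict in the diff3 style. `merge3` and the marker labels are hypothetical. Real git aligns the versions with a diff first and compares changed *blocks* of lines rather than single lines, which is why the real conflict in the example covers both lines while this sketch only conflicts on one.

```python
def merge3(base: list[str], ours: list[str], theirs: list[str]) -> list[str]:
    """Line-by-line 3-way merge of equal-length versions, diff3-style markers."""
    result = []
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            result.append(o)  # both agree, or only we changed it
        elif o == b:
            result.append(t)  # only they changed it
        else:  # both changed the same line differently: conflict!
            result += ["<<<<<<< ours", o, "||||||| base", b,
                       "=======", t, ">>>>>>> theirs"]
    return result


base = ["def parse(input):", '    return input.split("\\n\\n")']
ours = ["def parse(input):", '    return input.split("\\n")']
theirs = ["def parse(text):", '    return text.split("\\n\\n")']
print("\n".join(merge3(base, ours, theirs)))
```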

finishing up

To finish, you need to run one of:

git commit
(for git merge)
git rebase --continue
(for git rebase)
git cherry-pick --continue
(for git cherry-pick)
git revert --continue
(for git revert)

Before that, I might:

look at my changes with git diff main
check for unresolved conflicts with git diff --check

merge commits

merging 2 diverged branches creates a commit

git checkout main
git merge mybranch

Diagram of two boxes in a row, one with a heart, and one with a star. From the star, it branches out into a branch with a hash symbol, labelled main. The other branch coming off of the star has a box with a spiral followed by a box with a spiky symbol. The two branches converge in a box with a diamond symbol, labelled “merge commit!”.

merge commits have a few surprising gotchas!

gotcha: merging isn’t symmetric

these merges result in the same code, but the first parent of the merge commit is different: it's the current commit you had checked out when you merged.

merge mybranch into main
git checkout main
git merge mybranch

merge main into mybranch
git checkout mybranch
git merge main

A merge commit with the "wrong" first parent makes HEAD^ or HEAD^^^^ behave in an unexpected way: ^ refers to the first parent.

gotcha: you can keep coding during a merge

If you forget you’re doing a merge, it’s easy to accidentally keep writing code and add a bunch of unrelated changes into the merge commit.

I use my prompt (page 10) to remind me.

gotcha: `git show` doesn’t tell you what the merge commit did

It’ll often just show the merge commit as “empty” even if the merge did something important (like discard changes from one side).

Illustration of a tiny sad stick person with curly hair

person: why

tip: see what a merge did with `git show --remerge-diff`

git show --remerge-diff COMMIT_ID

will re-run the merge of the parents and show you the difference between that automatic merge result and what’s actually in the merge commit

meet the remote

any repository you're pushing to / pulling from is called a "remote"

remotes can be:

hosted by GitHub/GitLab/etc.
on your own server
just a folder on your computer

remotes are where the drama happens

Smiling stick figure with short curly hair: I spent 3 hours working on cats.py

person: git pull

git, represented by a box with a smiley face: fun fact! your coworker totally rewrote that file!

remotes are configured in `.git/config`

every remote has a name and URL


[remote "origin"]
    url = git@github.com:jvns/myrepo
[branch "main"]
    remote = origin
    merge = refs/heads/main

"origin" is the name, "git@github.com:jvns/myrepo" is the URL.

this sets up "tracking" between your local main and main on origin, so that git knows where to push to and pull from when you run git push or git pull

`git push` syntax

(same for git pull)

git push origin main

"origin" is the remote name, "main" is the remote branch.

the default name for a remote is origin but you can name it anything

tip! I like to configure push.autoSetupRemote true to automatically set up tracking the first time I push a new branch

example: I use 2 remotes when contributing to open source projects

Diagram of a box labelled "local repo". Local repo has an arrow labelled "push to here", pointing to a box labelled "My personal GitHub fork". That box has an arrow labelled "pull request", pointing to a box labelled "main project repo name: "origin"". That box has an arrow labelled "pull from here", pointing back to the "local repo" box.

protocols

Git has 3 main protocols for remotes. The protocol is embedded in the URL.

HTTP (I use this if I only want to pull)
https://github.com/jvns/myrepo
SSH (I use this if I need to push)
git@github.com:jvns/myrepo
local
file:///home/bork/myrepo

diverged remote branches

when pushing/pulling, the hardest problems are caused by diverged branches


! [rejected] main -> main (non-fast-forward)

fatal: Not possible to fast-forward, aborting.

fatal: Need to specify how to reconcile divergent branches.

(each of these three messages is in a spiky bubble, and they are all surrounded by numerous sad faces.)

what are diverged branches?

both sides have commits that the other doesn't, like this:

An illustration of two boxes in a row, connected by a line. The first one has a star, the second has a heart. Branching out from the heart are a box with a hash symbol, labelled "local main", and a box with a squiggle, labelled "remote main".

I like to fix my diverged branches before making more commits.

there are 4 possibilities with a remote branch

up to date
Illustration of three boxes in a row, connected by lines. The final box is labelled both "local" and "remote".
need to pull
Illustration of four boxes in a row, connected by lines. The second box is labelled "local" and the fourth one is labelled "remote".
need to push
Illustration of four boxes in a row, connected by lines. The second box is labelled "remote" and the fourth one is labelled "local".
DIVERGED (need to decide how to solve it)
Illustration of two boxes in a row, connected by lines. Diverging from the second box are two branches. One has one box in it and is labelled "remote". The other one has two boxes and is labelled "local".
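Which of the 4 cases you're in comes down to two counts: commits that are only on your local branch, and commits that are only on the remote-tracking branch (`git rev-list --left-right --count main...origin/main` prints these two numbers). A Python sketch with a hypothetical `parents` dict and made-up commit names:

```python
def history(parents: dict[str, list[str]], tip: str) -> set[str]:
    """Every commit reachable from `tip`."""
    seen, todo = set(), [tip]
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(parents[commit])
    return seen


def compare(parents: dict[str, list[str]], local: str, remote: str) -> str:
    local_only = history(parents, local) - history(parents, remote)
    remote_only = history(parents, remote) - history(parents, local)
    if local_only and remote_only:
        return f"DIVERGED: {len(local_only)} and {len(remote_only)} different commits"
    if remote_only:
        return f"need to pull: behind by {len(remote_only)}"
    if local_only:
        return f"need to push: ahead by {len(local_only)}"
    return "up to date"


parents = {"c1": [], "c2": ["c1"], "local": ["c2"], "remote": ["c2"]}
print(compare(parents, "local", "remote"))  # DIVERGED: 1 and 1 different commits
```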

Illustration of a smiling stick figure with short curly hair.

person: when I have a diverged branch, I usually just run git pull --rebase and move on. On the next page we'll talk about some other options though!

how to tell if your branches have diverged: `git status`


$ git fetch
$ git status
Your branch and 'origin/main' have diverged,
and have 1 and 1 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)

(use git fetch to get the latest remote state first)

`git fetch` and `git pull`

git fetch just fetches the latest commits from the remote branch.

git pull origin main has 2 parts:

run git fetch origin main
run git merge origin/main (or sometimes rebase)

(More about how to tell git pull to merge/rebase on page 16!)

fixing diverged remotes

ways to reconcile two diverged branches

Illustration of a sequence of boxes joined with lines. The first box is a star, the second box is a heart, and then it branches out into two boxes, one with a hash symbol and one with a squiggle. Hash symbol box is labelled “local main” and squiggle box is labelled “remote main”

combine the changes from both with (1) rebase or (2) merge!
throw out your local changes (3) after breaking your local branch!
throw out the remote changes (4) to get rid of something you accidentally pushed (be REAL careful with this one)

reasons to throw away changes

I’ll throw away local changes if I accidentally committed to main instead of a new branch
I’ll throw away remote changes if I want to amend a commit after pushing it, and I’m the only one working on that branch

1. rebase

git pull --rebase

git push

Illustration of four boxes (star, heart, squiggle, hash) in a straight line, labelled “local main” and “remote main”

Many people like to run git config pull.rebase true to make this the default when they run git pull

2. merge

git pull --no-rebase
git push

Illustration of two boxes (star and heart) that then diverge into two branches (hash and squiggle) then reconvene into a fifth box, with a diamond in it, labelled “local main” and “remote main”

3. throw away local changes

git switch -c newbranch
git switch main
git reset --hard origin/main

(the first line is labelled “optional: save your changes on main to newbranch so they’re not orphaned”)

Illustration of two boxes (star and heart) that then diverge into two branches (hash and squiggle), which are labelled “new branch” and “local main, remote main” respectively.
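A sketch of option 3 in a throwaway sandbox (it assumes bash and git 2.23 or newer for `git switch`; `seed`, `remote.git`, `alice` and `bob` are made-up names). alice's "hash" commit survives on `newbranch`, and her `main` ends up matching `origin/main`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# throwaway sandbox; seed, remote.git, alice and bob are made-up names
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# a bare "remote" whose main branch has two commits: star, heart
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
for f in star heart; do
  echo "$f" > "seed/$f"
  git -C seed add "$f"
  git -C seed commit -qm "$f"
done
git clone -q --bare seed remote.git
git clone -q remote.git alice
git clone -q remote.git bob

# make the branches diverge: bob pushes "squiggle", alice commits "hash"
echo squiggle > bob/squiggle
git -C bob add squiggle
git -C bob commit -qm squiggle
git -C bob push -q origin main
echo hash > alice/hash
git -C alice add hash
git -C alice commit -qm hash
git -C alice fetch -q              # origin/main needs to be current first

# optional: keep alice's local commit on a new branch so it isn't orphaned
git -C alice switch -q -c newbranch
git -C alice switch -q main
# point main at exactly what the remote has
git -C alice reset -q --hard origin/main
```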

4. throw away remote changes (DANGER!)

git push --force

Illustration of two boxes (star and heart) that then diverge into two branches one with a hash symbol, labelled “local main, remote main”, and one with a squiggle, whose box is a dotted line, and that’s labelled “orphan”.

I ONLY do this if there's nobody else working on the branch.
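A sketch of option 4 in a throwaway sandbox (it assumes bash and a reasonably recent git; `seed`, `remote.git`, `alice` and `bob` are made-up names). After the force push, the remote's `main` no longer contains bob's "squiggle" commit, but the commit object is still sitting in `remote.git`, orphaned.

```bash
#!/usr/bin/env bash
set -euo pipefail

# throwaway sandbox; seed, remote.git, alice and bob are made-up names
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# a bare "remote" whose main branch has two commits: star, heart
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
for f in star heart; do
  echo "$f" > "seed/$f"
  git -C seed add "$f"
  git -C seed commit -qm "$f"
done
git clone -q --bare seed remote.git
git clone -q remote.git alice
git clone -q remote.git bob

# make the branches diverge: bob pushes "squiggle", alice commits "hash"
echo squiggle > bob/squiggle
git -C bob add squiggle
git -C bob commit -qm squiggle
git -C bob push -q origin main
squiggle=$(git -C bob rev-parse HEAD)
echo hash > alice/hash
git -C alice add hash
git -C alice commit -qm hash

# overwrite the remote's main with alice's main (bob's commit gets orphaned)
git -C alice push -q --force origin main

git -C remote.git log --format=%s main
```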

remote branch caching

the “up to date” in git status is misleading

$ git status
Your branch is up to date with 'origin/main'.

this does NOT mean that you’re up to date with the remote main branch. But why not???
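You can watch this happen in a throwaway sandbox (a sketch; it assumes bash and a reasonably recent git, and `seed`, `remote.git`, `alice` and `bob` are made-up names). bob pushes a commit, but alice's `git status` keeps saying she's up to date until she runs `git fetch`.

```bash
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C  # keep git's messages in English

# throwaway sandbox; seed, remote.git, alice and bob are made-up names
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# a bare "remote" whose main branch has two commits: star, heart
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
for f in star heart; do
  echo "$f" > "seed/$f"
  git -C seed add "$f"
  git -C seed commit -qm "$f"
done
git clone -q --bare seed remote.git
git clone -q remote.git alice
git clone -q remote.git bob

# bob pushes a new commit to the remote
echo squiggle > bob/squiggle
git -C bob add squiggle
git -C bob commit -qm squiggle
git -C bob push -q origin main

# alice hasn't fetched: her origin/main is stale, so status says "up to date"
before=$(git -C alice status)
echo "$before"

# after a fetch, status finally says alice is behind
git -C alice fetch -q
after=$(git -C alice status)
echo "$after"
```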

some old version control systems only worked if you were online

Illustration of a sad stick figure with short curly hair.

person (thinking): my internet went out, guess I can’t work

git works offline

Illustration of a git developer, represented by a smiling stick figure with short straight hair.

git developer (thinking): I want to be able to code on a train with no internet

git developer (thinking): NOTHING in git will use the internet except git pull, git push, and git fetch

this makes git status weird

git developer (thinking): we need to tell people if their branch is up to date… with NO INTERNET??? how?

solution: CACHING

Every remote branch has a local cache named like origin/mybranch (origin is the remote name, mybranch is the branch name)

Git doesn’t call it a cache though, it calls it a “remote tracking branch”

local branch: mybranch

cache: origin/mybranch (only updated on git pull, git push, git fetch)

remote branch: origin mybranch (git push origin mybranch updates this)

(git has no easy way to see when origin/mybranch was last updated)
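A sketch of the three pieces in a throwaway sandbox (it assumes bash and a reasonably recent git; `seed`, `remote.git` and `alice` are made-up names). It shows that `git push` updates the `origin/main` cache too, with no fetch. Since `remote.git` is just a local directory standing in for the server, the script can peek at the "real" remote branch directly.

```bash
#!/usr/bin/env bash
set -euo pipefail

# throwaway sandbox; seed, remote.git and alice are made-up names
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# a bare "remote" whose main branch has two commits: star, heart
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
for f in star heart; do
  echo "$f" > "seed/$f"
  git -C seed add "$f"
  git -C seed commit -qm "$f"
done
git clone -q --bare seed remote.git
git clone -q remote.git alice

# local branch: main. its cache of the remote branch: origin/main
git -C alice rev-parse --abbrev-ref 'main@{upstream}'   # prints origin/main

# alice commits; her cache doesn't change
echo hash > alice/hash
git -C alice add hash
git -C alice commit -qm hash
before=$(git -C alice rev-parse origin/main)

# push updates the remote branch AND alice's cache
git -C alice push -q origin main
after=$(git -C alice rev-parse origin/main)

# the "real" remote branch, read straight out of the stand-in server
remote_main=$(git -C remote.git rev-parse main)
```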

losing your work

people are always saying:

Illustration of two stick figures talking. One is bald and smiling, the second has long curly hair and is frowning.

person 1: don’t worry! it’s impossible to lose your work in git!

person 2 (thinking): my lost work says otherwise

but some parts of git are MUCH safer than others

commits on a branch / tag

(lock icon) never change

Illustration of a smiling stick figure with curly hair. Their speech bubble is surrounded by hearts and stars.

person: you can ALWAYS use the commit ID to get your work back!

unreachable commits (page 13)

(lock icon) never change, except…

[sad face] they're hard to find

[sad face] they’ll eventually get deleted by git’s garbage collection (page 12)

(usually not for a few months though)

branches and HEAD

(unlocked lock icon) change ALL THE TIME

(clock going backwards icon) BUT there’s a history of all the changes in the reflog

Tiny cute illustration of a smiling stick figure with curly hair.

person: the reflog is NOT easy to use but at least it’s there

staging area

(unlocked lock icon) changes ALL THE TIME

(crossed out clock going backwards icon) no history

(sad face) just gotta be careful

the stash

(crossed out clock going backwards icon) git stash pop deletes entries forever

... but you can technically get them back by scrolling up in your terminal to find the commit ID (if you're lucky) or by using git fsck (if not)

(I only really use git stash to throw away work)

git reset

git has no undo

there's no

unadd
uncommit
unmerge
unrebase

instead, git has a single dangerous command for undoing:

git reset

most git commands move the current branch forwards

git commit
Illustration of three boxes in a row, connected by lines. There is an arrow pointing from the second box to the third box.
git merge
Illustration of two boxes in a row, connected by lines. From the second box, two lines diverge to two other boxes, and from those two, lines converge back into a final box. There is an arrow pointing from one of the diverged boxes into the final merged box.
git pull
Illustration of five boxes in a row, connected by lines. There is an arrow pointing from the second box to the fifth box.

(though rebase is a sideways move)

git reset can move the current branch anywhere

backwards!
forwards!
"sideways"!

Illustration of five boxes, connected with lines into two branches, with arrows pointing in all directions amongst them.

this makes it possible to undo, but you can also really mess up your branch
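To make "forwards", "backwards" and "sideways" concrete: `git merge-base --is-ancestor A B` succeeds when A is an ancestor of B. Here's a sketch (it assumes bash and git 2.23 or newer for `git switch`; the branch `other` and the `is_ancestor` helper are made up for the demo). Moving forwards means the old tip is an ancestor of the new tip, moving backwards means the reverse, and a "sideways" move is neither.

```bash
#!/usr/bin/env bash
set -euo pipefail

# throwaway repo; "repo", "other" and is_ancestor are made-up names
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

echo 1 > f
git add f
git commit -qm one
one=$(git rev-parse HEAD)
echo 2 > f
git commit -qam two                 # git commit moves main forwards
two=$(git rev-parse HEAD)

git switch -q -c other "$one"       # a side branch starting at "one"
echo 3 > g
git add g
git commit -qm three
three=$(git rev-parse HEAD)
git switch -q main

# prints yes if $1 is an ancestor of $2, otherwise no
is_ancestor() { if git merge-base --is-ancestor "$1" "$2"; then echo yes; else echo no; fi; }

forward=$(is_ancestor "$one" "$two")        # yes

git reset -q --hard "$three"                # sideways: main goes from two to three
sideways_a=$(is_ancestor "$two" "$three")   # no
sideways_b=$(is_ancestor "$three" "$two")   # no

git reset -q --hard "$one"                  # backwards: main goes from three to one
backward=$(is_ancestor "$one" "$three")     # yes
```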

how git reset works

git reset HEAD^

finds the commit ID corresponding to HEAD^ (for example a2b3c4)
forces your current branch to point to a2b3c4
unstages all changes
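A sketch of `git reset HEAD^` in a throwaway repo (it assumes bash; the file name `notes.txt` is made up). `main` moves back one commit, and the change from the undone commit stays in the file as an unstaged edit.

```bash
#!/usr/bin/env bash
set -euo pipefail

# throwaway repo; "repo" and notes.txt are made-up names
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

echo one > notes.txt
git add notes.txt
git commit -qm first
first=$(git rev-parse HEAD)
echo two >> notes.txt
git commit -qam second

# move main back to the parent commit
git reset -q HEAD^

# main points at "first" again, and the "two" line is now an unstaged change
git log --oneline
git status --short
```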

`--hard`: the danger option

git reset $COMMIT_ID
Keeps all the files in your working directory exactly the same.

git reset --hard $COMMIT_ID
Throws away all your uncommitted changes. Useful but dangerous.
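A sketch contrasting the two in a throwaway repo (it assumes bash; the file name `notes.txt` is made up). It edits a tracked file, because `--hard` leaves untracked files alone.

```bash
#!/usr/bin/env bash
set -euo pipefail

# throwaway repo; "repo" and notes.txt are made-up names
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

echo one > notes.txt
git add notes.txt
git commit -qm first
commit_id=$(git rev-parse HEAD)

echo "uncommitted work" >> notes.txt

git reset -q "$commit_id"           # without --hard: the edit survives
kept=$(cat notes.txt)

git reset -q --hard "$commit_id"    # with --hard: the edit is gone for good
after_hard=$(cat notes.txt)
```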

problems reset can cause

(sad face) it's easy to "lose" commits, especially if you move a branch backwards
(sad face) if you use --hard, you can permanently lose your uncommitted changes

the reflog

a reflog is a log of commit IDs

I use the reflog to find "lost" commits: it contains every commit ID that the branch/HEAD has ever pointed to.

some differences between `git log main` and `git reflog main`

the reflog only contains activity from the last 90 days (by default)
the reflog can show you where your branch was before a rebase. git log can't
the reflog isn't shared between repositories. git log is.
if I'm looking at the reflog, I'm having a bad day

which reflog to use?

The two I use most are:

git reflog

every single commit you've ever had checked out
has everything, but is very noisy
it's the reflog for HEAD

git reflog BRANCH

just the history for that branch, might be less noisy
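A sketch of the difference in a throwaway repo (it assumes bash and git 2.23 or newer for `git switch`; the branch name `feature` is made up). Switching branches adds "checkout:" entries to HEAD's reflog, but not to `main`'s.

```bash
#!/usr/bin/env bash
set -euo pipefail

# throwaway repo; "repo" and "feature" are made-up names
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

echo a > a
git add a
git commit -qm first
git switch -q -c feature
git switch -q main

git reflog          # HEAD: the commit plus two "checkout: moving from ..." entries
git reflog main     # main: just the commit

# grep -c prints 0 (and exits 1) when nothing matches, hence "|| true"
head_checkouts=$(git reflog | grep -c 'checkout:' || true)
main_checkouts=$(git reflog main | grep -c 'checkout:' || true)
```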

how to use the reflog

1. run git reflog
2. sadly stare at the output until you find a log message that looks right
3. look at the commit:
   git show $COMMIT_ID
   git log $COMMIT_ID
4. repeat until you find the thing
5. use something like
   git reset --hard $COMMIT_ID
   or
   git branch $NAME $COMMIT_ID
   to put the commit on a branch
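A sketch of a typical rescue in a throwaway repo (it assumes bash; the "important work" commit and the branch name `rescued` are made up). `main@{1}` means "where main was one reflog entry ago".

```bash
#!/usr/bin/env bash
set -euo pipefail

# throwaway repo; "repo" and "rescued" are made-up names
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

echo a > a
git add a
git commit -qm first
echo b > b
git add b
git commit -qm "important work"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD^        # oops: "important work" is no longer on main

git reflog main                  # main@{1} is where main was before the reset
git show --stat "main@{1}"       # look at the commit to check it's the right one
git branch rescued "main@{1}"    # put it on a branch so it's easy to find again
```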

the reflog kind of sucks

(sad face) if you delete a branch, git deletes its reflog
(sad face) if you drop a stash entry, you can't use the reflog to get it back
(sad face) reflog entries don't correspond exactly to git commands you ran

But it's the best we have.

`git fsck`: the last resort

If a commit isn't in the reflog (for example if you "lost" it with git stash drop), there's still hope!

You can use git fsck --unreachable to list the ID of every commit that's unreachable (along with other unreachable objects).

I've never done this though: I try to avoid getting into this situation.
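If you do end up there, here's a sketch of recovering a dropped stash in a throwaway repo (it assumes bash; the file name `notes.txt` is made up). The `git fsck --unreachable` / `git log --merges --no-walk --grep=WIP` combination is adapted from the recipe in the `git stash` documentation: a stash entry is stored as a merge commit whose message starts with "WIP on".

```bash
#!/usr/bin/env bash
set -euo pipefail

# throwaway repo; "repo" and notes.txt are made-up names
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

echo v1 > notes.txt
git add notes.txt
git commit -qm first

echo "work in progress" >> notes.txt
git stash push -q
stash_id=$(git rev-parse "stash@{0}")
git stash drop -q                 # now nothing points at the stash commit

# list unreachable commits, then keep the one that looks like a stash entry
unreachable=$(git fsck --unreachable 2>/dev/null | awk '$2 == "commit" { print $3 }')
found=$(git log --merges --no-walk --grep=WIP --format=%H $unreachable)

# bring the work back (git branch some-name "$found" would also save it)
git stash apply -q "$found"
```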

thanks for reading

As always, my favourite way to learn more about git is to experiment ("experiment" is in a spiky bubble). Make a new repository for testing! Make branches in it! Try a rebase! See what happens!

There are also a million tools that can make git easier, for example:

a shell prompt. I use the one built into fish
an editor integration. I use vim-gitgutter
a merge conflict tool. I use meld
tools to display diffs, like delta
a GUI, like lazygit or GitUp on Mac OS

Illustration of a smiling stick figure with curly hair.

person: there are TONS of great tools out there. try some out to see what's right for you!

This zine comes with a printable cheat sheet! It's here:

https://wizardzines.com/git-cheat-sheet.pdf

acknowledgements

Cover illustration: Vladimir Kašiković

Pairing: Marie Claire Leblanc Flanagan

Technical review: James Coglan

Copy editing: Gersande La Flèche

and thanks to all 68 beta readers

How Git Works

By Julia Evans

about this zine

table of contents

commits

branches

inside git

merging

remotes

dealing with disasters

meet the commit

commits never change

commit hashes

you can think of commits as a pile of diffs

you can also think of commits as a pile of snapshots

diffs are calculated from snapshots

things git can do with a commit

inside the commit

you can see for yourself how git is storing your files!

1. read the commit

2. read the directory

3. read a file

4. and we're done!

the diff algorithm

git is CONSTANTLY showing you diffs

have you ever noticed your git diffs don't make sense?

in git, moving a file is the same as deleting the old one and adding the new one

git is just guessing about your intentions

diff is an algorithm

git has many diff algorithms

the staging area

git has a 2-stage commit process

git uses 3 terms interchangeably for the staging area

tip: you can use git add -p to commit only certain parts of a file

gotcha: git diff only shows unstaged changes

gotcha: git commit -a doesn't automatically add new files

meet the branch

theoretically you could use git without branches

every branch has 3 things

branches are core to how git stores your work

the only difference between the main branch and any other branch is how you treat them

all changes to a branch are recorded in its reflog

git will let you do literally anything with a branch

what's a branch?

way 1: just the commits that “branch” off

How this shows up in git:

way 2: every previous commit

How this shows up in git:

way 3: just the commit at the end

How this shows up in git:

knowing where you are

many git disasters are caused by accidentally running a command while on the wrong branch…

… or by forgetting you’re in the middle of a multistep operation

I always keep track of 2 things

I keep my current branch in my shell prompt

decoder ring for the default git shell prompt

detached HEAD state

how git knows what your current branch is: .git/HEAD

by itself, .git/HEAD being a commit ID is okay

the only problem is that new commits you make can get "lost" (page 13)

ways you can end up in detached HEAD state

if you accidentally create commits in detached HEAD state, it's SUPER easy to avoid losing them

git has a little language for referring to commits

references

git often uses the term “reference” in error messages

“reference” often just means “branch”

it’s an umbrella term

reference: the definition

git’s garbage collection starts with references

lost commits

commits in git are usually saved forever

the three levels of losing commits

how commits can get lost: git commit --amend

how commits can get lost: git rebase

how commits can get lost: git stash drop

you can find lost commits

inside git

HEAD

branches

commit
