A git commit that includes a link to itself: a self-referential git commit. Yesterday I decided it was time to implement a fun and pointless idea I have had for years, the self-referential git commit.

The challenge with making a self-referential git commit is that the commit hash is computed from the commit itself. You need to create data that contains sha1(data), which in theory is pretty confusing territory. Luckily for us, it is common practice to accept truncated SHA-1 commit hashes. Short hashes usually use 7 hex characters (28 bits), which is pretty easy to brute force.
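To get a feel for the search cost: each extra hex character multiplies the expected number of attempts by 16, so 7 characters need about 16^7 = 268,435,456 tries on average. Here is a small sketch of that arithmetic, assuming SHA-1 output behaves like uniform random bits; the function names are made up for illustration.

```python
import math

def expected_attempts(prefix_hex_chars: int) -> int:
    """Average number of tries until a hash starts with a given hex prefix."""
    return 16 ** prefix_hex_chars

def success_probability(prefix_hex_chars: int, attempts: int) -> float:
    """Chance of at least one match within `attempts` independent tries."""
    p = 1.0 / expected_attempts(prefix_hex_chars)
    # 1 - (1 - p)^n, computed in a numerically stable way
    return -math.expm1(attempts * math.log1p(-p))

if __name__ == "__main__":
    for n in (5, 7, 40):
        print(f"{n} hex chars: about {expected_attempts(n):,} attempts on average")
```

The 40-character case shows why matching the full hash is out of reach while a short hash is not.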

Git Commit Objects

A git commit object is laid out as follows:

HEADER | CONTENT

Git Commit Object Header

The git commit header is formatted as follows:

"commit " | CONTENT.length().to_string() | "\x00"

e.g. commit 234<NUL> for 234 bytes of content.

Example hex-dump:

636f 6d6d 6974 2032 3334 00  commit 234.
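A minimal Python sketch of the header format and the resulting object ID, assuming a repository that uses SHA-1 object names. The helper names `git_object_header` and `git_object_id` are my own, not part of git or of self.py.

```python
import hashlib

def git_object_header(obj_type: str, content: bytes) -> bytes:
    """Build '<type> <length>' followed by a NUL byte; length is the content size in bytes."""
    return f"{obj_type} {len(content)}".encode("ascii") + b"\x00"

def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over header + content, which is what git uses as the object name."""
    return hashlib.sha1(git_object_header(obj_type, content) + content).hexdigest()

if __name__ == "__main__":
    print(git_object_header("commit", b"x" * 234))  # b'commit 234\x00'
    # The empty blob is a handy sanity check: git names it e69de29b...
    print(git_object_id("blob", b""))
```

The same header scheme applies to blobs and trees, which is why the well-known empty-blob ID works as a quick check.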

Git Commit Content

The git commit object content is a string with \x0a (line feed) terminators, e.g.

tree b47711ad33b5a8d10584ba92ba6a801228bb8e3d
author blaufish <blaufish@users.noreply.github.com> 1758323860 +0200
committer blaufish <blaufish@users.noreply.github.com> 1758323860 +0200

commit-message

Example hex-dump (header followed by content):

636f 6d6d 6974 2032 3334 0074 7265 6520  commit 234.tree
6234 3737 3131 6164 3333 6235 6138 6431  b47711ad33b5a8d1
3035 3834 6261 3932 6261 3661 3830 3132  0584ba92ba6a8012
3238 6262 3865 3364 0a61 7574 686f 7220  28bb8e3d.author
626c 6175 6669 7368 203c 626c 6175 6669  blaufish <blaufi
7368 4075 7365 7273 2e6e 6f72 6570 6c79  sh@users.noreply
2e67 6974 6875 622e 636f 6d3e 2031 3735  .github.com> 175
3833 3233 3836 3020 2b30 3230 300a 636f  8323860 +0200.co
6d6d 6974 7465 7220 626c 6175 6669 7368  mmitter blaufish
203c 626c 6175 6669 7368 4075 7365 7273   <blaufish@users
2e6e 6f72 6570 6c79 2e67 6974 6875 622e  .noreply.github.
636f 6d3e 2031 3735 3833 3233 3836 3020  com> 1758323860
2b30 3230 300a 0a73 656c 662d 7265 6665  +0200..self-refe
7265 6e74 6961 6c2d 6769 742f 636f 6d6d  rential-git/comm
6974 2f62 6533 3133 3337 0a0a 3138 3336  it/be31337..1836
3438 3533 0a                             4853.
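Going the other way, a decompressed object like the one dumped above can be split back into header and content. This is a sketch only; `parse_git_object` and `commit_message` are hypothetical helper names, and the sample object in the demo is a toy, not a valid commit.

```python
def parse_git_object(raw: bytes) -> tuple[str, bytes]:
    """Split a decompressed git object into (type, content) and check its length."""
    header, sep, content = raw.partition(b"\x00")
    if not sep:
        raise ValueError("missing NUL between header and content")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError(f"header says {int(size)} bytes, content has {len(content)}")
    return obj_type.decode("ascii"), content

def commit_message(content: bytes) -> str:
    """The commit message is everything after the first blank line."""
    _headers, _, message = content.partition(b"\n\n")
    return message.decode("utf-8")

if __name__ == "__main__":
    toy_content = b"tree 0000\n\nhello\n\n18364853\n"  # toy data for illustration
    toy_raw = b"commit " + str(len(toy_content)).encode("ascii") + b"\x00" + toy_content
    obj_type, content = parse_git_object(toy_raw)
    print(obj_type, repr(commit_message(content)))
```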

The brute force starts from an existing commit object that git itself generated.

Full code: self-referential-git/self.py

We just add a counter to the end of the commit message, and keep updating it until SHA1(commit) starts with what we want it to:

def hack(git_object, counter_start, counter_increment, goal, destination_clear, destination_compressed):
    # ...
    header, content = git_object.split(b'\x00', 1)
    hack_str = content.decode("utf-8")
    counter = counter_start
    while True:
        # ...
        commit_content = hack_str + "\n" + str(counter) + "\n"
        commit_content_bytes = commit_content.encode("utf-8")
        commit_content_bytes_len = len(commit_content_bytes)
        header = f"commit {commit_content_bytes_len}\x00"
        header_content = header + commit_content
        to_be_hashed = header_content.encode("utf-8")
        s = hashlib.sha1(to_be_hashed).hexdigest()
        if s.startswith(goal):
            # ...
            print(f"SHA1: {s}", file=sys.stderr)
            # ...
            return
        counter = counter + counter_increment
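The excerpt above elides a few parts, so here is a compact, self-contained sketch of the same idea that can be run as-is. It is not the code from self.py: `find_counter` is a hypothetical name, the demo content is a toy (not a complete commit), and the goal is only 3 hex characters so it finishes almost instantly.

```python
import hashlib
from itertools import count

def find_counter(content: bytes, goal: str, start: int = 0, step: int = 1):
    """Append a newline, a counter and a newline to content until the object hash starts with goal."""
    for counter in count(start, step):
        candidate = content + b"\n" + str(counter).encode("ascii") + b"\n"
        # Rebuild the header every time: the length changes with the counter's digits.
        header = b"commit " + str(len(candidate)).encode("ascii") + b"\x00"
        digest = hashlib.sha1(header + candidate).hexdigest()
        if digest.startswith(goal):
            return counter, digest, header + candidate

if __name__ == "__main__":
    demo = b"tree b47711ad33b5a8d10584ba92ba6a801228bb8e3d\n\nself-referential-git/commit/be3\n"
    counter, digest, obj = find_counter(demo, "be3")
    print(counter, digest)
```

The `start` and `step` parameters mirror `counter_start` and `counter_increment`, so several workers can search disjoint counter values.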

Upon a successful match, I write out the commit object, in plaintext:

            print(f"Write to: {destination_clear} (plain)", file=sys.stderr)
            with open(destination_clear, "wb") as f:
                f.write(to_be_hashed)

And zlib compressed:

            print(f"Write to: {destination_compressed} (compressed)", file=sys.stderr)
            with open(destination_compressed, "wb") as f:
                bb = zlib.compress(to_be_hashed)
                f.write(bb)
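The compressed file is exactly what git expects to find as a loose object, stored under `.git/objects/` with the first two hex characters of the hash as the directory and the remaining 38 as the file name. Here is a rough sketch of that round trip, writing into a temporary directory rather than a real repository; the function names are assumptions of mine.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def write_loose_object(git_dir: Path, raw_object: bytes) -> Path:
    """Store header+content zlib-compressed at objects/<2 hex chars>/<38 hex chars>."""
    sha = hashlib.sha1(raw_object).hexdigest()
    path = git_dir / "objects" / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw_object))
    return path

def read_loose_object(path: Path) -> bytes:
    """Return the decompressed header+content of a loose object file."""
    return zlib.decompress(path.read_bytes())

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_loose_object(Path(tmp), b"blob 0\x00")  # the empty blob
        print(path.relative_to(tmp), read_loose_object(path))
```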

We also want to use all CPUs to speed things up, so each thread starts at a different counter value and steps by the number of CPUs:

lock = threading.Lock()
terminate = False

def hack(git_object, counter_start, counter_increment, goal, destination_clear, destination_compressed):
    global lock
    global terminate
    # ...
    while True:
        with lock:
            if terminate:
                return
        # ...
        if s.startswith(goal):
            with lock:
                terminate = True
        # ...

# Main script: read the zlib-compressed commit object and start one worker per CPU
filename = sys.argv[1]
compressed_contents = open(filename, 'rb').read()
decompressed_contents = zlib.decompress(compressed_contents)

cpus = len(os.sched_getaffinity(0))
print(f"\rCPUs detected: {cpus}", file=sys.stderr)

threads = [threading.Thread(target=hack, args=(decompressed_contents,i,cpus,sys.argv[2], sys.argv[3], sys.argv[4])) for i in range(0, cpus)]
for thread in threads:
    thread.start()

for thread in threads:
    thread.join()
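The same striding scheme can be sketched with a `threading.Event` as the stop flag instead of a lock plus a global boolean. This is an illustrative variant with hypothetical names (`search`, `parallel_search`) and a toy commit body, not a copy of self.py. One caveat: in standard GIL-enabled CPython, threads running a pure-Python loop like this mostly take turns, so the speed-up may be smaller than the CPU count suggests; a process-based variant is one way around that.

```python
import hashlib
import os
import threading

def search(content: bytes, goal: str, start: int, step: int,
           stop: threading.Event, result: dict) -> None:
    """Worker: try counters start, start+step, ... until any worker finds a match."""
    counter = start
    while not stop.is_set():
        candidate = content + b"\n" + str(counter).encode("ascii") + b"\n"
        header = b"commit " + str(len(candidate)).encode("ascii") + b"\x00"
        digest = hashlib.sha1(header + candidate).hexdigest()
        if digest.startswith(goal):
            result.setdefault("hit", (counter, digest))  # keep the first reported hit
            stop.set()
            return
        counter += step

def parallel_search(content: bytes, goal: str, workers: int):
    """Run `workers` threads over interleaved counter values and return the first hit."""
    stop, result = threading.Event(), {}
    threads = [threading.Thread(target=search, args=(content, goal, i, workers, stop, result))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["hit"]

if __name__ == "__main__":
    # os.cpu_count() is portable; os.sched_getaffinity(0) is only available on some platforms.
    print(parallel_search(b"tree x\n\ndemo\n", "abc", os.cpu_count() or 1))
```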

All of this is put together in self-referential-git/self.bash:

GUESS="be31337"
FIRST=$(git rev-parse HEAD | grep -o '^..')
LAST=$(git rev-parse HEAD | grep -o '......................................$')

# ...

./self.py .git/objects/$FIRST/$LAST $GUESS hack.msg hack.msg.zlib

set -x

SHA=$(shasum hack.msg | awk '{print $1}' )
FIRST=$(shasum hack.msg | awk '{print $1}' | grep -o '^..')
LAST=$(shasum hack.msg | awk '{print $1}' | grep -o '......................................$')

# ...

cp -- hack.msg.zlib ".git/objects/$FIRST/$LAST"

# terminate with error if git cannot read .git/objects/$FIRST/$LAST
set -e
git cat-file -p "$SHA"

# switch head!
shasum hack.msg > .git/HEAD

Fire off the brute force and have the solution in seconds:

 ./self.bash
rm: cannot remove 'hack.msg.4': No such file or directory
rm: cannot remove 'hack.msg.trunc': No such file or directory
CPUs detected: 32
hack(..., 0, 32, be31337)
hack(..., 1, 32, be31337)
hack(..., 2, 32, be31337)
hack(..., 3, 32, be31337)
hack(..., 4, 32, be31337)
hack(..., 5, 32, be31337)
hack(..., 6, 32, be31337)
hack(..., 7, 32, be31337)
hack(..., 9, 32, be31337)
hack(..., 11, 32, be31337)
hack(..., 8, 32, be31337)
hack(..., 14, 32, be31337)
hack(..., 12, 32, be31337)
hack(..., 13, 32, be31337)
hack(..., 10, 32, be31337)
hack(..., 15, 32, be31337)
hack(..., 17, 32, be31337)
hack(..., 18, 32, be31337)
hack(..., 20, 32, be31337)
hack(..., 19, 32, be31337)
hack(..., 21, 32, be31337)
hack(..., 23, 32, be31337)
hack(..., 25, 32, be31337)
hack(..., 22, 32, be31337)
hack(..., 24, 32, be31337)
hack(..., 27, 32, be31337)
hack(..., 26, 32, be31337)
hack(..., 16, 32, be31337)
hack(..., 28, 32, be31337)
hack(..., 29, 32, be31337)
hack(..., 31, 32, be31337)
hack(..., 30, 32, be31337)
SHA1: be31337eb47bb4b74208bfae23fc8699d14ab1a7
Write to: hack.msg (plain)
Write to: hack.msg.zlib (compressed)
++ shasum hack.msg
++ awk '{print $1}'
+ SHA=be31337eb47bb4b74208bfae23fc8699d14ab1a7
++ shasum hack.msg
++ awk '{print $1}'
++ grep -o '^..'
+ FIRST=be
++ shasum hack.msg
++ awk '{print $1}'
++ grep -o '......................................$'
+ LAST=31337eb47bb4b74208bfae23fc8699d14ab1a7
+ git rev-parse HEAD
ee4e5189bfa1c3140dceba538f0ea98b9d4bd71d
+ mkdir -p .git/objects/be
+ file hack.msg.zlib
hack.msg.zlib: zlib compressed data
+ cp -- hack.msg.zlib .git/objects/be/31337eb47bb4b74208bfae23fc8699d14ab1a7
+ set -e
+ git cat-file -p be31337eb47bb4b74208bfae23fc8699d14ab1a7
tree b47711ad33b5a8d10584ba92ba6a801228bb8e3d
author blaufish <blaufish@users.noreply.github.com> 1758323860 +0200
committer blaufish <blaufish@users.noreply.github.com> 1758323860 +0200

self-referential-git/commit/be31337

18364853
+ shasum hack.msg
+ git rev-parse HEAD
be31337eb47bb4b74208bfae23fc8699d14ab1a7
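As a cross-check that does not need git at all, the commit can be rebuilt in Python from the `git cat-file -p` output above and hashed directly. This is a sketch with helper names of my own choosing. If the transcription below is faithful, the content should be 234 bytes long and the hash should start with be31337.

```python
import hashlib

def build_commit_object(content: bytes) -> bytes:
    """Prefix the content with the 'commit <length>' header and a NUL byte."""
    return b"commit " + str(len(content)).encode("ascii") + b"\x00" + content

def is_self_referential(raw_object: bytes, short_len: int = 7) -> bool:
    """True if the object's own abbreviated SHA-1 appears in its content."""
    sha = hashlib.sha1(raw_object).hexdigest()
    _header, _, content = raw_object.partition(b"\x00")
    return sha[:short_len].encode("ascii") in content

if __name__ == "__main__":
    # Content transcribed from the `git cat-file -p` output above.
    ident = b"blaufish <blaufish@users.noreply.github.com> 1758323860 +0200\n"
    content = (
        b"tree b47711ad33b5a8d10584ba92ba6a801228bb8e3d\n"
        b"author " + ident +
        b"committer " + ident +
        b"\nself-referential-git/commit/be31337\n"
        b"\n18364853\n"
    )
    raw = build_commit_object(content)
    print(len(content), hashlib.sha1(raw).hexdigest(), is_self_referential(raw))
```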

An easier and slower solution

An earlier, simpler approach works without knowing git internals, if you are happy with just a few characters of your commit hash and can accept a lot of disk writes and weird git problems…

I demo this in:

Brute Force with git commit

You can just loop git commit --reset-author --amend -m ... until you get a match. It is very slow, it hammers your disk, and you’ll get weird git errors…

But it works and is easier than understanding git internals!

GUESS="e1337"
counter=0
counter2=0
while true
do
	COMMIT=$(git rev-parse HEAD | grep -o -m1 '^.....')
	if grep -q -- "$COMMIT" README.md
	then
		echo "$COMMIT: success"
		exit
	else
		echo "$COMMIT not found in README.md, targeting $GUESS... $counter2"
	fi

	NONCE=$(head -c 4 /dev/urandom | xxd -p)
cat > README.md <<EOF
# self-referential-git $GUESS

[self-referential-git/commit/$GUESS](https://github.com/blaufish/self-referential-git/commit/$GUESS)

I just wanted to create a silly git that references itself...
EOF
	git add README.md
	git commit --reset-author --amend -m "self-referential-git/commit/$GUESS

$NONCE" README.md

	# ...
	counter2=$((counter2+1))
done

Expire Reflog and Garbage Collect with Prune

If you let your script perform thousands or millions of git commit --amend ... runs, in my experience you will start getting weird errors…

These are solved by:

  • git reflog expire --expire-unreachable=now --all: expire all reflog entries that are no longer reachable from the current branch tips.
  • git gc --prune=now: let git garbage collect anything no longer referenced.

counter=0
# ...
while true
do
    # ...
	if [[ "$counter" -gt 1000 ]]
	then
		counter=0
		git reflog expire --expire-unreachable=now --all
		git gc --prune=now
	else
		counter=$((counter+1))
	fi
    # ...
done

Technical references

Inspirational references

Prior achievements, trailblazer etc. that inspired me to do this silliness: