Object files stored in .git/annex/objects are each put in their own directory. This allows the write bit to be removed from both the file and its directory, which prevents accidentally deleting the file or changing its contents.

The reasoning for doing this follows:

Normally with git, once you have committed a file, editing the file in the working tree cannot cause you to lose the committed version. This is an important property of git. Of course you can rm -rf .git and delete commits if you like (before you've pushed them). But you can't lose a committed version of the file because of something you do with the working tree version.

It's easy for git to do this, because committing a file makes a copy of it. But git-annex does not make a local copy of a file added to it, because the file could be very large.

So it's important for git-annex to find another way to preserve the expected property that once a file is committed, you cannot accidentally lose it. The most important protection it provides is simply to remove the write bit of the file, which prevents programs from modifying it.

But that does not stop a program from following the symlink and deleting the file it points to. This might seem an unlikely thing for a program to do at first, but consider a command like: tar cf foo.tar foo --remove-files --dereference

When I tested this, I didn't know whether it would remove the file that foo was symlinked to or not! It turned out that my tar doesn't remove it. But it could easily have gone the other way.

Rather than needing to worry about every possible program that might decide to do something like this, git-annex removes the write bit from the directory containing the annexed object, as well as removing the write bit from the file. (The only bad consequence of this is that rm -rf .git doesn't work unless you first run chmod -R +w .git)
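
The two-level lockdown described above can be sketched in plain shell. This is only an illustration under stated assumptions: lock_object and mode_of are hypothetical helper names, the objects/KEY/KEY layout is a stand-in for a real annex object path, and none of this is git-annex's actual implementation.

```shell
# lock_object PATH: remove the write bit from an object file and from the
# directory that contains it. Hypothetical helper, not a git-annex command.
lock_object() {
    chmod a-w "$1" && chmod a-w "$(dirname "$1")"
}

# mode_of PATH: print the ten-character permission string shown by ls -ld.
mode_of() {
    ls -ld "$1" | cut -c1-10
}

# Demo in a scratch directory; the layout and names are made up.
demo=$(mktemp -d)
mkdir -p "$demo/objects/KEY"
echo "large file contents" > "$demo/objects/KEY/KEY"
ln -s objects/KEY/KEY "$demo/foo"

lock_object "$demo/objects/KEY/KEY"
echo "file: $(mode_of "$demo/objects/KEY/KEY")"
echo "dir:  $(mode_of "$demo/objects/KEY")"

# Cleaning up needs the same extra step as rm -rf .git does:
# restore the write bit first, then delete.
chmod -R u+w "$demo" && rm -rf "$demo"
```

With the directory's write bit removed, an unprivileged program that follows the symlink can no longer unlink the object, because unlinking requires write permission on the containing directory. Root is not restricted by these permission bits.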


It's known that this lockdown mechanism is incomplete. The worst hole in it is that if you explicitly run chmod +w on an annexed file in the working tree, this follows the symlink and allows writing to the file. It would be better to make the files fully immutable. But most systems either don't support immutable attributes, or only let root make files immutable.
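
The chmod hole is easy to reproduce outside of git-annex. The following is a minimal sketch, assuming a scratch directory with made-up names (object-dir, object, foo) standing in for the annex object directory, the object, and the working tree symlink.

```shell
# Scratch layout: a write-protected "object" and a symlink pointing at it.
demo=$(mktemp -d)
mkdir "$demo/object-dir"
echo data > "$demo/object-dir/object"
chmod a-w "$demo/object-dir/object" "$demo/object-dir"
ln -s object-dir/object "$demo/foo"

# Record the target's permission string before and after running
# chmod +w on the symlink rather than on the object itself.
before=$(ls -ld "$demo/object-dir/object" | cut -c1-10)
chmod +w "$demo/foo"
after=$(ls -ld "$demo/object-dir/object" | cut -c1-10)

echo "before: $before"
echo "after:  $after"

# Clean up: the directory is still write-protected, so unlock it first.
chmod -R u+w "$demo" && rm -rf "$demo"
```

The "after" line should show the owner write bit set on the target, since chmod applied to a symlink named on the command line changes the file it points to. The directory's write bit is untouched, so the object still cannot be deleted, but its contents can now be modified.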

I'm using git-annex to store build artefacts on a remote bare repo. Some of these artefacts are used in subsequent builds, which clone the artefacts repo and use 'git annex get' to retrieve the artefacts of interest.

Unfortunately, I've had to add a little kludge along the following lines to my build script fragment:

git annex get "${file}"
find .git/annex/objects -type d -exec chmod +w {} +

This is necessary because I need to ensure that the cloned git repo is able to be deleted at all times (I'm using yocto/openembedded and it may want to delete the clone for a variety of reasons).
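
Where the deletion step is under your own control, an alternative is to leave the object directories locked and restore the write bit only at removal time, in the same way as the chmod -R +w .git step mentioned earlier on this page. A minimal sketch follows; remove_clone is a hypothetical helper name, and it does not help when another tool deletes the clone on its own schedule.

```shell
# remove_clone DIR: delete a clone whose annex object directories have had
# their write bit removed. Hypothetical helper, not part of git-annex.
remove_clone() {
    # Refuse an empty argument so a bare "rm -rf" is never attempted.
    [ -n "${1:-}" ] || return 1
    # Nothing to do if the clone is already gone.
    [ -d "$1" ] || return 0
    # Restore the owner's write bit everywhere, then delete.
    chmod -R u+w "$1" && rm -rf "$1"
}
```

For example, remove_clone "$clone_dir" could replace a plain rm -rf "$clone_dir" in a cleanup step, where clone_dir is whatever variable your script uses for the clone's path.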

setcap cap_linux_immutable+ep /usr/bin/git-annex

After doing that, git-annex is able to make files immutable, so the additional directory is not needed any more. Even on file systems / in environments where that is not possible, in some situations file lookup speed is way more important than not being able to delete the target of a symlink.

I have no idea how to code in Haskell, so if somebody else could add an appropriate always/never/only-when-necessary config option I'd be very happy, and my media server would not have any more hiccups when switching songs …
