Recent comments posted to this site:

@davidriod you can do things like this with special remotes, as long as the special remotes are not encrypted.

I don't really recommend it. With such a shared special remote R and two disconnected git repos (call them A and B), some confusing situations can occur. For example, the only copies of some files may be on special remote R and in git repo B. A knows about the copy in R, so git-annex is satisfied that there is one copy of the file. But now, B can drop the content from R, which is allowed because the content is in B. A is then left unable to recover the content of those files at all, since they have been removed from R.
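
For concreteness, the dangerous step looks something like this (the remote and file names here are placeholders):

# in repo B: drop the content from R; allowed, because B itself holds a copy
git annex drop --from R somefile
# repo A still records R as having the content, but it is gone;
# the only remaining copy is in B, which A knows nothing about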

Better to connect the two repositories A and B, even if you do work in two separate branches. Then if a file ends up located only on B, A will be able to say where it is, and could even get it from B (if B was set up as a remote).
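
Setting that up could be as simple as (the path and file name are placeholders):

# in repo A
git remote add B /path/to/B
git annex sync
git annex get somefile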

Comment by joey Tue Dec 13 16:43:42 2016

Thank you for this, I've always wanted such a GUI, and it's been a common user request!

Comment by joey Wed Dec 7 19:58:11 2016
I was wondering if it is possible to share an rsync special remote between repositories which are not related in any way. The use case would be that even though these repositories are not related at all, they may still contain the same binary files. It would be useful to have a single rsync remote in order to reduce space usage. I think it could work, as the object names are based on their checksum, but I wonder if anyone has already tried that?
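
For illustration, I imagine something like this (the server, path, and remote name are made up):

# in repository A
git annex initremote shared type=rsync rsyncurl=server:/srv/annex encryption=none
# in unrelated repository B, pointing at the same location
git annex initremote shared type=rsync rsyncurl=server:/srv/annex encryption=none

Since identical files get identical checksum-based keys, they would be stored only once on the server.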
Comment by davidriod Thu Nov 24 19:23:42 2016

Been using the one-liner. Despite the warning, I'm not dead yet.

There's much more to do than the one-liner.

This post offers instructions.
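
For context, the one-liner in question is a git filter-branch history rewrite; it is not reproduced here. A minimal example of that kind of rewrite (with placeholder file names, and simply removing files from all history rather than doing exactly what this page's one-liner does) would be:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch big1.bin big2.bin' -- --all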

First simple try: slow

Was slow (estimated >600s for 189 commits).

In tmpfs: about 6 times faster

I cloned the repository into /run/user/1000/rewrite-git, which is a tmpfs mount point. (The machine has plenty of RAM.)

There I also ran git annex init, and git-annex found its state branches.
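
The setup was essentially (the original path is a placeholder):

git clone /path/to/original /run/user/1000/rewrite-git
cd /run/user/1000/rewrite-git
git annex init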

On the second try I also did

git checkout -t remotes/origin/synced/master

so that filter-branch would clean that branch, too.

There, the filter-branch operation finished in 90s on the first try and 149s on the second.

However, .git/objects wasn't any smaller.

Practicing reduction on clone

This produced no visible benefit:

time git gc --aggressive
time git repack -a -d

Even cloning and retrying on the clone made no difference. Oh, but I should have done git clone file:///path as described in the section of the git-filter-branch man page titled "CHECKLIST FOR SHRINKING A REPOSITORY".
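
For the record, that would be something like (the clone name is made up):

git clone file:///run/user/1000/rewrite-git cleaned-clone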

This (as seen on https://rtyley.github.io/bfg-repo-cleaner/) was efficient:

git reflog expire --expire=now --all && git gc --prune=now --aggressive

.git/objects shrank from 148M to 58M.

All this was on a clone of the repo in tmpfs.

Propagating cleaned-up branches to origin

This confirmed that filter-branch did not change the last tree:

git diff remotes/origin/master..master
git diff remotes/origin/synced/master..synced/master

As expected, this was refused, since the rewritten branches are not fast-forwards:

git push origin master
git push origin synced/master

On origin, I checked out the hash of the current master (detaching HEAD, so that a push to the master branch would be accepted), then from the tmpfs clone:

git push -f origin master
git push -f origin synced/master

Looks good.
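
For the record, checking out the hash on origin amounts to detaching HEAD there, presumably via something like:

# detach HEAD so that master is no longer checked out and can be updated by a push
git checkout --detach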

I'm not doing the aggressive shrink now, because of the "two orders of magnitude more caution than normal filter-branch" recommended by arand.

Now what? Checking that nothing precious is broken

I'm planning to do the same operation on the other repos, then:

  • if everything seems right,
  • if git annex sync works between all those fellows,
  • etc.,
  • then I would perform the reflog expire and gc prune on some, then all, of them.

Joey, does this seem okay? Any comment?

Comment by StephaneGourichon Thu Nov 24 11:27:59 2016
Wanted to add that "storageclass=COLDLINE" appears to work seamlessly, both from my Mac and my ARM NAS. As far as I can tell, this is a no-brainer vs Glacier: built-in git-annex client, simpler and cheaper billing, and no 4-hour delay!
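
For anyone wanting to try it, a sketch of the setup (the remote name and bucket are hypothetical, and this assumes git-annex's S3 special remote pointed at Google Cloud Storage's interoperable endpoint; the encryption choice is just an example):

git annex initremote coldline type=S3 host=storage.googleapis.com bucket=my-annex-bucket storageclass=COLDLINE encryption=shared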
Comment by scottgorlin Mon Nov 21 00:49:23 2016

I'd like to reiterate a question that was unanswered above:

Is there a way to tell the S3 backend to store the files as they are named locally, instead of by hashed content name? I.e., when I annex foo/bar.txt, I'd like annex to put it in S3 as mybucket.name/foo/bar.txt instead of mybucket.name/GPGHMACSHA1-random.txt.

Comment by David_K Wed Nov 16 01:28:14 2016

Wow, scary

Dilyin's comment is scary. It suggests bad things can happen, but is not very clear.

  • Bloated history is one thing.
  • An obviously broken repo is bad, but can be (slowly) recovered from remotes.
  • Subtly crippled history that you don't notice can be a major problem (especially once you have propagated it to all your remotes to "recover from bloat").

More common than it seems

There's a case probably more common than people actually report: mistakenly doing git add instead of git annex add, and realizing it only after a number of commits. Doing git annex add at that point will leave the file duplicated (in regular git history and in the annex).
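
For illustration, the sequence is something like this (the file name is made up):

git add big.dat
git commit -m "add data"
# ...more commits later, the mistake is noticed...
git annex add big.dat
git commit -m "move data to the annex"
# the old blob stays in git history, so the content is now stored twice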

Extra wish: when doing git annex add of a file that is already present in git history, git-annex could notice and say so.

Simple solution?

Can anyone elaborate on the scripts provided here: are they safe? What can happen if they are used improperly, or in corner cases?

  • "files are replaced with symlinks and are in the index" -> so what ?
  • "Make sure that you don't have annex.largefiles settings that would prevent annexing the files." -> What would happen? Also .gitattributes.

Thank you.

Comment by https://launchpad.net/~stephane-gourichon-lpad Tue Nov 15 10:58:32 2016
Is there a version for Android 6 or 7 yet? Or can I just use the Android 5 version?
Comment by lykos Wed Nov 9 21:44:26 2016

@davidriod if you're using an rsyncurl that goes over ssh, then yes, the transmission goes over a secure connection. If the rsyncurl uses the rsync protocol, there is no encryption.

Of course, encryption=none does not keep the data encrypted at rest, so the admin of the rsync server can see it, etc.
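
For illustration, the two cases look like this (hostnames, paths, and the remote name are made up):

# transfers run over ssh, so they are encrypted in transit
git annex initremote myrsync type=rsync rsyncurl=user@host.example.com:/srv/annex encryption=none
# transfers use the bare rsync protocol, with no transport encryption
git annex initremote myrsync type=rsync rsyncurl=rsync://host.example.com/annex encryption=none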

Comment by joey Mon Nov 7 17:40:15 2016