Recent comments posted to this site:

Running extract on very large files (system backups) can take too long (I killed it after it had run for several hours). In general, extract seems slow on tar.gz archives.

I prefixed the tool invocation in the pre-commit script with timeout 100s:

LC_ALL=C timeout 100s $tool_exec "./$f" | ...

This allows the commit to complete in a reasonable time, probably losing some metadata.

Comment by aurelf Fri Sep 10 16:47:39 2021

You could add a config setting to the script that skips over files larger than a certain size.

Or, for that matter, the script could be adapted to filter the files to only include images/videos, using e.g.:

git annex find --mimetype='image/*' --or --mimetype='video/*'

Should be a fairly easy change, patches accepted.
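
For the size cutoff, a rough sketch of what such a guard might look like inside the hook, reusing the $f and $tool_exec names from the comment above (the 100 MB limit, GNU stat, and the surrounding per-file loop are all assumptions, not part of the shipped script):

if [ "$(stat -c %s "./$f")" -le $((100 * 1024 * 1024)) ]; then    # only extract files up to 100 MB
    LC_ALL=C timeout 100s "$tool_exec" "./$f"    # pipe onward as the hook already does
fi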

Comment by joey Fri Sep 10 16:47:39 2021
Following the instructions here, I cannot enable the remote. The error message is: git-annex: Unknown remote name. I assume this is because git-annex does not create a uuid for the type=git special remote, presumably because none is set for the actual git remote (the annex-uuid key does not exist for the existing git remote with the same url). This is the relevant line generated in remote.log: autoenable=true location=<ssh-url> name=<name> type=git timestamp=1629118438.628919s; as you can see, there is no uuid at the beginning. Any ideas if this is a bug or if the instructions are outdated?
Comment by matthias.risze Mon Aug 30 19:02:09 2021

This can only be used for git remotes that have an annex-uuid set. All special remotes have to have a uuid. git remotes that are hosted on a system with git-annex installed get the annex-uuid set up automatically, but it won't happen if your special remote is on github or something like that. You could git config remote.foo.annex-uuid $(uuid) to generate and set a new uuid, I suppose.
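
Spelled out, that workaround might look roughly like this (foo stands for the existing git remote and <name> for the special remote from remote.log above; uuidgen is assumed to be available, the uuid command works the same way):

git config remote.foo.annex-uuid "$(uuidgen)"    # give the git remote a fresh uuid
git annex enableremote <name>                    # then retry the enable step from the instructions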

Comment by joey Mon Aug 30 19:02:09 2021

This is like a 1000ft overview, but it doesn't actually say where the files are stored or how they're synchronized.

Does one need to set up a samba, sftp, or AWS bucket to contain the large files? Does a clone of the repo pull down all of the large files, or just the files in the working directory that's checked out? Are files transferred via a direct connection to other repos (e.g. the same SSH tunnel that git uses, http, etc.), or is there a UDP p2p layer like syncthing or bittorrent that might struggle with certain NAT situations?

The sentence "A file's content can be transferred from one repository to another by git-annex. Which repositories contain a given value is tracked by git-annex (see location tracking)." makes it sound like the old versions of the large files only exist on computers that checked out those copies. Does this mean old versions of a file might be lost forever if a single clone is deleted and temporarily unavailable if clones that contain those revisions of the file are offline?

Is there a way to ensure that a clone has all copies of all of the files (for example, when using git with a central trusted server)?

Comment by git-annex.branchable.com Mon Aug 30 19:02:09 2021
This is like a 1000ft overview, but it doesn't actually say where the files are stored or how they're synchronized.

It does: "[...] That's a fancy way to say that git-annex stores the actual file content somewhere under .git/annex/. (See internals for details.)".
When using SHA256E hashing (the default), a file will end up for example under .git/annex/f87/4d5/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.

Does one need to set up a samba, sftp, or AWS bucket to contain the large files?

No.

Does a clone of the repo pull down all of the large files, or just the files in the working directory that's checked out?

No and no. You decide yourself (via git-annex-get/git-annex-drop/git-annex-copy/git-annex-move) or automatically (via git annex sync --content, git-annex-wanted, and git-annex-preferred-content) which large files are stored where. If a file's content is not present, you will just see a broken symlink.
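
For example, with a hypothetical file and a remote named usbdrive:

git annex get bigfile.iso                   # fetch this file's content into the local clone
git annex copy --to usbdrive bigfile.iso    # also store it in the remote named usbdrive
git annex drop bigfile.iso                  # remove the local copy; the broken symlink remains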

Are files transferred via a direct connection to other repos (e.g. the same SSH tunnel that git uses, http, etc.), or is there a UDP p2p layer like syncthing or bittorrent that might struggle with certain NAT situations?

Yes and no: transfers go over direct connections like the ssh or http that git already uses (there is no built-in UDP p2p layer), but git-annex is very flexible and can, for example, also communicate via tor.

The sentence "A file's content can be transferred from one repository to another by git-annex. Which repositories contain a given value is tracked by git-annex (see location tracking)." makes it sound like the old versions of the large files only exist on computers that checked out those copies. Does this mean old versions of a file might be lost forever if a single clone is deleted and temporarily unavailable if clones that contain those revisions of the file are offline?

Yes and yes. If you don't copy the file elsewhere (with the commands mentioned above) before deleting the repo, that version is lost.
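
A quick way to check that before deleting a repository (somefile.jpg is just a placeholder):

git annex whereis somefile.jpg    # lists every repository known to hold this file's content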

Is there a way to ensure that a clone has all copies of all of the files (for example, when using git with a central trusted server)?

Either git annex get --all (run in that clone), or set its preferred content with git annex wanted server anything so that syncing with --content sends it everything.
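
For the preferred-content route, roughly (run from a clone that has the central repository as a remote named server):

git annex wanted server anything    # the central clone prefers all content
git annex sync --content server     # send it whatever content it is still missing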

I strongly suggest you create a throwaway repository and try things out.

Comment by Lukey Mon Aug 30 19:02:09 2021

I'm using git-annex for backing up a variety of data, with several different remotes (including a USB drive for backup, rsync for encrypted cloud backup, etc). I have one particular use case that I am trying to figure out how to implement:

Pictures, Music, and Video are stored on a WD MyCloud under three corresponding folders. They are accessible via NFS, as well as via AFP; I specifically mount them on my laptop via NFS. I want to be able to access the files in the following ways:

  • Store new photos from other sources, including my laptop, using git-annex.
  • Have the list of files available in my laptop repository, in git-annex... i.e. like any repository.
  • Retain all the file contents on the WD MyCloud... i.e. like a special remote.
  • ONLY pull file contents to the laptop (or other repositories) if specifically requested (i.e. I don't want a sync to cause things to get pulled to the laptop).
  • When photos are pulled from the special remote, they go into ~/annex/photos (i.e. a different name than the folder on the remote).
  • I want to be able to access the files on the WD MyCloud "normally" from other computers, TVs, etc. I realize there's some 'danger' here. But I wanted to keep normal file trees for the special remote, not the hash-based filenames with symlinks.

So what I decided I needed was a directory special remote, with both the exporttree and importtree options. I created the special remote using something like:

git annex initremote wdmycloud-pics directory=/net/wdmycloud/nfs/Public/Shared\ Pictures encryption=none exporttree=yes importtree=yes

On my laptop's repository, I executed the following:

git annex import master:photos --from wdmycloud-pics
git merge wdmycloud-pics/master

Problem: this imports the content from the special remote into the laptop's repository. I don't want that.

I do realize that my main git-annex repository on the laptop probably has to actually download all of the files in order to compute their hashes. However, I do not want them to stay on the laptop, as there is nowhere near enough space for all of them. I did try:

git-annex wanted wdmycloud-pics "exclude=photos/*"

However, this results in none of the files on the WD MyCloud actually getting processed.

What am I missing?

Thanks! -Rob

Comment by Rob Fri Aug 6 21:37:20 2021
Note that git-annex-import has a --no-content switch.
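
So the command from the question could presumably be run as:

git annex import master:photos --from wdmycloud-pics --no-content
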
Comment by Lukey Fri Aug 6 21:37:20 2021

This is cool! I had to do the same thing earlier tonight, and ended up with this:

cat ~/path/to/playlist.m3u | sed -e "s/\ /\\\ /g" | xargs git annex copy --to remote

I thought I might stick that into a bash alias, or function.
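
If it helps, one way such a function might look (a sketch; it assumes GNU xargs for -d/-r and skips m3u comment and blank lines):

copy_playlist() {
    # usage: copy_playlist path/to/playlist.m3u remotename
    # newline-delimited input, so spaces in paths need no escaping
    grep -v '^#' "$1" | grep -v '^$' | xargs -r -d '\n' git annex copy --to "$2"
}

Usage would then be: copy_playlist ~/path/to/playlist.m3u remote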

Comment by datamanager Mon Jul 12 17:41:54 2021

If you are experiencing a problem using git-annex on Android, please examine the list of bugs and add a new, detailed bug report if no-one has reported the problem. If you are not sure if you have a bug, or need help in filing a good bug report, ask for help in the forum.

I have removed a lot of old comments about problems that may or may not be fixed (hard to tell without a bug report!). This page cannot scale to handle every bug report that someone wants to paste into it.

Comment by http://joeyh.name/ Thu Jul 8 01:21:08 2021