Recent comments posted to this site:

First of all, git annex is an awesome tool, I like it very much!

When trying to git annex import from a directory special remote with a large number of files (~4 million) and a cumulative size of about 1 TB, git annex takes up all main memory during the final update remote/ref step on a machine with 16 GB of main memory and is then killed by the system. This also happens when supplying the --no-content option. Is there a way to make git annex less memory-hungry when importing from a directory special remote with a large number of files?

Comment by georg.schnabel Thu Feb 25 16:09:08 2021

Yes, just cp or mv the files inside the repo and git annex add them as usual.

Comment by Lukey Thu Feb 25 16:09:08 2021

Importing from special remotes necessarily needs to hold the list of files in memory, or at least it seems like it would be hard to get it to stream over them. So there may be some way to decrease the memory use per file (currently 4.2 kb per file according to your numbers), possibly by around 50%, but it would still scale with the number of files. The whole import interface would need to change to use trees to avoid that. It would be ok to file a bug report about this.
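The per-file figure can be reproduced from the numbers in the question. The following is only a back-of-the-envelope sketch: it assumes the whole 16 GiB was consumed by the file list and that the remote holds exactly 4 million files, both of which are approximations taken from the report. The function names are made up for illustration.

```python
# Rough estimate only: assumes all 16 GiB went to the in-memory file list
# and that the remote holds exactly 4 million files.
GIB = 1024 ** 3


def bytes_per_file(total_bytes: int, n_files: int) -> float:
    """Average memory consumed per imported file, in bytes."""
    return total_bytes / n_files


def projected_gib(per_file_bytes: float, n_files: int) -> float:
    """Memory needed for n_files at a given per-file cost, in GiB."""
    return per_file_bytes * n_files / GIB


per_file = bytes_per_file(16 * GIB, 4_000_000)
print(f"{per_file / 1024:.1f} KiB per file")

# Halving the per-file cost halves the total, but the total still grows
# linearly with the number of files.
print(f"{projected_gib(per_file / 2, 4_000_000):.1f} GiB at half the cost")
print(f"{projected_gib(per_file / 2, 8_000_000):.1f} GiB at half the cost, twice the files")
```

Under these assumptions, even a 50% reduction only postpones the problem: doubling the file count brings the requirement back to the full 16 GiB.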

The legacy directory import interface avoids such problems.

Comment by joey Thu Feb 25 16:09:08 2021

@m15 this page is not a bug tracking system. File bug reports over at bugs.

Comment by joey Tue Feb 2 00:23:29 2021

I'm trying to follow the instructions here, but keep running into the following problem whenever I try to clone the remote repo:

0$ git clone ssh://remote.example.com/~/tmp/archive/git
Cloning into 'git'...
warning: You appear to have cloned an empty repository.
0$ cd git/
0$ git annex init local
init servo (scanning for unlocked files...)

  Unable to parse git config from origin

  Remote origin does not have git-annex installed; setting annex-ignore

  This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin
ok
(recording state in git...)
0$ 

What am I doing wrong? The remote repo is set up bare, exactly as suggested.

Comment by jrollins Tue Feb 2 00:23:29 2021

@jrollins Well, it looks like you may not have installed git-annex on your server. If it is installed and you still have this problem, you may need to consult the "get git-annex-shell into PATH" page.

Comment by joey Tue Feb 2 00:23:29 2021

After setting up git-annex in one of my submodules, I noticed that executing git checkout mybranch --recurse-submodules will cause a fatal error (see error message below) and my working copy will be left in a state somewhere in between the origin and the destination branch.
As a workaround, this two-step alternative seems to work fine though: git checkout mybranch && git submodule update.
Everything above applies to the git switch command as well.
I use git version 2.27.0 and git-annex version 8.20200618.

Error Message: fatal: could not open 'path/to/my/submodule/.git' for writing: Is a directory

Comment by aschoise Mon Jan 25 17:03:59 2021

ENV:
macOS 10.14.6, installed by 'brew install git-annex'
git annex version
git-annex version: 8.20201129
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.27 DAV-1.3.4 feed-1.3.0.1 ghc-8.10.3 http-client-0.7.3 persistent-sqlite-2.11.0.0 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: darwin x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7


STEPS:
File copied into the git (git-annex) repo's directory (did not run 'git add' or 'git annex add')
name: 'f.mp4'
Now run 'git annex addurl' (via Python, see below)

RESULT: (same if run in bash)
File "/opt/anaconda3/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['git', 'annex', 'addurl', '--file=f.mp4', '--raw', '--relaxed', 'https://www.youtube.com/watch?v=U33dsEcKgeQ']' returned non-zero exit status 1.

The command works after running 'git annex add f.mp4' first,
  but the file then ends up with a backend that is not the 'URL backend for youtube'.
  I'd like to use the 'URL backend for youtube' because I worry that the YouTube video's binary content may change, in which case all future downloads would fail backend verification.
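
The traceback above only shows the exit status, not what git-annex itself complained about. A minimal sketch of how the same call could be made from Python while keeping the command's stderr is shown here. The helper names `build_addurl_command` and `run_and_report` are made up for illustration; the arguments are exactly those from the traceback.

```python
import subprocess
import sys


def build_addurl_command(filename, url):
    """Build the same argument list that appears in the traceback."""
    return ["git", "annex", "addurl", f"--file={filename}",
            "--raw", "--relaxed", url]


def run_and_report(cmd):
    """Run cmd and return (exit_code, stderr) instead of raising.

    Unlike subprocess.run(..., check=True), this keeps the command's own
    error message, which is usually more informative than
    CalledProcessError alone.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Forward the command's own error message to our stderr.
        print(result.stderr, file=sys.stderr)
    return result.returncode, result.stderr
```

Calling `run_and_report(build_addurl_command("f.mp4", url))` inside the repository would then print whatever message git-annex emitted before exiting with status 1.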


NOTE:
command line taken from https://git-annex.branchable.com/tips/using_the_web_as_a_special_remote/

Comment by m15 Mon Jan 25 17:03:59 2021

A faster way of doing uninit is the following:

cp --no-clobber --dereference --recursive --preserve=all --reflink=auto --verbose ./git_annex_repo/your_symlinks/ ./target_dir/

This will simply copy (as a thin COW copy, where the filesystem supports it) the dereferenced symlinks as normal files, preserving the mtime, etc. The resulting ./target_dir/ will have your files if they existed in this annex, or broken symlinks if the files were not here.
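
For systems where cp lacks these GNU options, the same idea can be approximated in Python. This is only a sketch and not a replacement for the command above: it makes ordinary full copies rather than reflinks, `shutil.copy2` keeps mtime and permission bits but not ownership, and it reports dangling symlinks instead of copying them. The function name `copy_dereferenced` is made up for illustration.

```python
import os
import shutil


def copy_dereferenced(src_dir, dst_dir):
    """Copy src_dir into dst_dir, replacing symlinks by the files they point to.

    Existing destination files are left alone (similar to --no-clobber).
    Returns the list of dangling symlinks whose content was not available.
    """
    broken = []
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            if os.path.lexists(dst):
                continue  # never overwrite an existing entry
            if not os.path.exists(src):
                broken.append(src)  # dangling symlink: content not present
                continue
            shutil.copy2(src, dst)  # follows the symlink, keeps mtime and mode
    return broken
```

The returned list makes it easy to see which files still need a `git annex get` before the copy is complete.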

Comment by eric.w Mon Jan 18 15:40:20 2021

Does annex.largefiles have any documentation? It would be nice to link to it from the git-annex-add documentation.

Especially after reading this, I wonder about the default value of annex.largefiles. (I assume/hope it is disabled?)

Comment by AlbertZeyer Tue Jan 12 14:19:58 2021