Recent comments posted to this site:
cp or mv the files inside the repo and git annex add them as usual.
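A minimal sketch of that workflow, with made-up paths (the guard makes it a no-op where git-annex or the example paths are absent, so it is safe to paste and adapt):

```shell
# Hypothetical paths; adjust to your layout.
if command -v git-annex >/dev/null 2>&1 && [ -d ~/annex ] && [ -e ~/incoming/big.iso ]; then
    cp ~/incoming/big.iso ~/annex/   # or: mv ~/incoming/big.iso ~/annex/
    cd ~/annex
    git annex add big.iso            # checksums the file and swaps in a symlink
    git commit -m 'add big.iso'
fi
```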
Importing from special remotes necessarily needs to hold the list of files in memory, or at least it seems like it would be hard to get it to stream over them. So there may be some way to decrease the memory use per file (currently 4.2 kb per file according to your numbers), possibly by around 50%, but it would still scale with the number of files. The whole import interface would need to change to use trees to avoid that. It would be ok to file a bug report about this.
The legacy directory import interface avoids such problems.
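A back-of-the-envelope check of the 4.2 kb figure against the numbers from the import question (4 million files on a 16G machine):

```shell
# ~4.2 kB of in-memory state per imported file scales linearly with the
# file count; at 4 million files that is ~16.8 GB, consistent with a
# 16 GB machine being driven into the OOM killer.
files=4000000
per_file_bytes=4200
total=$((files * per_file_bytes))
echo "$total bytes"
```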
@m15 this page is not a bug tracking system. File bug reports over at bugs.
I'm trying to follow the instructions here, but I keep running into the following problem whenever I try to clone the remote repo:
```
0$ git clone ssh://remote.example.com/~/tmp/archive/git
Cloning into 'git'...
warning: You appear to have cloned an empty repository.
0$ cd git/
0$ git annex init local
init servo (scanning for unlocked files...)
Unable to parse git config from origin
Remote origin does not have git-annex installed; setting annex-ignore
This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin
ok
(recording state in git...)
0$
```
What am I doing wrong? The remote repo is set up bare, exactly as suggested.
@jrollins Well, it looks like you may not have installed git-annex on your server. If it is installed and you still have this problem, you may need to consult the documentation on getting git-annex-shell into PATH.
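A hedged diagnostic sketch (remote.example.com is a placeholder; set REMOTE to a real host to actually run it, otherwise it is a no-op):

```shell
# Checks whether git-annex-shell is visible in the non-interactive PATH
# that sshd gives remote commands; empty output means it is not found.
REMOTE=${REMOTE:-}
if [ -n "$REMOTE" ]; then
    ssh "$REMOTE" 'command -v git-annex-shell' \
        || echo "git-annex-shell is not in PATH on $REMOTE"
    git annex enableremote origin   # retry once the remote is fixed
fi
```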
After setting up git-annex in one of my submodules, I noticed that executing git checkout mybranch --recurse-submodules causes a fatal error (see the message below) and leaves my working copy in a state somewhere between the original and the destination branch. As a workaround, this two-step alternative seems to work fine: git checkout mybranch && git submodule update. Everything above applies to git switch as well.
I use git version 2.27.0 and git-annex version 8.20200618.
Error Message: fatal: could not open 'path/to/my/submodule/.git' for writing: Is a directory
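The two-step workaround as a guarded script (the branch name is an example; the guard makes it a no-op outside a repository that actually has such a branch):

```shell
# Switch branches first, then bring each submodule's worktree to the
# commit recorded by the superproject, instead of --recurse-submodules.
if git rev-parse --verify --quiet mybranch >/dev/null 2>&1; then
    git checkout mybranch
    git submodule update
fi
```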
ENV:
macOS 10.14.6, installed by 'brew install git-annex'
```
$ git annex version
git-annex version: 8.20201129
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.27 DAV-1.3.4 feed-1.3.0.1 ghc-8.10.3 http-client-0.7.3 persistent-sqlite-2.11.0.0 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: darwin x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
```
STEPS:
A file was copied into the git (git-annex) repo's directory (neither 'git add' nor 'git annex add' was run)
name: 'f.mp4'
Now run 'git annex addurl' (via Python, see below)
RESULT: (same if run in bash)
```
  File "/opt/anaconda3/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['git', 'annex', 'addurl', '--file=f.mp4', '--raw', '--relaxed', 'https://www.youtube.com/watch?v=U33dsEcKgeQ']' returned non-zero exit status 1.
```
The command works after doing 'git annex add f.mp4' first, but then the key does not use the 'URL backend for youtube'. I'd like to use the URL backend because I worry about the youtube video's binary content changing, in which case all future downloads would fail backend verification.
NOTE:
command line taken from https://git-annex.branchable.com/tips/using_the_web_as_a_special_remote/
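For comparison, a sketch where addurl creates the file itself rather than pointing --file at a file that is already present but not yet annexed (URL and filename taken from the report above; the guard makes it a no-op outside an initialized git-annex repository):

```shell
# --relaxed records the URL without downloading the content now, so the
# key uses the URL backend; --raw skips special URL handling such as
# youtube-dl. This is a sketch, not a confirmed fix for the error above.
if command -v git-annex >/dev/null 2>&1 \
   && git config --get annex.uuid >/dev/null 2>&1 \
   && [ ! -e f.mp4 ]; then
    git annex addurl --file=f.mp4 --raw --relaxed \
        'https://www.youtube.com/watch?v=U33dsEcKgeQ'
fi
```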
A faster way of doing uninit is the following:

```
cp --no-clobber --dereference --recursive --preserve=all --reflink=auto --verbose ./git_annex_repo/your_symlinks/ ./target_dir/
```

This simply copies the dereferenced symlinks as normal files (a thin COW copy where the filesystem supports reflinks), preserving mtime etc. The resulting ./target_dir/ will contain your files where their content was present in this annex, and broken symlinks where it was not.
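To verify the result, the broken symlinks described above (content that was not present in this annex) can be listed with find; a sketch, guarded so it skips cleanly when the example directory does not exist:

```shell
# Prints symlinks whose targets do not resolve, i.e. annexed files whose
# content was missing at copy time.
if [ -d ./target_dir/ ]; then
    find ./target_dir/ -type l ! -exec test -e {} \; -print
fi
```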
Does annex.largefiles have some documentation? It would be nice to link to it from the git-annex-add documentation.
Especially after reading this, I wonder about the default value of annex.largefiles. (I assume/hope it is disabled?)
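For reference, annex.largefiles takes a preferred-content-style match expression; a sketch in a throwaway repository (the size threshold is an arbitrary example, and only the config round-trip is demonstrated here):

```shell
# With annex.largefiles set, `git annex add` annexes only matching files
# and checks the rest directly into git.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config annex.largefiles 'largerthan=100kb'
git -C "$repo" config annex.largefiles    # prints the expression back
```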
First of all, git annex is an awesome tool, I like it very much!
When trying to git annex import from a special directory remote with a large number of files (~4 million) and a cumulative size of about 1 TB, git annex takes up all main memory during the final "update remote/ref" step on a machine with 16G of main memory and is then killed by the system. This also happens when supplying the --no-content option. Is there a way to make git annex less memory-demanding when importing from a special directory remote with a large number of files?