Recent comments posted to this site:
Perhaps Joey can help me out here a bit with some background knowledge:
I've been seeing sporadic corruption with this setup:
- chunking
- encryption
- old helper program git-annex-remote-rclone
- rclone's pcloud backend
It seems that, for the pcloud backend, rclone keeps partial files under the name of the full file when a transfer is interrupted. (This is for rclone <= 1.67.0; 1.68.0 has changes for pcloud, which may fix this.) My theory of how the corruption might have happened:
- First interrupted run of git-annex uploads chunks A and a partial(!) chunk B
- Second run skips chunks A and B(!) and proceeds to upload the rest of the chunks (C and D)
- At the end we have uploaded A, C and D and a corrupted/partial chunk B
Joey: Is this a possible error scenario?
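To make the suspected sequence concrete, here is a toy model of it in plain shell. It is only a sketch of the theory above, not git-annex's or rclone's actual code: the "remote" is a local directory, the helper names upload_chunk and resume_upload are made up, and it assumes the resume step decides purely on whether a file with the chunk's name exists.

```shell
# Toy model of the suspected failure mode; NOT git-annex's or rclone's code.
# Assumption: a chunk counts as "present" as soon as a file with its name
# exists on the remote, regardless of its size.

# upload_chunk REMOTE NAME CONTENT [PARTIAL_BYTES]
# Writes CONTENT to REMOTE/NAME. With PARTIAL_BYTES set, only that many bytes
# are written under the final name, imitating an interrupted transfer.
upload_chunk() {
    if [ -n "${4:-}" ]; then
        printf '%s' "$3" | head -c "$4" > "$1/$2"
    else
        printf '%s' "$3" > "$1/$2"
    fi
}

# resume_upload REMOTE
# Uploads chunks A to D, skipping every chunk whose name already exists.
resume_upload() {
    for name in A B C D; do
        if [ -e "$1/$name" ]; then
            echo "skip $name"
        else
            upload_chunk "$1" "$name" "data-$name"
            echo "upload $name"
        fi
    done
}

remote=$(mktemp -d)
upload_chunk "$remote" A "data-A"      # first run: chunk A completes
upload_chunk "$remote" B "data-B" 3    # first run: chunk B is cut off after 3 bytes
resume_upload "$remote"                # second run: trusts the names, skips A and B
cat "$remote/B"                        # chunk B is still the truncated file
```

If the real resume logic compared sizes or checksums instead of names, the second run would re-upload B and this scenario would not occur.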
git annex assist directly after a git clone, wondering why I'm getting a million files shoved into my face, CTRL+C'ing it, being left with a weird unclean work tree for the download-aborted unlocked files, so I have to git restore . again, then configuring git annex wanted present before I continue.
Is there any way to set a default preferred content setting -- either used when a new clone is made or whenever a repo doesn't specify one?
I've got an annex that has a couple servers with all the content, and several clients[1] -- which I create more often and more manually -- that just want the content I pick. Basically every time I set up another client, I run git annex sync --content, am surprised to see a bunch of get ... lines, go kill the sync, set group and preferred content to be manual/standard, and run the sync again. It'd be handy if I could set up the repo in advance to just configure that by default. (I guess I could make an alias that does like git clone $server/$repo && cd $repo && git annex wanted . standard && git annex group . manual, but it'd be nice if I could just do the git clone I'm used to and it would all work.)
[1] AIUI, the "client" group means "get every file referenced in HEAD, unless it's in archive/, and skip older versions"? I guess that makes sense for like a software project with some media assets. I've mostly used git-annex for situations where most files aren't being actively worked with and clients only have a few of them, which is where it seems to really shine over GitLFS. I've always been vaguely surprised by how the client group works as a result. Any sense of how commonly people use it for different use cases? It is excellent for the sparse checkout case though.
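The alias idea mentioned above could be sketched as a small shell function. The names annex_clone and _run and the DRY_RUN switch are made up for this sketch. It assumes git-annex initializes itself in the fresh clone on the first git annex command, which should be the case when the origin already has a git-annex branch.

```shell
# _run CMD...: print the command when DRY_RUN is set, otherwise execute it.
# (Hypothetical helper, only here so the sketch can be tried without a server.)
_run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# annex_clone URL [DIR]
# Clone, then set preferred content and group before any sync can run.
annex_clone() {
    url=$1
    dir=${2:-$(basename "$url" .git)}   # same default directory name git clone picks
    _run git clone "$url" "$dir" &&
        _run git -C "$dir" annex wanted . standard &&
        _run git -C "$dir" annex group . manual
}

# Usage (prints the three commands instead of running them):
#   DRY_RUN=1 annex_clone ssh://server/srv/media.git
```

This still is not the same as a plain git clone doing the right thing on its own, but it would keep the first sync from fetching everything.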
Here are a few pointers for switching from git-annex-remote-rclone (old helper program) to rclone gitannex (rclone's builtin support):
- Figure out rcloneprefix (the directory relative to the rclone remote, "remote" being the rclone term here) and rclonelayout (the layout of the git-annex content therein). If you set it up just like in git-annex-remote-rclone's README, those are git-annex and lower.
- Update rclone and git-annex
- Rename the old remote,
git remote rename my_rclone_remote my_rclone_remote.old; git annex renameremote my_rclone_remote my_rclone_remote.old
- Create a new remote, copying the encryption settings:
git annex initremote my_rclone_remote --sameas=my_rclone_remote.old type=rclone rcloneremotename=my_rclone_remote rcloneprefix=git-annex rclonelayout=lower
It might be possible to just change the type of the remote, but at the time of writing that didn't work, so I renamed the old remote and created a new one with --sameas to not lose any encryption settings.
appendonly=yes for the special directory remote would likely help in my scenario.
I have old readonly backup media, say something like
tapeA1/apples.txt
tapeA2/apples.txt
tapeB1/earth.svg
tapeB2/earth.svg
I use git-annex special directory remotes to be able to navigate the directory tree that lives on those media (e.g. to decide if and which media I need to find to copy a file from that I need). I added the remotes like so (they are too big to import with content):
git annex initremote tapeA1 type=directory directory=/tapes/tapeA1 encryption=none importtree=yes
git annex import master:tapeA1 --from tapeA1 --no-content
git annex merge --allow-unrelated-histories tapeA1/main
At some point I may buy new hardware and recreate those backup media as proper git-annex remotes, but wouldn't it be great to keep the existing backups as long as they show no sign of bitrot and together hold enough copies?
However, git-annex fsck behaves unexpectedly: it seems I cannot force-trust these remotes, nor does --numcopies=0 --mincopies=0 have the desired effect.
Concretely, when calling git annex fsck --from=tapeA1 --numcopies=0 --mincopies=0 --trust=tapeA1 --force, git-annex fsck reports a failure as follows for every file that is still intact on tapeA1:
fsck tapeA1/apples.txt
Only these untrusted locations may have copies of tapeA1/apples.txt
abc-def-ghi -- [tapeA1]
Back it up to trusted locations with git-annex copy.
failed
while I'd be happy to (semi)trust tapeA1 or to accept no copies whatsoever. So fsck ignores --trust=tapeA1 --force and/or --numcopies=0 --mincopies=0, which are common git-annex options that should work for fsck?
Ideally, I would be able to (semi)trust my readonly tape remotes (which likely should be behind a --force, as it may lead to data loss in classical directory remote settings). Then I can use git-annex to index those tapes, but also to monitor their health via fsck (so that over the years I can replace the tapes that are showing signs of corruption).
As for the corruption, I emulated bitrot on a test directory remote, which then leads to a fsck failure as follows:
fsck tapeB2/earth.svg
verification of content failed
(checksum...)
tapeB2/earth.svg: Bad file content; failed to drop fromtapeB2: dropping content from this remote is not supported because it is configured with importtree=yes
This suffices to detect tapes that should be replaced, and it's kinda expected that files cannot be dropped.
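As a stopgap while the untrusted-copies failures cannot be switched off, the fsck output could be filtered so that only the checksum failures remain. This is a rough sketch: the function name bad_content_files is made up, and it assumes the text output looks like the samples quoted in this comment, which is not a stable interface.

```shell
# bad_content_files
# Reads `git annex fsck` text output on stdin and prints the files that were
# reported with "Bad file content". The numcopies/untrusted warnings are ignored.
# Assumption: the message format matches the sample output quoted above.
bad_content_files() {
    sed -n 's/^[[:space:]]*\(.*\): Bad file content.*$/\1/p'
}

# Usage:
#   git annex fsck --from=tapeB2 2>&1 | bad_content_files
```

An empty result for a tape would then mean no checksum mismatch was reported for it, even though fsck itself still exits with a failure because of the trust warnings.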
Somehow fsck does not work as I would expect -- am I misunderstanding the numcopies/mincopies arguments here? Is there really no way to force-trust a directory remote, which to me seems appropriate in this case? Is there another way to achieve what I have in mind with git-annex?
Thanks for this great piece of software. I also use the assistant in another day-to-day use case and it's simply great!
I've cloned a git repository through ssh from a server on which I don't have root privileges. The clone command is something like:
git clone ssh://johndoe@somedomain.com:23/home/johndoe/Downloads/gitannextest4/
I tried to enable the remote and I get the error: Remote gitannextest4 does not have git-annex installed; setting annex-ignore. I had no success following the steps here.
I believe there is an error in the last of the alternatives presented here:
git config remote.annoyingserver.annex-shell /home/me/bin/git-annex-shell (does not work)
git config remote.annoyingserver.git-annex-shell /home/me/bin/git-annex-shell (works!)
So, annex-shell should be replaced by git-annex-shell.
Hope it helps.