Recent comments posted to this site:


Perhaps Joey can help me out here a bit with some background knowledge:

I've been seeing sporadic corruption with this setup:

  • chunking
  • encryption
  • old helper program git-annex-remote-rclone
  • rclone's pcloud backend

It seems that, for the pcloud backend, rclone keeps partial files under the name of the full file when a transfer is interrupted. (This applies to rclone <= 1.67.0; 1.68.0 has changes for pcloud, which may fix this.) My theory of how the corruption might have happened:

  • First interrupted run of git-annex uploads chunks A and a partial(!) chunk B
  • Second run skips chunks A and B(!) and proceeds to upload the rest of the chunks (C and D)
  • At the end we have uploaded A, C and D and a corrupted/partial chunk B

Joey: Is this a possible error scenario?
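
The suspected sequence can be reproduced in miniature with plain files. The sketch below is only a toy model: it assumes, as the theory does, that the second run decides what to skip purely by checking whether a chunk of that name already exists. It does not call git-annex or rclone, and upload_if_missing is a made-up helper, not anything from either tool.

```shell
# Toy simulation of the suspected failure mode (no git-annex, no rclone).
work=$(mktemp -d)
src="$work/src"
remote="$work/remote"
mkdir -p "$src" "$remote"

# Four 1 KiB "chunks".
for c in A B C D; do
    head -c 1024 /dev/zero > "$src/$c"
done

# Hypothetical uploader: skips a chunk if a file with that name exists,
# without comparing sizes or checksums.
upload_if_missing() {
    if [ -e "$remote/$1" ]; then
        echo "skip $1"
    else
        cp "$src/$1" "$remote/$1"
        echo "upload $1"
    fi
}

# First run: A completes; B is interrupted, but the partial data is left
# behind under the final name.
upload_if_missing A
head -c 300 "$src/B" > "$remote/B"

# Second run: A and B are skipped, C and D are uploaded.
for c in A B C D; do
    upload_if_missing "$c"
done

size_b=$(wc -c < "$remote/B" | tr -d ' ')
echo "remote B is $size_b bytes, source B is 1024 bytes"
rm -rf "$work"
```

If the real tools behave like this model, a size or checksum comparison of the stored chunks would be enough to spot the damaged one.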

Comment by mike Fri Sep 27 12:18:41 2024
@adehnert: Setting default preferred content expressions is an open todo. Joey acknowledges that it's useful but hasn't implemented it yet. You could voice your motivation for this feature over in that todo, to keep everything in one place. I'm fully with you that this is very much needed: I always fall into the trap of running git annex assist directly after a git clone, wondering why I'm getting a million files shoved into my face, CTRL+C'ing it, and being left with a weird unclean work tree for the unlocked files whose download was aborted, so I have to git restore . again and then configure git annex wanted . present before I continue.
Comment by nobodyinperson Wed Sep 25 09:25:42 2024

Is there any way to set a default preferred content setting -- either used when a new clone is made or whenever a repo doesn't specify one?

I've got an annex that has a couple servers with all the content, and several clients[1] -- which I create more often and more manually -- that just want the content I pick. Basically every time I set up another client, I run git annex sync --content, am surprised to see a bunch of get ... lines, go kill the sync, set group and preferred content to be manual/standard, and run the sync again. It'd be handy if I could set up the repo in advance to just configure that by default. (I guess I could make an alias that does like git clone $server/$repo && cd $repo && git annex wanted . standard && git annex group . manual, but it'd be nice if I could just do the git clone I'm used to and it would all work.)
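
The alias idea could be sketched as a small shell function. This is a minimal sketch, not an existing git-annex feature: annex_clone_manual and run are hypothetical names, and DRY_RUN is an invented switch that prints the commands instead of running them.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: clone, then set preferred content and group in the
# new clone before any content sync is attempted.
# $1 = clone URL, $2 = target directory (defaults to the URL's basename).
annex_clone_manual() {
    url=$1
    dir=${2:-$(basename "$url" .git)}
    run git clone "$url" "$dir" &&
    run git -C "$dir" annex wanted . standard &&
    run git -C "$dir" annex group . manual
}
```

Running DRY_RUN=1 annex_clone_manual ssh://server/repo.git shows the three commands without touching anything; dropping DRY_RUN runs them.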

[1] AIUI, the "client" group means "get every file referenced in HEAD, unless it's in archive/, and skip older versions"? I guess that makes sense for like a software project with some media assets. I've mostly used git-annex for situations where most files aren't being actively worked with and clients only have a few of them, which is where it seems to really shine over GitLFS. I've always been vaguely surprised by how the client group works as a result. Any sense of how commonly people use it for different use cases? It is excellent for the sparse checkout case though.

Comment by adehnert Tue Sep 24 00:02:20 2024

Here are a few pointers for switching from git-annex-remote-rclone (old helper program) to rclone gitannex (rclone's builtin support):

  1. Figure out rcloneprefix (the directory relative to the rclone remote, "remote" in rclone's sense) and rclonelayout (the layout of the git-annex content within it). If you set things up just like in git-annex-remote-rclone's README, those are git-annex and lower.
  2. Update rclone and git-annex
  3. Rename the old remote: git remote rename my_rclone_remote my_rclone_remote.old; git annex renameremote my_rclone_remote my_rclone_remote.old
  4. Create a new remote, copying the encryption settings: git annex initremote my_rclone_remote --sameas=my_rclone_remote.old type=rclone rcloneremotename=my_rclone_remote rcloneprefix=git-annex rclonelayout=lower

It might be possible to just change the type of the remote, but at the time of writing that didn't work, so I renamed the old remote and created a new one, using --sameas so as not to lose any encryption settings.
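
To avoid typos when filling in the names, steps 3 and 4 can be generated for review first. The function below is a hedged sketch: rclone_migration_plan is a hypothetical helper, and its defaults (git-annex, lower, and an rclone remote named like the git remote) are just the values used in the steps above. It only prints the commands; it does not run them.

```shell
# Hypothetical helper: print the migration commands for review.
# $1 = git remote name
# $2 = rclone remote name as in rclone's config (defaults to $1)
# $3 = rcloneprefix (defaults to git-annex)
# $4 = rclonelayout (defaults to lower)
rclone_migration_plan() {
    name=$1
    rclone_name=${2:-$1}
    prefix=${3:-git-annex}
    layout=${4:-lower}
    cat <<EOF
git remote rename $name $name.old
git annex renameremote $name $name.old
git annex initremote $name --sameas=$name.old type=rclone rcloneremotename=$rclone_name rcloneprefix=$prefix rclonelayout=$layout
EOF
}
```

For example, rclone_migration_plan my_rclone_remote prints the same three commands as in steps 3 and 4; once they look right, they can be pasted into a shell inside the repository.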

Comment by mike Thu Sep 12 15:40:24 2024
PS
If I am understanding the documentation of the borg special remote, then having something like appendonly=yes for the special directory remote would likely help in my scenario.
Comment by tapesafer Wed Sep 4 15:48:01 2024

I have old readonly backup media, say something like

  • tapeA1/apples.txt
  • tapeA2/apples.txt
  • tapeB1/earth.svg
  • tapeB2/earth.svg

I use git-annex special directory remotes to be able to navigate the directory tree that lives on those media (e.g. to decide whether, and from which media, I need to copy a file). I added the remotes like so (they are too big to import with content):

git annex initremote tapeA1 type=directory directory=/tapes/tapeA1 encryption=none importtree=yes
git annex import master:tapeA1 --from tapeA1 --no-content
git annex merge --allow-unrelated-histories tapeA1/master

At some point I may buy new hardware and recreate those backup media as proper git-annex remotes, but wouldn't it be great to keep the existing backups as long as they show no sign of bitrot and together hold enough copies?

However, git-annex fsck behaves unexpectedly: it seems I can neither force-trust these remotes, nor does --numcopies=0 --mincopies=0 have the desired effect.

Concretely, when calling git annex fsck --from=tapeA1 --numcopies=0 --mincopies=0 --trust=tapeA1 --force, for every file that is still intact on tapeA1, git-annex fsck reports a failure as follows

fsck tapeA1/apples.txt
  Only these untrusted locations may have copies of tapeA1/apples.txt
        abc-def-ghi -- [tapeA1]
  Back it up to trusted locations with git-annex copy.
failed

while I'd be happy to (semi)trust tapeA1 or to accept no copies whatsoever. So fsck ignores --trust=tapeA1 --force and/or --numcopies=0 --mincopies=0 which are common git-annex options that should work for fsck?

Ideally, I would be able to (semi)trust my readonly tape remotes (which likely should be behind a --force as it may lead to data loss in classical directory remote settings). Then I can use git-annex to index those tapes, but also to monitor their health via fsck (so I can over the years replace the tapes that are showing signs of corruption).

As for the corruption, I emulated bitrot on a test directory remote, which then leads to a fsck failure as follows:

fsck tapeB2/earth.svg
  verification of content failed
(checksum...) 
  tapeB2/earth.svg: Bad file content; failed to drop fromtapeB2: dropping content from this remote is not supported because it is configured with importtree=yes

This suffices to detect tapes that should be replaced, and it's kinda expected that files cannot be dropped.
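
For periodic checks, the affected files can be pulled out of the fsck output with a small filter. This is a rough sketch that assumes the message format shown above ("<file>: Bad file content ...") stays the same; bad_files is a hypothetical helper, not a git-annex command.

```shell
# Read "git annex fsck" output on stdin and print each file that was
# reported with "Bad file content" (message text as in the output above).
bad_files() {
    sed -n 's/^ *\(.*\): Bad file content.*$/\1/p'
}
```

A possible use would be git annex fsck --from=tapeB2 2>&1 | bad_files, which lists one damaged file per line and prints nothing when no such message occurs.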

Somehow fsck does not work as I would expect -- am I misunderstanding the numcopies/mincopies arguments here? Is there really no way to force-trust a directory remote, which to me seems appropriate in this case? Is there another way to achieve what I have in mind with git-annex?

Thanks for this great piece of software; I also use the assistant for another day-to-day use case and it's simply great!

Comment by tapesafer Wed Sep 4 14:50:16 2024

I've cloned a git repository through ssh from a server on which I don't have root privileges. The clone command is something like:

git clone ssh://johndoe@somedomain.com:23/home/johndoe/Downloads/gitannextest4/ 

I tried to enable the remote and I get the error: Remote gitannextest4 does not have git-annex installed; setting annex-ignore. I had no success following the steps here.

I believe there is an error in the last of the alternatives presented here:

git config remote.annoyingserver.annex-shell /home/me/bin/git-annex-shell (does not work)
git config remote.annoyingserver.git-annex-shell /home/me/bin/git-annex-shell (works!)

So, annex-shell should be replaced by git-annex-shell.

Hope it helps.

Comment by gauss Fri Aug 23 01:51:49 2024