Recent comments posted to this site:

importtree=yes remotes are always untrusted. The reason is that something else is assumed to be writing to those remotes, which is what populates them with files. And that could delete or change any file at any time. So if git-annex didn't untrust the remote, and relied on it to hold the only copy of a file, such a change would cause data loss.

There would need to be a new config setting to add the concept of guaranteed readonly importtree=yes remotes.

git-annex does not allow --numcopies to be set to 0 as that can cause data loss.

Comment by joey Wed Oct 23 14:50:20 2024

This is plausible. git-annex requires that special remotes only show a file as present after a successful upload. If the data store doesn't work that way, the file needs to be uploaded to a temporary name and renamed atomically instead. If that's not possible, the data store is not safe for use by git-annex.
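The "upload to a temporary name, then rename atomically" rule can be illustrated for the simplest possible store, a local directory. This is a hypothetical helper, not code from git-annex or rclone. It assumes a POSIX filesystem, where a rename within one filesystem is atomic, so the file is never visible under its final name while only partially written.

```shell
# Hypothetical helper (not part of git-annex or rclone): copy src to dest
# so that dest only ever appears with complete content.
store_atomically() {
    local src="$1" dest="$2" tmp
    # Put the temporary file in the destination directory, so the final mv
    # is a rename on the same filesystem rather than a copy.
    tmp="$(mktemp "$(dirname "$dest")/.tmp.XXXXXX")" || return 1
    # On a failed copy, remove the temporary file; dest is never touched.
    if ! cp "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Only now does the content become visible under its final name.
    # (mktemp creates the file with mode 0600; chmod it if others need access.)
    mv -f "$tmp" "$dest"
}
```

For example, `store_atomically ./big.iso /mnt/store/big.iso`. A remote backed by a store without atomic rename would need some other way to guarantee the same property.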

Given all the different types of data stores supported by rclone, this may be difficult, but it's the right thing for the external special remote to do. I think you should file a bug.

(Does rclone gitannex also have this problem?)

Comment by joey Wed Oct 23 14:50:20 2024

Tuning has not been experimental for some time; I've removed the warnings.

Comment by joey Wed Oct 23 14:50:20 2024

Recently I tried the export command with the --from parameter, and it was not accepted.

git-annex version: 10.20240701

Comment by pedro-lopes-de-azevedo Sun Oct 13 02:59:02 2024

To access the manifest and bundles, one needs the UUID of the special remote initially configured. Then one can run

git clone 'annex::?type=directory&encryption=none&directory=/path/to/space%20sanitized%20directory'

This is a bit tedious, both because all the settings have to be typed out (even those the remote helper doesn't show during the push operations from the initial repo: in this case the directory, in other cases all the settings required to init the remote in the first place), and because any characters not allowed in URLs have to be percent-encoded. But it's doable.
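Some of the typing and escaping can be scripted. The sketch below percent-encodes the directory path and assembles a clone URL for a directory special remote. `percent_encode` and `annex_directory_url` are hypothetical helpers, and the URL layout is an assumption: it follows the example above, plus my understanding of the git-remote-annex documentation that the special remote's UUID goes directly after `annex::`. Check that documentation before relying on it.

```shell
# Hypothetical helper: percent-encode a string byte by byte, keeping
# RFC 3986 unreserved characters and "/" as they are.
percent_encode() {
    local LC_ALL=C          # iterate over bytes, not multibyte characters
    local s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [A-Za-z0-9/._~-]) out+="$c" ;;
            *) out+="$(printf '%%%02X' "'$c")" ;;   # e.g. space -> %20
        esac
    done
    printf '%s\n' "$out"
}

# Hypothetical helper: build the annex:: URL for a directory special remote.
# Assumption: the UUID is placed right after "annex::".
annex_directory_url() {
    local uuid="$1" dir="$2"
    printf 'annex::%s?type=directory&encryption=none&directory=%s\n' \
        "$uuid" "$(percent_encode "$dir")"
}
```

Usage: `git clone "$(annex_directory_url "$uuid" '/path/to/space sanitized directory')"`, where `$uuid` is the UUID of the special remote.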

The other option would be to clone manually: initialize a new empty repo, then add the special remote the normal git-annex way. This doesn't quite work yet, because --uuid is not an allowed option for initremote. It would be nice if it were, simply to avoid the tedium of typing the URL as above (one could copy and paste the output of git --no-pager show git-annex:remote.log into initremote).

Despite the URL tedium, an exciting result of the current system is that any number of repos and file annexes can share one directory, for example an entire organization (or group of repos) in one folder. DataLad has a similar archetype, remote indexed archives (RIA), which offer slightly better user friendliness by filing each repo UUID into meaningfully named folders (the unhashed first-three/remaining split is nice because it really is the UUID, but it still doesn't let me easily copy and paste the UUID for cloning). Still, I rather like how git-annex's implementation encourages a single unified "annex" (rather than RIA's UUID/annex layout, which gives each UUID a separate annex), and of course bundles instead of loose git files, especially for cloud special remotes, where uploading each and every loose file can be slow.

Looking forward to seeing how this feature develops!

Comment by Spencer Sun Oct 13 02:59:02 2024

I have old readonly backup media, say something like

  • tapeA1/apples.txt
  • tapeA2/apples.txt
  • tapeB1/earth.svg
  • tapeB2/earth.svg

I use git-annex directory special remotes to navigate the directory tree that lives on those media (e.g. to decide whether I need to dig out a medium to copy a file I need, and if so, which one). I added the remotes like so (they are too big to import with content):

git annex initremote tapeA1 type=directory directory=/tapes/tapeA1 encryption=none importtree=yes
git annex import master:tapeA1 --from tapeA1 --no-content
git annex merge --allow-unrelated-histories tapeA1/master
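With several tapes, the three setup commands can be wrapped in a loop. This is a sketch only: `setup_tape_remote` and `run_or_print` are hypothetical names, the git-annex commands and options are the ones used above, and it assumes (as I understand the git-annex-import documentation) that importing into `<branch>:<subdir>` updates the remote-tracking branch `<remote>/<branch>`. Setting `DRY_RUN=1` only prints the commands.

```shell
# Run a command, or just print it when DRY_RUN=1.
run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: set up one read-only tape as an importtree remote
# and merge its file listing into the given branch (default: master).
setup_tape_remote() {
    local name="$1" branch="${2:-master}"
    run_or_print git annex initremote "$name" type=directory \
        directory="/tapes/$name" encryption=none importtree=yes
    # Import only the file listing (no content) into a subdirectory
    # named after the tape.
    run_or_print git annex import "$branch:$name" --from "$name" --no-content
    # Merge the remote-tracking branch updated by the import.
    run_or_print git annex merge --allow-unrelated-histories "$name/$branch"
}
```

For example: `for t in tapeA1 tapeA2 tapeB1 tapeB2; do setup_tape_remote "$t" || break; done`. Running it once with `DRY_RUN=1` first is a cheap way to check the generated commands.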

At some point I may buy new hardware and recreate those backup media as proper git-annex remotes, but wouldn't it be great to keep the existing backups as long as they show no sign of bitrot and together hold enough copies?

However, git-annex fsck behaves unexpectedly: it seems I cannot force-trust these remotes, and --numcopies=0 --mincopies=0 does not have the desired effect either.

Concretely, when calling git annex fsck --from=tapeA1 --numcopies=0 --mincopies=0 --trust=tapeA1 --force, for every file that is still intact on tapeA1, git-annex fsck reports a failure as follows

fsck tapeA1/apples.txt
  Only these untrusted locations may have copies of tapeA1/apples.txt
        abc-def-ghi -- [tapeA1]
  Back it up to trusted locations with git-annex copy.
failed

whereas I'd be happy to (semi)trust tapeA1, or to accept no copies whatsoever. So does fsck ignore --trust=tapeA1 --force and/or --numcopies=0 --mincopies=0, which are common git-annex options that should work for fsck?

Ideally, I would be able to (semi)trust my read-only tape remotes (which should probably require --force, since it may lead to data loss with ordinary directory remotes). Then I could use git-annex both to index those tapes and to monitor their health via fsck (so that over the years I can replace the tapes that show signs of corruption).

As for corruption, I emulated bitrot on a test directory remote, which led to an fsck failure like this:

fsck tapeB2/earth.svg
  verification of content failed
(checksum...) 
  tapeB2/earth.svg: Bad file content; failed to drop from tapeB2: dropping content from this remote is not supported because it is configured with importtree=yes

This suffices to detect tapes that should be replaced, and it's kinda expected that files cannot be dropped.
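Until trust can be forced, one workaround is to filter fsck's output so that real content failures (bitrot) stand out from the untrusted-copies warning that every file triggers. This is a sketch, not a git-annex feature: `bad_content_files` is a hypothetical name, and it assumes the message texts shown above ("fsck <file>" and "verification of content failed"), which may change between git-annex versions.

```shell
# Hypothetical filter: read fsck output on stdin and print each file whose
# content failed verification, ignoring other kinds of fsck failure.
bad_content_files() {
    awk '
        # Remember which file the current fsck report is about; drop a
        # trailing "(checksum...)" if it is printed on the same line.
        /^fsck / { file = substr($0, 6); sub(/ \([^)]*\) *$/, "", file) }
        # Report each file whose content failed verification, once.
        /verification of content failed/ && !seen[file]++ { print file }
    '
}
```

Usage, capturing both output streams in case some messages go to stderr: `git annex fsck --from=tapeB2 2>&1 | bad_content_files`. A non-empty result would mark a tape as a candidate for replacement.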

Somehow fsck does not work as I would expect -- am I misunderstanding the numcopies/mincopies arguments here? Is there really no way to force-trust a directory remote, which to me seems appropriate in this case? Is there another way to achieve what I have in mind with git-annex?

Thanks for this great piece of software! I also use the assistant for another day-to-day use case, and it's simply great!

Comment by tapesafer Sun Oct 13 02:59:02 2024

I've cloned a git repository over ssh from a server on which I don't have root privileges. The clone command is something like:

git clone ssh://johndoe@somedomain.com:23/home/johndoe/Downloads/gitannextest4/ 

When I tried to enable the remote, I got the error: Remote gitannextest4 does not have git-annex installed; setting annex-ignore. I had no success following the steps here.

I believe there is an error in the last of the alternatives presented here:

git config remote.annoyingserver.annex-shell /home/me/bin/git-annex-shell (does not work)
git config remote.annoyingserver.git-annex-shell /home/me/bin/git-annex-shell (works!)

So, annex-shell should be replaced by git-annex-shell.

Hope it helps.

Comment by gauss Sun Oct 13 02:59:02 2024

Is there any way to set a default preferred content setting -- either used when a new clone is made or whenever a repo doesn't specify one?

I've got an annex that has a couple of servers with all the content, and several clients[1] (which I create more often and more manually) that just want the content I pick. Basically every time I set up another client, I run git annex sync --content, am surprised to see a bunch of get ... lines, kill the sync, set the group and preferred content to manual/standard, and run the sync again. It'd be handy if I could set up the repo in advance to configure that by default. (I guess I could make an alias that does something like git clone $server/$repo && cd $repo && git annex wanted . standard && git annex group . manual, but it'd be nice if I could just do the git clone I'm used to and have it all work.)
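The alias suggested above could be written as a small shell function that takes the URL and target directory as arguments. `clone_manual` is a hypothetical name; the git and git-annex commands are exactly the ones from the comment. Like the alias, it leaves the calling shell inside the new clone.

```shell
# Hypothetical wrapper: clone, then configure this repository as a
# "manual" client before any content is transferred.
clone_manual() {
    local url="$1" dir="$2"
    git clone "$url" "$dir" &&
        cd "$dir" &&
        git annex wanted . standard &&
        git annex group . manual
}
```

Usage: `clone_manual "$server/$repo" "$repo"`, then `git annex sync --content` only gets what the manual group's standard preferred content asks for. It doesn't change what a plain `git clone` does, which is the actual feature being requested.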

[1] AIUI, the "client" group means "get every file referenced in HEAD, unless it's in archive/, and skip older versions"? I guess that makes sense for like a software project with some media assets. I've mostly used git-annex for situations where most files aren't being actively worked with and clients only have a few of them, which is where it seems to really shine over GitLFS. I've always been vaguely surprised by how the client group works as a result. Any sense of how commonly people use it for different use cases? It is excellent for the sparse checkout case though.

Comment by adehnert Sun Oct 13 02:59:02 2024

@adehnert: Setting default preferred content expressions is an open todo; Joey acknowledges that it's useful, but hasn't implemented it yet. You could explain your motivation for this feature in that todo, to keep everything in one place. I fully agree that this is very much needed. I always fall into the trap of running git annex assist directly after a git clone, wondering why I'm getting a million files shoved into my face, hitting CTRL+C, and being left with a weird unclean work tree because of the aborted downloads of unlocked files, so I have to git restore . again and then configure git annex wanted . present before I continue.

Comment by nobodyinperson Sun Oct 13 02:59:02 2024

PS: If I understand the documentation of the borg special remote correctly, then something like appendonly=yes for the directory special remote would likely help in my scenario.

Comment by tapesafer Sun Oct 13 02:59:02 2024