Recent comments posted to this site:

Could you git-annex-untrust the laptop repo, do a git-annex-sync, then git-annex-fsck to check that the files have enough trusted copies (as set in your numcopies setting)?
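
A minimal sketch of that sequence as a shell function. The function name `retire_check` and the repository name `laptop` are hypothetical; substitute whatever name, description, or UUID git-annex knows the laptop repository by.

```shell
# Sketch only: untrust a repository, sync, then fsck so that numcopies
# is checked without counting the untrusted repository.
retire_check() {
    repo="${1:?usage: retire_check REPO}"   # name, description, or UUID of the repo
    git annex untrust "$repo" &&
    git annex sync &&
    git annex fsck
}
```

It would be called as `retire_check laptop` from inside a clone that knows about the laptop repository.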
Comment by Ilya_Shlyakhter Fri Aug 19 21:42:48 2022

@Dan findref never supported listing all keys either.

Yours is the best argument I've seen so far for wanting find --all. But the fact that this command is about listing files, not keys, still makes that seem out of scope for it.

Using whereis would certainly do what you want. Another option would be to untrust the repository that you are going to be deleting, and then run fsck --all. That could, however, also report other problems besides files that are only present in that repository.

Finally, there's the bare metal option, which is also the fastest: find .git/annex/objects -type f

Comment by joey Fri Aug 19 21:42:48 2022

Thanks Joey and Ilya for the nigh simultaneous untrust-then-fsck suggestion. I was able to get everything squared away by using the whereis approach as a sort of poor man's dry run, then running the copy command I described, and then running whereis again to convince myself that nothing was left behind. I imagine I'm not the first one to be retiring a repo, so hopefully these comments will be of use to future users.

While my particular problem is solved, I just wanted to add some additional input RE adding --all to find. I appreciate that Joey thinks that find is about "listing files, not keys" (and if anyone's opinion here is authoritative, it should be his), but this was not my expectation as a user (although I would agree if it were called findfiles or something like that), so I just wanted to share my experience trying to accomplish this task.

Given what I understand of git-annex, my instinct was to reach for a command that would let me use the powerful matching options against all known keys, so I looked over the list of commands to try to identify something that would do this. Right away, find leapt out as the natural candidate, but I couldn't get it to work how I wanted, so the next obvious choice was list, but that also didn't work. It was only when I looked at this wiki page for find and saw discussion of adding support for --all that I started searching for commands that did accept --all, and I stumbled upon whereis, but this required a fair deal of detective work on my part.

FWIW, whereis is, IMHO, just as much about listing files at particular paths as find is (the documentation for both describes the argument as [path ...]; it only typically talks about keys when --all is passed), and so whereis taking --all when find does not seems unbalanced, given that whereis seems like a tool that would be built on top of find. I think there's a similar asymmetry with list, since it's described as being "similar to git annex whereis but a more compact display."

Now that the --all genie is somewhat out of the bottle it might be too late for this, but I wonder if a findkeys command would help fill this need while obviating the need for --all being passed to most other commands. It would be unequivocally about finding keys and not files, and its output could be, say, a list of keys delimited by newlines (or perhaps optionally nulls, to make it play nice with commands that accept -z). If the user wanted to know more about the keys that matched their query, the output of this command could then be piped to whereis, examinekey, and other commands that support the --batch and/or -z option. Of course, instead of defining a new command, this functionality could be absorbed into find --all.

I realize that I can accomplish precisely what I describe above with e.g., git annex whereis --all --format='${key}\n', which is great now that I know it's possible under whereis, but as a new user I would expect to find this functionality in find (which helpfully already supports --format=) before I thought to check whereis.

Comment by Dan Fri Aug 19 21:42:48 2022

The reason I think that git-annex find is limited to operating on files is that it is analogous to the find command. It would violate least surprise to some extent for it to operate on keys. git-annex whereis has no such expectation.

Comment by joey Fri Aug 19 21:42:48 2022

Ah, I hadn't considered the parallel to the standard find command, but now that you mention that I understand where you're coming from and can appreciate why whereis is free of this association. Still, I would think that a user who, after looking at the docs for git annex find, specified --all because they wanted to operate on keys would not be surprised.

I notice that the man page for git annex find already has a "SEE ALSO" reference to git annex whereis. Could this be expanded so that it more clearly and prominently advises the reader who is looking to query against all known keys to check out the --all argument to git annex whereis as well as its --format= option if "whereis" information is not actually of interest?

Comment by Dan Fri Aug 19 21:42:48 2022

How about a git annex find --keys option? That way it's crystal clear you're searching among keys rather than files.
Comment by Atemu Fri Aug 19 21:42:48 2022

Turns out that if you get into the habit of writing rsync remotes like rclone remotes, git annex breaks.

I had an SSH config entry with the name annex and set the rsync URL to annex:. This works for moving content in, but fsck and get will fail and undo the content tracking.

Comment by aurelia Fri Aug 19 21:42:48 2022

Is it possible to edit comments on the branchable wiki? I realized there was a sentence I failed to finish when posting this comment and I'd love to go back and finish the thought. The "Edit" button at the top of the page lets me edit the content of the page, but not any of the comments.

I tried cloning the wiki, editing the file corresponding to my comment, and then pushing, but the push was rejected (the changes were in the doc tree, so I expected it to be accepted, but perhaps comments are more locked down).

Update: It seems I am able to edit comments by cloning the wiki, editing (comments are located in a folder with the same name as the associated page), committing, and then pushing. Yay!

Comment by Dan Fri Aug 19 21:42:48 2022

I'm preparing to recycle an aging laptop that has a few git-annex repos on it. I'd like to confirm that anything in its annex(es) exists in at least one other place, and want to confirm that what I'm doing to check this makes sense.

At first glance, it seems like the appropriate way to do this is with git annex find --in here --not --copies=2 (where the latter predicate should be equivalent to testing for copies strictly less than 2). Since I have recently git annex sync-ed, this doesn't turn up anything.

However, if I understand everything correctly, this only checks files that are reachable from my current working tree. Thus, if there are a bunch of files in my (not currently checked out) dev branch that are not in my worktree, then this query will not discover them. I can get them to be considered with git annex find --in here --not --copies=2 --branch dev. I could manually (or via script) loop over all of my branches and repeat git annex find --in here --not --copies=2 --branch ${branch} to check all of my branches. However, this will only check the tips. Suppose there's a file that previously existed (solely) in my master branch, but at some point it was git rm-ed. Then unless I specify using --branch a TREEISH that has that file, it will not be considered.
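
The per-branch loop described above could be sketched as follows. The function name `find_only_here_all_branches` is hypothetical, and the loop simply visits every local branch that `git for-each-ref` reports under refs/heads. As noted, it still only inspects branch tips.

```shell
# Sketch only: for each local branch, list annexed files that are present
# here and have fewer than 2 copies.
find_only_here_all_branches() {
    git for-each-ref --format='%(refname:short)' refs/heads |
    while read -r branch; do
        # </dev/null keeps the inner command from consuming the branch list
        git annex find --in here --not --copies=2 --branch "$branch" </dev/null
    done
}
```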

So, I need to use some sort of query tool that supports both the --all flag as well as all of the matching options. The only thing I was able to find was whereis, so I can run git annex whereis --all --in here --not --copies=2 in order to identify keys corresponding to files that are (a) locally available but where (b) the number of copies is not 2 or greater (i.e., it is here and only here). I suppose I could also just plunge ahead with git annex copy --to ${remote} --all --in here --not --copies=2, but it's reassuring to be able to run the query and see what would need to get moved (as well as to see the query come back empty before I wipe the hard drive).
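
A sketch of that query-then-copy workflow, wrapping the two commands above in shell functions. The function names and the remote name `backup` used below are hypothetical.

```shell
# Sketch only: list every known key that is present here and has fewer
# than 2 copies (the "dry run" query).
list_only_here() {
    git annex whereis --all --in here --not --copies=2
}

# Sketch only: copy those same keys to the given remote.
copy_only_here() {
    remote="${1:?usage: copy_only_here REMOTE}"
    git annex copy --to "$remote" --all --in here --not --copies=2
}
```

Running `list_only_here`, then `copy_only_here backup`, then `list_only_here` again gives the dry run, the copy, and the final check; the second listing is expected to come back empty if nothing is held only here.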

Is this an appropriate use of git annex whereis, or is there a way that I can use git annex find to accomplish this, or perhaps some other query tool? In essence, I just want a way of querying "all" of the objects that git annex has ever known about using all of the standard matching options. I see discussion above regarding the lack of --all support for git annex find, which at the time suggested using findref instead, but it seems that findref has since been deprecated in favor of find.

Comment by Dan Fri Aug 19 21:42:48 2022

If you add a file to your repo first via addurl --fast, it writes the filename as a symlink to a file that incorporates the URL, rather than the file hash. This is expected, since git-annex can't know the file hash until it's actually downloaded the file.

If you then git annex get that file, it downloads the file to the path that uses the URL. Is the hash ever recorded for these files? If you were to drop and re-download the file, would git-annex accept a different file?

Comment by tomdhunt Wed Aug 17 20:00:48 2022