Recent comments posted to this site:
@Dan findref never supported listing all keys either.
Yours is the best argument I've seen so far for wanting find --all. But the fact that this command is about listing files, not keys, still makes that seem out of scope for it.
Using whereis would certainly do what you want. Another option would be to untrust the repository that you are going to be deleting, and then run fsck --all, although that would potentially report other problems besides files that are only present in that repository.
Finally, there's the bare metal option, which is also the fastest:
find .git/annex/objects -type f
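As a rough illustration of the bare metal option, the sketch below turns the object listing into a list of keys. It assumes a non-bare repository and the usual layout in which each object file is named after its key; list_local_keys is a hypothetical helper name, not a git-annex command.

```shell
# Sketch only: list keys whose content is present locally by walking the
# object store. Assumes each object file's basename is its key.
list_local_keys() {
    # Optional first argument overrides the object directory.
    objects_dir="${1:-.git/annex/objects}"
    find "$objects_dir" -type f -exec basename {} \; | sort -u
}
```

Run from the top of the repository as list_local_keys. This only shows what is present locally; it says nothing about how many other copies exist.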
Thanks Joey and Ilya for the nigh simultaneous untrust then fsck suggestion.
I was able to get everything squared away using the whereis approach as a sort of poor man's dry run, then running the copy command I described, and then running my whereis query again to convince myself that nothing was left behind. I imagine I'm not the first one to be retiring a repo, so hopefully these comments will be of use to future users.
While my particular problem is solved, I just wanted to add some additional input RE adding --all to find.
I appreciate that Joey thinks that find is about "listing files, not keys" (and if anyone's opinion here is authoritative, it should be his), but this was not my expectation as a user (although I would agree if it were called findfiles or something like that), so I just wanted to share my experience trying to accomplish this task.
Given what I understand of git-annex, my instinct was to reach for a command that would let me use the powerful matching options against all known keys, so I looked over the list of commands to try to identify something that would do this.
Right away, find leapt out as the natural candidate, but I couldn't get it to work how I wanted, so the next obvious choice was list, but that also didn't work.
It was only when I looked at this wiki page for find and saw discussion of adding support for --all that I started searching for commands that did accept --all, and I stumbled upon whereis, but this required a fair deal of detective work on my part.
FWIW, whereis is, IMHO, just as much about listing files at particular paths as find is (the documentation for both describes the argument as [path ...]; it only typically talks about keys when --all is passed), and so whereis taking --all when find does not seems unbalanced given that whereis seems like a tool that would be built on top of find.
I think there's a similar asymmetry with list since it's described as being "similar to git annex whereis but a more compact display."
Now that the --all genie is somewhat out of the bottle it might be too late for this, but I wonder if a findkeys command would help fill this need while obviating the need for --all being passed to most other commands.
It would be unequivocally about finding keys and not files, and its output could be, say, a list of keys delimited by newlines (or perhaps optionally nulls to make it play nice with commands that accept -z).
If the user wanted to know more about the keys that matched their query, the output of this command could then be piped to whereis, examinekey, and other commands that support the --batch and/or -z option.
Of course, instead of defining a new command, this functionality could be absorbed into find --all.
I realize that I can accomplish precisely what I describe above with e.g., git annex whereis --all --format='${key}\n', which is great now that I know it's possible under whereis, but as a new user I would expect to find this functionality in find (which helpfully already supports --format=) before I thought to check whereis.
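For what it's worth, the pipeline described above might look roughly like the sketch below. The whereis invocation is the one quoted in the comment; the function names are hypothetical, and the sketch assumes that examinekey accepts --batch together with a --format string using ${key}, ${backend}, and ${bytesize}, which is worth checking against the man page for your version.

```shell
# Sketch only: print every known key that matches the given matching options.
list_matching_keys() {
    git annex whereis --all --format='${key}\n' "$@"
}

# Sketch only: feed the matching keys, one per line, to examinekey.
# The format variables used here are assumptions, not confirmed by the thread.
describe_matching_keys() {
    list_matching_keys "$@" |
        git annex examinekey --batch --format='${key} ${backend} ${bytesize}\n'
}
```

For example, describe_matching_keys --in here --not --copies=2 would describe the keys that are present only in the current repository.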
The reason I think that git-annex find is limited to operating on files is because it is analogous to the find command. It would violate least surprise to some extent for it to operate on keys. git-annex whereis has no such expectation.
Ah, I hadn't considered the parallel to the standard find command, but now that you mention that I understand where you're coming from and can appreciate why whereis is free of this association.
Still, I would think that a user who, after looking at the docs for git annex find, specified --all because they wanted to operate on keys would not be surprised.
I notice that the man page for git annex find already has a "SEE ALSO" reference to git annex whereis.
Could this be expanded so that it more clearly and prominently advises the reader who is looking to query against all known keys to check out the --all argument to git annex whereis as well as its --format= option if "whereis" information is not actually of interest?
git annex find --keys option? That way it's crystal clear you're searching among keys rather than files.
Turns out that if you get into the habit of writing rsync remotes like rclone remotes, git annex breaks.
I had an SSH config entry with the name annex and set the rsync URL to "annex:". This works for moving content in, but fsck and get will fail and undo the content tracking.
Is it possible to edit comments on the branchable wiki? I realized there was a sentence I failed to finish when posting this comment and I'd love to go back and finish the thought. The "Edit" button at the top of the page lets me edit the content of the page, but not any of the comments.
I tried cloning the wiki, editing the file corresponding to my comment, and then pushing, but the push was rejected (the changes were in the doc tree so I expected it to be accepted, but perhaps comments are more locked down).
Update: It seems I am able to edit comments by cloning the wiki, editing (comments are located in a folder with the same name as the associated page), committing, and then pushing. Yay!
I'm preparing to recycle an aging laptop that has a few git-annex repos on it. I'd like to confirm that anything in its annex(es) exists in at least one other place, and I want to check that my approach to verifying this makes sense.
At first glance, it seems like the appropriate way to do this is with git annex find --in here --not --copies=2 (where the latter predicate should be equivalent to testing for copies strictly less than 2).
Since I have recently git annex sync-ed, this doesn't turn up anything.
However, if I understand everything correctly, this only checks files that are reachable from my current working tree.
Thus, if there are a bunch of files in my (not currently checked out) dev branch that are not in my worktree, then this query will not discover them.
I can get them to be considered with git annex find --in here --not --copies=2 --branch dev.
I could manually (or via script) loop over all of my branches and repeat git annex find --in here --not --copies=2 --branch ${branch} to check all of my branches.
However, this will only check the tips.
Suppose there's a file that previously existed (solely) in my master branch, but at some point it was git rm-ed. Then unless I use --branch to specify a TREEISH that has that file, it will not be considered.
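The per-branch loop mentioned above could be sketched as follows. It assumes that only local branches under refs/heads matter; check_all_branch_tips is a hypothetical name. As already noted, this still only inspects the tip of each branch.

```shell
# Sketch only: repeat the "present here, fewer than 2 copies" query for the
# tip of every local branch.
check_all_branch_tips() {
    git for-each-ref --format='%(refname:short)' refs/heads |
    while read -r branch; do
        echo "== $branch"
        # Redirect stdin so the inner command cannot consume the branch list.
        git annex find --in here --not --copies=2 --branch "$branch" </dev/null
    done
}
```

An empty section under a branch name means the query found nothing for that branch tip.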
So, I need to use some sort of query tool that supports both the --all flag and all of the matching options.
The only thing I was able to find was whereis, so I can run git annex whereis --all --in here --not --copies=2 in order to identify keys corresponding to files that are (a) locally available but where (b) the number of copies is not 2 or greater (i.e., it is here and only here).
I suppose I could also just plunge ahead with git annex copy --to ${remote} --all --in here --not --copies=2, but it's reassuring to be able to run the query and see what would need to get moved (as well as to see the query come back empty before I wipe the hard drive).
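Put together, the query, copy, and re-query sequence might be wrapped up roughly like this. The git annex invocations are the ones quoted above; the function names are hypothetical, and the sketch assumes that an empty whereis result means nothing is left only here.

```shell
# Sketch only: show keys that are present here and have fewer than 2 copies.
list_only_here() {
    git annex whereis --all --in here --not --copies=2
}

# Sketch only: copy those keys to the remote named in the first argument.
copy_only_here() {
    git annex copy --to "$1" --all --in here --not --copies=2
}

# Sketch only: succeed if the query comes back empty, fail otherwise.
verify_nothing_only_here() {
    remaining=$(list_only_here)
    if [ -n "$remaining" ]; then
        echo "keys still present only here:" >&2
        echo "$remaining" >&2
        return 1
    fi
}
```

A run would be list_only_here to preview, then copy_only_here myremote (myremote is a placeholder remote name), then verify_nothing_only_here before wiping anything.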
Is this an appropriate use of git annex whereis, or is there a way that I can use git annex find to accomplish this, or perhaps some other query tool?
In essence, I just want a way of querying "all" of the objects that git annex has ever known about using all of the standard matching options.
I see discussion above regarding the lack of --all support for git annex find, which at the time suggested using findref instead, but it seems like that has been deprecated in favor of find.
If you add a file to your repo first via addurl --fast, it writes the filename as a symlink to a file that incorporates the URL, rather than the file hash. This is expected, since git-annex can't know the file hash until it's actually downloaded the file.
If you then git annex get that file, it downloads the file to the path that uses the URL. Is the hash ever recorded for these files? If you were to drop and re-download the file, would git-annex accept a different file?
git-annex-untrust the laptop repo, do a git-annex-sync, then git-annex-fsck to check that the files have enough trusted copies (as set in your numcopies setting)?
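A minimal sketch of that sequence is below. It assumes the commands are run from inside the laptop repository (the comments do not say where to run them) and uses fsck --all as suggested earlier in the thread; check_trusted_copies is a hypothetical name.

```shell
# Sketch only: untrust a repository, sync, then let fsck report keys that
# lack enough trusted copies according to the numcopies setting.
check_trusted_copies() {
    # "here" refers to the current repository; pass another name if needed.
    repo="${1:-here}"
    git annex untrust "$repo"
    git annex sync
    git annex fsck --all
}
```

As noted earlier in the thread, fsck may also report problems unrelated to files that exist only in the untrusted repository.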