NAME
git-annex find - lists available files
SYNOPSIS
git annex find [path ...]
DESCRIPTION
Outputs a list of annexed files in the specified path. With no path, finds files in the current directory and its subdirectories.
OPTIONS
matching options
The git-annex-matching-options(1) can be used to specify files to list.
By default, the find command only lists annexed files whose content is currently present. Specifying any of the matching options will override this default behavior.
To list all annexed files, present or not, specify
--include "*"
To list annexed files whose content is not present, specify
--not --in=here
--branch=ref
List files in the specified branch or treeish.
--print0
Output filenames terminated with nulls, for use with
xargs -0
--format=value
Use custom output formatting.
The value is a format string, in which '${var}' is expanded to the value of a variable. To right-justify a variable with whitespace, use '${var;width}'; to left-justify a variable, use '${var;-width}'; to escape unusual characters in a variable, use '${escaped_var}'.
These variables are available for use in formats: file, key, backend, bytesize, humansize, keyname, hashdirlower, hashdirmixed, mtime (for the mtime field of a WORM key).
Also, '\n' is a newline, '\000' is a NULL, etc.
The default output format is the same as
--format='${file}\\n'
--json
Output the list of files in JSON format.
This is intended to be parsed by programs that use git-annex. Each line of output is a JSON object.
--json-error-messages
Messages that would normally be output to standard error are included in the json instead.
--batch
Enables batch mode, in which a filename is read from stdin, one per line, its information is displayed, and the process repeats.
Note that if the file is not an annexed file, or is not present, or otherwise doesn't meet the matching options, an empty line will be output instead.
-z
Makes the --batch input be delimited by nulls instead of the usual newlines.
Also, the git-annex-common-options(1) can be used.
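The format-string rules above can be modeled in a few lines for experimenting with width specifiers. This Python sketch is a rough approximation, not git-annex's own parser: it assumes unknown variables expand to an empty string, handles only the \n, \\ and three-digit octal escapes (so '\000' yields a NUL), and does not model the escaped_ prefix. The function name expand_format is hypothetical.

```python
import re

# '${name}' or '${name;width}'; a negative width means left-justify
_VAR = re.compile(r"\$\{(\w+)(?:;(-?\d+))?\}")
# Backslash escapes handled by this sketch only: \n, \\ and \NNN (octal)
_ESC = re.compile(r"\\(n|\\|[0-7]{3})")


def _unescape(match):
    token = match.group(1)
    if token == "n":
        return "\n"
    if token == "\\":
        return "\\"
    return chr(int(token, 8))  # e.g. \000 -> NUL


def _render_var(match, values):
    name, width = match.group(1), match.group(2)
    # Assumption: unknown variables expand to the empty string
    text = str(values.get(name, ""))
    if width is None:
        return text
    w = int(width)
    # Positive width right-justifies, negative width left-justifies
    return text.ljust(-w) if w < 0 else text.rjust(w)


def expand_format(fmt, values):
    """Expand a git-annex-style format string (approximation only)."""
    out = []
    pos = 0
    for m in _VAR.finditer(fmt):
        # Escapes apply only to the literal text, never to variable values
        out.append(_ESC.sub(_unescape, fmt[pos:m.start()]))
        out.append(_render_var(m, values))
        pos = m.end()
    out.append(_ESC.sub(_unescape, fmt[pos:]))
    return "".join(out)
```

For example, expand_format("${bytesize;6} ${file}\\n", {"bytesize": 42, "file": "a.txt"}) returns "    42 a.txt\n".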
SEE ALSO
git-annex(1)
AUTHOR
Joey Hess id@joeyh.name
Warning: Automatically converted into a man page by mdwn2man. Edit with care.
Thanks for the quick answer and for the tip. findref still displays file names, so OK, I can pipe its output through lookupkey to get the corresponding list of keys. Still, my understanding is that the computation is not the same as a potential find --all (or find on bare repos), in the sense that commands like move --all (or move on bare repos) only scan the files that are present in the repo, whereas git annex findref master looks at the whole branch regardless of where the files are. Sure, I can filter it with findref master --in=here, but the computational cost wouldn't be the same, would it? (Imagine that my repo contains orders of magnitude fewer files than the branch.) Also, move --all catches past versions of files that are still in the repo, i.e. "unused files", whereas I guess findref master --in=here would miss them? It's just that commands like move --all start by doing the job I want before taking an action on the files, so I just wish there were a "no-action" version of them. A --dry-run option in move and copy would be good enough. I tried to trick the move command with move --all ... --from=here --to=here, but of course I was outsmarted by the command :-)
You can use git annex findref master in a bare repository, which is like find but operates on some branch. I am not convinced that find --all would really be that useful, since it would have to display keys and not filenames, and find is all about displaying filenames. I did make find error out in a bare repo rather than not doing anything.
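Since key is one of the --format variables, find can print keys directly, which avoids the separate lookupkey step. The Python sketch below builds and runs git annex find --branch=REF --format='${key}\n' and de-duplicates the result, since several files can point at the same key. It assumes find --branch works in the repository at hand (whether it runs in a bare repository depends on the git-annex version; findref played that role earlier). Per the find description, with no matching options only files whose content is present are listed. The helper names are hypothetical.

```python
import subprocess


def find_keys_cmd(ref, matching=()):
    """Build a git annex find command that prints one key per line for ref."""
    # "\\n" here is backslash + n; git-annex expands it to a newline itself
    return ["git", "annex", "find", f"--branch={ref}", "--format=${key}\\n", *matching]


def unique_lines(output):
    """Split newline-separated output, dropping blank lines and duplicate keys."""
    seen = set()
    keys = []
    for line in output.splitlines():
        if line and line not in seen:
            seen.add(line)
            keys.append(line)
    return keys


def keys_in_branch(ref, matching=()):
    """Run the command (requires git-annex) and return the unique keys."""
    result = subprocess.run(
        find_keys_cmd(ref, matching), check=True, capture_output=True, text=True
    )
    return unique_lines(result.stdout)
```

For instance, keys_in_branch("master", ["--in=here"]) restricts the list to locally present keys. It still walks the whole branch, so it does not address the cost concern, and it cannot see old versions that are no longer in the branch.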
Actually, that's not even true: lookupkey doesn't seem to work on a bare repo. So I don't see how I can get the list of keys that are going to be moved or copied when a git annex move ... or git annex copy ... is run from a bare repo.
Sometimes I want to move files from one git annex repo to another. It would be really awesome if one could do something like:
Just to make myself clear. I do not mean "other remote" (foreign instance of "same" repo). I actually mean different repos without common location tracking, no common branches, etc. The only concession I would make (since I think it's necessary) would be that the same backend has to be used in both repos.
This approach could also be relevant for other git annex commands, e.g.:
Is there any way to do it? Or would this be a feature request worth considering?
Is there a way to use find to figure out what is wanted (for getting or dropping) on a given remote? I want to be able to do this to anticipate the outbound consequences of a sync (I can anticipate the inbound consequences of a sync, modulo file availability on remotes, using git annex find --wanted-get --not --in here). I could manually figure out the wanted expressions for an arbitrary remote (resolving any group assignments, etc.) and then use that expression, but it'd be nice to be able to do something like git annex find --wanted-get-at remotename --not --in remotename --in here to figure out what things will get copied to the remote if I run git annex sync --content. I presume this logic is implemented somewhere, since it gets used when doing git annex sync --content, and perhaps there's a way that I can do it using git annex find or other commands and I'm just not seeing how to do it.
I'm preparing to recycle an aging laptop that has a few git-annex repos on it. I'd like to confirm that everything in its annex(es) exists in at least one other place, and I want to confirm that what I'm doing to check this makes sense.
At first glance, it seems like the appropriate way to do this is with git annex find --in here --not --copies=2 (where the latter predicate should be equivalent to testing for strictly fewer than 2 copies). Since I have recently run git annex sync, this doesn't turn up anything.
However, if I understand everything correctly, this only checks files that are reachable from my current working tree. Thus, if there are a bunch of files in my (not currently checked out) dev branch that are not in my worktree, then this query will not discover them. I can get them to be considered with git annex find --in here --not --copies=2 --branch dev. I could manually (or via script) loop over all of my branches and repeat git annex find --in here --not --copies=2 --branch ${branch} to check all of my branches. However, this will only check the tips. Suppose there's a file that previously existed (solely) in my master branch, but at some point it was git rm-ed. Then unless I use --branch to specify a TREEISH that has that file, it will not be considered.
So, I need some sort of query tool that supports both the --all flag and all of the matching options. The only thing I was able to find was whereis, so I can run git annex whereis --all --in here --not --copies=2 in order to identify keys corresponding to files that are (a) locally available but where (b) the number of copies is not 2 or greater (i.e., it is here and only here). I suppose I could also just plunge ahead with git annex copy --to ${remote} --all --in here --not --copies=2, but it's reassuring to be able to run the query and see what would need to get moved (as well as to see the query come back empty before I wipe the hard drive).
Is this an appropriate use of git annex whereis, or is there a way that I can use git annex find to accomplish this, or perhaps some other query tool? In essence, I just want a way of querying "all" of the objects that git annex has ever known about using all of the standard matching options. I see discussion above regarding the lack of --all support for git annex find, which at the time suggested using findref instead, but it seems like that has been deprecated in favor of find.
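The per-branch loop and the whereis --all query described in this question can be scripted. This Python sketch only builds and runs the commands quoted in the question; it assumes local branches are the ones of interest (enumerated with git for-each-ref) and that whereis accepts --format, as the whereis --all --format='${key}\n' form cited in this discussion suggests. Function names are hypothetical, and nothing runs until orphaned_content_report() is called.

```python
import subprocess

# "present here, but fewer than 2 copies", as used in the question
QUERY = ["--in", "here", "--not", "--copies=2"]


def local_branches():
    """Return the short names of local branches (refs/heads)."""
    out = subprocess.run(
        ["git", "for-each-ref", "--format=%(refname:short)", "refs/heads/"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def branch_query(branch):
    # Only checks the tip of this branch, as the question points out
    return ["git", "annex", "find", *QUERY, f"--branch={branch}"]


def all_keys_query():
    # --all covers every key git-annex knows about, not just branch tips
    return ["git", "annex", "whereis", "--all", *QUERY, "--format=${key}\\n"]


def run_lines(cmd):
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return [line for line in out.splitlines() if line]


def orphaned_content_report():
    """Map each query label to what it found; all-empty means nothing turned up."""
    report = {f"branch {b}": run_lines(branch_query(b)) for b in local_branches()}
    report["all keys"] = run_lines(all_keys_query())
    return report
```

An empty "all keys" entry is the "query comes back empty" result the question is looking for before wiping the drive.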
... git annex find --keys option? That way it's crystal clear you're searching among keys rather than files.
Ah, I hadn't considered the parallel to the standard find command, but now that you mention it I understand where you're coming from and can appreciate why whereis is free of this association. Still, I would think that a user who, after looking at the docs for git annex find, specified --all because they wanted to operate on keys would not be surprised.
I notice that the man page for git annex find already has a "SEE ALSO" reference to git annex whereis. Could this be expanded so that it more clearly and prominently advises readers who want to query against all known keys to check out the --all argument to git annex whereis, as well as its --format= option if "whereis" information is not actually of interest?
The reason I think that git-annex find is limited to operating on files is that it is analogous to the find command. It would violate least surprise to some extent for it to operate on keys. git-annex whereis has no such expectation.
Thanks Joey and Ilya for the nigh-simultaneous untrust-then-fsck suggestion. I was able to get everything squared away by using the whereis approach as a sort of poor man's dry run, then running the copy command I described, and then using my whereis query again to convince myself that nothing was left behind. I imagine I'm not the first one to retire a repo, so hopefully these comments will be of use to future users.
While my particular problem is solved, I just wanted to add some input regarding adding --all to find. I appreciate that Joey thinks that find is about "listing files, not keys" (and if anyone's opinion here is authoritative, it should be his), but this was not my expectation as a user (although I would agree if it were called findfiles or something like that), so I wanted to share my experience trying to accomplish this task.
Given what I understand of git-annex, my instinct was to reach for a command that would let me use the powerful matching options against all known keys, so I looked over the list of commands to try to identify something that would do this. Right away, find leapt out as the natural candidate, but I couldn't get it to work how I wanted, so the next obvious choice was list, but that also didn't work. It was only when I looked at this wiki page for find and saw discussion of adding support for --all that I started searching for commands that did accept --all, and I stumbled upon whereis, but this required a fair deal of detective work on my part.
FWIW, whereis is, IMHO, just as much about listing files at particular paths as find is (the documentation for both describes the argument as [path ...] and only talks about keys when --all is passed), so whereis taking --all when find does not seems unbalanced, given that whereis seems like a tool that would be built on top of find. I think there's a similar asymmetry with list, since it's described as being "similar to git annex whereis but a more compact display."
Now that the --all genie is somewhat out of the bottle it might be too late for this, but I wonder if a findkeys command would help fill this need while obviating the need for --all being passed to most other commands. It would be unequivocally about finding keys and not files, and its output could be, say, a list of keys delimited by newlines (or optionally by nulls, to make it play nice with commands that accept -z). If the user wanted to know more about the keys that matched their query, the output of this command could then be piped to whereis, examinekey, and other commands that support the --batch and/or -z options. Of course, instead of defining a new command, this functionality could be absorbed into find --all.
I realize that I can accomplish precisely what I describe above with, e.g., git annex whereis --all --format='${key}\n', which is great now that I know it's possible under whereis, but as a new user I would expect to find this functionality in find (which helpfully already supports --format=) before I thought to check whereis.
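A findkeys along the lines proposed here can be approximated with the whereis --all --format form. This Python sketch asks for NUL-terminated keys using the '\000' escape documented for find's --format and splits on NUL. The name find_keys is hypothetical, and the sketch assumes whereis expands format escapes the same way find does.

```python
import subprocess


def find_keys_command(matching):
    """Build a whereis --all query that prints each matching key followed by NUL."""
    # "\\000" is backslash-zero-zero-zero; git-annex turns it into a NUL byte
    return ["git", "annex", "whereis", "--all", *matching, "--format=${key}\\000"]


def split_nul(data):
    """Split NUL-terminated bytes into items, ignoring the trailing empty item."""
    return [item for item in data.split(b"\0") if item]


def find_keys(matching=()):
    """Return matching keys as bytes (requires git-annex inside a repository)."""
    out = subprocess.run(
        find_keys_command(matching), check=True, capture_output=True
    ).stdout
    return split_nul(out)
```

Which commands accept keys on batch input varies, so check each command's man page before piping the result onward.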
@Dan findref never supported listing all keys either. Yours is the best argument I've seen so far for wanting find --all. But the fact that this command is about listing files, not keys, still makes that seem out of scope for it.
Using whereis would certainly do what you want. Another option would be to untrust the repository that you are going to be deleting, and then run fsck --all, although that could also report problems other than files that are only present in that repository.
Finally, there's the bare metal option, which is also the fastest:
find .git/annex/objects -type f
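The bare metal find lists every object file. A Python equivalent that returns key names could look like this, assuming the usual layout where each object is stored as .../KEY/KEY under .git/annex/objects (annex/objects in a bare repository). Skipping files whose name differs from their parent directory is an extra assumption of this sketch, meant to ignore any auxiliary files. Like the find command, it reads no location tracking, so it only reports what content is physically present.

```python
import os


def local_keys(objects_dir=".git/annex/objects"):
    """Yield the key name of every object file stored under objects_dir."""
    for dirpath, _dirnames, filenames in os.walk(objects_dir):
        key_dir = os.path.basename(dirpath)
        for name in filenames:
            # Assumed layout: .../KEY/KEY, so the file name matches its directory
            if name == key_dir:
                yield name
```

For example, sum(1 for _ in local_keys()) counts the objects present in the current non-bare repository.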
... git-annex-untrust the laptop repo, do a git-annex-sync, then git-annex-fsck to check that the files have enough trusted copies (as set in your numcopies setting)?
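The untrust, sync, fsck sequence suggested here can be written down as a small script. This Python sketch assumes it is run inside the laptop's repository, that untrust accepts "here" as the repository name, and that fsck --all is wanted so keys outside the current branch are checked too. Treating fsck's exit status as the pass/fail signal is also an assumption.

```python
import subprocess

RETIREMENT_STEPS = [
    ["git", "annex", "untrust", "here"],  # stop counting this repo as a trusted copy
    ["git", "annex", "sync"],             # record and share the trust change
    ["git", "annex", "fsck", "--all"],    # report keys lacking enough trusted copies
]


def retire_check(steps=RETIREMENT_STEPS):
    """Run each step in order; return True only if every command exits with status 0."""
    for cmd in steps:
        if subprocess.run(cmd).returncode != 0:
            print("failed:", " ".join(cmd))
            return False
    return True
```

The untrust persists after the check, which suits a repository that is about to be deleted.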