Recent comments posted to this site:
You could add a config option to the script that skips over files larger than a certain size.
Or for that matter, the script could be adapted to filter the files to only include images/videos, using e.g.:
git annex find --mimetype='image/*' --or --mimetype='video/*'
Should be a fairly easy change, patches accepted.
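A size limit could be expressed with the same matching options, since they are ANDed together by default. A rough, untested sketch (the 100mb cutoff is just an illustration):
git annex find --smallerthan=100mb --mimetype='image/*'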
git-annex: Unknown remote name.
I assume this is because git annex does not create a uuid for the type=git special remote, presumably because none is set for the actual git remote (the annex-uuid key does not exist for the existing git remote with the same url). This is the relevant line generated in remote.log:
autoenable=true location=<ssh-url> name=<name> type=git timestamp=1629118438.628919s
As you can see, there is no uuid at the beginning. Any ideas if this is a bug or if the instructions are outdated?
This can only be used for git remotes that have an annex-uuid set. All
special remotes have to have a uuid. git remotes that are hosted on a
system with git-annex installed get the annex-uuid set up automatically,
but it won't happen if your special remote is on github or something like
that. You could git config remote.foo.annex-uuid $(uuid) to generate and set a new uuid, I suppose.
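A minimal sketch of that suggestion, assuming the remote is called foo as above (uuidgen would work as well if the uuid tool isn't installed):
git config remote.foo.annex-uuid "$(uuid)"
git config remote.foo.annex-uuid    # read it back to check what was set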
This is like a 1000 ft overview, but it doesn't actually say where the files are stored or how they're synchronized.
Does one need to set up a Samba, SFTP, or AWS bucket to contain the large files? Does a clone of the repo pull down all of the large files, or just the files in the working directory that's checked out? Are files transferred via direct connection to other repos (e.g. the same SSH tunnel that git uses, HTTP, etc.), or is there a UDP p2p layer like Syncthing or BitTorrent that might struggle with certain NAT situations?
The sentence "A file's content can be transferred from one repository to another by git-annex. Which repositories contain a given value is tracked by git-annex (see location tracking)." makes it sound like the old versions of the large files only exist on computers that checked out those copies. Does this mean old versions of a file might be lost forever if a single clone is deleted and temporarily unavailable if clones that contain those revisions of the file are offline?
Is there a way to ensure that a clone has all copies of all of the files (for example, when using git with a central trusted server)?
This is like a 1000 ft overview, but it doesn't actually say where the files are stored or how they're synchronized.
It does:
"[...] That's a fancy way to say that git-annex stores the actual file content somewhere under .git/annex/. (See internals for details.)".
When using SHA256E hashing (the default), a file will end up for example under .git/annex/f87/4d5/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.
Does one need to set up a Samba, SFTP, or AWS bucket to contain the large files?
No.
Does a clone of the repo pull down all of the large files, or just the files in the working directory that's checked out?
No and no. You decide yourself (via git-annex-get/git-annex-drop/git-annex-copy/git-annex-move) or automatically (via git annex sync --content/git-annex-wanted/git-annex-preferred-content) which large files are stored where. If a file is not present, you will just see a broken symlink.
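For example (the paths and remote name here are made up for illustration):
git annex get photos/2021                   # fetch this content into the local repo
git annex copy --to usbdrive photos/2021    # also store it on another remote
git annex drop photos/2021                  # free local space; refuses if too few copies remain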
Are files transferred via direct connection to other repos (e.g. the same SSH tunnel that git uses, HTTP, etc.), or is there a UDP p2p layer like Syncthing or BitTorrent that might struggle with certain NAT situations?
Yes and no. git-annex is very flexible; it can also communicate via Tor.
The sentence "A file's content can be transferred from one repository to another by git-annex. Which repositories contain a given value is tracked by git-annex (see location tracking)." makes it sound like the old versions of the large files only exist on computers that checked out those copies. Does this mean old versions of a file might be lost forever if a single clone is deleted and temporarily unavailable if clones that contain those revisions of the file are offline?
Yes and yes. If you don't copy the file elsewhere (with the commands mentioned above) before deleting the repo, that version is lost.
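For example, before deleting a clone you could first push everything it holds, including old versions, to some other remote (the name backupdrive is just an illustration):
git annex copy --to backupdrive --all   # --all covers every version stored here, not just current files
git annex dead here                     # optionally tell the other clones this repo is going away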
Is there a way to ensure that a clone has all copies of all of the files (for example, when using git with a central trusted server)?
git annex get --all / git annex wanted server anything
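A sketch of how the second option combines with syncing, using the remote name server from above:
git annex wanted server anything    # declare that server should hold every file
git annex sync --content server     # sync then sends server any locally present content it wants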
I strongly suggest you create a throwaway repository and try things out.
I'm using git-annex for backing up a variety of data, with several different remotes (including a USB drive for backup, rsync for encrypted cloud backup, etc). I have one particular use case that I am trying to figure out how to implement:
Pictures, Music, and Video are stored on a WD MyCloud under three corresponding folders. They are accessible via NFS, as well as via AFP; I specifically mount them on my laptop via NFS. I want to be able to access the files in the following ways:
- Store new photos from other sources, including my laptop, using git-annex.
- Have the list of files available in my laptop repository, in git-annex... i.e. like any repository.
- Retain all the file contents on the WD MyCloud... i.e. like a special remote.
- ONLY pull file contents to the laptop (or other repositories) if specifically requested (i.e. I don't want a sync to cause things to get pulled to the laptop).
- When photos are pulled from the special remote, they go into ~/annex/photos (i.e. a different name than the folder on the remote).
- I want to be able to access the files on the WD MyCloud "normally" from other computers, TVs, etc. I realize there's some 'danger' here. But I wanted to keep normal file trees for the special remote, not the hash-based filenames with symlinks.
So what I decided I needed was a directory special remote, with both the exporttree and importtree options. I created the special remote using something like:
git annex initremote wdmycloud-pics directory=/net/wdmycloud/nfs/Public/Shared\ Pictures encryption=none exporttree=yes importtree=yes
On my laptop's repository, I executed the following:
git annex import master:photos --from wdmycloud-pics
git merge wdmycloud-pics/master
Problem: this imports the content from the special remote into the laptop's repository. I don't want that.
I do realize that my main git-annex repository on the laptop probably has to actually download all of the files in order to compute their hashes. However, I do not want them to stay on the laptop, as there is nowhere near enough space for all of them. I did try:
git-annex wanted wdmycloud-pics "exclude=photos/*"
However, this results in none of the files on the WD MyCloud actually getting processed.
What am I missing?
Thanks! -Rob
git-annex-import has a --no-content switch.
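Applied to the command from the question, that would look something like (untested):
git annex import master:photos --from wdmycloud-pics --no-content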
This is cool! I had to do the same thing earlier tonight, and ended up with this:
cat ~/path/to/playlist.m3u | sed -e "s/\ /\\\ /g" | xargs git annex copy --to remote
I thought I might stick that into a bash alias or function.
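A variant that avoids the sed escaping, assuming the playlist has one file per line (and no Windows line endings):
tr '\n' '\0' < ~/path/to/playlist.m3u | xargs -0 git annex copy --to remote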
If you are experiencing a problem using git-annex on Android, please examine the list of bugs and add a new, detailed bug report if no-one has reported the problem. If you are not sure if you have a bug, or need help in filing a good bug report, ask for help in the forum.
I have removed a lot of old comments about problems that may or may not be fixed (hard to tell without a bug report!). This page cannot scale to handle every bug report that someone wants to paste into it.
Running extract on very large files (system backups) can take too long (I killed it after it had run for several hours). In general, extract seems slow on tar.gz archives. I added timeout 100s before the tool is called in the pre-commit script:
LC_ALL=C timeout 100s $tool_exec "./$f" | ...
This allows the commit to complete in a reasonable time, probably losing some metadata.
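A sketch of handling the timeout explicitly instead of silently taking partial output ($tool_exec and $f follow the comment; the surrounding loop and the rest of the hook are assumptions):
for f in "$@"; do
	# GNU timeout exits with status 124 when it kills the command
	if ! fields=$(LC_ALL=C timeout 100s "$tool_exec" "./$f"); then
		echo "extract timed out or failed on $f; skipping its metadata" >&2
		continue
	fi
	# ... feed $fields to git annex metadata as the hook normally does ...
done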