Recent comments posted to this site:
@talmukoydu you need to file a bug report and include things like the version of git-annex you are using: https://git-annex.branchable.com/bugs/
@mike, you can use git-annex info $remotename
The output will include a "chunking" line which ends in "(old style)" when applicable.
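A minimal sketch of doing that check from a script. The helper name has_old_style_chunking is made up for illustration, and it relies only on what is described above: a "chunking" line that ends in "(old style)".

```shell
# Hypothetical helper: reads `git annex info <remote>` output on stdin and
# succeeds only if the "chunking" line ends in "(old style)".
has_old_style_chunking() {
    grep -i 'chunking' | grep -q '(old style)[[:space:]]*$'
}

# Usage (assumes $remotename holds the name of the special remote):
#   git annex info "$remotename" | has_old_style_chunking && echo "old-style chunks"
```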
Hi,
I'm preparing a lecture on how git-annex can help research data
management and, while playing with git-annex unannex, I stumbled on a
strange behavior that I can neither understand nor properly work around.
When preparing for a public archive it may make sense to include
some annexed files in the archive while it may be desirable to keep
the symlinks for others (e.g., because they are already available from
somewhere else). This is why I do not want to rely on the git-annex export
mechanism that would replace the symlinks of all annexed files
by their content.
Instead, I unannex some of my files but surprisingly, depending on the
git-annex configuration, their content may not be in the archive produced by
git archive. Here is a minimal working example.
DIR=/tmp/test
chmod -Rf u+w $DIR; rm -rf $DIR ; mkdir -p $DIR; cd $DIR
git init
git annex init
git config --local annex.largefiles 'largerthan=100kb and include=data/*'
echo "Hello" > README
git add README
mkdir data/
dd if=/dev/zero of=data/foo.dat bs=1M count=1 2>/dev/null
git annex add data/foo.dat
git commit -m "Initial commit"
## git config --local annex.largefiles ''
git annex unannex data/foo.dat && git add data/foo.dat && git commit -m "Unannexing"
git archive --format=tar.gz --prefix nobel_project/ -o ../archive.tgz HEAD
tar zxf ../archive.tgz
tree -s nobel_project/
Initialized empty Git repository in /tmp/test/.git/
init ok
(recording state in git...)
add data/foo.dat
100% 1 MiB 137 MiB/s 0s ok
(recording state in git...)
[master (root-commit) 8fbb907] Initial commit
2 files changed, 2 insertions(+)
create mode 100644 README
create mode 120000 data/foo.dat
unannex data/foo.dat ok
(recording state in git...)
[master da73fb0] Unannexing
1 file changed, 1 insertion(+), 1 deletion(-)
mode change 120000 => 100644 data/foo.dat
nobel_project/
├── [ 4096] data
│ └── [ 102] foo.dat
└── [ 6] README
1 directory, 2 files
As you may see from the output, foo.dat is only 102 bytes whereas it
should be 1MB. Instead the content of foo.dat is:
cat nobel_project/data/foo.dat
/annex/objects/SHA256E-s1048576--30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58.dat
But if I remove the annex.largefiles configuration (either upfront or
right before calling unannex), everything works as expected, i.e., my
archive comprises the content of the annexed file.
Is this expected behavior? This is the kind of operation I typically do in a branch that I erase afterward, but it (temporarily) messes up my local git configuration, which I don't like, so I'm looking for a better workaround.
Thanks for your amazing work,
Arnaud
That behavior does not really have anything to do with the use of
git-annex unannex.
You have configured annex.largefiles to something that matches a file. The
file is not checked into git (perhaps because you ran git-annex unannex
after adding it earlier, but it could just as well be a new file). You run
git add on the file. What happens? Well, git add asks git-annex if the
file should be annexed. Your annex.largefiles configuration tells it that it
should. So the file is added to the annex, and a pointer file is checked
into git. (Not a symlink because the file is added unlocked;
see git-annex-unlock.)
git archive does not archive the contents of the annex, so it only
archives the pointer file content.
What you can do is use git-annex add --force-small when adding the file
to override the annex.largefiles config. See the bottom of the largefiles
page for a recipe for using that to convert an annexed file to be stored
in git.
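Applied to the example from the question, that could look roughly like the sketch below. The function name unannex_into_git is only for illustration, and this assumes a git-annex version that has --force-small; it is not a copy of the recipe on the largefiles page.

```shell
# Hypothetical helper: move an annexed file's content into git itself,
# even though annex.largefiles still matches the file.
# --force-small overrides annex.largefiles, so the real content (not a
# pointer file) is staged in git before the commit.
unannex_into_git() {
    file=$1
    message=$2
    git annex unannex "$file" &&
        git annex add --force-small "$file" &&
        git commit -m "$message"
}

# Usage, with the file from the example in the question:
#   unannex_into_git data/foo.dat "Unannexing"
```

With the content stored in git, git archive should then include the full file, and annex.largefiles does not need to be touched.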
I always use this:
git show git-annex:remote.log
to print the current configuration of all remotes. It should have all the arguments you passed to initremote or enableremote. If there is a better way, I would also like to know.
(The git command prints the content of the file remote.log from the git-annex branch.)
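To look at a single remote, the output can be filtered. This is only a sketch: the helper name remote_config is made up, and it assumes each remote.log line is a UUID followed by space-separated key=value settings that include name=..., with no spaces inside values and no regex metacharacters in the remote name.

```shell
# Hypothetical helper: reads remote.log on stdin and prints the fields of
# the remote named $1, one per line.
# Assumption: one remote per line, fields separated by single spaces.
remote_config() {
    grep -E "(^| )name=$1( |\$)" | tr ' ' '\n'
}

# Usage (assumes a remote called "myremote"):
#   git show git-annex:remote.log | remote_config myremote
```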
@matthias.risze length is not an issue. You should avoid characters that are not usually in URLs, particularly whitespace and newlines.
It seems to me though that your special remote would perhaps be better served by using the SETSTATE and GETSTATE commands (see external special remote protocol).
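A rough sketch of what that could look like in an external special remote written in shell. The function names and the STATE variable are illustrative only; the parts taken from the protocol are the SETSTATE and GETSTATE requests and the VALUE reply.

```shell
# Requests go to git-annex on stdout; its replies arrive on stdin.

# Store a small piece of per-key state (it should not contain newlines).
setstate() {
    printf 'SETSTATE %s %s\n' "$1" "$2"
}

# Ask for the state stored for a key. The reply "VALUE <state>" is parsed
# into $STATE, which is left empty when nothing was stored.
getstate() {
    printf 'GETSTATE %s\n' "$1"
    read -r reply
    reply=$(printf '%s' "$reply" | tr -d '\r')
    case $reply in
        VALUE\ *) STATE=${reply#VALUE } ;;
        *) STATE="" ;;
    esac
}
```

The state is recorded per key, so the remote does not have to squeeze that information into a URL.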
@joey Is this a bug or am I missing something?
Notes:
Flow 1
git remote add test gcrypt::rsync://user@user.rsync.net:relative/path/to/repo
git annex sync
-> DOES NOT SYNC to test remote
git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo
git push test git-annex master
git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo
Flow 2
git remote add test gcrypt::rsync://user@user.rsync.net/full/path/to/repo
git annex sync
-> DOES SYNC to test remote
git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo