Recent comments posted to this site:

@joey Is this a bug or am I missing something?

Notes:

  • I am using the latest git-remote-gcrypt, version 1.5

Flow 1

  • git remote add test gcrypt::rsync://user@user.rsync.net:relative/path/to/repo
  • git annex sync -> DOES NOT SYNC to test remote
  • Nothing has been synced so I CANNOT successfully clone from the test remote with git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo
  • git push test git-annex master
  • I can successfully clone from the test remote with git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo

Flow 2

  • git remote add test gcrypt::rsync://user@user.rsync.net/full/path/to/repo
  • git annex sync -> DOES SYNC to test remote
  • I can successfully clone from the test remote with git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo
Comment by talmukoydu Wed Mar 22 00:29:54 2023
@joey This definitely seems like a bug. I can easily verify it by switching the remote URL back and forth in .git/config and then running git annex sync. When the relative URL is used, git annex sync does not sync to that remote.
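
For anyone who wants to repeat that check, it could be scripted roughly as follows. This is only a sketch: the function name sync_with_url is made up for illustration, and the remote name and URLs are the ones from Flow 1 and Flow 2.

```shell
# Hypothetical helper (not part of git or git-annex): point an existing
# remote at the given URL, then run a sync.
#   $1 = remote name, $2 = URL to try
sync_with_url() {
    git remote set-url "$1" "$2" &&
    git annex sync
}

# Usage, from inside the repository:
#   sync_with_url test 'gcrypt::rsync://user@user.rsync.net:relative/path/to/repo'
#   sync_with_url test 'gcrypt::rsync://user@user.rsync.net/full/path/to/repo'
```

Comparing the output of the two calls should show whether the test remote is skipped only for the relative form.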
Comment by talmukoydu Wed Mar 22 00:29:54 2023

@talmukoydu you need to file a bug report and include things like the version of git-annex you are using: https://git-annex.branchable.com/bugs/

Comment by joey Wed Mar 22 00:29:54 2023

@mike, you can use git-annex info $remotename

The output will include a "chunking" line which ends in "(old style)" when applicable.
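
A small sketch of how that check might be automated. The function name is hypothetical, and the only assumption about the output format is the one stated above: a line starting with "chunking" that ends in "(old style)".

```shell
# Hypothetical helper: reads `git-annex info <remote>` output on stdin and
# succeeds only if a line starting with "chunking" ends in "(old style)".
uses_old_style_chunking() {
    awk '/^chunking/ && /\(old style\)[ \t]*$/ { found = 1 }
         END { exit !found }'
}

# Usage:
#   git-annex info "$remotename" | uses_old_style_chunking &&
#       echo "$remotename uses old-style chunking"
```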

Comment by joey Mon Feb 27 19:08:33 2023
Thanks a lot Joey for your very clear explanation and clean workaround.
Comment by arnaud.legrand Mon Feb 27 19:08:33 2023

Hi,

I'm preparing a lecture on how git-annex can help with research data management and, while playing with git-annex unannex, I stumbled on a strange behavior that I can neither understand nor properly work around. When preparing a public archive, it may make sense to include some annexed files in the archive while keeping the symlinks for others (e.g., because they are already available from somewhere else). This is why I do not want to rely on the git-annex export mechanism, which would replace the symlinks of all annexed files with their content.

Instead, I unannex some of my files, but surprisingly, depending on the git-annex configuration, their content may not be in the archive produced by git archive. Here is a minimal working example.

DIR=/tmp/test
chmod -Rf u+w $DIR; rm -rf $DIR ; mkdir -p $DIR; cd $DIR
git init
git annex init
git config --local annex.largefiles 'largerthan=100kb and include=data/*'

echo "Hello" > README
git add README

mkdir data/
dd if=/dev/zero of=data/foo.dat bs=1M count=1 2>/dev/null
git annex add data/foo.dat
git commit -m "Initial commit"

## git config --local annex.largefiles ''
git annex unannex data/foo.dat && git add data/foo.dat && git commit -m "Unannexing" 
git archive --format=tar.gz --prefix nobel_project/ -o ../archive.tgz HEAD

tar zxf ../archive.tgz
tree -s nobel_project/

 Initialized empty Git repository in /tmp/test/.git/
init  ok
(recording state in git...)
add data/foo.dat ok
(recording state in git...)
[master (root-commit) 8fbb907] Initial commit
 2 files changed, 2 insertions(+)
 create mode 100644 README
 create mode 120000 data/foo.dat
  unannex data/foo.dat ok
(recording state in git...)
[master da73fb0] Unannexing
 1 file changed, 1 insertion(+), 1 deletion(-)
nobel_project/
├── [       4096]  data
│   └── [        102]  foo.dat
└── [          6]  README

1 directory, 2 files

As you can see from the output, foo.dat is only 102 bytes whereas it should be 1 MiB. Instead, the content of foo.dat is:

cat nobel_project/data/foo.dat
/annex/objects/SHA256E-s1048576--30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58.dat

But if I remove the annex.largefiles configuration (either upfront or right before calling unannex), everything works as expected, i.e., my archive comprises the content of the annexed file.

Is this expected behavior? This is the kind of operation I typically do in a branch that I erase afterward, but it (temporarily) messes up my local git configuration, which I don't like, so I'm looking for a better workaround.

Thanks for your amazing work,

Arnaud

Comment by arnaud.legrand Mon Feb 20 20:51:36 2023

That behavior does not really have anything to do with the use of git-annex unannex.

You have configured annex.largefiles to something that matches a file. The file is not checked into git (perhaps because you ran git-annex unannex after adding it earlier, but it could just as well be a new file). You run git add on the file. What happens? Well, git add asks git-annex if the file should be annexed. Your annex.largefiles configuration tells it it should. So the file is added to the annex, and a pointer file is checked into git. (Not a symlink because the file is added unlocked; see git-annex-unlock.)

git archive does not archive the contents of the annex, so it only archives the pointer file content.
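
One way to spot this in an extracted archive is to look for files that contain only a pointer. The sketch below assumes, based on the cat output in the question, that a pointer file starts with "/annex/objects/"; the function name is made up for illustration.

```shell
# Hypothetical helper: succeeds if the file's first 15 bytes are
# "/annex/objects/", as in the pointer file content shown in the question.
is_annex_pointer() {
    head -c 15 "$1" | grep -q '^/annex/objects/'
}

# Usage, after extracting the archive:
#   find nobel_project -type f | while IFS= read -r f; do
#       is_annex_pointer "$f" && echo "pointer only: $f"
#   done
```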

What you can do is use git-annex add --force-small when adding the file, to override the annex.largefiles config. See the bottom of the largefiles page for a recipe that uses it to convert an annexed file to be stored in git.
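
Applied to the example from the question, the unannex step might then look something like this. It is an untested sketch rather than the recipe from the largefiles page; the function name is hypothetical and the commit message is taken from the original example.

```shell
# Hypothetical wrapper: take a file out of the annex and store its content
# in git itself. --force-small overrides annex.largefiles for this add only,
# so the local git configuration does not need to be changed.
unannex_into_git() {
    git annex unannex "$1" &&
    git annex add --force-small "$1" &&
    git commit -m "Unannexing"
}

# Usage:
#   unannex_into_git data/foo.dat
```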

Comment by joey Mon Feb 20 20:51:36 2023
Is there a way to show the chunking parameters used for a remote? In particular I would like to know if I had used the old-style chunking for a remote.
Comment by mike Wed Feb 15 16:39:35 2023

I always use this

git show git-annex:remote.log

to print the current configurations of all remotes. It should have all the arguments you passed to initremote or enableremote. If there is a better way, I would also like to know.

(The git command prints the content of the file remote.log from the git-annex branch.)
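
To pick out just the chunking settings, the output could be filtered along these lines. This sketch assumes that each remote.log line consists of whitespace-separated key=value fields after the UUID, including a name= field, and that chunking shows up as chunk= (or chunksize= for the old style). The function name is made up.

```shell
# Hypothetical helper: reads remote.log content on stdin and prints
# "<name>: <chunk setting>" for every remote that has one.
# Assumes whitespace-separated key=value fields after the UUID.
show_chunk_settings() {
    awk '{
        name = ""; chunk = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^name=/) name = substr($i, 6)
            if ($i ~ /^chunk(size)?=/) chunk = $i
        }
        if (chunk != "") print name ": " chunk
    }'
}

# Usage:
#   git show git-annex:remote.log | show_chunk_settings
```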

Comment by MatusGoljer1 Wed Feb 15 16:39:35 2023

@matthias.risze length is not an issue. You should avoid characters that are not usually in URLs, particularly whitespace and newlines.

It seems to me though that your special remote would perhaps be better served by using the SETSTATE and GETSTATE commands (see external special remote protocol)
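
As a rough illustration, a special remote written in shell could wrap those two commands like this. The function and variable names are hypothetical. The sketch assumes the remote talks to git-annex over stdin/stdout and that git-annex answers GETSTATE with a VALUE line.

```shell
# Hypothetical helpers for an external special remote written in shell.

# Ask git-annex to store a state string for a key.
#   $1 = key, $2 = state (should not contain newlines)
set_state() {
    printf 'SETSTATE %s %s\n' "$1" "$2"
}

# Ask git-annex for the state stored for a key. The reply is read from
# stdin and left in the variable $state (empty if nothing was stored).
get_state() {
    printf 'GETSTATE %s\n' "$1"
    IFS= read -r reply || return 1
    case $reply in
        VALUE)     state= ;;
        "VALUE "*) state=${reply#VALUE } ;;
        *)         return 1 ;;
    esac
}
```

A remote would typically call set_state after storing a key and get_state before retrieving it, instead of encoding the same information in a URL.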

Comment by joey Wed Feb 1 02:26:33 2023