Recent comments posted to this site:
Is it possible to add git-lfs capabilities to a git-annex repository without using a special remote?
What I am looking for is a reasonable set of instructions for grafting the hooks together so that this is possible:
$ git init
$ git-lfs install
$ git-annex init
And you can alternate between something like below:
$ git-lfs track "*.exif_thumbnail.*"
$ git-annex add IMG_0001.jpg
$ git add IMG_0001.exif_thumbnail.jpg
Obviously this betrays the scenario of extracting thumbnails from the EXIF header and storing them alongside, as another form of metadata. If there's a better workflow to this, that would be appreciated too.
Sounds like you might want to use datalad, which is built around git annex and where submodules are a first-class citizen.
Datalad handles submodules as subdatasets and adds Python code layers on top to manage datasets (e.g. deduplicating submodules). But unlike git, it doesn't detect when a submodule's path has changed.
So, sadly, it doesn't meet my needs.
@TTTTAAAx kindly posted a full example of their problem, which I've moved to detect and handle submodules after path changed by mv.
I do think that using git mv to rename directories that contain submodules is the right way to avoid that kind of problem.
Note that renaming such a directory without using git followed by running git add on the new directory has the same behavior as running git-annex assist does. This is not a git-annex problem, but I think it could be considered a git problem; git could make git add of a moved submodule do the right thing.
Another note worthwhile making IMHO is that AFAIK those git replace markers are local only, and whoever has unredacted-master later on might need to set them up as well for their local clones to make such a "collapse" of histories
Right, any repository you fetch unredacted-master into, you will also want to fetch refs/redacted/ into, and run git replace there, as shown in the last code block of the tip above.
Here is a script I wrote to make this easy. It reuses the current tree object for the new "squashed history" commit:
#!/bin/bash
#
# A helper to establish an alternative history to hide commits which could have
# leaked personal data etc.
#
# More information on motivation etc and another implementation could be
# found at https://git-annex.branchable.com/tips/redacting_history_by_converting_git_files_to_annexed/
#
set -eu
BRANCH=$(git rev-parse --abbrev-ref HEAD)
: "${SECRET_BRANCH:=unredacted-$BRANCH}"
# the last commit that is safe to keep in the public history
SAFE_BASE="${1:?usage: $0 SAFE_BASE_COMMIT}"
git branch "${SECRET_BRANCH}"
rm -f .git/COMBINED_COMMIT_MESSAGE
echo -e "Combined commits to hide away sensitive data\n" >> .git/COMBINED_COMMIT_MESSAGE
git log --format=%B "$SAFE_BASE..HEAD" >> .git/COMBINED_COMMIT_MESSAGE
# the tree we are on ATM
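TREE_HASH=$(git log -1 --format=%T HEAD)
# create a single commit with the current tree directly on top of the safe base
NEW_COMMIT=$(git commit-tree "$TREE_HASH" -p "$SAFE_BASE" -F .git/COMBINED_COMMIT_MESSAGE)
rm -f .git/COMBINED_COMMIT_MESSAGE
git reset --hard "$NEW_COMMIT"
# locally, make the squashed commit show the full unredacted history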
git replace "$BRANCH" "$SECRET_BRANCH"
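To see what the script leaves behind, here is a minimal sketch that replays the same commit-tree / git replace steps in a throwaway repository and compares the history with and without the replace ref. The file name, commit messages and demo identity are made up for illustration; only the git commands themselves come from the script above.

```shell
#!/bin/bash
# Sketch: squash everything after a safe base into one commit, then use
# git replace so the full history is still visible locally.
set -eu

# Throwaway identity so the demo does not depend on user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master   # fix the branch name for the demo

echo public > file;   git add file; git commit -q -m "safe base"
SAFE_BASE=$(git rev-parse HEAD)
echo secret > file;   git commit -q -a -m "leak something"
echo public2 > file;  git commit -q -a -m "remove the leak"

# Same steps as the script: keep the real history on a side branch,
# build one commit with the current tree on top of the safe base.
git branch unredacted-master
TREE_HASH=$(git rev-parse 'HEAD^{tree}')
NEW_COMMIT=$(git commit-tree "$TREE_HASH" -p "$SAFE_BASE" -m "Combined commits")
git reset -q --hard "$NEW_COMMIT"
git replace master unredacted-master

# With the replace ref honoured, master shows the unredacted history;
# with --no-replace-objects it shows only what would be published.
REPLACED_COUNT=$(git rev-list --count master)
REAL_COUNT=$(git --no-replace-objects rev-list --count master)
echo "commits seen with replace refs: $REPLACED_COUNT"
echo "commits actually on master:     $REAL_COUNT"
```

The replace ref lives under refs/replace/ in the local repository only, which is why other clones need to set it up themselves.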
Is it possible to somehow make git annex whereis show the response of the special remote to WHEREIS over multiple lines? Just including newlines obviously results in an error, since that ends the WHEREIS-SUCCESS message.
I am implementing a special remote for which the data is fully described by what is essentially a JSON-encoded request to a third-party API, and I would like to show this JSON string pretty-printed over multiple lines in the whereis output, instead of as a single line.
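For reference, the single-line constraint looks roughly like this on the special remote's side. This is only a sketch of the current situation, not a way to get multi-line output: the function name whereis_reply is made up, and it simply turns the newlines of a pretty-printed JSON document into spaces so that the whole reply fits in one WHEREIS-SUCCESS message. That is safe for JSON because raw newlines can only occur between tokens, never inside a string value.

```shell
#!/bin/bash
# Hypothetical helper for an external special remote written in bash.
# Takes a (possibly pretty-printed) JSON string and prints a one-line
# WHEREIS-SUCCESS reply, since a newline would terminate the message.
whereis_reply() {
    local oneline
    # Replace each newline with a space; indentation is left as it is,
    # so spaces inside JSON string values are never altered.
    oneline=$(printf '%s' "$1" | tr '\n' ' ')
    printf 'WHEREIS-SUCCESS %s\n' "$oneline"
}

REPLY_LINE=$(whereis_reply '{
  "endpoint": "example",
  "id": 42
}')
echo "$REPLY_LINE"
```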
@craig, all of git-annex's information about a special remote is stored in the git-annex branch in git, so any clone of the git repository is sufficient to back that up. You can run git annex enableremote in a clone to enable an existing special remote.
The only catch is that, if you have chosen to initremote a special remote using a gpg key, with keyid=whatever, you'll of course also need that gpg key to use it. If you run git annex info $myremote it will tell you, among other things, any gpg keys that are used by that remote.
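Put together, recovering a special remote from a backup clone could look like the sketch below. The function name and its three arguments are placeholders; the git annex init call is included on the assumption that the fresh clone has not been initialized for git-annex yet, and the gpg key still has to be available separately if the remote uses one.

```shell
#!/bin/bash
# Hypothetical helper: clone the repository and re-enable a special remote
# whose configuration is stored in the git-annex branch.
restore_special_remote() {
    local repo_url=$1 clone_dir=$2 remote_name=$3
    git clone "$repo_url" "$clone_dir" || return 1
    (
        cd "$clone_dir" &&
        git annex init &&
        # re-enable the remote using the settings recorded in the git-annex branch
        git annex enableremote "$remote_name" &&
        # among other things, lists any gpg keys used by the remote
        git annex info "$remote_name"
    )
}

# Example call (URL, directory and remote name are placeholders):
# restore_special_remote ssh://example.invalid/repo.git repo-clone myremote
```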
Hi, what would be a recommended setup and the working procedures for the following scenario:
- using git-annex version: 8.20210223, which is the one in ubuntu-22.04 (can't upgrade easily)
- a central server as a mutable central archive, many users (over ssh)
- users are all trusted
- the server shall keep all annexed files, but only the HEAD version is relevant, that is: if the file is removed by the user, it shall eventually be permanently removed from the central server too, to save space.
- users would typically not need all the files, but only some, so git annex get files... would do
- users would also add or remove annexed files (and push them to the central repository)
- a user might remove his/her local repository at any time, so the central server shall not keep track of clones, or at least shall not care if any or all clones get removed
I have created the central repository like this (please correct me):
git init test --bare
cd test
git annex init
git annex required . "include=*"
On the user side:
git clone ssh://some.server/repo/test test
cd test
dd if=/dev/random of=./bigfile bs=1M count=10
git annex add bigfile
# how to sync (push only)?
# how to permanently remove big file?
cd ..
# done with the task
chmod -R 777 test
rm -rf test
What I am looking for is the sequence of commands for the users, to:
- sync to the latest state (without fetching the content)
- add a new annexed file to the repository and push it
- permanently delete an annexed file
There are several issues I am facing at the moment.
I was expecting to push the new file with git annex sync --content --no-pull, but this command also pulls the contents of all annexed files, which I don't want. Also, the server does not remove the old content. It looks like I am doing something wrong. I would appreciate your suggestions about this scenario.
Thank you! In my case, since the safe commit is too far in the past, what I had in mind is a little different: I wanted to have a completely disconnected history with a commit which had $privatefiles moved to annex, but I think the approach is "the same" in effect. The only thing I would do differently is to first convert files to git-annex in master (to become unredacted-master), so I end up with the same tree in unredacted-master and master, should it happen that later I need to cherry-pick some changes accidentally committed (e.g. by collaborators) on top of unredacted-master.
Another note worthwhile making IMHO is that AFAIK those git replace markers are local only, and whoever has unredacted-master later on might need to set them up as well for their local clones to make such a "collapse" of histories.