Creating a special S3 remote to hold files shareable by URL

(In this example, I'll assume you're creating an S3 bucket named public-annex and a git-annex special remote named public-s3 that stores its files in that bucket; substitute your own names if you're doing this for real.)

First, in the AWS dashboard, go to (or create) the S3 bucket you want to use and attach a bucket policy that allows public reads:

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AllowPublicRead",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::public-annex/*"
    }
  ]
}
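
If you prefer the command line to the dashboard, the AWS CLI can apply the same policy. This is a sketch assuming the CLI is installed and configured, and that the JSON above is saved as policy.json:

aws s3api put-bucket-policy --bucket public-annex --policy file://policy.json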

Then set up your special S3 remote with (at least) these options:

git annex initremote public-s3 type=s3 encryption=none bucket=public-annex chunk=0

This way git-annex will upload files to this remote (when you run git annex copy [FILES...] --to public-s3) without encrypting or chunking them, and, thanks to the bucket policy, anyone with the link can access them.
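
For example, a complete round trip might look like this (report.pdf is a hypothetical filename):

git annex add report.pdf
git annex copy report.pdf --to public-s3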

Following the example, the files will be accessible at http://public-annex.s3.amazonaws.com/KEY, where KEY is the key git-annex created for the file, which you can discover by running

git annex lookupkey FILEPATH

This way you can share a link to each file you have at your S3 remote.
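
Putting the two together, one way to print the full URL of a single file in bash (again using the hypothetical report.pdf) is:

echo "https://public-annex.s3.amazonaws.com/$(git annex lookupkey report.pdf)"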


Sharing all links in a folder

To share links to all the files in a given folder, you can go to that folder and run the loop below (this example uses the fish shell; a bash sketch follows it):

for filename in (ls)
    echo $filename": https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)
end
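
A rough bash equivalent (a sketch; it assumes filenames without newlines):

for filename in *; do
    echo "$filename: https://public-annex.s3.amazonaws.com/$(git annex lookupkey "$filename")"
done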

Sharing all links matching certain metadata

The same approach works with any of the file-matching options git-annex supports.

For example, let's share links to all the files whose author's name starts with "Mario" and that are actually stored on your public-s3 remote. Instead of a bare list of links, we'll output a markdown-formatted list of the filenames linked to their S3 URLs (again in fish, with a bash sketch after the loop):

for filename in (git annex find --metadata "author=Mario*" --and --in public-s3)
    echo "* ["$filename"](https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)")"
end
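
The same loop sketched in bash, reading the file list line by line so spaces in names survive:

git annex find --metadata "author=Mario*" --and --in public-s3 | while read -r filename; do
    echo "* [$filename](https://public-annex.s3.amazonaws.com/$(git annex lookupkey "$filename"))"
done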

Very useful.

Thanks, Giovanni, for that nice tip!

You can additionally publish the whole git repository by, for example, pushing it to GitHub. (Don't do this if it contains private files, or if you have embedded encryption keys or credentials.)

You can also tell git-annex the public URL of each file; then others can simply clone the git repository and use git-annex to download the files from S3.

You could set that up by running something like this:

for filename in $(git annex find --in public-s3); do
    git annex addurl --file "$filename" "https://public-annex.s3.amazonaws.com/$(git annex lookupkey "$filename")"
done
Comment by joey Mon Dec 1 22:59:58 2014
Is it possible to easily do the same with the rsync or directory layout of a special remote? Those layouts use hash-directory prefixes that git annex lookupkey does not show.
Comment by BojanNikolic Mon Feb 16 10:04:41 2015

You can look up the hash directories for a key using:

git annex examinekey $key --format '${hashdirlower}\n'

Comment by joey Wed Feb 25 19:44:40 2015

Many thanks. The command line I ended up using is:

fname="2015/01/04/myfile.pdf" ;  git annex copy --to pubweb $fname; key=`git annex lookupkey "$fname"`;  git annex examinekey $key --format 'https://www.myweb.com/d/${hashdirlower}${key}/${key}\n'

to publish selected documents in my git-annex repository onto the web via a rsync special remote on a conventional http server.

Comment by BojanNikolic Fri Feb 27 09:55:15 2015