git-annex extends git's usual remotes with some special remotes, which are not git repositories. This way you can set up a remote using, say, Amazon S3, and use git-annex to transfer files into the cloud.
First, export your Amazon AWS credentials:
# export AWS_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
# export AWS_SECRET_ACCESS_KEY="s3kr1t"
Now, create a gpg key, if you don't already have one. This will be used to encrypt everything stored in S3, for your privacy. Once you have a gpg key, run gpg --list-secret-keys to look up its key id, something like "2512E3C7".
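If gpg lists several keys, the short key id is just the last eight hex digits of the long id. A sketch of extracting it from gpg's machine-readable output (the sec: line below is a canned stand-in for real gpg output, not a key you actually have):

```shell
# Pull the short key id out of `gpg --list-secret-keys --with-colons`
# style output; the sample line stands in for real gpg output.
sample='sec:u:4096:1:C910D9222512E3C7:1500000000::::::::::'
longid=$(printf '%s\n' "$sample" | awk -F: '/^sec/ {print $5; exit}')
shortid=${longid#????????}   # short id = last 8 hex digits of the long id
echo "$shortid"
```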
Next, create the S3 remote, and describe it.
# git annex initremote cloud type=S3 chunk=1MiB keyid=2512E3C7
initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok
# git annex describe cloud "at Amazon's US datacenter"
describe cloud ok
The configuration for the S3 remote is stored in git, so it's easy to make another repository use the same S3 remote:
# cd /media/usb/annex
# git pull laptop
# git annex enableremote cloud
enableremote cloud (gpg) (checking bucket) ok
Now the remote can be used like any other remote.
# git annex copy my_cool_big_file --to cloud
copy my_cool_big_file (gpg) (checking cloud...) (to cloud...) ok
# git annex move video/hackity_hack_and_kaxxt.mov --to cloud
move video/hackity_hack_and_kaxxt.mov (checking cloud...) (to cloud...) ok
See S3 for details.
How can I change the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY used by an existing remote?
You can use git annex enableremote to change an existing remote's configuration, so that should work.
Jack, if you don't want to use encryption you can use encryption=none, as documented here. I'm not sure exactly what you're trying to do, but please note that your files won't be easily available on S3: they will be named as git-annex keys, with long and unreadable names such as "SHA256E-s6311--c7533fdd259d872793b7298cbb56a1912e80c52a845661b0b9ff391c65ee2abc.html" instead of "index.html".
I don't know if this is what Jack wanted, but you can upload your files to S3 and let them be accessible through a public URL.
First, go to (or create) the bucket you will use at S3 and add a public get policy to it.
Then set up your special remote with the options encryption=none, bucket=BUCKETNAME and chunk=0 (and any others you want). Your files will be accessible through
http://BUCKETNAME.s3-website-LOCATION.amazonaws.com/KEY
where LOCATION is the one specified through the datacenter option, and KEY is the SHA-something key of the file, created by git-annex and accessible if you run git annex lookupkey FILEPATH. This way you can share a link to each file you have at your S3 remote.
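Under those assumptions, assembling such a link is plain string concatenation. A sketch — BUCKETNAME is a placeholder, us-east-1 an assumed datacenter, and the key is the example name from above (a real one would come from git annex lookupkey):

```shell
# Assemble the public URL for an annexed file in a world-readable bucket.
# The key would normally come from `git annex lookupkey FILEPATH`.
bucket="BUCKETNAME"      # placeholder bucket name
location="us-east-1"     # assumed datacenter/region
key="SHA256E-s6311--c7533fdd259d872793b7298cbb56a1912e80c52a845661b0b9ff391c65ee2abc.html"
url="http://${bucket}.s3-website-${location}.amazonaws.com/${key}"
echo "$url"
```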
I use github as my central git repository and I would like to use S3 to store large files with annex. Since the s3 remote in .git/config is not stored in github, how do I make sure I reconnect to the same s3 bucket in case I delete my local clone? Reinitializing the remote will create a completely new bucket.
It would also be a good idea to centralize git-annex folders inside a single bucket, so I can keep the global namespace under control and narrow down the permissions.
Lemao, make sure you have pushed your git-annex branch to your central git repository.
When you clone that repo elsewhere, you can add the S3 remote by running
git annex enableremote cloud
(replace "cloud" with whatever name you originally picked when you used git annex initremote to set up the S3 remote in the first place). git-annex stores the necessary configuration of the S3 remote on the git-annex branch.
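The push-then-enable flow can be sketched with plain git. The paths, the empty commit, and the stand-in branch below are assumptions for illustration only — git-annex creates and maintains the real git-annex branch, and enableremote itself requires git-annex to be installed:

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/central.git"   # stands in for the central (GitHub) repo
git init -q "$tmp/laptop"
cd "$tmp/laptop"
git -c user.name=me -c user.email=me@example.com commit -q --allow-empty -m init
git branch git-annex                    # stand-in; git-annex maintains the real one
git remote add origin "$tmp/central.git"
git push -q origin HEAD git-annex       # push the current branch AND git-annex
git ls-remote --heads origin            # both branches are now on the remote
```

The point is simply that git-annex is an ordinary branch: if it reaches the central repository, any fresh clone has the remote configuration needed for enableremote.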
Even after enableremote I can't get from s3.
This is after all branches are pushed from my original repo. Any suggestions?
RE: my last comment
The reason I couldn't get it to work is that I didn't have proper read access to the bucket. My bad for not checking first, but it would be great if there were a clearer error message from git-annex and/or a way to get more detailed information on the S3 special remote (-d doesn't do much).
Regardless, git-annex is pretty cool; thanks to all the maintainers for their hard work.