The web can be used as a special remote too.

# git annex addurl http://example.com/video.mpeg
addurl example.com_video.mpeg (downloading http://example.com/video.mpeg)
########################################################## 100.0%
ok

Now the file is downloaded, and has been added to the annex like any other file. So it can be renamed, copied to other repositories, and so on.

To add a lot of urls at once, just list them all as parameters to git annex addurl.
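When the urls live in a text file, a small wrapper can turn them into parameters for a single addurl run. This is only a sketch: the function name addurl_from_list and the list format (one url per line, with blank lines and lines starting with # ignored) are assumptions made for illustration, not something git-annex defines.

```shell
# Hypothetical helper: pass every url listed in a file to one
# "git annex addurl" invocation. The list format is an assumption.
addurl_from_list() (
    list=$1
    set --                          # start with an empty parameter list
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        set -- "$@" "$line"         # append the url as one parameter
    done < "$list"
    [ "$#" -gt 0 ] || exit 0        # nothing to add
    git annex addurl "$@"
)
```

Running addurl_from_list urls.txt inside the repository would then be equivalent to typing all of the urls after git annex addurl by hand.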

trust issues

Note that git-annex assumes that, if the web server does not return a 404 and reports the right file size, the file is still present on the web, and this counts as one copy of the file. If the file still seems to be present on the web, git-annex will let you remove your last copy, trusting that it can be downloaded again:

# git annex drop example.com_video.mpeg
drop example.com_video.mpeg (checking http://example.com/video.mpeg) ok

If you don't trust the web to this degree, just let git-annex know:

# git annex untrust web
untrust web ok

As a result, it will hang onto files:

# git annex drop example.com_video.mpeg
drop example.com_video.mpeg (unsafe) 
  Could only verify the existence of 0 out of 1 necessary copies
  Also these untrusted repositories may contain the file:
    00000000-0000-0000-0000-000000000001  -- web
  (Use --force to override this check, or adjust numcopies.)
failed

attaching urls to existing files

You can also attach urls to any file already in the annex:

# git annex addurl --file my_cool_big_file http://example.com/cool_big_file
addurl my_cool_big_file ok
# git annex whereis my_cool_big_file
whereis my_cool_big_file (2 copies) 
00000000-0000-0000-0000-000000000001 -- web
27a9510c-760a-11e1-b9a0-c731d2b77df9 -- here

configuring filenames

By default, addurl will generate a filename for you. You can use --file= to specify the filename to use.

If you're adding a bunch of related files to a directory, or just don't like the default filenames generated by addurl, you can use --pathdepth to specify how many parts of the url are put in the filename. A positive number drops that many paths from the beginning, while a negative number takes that many paths from the end.

# git annex addurl http://example.com/videos/2012/01/video.mpeg
addurl example.com_videos_2012_01_video.mpeg (downloading http://example.com/videos/2012/01/video.mpeg)
# git annex addurl http://example.com/videos/2012/01/video.mpeg --pathdepth=2
addurl 2012_01_video.mpeg (downloading http://example.com/videos/2012/01/video.mpeg)
# git annex addurl http://example.com/videos/2012/01/video.mpeg --pathdepth=-2
addurl 01_video.mpeg (downloading http://example.com/videos/2012/01/video.mpeg)
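To preview what a given --pathdepth would produce before downloading anything, the rule described above can be emulated in plain shell. This sketch only mirrors the behaviour shown in the examples. The function name pathdepth_name is hypothetical, and keeping every part when the depth is out of range is an assumption of the sketch.

```shell
# Hypothetical helper: emulate the --pathdepth naming rule.
# Usage: pathdepth_name URL [DEPTH]
pathdepth_name() (
    url=$1
    depth=${2:-0}
    rest=${url#*://}                # strip the scheme, keep host and path
    IFS=/
    set -f                          # no globbing while splitting
    set -- $rest                    # split into parts on "/"
    set +f
    n=$#
    if [ "$depth" -gt 0 ] && [ "$depth" -lt "$n" ]; then
        shift "$depth"              # drop parts from the beginning
    elif [ "$depth" -lt 0 ] && [ "$((0 - depth))" -lt "$n" ]; then
        shift "$((n + depth))"      # keep only the last -depth parts
    fi
    IFS=_
    printf '%s\n' "$*"              # join the remaining parts with "_"
)

pathdepth_name http://example.com/videos/2012/01/video.mpeg 2    # 2012_01_video.mpeg
pathdepth_name http://example.com/videos/2012/01/video.mpeg -2   # 01_video.mpeg
```

Note that the host name counts as the first part, which is why a depth of 2 drops both example.com and videos.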

videos

There's support for downloading videos from sites like YouTube, Vimeo, and many more. This relies on quvi to find urls to the actual video files.

When you have quvi installed, you can just git annex addurl http://youtube.com/foo and it will detect that it is a video and download the video content for offline viewing.

Later, in another clone of the repository, you can run git annex get on the file and it will also be downloaded with the help of quvi. This works even if the video host has transcoded or otherwise changed the video in the meantime; the assumption is that these video files are equivalent.

There is an annex.quvi-options configuration setting that can be used to pass parameters to quvi. For example, you could set git config annex.quvi-options "--format low" to configure it to download low quality videos from YouTube.

Note that for performance reasons, the url is not checked for redirects, so some shortened urls will not be detected. You can either load the short url in a browser to get the full url, or you can force use of quvi with redirect detection, by prepending "quvi:" to the url.

To download a youtube playlist, you need to find the feed associated with that playlist, and pass it to git annex importfeed. There does not seem to be an easy link anywhere to get the feed, but you can construct its url manually. For a playlist like "https://www.youtube.com/playlist?list=PL4F80C7D2DC8D9B6C", the feed is "http://gdata.youtube.com/feeds/api/playlists/PL4F80C7D2DC8D9B6C"
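Constructing the feed url by hand is easy to get wrong, so here is a small sketch that derives it from a playlist url using the pattern given above. The function name playlist_feed_url is hypothetical, and the sketch only rewrites the string; it does not check that the feed endpoint is still being served.

```shell
# Hypothetical helper: build the feed url for a playlist url by
# following the pattern described in the text.
playlist_feed_url() (
    case $1 in
        *list=*) ;;
        *) echo "no list= parameter in url: $1" >&2; exit 1 ;;
    esac
    id=${1#*list=}                  # everything after "list="
    id=${id%%&*}                    # drop any further query parameters
    printf 'http://gdata.youtube.com/feeds/api/playlists/%s\n' "$id"
)

playlist_feed_url "https://www.youtube.com/playlist?list=PL4F80C7D2DC8D9B6C"
# http://gdata.youtube.com/feeds/api/playlists/PL4F80C7D2DC8D9B6C
```

The result can be passed straight to git annex importfeed, for example with command substitution.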

More details about youtube feeds at http://googlesystem.blogspot.com/2008/01/youtube-feeds.html -- git-annex importfeed should handle all of them.

bittorrent

The bittorrent special remote lets git-annex also download the content of torrent files and of magnet links to torrents.

You can simply pass the url to a torrent to git annex addurl the same as any other url.

You have to have aria2 and bittornado (or the original bittorrent) installed for this to work.

podcasts

This is done using git annex importfeed. See downloading podcasts.

There are resources that I want to add to my annex that are currently available via a URL, but it seems like if I add these using git-annex addurl, they get symlinked to a file in the annex/objects directory that starts with URL-..., instead of the more typical SHA256-..., and this does not change even after the files are downloaded.

My concern is that I really want to ensure that these files don't change, which is the appeal of content-addressable symlinking of normal files (as opposed to URL addressable ones).

Would there be a way to automate the injection of hash-based symlinking for files that are added via addurl? Sometimes I add a bunch of files via addurl --fast, and after I've downloaded them via get, it would be nice for those files to have the same level of data integrity as when I download them using something outside of git-annex, add them to the annex, and do an addurl --file afterward.

Thanks for all of your hard work!

addurl only uses the URL- keys if you run it with --fast. Otherwise it downloads the content and hashes it the same as add does.

If you use --fast, you can go back and git annex migrate the file once it's been downloaded, to convert it to the SHA backend.

Comment by http://joeyh.name/ Thu Sep 20 21:55:57 2012

is there a way to remove one of the urls? e.g. if I have

$> git annex whereis fail2ban_logo.png
whereis fail2ban_logo.png (1 copy) 
    00000000-0000-0000-0000-000000000001 -- web

  web: http://www.fail2ban.org/fail2ban_logo.png
  web: http://www.onerussian.com/tmp/statsmodes.png
ok

and would like to remove the fail2ban.org one... ?

You can use git annex rmurl $file $url, which I just added to git-annex.

(Also, git annex drop $file --from web will remove all the urls.)

Comment by http://joeyh.name/ Mon Apr 22 21:28:03 2013

Adding videos from youtube ends up with it using the URL backend, even without fast.

$ git init quvitest
$ cd quvitest/
$ git annex init
$ git annex addurl https://www.youtube.com/watch?v=mghhLqu31cQ
(... file is downloaded ...)
$ find .git/annex/objects/ -type f
.git/annex/objects/1J/Wp/URL--quvi&chttps&c%%www.youtube.com%watch,63v,61mghhLqu31cQ/URL--quvi&chttps&c%%www.youtube.com%watch,63v,61mghhLqu31cQ

Is migrating manually required or should I log a bug?

Comment by Xyem Fri Apr 4 15:25:39 2014
Using the URL backend for youtube is intentional. Youtube may serve up different encodings for the same video over time, and this way git-annex treats them all as equivalent. If you want to "freeze" the repository to the current encoding, use git annex migrate, and be prepared for git annex get --from web to not work long term.
Comment by http://joeyh.name/ Mon Apr 7 20:07:45 2014
Is there a way to change the default pathdepth so I do not need to add --pathdepth=-1 every time I run addurl?

Hi!

I have a somewhat interesting use case. My course notes require HTTP authentication. This is possible with wget, but is there any way to make git annex do it?

wget authentication stuff!

It would be nice to have the user and pass encrypted with GPG too. This might be a strange use case, but I can see other people wanting to do something like this in the future.

Thanks!

For urls using http basic auth, you can use the standard url form, http://username:password@example.org/url/ , which should work with git annex addurl. The url, including the password, will be stored in the git-annex branch though. If you want to protect the password from being exposed to anyone who gets a clone of the repository, just download manually, and then git annex add the file.
Comment by http://joeyh.name/ Tue Sep 30 18:09:04 2014
Once a file has been added with either addurl or importfeed, how can I get the URL of the file or feed from git-annex?
Comment by thnetos Thu Dec 18 18:56:57 2014
You can see the url(s) of a file when you run git annex whereis $file
Comment by joey Thu Dec 18 19:59:54 2014
Comments on this page are closed.