git-annex can transfer data to and from configured git remotes. Usually those remotes are ordinary git repositories (bare or non-bare, local or remote) that store the file contents in their own git-annex directory.

git-annex also extends git's concept of remotes with special types of remotes. git-annex can use these just like any normal remote, but other git commands cannot use them.

The above special remotes can be used to tie git-annex into many cloud services. Here are specific instructions for various cloud things:

Unused content on special remotes

Over time, special remotes can accumulate file content that is no longer referred to by files in git. Normally, unused content in the current repository is found by running git annex unused. To detect unused content on special remotes, instead use git annex unused --from. Example:

$ git annex unused --from mys3
unused mys3 (checking for unused data...) 
  Some annexed data on mys3 is not used by any files in this repository.
    NUMBER  KEY
    1       WORM-s3-m1301674316--foo
  (To see where data was previously used, try: git log --stat -S'KEY')
  (To remove unwanted data: git-annex dropunused --from mys3 NUMBER)
$ git annex dropunused --from mys3 1
dropunused 12948 (from mys3...) ok
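
When a remote has accumulated many unused keys, it can help to pull just the numbers and keys out of that listing. The following is a minimal sketch, not part of git-annex: list_unused is a hypothetical helper name, and it assumes the two-column NUMBER KEY layout shown above, keys that contain no whitespace, and that the listing is written to standard output.

```shell
# Hypothetical helper (not a git-annex command): read the output of
# `git annex unused` on stdin and print one "NUMBER KEY" pair per line.
# Assumes the two-column table shown above and keys without whitespace.
list_unused() {
    awk 'NF == 2 && $1 ~ /^[0-9]+$/ { print $1, $2 }'
}

# Example use, assuming a special remote named mys3 as above:
#   git annex unused --from mys3 | list_unused
```

The numbers it prints are the same ones that git annex dropunused --from expects, so the output can be reviewed before anything is dropped.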

Thanks for this great tool! I was wondering what the differences are between using type=directory, type=rsync, or a bare git repo for directories?

I guess I can't use just a regular repo because my USB drive is formatted as vfat -- which threw me for a loop the first time I heard about git-annex about a year ago, because I followed the walkthrough, it didn't work as expected, and I gave up (now I know it was just a case of PEBKAC). It might be worth adding a note about vfat to the "Adding a remote" section of the walkthrough, since the unstated assumption there is that the USB drive is formatted as a filesystem that supports symlinks.

Thanks again, my scientific data management just got a lot more sane!

The directory and rsync special remotes intentionally use the same layout. So the same directory could be set up as both types of special remotes.

The main reason to use either of these rather than a bare git repo is that they support encryption.

Comment by http://joeyh.name/ Mon Jun 25 15:29:29 2012
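
To make the reply above concrete for the vfat USB drive case, here is a minimal sketch of setting up a directory special remote with encryption enabled. The function name add_directory_remote, the remote name usbdrive and the path /media/usb/annex are made up for illustration; encryption=shared is one of the encryption settings git-annex offers and is chosen here only as an example.

```shell
# Hypothetical wrapper; run it from inside an existing git-annex repository.
# $1: name for the new special remote
# $2: directory that will hold the file contents
add_directory_remote() {
    git annex initremote "$1" type=directory directory="$2" encryption=shared
}

# Example use (remote name and mount point are assumptions):
#   add_directory_remote usbdrive /media/usb/annex
#   git annex copy --to usbdrive somefile
```

Content stored this way is encrypted on the drive, which is the advantage over a bare git repo mentioned in the reply.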
Just noting that the environment variables ANNEX_S3_ACCESS_KEY_ID and ANNEX_S3_SECRET_ACCESS_KEY seem to have been changed to AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
Thanks! Being new here, I didn't want to overstep my boundaries. I've gone ahead and made a small edit and will do so elsewhere as needed.
Thanks, I've fixed that. (You could have too.. this is a wiki ;)
Comment by http://joeyh.name/ Tue May 29 19:10:46 2012