Recent comments posted to this site:
Joey, thanks for your quick help! I'll try the manual haskell-platform install once I have quicker internet again, i.e. tomorrow.
And sorry for the mess-up; I split the post into two. Hope it's clearer now.
The directory and rsync special remotes intentionally use the same layout. So the same directory could be set up as both types of special remotes.
The main reason to use this rather than a bare git repo is that it supports encryption.
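For illustration, a sketch of pointing both special remote types at one directory, as described above (the remote names and path here are made up; encryption=shared is just one of the available encryption settings):

```shell
# Hypothetical names and paths: because the directory and rsync special
# remotes use the same layout, both can be set up over one directory.
git annex initremote usbdir type=directory directory=/media/usb/annex encryption=shared
git annex initremote usbrsync type=rsync rsyncurl=/media/usb/annex encryption=shared
```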
Thanks for this great tool! I was wondering what the differences are between using type=directory, type=rsync, or a bare git repo for directories?
I guess I can't use just a regular repo because my USB drive is formatted as vfat -- which threw me for a loop the first time I heard about git-annex about a year ago, because I followed the walkthrough, it didn't work as expected, and I gave up (now I know it was just a case of PEBKAC). It might be worth adding a note about vfat to the "Adding a remote" section of the walkthrough, since the unstated assumption there is that the USB drive is formatted with a filesystem that supports symlinks.
Thanks again, my scientific data management just got a lot more sane!
Use du -L for the disk space used locally. The other number is not currently available, but it would be nice to have. I also sometimes would like data on how much each backend is used, so making this a git annex status --subdir is tempting. Unfortunately, its current implementation scans .git/annex/objects and not the disk tree (which is better for accurate numbers due to copies), so it would not be a very easy thing to add. Not massively hard, but not something I can pound out before I start work today.
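To see why the -L matters, here is a self-contained sketch (the paths are made up) mimicking an annexed tree, where files are symlinks to the real content, so plain du counts only the tiny symlinks:

```shell
# Made-up layout standing in for .git/annex/objects: real content in
# one directory, an annex-style symlink pointing at it.
mkdir -p demo/objects
head -c 1048576 /dev/zero > demo/objects/key    # 1 MiB of real content
ln -s objects/key demo/file                     # annex-style symlink
du -sk  demo/file    # just the symlink: (nearly) nothing
du -skL demo/file    # dereferenced: about 1024 KiB
```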
Sure, you can simply:
cp annexedfile ~
Or just attach the file right from the git repository to an email, like any other file. Should work fine.
If you wanted to copy a whole directory to export, you'd need to use the -L flag to make cp follow the symlinks and copy the real contents:
cp -r -L annexeddirectory /media/usbdrive/
Running git checkout by hand is fine, of course.
The underlying problem is that git has some O(N) scalability of operations on the index with regard to the number of files in the repo. So a repo with a whole lot of files will have a big index, and any operation that changes the index, like the git reset this needs to do, has to read in the entire index and write out a new, modified version. It seems that git could be much smarter about its index data structures here, but I confess I don't understand the index's data structures at all. I hope someone takes it on, as git's scalability to the number of files in the repo is becoming a new pain point, now that scalability to large files is "solved". ;)
Still, it is possible to speed this up at git-annex's level. Rather than doing a git reset followed by a git checkout, it can just git checkout HEAD -- file, and since that's one command, it can then be fed into the queueing machinery in git-annex (which exists mostly to work around this git malfeasance), so only a single git command needs to be run to lock multiple files.
I've just implemented the above. In my music repo, this changed locking a CD's worth of files from taking ctrl-c long to 1.75 seconds. Enjoy!
(Hey, this even speeds up the one-file case greatly, since git reset -- file is slooooow -- it seems to scan the entire repository tree. Yipes.)
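The single-command form described above can be sketched in a throwaway repo (names here are invented for illustration): one git checkout HEAD -- file restores the committed version, with no separate git reset step.

```shell
# Throwaway repo demonstrating `git checkout HEAD -- file`:
# it restores the committed content of one file in a single command.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
echo original > file
git add file
git commit -qm 'add file'
echo modified > file
git checkout HEAD -- file
cat file    # back to "original"
```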
This is now about a different build failure than the bug you reported, which was already fixed. Conflating the two is just confusing.
The error message about syb is because, by using cabal-install on an Ubuntu system from 2010, you're mixing the very old versions of some haskell libraries in Ubuntu with the new versions cabal wants to install. The solution is to stop mixing two package management systems: apt-get remove ghc, then manually install a current version of The Haskell Platform and use cabal.