Recent comments posted to this site:
The expression format in the file is the same as for preferred content, but the latter can be set through the command line (git-annex wanted REPO EXP), whereas I can't find any way to set required content through the command line. Is there a way to do this, is it in the works, or is it not planned?
Thanks!
If I understand it correctly, 20PB split into 2400 shards of 8TB each, with 3 copies, needs 24TB per shard; at 1TB/client that is 24 clients per shard, or 2400*24 = ~60K clients, assuming no churn. So it would probably need ~100K clients to cover the churn and have a good chance that each shard had 3 copies at all times. That's 1/3 the size of BOINC's active population.
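The client count above can be reproduced with a few lines of arithmetic. This is only a sketch of the back-of-envelope calculation, under the same assumptions stated in the comment (fixed 1TB per client, no churn); the function name is made up for illustration.

```python
import math

def clients_needed(total_tb: float, shard_tb: float, copies: int, tb_per_client: float) -> int:
    """Clients needed to hold `copies` replicas of every shard, ignoring churn."""
    shards = math.ceil(total_tb / shard_tb)
    tb_per_shard = shard_tb * copies  # storage each shard needs across all its replicas
    clients_per_shard = math.ceil(tb_per_shard / tb_per_client)
    return shards * clients_per_shard

# 2400 shards of 8TB (19.2PB, i.e. roughly 20PB), 3 copies, 1TB per client
print(clients_needed(total_tb=2400 * 8, shard_tb=8, copies=3, tb_per_client=1))  # 57600
```

The ~100K figure is the same number with headroom for clients dropping out; the sketch does not try to model churn.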
It would take time to scale to that population. And it would take time to get three copies out of the Archive. During that time, the Archive is growing. The back of my envelope says that doing this in 2.5 years roughly doubles the Archive's outbound bandwidth if you average it across the 2.5 years. But the population would grow slowly to start with, then faster, so the bandwidth impact would be back-loaded. And by the end of the 2.5 years, you would need a lot more than 100K users.
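To give a sense of scale, here is a rough sketch of the average outbound rate needed to ship three full copies of a 20PB archive in 2.5 years. It assumes decimal petabytes and a flat average. It ignores the archive's growth during the period and the back-loading described above. It also cannot check the "roughly doubles" claim, because the Archive's current outbound bandwidth is not given here.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def avg_egress_gbps(archive_pb: float, copies: int, years: float) -> float:
    """Flat average outbound rate (Gbit/s) to ship `copies` full copies in `years`."""
    total_bits = archive_pb * 1e15 * 8 * copies  # decimal petabytes -> bits
    return total_bits / (years * SECONDS_PER_YEAR) / 1e9

print(f"{avg_egress_gbps(20, 3, 2.5):.1f} Gbit/s")
```

Because the client population would ramp up slowly, the peak rate near the end would be well above this average.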
A design that used erasure coding or entanglement would reduce the storage and bandwidth demand considerably while providing adequate reliability.
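A minimal sketch of why erasure coding helps follows. It compares 3x replication with a hypothetical (k=10, n=20) erasure code, which needs any 10 of 20 fragments to rebuild a shard. It assumes each client is independently online with a hypothetical probability of 0.8. It ignores repair traffic and does not model entanglement codes at all.

```python
from math import comb

def p_available_replication(p_up: float, copies: int) -> float:
    # Shard is readable if at least one replica's client is online.
    return 1 - (1 - p_up) ** copies

def p_available_erasure(p_up: float, k: int, n: int) -> float:
    # Shard is readable if at least k of its n fragments are on online clients
    # (binomial tail, assuming independent client availability).
    return sum(comb(n, i) * p_up**i * (1 - p_up) ** (n - i) for i in range(k, n + 1))

p = 0.8  # hypothetical per-client availability
print("3x replication: storage 3.0x, availability", round(p_available_replication(p, 3), 4))
print("(10,20) code:   storage 2.0x, availability", round(p_available_erasure(p, 10, 20), 4))
```

Under these assumptions, the erasure-coded layout stores a third less data and is also more likely to be readable. Each client only holds a fragment of a shard, so bandwidth per client also drops.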
I have a gcrypt special remote encrypted in hybrid mode. When I try to add a keyid using:
git annex enableremote myremote keyid+=XXXXXXXX
I get this error:
enableremote myremote (encryption update) (hybrid cipher with gpg keys XXXXXXXX XXXXXXX) fatal: remote myremote already exists.
git-annex: git [Params "remote add",Param "myremote",Param "gcrypt::XXXXXXXXXXX:gcrypt-tests"] failed
This is my git-annex version info:
git-annex version: 5.20141125
build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV Inotify DBus DesktopNotify XMPP DNS Feeds Quvi TDFA CryptoHash
key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SKEIN256E SKEIN512E SHA256 SHA1 SHA512 SHA224 SHA384 SKEIN256 SKEIN512 WORM URL
remote types: git gcrypt S3 bup directory rsync web webdav tahoe glacier ddar hook external
local repository version: 5
supported repository version: 5
upgrade supported from repository versions: 0 1 2 4
Am I doing something wrong? Thank you, Giovanni
And I would still maintain my view that removing the intermediate directory within .git/annex/objects, whose current role is simply to provide read-only protection, might halve the burden on the underlying file system, whether there are many annex repos or a single one [1]. A lean view [2] could also be of good use. Similar exercises with simulated annexes with >5M files also "helped" to identify problems with ZoL (ZFS on Linux) caching. This suggests that even merely handling such vast arrays of tiny files (as dead symlinks) gives filesystems a good test, so the leaner the impact, the better.
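The "halve" estimate comes from counting filesystem entries per key. In the objects/<hash1>/<hash2>/<key>/<key> layout, every key costs a directory plus a file. Dropping the intermediate <key>/ directory leaves just the file. The sketch below makes that count explicit. The number of hash directories is a hypothetical parameter, not git-annex's actual fan-out, and the sketch ignores symlinks in the work tree.

```python
def fs_entries(n_keys: int, hash_dirs: int, per_key_dir: bool) -> int:
    """Rough count of directory entries under .git/annex/objects.

    hash_dirs: total number of hash directories across both levels (hypothetical).
    per_key_dir: True for the <key>/<key> layout, False without the <key>/ directory.
    """
    per_key = 2 if per_key_dir else 1  # <key>/ dir + <key> file, or just the file
    return hash_dirs + n_keys * per_key

n = 5_000_000
hash_dirs = 1024 + 1024 * 1024  # hypothetical: 1024 top-level dirs, 1024 under each, all populated
ratio = fs_entries(n, hash_dirs, True) / fs_entries(n, hash_dirs, False)
print(f"{ratio:.2f}x fewer entries without the intermediate directory")
```

The ratio approaches 2 as the number of keys grows relative to the number of hash directories. This is where the "might halve" estimate comes from.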
[1] e.g. https://github.com/datalad/datalad/issues/32#issuecomment-70523036
[2] https://github.com/datalad/datalad/issues/25