Riding the Flow

musings from the shire

My PC configuration uses a single large SSD to store all data (currently a 512GB Crucial M4). This gives the best performance along with the convenience of a single drive.
However, it also makes regular backups even more important, since SSDs are generally less reliable than conventional magnetic drives.
To back up data, I use a 2.5" hard drive enclosure connected over an eSATAp (Powered eSATA) interface. This way only a single cable is required, while still providing maximum speed and transparency.

Previously I used the backup built into Windows 7 (System Image), protected with BitLocker To Go encryption. But recently I was badly burned by this setup when my SSD developed a single bad sector. That alone wasn't too bad, but Windows Backup failed to back up the affected data and did not report a fault; instead it aborted the backup early while reporting "success". This became apparent only when I tried to re-image the SSD from the corrupt backup, which forced a roll-back to a previous backup that was more than a month older. At least I managed to get access to the older one, although that wasn't very user-friendly either: Windows Backup stores previous backup versions inside hidden Shadow Copies instead of normal files, and it can also purge old ones quite unpredictably.

So I started to look for a more professional and transparent backup solution, and came across quite a large number of alternatives. I've tested them in my configuration (backing up a fast SATA3 SSD onto a large but not-so-fast external HDD).

CityHash64 and Finding Collision Samples for it

CityHash is a modern family of hash functions for strings (variable-length arrays) that offers some of the best performance and quality available, while requiring only standard 64-bit instructions for optimal performance.
This makes it a very useful general-purpose hash. In particular, CityHash64 produces a 64-bit value, which makes it the primary function of choice on 64-bit platforms.
For some tests (e.g. of containers), it's useful to know pairs of strings that produce the same hash value (a collision). Finding collisions for a well-designed hash function can be difficult, since brute force will most likely be the most viable approach.
I have recently calculated some collisions for CityHash64 using a multi-threaded birthday attack algorithm, so I provide two such pairs here for your convenience:

  • StrA="oH]pPZccPmOEHjBW" StrB="vm`sd|obCXIKJ}aE" CityHash64=DCBE2B1930540000
  • StrA="?];lLLK:R_@XddTg" StrB="bgW^Wi]IkQgDM3WW" CityHash64=9E75C019D3D94BB7
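
For readers unfamiliar with the birthday attack idea, here is a minimal single-threaded sketch in Python. It is not the code I used for the pairs above. To keep it runnable in seconds with only the standard library, it attacks a stand-in hash (BLAKE2b truncated to 32 bits) rather than CityHash64. The names toy_hash and find_collision, the alphabet, and the 16-character string length are all assumptions made for illustration. The principle is the same for a 64-bit hash: keep every hash seen so far in a table and stop as soon as a new string lands on an existing entry. For a 64-bit output that takes on the order of 2^32 (several billion) hashes, which is why real runs need multiple threads and a much more memory-conscious table.

```python
import hashlib
import random
import string

ALPHABET = string.ascii_letters + string.digits
HASH_BYTES = 4  # 32-bit stand-in so the search finishes quickly; CityHash64 is 8 bytes


def toy_hash(s: str) -> int:
    """Stand-in hash (assumption): BLAKE2b truncated to HASH_BYTES. NOT CityHash64."""
    digest = hashlib.blake2b(s.encode("ascii"), digest_size=HASH_BYTES).digest()
    return int.from_bytes(digest, "big")


def find_collision(rng: random.Random, length: int = 16):
    """Birthday search: remember every hash seen until two different strings share one."""
    seen = {}  # hash value -> string that produced it
    while True:
        s = "".join(rng.choices(ALPHABET, k=length))
        h = toy_hash(s)
        other = seen.get(h)
        if other is not None and other != s:
            return other, s, h
        seen[h] = s


if __name__ == "__main__":
    a, b, h = find_collision(random.Random(1))
    print(f'StrA="{a}" StrB="{b}" hash={h:08X}')
```

With a 32-bit hash the loop typically needs on the order of 2^16 to 2^17 random strings before it hits a collision. Swapping toy_hash for a real CityHash64 binding would leave the search logic unchanged, but the dictionary would then have to hold billions of entries, so a practical version needs a more compact table or a memory-free cycle-finding variant.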