v0.4 : 20090115 Soon after the submission deadline, I noticed that I'd made a mistake with regards to how the principle key is dervied (I published notification of the error on my website on the 5th Nov). The compression function accepts input blocks in the size of 4096-bits. From this input block, a 2048-bit principle key is dervied which is the main key used in the state update. As part of the derivation of the principle key, a 512-bit data independent preliminary key was used which meant the same input message would create a different principle key if it was used at a different input block index. In the version submitted to NIST before the 31st October 2008 deadline, a single preliminary key was used for both 2048-bit "halves" of the 4096-bit message input; this meant that the two blocks could be interchanged and thus trivially creating a collision. The error originated because I had been developing two versions of the hash function before the submission date, one operating on single 2048-bit input blocks and the other using a multi-tier tree structure (for the key schedule only). The differences only affected how the principle key was dervied at, with the remainder of the algorithm identical. The single 2048-bit input block processing was too slow, and while the tree structure should have theoretically been fairly fast the code to implement it was too complex and slow. The tree structure had used a scheme to prevent block interchange between any of the tree nodes. Fairly near to the deadline, I had decided that a double-block processing would be a good design compromise and it performed as expected, unfortunatly I had neglected to incorporate the block interchange protection that I originally had used in the tree structure - hence the problem. In this new version, the error has been remedied by doing the following: - Creating left, right and combined preliminary keys. - The left preliminary key is used when processing the left half of the input block, while the right preliminary key used in processing the right half. - The combined preliminary key is the xor sum of the left and right preliminary keys and is used in the "extract" functions (previously, "the" preliminary key was used here). - The actual scheme for deriving the principle key from the input block with the preliminary keys has been changed slightly to ensure that it is not possible for an attacker to "switch" the left and right preliminary keys by careful modification of the input message blocks. This is the only change to the actual algorithm and even then only affects how the principle key is arrived at. The round key extraction, permutation matrix generation and state update are not affected. There are a few implementation errors corrected where the reference implementation was buggy or didn't properly follow the specification. There are also a large number of typographical errors corrected in the specification. The performance timings have also been recalculated, as have the KAT and intermediate value results.