https://github.com/cr-marcstevens/sha1collisiondetection/blob/master/lib/sha1.c
>>59095583
>[code]#define SHA1_RECOMPRESSION_SIMD(t) sha1recompress_fast_ ## t ## _avx256[/code]
Go ahead, write a for loop for that in c and see what comes out. I'll wait.
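For anyone wondering why a for loop can't do this: `##` pastes tokens at preprocessing time, so `t` has to be a literal token in the source, not a runtime loop variable. A minimal sketch of the difference, assuming made-up names (`recompress_58`, `recompress_65`, `RECOMPRESS`, `recompress_table` and `run_all` are hypothetical and not taken from sha1.c). The loop-friendly alternative is a table of function pointers:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-step functions in the real library. */
static int recompress_58(int x) { return x + 58; }
static int recompress_65(int x) { return x + 65; }

/* ## pastes tokens before compilation: RECOMPRESS(58) becomes recompress_58. */
#define RECOMPRESS(t) recompress_ ## t

/* A loop variable is not a token, so a loop needs a table of pointers. */
typedef int (*recompress_fn)(int);
static const recompress_fn recompress_table[] = { recompress_58, recompress_65 };

/* Call every table entry in turn and add up the results. */
static int run_all(int x)
{
    int sum = 0;
    size_t count = sizeof recompress_table / sizeof recompress_table[0];
    for (size_t i = 0; i < count; i++) {
        assert(recompress_table[i] != NULL);
        sum += recompress_table[i](x);
    }
    return sum;
}
```

`RECOMPRESS(58)(1)` resolves to a direct call at compile time, while `run_all(1)` goes through the pointers at run time. The indirect call is the price of being able to loop.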
>>59095583
there surely must be a reason for that
Manual loop unrolling seems to be a crypto programmer meme. I wish I knew why they did it. It could be because they are mathematicians first and programmers second. It could be because they don't know about -funroll-all-loops. There could be a legitimate reason.
>>59095642
Probably because they need deterministic timing and can't rely on what they only think the compiler is doing.
Or maybe they only want some loops unrolled. Or maybe it needs to work with a variety of compilers.
>>59095673
I guess that's probably it, but I thought loops with fixed bounds had deterministic timing anyway. It definitely shouldn't be secret-dependent, not that it matters for this particular program. If the compiler really decides not to unroll the loop with -funroll-all-loops, maybe it's making the right decision and it's trying to save space in the instruction cache.
Do they benchmark it at least? My guess is that they probably don't, since you see macros used for loop unrolling in almost all crypto code. They probably just do it by habit.
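To make the comparison concrete, here is a rough sketch of the two styles being argued about: a fixed-bound loop, and the same thing unrolled by hand with a macro. `ROUND`, `mix_rolled` and `mix_unrolled` are made up for illustration and are not the real SHA-1 rounds. Both must return the same value; whether the unrolled one is actually faster is something only a benchmark on your own compiler and flags (for example gcc -O2 with and without -funroll-loops) can tell you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy round (hypothetical): rotate a left by 5 bits, then add the word w. */
#define ROUND(a, w) ((a) = (((a) << 5) | ((a) >> 27)) + (w))

/* Fixed-bound loop: the compiler may or may not unroll this. */
static uint32_t mix_rolled(uint32_t a, const uint32_t w[4])
{
    assert(w != NULL);
    for (size_t i = 0; i < 4; i++)
        ROUND(a, w[i]);
    return a;
}

/* Same four rounds written out by hand, the way crypto code often does it. */
static uint32_t mix_unrolled(uint32_t a, const uint32_t w[4])
{
    assert(w != NULL);
    ROUND(a, w[0]);
    ROUND(a, w[1]);
    ROUND(a, w[2]);
    ROUND(a, w[3]);
    return a;
}
```

Neither version branches on the data, so with a fixed bound the timing question comes down to what the compiler emits, not to which style you typed.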
>>59095739
I'm sure he didn't dive into writing asm for every platform. It probably started all in c and was expanded to asm later.
>>59095583
>>59095884
Wtf is trying to be accomplished here.
When I was first learning crypto, I read this.
https://cr.yp.to/mac.html
Still my favorite, he must have spent so much time autism optimizing it. It's like he's writing assembly in C, a lot of the C files look like this:
register uint32 x0;
register uint32 x1;
register uint32 x2;
register uint32 x3;
register uint32 y0;
register uint32 y1;
register uint32 y2;
register uint32 y3;
register uint32 byte0;
register uint32 byte1;
register uint32 byte2;
register uint32 byte3;
register uint32 e;
register uint32 p00;
register uint32 p01;
...
byte0 = 0xff;
byte1 = 0xff00;
byte2 = 0xff0000;
byte3 = 0xff000000;
loop4 = -36;
k30 = *(uchar *) (k + 15);
k31 = *(uchar *) (k + 14);
k32 = *(uchar *) (k + 13);
k33 = *(uchar *) (k + 12);
k31 <<= 8;
k32 <<= 16;
k33 <<= 24;
x3 = k30 ^ k31;
x3 ^= k32;
x3 ^= k33;
k00 = *(uchar *) (k + 3);
k01 = *(uchar *) (k + 2);
k02 = *(uchar *) (k + 1);
k03 = *(uchar *) (k + 0);
k01 <<= 8;
k02 <<= 16;
k03 <<= 24;
x0 = k00 ^ k01;
x0 ^= k02;
x0 ^= k03;
k10 = *(uchar *) (k + 7);
k11 = *(uchar *) (k + 6);
k12 = *(uchar *) (k + 5);
k13 = *(uchar *) (k + 4);
k11 <<= 8;
k12 <<= 16;
k13 <<= 24;
x1 = k10 ^ k11;
x1 ^= k12;
x1 ^= k13;
...
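For reference, each of those blocks assembles one 32-bit word from four key bytes, most significant byte first (x0 from k[0..3], x1 from k[4..7], x3 from k[12..15]). A rough sketch of the same loads written as a helper plus a loop. `load_be32` and `load_key_words` are hypothetical names, the x2 load is assumed to follow the same pattern since it is cut from the quote, and `|` is used instead of `^`, which gives the same result because the shifted bytes never overlap:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit word from 4 bytes, p[0] being the most significant. */
static uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Fill x[0..3] from 16 key bytes; x[0], x[1], x[3] match x0, x1, x3 in the
 * quoted code, and x[2] is assumed to be loaded the same way. */
static void load_key_words(uint32_t x[4], const unsigned char k[16])
{
    for (size_t i = 0; i < 4; i++)
        x[i] = load_be32(k + 4 * i);
}
```

Whether a given compiler turns this into code as tight as the hand-scheduled version is exactly the thing that would need benchmarking.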
>>59096115
Wow. Unless he benchmarked it, that's some advanced bullshit. The compiler is free to totally ignore the register keyword. If you try to reserve 15 registers, the register allocator is free to just say 'fuck you.'
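For what it's worth, the only thing standard C enforces about `register` is that you can't take the variable's address; where the value actually lives is up to the allocator. A small sketch (both function names are made up) where the two versions must compute the same thing, and where an optimizing compiler will typically emit the same code for both:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With the hint: legal C, but the compiler may keep these wherever it likes. */
static uint32_t xor_bytes_register(const unsigned char *p, size_t n)
{
    register uint32_t acc = 0;
    register size_t i;
    assert(p != NULL);
    for (i = 0; i < n; i++)
        acc ^= p[i];
    /* Writing &acc here would not compile: that is the one enforced effect. */
    return acc;
}

/* Without the hint: same semantics, allocation left to the compiler. */
static uint32_t xor_bytes_plain(const unsigned char *p, size_t n)
{
    uint32_t acc = 0;
    assert(p != NULL);
    for (size_t i = 0; i < n; i++)
        acc ^= p[i];
    return acc;
}
```

Comparing the assembly of the two (for example with gcc -O2 -S) is a quick way to check what the keyword does on a particular compiler.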
>>59096160
I cut out a lot of them so it would fit in a single post; in this function there were 73 variables. Every single one has the register keyword.
>>59095583
block editing was a mistake.
I can sort of understand why he didn't just use a function pointer to switch between the different functions, but writing a loop would make it so much easier for him.
>>59096210
That's definitely retarded then. Even the PowerPC only has 32 general purpose registers. The 'register' keyword isn't just a 'go fast' option. If anything, it can actually slow down code by preventing the register allocator from doing its job.
I wonder why the prevailing wisdom says that you shouldn't implement your own crypto code when the code written by """"""cryptographers"""""" looks like this. Why do we trust Bernstein again?
>>59096113
It's making a character in floating point space I think. Like a vector of a text.
>>59096113
If I recall correctly, this is from some Minecraft clone (don't ask me how I got there), so >>59096304 sounds about right.