How do Ed5519 keys work?
There are several different implementations of the Ed25519 signature system, and they each use slightly different key formats. While writing python-ed25519, I wanted to validate it against the upstream known-answer-tests, so I had to figure out how to convert those keys into a format that my code could use.
The best reference is the original paper, which explains everything you need to know to implement the basic algorithm. I currently know of three or four implementations (many of which are on the Ed25519 software page):
- a portable C “reference” version
- a 10x faster AMD64-optimized C/assembly version (in two flavors, with different speed/memory tradeoffs)
- an educational (but unusably slow) pure-python module
- my Python binding to the C reference version
djb’s NaCl library (as of version 20110221) includes an older copy prototype of the portable C version, in nacl-20110221/crypto_sign/edwards5519sha512batch/ref/ (as far as I can tell, it doesn’t implement quite the same algorithm: it doesn’t include the pubkey in the hash when computing S. UPDATE: djb told me that nacl-20110221′s code is just a prototype, and will be replaced with the real Ed25519 in the next release). The SUPERCOP benchmark suite includes both the portable C and the two AMD64-specific flavors (in supercop-20110704/crypto_sign/ed25519/, under the ref, amd64-51-30k, and amd64-64-24k subdirectories). The pedagogical pure-python implementation is referenced by the ed25519 web page, but since it takes tens of seconds to compute each signature, it’s really only useful for validating the output of other implementations.
My python-ed25519 library provides Python bindings to the portable C version, using code taken from the SUPERCOP suite.
These different implementations all arrange the keys and signatures slightly differently. NaCl’s goal is to provide safe high-level box/unbox functions, so the only interface it offers will copy the entire message into the output and append a signature to it (not so convenient for building into other protocols). SUPERCOP’s goal is to measure the speed of each operation, so its interface isn’t so convenient for general use either. I’ve tried to give python-ed25519 a good interface for use in other projects.
To remind myself about how these different implementations handle keys, here’s an overview.
Ed25519 keys start life as a 32-byte (256-bit) uniformly random binary seed (e.g. the output of SHA256 on some random input). The seed is then hashed using SHA512, which gets you 64 bytes (512 bits), which is then split into a “left half” (the first 32 bytes) and a “right half”. The left half is massaged into a curve25519 private scalar “a” by setting and clearing a few high/low-order bits. The pubkey is generated by multiplying this secret scalar by “B” (the generator), which yields a 32-byte/256-bit group element “A”.
When signatures are made, two values result: R and S (both 32-bytes, so the overall signature is 64 bytes long). R depends upon the right half of the expanded seed and on the message. (In traditional DSA, R is randomly generated, and the security of the private key depends upon the quality of that randomness, leading to some high-profile failures). S is the real meat of the signature, and is a function of everything (including the pubkey).
Private signing keys can be generated from just the 32-byte seed, but require additional work before you can make signatures (SHA512-based expansion, the bit setting/clearing massage step, and pubkey exponentiation). Likewise public verifying keys can be derived from either the private seed or the private exponent. (note that it’s the multiplications that take the most time: everything else is trivial by comparison).
If you care more about the speed of operations than storage space, you’d want to store the expanded versions. Or, you might want to store as little information as possible, and accept the performance penalty of re-deriving things when necessary. Different implementations choose different tradeoffs.
With that background, here’s what the different implementations do:
- slow pure-python “ed25519.py” implementation, with Known-Answer-Tests in sign.py and sign.input:
- publickey(sk) takes the seed, returns the 32-byte pubkey
- signature() takes the message to be signed, the seed, and the pubkey, returns the 64-byte signature
- checkvalid() takes the signature, the message, and the 32-byte pubkey
- each line of the test data contains four fields (joined by colons):
- seed+pubkey (64 bytes)
- just the pubkey (32 bytes)
- the message (variable length)
- the signature concatenated with the message (R+S+msg)
- NaCl-20110221 implementation:
- keypair() returns two values. The first (signing key) is the private scalar (32 bytes) concatenated with the “right half” (also 32 bytes). The second is the pubkey (32 bytes)
- sign() takes the signing key and message. The NaCl version doesn’t take the pubkey, and thus diverges from the ed25519.py implementation: S=H(r+msg), not H(r+pubkey+msg). ed25519.py matches the paper, as does SUPERCOP, but NaCl does not. sign() returns R+msg+S
- SUPERCOP-20110704 implementation:
- keypair() returns two values. The first (signing key) is the seed (32 bytes) concatenated with the generated pubkey (32 bytes). The second is the pubkey (32 bytes).
- sign() returns R+S+msg
- python-ed25519 (using the SUPERCOP code but changing the output):
- ed25519.create_keypair() returns a SigningKey object and a VerifyingKey object
- SigningKeys can be serialized with to_bytes() (which returns the same 64 bytes as the SUPERCOP code that it wraps: the seed concatenated with the pubkey A), or with to_seed() (which returns just the 32-byte seed).
- VerifyingKeys can be serialized with to_bytes(), which returns the
32-byte pubkey A. - sk.sign() returns a 64-byte signature (the concatenation of R and S) without returning a copy of the message too. (internally, this invokes SUPERCOP’s crypto_sign() function with an output buffer large enough to hold R+S+msg, then discards the message).
This is summarized in the following diagram:
