Date
7 November, 2021
Topics
Speakers
Transcript by
Shourya742 via review.btctranscripts.com
Anyway, like I said, this is a technical talk and we're going to talk about Schnorr signatures and the benefits that it brings. So let's get started.
So here's the agenda. We're going to talk about, we're going to review the signature algorithm that's currently on Bitcoin. This is Elliptic Curve Digital Signature Algorithm or ECDSA. Then we're going to go describe what Schnorr signatures are. And then we're going to go over all of the key and signature aggregation stuff that you can do or some of the benefits that you get out of Schnorr. So let's get started.
Here is the Elliptic Curve Digital Signature Algorithm or ECDSA. And we believe that Satoshi chose ECDSA back in 2009 because Schnorr was under patent. Patent expired a couple of years later. So we are able to use it now without any sort of like legal trouble or possibility thereof. But ECDSA is actually like a little more complicated than Schnorr as you'll see. But anyway, the way it works is using some sort of like a private key, which we're going to call e
. Private key is a scalar. It's a 256 bit number. This is sort of like the secret that you have to keep away from other people. The public key is the resulting point. And if you are not familiar with Elliptic Curve cryptography or finite fields or elliptic curves, I do have a book, Programming Bitcoin, that describes a lot of it and goes through it like step by step. So it's not going to make that much sense unless you know that stuff. But it is eG = P
. P
is the public key. And that public key is actually two numbers. It's a coordinate system P = (x,y)
. An X coordinate and a Y coordinate. And that's kind of important for what we're going to do. All right. So what does ECDSA actually look like? Here's the actual overview of it. The main equation in ECDSA is this: z/s*Gโ + r/s*P โ= (r,y)
. If you haven't seen this before, it's kind of confusing because people don't know what these things stand for. But the thing in front of the capital letters is a scalar of some kind. It's a 256 bit number. And the division that's being done is what you would call group division or field division. It's not the division that you think. And it's actually quite costly on the CPU. All right. But r
is the result committed by r/s*P
. So you're committing to the result that you're going to get. So you evaluate the left side of the equation and you're expecting little r
to come up. But that little r
also shows up right before P
. So you commit to the target that you're going to hit. You're committing to the result that you're going to get by putting it into that equation. z
is the hash of the message. In Bitcoin, this can be any message that you want. But generally, it's a transaction that you are performing. So it's generally on a per input basis. Like, I want this transaction to go through, so I am committing to this particular message. And I am signing this particular message. And that's z/s*G
. And s
is computed by the signer. This is where the division stuff comes in. Which requires the private key e
. So s
cannot be computed without the secret, the private key. And that's the key to how signing works. But that's a general overview. The actual verification algorithm, once again, uses this equation.
And basically, we already know the hash of the message or what message was signed. We already know the public key that's been communicated to us in Bitcoin. That's obvious because it's in the script sig or in the witness field. And we also have the signature. And the signature is a pair,(r,s)
. r
is the committed target, right, where we're going with this equation. And s
is the thing that makes all of the variables work. And the verifier calculates the left side of that equation, z/s*G + r/s*P
. And sees if the result is the value that it was committed to. And the commitment is only for the X-coordinate. It's r
. That's it. And if that R
matches the R
that you used in the left side of the equation, then you know that the signer actually knew the secret. That's the key to what ECDSA is.
The signature algorithm is very much standard. z
is the hash of the message. You start with that. You know the public key. You know the secret. You generate a nonce, a k
. And this nonce stands for number used only once. You're only supposed to use it once. And you compute k*G
, which has that R-coordinate. And you're going to use that R-coordinate along with your secret and the z
and the k
to get this number s
. And that is essentially your signature, r
and s
. And it turns out everything goes through because of this equation,z/s*G + r/s*P
. We know that P = e*G
. So we can factor that out in the second part.z/s*G + r/s*G
. And then you can combine the terms because it's a scalar on front. And you get (z + r*e)/s
. And if you look at the definition of s
, two lines above, you get s = (z + r*e)/k
. And you sub that. And you get k*G
. k*G
, as before, was (r, y)
. So everything sort of just goes through. And you can prove that the target that you committed to is, the result that you committed to, is actually the result that actually happens. And that's how you sign something so that a verifier can know that you actually hold the secret. That's how ECDSA works.
This seems a little backwards. It's because it kind of is. Because Schnorr was created first before ECDSA. And there are lots of signature and verification algorithms. But in order to sort of get around the Schnorr patent, this is what Satoshi kind of had to use. But the big thing to notice about ECDSA is that both signing and verification rely on some sort of field division or group division or whatever you want to call it. It's division. And that is a very intensive computer for any sort of processor. If you study this stuff, the cycles that division uses is a lot higher than almost any other operation. And this is because we're doing finite field math. Again, you can read my book, Programming Bitcoin. It'll describe exactly why. But basically, it's a very large exponent that you have to take it to. The signer commits to just the X-coordinater
. So that means that it doesn't commit to the entire point. So you can't do certain things that would be nice. If it committed to the entire point, then we could do things like batch verification and signature aggregation, which Schnorr allows. But because ECDSA only commits to the x-coordinate, you can't add or subtract or anything like that. Because a lot of optimizations have been made on Bitcoin based on this property that you only need to commit to the x-coordinate. And the y-coordinate can be whatever. And if the x
is valid, then there are two possible y
's every time. So it makes it a little bit harder. The commitments to r
and s
are separate, right? Like r/s*G
, so that it's in a separate term. Then r/s*P
. And the z
is z/s*G
. So they're in separate terms because you have to commit to them separately. There are all sorts of weird attacks that you can do if you commit to them at the same time. So those are some of the weaknesses of ECDSA.
All right, so here's what a Schnorr signature is like. It relies on a cryptographically secure hash function, H
, instead of division. And that's the key. We can use this additional construct called a hashing function, a cryptographically secure hashing function, which has pre-image resistance, and collision resistance. And collision resistance, it's very difficult to find a hash that goes to a certain value. And the signer can commit to multiple things at once, just by hashing them together, right? Like you can hash multiple things and serialize them and add them together or whatever. And that commits you to those values because you can't find the hash pre-image. That's the key. So here's an overview of how that works.
This is the key equation, much like in ECDSA. It's s*G - H(R,P,z)*P = R
. And R
is the result that's actually committed, right? Like you notice that R
is on both sides of the equation. R
is the result that we're going to get, but it's also committed beforehand through that hash. Z
is the hash of the message, and it's also committed in that hash. You can kind of do it at once. And in fact, you can see there that it's not just committing to the result and the hash of the message, but we're also committing to the public key. That's actually, so that you don't do anything weird with it. And s
is computed by the signer, which requires the private key e
. So this is very similar to ECDSA in that regard. You have e
, which is used to compute s
.
So here's what the verification algorithm looks like.s*G - H(R,P,z)*P = R
. That's the equation. z
is the hash of the message, which this verifier knows. P
is the public key, which the verifier knows. The signature is the pair (R, s)
, which the verifier knows. Okay, that's enough to actually run this equation. And the verifier calculates to see if the equation on the left side, s*G - H(R,P,z)*P = R
. And if it does actually go to R
, then they know that whoever signed it knows the secret. That's the key. That's what all verification algorithms essentially do.
The actual signature algorithm is very similar to ECDSA in that the z
is the hash of the message. We know the secret e
, which has the public point P
. You generate a nonce, and you can go look up BIP 340. It has a very specific way to generate the nonce. But you generate some sort of nonce, a number used only once. And you compute the resulting point from k
, and you get R
. And you compute s
based on this formula, k
plus the hash of the committed result, the committed pub key, the committed hash of the message, times the secret. And that definitely requires the private key. The signature, again, is the pair, the resulting point, and s
. And you can see that everything kind of canceled. s*G - H(R,P,z)*P
. And we know that P = e*G
, so we can sort of collect the terms together. We also know what s
is. And you collect all the terms together, H(R,P,z)*e
cancel each other out. So you get k*G
, and k*G
is defined as R
. So you get what you would expect, and that's how the signature algorithm in Schnorr works.
So the key thing here is that both the signing and verification rely on that cryptographically secure hash function H instead of division, right? Like in ECDSA, you need a very expensive operation in order to do it. This just requires a hashing function, which is a lot less expensive for a CPU. And the signer commits to a resulting point, R
, instead of just the X coordinate. And this is what gives us sort of like the magical properties that we're gonna sort of like look at shortly. But it's not just the X coordinate, it's the entire point, right, R
. And that's huge, because, and also the commitments to R
and z
are combined in a single hash instead of having different terms for it. With ECDSA, you had z/s
and then r/s
. So different terms have different commitments. This one, it's just in a single hash. You could commit to all sorts of things in that hash. All right, so we can commit to even more stuff, like the public key that we're using. So what does this all mean? That means that we get certain advantages with Schnorr. We get certain abilities that we didn't have before, because the result is a point. We commit to a point, right? And that means we can do batch verification. We can do signature aggregation, and we can do pub key aggregation. So let's start with batch verification.
So the equation I gave you before is s*G - H(R,P,z)*P = R
. That was it. And you can have lots and lots of those. For a transaction with 30 inputs, you would have 30 of these. Or for a block with thousands of inputs, you can have thousands of these. The key is that because the right side is a point and you're not committing just to the x coordinate, this equation works. You can sum over all of the different i
's, all of the different inputs in a particular transaction, or all of the inputs in a particular block, or even all of the inputs in the entire blockchain, right? Like as long as everything's in Schnorr, you can verify them all at once. And you can check this, and you can move some of the terms around, and you can get this equation at the end. And the key to the efficiency of this particular equation is that the left side is just modulo arithmetic. And you just add up all the s
's. And the right side is a little more expensive, but you get a bunch of point addition in the last term here. The sum from i
to n
of R_i
, which is not that expensive. And then you have point multiplication with an arbitrary point in the middle. Those are the most expensive, but you get them all out of the way at once. So you get a lot more efficiency with batch verification. You can verify 40 transactions at once, right? If they only use Schnorr signatures, then you can do this, because you are committing to a point and not an X coordinate. That's the key, and that's one of the properties that we get out of Schnorr. The actual batch verification, by the way, is a little more complicated. And there are certain attacks that it protects against. So there are all these coefficients in front of R
and stuff like that. You can go look up the recommended way, and I believe in BIP 340. So you can go look that up. But that's the basic idea.
The other thing that we can do is we can do something called signature aggregation. And this is the ability to produce a single signature, (R,s)
, by a group of public keys. You can have lots of public keys, 1 through n, and they can produce a single signature that's essentially signed by all of them. So this is n of n, right? And each public key has a private key, and that's how we denote it, e_i
and P_i
. And L
is sort of like a commitment to this group of public keys. And we can use the same or different hash function. And every single one of those private key holders creates their own nonce, k_i
, and you get the resulting point, R_i
. And R
, the commitment, committed point that we're going to, is going to be the sum of all those points. And finally, c_i
is this hash commitment to L
, which is a commitment to every single one of those pub keys. And this particular pub key, along with R
and z
, which we know is needed as part of the signature algorithm. And the actual s
that makes everything work is k_i + c_i*e_i
, and the little s ends up being the sum of all of the S_i's
. And it turns out that if you do this, s sub, s*G - c_i*P_i
. You can, we know what P_i
is. It's e_i*G
, and we can combine the terms there. And s_i - c_i*e_i
, if you substitute k_i + c_i*e_i
, that ends up canceling out. So you just get k_i*G
, which is R_i
. And the sum of the R_i
's is R
as it is like three lines above. So you end up getting the result that you want. And that's what we call signature aggregation of various signatures. So you get a signature from each member of this pub key group. And you're able to combine them and have a single signature that represents all of them, which is kind of remarkable. And this was, I think, Bellara and Navin, and that's the paper that it comes from. But you can actually do better. Whoops. What you can do is you can go to pub key aggregation.
You can actually aggregate the pub key. So you can produce a single signature by a group of pub keys and represent the single pub key, P
, that satisfies this thing. That's the original equation for Schnorr signatures. And what that means is, essentially, you can't tell if it's multiple keys or not. And that's a huge property of Schnorr signatures. So once again, you have different public keys, you have different secrets. And you commit to the group of pub keys that you're signing with. And each individual private key holder generates their own nonce. And R
, the point that we're committing to, the result that we're committing to, is the sum of all of these points. And the pub key, the aggregated pub key, ends up being this thing, H(L,P_i)*P_i
. So it's like, okay, out of this group of pub keys, it's this particular pub key that's going to sign. And you add all of these pub keys multiplied by these scalars together, and you get capital P
. So then you compute c_i
, which is that hash times the other hash, which is what you want to commit to. And these are the things that are computed by each private key holder, and s
is the sum of these things. And you can work out the math. The actual equation ends up being s*G
minus this thing, because we know the definition of P = H(L,P_i)*P_i
. And this thing, H(R,P,z)*H(L,P_i) = c_i
, as it is two lines above, and you substitute that. And you combine terms and you get this thing. And if you look at the definition of s_i = k_i + c_i*e_i
, and everything cancels except for k_i
. So what you get are R_i
's, and the sum of those is the target that you want. So everything just sort of works, right? Like you get public key aggregation and signature aggregation. So as far as on-chain goes, no one knows whether or not you are five people or one person or 100 people. And that's huge, because it gives you privacy and it gives you sort of this ability to hide the fact this was actually a multi-signal. And this is one of the wonderful things about Schnorr is that because it's committing to a public point, you have this ability to sort of like manipulate things and make all of the math kind of work out.
So here are some observations. Schnorr uses hash functions, which makes it a lot faster, okay? And that's the key, right? Well, that's one of the keys, is that for a full node, when you're running Schnorr, you don't have to do all of this division, which is very CPU intensive. Instead, you can use hash functions, which are a lot faster. The other thing is that Schnorr uses a commit to an elliptic curve point, instead of just the x-coordinate. That means that we can do aggregation and batching. And that means really that you can have many signatures combined into one, and many pub keys combined into one, and on-chain you can't tell the difference. Ultimately, that means that with Schnorr, we're going to get some of these features in wallets, right? Like this ability to have sort of like multi-sig without anyone on-chain knowing about it. And that's what this taproot upgrade sort of allows, is the security around how you're actually spending things. And pretty much if you do two or three multi-sig, what you end up on taproot is three branches of the tap, the mast tree, Merkleized Abstract Syntax Tree, that are all two of two. And they all look like single keys. And as far as anyone on-chain is looking at it, they can't tell if it's one or seven or 20, gives you a lot more privacy, right? And this ability to save some blockchain space, because you only really ever have one signature. I mean, it's possible you can have more, and you can certainly write scripts that are more. But ultimately, if you're designing it correctly, you should be able to get away with one signature that's aggregated rather than multiple separate signatures that you have in ECDSA.
All right, so that's about it. You can follow me on Twitter, at Jimmy Song, and GitHub at Jimmy Song, Medium at Jimmy Song, Substack at Jimmy Song. And yeah, thanks. Do we have a few minutes for questions or anything, or no?
Q&A moderator [Q]: Hey, Jimmy, this is a real question. Somebody asks, could the rogue key attack be mitigated by using commitments instead of key transformations?
Jimmy Song [A]: Yeah, the rogue key attack is an interesting one. And that's essentially when you're generating all the nonces, there's a whole round where you have to commit to it. I think Musig 2 is an algorithm. There's a paper written by Blockstream where you can kinda commit to it some other way in an interesting way. I can't remember the details. I'm sure Andrew Poelstra could tell you if you were interested. But there is a way to mitigate specifically for the rogue key attack. Yeah, thank you. I enjoyed your hieroglyphics that you had up earlier. All LaTeX, yeah. Andrew, you can speak to that if you want. There we go.
Andrew Poelstra [A]: Yeah, I can answer the rogue key attack question. I can explain what a rogue key attack is. I actually don't fully understand the question. So a rogue key attack is like if you use the naive aggregation scheme for combining keys like Jimmy described, then somebody could... If you're trying to add three keys together, say A+B+C
, the person who chooses key C
could do so in a way where they can basically cancel out the other two keys. So if I'm an attacker in a three-party multi-signature, I could wait to hear the other two keys, then I think of a key, A-B
from that, and then present that as my key. And then when you add them all together, you get A+B-A-B+C
, and everything kind of collapses to just C
, my own key. And so everyone thinks that they're in a three-of-three multi-signature, but actually I have all the key material, so I can design by myself. So they are deceived. So the way to avoid this in Musig and in Musig 2 is rather than directly adding keys together, you first re-randomize the keys with a certain hash structure, which just prevents an attacker from being able to solve equations like this. And then the question was whether you could use commitments rather than key tweaking, I think. And I'm not sure what the asker means by that.
Jimmy Song [A]: So I think what the asker was asking was something like the commitment... Musig 1 uses sort of like a hash of the pub key that you have to send out to everybody, and that commits you to the pub key. I think that's the idea. And instead of that, Musig 2 does what you kind of described, which I didn't really get. But yeah, I think that's the mitigation, basically. Yeah. All right. Any other questions? All right. Thanks.
Community-maintained archive to unlocking knowledge from technical bitcoin transcripts