Skip to main content

Appendix A: Cryptography & Encoding

The appendix chapter contains various protocol details.

A.1. Cryptographic Algorithms

A.1.1. Hash Functions

A.1.1.1. BLAKE2

BLAKE2 is a collection of cryptographic hash functions known for their high speed. Their design closely resembles BLAKE which has been a finalist in the SHA-3 competition.

Polkadot is using the Blake2b variant, which is optimized for 64-bit platforms. Unless otherwise specified, the Blake2b hash function with a 256-bit output is used whenever Blake2b is invoked in this document. The detailed specification and sample implementations of all variants of Blake2 hash functions can be found in RFC 7693 (1).

A.1.2. Randomness

info

TBH

A.1.3. VRF

A Verifiable Random Function (VRF) is a mathematical operation that takes some input and produces a random number using a secret key along with a proof of authenticity that this random number was generated using the submitter’s secret key and the given input. Any challenger can verify the proof to ensure the random number generation is valid and has not been tampered with (for example, to the benefit of the submitter).

In Polkadot, VRFs are used for the BABE block production lottery by Block-Production-Lottery and the parachain approval voting mechanism (Section 8.5.). The VRF uses a mechanism similar to algorithms introduced in the following papers:

It essentially generates a deterministic elliptic curve based on Schnorr signature as a verifiable random value. The elliptic curve group used in the VRF function is the Ristretto group specified in:

Definition 181. VRF Proof

The VRF proof proves the correctness of an associated VRF output. The VRF proof, P{P}, is a data structure of the following format:

P=(C,S){P}={\left({C},{S}\right)}
S=(b0,b31){S}={\left({b}_{{0}},\ldots{b}_{{31}}\right)}

where C{C} is the challenge and S{S} is the 32-byte Schnorr poof. Both are expressed as Curve25519 scalars as defined in Definition Definition 182.

Definition 182. DLEQ Prove

The dleq_prove(t,i)\text{dleq\_prove}{\left({t},{i}\right)} function creates a proof for a given input, i{i}, based on the provided transcript, T{T}.

First:

t1=append(t,’proto-name’,’DLEQProof’){t}_{{1}}=\text{append}{\left({t},\text{'proto-name'},\text{'DLEQProof'}\right)}
t2=append(t1,’vrf:h’,i){t}_{{2}}=\text{append}{\left({t}_{{1}},\text{'vrf:h'},{i}\right)}

Then the witness scalar is calculated, sw{s}_{{w}}, where w{w} is the 32-byte secret seed used for nonce generation in the context of sr25519.

t3=meta-AD(t2,’proving00’,more=False)t4=meta-AD(t3,wl,more=True)t5=KEY(t4,w,more=False)t6=meta-AD(t5,’rng’,more=False)t7=KEY(t6,r,more=False)t8=meta-AD(t7,e_(64),more=False)(ϕ,sw)=PRF(t8,more=False)\begin{aligned} t_3 &= \text{meta-AD}(t_2, \text{'proving\\00'}, \text{more=False}) \\ t_4 &= \text{meta-AD}(t_3, w_l, \text{more=True}) \\ t_5 &= \text{KEY}(t_4, w, \text{more=False}) \\ t_6 &= \text{meta-AD}(t_5, \text{'rng'}, \text{more=False}) \\ t_7 &= \text{KEY}(t_6, r, \text{more=False}) \\ t_8 &= \text{meta-AD}(t_7, e\_(64), \text{more=False}) \\ (\phi, s_w) &= \text{PRF}(t_8, \text{more=False}) \end{aligned}

where wl{w}_{{l}} is the length of the witness, encoded as a 32-bit little-endian integer. r{r} is a 32-byte array containing the secret witness scalar.

l1=append(t2,vrf:R=gr,sw)l2=append(l1,vrf:hr,si)l3=append(l2,’vrf:pk’,sp)l4=append(l3,vrf:hsk,vrfo)\begin{aligned} l_1 &= \text{append}(t_2, \text{'}\text{vrf:R=g}^r\text{'}, s_w) \\ l_2 &= \text{append}(l_1, \text{'}\text{vrf:h}^r\text{'}, s_i) \\ l_3 &= \text{append}(l_2, \text{'}\text{vrf:pk}\text{'}, s_p) \\ l_4 &= \text{append}(l_3, \text{'}\text{vrf:h}^{sk}\text{'}, \text{vrf}_{o}) \end{aligned}

where

  • si{s}_{{i}} is the compressed Ristretto point of the scalar input.

  • sp{s}_{{p}} is the compressed Ristretto point of the public key.

  • sw{s}_{{w}} is the compressed Ristretto point of the wittness:

For the 64-byte challenge:

l5=meta-AD(l4,’prove’,more=False){l}_{{5}}=\text{meta-AD}{\left({l}_{{4}},\text{'prove'},\text{more=False}\right)}
l6=meta-AD(l5,e64,more=True){l}_{{6}}=\text{meta-AD}{\left({l}_{{5}},{e}_{{{64}}},\text{more=True}\right)}
C=PRF(l6,more=False){C}=\text{PRF}{\left({l}_{{6}},\text{more=False}\right)}

And the Schnorr proof:

S=sw(Cp){S}={s}_{{w}}-{\left({C}\cdot{p}\right)}

where p{p} is the secret key.

Definition 183. DLEQ Verify

The dleq_verify(i,o,P,pk)\text{dleq\_verify}{\left({i},{o},{P},{p}_{{k}}\right)} function verifiers the VRF input, i{i} against the output, o{o}, with the associated proof (Definition 181) and public key, pk{p}_{{k}}.

t1=append(t,’proto-name’,’DLEQProof’)t2=append(t1,’vrf:h’,si)t3=append(t2,vrf:R=gr,R)t4=append(t3,vrf:hr,H)t5=append(t4,’vrf:pk’,pk)t6=append(t5,vrf:hsk,o)\begin{aligned} t_1 &= \text{append}(t, \text{'}\text{proto-name}\text{'}, \text{'}\text{DLEQProof}\text{'}) \\ t_2 &= \text{append}(t_1, \text{'}\text{vrf:h}\text{'}, s_i) \\ t_3 &= \text{append}(t_2, \text{'}\text{vrf:R=g}^r\text{'}, R) \\ t_4 &= \text{append}(t_3, \text{'}\text{vrf:h}^r\text{'}, H) \\ t_5 &= \text{append}(t_4, \text{'}\text{vrf:pk}\text{'}, p_k) \\ t_6 &= \text{append}(t_5, \text{'}\text{vrf:h}^{sk}\text{'}, o) \end{aligned}

where

  • R{R} is calculated as:

    R=CP×pk+SP+B{R}={C}\in{P}\times{p}_{{k}}+{S}\in{P}+{B}

    where B{B} is the Ristretto basepoint.

  • H{H} is calculated as:

    H=CP×o+SP×i{H}={C}\in{P}\times{o}+{S}\in{P}\times{i}

The challenge is valid if CP{C}\in{P} equals y{y}:

t7=meta-AD(t6,’prove’,more=False){t}_{{7}}=\text{meta-AD}{\left({t}_{{6}},\text{'prove'},\text{more=False}\right)}
t8=meta-AD(t7,e64,more=True){t}_{{8}}=\text{meta-AD}{\left({t}_{{7}},{e}_{{{64}}},\text{more=True}\right)}
y=PRF(t8,more=False){y}=\text{PRF}{\left({t}_{{8}},\text{more=False}\right)}

A.1.3.1. Transcript

A VRF transcript serves as a domain-specific separator of cryptographic protocols and is represented as a mathematical object, as defined by Merlin, which defines how that object is generated and encoded. The usage of the transcript is implementation specific, such as for certain mechanisms in the Availability & Validity chapter (Chapter 8), and is therefore described in more detail in those protocols. The input value used to initiate the transcript is referred to as a context (Definition 184).

Definition 184. VRF Context

The VRF context is a constant byte array used to initiate the VRF transcript. The VRF context is constant for all users of the VRF for the specific context for which the VRF function is used. Context prevents VRF values generated by the same nodes for other purposes to be reused for purposes not meant to. For example, the VRF context for the BABE Production lottery defined in Section 5.2. is set to be "substrate-babe-vrf".

Definition 185. VRF Transcript

A transcript, or VRF transcript, is a STROBE object, obj\text{obj}, as defined in the STROBE documentation, respectively section "5. State of a STROBE object".

obj=(st,pos,posbegin,I0)\text{obj}={\left(\text{st},\text{pos},\text{pos}_{{\text{begin}}},{I}_{{0}}\right)}

where

  • The duplex state, st\text{st}, is a 200-byte array created by the keccak-f1600 sponge function on the initial STROBE state. Specifically, R is of value 166, and X.Y.Z is of value 1.0.2.

  • pos\text{pos} has the initial value of 0.

  • posbegin\text{pos}_{{\text{begin}}} has the initial value of 0.

  • I0{I}_{{0}} has the initial value of 0.

Then, the meta-AD operation (Definition 186) (where more=False) is used to add the protocol label Merlin v1.0 to obj\text{obj} followed by appending (Section A.1.3.1.1.) label dom-step and its corresponding context, ctx{c}{t}{x}, resulting in the final transcript, T{T}.

t=meta-AD(obj,’Merlin v1.0’,False){t}=\text{meta-AD}{\left({o}{b}{j},\text{'Merlin v1.0'},\text{False}\right)}
T=append(t,’dom-step’,ctx){T}=\text{append}{\left({t},\text{'dom-step'},\text{ctx}\right)}

ctx\text{ctx} serves as an arbitrary identifier/separator and its value is defined by the protocol specification individually. This transcript is treated just like a STROBE object, wherein any operations (Definition 186) on it modify the values such as pos\text{pos} and posbegin\text{pos}_{{\text{begin}}}.

Formally, when creating a transcript, we refer to it as Transcript(ctx)\text{Transcript}{\left({c}{t}{x}\right)}.

Definition 186. STROBE Operations

STROBE operations are described in the STROBE specification, respectively section "6. Strobe operations". Operations are indicated by their corresponding bitfield, as described in section "6.2. Operations and flags" and implemented as described in section "7. Implementation of operations"

A.1.3.1.1. Messages

Appending messages, or "data," to the transcript (Definition 185) first requires meta-AD operations for a given label of the messages, including the size of the message, followed by an AD operation on the message itself. The size of the message is a 4-byte, little-endian encoded integer.

T0=meta-AD(T,l,False){T}_{{0}}=\text{meta-AD}{\left({T},{l},\text{False}\right)}
T1=meta-AD(T0,ml,True){T}_{{1}}=\text{meta-AD}{\left({T}_{{0}},{m}_{{l}},\text{True}\right)}
T2=AD(T1,m,False){T}_{{2}}=\text{AD}{\left({T}_{{1}},{m},\text{False}\right)}

where T{T} is the transcript (Definition 185), l{l} is the given label and m{m} the message, respectively ml{m}_{{l}} representing its size. T2{T}_{{2}} is the resulting transcript with the appended data. STROBE operations are described in Definition 186.

Formally, when appending a message, we refer to it as append(T,l,m)\text{append}{\left({T},{l},{m}\right)}.

A.1.4. Cryptographic Keys

Various types of keys are used in Polkadot to prove the identity of the actors involved in the Polkadot Protocols. To improve the security of the users, each key type has its own unique function and must be treated differently, as described in this Section.

Definition 187. Account Key

Account key (ska,pka){\left({s}{k}^{{a}},{p}{k}^{{a}}\right)} is a key pair of type of either of the schemes in the following table:

Table 2. List of the public key scheme that can be used for an account key
Key SchemeDescription
sr25519Schnorr signature on Ristretto compressed ed25519 points as implemented in TODO
ed25519The ed25519 signature complies with (4) except for the verification process which adhere to Ed25519 Zebra variant specified in (5). In short, the signature point is not assumed to be in the prime-ordered subgroup group. As such, the verifier must explicitly clear the cofactor during the course of verifying the signature equation.
secp256k1Only for outgoing transfer transactions.

An account key can be used to sign transactions among other accounts and balance-related functions. Keys defined in Definition 187 and Definition 188 are created and managed by the user independent of the Polkadot implementation. The user notifies the network about the used keys by submitting a transaction.

Definition 188. Stash Key

The Stash key is a type of account that is intended to hold a large amount of funds. As a result, one may actively participate with a stash key, keeping the stash key offline in a secure location. It can also be used to designate a Proxy account to vote in governance proposals.

Controller accounts are deprecated

Controller accounts and controller keys are no longer supported. For more information about the deprecation, see the Polkadot wiki or a more detailed discussion in the Polkadot forum. If you want to know how to set up Stash and Staking Proxy Keys, you can also check thePolkadot wiki The following definition will be removed soon.

Definition 189. Controller Key

The Controller key is a type of account key that acts on behalf of the Stash account. It signs transactions that make decisions regarding the nomination and the validation of the other keys. It is a key that will be in direct control of a user and should mostly be kept offline, used to submit manual extrinsics. It sets preferences like payout account and commission. If used for a validator, it certifies the session keys. It only needs the required funds to pay transaction fees [TODO: key needing fund needs to be defined].

Definition 190. Session Keys

Session keys are short-lived keys that are used to authenticate validator operations. Session keys are generated by the Polkadot Host and should be changed regularly due to security reasons. Nonetheless, no validity period is enforced by the Polkadot protocol on session keys. Various types of keys used by the Polkadot Host are presented in Table 3:

Table 3. List of key schemes which are used for session keys depending on the protocol
ProtocolKey scheme
GRANDPAED25519
BABESR25519
I’m OnlineSR25519
ParachainSR25519
BEEFYsecp256k1

Session keys must be accessible by certain Polkadot Host APIs defined in Appendix B. Session keys are not meant to control the majority of the users’ funds and should only be used for their intended purpose.

A.1.4.1. Holding and staking funds

info

TBH

A.1.4.2. Designating a proxy for voting

info

TBH

A.2. Auxiliary Encodings

Definition 191. Unix Time

By Unix time, we refer to the unsigned, little-endian encoded 64-bit integer which stores the number of milliseconds that have elapsed since the Unix epoch, that is the time 00:00:00 UTC on 1 January 1970, minus leap seconds. Leap seconds are ignored, and every day is treated as if it contained exactly 86’400 seconds.

A.2.1. Binary Enconding

Definition 192. Sequence of Bytes

By a sequences of bytes or a byte array, b{b}, of length n{n}, we refer to

b=(b0,b1,,bn1)  such that  0bi255{b}\:={\left({b}_{{0}},{b}_{{1}},\ldots,{b}_{{{n}-{1}}}\right)}\ \text{ such that }\ {0}\le{b}_{{i}}\le{255}

We define Bn{\mathbb{{B}}}_{{n}} to be the set of all byte arrays of length n{n}. Furthermore, we define:

B=i=0Bi{\mathbb{{B}}}\:={\bigcup_{{{i}={0}}}^{\infty}}{\mathbb{{B}}}_{{i}}

We represent the concatenation of byte arrays a=(a0,,an){a}\:={\left({a}_{{0}},\ldots,{a}_{{n}}\right)} and b=(b0,,bm){b}\:={\left({b}_{{0}},\ldots,{b}_{{m}}\right)} by:

ab:=(a0,...,an,b0,...,bm){a}{\mid} b :=(a_0, ..., a_n, b_0, ..., b_m)
Definition 193. Bitwise Representation

For a given byte 0b255{0}\le{b}\le{255} the bitwise representation in bits bi{0,1}{b}_{{i}}\in{\left\lbrace{0},{1}\right\rbrace} is defined as:

b=b7b0{b}\:={b}_{{7}}\ldots{b}_{{0}}

where

b=27b7+26b6++20b0{b}={2}^{{7}}{b}_{{7}}+{2}^{{6}}{b}_{{6}}+\ldots+{2}^{{0}}{b}_{{0}}
Definition 194. Little Endian

By the little-endian representation of a non-negative integer, I{I}, represented as

I=(BnB0)256{I}={\left({B}_{{n}}\ldots{B}_{{0}}\right)}_{{256}}

in base 256, we refer to a byte array B=(b0,b1,,bn){B}={\left({b}_{{0}},{b}_{{1}},\ldots,{b}_{{n}}\right)} such that

bi=Bi{b}_{{i}}\:={B}_{{i}}

Accordingly, we define the function EncLE{\mathsf{\text{Enc}}}_{{{\mathsf{\text{LE}}}}}:

EncLE:Z+B;(BnB0)256(B0,B1,,Bn){\mathsf{\text{Enc}}}_{{{\mathsf{\text{LE}}}}}:{\mathbb{{Z}}}^{+}\rightarrow{\mathbb{{B}}};{\left({B}_{{n}}\ldots{B}_{{0}}\right)}_{{256}}{\mid}\rightarrow{\left({B}_{{{0},}}{B}_{{1}},\ldots,{B}_{{n}}\right)}
Definition 195. UINT32

By UINT32, we refer to a non-negative integer stored in a byte array of length 4{4} using little-endian encoding format.

A.2.2. SCALE Codec

The Polkadot Host uses Simple Concatenated Aggregate Little-Endian” (SCALE) codec to encode byte arrays as well as other data structures. SCALE provides a canonical encoding to produce consistent hash values across their implementation, including the Merkle hash proof for the State Storage.

Definition 196. Decoding

DecSC(d)\text{Dec}_{{\text{SC}}}{\left({d}\right)} refers to the decoding of a blob of data. Since the SCALE codec is not self-describing, it’s up to the decoder to validate whether the blob of data can be deserialized into the given type or data structure.

It’s accepted behavior for the decoder to partially decode the blob of data. This means any additional data that does not fit into a data structure can be ignored.

caution

Considering that the decoded data is never larger than the encoded message, this information can serve as a way to validate values that can vary in size, such as sequences (Definition 202). The decoder should strictly use the size of the encoded data as an upper bound when decoding in order to prevent denial of service attacks.

Definition 197. Tuple

The SCALE codec for Tuple, T{T}, such that:

T=(A1,An){T}\:={\left({A}_{{1}},\ldots{A}_{{n}}\right)}

Where Ai{A}_{{i}}’s are values of different types, is defined as:

EncSC(T)=EncSC(A1)||EncSC(A2)||||EncSC(An)\text{Enc}_{{\text{SC}}}{\left({T}\right)}\:=\text{Enc}_{{\text{SC}}}{\left({A}_{{1}}\right)}\text{||}\text{Enc}_{{\text{SC}}}{\left({A}_{{2}}\right)}\text{||}\ldots\text{||}\text{Enc}_{{\text{SC}}}{\left({A}_{{n}}\right)}

In the case of a tuple (or a structure), the knowledge of the shape of data is not encoded even though it is necessary for decoding. The decoder needs to derive that information from the context where the encoding/decoding is happening.

Definition 198. Varying Data Type

We define a varying data type to be an ordered set of data types.

T={T1,,Tn}{\mathcal{{T}}}={\left\lbrace{T}_{{1}},\ldots,{T}_{{n}}\right\rbrace}

A value A{A} of varying data type is a pair (AType,AValue){\left({A}_{{\text{Type}}},{A}_{{\text{Value}}}\right)} where AType=Ti{A}_{{\text{Type}}}={T}_{{i}} for some TiT{T}_{{i}}\in{\mathcal{{T}}} and AValue{A}_{{\text{Value}}} is its value of type Ti{T}_{{i}}, which can be empty. We define idx(Ti)=i1\text{idx}{\left({T}_{{i}}\right)}={i}-{1}, unless it is explicitly defined as another value in the definition of a particular varying data type.

In particular, we define two specific varying data which are frequently used in various parts of the Polkadot protocol: Option (Definition 200) and Result (Definition 201).

Definition 199. Encoding of Varying Data Type

The SCALE codec for value A=(AType,AValue){A}={\left({A}_{{\text{Type}}},{A}_{{\text{Value}}}\right)} of varying data type T={Ti,Tn}{\mathcal{{T}}}={\left\lbrace{T}_{{i}},\ldots{T}_{{n}}\right\rbrace}, formally referred to as EncSC(A)\text{Enc}_{{\text{SC}}}{\left({A}\right)} is defined as follows:

EncSC(A)=EncSC(idx(AType)||EncSC(AValue))\text{Enc}_{{\text{SC}}}{\left({A}\right)}\:=\text{Enc}_{{\text{SC}}}{\left(\text{idx}{\left({A}_{{\text{Type}}}\right)}\text{||}\text{Enc}_{{\text{SC}}}{\left({A}_{{\text{Value}}}\right)}\right)}

Where idx\text{idx} is an 8-bit integer determining the type of A{A}. In particular, for the optional type defined in Definition 198, we have:

EncSC(None,ϕ)=0B1\text{Enc}_{{\text{SC}}}{\left(\text{None},\phi\right)}\:={0}_{{{\mathbb{{B}}}_{{1}}}}

The SCALE codec does not encode the correspondence between the value and the data type it represents; the decoder needs prior knowledge of such correspondence to decode the data.

Definition 200. Option Type

The Option type is a varying data type of {None,T2}{\left\lbrace\text{None},{T}_{{2}}\right\rbrace} which indicates if data of T2{T}_{{2}} type is available (referred to as some state) or not (referred to as empty, none or null state). The presence of type none, indicated by idx(TNone)=0\text{idx}{\left({T}_{{\text{None}}}\right)}={0}, implies that the data corresponding to T2{T}_{{2}} type is not available and contains no additional data. Where as the presence of type T2{T}_{{2}} indicated by idx(T2)=1\text{idx}{\left({T}_{{2}}\right)}={1} implies that the data is available.

Definition 201. Result Type

The Result type is a varying data type of {T1,T2}{\left\lbrace{T}_{{1}},{T}_{{2}}\right\rbrace} which is used to indicate if a certain operation or function was executed successfully (referred to as "ok" state) or not (referred to as "error" state). T1{T}_{{1}} implies success, T2{T}_{{2}} implies failure. Both types can either contain additional data or are defined as empty types otherwise.

Definition 202. Sequence

The SCALE codec for sequence S{S} such that:

S=A1,An{S}\:={A}_{{1}},\ldots{A}_{{n}}

where Ai{A}_{{i}}’s are values of the same type (and the decoder is unable to infer value of n{n} from the context) is defined as:

EncSC(S)=EncSCLen(S)||EncSC(A2)||||EncSC(An)\text{Enc}_{{\text{SC}}}{\left({S}\right)}\:={\text{Enc}_{{\text{SC}}}^{{\text{Len}}}}{\left({\left|{{S}}\right|}\right)}\text{||}\text{Enc}_{{\text{SC}}}{\left({A}_{{2}}\right)}\text{||}\ldots\text{||}\text{Enc}_{{\text{SC}}}{\left({A}_{{n}}\right)}

where EncSCLen{\text{Enc}_{{\text{SC}}}^{{\text{Len}}}} is defined in Definition 208.

In some cases, the length indicator EncSCLen(S){\text{Enc}_{{\text{SC}}}^{{\text{Len}}}}{\left({\left|{{S}}\right|}\right)} is omitted if the length of the sequence is fixed and known by the decoder upfront. Such cases are explicitly stated by the definition of the corresponding type.

Definition 203. Dictionary

SCALE codec for dictionary or hashtable D with key-value pairs (ki,vi){\left({k}_{{i}},{v}_{{i}}\right)}s such that:

D={(k1,v1),(kn,vn)}{D}\:={\left\lbrace{\left({k}_{{1}},{v}_{{1}}\right)},\ldots{\left({k}_{{n}},{v}_{{n}}\right)}\right\rbrace}

is defined as the SCALE codec of D{D} as a sequence of key-value pairs (as tuples):

EncSC(D)=EncSCSize(D)||EncSC(k1,v1)||||EncSC(kn,vn)\text{Enc}_{{\text{SC}}}{\left({D}\right)}\:={\text{Enc}_{{\text{SC}}}^{{\text{Size}}}}{\left({\left|{{D}}\right|}\right)}\text{||}\text{Enc}_{{\text{SC}}}{\left({k}_{{1}},{v}_{{1}}\right)}\text{||}\ldots\text{||}\text{Enc}_{{\text{SC}}}{\left({k}_{{n}},{v}_{{n}}\right)}

where EncSCSize{\text{Enc}_{{\text{SC}}}^{{\text{Size}}}} is encoded the same way as EncSCLen{\text{Enc}_{{\text{SC}}}^{{\text{Len}}}} but argument Size\text{Size} refers to the number of key-value pairs rather than the length.

Definition 204. Boolean

The SCALE codec for a boolean value b{b} defined as a byte as follows:

EncSC:{False,True}B1\text{Enc}_{{\text{SC}}}:{\left\lbrace\text{False},\text{True}\right\rbrace}\rightarrow{\mathbb{{B}}}_{{1}}
b{0b=False1b=True{b}\rightarrow{\left\lbrace\begin{matrix}{0}&{b}=\text{False}\\{1}&{b}=\text{True}\end{matrix}\right.}
Definition 205. String

The SCALE codec for a string value is an encoded sequence (Definition 202) consisting of UTF-8 encoded bytes.

Definition 206. Fixed Length

The SCALE codec, EncSC\text{Enc}_{{\text{SC}}}, for other types such as fixed length integers not defined here otherwise, is equal to little-endian encoding of those values defined in Definition 194.

Definition 207. Empty

The SCALE codec, EncSC\text{Enc}_{{\text{SC}}}, for an empty type, is defined as a byte array of zero length and depicted as ϕ\phi.

A.2.2.1. Length and Compact Encoding

SCALE Length encoding is used to encode integer numbers of varying sizes prominently in an encoding length of arrays:

Definition 208. Length Encoding

SCALE Length encoding, EncSCLen{\text{Enc}_{{\text{SC}}}^{{\text{Len}}}}, also known as a compact encoding, of a non-negative number n{n} is defined as follows:

EncSCLen:NB{\text{Enc}_{{\text{SC}}}^{{\text{Len}}}}:{\mathbb{{N}}}\rightarrow{\mathbb{{B}}}
nb={l10n<26i1i226n<214j1j2j3j4214n<230k1k2km+1230n{n}\rightarrow{b}\:={\left\lbrace\begin{matrix}{l}_{{1}}&{0}\le{n}<{2}^{{6}}\\{i}_{{1}}{i}_{{2}}&{2}^{{6}}\le{n}<{2}^{{14}}\\{j}_{{1}}{j}_{{2}}{j}_{{3}}{j}_{{4}}&{2}^{{14}}\le{n}<{2}^{{30}}\\{k}_{{1}}{k}_{{2}}\ldots{k}_{{m}+{1}}&{2}^{{30}}\le{n}\end{matrix}\right.}

in where the least significant bits of the first byte of byte array b are defined as follows:

l11l10=00{{l}_{{1}}^{{1}}}{{l}_{{1}}^{{0}}}={00}
i11i10=01{{i}_{{1}}^{{1}}}{{i}_{{1}}^{{0}}}={01}
j11j10=10{{j}_{{1}}^{{1}}}{{j}_{{1}}^{{0}}}={10}
k11k10=11{{k}_{{1}}^{{1}}}{{k}_{{1}}^{{0}}}={11}

and the rest of the bits of b{b} store the value of n{n} in little-endian format in base-2 as follows:

n={l17l13l12n<26i27i20i17..i1226n<214j47j40j37j17j12214n<230k2+k328+k422×8++km+12(m1)8230n{n}\:={\left\lbrace\begin{matrix}{{l}_{{1}}^{{7}}}\ldots{{l}_{{1}}^{{3}}}{{l}_{{1}}^{{2}}}&{n}<{2}^{{6}}\\{{i}_{{2}}^{{7}}}\ldots{{i}_{{2}}^{{0}}}{{i}_{{1}}^{{7}}}..{{i}_{{1}}^{{2}}}&{2}^{{6}}\le{n}<{2}^{{14}}\\{{j}_{{4}}^{{7}}}\ldots{{j}_{{4}}^{{0}}}{{j}_{{3}}^{{7}}}\ldots{{j}_{{1}}^{{7}}}\ldots{{j}_{{1}}^{{2}}}&{2}^{{14}}\le{n}<{2}^{{30}}\\{k}_{{2}}+{k}_{{3}}{2}^{{8}}+{k}_{{4}}{2}^{{{2}\times{8}}}+\ldots+{k}_{{m}+{1}}{2}^{{{\left({m}-{1}\right)}{8}}}&{2}^{{30}}\le{n}\end{matrix}\right.}

such that:

k17k13k12=m4{{k}_{{1}}^{{7}}}\ldots{{k}_{{1}}^{{3}}}{{k}_{{1}}^{{2}}}\:={m}-{4}

Note that m{m} denotes the length of the original integer being encoded and does not include the extra byte describing the length. The encoding can be used for integers up to 2(63+4)81=253612^{(63+4)8} -1 = 2^{536} -1.

A.2.3. Hex Encoding

Practically, it is more convenient and efficient to store and process data which is stored in a byte array. On the other hand, the trie keys are broken into 4-bits nibbles. Accordingly, we need a method to encode sequences of 4-bits nibbles into byte arrays canonically. To this aim, we define hex encoding function Enc(HE)(PK)\text{Enc}{\left(\text{HE}\right)}{\left(\text{PK}\right)} as follows:

Definition 209. Hex Encoding

Suppose that PK=(k1,kn)\text{PK}={\left({k}_{{1}},\ldots{k}_{{n}}\right)} is a sequence of nibbles, then:

EncHE(PK)={Nibbles4BPK=(k1,kn){(16k1+k2,,16k2i1+k2i)n=2i(k1,16k2+k3,,16k2i+k2i+1)n=2i+1\text{Enc}_{{\text{HE}}}{\left(\text{PK}\right)}\:={\left\lbrace\begin{matrix}\text{Nibbles}_{{4}}&\rightarrow&{\mathbb{{B}}}\\\text{PK}={\left({k}_{{1}},\ldots{k}_{{n}}\right)}&\rightarrow&{\left\lbrace\begin{matrix}{\left({16}{k}_{{1}}+{k}_{{2}},\ldots,{16}{k}_{{{2}{i}-{1}}}+{k}_{{{2}{i}}}\right)}&{n}={2}{i}\\{\left({k}_{{1}},{16}{k}_{{2}}+{k}_{{3}},\ldots,{16}{k}_{{{2}{i}}}+{k}_{{{2}{i}+{1}}}\right)}&{n}={2}{i}+{1}\end{matrix}\right.}\end{matrix}\right.}

A.3. Chain Specification

Chain Specification (chainspec) is a collection of information that describes the blockchain network. It includes information required for a host to connect and sync with the Polakdot network, for example, the initial nodes to communicate with, protocol identifier, initial state that the hosts agree, etc. There are a set of core fields required by the Host and a set of extensions that are used by optionally implemented features of the Host. The fields of chain specification are categorized in three parts:

  1. ChainSpec
  2. ChainSpec Extensions
  3. Genesis State which is the only mandatory part of the chainspec.

A.3.1. Chain Spec

Chain specification contains information used by the Host to communicate with network participants and optionally send data to telemetry endpoints.

The client specification contains the fields below. The values for the Polkadot chain are specified:

  • name: The human-readable name of the chain.

    "name": "Polkadot"
  • id: The id of the chain.

    "id": "polkadot"
  • chainType: Possible values are Live, Development, Local.

    "chainType": "Live"
  • bootNodes: A list of MultiAddress that belong to boot nodes of the chain. The list of boot nodes for Polkadot can be found here

  • telemetryEndpoints: Optional list of "(multiaddress, verbosity)" pairs of telemetry endpoints. The verbosity goes from 0 to 9. With 0 being the mode with the lowest verbosity.

  • forkId: Optional fork id. Should most likely be left empty. Can be used to signal a fork on the network level when two chains have the same genesis hash.

"forkId": {}
  • properties: Optional additional properties of the chain as subfields including token symbol, token decimals, and address formats.
  "properties": {
"ss58Format": 0,
"tokenDecimals": 10,
"tokenSymbol": "DOT"
}

A.3.2. Chain Spec Extensions

ChainSpec Extensions are additional parameters customizable from the chainspec and correspond to optional features implemented in the Host.

Definition 210. Bad Blocks Header

BadBlocks describes a list of block header hashes that are known a priori to be bad (not belonging to the canonical chain) by the host, so that the host can explicitly avoid importing them. These block headers are always considered invalid and filtered out before importing the block:

badBlocks=(b0,bn){badBlocks}={\left({b}_{{0}},\ldots{b}_{{n}}\right)}

where bi{b_i} is a known invalid block header hash.

Definition 211. Fork Blocks

ForkBlocks describes a list of expected block header hashes at certain block heights. They are used to set trusted checkpoints, i.e., the host will refuse to import a block with a different hash at the given height. Forkblocks are useful mechanisms to guide the Host to the right fork in instances where the chain is bricked (possibly due to issues in runtime upgrades).

forkBlocks=(<b0,H0>,<bn,Hn>){forkBlocks}={\left(<{b}_{{0}},{H}_{{0}}>,\ldots<{b}_{{n}},{H}_{{n}} >\right)}

where bi{b_i} is an apriori known valid block header hash at block height Hi{H_i}. The host is expected to accept no other block except bi{b_i} at height Hi{H_i}.

info

lightSyncState describes a check-pointing format for light clients. Its specification is currently Work-In-Progress.

A.3.3. Genesis State

The genesis state is a set of key-value pairs representing the initial state of the Polkadot state storage. It can be retrieved from the Polkadot repository. While each of those key-value pairs offers important identifiable information to the Runtime, to the Polkadot Host they are a transparent set of arbitrary chain- and network-dependent keys and values. The only exception to this are the :code (Section 2.6.2.) and :heappages (Section 2.6.3.1.) keys, which are used by the Polkadot Host to initialize the WASM environment and its Runtime. The other keys and values are unspecified and solely depend on the chain and respectively its corresponding Runtime. On initialization, the data should be inserted into the state storage with the Host API (Section B.2.1.).

As such, Polkadot does not define a formal genesis block. Nonetheless, for compatibility reasons in several algorithms, the Polkadot Host defines the genesis header (Definition 212). By the abuse of terminology, "genesis block" refers to the hypothetical parent of block number 1 which holds the genesis header as its header.

Definition 212. Genesis Header

The Polkadot genesis header is a data structure conforming to block header format (Definition 10). It contains the following values:

Table 4. Table of Genesis Header Values
Block header fieldGenesis Header Value
parent_hash0B320_{\mathbb{B}_{32}}
number00
state_rootMerkle hash of the state storage trie (Definition 29) after inserting the genesis state in it.
extrinsics_rootMerkle hash of an empty trie: Blake2b(0B1)\text{Blake2b}{\left(0_{\mathbb{B}_1}\right)}
digest00
Definition 213. Code Substitutes

Code Substitutes is a list of pairs of the block numbers and wasm_code. The given WASM code will be used to substitute the on-chain WASM code starting with the given block number until the spec_version on-chain changes. The substitute code should be as close as possible to the on-chain wasm code. A substitute should be used to fix a bug that can not be fixed with a runtime upgrade if, for example, the runtime is constantly panicking. Introducing new runtime apis isn't supported, because the node will read the runtime version from the on-chain wasm code. Use this functionality only when there is no other way around and to only patch the problematic bug, the rest should be done with an on-chain runtime upgrade.

A.4. Erasure Encoding

A.4.1. Erasure Encoding

info

Erasure Encoding has not been documented yet.

Bibliography

1.
Saarinen MJ, Aumasson J-P. The BLAKE2 cryptographic hash and message authentication code (MAC) [Internet]. https://tools.ietf.org/html/rfc7693: -; 2015. Report No.: 7693. Available from: https://tools.ietf.org/html/rfc7693
2.
Papadopoulos D, Wessels D, Huque S, Naor M, Včelák J, Reyzin L, et al. Making NSEC5 Practical for DNSSEC [Internet]. Cryptology ePrint Archive, Paper 2017/099; 2017. Available from: https://eprint.iacr.org/2017/099
3.
Goldberg S, Papadopoulos D, Vcelak J. Internet Draft - Verifiable Random Functions (VRFs) [Internet]. draft-goldbe-vrf-01. 2017. Available from: https://tools.ietf.org/id/draft-goldbe-vrf-01.html
4.
Josefsson S, Liusvaara I. Edwards-curve digital signature algorithm (EdDSA). In: Internet Research Task Force, Crypto Forum Research Group, RFC. 2017.
5.
de Valence H. Explicitly Defining and Modifying Ed25519 Validation Rules [Internet]. 2020. Available from: https://github.com/zcash/zips/blob/master/zip-0215.rst