(This post was first published at medium.com on Jan 12, 2017. License: CC4.0-BY-SA)

It is interesting that, while people all around the world talk about blockchain and even fight over whether it is “the blockchain”, “a blockchain” or simply “blockchain”, there is still no commonly accepted definition of what a blockchain actually is.

In this article I argue that a “blockchain” is simply a way to store data that has three very important characteristics:

  1. A blockchain stores its data in the form of an unbroken chain of information blocks, where each block references the previous block.
  2. A blockchain is a massively de-centralised system, where each participant holds a complete copy of the chain.
  3. A defined consensus protocol creates a unique way to add data to this chain of information.

We will see that these three characteristics lead to a system that is resilient against attacks and provides a secured data ledger to which data can only be added, while data stored in the past cannot be changed: a so-called immutable data storage.

A blockchain stores its data in the form of an unbroken chain of information blocks, where each block references the previous block.

This may be the most widely accepted requirement of a “blockchain” and is the characteristic that gives it its name.

Usually people do not really care how the data is organised in their databases, except for optimisation purposes. In a blockchain, however, the organisation of the data is critical. All information is organised into “blocks” that are ordered in the form of a chronological chain. Each block contains a reference to the previous block in the form of a digital fingerprint of that block. This fingerprint (a “cryptographic hash”) depends on the content of the block, so even small changes alter it. This way, starting from the newest block, it is possible to test all blocks back to block 0, the genesis block, for validity. Such a structure is known as a hash chain; it is closely related to the Merkle tree, which Bitcoin uses inside each block to summarise its transactions.

The chain of blocks makes a blockchain a write-once-read-many or “immutable” database: once data is saved in the chain, it cannot be altered without altering all blocks that come after the change. An attacker who has access to all copies of the blockchain could nevertheless make such a change and thus rewrite history. One part of the protection against that is to de-centralise the data storage, that is, to create thousands of copies of the system.
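The chaining described in this section can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the block layout (`index`, `data`, `prev_hash`) and the helper names are made up for this example and do not correspond to any real blockchain format.

```python
import hashlib
import json


def block_hash(block):
    """Hash the full content of a block, including its reference to the previous block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data):
    """Add a block that stores the hash of the current last block."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64  # genesis has no predecessor
    chain.append({"index": len(chain), "data": data, "prev_hash": prev_hash})


def is_valid(chain):
    """Walk from the newest block back to the genesis block and check every reference."""
    for i in range(len(chain) - 1, 0, -1):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


chain = []
for entry in ["genesis", "Alice pays Bob 5", "Bob pays Carol 2"]:
    append_block(chain, entry)

valid_before = is_valid(chain)
chain[1]["data"] = "Alice pays Bob 500"  # tamper with a block in the past
valid_after = is_valid(chain)
print(valid_before, valid_after)
```

Changing the data of block 1 changes its hash, so the reference stored in block 2 no longer matches and the validation fails. To hide the change, an attacker would have to recompute the references in every later block.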

A blockchain is a massively de-centralised system, where each participant holds a complete copy of the chain.

Typically a blockchain has thousands of participants, each of them hosting a complete copy of the chain of data. The participants can all have the same power, as in a permission-less blockchain such as the Bitcoin blockchain. Or the participants can have different rights, e.g., only some participants are allowed to add data to the blockchain.

The de-centralisation makes the system resilient against data loss and attacks from the outside, but at a cost: the system needs to invest time and energy into keeping the data up to date across all participants. And it needs a way to settle disputes about what the correct state of the system is: a consensus protocol.

A defined consensus protocol creates a unique way to add data to this chain of information.

The chain of hashes in the blocks secures the blockchain against changes to stored data, and the de-centralisation protects data availability. But the system still needs to decide how to create a new block, how to secure the validity of the last block against attacks (this block is not yet secured by the chain) and how to protect the chain against someone simply swapping it for a different one. This is done via the consensus protocol.

So, the consensus protocol or algorithm needs to describe how the network as a whole decides which is the correct chain when presented with different possibilities. In the normal state that means finding a unique (or unique within a certain time frame) way to select the next block for the chain. When a separated network is re-united, or in case of an attack, it means deciding which of the blockchains existing in the network is the correct one.

There are various ways this consensus protocol can look. Design decisions here define the overall performance and “social” characteristics of the blockchain.

Bitcoin uses a so-called “Proof of Work” based consensus. The cryptographic hash of a new block has to start with a certain number of binary zeros (more precisely, it has to be below a target value). There is no easier way to achieve this than actually creating a candidate block, calculating its hash and testing it. Everyone in the network can present a possible new block, but the probability of finding a suitable solution to the hash problem is proportional to the computational power invested by each participant. The network re-calculates the required difficulty every 2016 blocks (roughly every two weeks) so that the average time between block additions stays at about 10 minutes. It can happen that two different solutions are found at roughly the same time, so that for a limited time two chains exist in parallel that differ only in the last block. The Bitcoin consensus states that the chain with the bigger overall difficulty wins, so when a new block is found for only one of the two chains, the network will quickly switch to that chain, which now has more accumulated work. The same holds true if an attacker presents a different blockchain to the network: as long as that attacking chain does not have more computational power invested in it than has been invested in the correct one, it will simply be ignored. This makes changes in the history of the Bitcoin blockchain practically impossible as long as the computational “hashing” power invested in it is high. Proof of Work is the state-of-the-art consensus protocol for big public permission-less blockchains.
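To make the hash puzzle and the chain selection rule more concrete, here is a minimal Python sketch. It is not Bitcoin's actual block format or algorithm: the function names, the `|`-separated block encoding, the single SHA-256 pass and the tiny difficulty of 12 bits are all assumptions chosen so that the example runs in a moment.

```python
import hashlib


def pow_hash(prev_hash, data, nonce):
    """Hash a toy block made of previous hash, payload and nonce (illustrative layout)."""
    text = f"{prev_hash}|{data}|{nonce}".encode("utf-8")
    return hashlib.sha256(text).hexdigest()


def meets_difficulty(hash_hex, difficulty_bits):
    """A 256-bit hash starts with `difficulty_bits` zero bits iff it is below this target."""
    return int(hash_hex, 16) < 2 ** (256 - difficulty_bits)


def mine(prev_hash, data, difficulty_bits):
    """Try nonces one by one; there is no shortcut other than hashing and testing."""
    nonce = 0
    while not meets_difficulty(pow_hash(prev_hash, data, nonce), difficulty_bits):
        nonce += 1
    return nonce


def chain_work(difficulties):
    """Expected number of hash attempts invested in a chain, given each block's difficulty."""
    return sum(2 ** bits for bits in difficulties)


DIFFICULTY_BITS = 12  # about 4096 attempts on average
prev = "00" * 32
nonce = mine(prev, "Alice pays Bob 5", DIFFICULTY_BITS)
found = pow_hash(prev, "Alice pays Bob 5", nonce)
print(nonce, found)

# Chain selection: the chain with the most accumulated work wins, not the longest one.
honest = [12, 12, 12]   # three blocks at difficulty 12
attacker = [4] * 10     # ten blocks, but each one was very cheap to produce
best = max([honest, attacker], key=chain_work)
print(best is honest)
```

The last lines show why an attacker cannot win simply by presenting more blocks: what counts in this sketch is the total work behind a chain, so ten cheap blocks lose against three expensive ones.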

Other consensus protocols also exist. As “Proof of Work” is very computation-intensive and costs lots of energy, one of the first alternatives created for public permission-less blockchains was “Proof of Stake”. Here the network selects the next block based on the amount of stake (usually in the form of a cryptocurrency) that the proposer of the next block is putting into the system. The network then also selects the “best” chain based on the overall stake that has been invested in it so far. The idea is that to take over the network you would have to own a majority of the stake in it, which would make you a natural ally of the network itself. Quite often “Proof of Stake” and “Proof of Work” are also combined to mitigate possible attacks, as in Peercoin.
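One simple way to picture stake-based selection is a lottery in which each participant's chance of proposing the next block is proportional to their stake. The sketch below assumes exactly that; the participant names, the stake values and the seeded random generator are hypothetical, and real Proof of Stake systems derive their randomness and their selection rules in more elaborate ways.

```python
import random
from collections import Counter

# Hypothetical stake distribution (units of some cryptocurrency).
stakes = {"alice": 60, "bob": 30, "carol": 10}


def select_proposer(stakes, rng):
    """Pick the proposer of the next block with probability proportional to stake."""
    names = sorted(stakes)
    weights = [stakes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(2017)  # fixed seed only to make the illustration repeatable
counts = Counter(select_proposer(stakes, rng) for _ in range(10_000))
print(counts)
```

Over many rounds the participant with the largest stake proposes the most blocks, which mirrors the idea that control over the chain follows ownership of stake rather than computational power.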

If the set of participants in the blockchain is known, especially if the blockchain is private, then “Proof of Work” is not a well-suited consensus protocol. It can even be dangerous in itself, as attackers may easily overpower the limited computational resources of the honest participants. In this case it may be better to use a permissioned blockchain with a validator-based consensus. Here a subset of the participants is selected to vote for and validate the next block. The network then accepts the block based on the signatures of the known validators and, if signatures from a majority of the validators are required to accept any new block, the chain cannot fork as long as no validator signs two competing blocks. An example of this type of consensus protocol is the Tendermint blockchain as it is currently implemented.
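The acceptance rule of such a validator-based consensus can be sketched as follows. This is a strongly simplified model and not Tendermint's actual protocol: the validator names are invented, a vote is just a (validator, block hash) pair, signature verification is left out entirely, and the threshold is the simple majority described above, whereas Tendermint requires more than two-thirds of the voting power.

```python
# Hypothetical set of known validators.
VALIDATORS = {"v1", "v2", "v3", "v4"}


def is_accepted(block_hash, votes, validators=VALIDATORS):
    """Accept a block if more than half of the known validators voted for it.

    `votes` is an iterable of (validator_name, voted_block_hash) pairs.
    Votes from unknown participants are ignored; duplicates count once.
    """
    signers = {name for name, voted in votes if name in validators and voted == block_hash}
    return 2 * len(signers) > len(validators)


votes = [
    ("v1", "A"), ("v2", "A"), ("v3", "A"),  # three known validators back block A
    ("v4", "B"), ("mallory", "B"),          # one validator and one outsider back block B
]
accepted_a = is_accepted("A", votes)
accepted_b = is_accepted("B", votes)
print(accepted_a, accepted_b)
```

Because any two majorities of the same validator set share at least one member, two competing blocks can both be accepted only if some validator votes for both of them.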