We acknowledge the need of popular scientific descriptions of electronic voting technology to develop trust in the systems, so we briefly introduce the ideas and technologies used and also discuss advantages and disadvantages of electronic voting with focus on the approach we have implemented.
The following focuses on cryptographic components of voting systems. There are several proposals that combine traditional paper based systems with cryptographic techniques to make systems more secure and easier to verify.
The laws regulating elections are typically detailed to avoid legal processes in the event of a conflict, e.g., they may even specify the color and physical size of ballots. A multiparty protocol has the same role in an electronic voting system, but is much more precise, since the parties are now computers executing programs that operate independently of humans. Thus, a cryptographic protocol may be thought of both as an abstract description of what should be done as well as an implementation of such a protocol depending on the context.
The arguably most important property of an election is correctness which explains why many countries employ routines that to some extent sacrifice other properties for this purpose. It is often claimed that the correctness of tallying in an electronic voting system relies on the correctness of the software or hardware used, but this is wrong.
Correctness in classical elections is guaranteed by letting a third party audit the election. The form of auditing depends on the democratic culture, costs, and potential threats, but in general the handling and counting of all votes cannot be verified to be correct in practice. Instead schemes are used to make the choice of which parts of the election are audited unpredictable to an adversary, e.g., detailed inspections are carried out at some polling stations and some votes are recounted.
The basic principle is not different from that used in written exams. Students show that they know a large part of the content despite that they only answer a relatively small number of questions. This works because they could have answered many other questions and the questions were chosen randomly. An oral exam is of course more interactive, but based on the same idea.
This idea can be made precise mathematically as interactive proofs. An interactive proof is simply a well-defined way for a prover and a verifier to start from a claim of the prover, and communicate in such a way that at the end of the interaction the verifier accepts if the prover is honest and the claim is true, and if the claim is false it only accepts with probability very close to zero.
In some electronic voting protocols, like those we implement, all parties use interactive proofs to show that they executed all steps exactly as prescribed by the description of the protocol. The probability that the verifier accepts a wrong result is smaller than one in 1030, so it should be viewed as zero in practice.
There are several cryptographic techniques to keep information secret, but the best known is encryption. Using public key encryption the encryption key can be distributed to voters publicly. Each voter then submits its vote in encrypted form to be tallied. Provided that the cryptosystem is not broken, nobody can gain any knowledge of the contents of the submitted votes.
The cryptosystems we use are mature and considered to be the most secure available today. It is in principle possible that they are broken already tomorrow, but this must be put into context to understand the level of trust that is reasonable.
If the cryptosystems we use would be broken, then not only how people voted would be public, but also all information communicated on the Internet today (and has been communicated over the recent decades). Furthermore, virtually all forms of digital authentication used today would be broken. The whole Internet would literally grind to a halt.
This shows the trust all of society is comfortable to put in these systems based on decades of research and repeated evaluations, so we think it is not a great leap to trust the same systems for elections. The only valid objection is that it is hard to predict future development in algorithms and hardware and we need long term security in some applications of electronic voting. An informed political decision has to taken for each application whether this threat on its own should make the use of the type of cryptosystems we use unacceptable.
In the example of the student exam above there is no need for the students to hide what they know from the teacher, since the teacher arguably already knows the content, but in an election the tallying must not reveal anything about who voted for what. A ballot box is usually used for this purpose, i.e., the ballots are sealed by the voters and put in a box which mixes all ballots before they are counted. The ballot box itself can be inspected to make sure that it does not alter, remove, or adds votes.
Digital information is virtually impossible to inspect manually, which is why interactive proofs described above are used to guarantee correctness, but these protocols must not leak any information in themselves to be useful.
Fortunately, the protocols used are not only interactive proofs, but also zero-knowledge. This means that they reveal no knowledge what so ever beyond that the execution was indeed performed correctly. That this is possible is counter-intuitive, but a mathematical fact that is hard to explain without resorting to mathematics.
Today election officials keep track of who voted. Authentication of voters often relies on either receiving a voting card by post or by showing identication when entering the polling station or casting the vote.
This setup can of course be used as is if electronic voting is only used for tallying and vote casting is still done at a polling station. In an election carried out over the Internet standard authentication solutions can be used. What is suitable depends on the application and the available authentication infrastructure. Some countries, like Estonia and Sweden, have wide spread PKIs supported and used by the authorities, and some do not.
In a classical election no individual or authority is trusted to authorize voters and count the votes on their own. Instead multiple parties work together and mutually check that everything is done correctly. This is not only the case for correctness; indeed an election worker that could replace the ballot box with a maliciously constructed box and open sealed envelopes on his own could of course learn what everybody voted for.
This is mirrored in electronic voting systems in two ways. Firstly, no single party holds the complete decryption key. Instead the secret key is generated by multiple parties in such a way that only a sufficiently large subset of the parties could recover the secret key.
Thus, provided that not too many parties are malicious and collude, privacy is guaranteed. Forming a collusion carries a high risk of detection for the parties that are chosen to hold shares of the secret key and a severe penalty upon detection, so one would expect that this is not worthwhile to even attempt in most cases.
Secondly, anybody can verify the zero-knowledge proofs used to show that the tallying was done correctly. Thus, the correctness of the tallying in an electronic voting system is much stronger than in classical systems and literally allows any single auditor to either accept the proof or prove that the tallying was incorrect.
Mix-nets are used to tally the votes in an election. It is formed by a set of parties that in the basic case simply: (1) generates a public key for which the secret key is shared as explained above, (2) decrypts the ciphertexts and outputs the plaintext votes in random order without revealing anything else. We stress that the complete secret key is never recovered; the process is carried out solely using the shares of the secret key. Mix-nets are the core components of any electronic voting systems that use them.
Above we have explained that the voter encrypts its vote and we have outlined how a cryptographic protocol is used to tally the votes such that: (1) how the vote is cast is not revealed, and (2) the encrypted votes are decrypted in a way that can be verified to be correct.
However, a human can of course not encrypt its vote in their head when we are using a cryptosystem. Instead a computing device is used, either on dedicated hardware in a polling station, or on any device in an Internet election. The problem with this is of course that it is the device that prepares the actual vote and it could change the content unbeknownst to the voter before submitting the ciphertext.
Many solutions have been proposed for this problem, but they all boil down to running an interactive proof between the voter and the device, or between the device and another device. In some cases using the trivial protocol where the other device actually re-computes the same ciphertext as the first to verify that it was done correctly. Note that there is little need for this proof to be zero-knowledge as long as the voter can produce proofs for any choice of candidate.
Message authentication codes (MAC) have been used ever since the early days of cryptography by two parties holding a shared key to verify that a message was not altered or replaced during transit. The idea is that the sender takes both the message and secret key and computes a code. The code does not hide anything, but it cannot be computed without the secret key. The receiver then uses its copy of the secret key to check that a code matches the message.
MACs can be used to give another solution to the problem of malicious client devices with a different trust model. The cryptosystem used allows the mix-servers to compute blinded MACs of the plaintext votes in such a way that they do not learn the vote itself, so they can return a MAC of a blinded vote to the voter through a different communication channel. If the voter holds the blinding value and its individual MAC key then it can verify that the ballot collecting servers received an encryption of its intended vote.
It may seem that this brings us back to a situation where the voter would have to do complex computations, but the MAC codes can be pre-computed and represented with short codes. Thus, for elections with few candidates this is a viable option in practice. This cryptographic technique is often called return codes in the electronic voting literature. Somebody must of course print the return codes or communicate them otherwise to the voters, so this approach replaces trust in the devices with trust in other parties as far as correct encryption goes.
Even if a corrupted computing device encrypts the vote correctly, it could record it and collect the votes of many voters. A similar argument to that of trusting encryption can be applied here. If a device can do this on a grand scale, then there is likely much more valuable data to be collected than how somebody voted. However, if this is deemed necessary, then voting codes can be used. Here the idea is that a trusted party prepares codes that on their own do not say anything, but along with the codes their meaning is conveyed to the voter. This can be implemented using a few executions of our mix-net due to its flexible functionality.
This is trivially combined with a mix-net by letting a trusted party (or several) pre-compute lists of ciphertexts and provide hash values of the individual ciphertexts along with the plaintexts to the voter. The ciphertexts would be stored publicly, but in random order and the hash values are simply pointers to ciphertexts (in shortened form, unique over all ciphertexts).
The advantage of this approach is that it reduces the trust in individual devices and may allow voters to use paper to vote, but it increases trust in the parties that prepare the ciphertexts, the codes, and plaintexts, and post them to the voter. In particular, a printer must print the voting cards containing the voting codes.
One gap in the chain remains and this is that even if the voting device prepared a ciphertext correctly, it was received correctly, and tallying was done correctly, it could be the case that the wrong set of ciphertexts was processed, e.g., inputs could have been added or removed.
If signatures are used by voters to authenticate their votes, then the receiver can show that each vote it tallies was cast by a specific voter. Similarly, the receiver can sign each received vote to give the voter evidence that the vote was received. The problem is that this is not always practical, i.e., there may not be a PKI in place to use.
A weaker guarantee can be given by letting the receiver sign all received votes and publish the signature, or send it to multiple parties, each time it receives a ciphertext. This can of course be implemented more efficiently using standard hash chains or trees. The voter can then verify that its vote was received by simply checking that it appears among the received ciphertexts. The guarantee provided to the voter is as strong as before, since its encrypted vote has been signed, albeit along with other ciphertexts. However, the receiver can no longer prove that it received each vote from a voter without resorting to other techniques.
A common objection against Internet electronic voting systems is that anybody who can observe the voting act learns how the voter cast its vote and there is no way to protect the voter against this. The problem is not mainly that the vote is revealed, but that a bystander can coerce or pay a voter to cast its vote in a given way and then verify that this is done.
When we started doing research on electronic voting more than 20 years ago, the problem of people shoulder surfing, or viruses or other malware, recording the vote was considered a major problem, but security is always relative. Perhaps counter intuitively, we think that in many settings the problem is smaller Today.
The reason is the abundance of information freely provided publicly by most individuals in social media and on the Internet. There are certainly countries, or situations, where the vote cast by an individual remains an sensitive piece of information, but even a casual look at an account at Google, Facebook, Instagram, Twitter, LinkedIn, etc combined with location data or other public data gives a more fine-grained profile of a voter than simply the vote.
Furthermore, this information can be collected, using both legal and illegal methods, before the election and be used to craft targeted campaigns. The Cambridge Analytica scandal is a real-world example of this. Put bluntly, anybody who has the ability to infect a large number of devices with malware to gather information holds at most as much information as the major Internet companies. They already sell this information, more or less directly, for marketing purposes.
In theory coerciion is possible, but in practice the problem is smaller than what is often claimed. Firstly, voters can be allowed to vote multiple times where only the last vote is counted. This does not solve the problem entirely, since the bystander could observe the voter at the end of the voting period, and somebody must verify which vote is cast last without making this public. The former is unlikely to happen at large scale without detection and the latter can be solved using a set of auditors that generally has no relation with the relatively large number of bystanders needed to coerce even a small number of voters. Furthermore, the victim or another bystander, i.e., a family member, can document the attack and coercion is illegal.
Secondly, it is already easy to coerce individual voters, e.g., family members, to vote in a given way in many classical voting systems. Several countries allow voting from the home by simply posting an envelope. Other countries provide the voting material at the polling station in such a way that the coercer can make sure that the victim can only vote in one way.
Even more countries use systems where the so-called Italian attack is feasible. In this attack the coercer asks the voter to encode a random pattern into its vote without changing the part of the vote of primary interest to the coercer, e.g., the party vote may be the primary interest and votes for persons within the party could be used to encode the pattern. Then the coercer verifies that the random pattern turns up among the counted votes. This may seem contrived, but it is in fact feasible in practice. With a more elaborate attack, the adversary can distribute the verification during counting on the victims themselves.
In particular, the current Swedish national election system allows an anonymous large scale attack where the buyer can verify that sellers vote as instructed and sellers can verify that they will be paid as claimed.
Based on experience we know that this is not a big problem in many classical elections, and we are not aware of any reason why this would be more serious in an electronic system that is used in a responsible way, but in the end a political decision must be made on what is acceptable.
If we accept to use encryption systems of the standard type, then the tallying problem is completely solved using a mix-net, and we have the best implementation. There are several options for the remaining parts of the system and what is best depends on the application, and political decisions.