Encryption: Part I

Before we go any further I have to tell you that you shouldn't design encryption or crypto schemes on your own and should use existing, well-designed schemes that have been shown to be secure (at least as far as we know yet) because these schemes have been developed by actual cryptographers who know what they are doing which we without immense math and cryptography knowledge don't. As developers we only use and apply the schemes created by cryptographers... we neither develop our own nor break them. Yet, in this guide we will look at very simple homebrew encryption schemes.

This is going to be a series covering very basic aspects of encryption algorithms and shows how to break some of the most simple encryption algorithms.

What is encryption

Encryption transforms data (called the plaintext) into encrypted data (called the ciphertext) so that only the intended recipients can recover the plaintext from the ciphertext. This is usually done by using algorithms (designed by cryptographers) which use a key but some algorithm don't have keys. When using algorithms without keys the security relies on the algorithm being secret meaning only the intended circle of people know the algorithm to encrypt and decrypt data. Algorithms that are public use keys that are to be kept secret. Sometimes the algorithm required to decrypt and encrypt the data are the same (meaning encrypting the data twice results in the plaintext). The algorithms required to decrypt and encrypt may differ and some algorithms have a key used for encryption and a key used for decryption. If the key for encryption and decryption is the same it is called symmetric and if seperate keys for encryption and decryption are needed it is called asymmetric. Asymmetric algorithms have the advantage that one key can be kept secret (called the private key) and one key can be published (called the public key) thus allowing for scenarios where only one person can encrypt but everybody can decrypt or that only one person can decrypt but everybody can encrypt it. Of course, algorithms may still employ keys and be kept hidden from the public.

Hashing on the other hand (in case you know this term) is NOT encryption. Hashing is something completely different. Hashing is a so called one-way function meaning it's a function that transforms some input into some output. Hash algorithms are used for different purposes and different hash algorithms have different properties and not all hash algorithms are considered cryptographically secure as some hash algorithms are only used to calculate a checksum of some data.

Hash algorithms

Hash algorithms usually take some input and return an output which is a number of bytes where the length of the output is fixed regardless of the size of the input. The output of hashing the input is called the hash of the input. Since the length is fixed but the input can be of arbitrary length we can immediately see that there multiple inputs that produce the same output. This is called a collision. A collision is if hash(a) == hash(b) where a != b. Hashes are used to created a fingerprint of data for example a 2GB file. If we want to transmit this file from a sender Alice to a recipient Bob how do we check that the file arrived unchanged? Well, we could transmit it twice and then compare the two but that's not really practical. Hashing in this case allows us to reduce the 2GB file to a let's say 16 bytes hash (or fingerprint). Bob can then calculate the hash of the version he received and compare that against the hash computed by Alice. This allows us to detect whether somebody changed the data during transmission (of course this requires that the hash is transmitted through a secure channel because an attacker might also alter the hash we transmit). Now, since there are collisions it is possible that the data was altered during transmission because there's an infinite amount of inputs that produce the same hash as output but here's the thing: in good hash algorithms this is highly unlikely. From a cryptographically strong hash algorithm we expect that even slight changes result in a completely different hash. What's a cryptographically strong hash algorithm? Well, it's an algorithm that has the properties that:

By hard we mean that while it's theoretically possible to do so we expect it to be so hard that no machine can ever do that in a reasonable amount of time (meaning we expect that it'd take a machine hundreds of years to do so). However, not all hash algorithms fulfill these properties. Hash algorithms whose sole purpose is to detect random changes during transmission do not have these. Also, hash algorithms that were thought to have these properties might not have them forever as researches figure out attacks on the hash algorithm making it weaker as once thought meaning cryptographers need to create new hash algorithms and we need to switch to using them. A simple hash algorithm is for example hash(a,b,c) = (a + b + c) % 256. If we transmit a,b,c and one value changes then the hash will change. However, if two values change these changes may cancel out producing the same hash. This would be a suitable hash algorithm to catch transmission errors (if we assume that there's no more than one error) but this hash algorithm doesn't fulfill any of the three properties of a cryptographically strong hash algorithm. In short: Unless you just want to catch random I/O errors: Use cryptographically strong hash algorithms. For example if we expect there to be attackers between Alice and Bob we can't just use a hash algorithm that are used for checksum purposes because these are too weak to resist actual attacks and we MUST use a cryptographically strong hash algorithm otherwise we risk that attackers can easily create malicious versions of the file that produce the same hash and thus Bob can't detect that the file was altered by an attacker.

A simple symmetric encryption algorithm

A very simple symmetric encryption algorithm is to rotate each letter (or byte) by some number for example turning A into C, B into D and so on. If we look at bytes this simply means performing (x + key) mod 256. In this case to decrypt the data we need another algorithm that performs (x - key) mod 256 or we use the same algorithm but the key required to decrypt is -key and the key to encrypt is key. However, this isn't a useful asymmetric encryption scheme because it's of no use if we can calculate one key from the other. Let's look at some example code:

--- Code ---

func encrypt(key int8, data []byte) {
	for i, v := range data {
		data[i] = byte(uint16(uint16(v) + uint16(key)) % 256)
	}
}

func main() {
	data := []byte{1, 3, 5, 17}
	fmt.Printf("plain: %d\n", data)
	encrypt(2, data)
	fmt.Printf("encrypted: %d\n", data)
	encrypt(-2, data)
	fmt.Printf("decrypted: %d\n", data)
}

--- Output ---

plain: [1 3 5 17]
encrypted: [3 5 7 19]
decrypted: [1 3 5 17]

Is this a strong encryption algorithm? No, obviously and absolutely not. It's easy to write a program to output all possible 256 plaintexts and then check them if they make some sense (which is very easy to do if it's textual data). More than that, it's vulnerably to numerous attacks. One such attack is a known plaintext attack which as the name suggests means that the attacker knows either the plaintext or part of the plaintext. In this case, an attacker would only have to know A SINLGE byte of plaintext to calculate the key using key := ciphertext - plaintext. If we for example know that the plaintext starts with 1 and we know the ciphertext starts with 3 then we know the key is 3 - 1 = 2. The above algorithm is known as Caesar cipher and now you know why it's insecure.

Summary

What next?

In the next part we'll look at a bunch of other simple symmetric encryption algorithms (stream ciphers to be specific) and how to attack them.