Turid Herland - Master's Thesis

I have just completed a master programme in Information Security at Gjøvik University College; I handed in my thesis on July 1st, 2006. The title of the thesis is "The use of k-best paths algorithms in clock control sequence reconstruction". The thesis can be downloaded from the link below; please be advised that if you want to print a hard copy of the document, the graphs that show the experimental results should be printed in colour.

MSc Thesis

The main topic of the thesis is cryptanalysis of stream ciphers using keystream generators based on clock-controlled linear feedback shift registers (LFSRs). The general statistical model of such a stream cipher is an LFSR that produces a bitsequence x that is then decimated irregularly according to a clock control sequence c. The result of this decimation is the keystream sequence y, which is XORed with the plaintext sequence b to obtain the ciphertext sequence z, a.k.a. the encrypted plaintext. The clock control sequence is produced by a general type subgenerator, often an LFSR or a collection of LFSRs.

Statistical model of stream cipher

The decimation deletes bits from x, so y is a copy of x with some bits missing. The clock control sequence dictates how many positions the decimation function should advance in x before outputting the current bit as the next bit in y. Let k(t) be the clock function that indicates the total sum of the clock at time t; then y_t = x_k(t), and k(t) = k(t-1) + c_t. That is, the t^th bit of y is the same as the k(t)^th bit of x.

The goal in cryptanalysis is, given the ciphertext, to recover the plaintext without knowledge of the secret key that was used to encrypt the plaintext. In stream ciphers the secret key is usually the initial state of the generator that generates the keystream. It is assumed that details about the generator are known to the cryptanalyst, however the initial state is to be kept secret.

The attack that I suggest in my thesis consist of two phases:

Determine the candidate initial states of the LFSR.
Reconstruct the clock control sequence.

The first phase determines a set of candidate initial states for the LFSR by calculating the edit distance between the ciphertext sequence z and the sequence x generated by each possible initial state of the LFSR. The edit distances are compared to a predetermined threshold, and those initial states that result in edit distances below the threshold are added to the set of candidate initial states, the rest of the initial states are discarded.

The edit distances in phase 1 are calculated recursively, filling one matrix of partial edit distances between all possible prefixes of the sequences for each edit distance calculation. Different paths through each such matrix correspond to different clock control sequences, and thus, in phase 2, the clock control sequence can be reconstructed by backtracking paths through the edit distance matrices. In the known plaintext scenario, an optimal path, which can be found very efficiently, would give the correct clock control sequence. However, when the plaintext is not known, it behaves like noise in the cryptanalytic process, and this noise makes it necessary to also consider suboptimal paths through the matrices. The goal of this thesis is to use a k-best-paths algorithm, well known in several other fields, to find the k shortest paths through the matrices in order to reconstruct the clock control sequence.

Results

This proposed attack was implemented and tested on randomly generated pairs of plaintext- and keystream sequences, and it found the correct keystream in all the experiments. The results also showed that the average number of paths that the attack goes through, before it finds the path that represents the correct clock control sequence, increases linearly with the noise level, and exponentially with the length of the keystream generator.

The attack was also compared to a similar attack, introduced in the article listed as literature entry number 3 in the list below. This paper describes an attack that is identical to mine, except that it uses a depth-first search instead of the k-best paths search. The results show that the average number of paths is lower for the k-best paths attack if the noise level is zero, that is if it is a known plaintext attack. However, when the noise level increases, the depth-first search attack seams to perform slightly better. More detailed studies of the results of each individual experiment show that the k-best paths search encounters "bad experiments", that is experiments where the number of extracted paths is much higher than the average, more often than the depth-first search. There is no indication that the depth-first search performs better than the k-best paths when the k-best paths does not have a "bad experiment". Thus the k-best paths might perform as good as, or even better than, the depth-first search if several attacks were run in parallel.

Literature

The most important literature resources for my thesis are listed below. 1. is a general introduction to cryptography, with a good chapter on stream ciphers. 2. is the paper that introduced the edit distance attack for finding the candidate initial states of the LFSR. 3. introduces an attack that is very similar to mine, only it uses a directed depth-first search through the edit distance matrices instead of a k-best paths search. My attack is based heavily on this paper, and my results are compared with results from this attack. 4. is the paper that introduced the k-best paths algorithm that I use in my attack.

Menezes, A. J., Vanstone, S. A., & Oorschot, P C. V 1996. Handbook of Applied Cryptography. CRC Press, Inc., Boca Raton, FL, USA.
Golic, J. D. & Mihaljevic, M. J. 1991. A generalized correlation attack on a class of stream ciphers based on the Levenshtein distance. Journal of Cryptology, 3(3), 201-212.
Petrovic, S. & Fuster-Sabater, A. 2004. Clock control sequence reconstruction in the ciphertext only attack scenario. In ICICS, 427-439.
Eppstein, D. 1997. Finding the k shortest paths.