Here is a Python dictionary of the relative frequency of letters in English text:

{ "A": .08167, "B": .01492, "C": .02782, "D": .04253, "E": .12702, "F": .02228, "G": .02015,
  "H": .06094, "I": .06996, "J": .00153, "K": .00772, "L": .04025, "M": .02406, "N": .06749,
  "O": .07507, "P": .01929, "Q": .00095, "R": .05987, "S": .06327, "T": .09056, "U": .02758,
  "V": .00978, "W": .02360, "X": .00150, "Y": .01974, "Z": .00074 }

Here is some plaintext:

The population variance of a finite population $X$ of size $N$ and mean $\mu$ is given by

$$\mathrm{Var}(X) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

(a) What is the population variance of the relative letter frequencies in English text?

(b) What is the population variance of the relative letter frequencies in the given plaintext?

(c) For each of the following keys (yz, xyz, wxyz, vwxyz, uvwxyz), encrypt the plaintext with a Vigenere cipher and the given key, then calculate and report the population variance of the relative letter frequencies in the resulting ciphertext. Describe and briefly explain the trend in this sequence of variances.

(d) Viewing a Vigenere key of length k as a collection of k independent Caesar ciphers, calculate the mean of the frequency variances of the ciphertext for each one. (E.g., for key yz, calculate the frequency variance of the even numbered ciphertext characters and the frequency variance of the odd numbered ciphertext characters. Then take their mean.) Report the result for each key in part (c). Is the mean variance like those observed in part (b)? Part (c)? Briefly explain.

(e) Consider the ciphertext that was produced with key uvwxyz. In part (d), you calculated the mean of six variances for this key. Revisit that ciphertext, and calculate the mean of the frequency variances that arise if you had assumed that the key had length 2, 3, 4, and 5. Does this suggest a variant to the Kasiski attack? (Don't say no!) Briefly explain.
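As a quick check of the variance formula, the sketch below applies it to the English letter-frequency table given above. It is written for Python 3, and the names `ENGLISH_FREQ` and `pop_variance` are illustrative choices for this example only.

```python
# Relative letter frequencies copied from the problem statement.
ENGLISH_FREQ = {
    "A": .08167, "B": .01492, "C": .02782, "D": .04253, "E": .12702,
    "F": .02228, "G": .02015, "H": .06094, "I": .06996, "J": .00153,
    "K": .00772, "L": .04025, "M": .02406, "N": .06749, "O": .07507,
    "P": .01929, "Q": .00095, "R": .05987, "S": .06327, "T": .09056,
    "U": .02758, "V": .00978, "W": .02360, "X": .00150, "Y": .01974,
    "Z": .00074,
}


def pop_variance(values):
    """Population variance: (1/N) * sum((x_i - mu)^2)."""
    values = list(values)
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


# Part (a): variance of the 26 English letter frequencies.
print(pop_variance(ENGLISH_FREQ.values()))
```

The printed value should be close to 0.00104. Because the frequencies sum to roughly 1, the mean is roughly 1/26 whatever the text, so the variance mainly measures how unevenly the letters are distributed.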

Code for parts a-d:

import string

#relative frequency of letters in English text (from the problem statement)
d = {"A": 0.08167, "B": 0.01492, "C": .02782, "D": .04253, "E": .12702, "F": .02228,
     "G": .02015, "H": 0.06094, "I": .06996, "J": .00153, "K": .00772, "L": .04025,
     "M": .02406, "N": .06749, "O": .07507, "P": .01929, "Q": .00095, "R": .05987,
     "S": .06327, "T": .09056, "U": .02758, "V": .00978, "W": .02360, "X": .00150,
     "Y": .01974, "Z": .00074}

#text must contain only lowercase letters a-z; any other character
#raises a KeyError
def relative_freq(text):
    n = float(len(text))
    d = dict()
    #initialize all frequencies to 0
    for x in string.ascii_lowercase:
        d[x] = 0
    #for every instance of letter, increment count by 1
    for l in text:
        d[l] += 1
    #for every letter, divide the frequency by total letters to get
    #relative frequency
    for x in string.ascii_lowercase:
        d[x] = d[x]/n
    return d

#function to get mean of frequencies from dictionary of relative freq
def mean(d):
    total = 0.0
    for i in d:
        total += d[i]
    return total/len(d)

#calculate population variance by taking input as dictionary of frequency
#and the mean
def population_variance(d, mu):
    var = 0.0
    for i in d:
        var += (d[i] - mu)**2
    return var/len(d)

#utility function to rotate a sequence left by n positions. We will use it
#to build a lookup table to get cipher text
def rotate(l, n):
    return l[n:] + l[:n]

#function to build lookup table
def build_lookup_table():
    #It is implemented as a nested dictionary: lut[key_letter][plain_letter]
    lut = dict()
    l = string.ascii_lowercase[:]
    for i in string.ascii_lowercase:
        counter = 0
        lut[i] = {}
        for j in string.ascii_lowercase:
            lut[i][j] = l[counter]
            counter += 1
        #rotate the alphabet one position left for the next key letter
        l = rotate(l, 1)
    return lut

def ciphertext(plaintext, key, lut):
    #repeat the key until it is exactly as long as the plaintext
    key_replaced_text = ""
    while len(key_replaced_text) + len(key) <= len(plaintext):
        key_replaced_text += key
    if len(key_replaced_text) != len(plaintext):
        for i in key:
            key_replaced_text += i
            if len(key_replaced_text) == len(plaintext):
                break
    #look up each (key letter, plaintext letter) pair in the table
    cipher = ""
    for i, j in zip(key_replaced_text, plaintext):
        cipher += lut[i][j]
    return cipher

#population variance of the relative letter frequencies of text
def analyze(text):
    d = relative_freq(text)
    return population_variance(d, mean(d))

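#mean of the frequency variances of the len(key) interleaved Caesar
#ciphers that make up the Vigenere ciphertext
def partd(plaintext, key, lut):
    cipher = ciphertext(plaintext, key, lut)
    variances = 0
    for i in range(len(key)):
        #collect every len(key)-th character, starting at offset i
        s = ""
        for j in range(i, len(cipher), len(key)):
            s += cipher[j]
        variances += analyze(s)
    return variances/len(key)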

#plaintext must already be defined as a string of lowercase letters a-z
#(the given plaintext with everything else removed)
print("A) Population variance for given dictionary:")
print(population_variance(d, mean(d)))
pt_d = relative_freq(plaintext)
print("B) Population variance for given plaintext:")
print(population_variance(pt_d, mean(pt_d)))
print("C) Population variance for cipher text for given keys:")
lut = build_lookup_table()
#the keys below are the ones that produced the output shown; "uvwyz" here
#and "wyz", "vwyz", "uvwyz" in part D differ from the keys listed in part (c)
print(analyze(ciphertext(plaintext, "yz", lut)))
print(analyze(ciphertext(plaintext, "xyz", lut)))
print(analyze(ciphertext(plaintext, "wxyz", lut)))
print(analyze(ciphertext(plaintext, "vwxyz", lut)))
print(analyze(ciphertext(plaintext, "uvwyz", lut)))
print("D) Result for each key:")
print(partd(plaintext, "yz", lut))
print(partd(plaintext, "xyz", lut))
print(partd(plaintext, "wyz", lut))
print(partd(plaintext, "vwyz", lut))
print(partd(plaintext, "uvwyz", lut))

Output:

A) Population variance for given dictionary:
0.00104056677352
B) Population variance for given plaintext:
0.001035960098
C) Population variance for cipher text for given keys:
0.000522474397184
0.000359166293935
0.00023525436279
0.000189753214864
0.000221136606201
D) Result for each key:
0.00106591263562
0.00111744202495
0.00111744202495
0.00114924294535
0.00122492441558
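
The code above covers parts a-d only. For part (e), one possible way to score a guessed key length is sketched below in Python 3: split the ciphertext into that many interleaved columns and average their frequency variances. The helper names `freq_variance` and `mean_column_variance` and the toy string are assumptions made for this illustration, and the sketch assumes the text is lowercase a-z and at least as long as the guessed key length.

```python
import string


def freq_variance(text):
    """Population variance of the 26 relative letter frequencies in text."""
    n = len(text)
    freqs = [text.count(c) / n for c in string.ascii_lowercase]
    mu = sum(freqs) / 26
    return sum((f - mu) ** 2 for f in freqs) / 26


def mean_column_variance(ciphertext, key_len):
    """Average the frequency variance over key_len interleaved columns."""
    # Column i holds characters i, i + key_len, i + 2*key_len, ...
    columns = [ciphertext[i::key_len] for i in range(key_len)]
    return sum(freq_variance(col) for col in columns) / key_len


# Toy text with an obvious period of 2 (hypothetical data, not the
# assignment's ciphertext).
toy = "abababab"
for k in (1, 2):
    print(k, mean_column_variance(toy, k))
```

A column encrypted with a single Caesar shift has the same frequency variance as its plaintext letters, because the shift only changes which letter carries each frequency. A guess that matches the true key length (or a multiple of it) would therefore be expected to give a mean variance near the plaintext level, while a wrong guess mixes several shifts in each column and tends to flatten the distribution.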

The lookup table for the cipher is implemented as a nested dictionary in Python and is built by the build_lookup_table() function. Each row of the table is the alphabet from a to z rotated one position further to the left than the previous row; the rotate() function performs the rotation.
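
Since row i of the table is the alphabet rotated left by the position of key letter i, each lookup amounts to adding the key and plaintext letter positions modulo 26. The sketch below (Python 3) shows that equivalent formulation without a table. `vigenere_encrypt` is a hypothetical name chosen for this example, not a function from the code above, and it assumes both arguments contain only lowercase a-z.

```python
import string

ALPHABET = string.ascii_lowercase


def vigenere_encrypt(plaintext, key):
    """Encrypt lowercase a-z text by adding key letters modulo 26."""
    out = []
    for i, p in enumerate(plaintext):
        k = key[i % len(key)]  # repeat the key cyclically
        shifted = (ALPHABET.index(p) + ALPHABET.index(k)) % 26
        out.append(ALPHABET[shifted])
    return "".join(out)


# Textbook example: key "lemon" applied to "attackatdawn".
print(vigenere_encrypt("attackatdawn", "lemon"))  # lxfopvefrnhr
```

Indexing the key with `i % len(key)` also removes the need to build a key string as long as the plaintext first.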

