GS 2006/Math - Collins

Lab 13
Hill Ciphers


Setup

Mod Arithmetic
In MATLAB you can do modular arithmetic by using the mod command. Type help mod to learn more about this command.
Suppose you want to compute various things mod 26, then you would compute
    4928   as   mod(4928,26)
    22+19  as   mod(22+19,26)
    4*16   as   mod(4*16,26)
    86-123 as   mod(86-123,26)
    -12    as   mod(-12,26)
Modular Inverses
To compute additive inverses, there are two obvious methods. First, since 0=26 (mod 26), the additive inverse of a number x is the number y such that x + y = 26. Thus y = 26 - x. The second way is to use mod(-x,26). The second way works even if x is not in the range 0-25.
Computing multiplicative inverses is more challenging. There is an algorithm that can be used to find the multiplicative inverse, but it is pretty complicated (It is based on the Euclid's Algorithm, which is a way of finding the GCD of 2 numbers). The easiest way (in MATLAB) to find multiplicative inverses is by exhaustion. Try the following:
    x = 4          % the number you want to find the inverse of
    num = [1:25]   % all the numbers from 1 to 25
    res = mod(x*num,26)   %  result of x*(1,...,25) mod 26
    find(res==1)   %  this finds which value gives an answer of 1
            % find will either produce a number (the inverse) or will
            % result in  "ans = []" meaning there is no inverse.
Alpha -> Numeric -> Alpha
To convert from letters to numbers and back we need to use 3 different commands upper, real and char. Try the following:
     P = 'your name'
     P = upper(P)
     cP = real(P
Note how the result is a list of numbers between 65 and 90, with 32s for the spaces. If you then subtract 65, the result is only numbers in the range 0-25, plus negative numbers for the spaces (and punctuation). Try the following:
     cP = cP - 65
     ind = find(cP<0)   %  this finds where non-letters were
     cP(ind) = []          %  and removes them
Now to convert back, try the following:
     xP = [2 7 20 2 10 2 14 11 11 8 13 18];
     xP = xP + 65;
     char(xP)
Now convert the coded version of your name (in cP) back to text.
Automating the Process
You can put all the commands to go from text to numbers and from numbers back to text into a script or function and then run it from MATLAB. Write this program and call it code1.m

If you want, a copy of this program is at the end of this file. To run it set P to the string you want to convert and then type code1. It will then convert P to a list of numbers.

Repeat this process with the commands to convert from numbers back to text and call this file "decode1.m" For example, try this

   P = 'I like math more than food'
   code1
   decode1
Try converting messages back and forth from text to numbers and back.

Additive and Multiplicative Ciphers

Additive ciphers are the classic codes you used as a kid where you shift each letter a certain number of letters forward or backward in the alphabet. In MATLAB, if you want to shift by x, you just do this to the number encoded data:
   x = 11;
   cP = mod(cP+x,26)
   decode1
To decode a message, set P to the encoded message and use the inverse shift in the addition above.

For a multiplicative cipher, just switch the addition to a multiplication.

   x = 5;
   cP = mod(cP*5,26)
   decode1
Exercise Set 1
1. For each of the numbers from 1 to 26 find its additive and multiplicative inverse. Note, some numbers don't have multiplicative inverses.

2. Decode the following messages by finding the right additive or multiplicative factor that was used to encode it:

  Msg1:

  XYGSCDRODSWOPYBKVVQYYNWOXDYMYWODYDROKSNYPDROSBMYEXDBIGKV
  UCYPDVIKXNMKBBIKLSQCDSMUNYXYDKCUGRKDIYEBMYEXDBIMKXNYPYBI
  YELEDGRKDIYEMKXNYPYBIYEBMYEXDBI

  Msg2:

  YNSYDBZSJZKRYQYJZKWARUMSQBKYNQKNCABZSCABKWQKQBZSUMSQBKYN
  YDIZSBZSFCABZSCABKWARHKQWYXSFKSQAFSWFSABSHYFHKQWYXSFSH

  Msg3:

  CLAQOYYAIOFAYNKNYONYOMKBRYTCWOJKWRCAYRNRKBTAYRWCOCRCNCVK
  BTKVYKLQCYYYYHOTTTCNIYVFONCVCYQSUFKFABROBVKROUKRO

Decoding Tricks

Substitution codes don't change the distribution of the letters in a message, i.e. every E is converted to the same letter. Thus if the original message has a typical English letter distribution then the encoded message's distribution will indicate what letter is what. For example, if the most common letter in the encoded message is 'X' then it is probably the encode of 'E'

To find the distribution, convert the message to numbers (code1) and then use the hist command (it produces a histogram of the data), like this

   hist(cP,26)
or
   val = hist(cP,26)
Use the first to get a graph, the second to get the count for each letter.

For a more sophisticated analysis, you can use the frequency for pairs of letters. I'll leave it you to figure out how to compute that frequency.

Hill Ciphers

Background
In MATLAB you can compute the determinant of a matrix by using det and the inverse using inv. For example, try
     A = [1 2; 3 4];
     det(A)
     inv(A)
     det(A)*inv(A)
Recall that if A is a matrix with integer entries then det(A)*inv(A) also has integer entries, no matter how big A is.
Encoding
Let E be our 2x2 encoding matrix. And let P be our message. The first step is to use code1 to get the matrix of numbers for P: code1

Check the length of cP, (length(cP). If it is odd, add another value to the end cP(end+1) = 0 else, leave it alone.

To put our message in the right 'shape' for the matrix multiplication (it must by 2xM), we use reshape cP = reshape(cP,2,length(cP)/2)

Now we can encode our message (mod 26) cP = mod(E*cP,26);

Before we can use decode1 to turn this back into text, we need to 'unfold' it (the result is 2xM). So we finish our encoding by cP = cP(:)'; decode1

Use the following matrix E and message P and see what you get: E = [1 2; 3 5]; P = 'Bobby builds baby bottles';
You should get: DVDIAZKWRWUHCFAZAHPIOY

Note that the 7 Bs in the original message get transformed into DDIZHFZ thus the character frequencies get all messed up.

Decoding
Decoding is just like encoding, except that we need to use the inverse of our encoding matrix. Since we are using mod 26 arithmetic we can't just use inv(E). Remember the formula for the inverse:

                     1    (  d  -b )             ( a   b )
   inverse of E =  -----  (        ), where  E = (       ).
                   det E  ( -c   a )             ( c   d )

We can use this same formula, but instead of dividing by the determinant, we multiply by the inverse (mod 26) of the determinant. And instead of using -b and -c, we use their additive inverses. In the case when E = [1 2; 3 5]; we have det(E) = -1 = 25. The inverse of 25 is 25, so the inverse of E (mod 26) is
         ( 5   24 )    (125  600)   (21   2)
      25 (        ) =  (        ) = (      )   (mod 26)
         (23    1 )    (575   25)   ( 3  25)
Let D = [21 2; 3 25]; and repeat the steps above that we used to encode the message, replacing E by D and starting with the encoded message. That would look like
    P = 'DVDIAZKWRWUHCFAZAHPIOY'
    code1;                     % convert to numbers
    cP = reshape(cP,2,11);     % changes cP into 2xM
    cP = mod(D*cP,26);         % decode message
    cP = cP(:)';               % make it a row vector
    decode1                    % convert to characters

Exercise Set 2

1. For each of the following matrices, find its inverse mod 26 and check that the matrix times its inverse equals the identity (mod 26):
E1 = [1, 2; 3, 5];
E2 = [23, 12; 20, 1];
E3 = [21, 20, 5; 12, 24, 11; 16, 19, 24];

2. Use the encoding matrix E1 from above to encode the message

BOBBY BUILDS BABY BOTTLES
What letters do the 7 Bs in the original message become?
Encode:
ROBERT BUILDS BABY BOTTLES
What letters do the last 4 Bs become? Note how the slight change in length changes the last part of the encoded message.

3. The following message was encoded using matrix E2 from above. Decode the message.

QGRHSBWANFATLUUONYKHKWZODVSX

4. (Extra) The following message was encoded using a 2x2 matrix in the Hill Cipher. Decode it.

TXPTPGLNWXVGPYFDFXZEXDQSEYFXXGLCPYXDCLSWLNOTT
YPWUXQYPWSFAGVQDRCLXDPEPWYKNARFGLSGWEJRTQBLAK
WGBEPGHXFNPGTXVXOIFAGUOAKADNVXQEYAMRZLDQLBYE
The number of possible decoding matrices is some number less than $25^4$. It could be worse: there are 26! different substitution ciphers.

Programs
code1.m
% code1.m  by Dr. Collins 6/19/03 for GSS:Math
% assumes P contains the message to be converted

P = upper(P);
cP = real(P);
cP = cP - 65;
ind = find(cP<0);
cP(ind) = [];
disp(cP)
decode1.m
% decode1.m  by Dr. Collins 6/19/01 for GSS:Math
% assumes cP contains the message as numbers

cP = cP + 65;
P = char(cP);
disp(P)
Mail: ccollins@math.utk.edu