GS 2006/Math - Collins

Lab 19


Basically people get along better if they have more interests in common than if they don't. The purpose of this lab is to attempt to quantify this idea. Disclaimer: The results stated here or derived from this lab are for amusement purposes only. There is no scientific basis for any of the results


Suppose a bunch of people take a 10 question quiz on their likes and dislikes where they can score their answer on each question from -5 meaning 'severly dislikes' to +5 meaning 'loves it to death'. We could write a person's 10 answers in a vector. For person k, call this vector vk. Now if we wanted to know if person i and person j would get along, we could compare the answers, but, being mathematicians, a more mathematical way would be to compute the inner product of their 'answer vectors', i.e. <vi,vj>. Then based on the value we could rate their compatibility. If we computed this 'compatibility' index for all the people we could see who each person was most compatible with (the larger the index the better). For example if we did this with 3 people with 'answer vectors' like this:
v1 = [ 5    -3     1     0     4     3     0    -5     4    -1];
v2 = [ 1     3     5     3    -4    -1     5     5    -1     4];
v3 = [-5    -2     3    -5    -4    -3    -3     1    -3    -3];
Then by computing all the inner products (using dot) we'd get the following results
dot(v1,v2) = -51
dot(v1,v3) = -55
dot(v2,v3) = -11
From this we might conclude that person 1 wouldn't really get along with either #2 or #3 but might get along slightly better with #2. #2 would be somewhat neutral with #3 (and vice-versa) and of the three #2 and #3 are the most compatible.


For small samples this is easy to compute, but for a larger set we need to be more efficient. Write each 'answer vector' as a column vector and combine them into a large 10 x M matrix V where M is the number of people. Then since the Euclidean inner product of two column vectors u and v can be computed as uTv, we can compute the inner products of all the vectors with all the other vectors simply by computing VTV. This will be a M x M matrix and the (i,j) entry will be the compatibility index for person i with person j. From the example above, we get
V = [v1',v2',v3'];
C = V'*V
   102   -51   -55
   -51   128   -11
   -55   -11   116
For further processing (i.e. finding maximums) we need to first replace the diagonal elements with large negative values (as we know each person is compatible with themselves or at least we hope so, and we don't want your match to be you). Then we can find the maximum of each column. In MATLAB we do this, like:
D = diag(C);                    % save the diagonal elements
C = C - diag(D) - 1000*eye(3);  % replace the diagonal values with -1000
[z,ind] = max(C);               % compute the max and its location of each column
Then z will contain the maximum of each column, while ind(i) will be the number of the person most compatible with person i. In our sample problem, you'd get z = (-51,-11,-11) and ind = (2,3,2). Meaning person 1 is most compatible with person 2, and persons 2 and 3 are most compatible with each other.


1. Make up a 10 question like/dislike questionaire and ask at least 10 people to fill it out. You might want to take it to the dorm and ask other GSS students so you can become the e-Harmonizer of GSS.

Once you have the answers, process them to find the most compatible person for each person.

Turn in your questionaire and the results (you can just give me the numbers, i.e. the ind vector from above).

2. (Extra Fun) In a more scientific situation the approach would be similar except that the questions would be chosen more carefully and instead of using the standard Euclidean norm, they would use some sort of weighted norm. For example if they thought that similarities or differences in one category had more impact, they would weight the inner product in favor of that question, i.e. instead of using the term uivi to contribute to the inner product, they'd use 10uivi. Also, because answers to questions might be linked they might introduce a matrix A representing those links, e.g. instead of using the answers to 1 and 2 separately they might look at a combination of those two answers before they did the inner product. In this case, you'd compute the compatibility matrix via V'*A'*A*V.

Create a 10 x 10 matrix A which changes the weights on each question or combines the answers to the questions and compute the compatiblilty results for your sample.

Turn in the matrix and the results.