Me: Josh Bevan - jbevan@bu.edu

Helper: Dustin Clark - clarkdc@bu.edu

Get Help from RCS: help@scc.bu.edu

Our website: rcs.bu.edu

Tutorial eval: rcs.bu.edu/eval

This notebook: http://scv.bu.edu/examples/matlab/Tutorials/Intro/

Matlab_Logo2.png

MatrixLaboratory $\Rightarrow$ Matlab

is an interpreted programming language. Originally developed for linear algebra and engineering problems, but now with wide applicability and toolboxes for areas ranging from medicine, economics, and machine learning.

A good way to introduce yourself to a new language is by trying to solve a "non-trivial" problem; learning the tools and syntax necessary to solve the problem along the way. This motivates the syntax/tools in a "why" versus "what" way!

It is good for being "expressive", allowing you to use only a few lines of code to implement Conway's Game of Life:

In [ ]:
format compact
In [ ]:
%plot native
S=zeros(100); S(54:56,81:87)=[0,1,0,0,0,0,0;0,0,0,1,0,0,0;1,1,0,0,1,1,1;];
A= single(toeplitz([[1 1] zeros(1,100-2)])); pause(5)
for it=1:500
    S= min(1,max(0,S+ 1-mod(A*S*A +4,7)));    %Magic
    imagesc(S); axis square; axis off; drawnow;
end

Let's solve an example problem:

ProjectEuler is a good source of "starter" problems.

Problem 1:

"If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000."

Pseudo-code:

1 "total" starts at 0
2 Loop over all the numbers 1 to 999
3     Is the number a multiple of 3 or 5?
4         If yes, add to our cumulative sum "total"
5 Try the next number

Line 1: We need to be able to store "variables". It can also be useful to display variables/info:

In [ ]:
% We sometimes want to leave helpful comments/explanations right in the
% code. This is a comment because it starts with: %

% A function optionally takes some input(s) and optionally has some output(s)
% it usually looks like:
% output = myfunctions(input)
disp("We can display things using this function.")

% ===== Let's play with variables to figure out line 1 =======
total = 0
total = 1
disp(total)
%Here we are using the equals sign "=" to indicate variable assignment
In [ ]:
% Appending a semicolon to the end of a line suppresses output
total = 6;
disp(total)
In [ ]:
% This adds 2 to the current value of the variable "total", and saves the sum
% back in total
total = total + 2

% We can also save things other than numbers to a variable.
% Like a "string" of characters:
my_secret_password = "joshiscool"

% Or even a range of numbers: using the first number, a colon, and the last number:
the_first_ten = 1:10

% That is called a "vector", you could also call it an "array"
% It has a "size" that depends on it's shape and how big it is along each
% of its dimensions
my_size = size(the_first_ten)

Line 2: We'd like to do the same action over and over, but on varying inputs. This is called a loop; there are several kinds including a "for" loop. We do the actions inside the loop "for" each value that the loop variable takes on

In [ ]:
% ======================= Now for line 2 =====================

start_num = 1;
end_num = 4;
for loop_variable = start_num : end_num
    loop_variable
end % Matlab knows where the loop ends because we "end" the loop.

We can see how we'd use this to solve the problem: we'll loop over all the numbers 1 to 999. Within the loop we'll test to see if they are a multiple of 3 or 5 and if so add it to our "total" variable.

Line 3: How do we test if a number is a multiple? How do we take an action depending on some question (in this case if it is a multiple). Let's answer the second question first:

In [ ]:
% ======================= Now for line 3 =====================

% Matlab represents "true" as 1 and "false" as 0
obvious_fact = 10 > 1
obviously_wrong = 999 < 2
a = 10+2;
b = 12;
also_true = (a==b)
% Here "=="re checking for equality, *not* doing variable assignment

% We can take conditional actions using an "if" statement
if a==b
    disp("I assure you a equals b")
else
    disp("Sadly a doesn't equal b")
end % Just like a "for" loop, we need to "end" an if statement
% We can also combine logical statements
both_arent_true = obviously_wrong & obvious_fact            % "and" statement
but_at_least_one_is_true = obviously_wrong | obvious_fact   % "or" statement

You need to be careful when using floating point numbers and testing for equality:

In [ ]:
format long %See all decimal digits

a = 2+2;
b = 1+3;
if a==b
    disp("a equals b") %What do you expect here?
else
    disp("a doesn't equal b")
end

c = 0.15 + 0.15;
d = 0.10 + 0.20;
if c==d
    disp("c equals d") %What do you expect here?
else
    disp("c doesn't equal d")
end

c
f=2/3
In [ ]:
c_minus_d = abs(c-d)

absolute_FP_error = eps(c)

if abs(c-d)<=1.5*eps(c)
    disp("c equals d, within floating point tolerance")
else
    disp("c doesn't equal d, within floating point tolerance")
end

format short %Hide some precision
In [ ]:
bank_account =1.37
bank_account = bank_account + 10.00
bank_account = bank_account - 11.37
if bank_account == 0
    disp("Ooops")
else
    disp("Buy even more stuff!")
end

Floating point numbers have limited precision: https://en.wikipedia.org/wiki/Floating-point_arithmetic#Internal_representation

There are many pitfalls to working with floating point numbers: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

So that answers the second question, but how do we test if something is a multiple? This is more a logic/math question than exclusively programming, but any non-trivial program will need you to figure things like this out. A number is a multiple of 3 if it is evenly divided by 3:

  • 3 evenly divides into 12, so 12 is a a multiple of 3
  • 3 doesn't evenly divide 13, so 13 isn't a multiple

So our problem is now to test for divisibility. Let's cook up a couple ways:

In [ ]:
% Test for divisibility
% Way 1
quotient = 13/3

% Way 1: Is the quotient the same rounded or not?
rounded = round(quotient,0)
% We can round to the nearest whole number
is_multiple = rounded==quotient

quotient = 12/3
rounded = round(quotient)
is_multiple = rounded==quotient
% Can you think of a different way?
test = mod(13,3)
test2 = mod(12,3)

mod(13,3)==0
~mod(13,3)

Let's apply these ideas to test if a number is divisible by 3 or 5.

In [ ]:
n=34095689
divisible3 = round(n/3)==(n/3);
divisible5 = round(n/5)==(n/5);

if divisible3 | divisible5
    disp("Divisible by 3 or 5")
    joshisawesome=2
elseif divisible3 & ~divisible5
    disp("Divisible by only 3")
    joshisawesome=3
elseif ~divisible3 & divisible5
    disp("Divisible by only 5")
    joshisawesome=4
else 
    joshisawesome=0
    disp("Error!")
end

joshisawesome+12

Putting it all together to solve Problem 1

In [ ]:
%Total starts at 0
total=0;
%Loop over all the numbers 1 to 999
%for i =1:999
Start=0; ender=999;
for i=Start:ender
    %Is the number a multiple of 3 or 5?
    rounded5 = round(i/5)==(i/5);
    rounded3 = round(i/3)==(i/3);
    if  rounded3 | rounded5
        total = total + i;
    end

    %if mod(i,3)==0 | mod(i,5)==0
        %total = total + i;
    %end
    
        %If yes, add to our cumulative sum
%Try the next number
end
total
In [ ]:
%Total starts at 0
total = 0;
%Loop over all the numbers 1 to 999
for n = 1:999
    %Is the number a multiple of 3 or 5?
    if round(n/3)==(n/3) | round(n/5)==(n/5)
        %If yes, add to our cumulative sum
        total = total + n;
    end
%Try the next number
end
disp(total)

What about the other loop construct?

In [ ]:
age = 0;
adult = 18;
sick =0;
disp("Am I an adult?")
while 1
    age = age + 1;
    sick = rand(1)>0.9;
    if sick
        break
    end
    if age>=adult
        break
    end
    disp (num2str(age)+" years old, not yet!")
end
disp(num2str(age)+" years old, finally an adult or I need to go to the doctor.")
In [ ]:
num=2
let="2"
word="two"

let + word

Vectors/Matrices and Vectorization

Is there a "faster" way to solve our Project Euler problem? We could vectorize.

Vector operations perform the same operation on multiple pieces of data. Consider a vector of the numbers 1 through 10. Recall we can make a vector like this easily:

In [ ]:
v = 1:10

If we want to find the sum of each of these numbers, plus 3, squared we could do:

In [ ]:
total = 0;
for i=v
    operation = (i+3)^2;
    total = total + operation;
end
total

We can actually do this all in one set of operations:

In [ ]:
sum((v+3).^2)

Notice that we need to put a "." before the exponentiation. This is because MATLAB treats matrix algebra as the expected operation. The dot tells it to perform the "scalar" version of the operation, going one-by-one through each element. See the difference here:

In [ ]:
v^2
In [ ]:
v.^2

Be careful as accidentally doing the wrong scalar/vector operation deosn't always result in an error message, some operations will work both ways.

In [ ]:
v = 1:10
u = [1:10]' %The single qoute mark gives the "transpose" of the matrix, making columns into rows and vice versa
In [ ]:
vector = v*u %This is actually an inner product
scalar = v.*u %This is effectively an outer product

So we can use this idea to make a more compact (and faster) way to do this:

In [ ]:
n = 1:999;
sum(n(round(n/3)==(n/3) | round(n/5)==(n/5)))

Functions

Another improvement we can make will make the code more readable, and also introduce how we can make our own functions. Imagine we want to do the same idea, but include several other allowable factors:

In [ ]:
n = 1:999;
sum(n(round(n/3)==(n/3) | round(n/5)==(n/5) | round(n/7)==(n/7) | round(n/11)==(n/11)))

Generally copying-pasting the same code over and over is bad. It is harder to read, but even worse also changing the copy-paste in one area doesn't update everywhere else. We can encapsulate our code to test for divisibility into our own function:

In [ ]:
%%file test_divis.m

function dog = test_divis(pancakes,divisor)
    dog = round(pancakes/divisor)==(pancakes/divisor);
end
In [ ]:
test_divis(998,3)

MATLAB requires functions to be defined in a separate file with the filename the same as the function name, though the "parent" function can also contain other function definitions. For simple functions it is possible to define them inline; these are called "anonymous" functions. You can define them this way:

In [ ]:
test_divis_anon = @(n,divisor) round(n/divisor)==(n/divisor);
In [ ]:
test_divis_anon(999,3)

So now we can use our own function to make it even cleaner:

In [ ]:
n = 1:999;
sum(n(test_divis(n,3) | test_divis(n,5)))
In [ ]:
write_paper_that_gets_into_nature()

A New Example Problem

Suppose you are given a data file with infection statistics by state in the USA and you want to plot them. Additionally, you want to set the "day 1" on the plot for each state to be the first actual day they had more 100 positive cases.

Source data: https://github.com/nytimes/covid-19-data On the SCC I downloaded the states file with:

wget https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv

How do we go about doing this?

Pseudo-code:

Read-in data file
Format contents how we want
Pre-allocate "data" matrix (to be all zero?)
Loop through formatted data
    Are we in a new day?
        Add one to the day counter
    end
    For each record, put it in the correct row by state, and column by day
end
Find first day where infection>100, use this as starting day in plot
Plot data for all states, starting from selected day

1,1,1,1,3,0,5,0,0,12,99

Any issues with this approach?

Alternate way:

Read-in data file
Format contents how we want
Pre-allocate "infection" and "day" matrices
Pre-allocate "state_counter" to be 1 for all states
Loop through formatted data
    For next record, get correct row=state and column=state_counter(state)
    Put # infections in infection(row,column)
    Put day in day(row,column)   
end
Find first day where infection>100, use this as starting day in plot
Plot data for all states, starting from selected day

How do we read in a csv?

When I first was making this tutorial I forgot, since I don't usually need to for my background. So I googled...

readcsv()
readmatrix()
readtable()
opts=detectImportOptions()
In [ ]:
opts=detectImportOptions("us-states.csv")
readmatrix("us-states.csv",opts)
In [ ]:
T = readtable("us-states.csv");
disp(T(1,:))
disp(T(1:5,2))
T(1,4)

How do we index vectors/matrices (aka "learn to love colons"). A quick sidebar:

In [ ]:
% Quick way to make a "test" matrix of any size
M = magic(5)
In [ ]:
% Matlab is "column-major" and "one indexed"
disp(M(1,1))
disp(M(1,2))
disp(M(3,1))
disp(M(5,5))
In [ ]:
v=1:10
v(3)
In [ ]:
% Slices
M(1,:)
M(2,:)
% Ranges
M(1:3,2)

To make things easier let's do some work on our data table; turning numbers into numbers, dates into days elapsed, and "mapping" states to a number (or cheat with fips).

We can do this with table2array()

In [ ]:
T(200:205,:)
In [ ]:
cases = table2array(T(:,4));
cases(205)
In [ ]:
date = table2array(T(:,1));
In [ ]:
date(1)
date(3)
date(3)-date(1)
days(date(3)-date(1))
In [ ]:
elapsed = days(date - date(1));
elapsed(205)
In [ ]:
states = table2array(T(:,2));
states(1:4)
In [ ]:
unique(states)
In [ ]:
mymap = containers.Map(unique(states),1:55);
In [ ]:
mymap('Wyoming')
In [ ]:
states(end-10:end)
In [ ]:
mymap(states)

How do we convert our list of states into the equivalent state numbers?

What do you do when you aren't sure how something works in Matlab, google it:

https://www.mathworks.com/matlabcentral/answers/64559-how-do-i-extract-a-vector-of-results-from-a-map

In [ ]:
mymap.values(states(1:10))
In [ ]:
state_nums=cell2mat(mymap.values(states));
state_nums(end)
In [ ]:
% If you want to avoid using a map you can kinda cheat using the "fips"
% but it's not perfect
unique(table2array(T(:,3)));

What else do we need?

Recall pseudo-code:

Read-in data file
Format contents how we want
Pre-allocate "infection" and "day" matrices
Pre-allocate "state_counter" to be 1 for all states
Loop through formatted data
    For next record, get correct row=state and column=state_counter(state)
    Put # infections in infection(row,column)
    Put day in day(row,column)   
end
Find first day where infection>100, use this as starting day in plot
Plot data for all states, starting from selected day
In [ ]:
elapsed(end)
In [ ]:
% Read-in data file
T=readtable("us-states.csv");
% Format contents how we want
cases = table2array(T(:,4));

date = table2array(T(:,1));
elapsed = days(date - date(1));


states = table2array(T(:,2));
unique_states = unique(states);
mymap = containers.Map(unique_states,1:55);
state_nums=cell2mat(mymap.values(states));

% Pre-allocate "infection" and "day" matrices
infection = zeros(size(unique_states,1), elapsed(end)+1);
day = infection;
% Pre-allocate "state_counter" to be 1 for all states
state_counter = ones(1,size(unique_states,1));
% Loop through formatted data
for record = 1:size(cases,1)
%    For next record, get correct row=state and column=state_counter(state)
    row = state_nums(record);
    column = state_counter(row);
    state_counter(row) = state_counter(row) + 1;
%    Put # infections in infection(row,column)
    infection(row,column) = cases(record);
    day(row,column) = elapsed(record);
%    Put day in day(row,column)   
end
% Find first day where infection>100, use this as starting day in plot
% Plot data for all states, starting from selected day
In [ ]: