MATLAB is an interpreted programming language. It was originally developed for linear algebra and engineering problems, but it now has wide applicability, with toolboxes for areas ranging from medicine and economics to machine learning.
A good way to get to know a new language is to solve a "non-trivial" problem, learning the tools and syntax you need along the way. This approach motivates the syntax and tools by showing "why" rather than just "what".
help@scc.bu.edu
jbevan@bu.edu
bgregor@bu.edu
format compact
GitHub repository of the data we will use
https://github.com/bu-rcs/bu-rcs.github.io/tree/main/Bootcamp/Data
Citation
University of Wisconsin Population Health Institute. County Health Rankings & Roadmaps 2019. www.countyhealthrankings.org.
Data Source
https://www.countyhealthrankings.org/explore-health-rankings/rankings-data-documentation
T = readtable("NE_HealthData.csv");
states=unique(T.State)
How do we find a particular state in the table?
state_inds=cellfun(@(c)strcmp(c,T.State),states,'UniformOutput',false)
mymap = containers.Map(states,state_inds)
T(mymap("Connecticut"),{'FIPS','County','x_Smokers'})
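The lookup above stores one logical mask per state in a map. For a single state, the same mask can be built on the spot. The sketch below uses a small made-up table (the `toy` table and its values are hypothetical stand-ins for the health data) to show the idea:

```matlab
% Toy table standing in for the health data (values are made up for illustration)
State = {'Connecticut'; 'Maine'; 'Connecticut'; 'Vermont'};
County = {'Fairfield'; 'Cumberland'; 'Hartford'; 'Chittenden'};
x_Smokers = [12; 15; 13; 14];
toy = table(State, County, x_Smokers);

% Logical mask: true for every row whose State matches
isCT = strcmp(toy.State, 'Connecticut');

% Use the mask as the row index and pick columns by name
toy(isCT, {'County', 'x_Smokers'})
```

The map is still worthwhile when the same masks are reused many times, since each one is computed only once.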
x_FairOrPoorHealth x_Smokers x_AdultsWithObesity x_PhysicallyInactive x_WithAccessToExerciseOpportunities x_ExcessiveDrinking x_SomeCollege x_ChildrenInPoverty x_SevereHousingProblems x_DriveAloneToWork x_LongCommute_DrivesAlone
table2array(T(mymap("Connecticut"),[4:9,11:15]))
averages = zeros(numel(states),11);
% states is a column cell array, so loop over an index rather than over states itself
for it = 1:numel(states)
    averages(it,:) = mean(table2array(T(mymap(states{it}),[4:9,11:15])));
end
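Per-group averages like these can also be computed without an explicit loop. This is a minimal sketch using `findgroups` and `splitapply` on made-up data (the `State` and `v` variables here are hypothetical), offered as an alternative to the loop above rather than a replacement for it:

```matlab
% Made-up data: two groups, two values each
State = {'A'; 'B'; 'A'; 'B'};
v = [1; 2; 3; 4];

% G assigns a group number to each row; names lists the groups in order
[G, names] = findgroups(State);

% Apply mean to the values of each group
m = splitapply(@mean, v, G);
```

Here `m(k)` is the mean for the group `names{k}`.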
T.County(T.x_ChildrenInPoverty>10)
Things to explore:
-Correlation plots
-Crude geographic plots
-Unrelated data to show time series example
Correlation Plots
plot(T.x_ChildrenInPoverty, T.x_Smokers)
scatter(T.x_ChildrenInPoverty, T.x_Smokers)
%plot inline -s 800,800
h = scatter(T.x_ChildrenInPoverty, T.x_Smokers,'.r');
title("Poverty vs Smoking for NE Counties")
xlabel('% Children in Poverty')
ylabel('% Smokers')
ax = ancestor(h,'axes');
ax.YAxis.Exponent=0;
ytickformat('%2.1f')
axis tight
%plot inline -s 800,800
h = scatter(T.x_ChildrenInPoverty, T.x_Smokers,'.r');
title("Poverty vs Smoking for NE Counties")
xlabel('% Children in Poverty')
ylabel('% Smokers')
ax = ancestor(h,'axes');
ax.YAxis.Exponent=0;
ytickformat('%2.1f')
axis tight
hold on
%Plot linear regression and approximate 95% prediction interval (+/- 2*delta)
[coeff, S] = polyfit(T.x_ChildrenInPoverty, T.x_Smokers,1);
xFit = linspace(min(T.x_ChildrenInPoverty), max(T.x_ChildrenInPoverty), 100);
[yFit, delta] = polyval(coeff , xFit, S);
plot(xFit, yFit, 'b-', 'LineWidth', 2);
plot(xFit,yFit+2*delta,'m--',xFit,yFit-2*delta,'m--')
legend('Data','Linear Fit','95% Prediction Interval','Location','northwest')
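The second output of `polyval`, `delta`, estimates the standard error of predicting a new observation, so a band of plus or minus `2*delta` covers roughly 95% of points when the errors are independent and normally distributed with constant variance. The sketch below checks this on synthetic data. The slope, intercept, and noise level are made up for illustration.

```matlab
rng(0);                              % fix the seed so the sketch is repeatable
x = linspace(0, 10, 50)';
y = 2*x + 1 + randn(size(x));        % made-up line with unit-variance noise

[p, S] = polyfit(x, y, 1);           % p(1) is the slope, p(2) the intercept
[yhat, delta] = polyval(p, x, S);    % fitted values and prediction standard error

% Fraction of the data that falls inside the +/- 2*delta band
inside = abs(y - yhat) <= 2*delta;
fprintf('Fraction within band: %.2f\n', mean(inside));
```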
Crude Geographic Plots
Census Data Source:
https://www.census.gov/geographies/reference-files/time-series/geo/gazetteer-files.html
Direct link:
https://www2.census.gov/geo/docs/maps-data/data/gazetteer/2019_Gazetteer/2019_Gaz_counties_national.zip
C=readtable('counties.txt');
C(1,:)
lat = C.INTPTLAT;
long = C.INTPTLONG;
scatter(lat,long)
contiguous = long>-125 & long<-50 & lat>24;
scatter(lat(contiguous),long(contiguous),'.')
%plot -s 900,400
contig = long>-125 & long<-50 & lat>24;
clong = long(contig);
clat = lat(contig);
scatter(clong,clat,'.')
axis tight equal
%plot native -s 2000,800
%Neat MATLAB function (requires the Image Processing Toolbox):
viscircles([clong,clat],ones(size(clat))*.2)
axis tight equal
%plot native
dt = delaunayTriangulation(clong,clat);
tri = dt.ConnectivityList;
xy = [clong, clat];
maxEdge = 2.6;
edges = [tri(:,[1 2]);tri(:,[1 3]);tri(:,[2 3])];
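% Length of every triangle edge, reshaped to one column per edge
edgeLen = reshape(hypot(xy(edges(:,1),1) - xy(edges(:,2),1), xy(edges(:,1),2) - xy(edges(:,2),2)),[],3);
% Remove triangles that have any edge longer than maxEdge
tri(any(edgeLen > maxEdge,2),:) = [];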
figure
trisurf(tri,clong,clat,ones(size(clong)))
view(2)
Time Series Example
Source data:
https://github.com/nytimes/covid-19-data
Direct link:
https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv
% Read-in data file
T=readtable("us-states.csv");
% Format contents how we want
cases = table2array(T(:,4));
date = table2array(T(:,1));
elapsed = days(date - date(1));
states = table2array(T(:,2));
unique_states = unique(states);
mymap = containers.Map(unique_states,1:numel(unique_states));
state_nums=cell2mat(mymap.values(states));
% Pre-allocate "infection" and "day" matrices
infection = zeros(numel(unique_states), elapsed(end)+1);
day = infection;
% Pre-allocate "state_counter" to be 1 for all states
state_counter = ones(1,numel(unique_states));
% Loop through formatted data
for record = 1:numel(cases)
% For next record, get correct row=state and column=state_counter(state)
row = state_nums(record);
column = state_counter(row);
state_counter(row) = state_counter(row) + 1;
% Put # infections in infection(row,column)
infection(row,column) = cases(record);
% Put day in day(row,column)
day(row,column) = elapsed(record);
end
% Find first day where infection>100, use this as starting day in plot
% Plot data for all states, starting from selected day
%plot -s 700,500
[vals,inds] = sort(max(infection'));
inds=inds(10000<vals & vals<700000);
for i = numel(inds):-5:1
want = infection(inds(i),:)>100;
h = plot(infection(inds(i), want));
hold on
end
title("Infections by State, starting from infections>100")
xlabel('Relative Day')
ylabel('Infections')
legend(unique_states(inds(numel(inds):-5:1)),'Location','northwest')
ax = ancestor(h,'axes');
ax.YAxis.Exponent=0;
ytickformat('%2.1f')
axis tight
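The loop above aligns each state by keeping only the entries where the count exceeds 100. An alternative, sketched here on made-up cumulative counts (the `counts` vector is hypothetical), is to locate the first qualifying day with `find` and keep everything from that day on:

```matlab
counts = [3 20 80 150 400 900];      % made-up cumulative case counts

% Index of the first day with more than 100 cases
firstDay = find(counts > 100, 1);

% Series re-based so that relative day 1 is the first day above 100
aligned = counts(firstDay:end);
```

With the pre-allocated `infection` matrix, the logical mask has the advantage of also dropping the unused trailing zeros, which slicing from `firstDay` to the end would keep.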