{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Me: Josh Bevan - jbevan@bu.edu  \n",
    "Get Help from RCS: help@scc.bu.edu\n",
    "\n",
    "Our website: [rcs.bu.edu](http://rcs.bu.edu)  \n",
    "Tutorial eval: [rcs.bu.edu/eval](http://rcs.bu.edu/eval)  \n",
    "This notebook: http://scv.bu.edu/examples/matlab/Tutorials/PerfOpt/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# **What is \"performance optimization\"?**\n",
    "Programs are run with finite resources: memory, CPU, disk, time, etc. For small or simple programs these limitations are unimportant; the program finishes fast enough and stays within your constraints, so the limitations have no noticeable effect. However, most non-trivial programs eventually reach one of these constraints, usually memory or running time.\n",
    "\n",
    "Performance optimization, in the context of this tutorial, means improving a program to minimize the effect of our computing environment's limitations.\n",
    "\n",
    "Techniques for performance optimization (\"PO\") fall into several categories:\n",
    "1. Memory access efficiency\n",
    "2. Vectorization\n",
    "3. Use of \"a priori\" knowledge to specialize approach/methods\n",
    "4. Making use of/avoiding computer and language strengths/weaknesses\n",
    "5. Algorithmic improvement: asymptotic/\"big O\"\n",
    "\n",
    "Not 6: Parallelization. It improves performance but does not \"optimize\"; it usually carries an efficiency penalty."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "format compact"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Brief sidebar: Recall vectorization...**  \n",
    "Vectorization means performing the same operation on multiple pieces of data at once. How much it helps depends on MATLAB's language and implementation specifics, but the idea applies to most other *interpreted* languages as well.\n",
    "\n",
    "It often takes the form of converting \"tight\" loops into operations on vectors/matrices.\n",
    "\n",
    "Here's an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Elapsed time is 0.136295 seconds.\n",
      "Elapsed time is 0.081715 seconds.\n"
     ]
    }
   ],
   "source": [
    "nums = rand(10000,1);\n",
    "%============\n",
    "tic\n",
    "calc = zeros(10000,1);\n",
    "for t=1:1000\n",
    "    for i=1:10000\n",
    "        calc(i) = sin(nums(i));\n",
    "    end\n",
    "end\n",
    "toc\n",
    "%============\n",
    "tic\n",
    "for t=1:1000\n",
    "    calc2 = sin(nums);\n",
    "end\n",
    "toc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Depending on the MATLAB version, the two timings may or may not be close. Newer versions are better at JIT (just-in-time) compiling simple patterns. Consider this very similar version and compare:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Elapsed time is 0.631402 seconds.\n",
      "Elapsed time is 0.080862 seconds.\n"
     ]
    }
   ],
   "source": [
    "nums = rand(10000,1);\n",
    "%============\n",
    "tic\n",
    "total = zeros(10000,1);\n",
    "for t=1:1000\n",
    "    for i=1:10000\n",
    "        sin(nums(i));\n",
    "    end\n",
    "end\n",
    "toc\n",
    "%============\n",
    "tic\n",
    "for t=1:1000\n",
    "    sin(nums);\n",
    "end\n",
    "toc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can easily make situations where MATLAB has a hard time JITing. Try encapsulating operations inside functions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%file test.m\n",
    "\n",
    "function out = test(a,b)\n",
    "    out = a + b;\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%file dummy.m\n",
    "\n",
    "function out = dummy(in)\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Elapsed time is 0.195207 seconds.\n",
      "Elapsed time is 2.584963 seconds.\n",
      "Elapsed time is 3.092294 seconds.\n",
      "Elapsed time is 0.037965 seconds.\n"
     ]
    }
   ],
   "source": [
    "nums = randi(100,10000,1); % Creates a 10000x1 vector of random integers from 1 to 100 inclusive\n",
    "\n",
    "% Original task:\n",
    "tic\n",
    "for t=1:10000\n",
    "    total = 0;\n",
    "    for i=1:10000\n",
    "        total = total + nums(i);\n",
    "    end\n",
    "end\n",
    "toc\n",
    "\n",
    "% Function call overhead:\n",
    "tic\n",
    "for t=1:10000\n",
    "    total = 0;\n",
    "    for i=1:10000\n",
    "        total = total + nums(i);\n",
    "        dummy(42)\n",
    "    end\n",
    "end\n",
    "toc\n",
    "\n",
    "% Original task encapsulated in function to prevent JITing:\n",
    "tic\n",
    "for t=1:10000\n",
    "    total = 0;\n",
    "    for i=1:10000\n",
    "        total = test(total, nums(i));\n",
    "    end\n",
    "end\n",
    "toc\n",
    "\n",
    "% Optimized built-ins are even better and should be used when possible:\n",
    "tic\n",
    "for t=1:10000\n",
    "    total = sum(nums);\n",
    "end\n",
    "toc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Consider a simple example that touches on all of the above categories of optimization:  \n",
    "We want to calculate the cumulative sum (prefix sum) of an array of numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "nums =\n",
      "    83\n",
      "    10\n",
      "    21\n",
      "    80\n",
      "    69\n",
      "    47\n",
      "    90\n",
      "    41\n",
      "    66\n",
      "    96\n"
     ]
    }
   ],
   "source": [
    "nums = randi(100,10,1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total =\n",
      "   603\n"
     ]
    }
   ],
   "source": [
    "total = 0;\n",
    "for i=1:numel(nums)\n",
    "    total = total + nums(i);\n",
    "end\n",
    "total"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%file csum.m\n",
    "\n",
    "% Calculate a cumulative sum from a to b\n",
    "% Does same amount of work as we would for a \"real\" csum\n",
    "function total = csum(a,b)\n",
    "    total = 0;\n",
    "    for i=a:b\n",
    "        total = total + i;\n",
    "    end\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%file csum2.m\n",
    "\n",
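    "% Calculate the cumulative sums of the integers a to b\n",
    "function total = csum2(a,b)\n",
    "    vals = a:b;\n",
    "    total = zeros(numel(vals),1);\n",
    "    if isempty(vals), return; end\n",
    "    total(1) = vals(1);\n",
    "    for k=2:numel(vals)\n",
    "        % index by position k so that total(k-1) is always valid\n",
    "        total(k) = total(k-1) + vals(k);\n",
    "    end\n",
    "end"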
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Elapsed time is 0.983886 seconds.\n"
     ]
    }
   ],
   "source": [
    "% Get all the cumulative sums from 1, for the numbers 1 to 43000\n",
    "tic\n",
    "for i=1:(1*43000)\n",
    "    myresults(i) = csum(1,i);\n",
    "end\n",
    "toc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ans =\n",
      "   1.0e+05 *\n",
      "  Columns 1 through 7\n",
      "    0.0001    0.0002    0.0005    0.0010    0.0022    0.0046    0.0100\n",
      "  Columns 8 through 13\n",
      "    0.0215    0.0464    0.1000    0.2154    0.4642    1.0000\n"
     ]
    }
   ],
   "source": [
    "logspace(1,5,13)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "% Look at the running time as we increase the range of numbers we get the cumsum for\n",
    "c = 1;\n",
    "for N=logspace(1,5,13)\n",
    "    tic\n",
    "    for i=1:N\n",
    "        myresults(i) = csum(1,i);\n",
    "    end\n",
    "    timings(c) = toc;\n",
    "    c = c+1;\n",
    "end\n",
    "plot(logspace(1,5,13),timings,'o-')\n",
    "xlabel(\"N\")\n",
    "ylabel(\"Runtime (s)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plot above shows that our running time does not increase linearly as we increase $N$. We can describe the \"asymptotic\" behavior of our program using \"big O\" notation, which expresses how the running time grows as a function of some critical parameter(s). In this case our critical parameter is $N$, and we'll show that our program scales as $\\mathcal{O}(N^2)$: `csum(1,i)` performs $i$ additions, and calling it for every $i$ from 1 to $N$ costs $N(N+1)/2$ additions in total."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ans =\n",
      "    5.5452\n",
      "ans =\n",
      "    5.5452\n"
     ]
    }
   ],
   "source": [
    "% Remember \"log rules\":\n",
    "log(2^8)\n",
    "8*log(2)"
   ]
  },
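  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why the log rules matter here: a short sketch, assuming the runtime follows a power law $T(N) \\approx c\\,N^p$ for large $N$ (the constant $c$ and exponent $p$ are symbols introduced here for illustration). Taking the log of both sides turns the power law into a straight line:\n",
    "\n",
    "$$\\log T = \\log\\left(c\\,N^p\\right) = p\\,\\log N + \\log c$$\n",
    "\n",
    "So the slope of a straight-line fit to $\\log T$ versus $\\log N$ estimates the exponent $p$; a slope near 2 would indicate $\\mathcal{O}(N^2)$ scaling."
   ]
  },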
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ans =\n",
      "    1.6750  -18.0474\n"
     ]
    }
   ],
   "source": [
    "% log-log plot of above, with linear regression in asymptotic region\n",
    "loglog(logspace(1,5,13),timings,'o-')\n",
    "xlabel(\"N\")\n",
    "ylabel(\"Runtime (s)\")\n",
    "polyfit(log(logspace(3,5,7)),log(timings(7:end)),1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What can we improve?\n",
    "1. Memory access efficiency\n",
    "2. Vectorization\n",
    "3. Use of \"a priori\" knowledge to specialize approach/methods\n",
    "4. Making use of/avoiding computer and language strengths/weaknesses\n",
    "5. Algorithmic improvement: asymptotic/\"big O\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Memory access efficiency\n",
    "- preallocation versus resizing\n",
    "- memory access patterns (e.g. column vs row \"strides\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "mysize=4300000;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "myresults =\n",
      "     1     3\n",
      "myresults =\n",
      "     1     3     6\n",
      "myresults =\n",
      "     1     3     6    10\n",
      "myresults =\n",
      "     1     3     6    10    15\n",
      "myresults =\n",
      "     1     3     6    10    15    21\n",
      "myresults =\n",
      "     1     3     6    10    15    21    28\n",
      "myresults =\n",
      "     1     3     6    10    15    21    28    36\n",
      "myresults =\n",
      "     1     3     6    10    15    21    28    36    45\n",
      "Elapsed time is 0.445246 seconds.\n"
     ]
    }
   ],
   "source": [
    "clear myresults\n",
    "tic\n",
    "myresults(1)=1;\n",
    "for i=2:mysize\n",
    "    myresults(i) = myresults(i-1)+i;\n",
    "    if i<10\n",
    "        myresults\n",
    "    end\n",
    "end\n",
    "toc\n",
    "clear myresults"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ans =\n",
      "     1     3     0     0     0     0     0     0     0     0\n",
      "ans =\n",
      "     1     3     6     0     0     0     0     0     0     0\n",
      "ans =\n",
      "     1     3     6    10     0     0     0     0     0     0\n",
      "ans =\n",
      "     1     3     6    10    15     0     0     0     0     0\n",
      "ans =\n",
      "     1     3     6    10    15    21     0     0     0     0\n",
      "ans =\n",
      "     1     3     6    10    15    21    28     0     0     0\n",
      "ans =\n",
      "     1     3     6    10    15    21    28    36     0     0\n",
      "ans =\n",
      "     1     3     6    10    15    21    28    36    45     0\n",
      "Elapsed time is 0.044021 seconds.\n"
     ]
    }
   ],
   "source": [
    "clear myresults\n",
    "tic\n",
    "myresults = zeros(mysize,1);\n",
    "myresults(1)=1;\n",
    "for i=2:mysize\n",
    "    myresults(i) = myresults(i-1)+i;\n",
    "    if i<10\n",
    "        myresults(1:10)'\n",
    "    end\n",
    "end\n",
    "toc\n",
    "clear myresults"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ans =\n",
      "   0.969016065313065 -15.958886319673882\n"
     ]
    }
   ],
   "source": [
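    "c = 1;\n",
    "xx = logspace(1,7,13);\n",
    "timings = zeros(size(xx));\n",
    "for N=xx\n",
    "    clear myresults % start from an empty array so every timing includes all the resizing\n",
    "    tic\n",
    "    myresults(1)=1;\n",
    "    for i=2:N\n",
    "        myresults(i) = myresults(i-1)+i;\n",
    "    end\n",
    "    timings(c) = toc;\n",
    "    c = c+1;\n",
    "end\n",
    "loglog(xx,timings)\n",
    "% slope of the log-log fit over the largest sizes estimates the scaling exponent\n",
    "polyfit(log(xx(end-5:end)),log(timings(end-5:end)),1)"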
   ]
  },
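  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The second bullet above, memory access patterns, can be sketched the same way. MATLAB stores matrices in column-major order, so walking down a column touches adjacent memory, while walking along a row jumps by a full column at each step. The size `n` and the names `M`, `s_col`, and `s_row` are made up for this illustration, and how large the gap is depends on your machine and MATLAB version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n = 4000;\n",
    "M = rand(n);\n",
    "\n",
    "% Column-wise traversal: the inner loop runs down a column (contiguous memory)\n",
    "tic\n",
    "s_col = 0;\n",
    "for c=1:n\n",
    "    for r=1:n\n",
    "        s_col = s_col + M(r,c);\n",
    "    end\n",
    "end\n",
    "toc\n",
    "\n",
    "% Row-wise traversal: the inner loop runs along a row (stride of n elements)\n",
    "tic\n",
    "s_row = 0;\n",
    "for r=1:n\n",
    "    for c=1:n\n",
    "        s_row = s_row + M(r,c);\n",
    "    end\n",
    "end\n",
    "toc"
   ]
  },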
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Vectorization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "% If time permits..."
   ]
  },
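  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of vectorizing the running-sum loop from the memory section, using the built-in `cumsum`. The names `loopres` and `vecres` are illustrative only, and the timings will vary by machine and MATLAB version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = 43000;\n",
    "\n",
    "% Preallocated loop version\n",
    "tic\n",
    "loopres = zeros(N,1);\n",
    "loopres(1) = 1;\n",
    "for i=2:N\n",
    "    loopres(i) = loopres(i-1) + i;\n",
    "end\n",
    "toc\n",
    "\n",
    "% Vectorized version: one call to an optimized built-in\n",
    "tic\n",
    "vecres = cumsum((1:N)');\n",
    "toc\n",
    "\n",
    "isequal(loopres, vecres) % both approaches should give identical results"
   ]
  },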
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### *a priori* knowledge"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "nums =\n",
      "    63\n",
      "    10\n",
      "     4\n",
      "    33\n",
      "    65\n",
      "    59\n",
      "    68\n",
      "    80\n",
      "    39\n",
      "     8\n"
     ]
    }
   ],
   "source": [
    "nums=randi(100,10,1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "before =\n",
      "    63\n",
      "    73\n",
      "    77\n",
      "   110\n",
      "   175\n",
      "   234\n",
      "   302\n",
      "   382\n",
      "   421\n",
      "   429\n"
     ]
    }
   ],
   "source": [
    "% What if we want to calculate a cumulative sum every time an entry changes?\n",
    "total = zeros(10,1);\n",
    "total(1)=nums(1);\n",
    "for i=2:numel(nums)\n",
    "    total(i) = total(i-1) + nums(i);\n",
    "end\n",
    "before=total"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "changed_index =\n",
      "     5\n",
      "nums =\n",
      "    40\n",
      "    90\n",
      "    54\n",
      "     9\n",
      "    99\n",
      "     2\n",
      "    91\n",
      "    38\n",
      "     2\n",
      "    51\n"
     ]
    }
   ],
   "source": [
    "changed_index = randi(numel(nums))\n",
    "nums(changed_index)=99"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "after =\n",
      "    63\n",
      "    73\n",
      "    77\n",
      "   110\n",
      "   175\n",
      "   234\n",
      "   302\n",
      "   382\n",
      "   421\n",
      "   429\n"
     ]
    }
   ],
   "source": [
    "total(1)=nums(1);\n",
    "for i=2:numel(nums)\n",
    "    total(i) = total(i-1) + nums(i);\n",
    "end\n",
    "after=total"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ans =\n",
      "     0\n",
      "     0\n",
      "     0\n",
      "     0\n",
      "     0\n",
      "     0\n",
      "     0\n",
      "     0\n",
      "    28\n",
      "    28\n"
     ]
    }
   ],
   "source": [
    "after-before"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "% How can we modify this to take advantage of our *a priori* knowledge?\n",
    "% Only totals at or after the changed index can differ, so start there.\n",
    "total = zeros(10,1);\n",
    "total(1)=nums(1);\n",
    "for i=2:numel(nums)\n",
    "    total(i) = total(i-1) + nums(i);\n",
    "end\n",
    "before=total\n",
    "% Before change\n",
    "num_changes = 100;\n",
    "for c=1:num_changes\n",
    "    changed_index = randi(numel(nums));\n",
    "    nums(changed_index)=99;\n",
    "    % After change: reuse the old totals that come before the changed index\n",
    "    total = before;\n",
    "    if changed_index == 1\n",
    "        total(1) = nums(1); % there is no total(0) to build on\n",
    "    end\n",
    "    for i=max(changed_index,2):numel(nums)\n",
    "        total(i) = total(i-1) + nums(i);\n",
    "    end\n",
    "    after=total;\n",
    "    before=after; % the updated totals are the baseline for the next change\n",
    "end"
   ]
  },
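  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another way to use the same *a priori* knowledge, sketched under the assumption that exactly one entry changes at a time: every total from the changed index onward shifts by the same amount, so we can add that difference (`delta`, a name introduced here) in one vectorized step instead of re-adding the entries. The built-in `cumsum` is used only to check the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nums = randi(100,10,1);\n",
    "total = cumsum(nums);                 % totals before the change\n",
    "\n",
    "changed_index = randi(numel(nums));\n",
    "delta = 99 - nums(changed_index);     % how much the changed entry moved\n",
    "nums(changed_index) = 99;\n",
    "\n",
    "% Shift only the totals that include the changed entry\n",
    "total(changed_index:end) = total(changed_index:end) + delta;\n",
    "\n",
    "isequal(total, cumsum(nums))          % check against a full recomputation"
   ]
  },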
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Language strengths/weaknesses"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, ... (each term is the sum of the previous two)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%file myfib.m\n",
    "\n",
    "function f = myfib(n)\n",
    "    if n < 2\n",
    "        f = n;\n",
    "        return\n",
    "    else\n",
    "        f = myfib(n-1) + myfib(n-2);\n",
    "    end\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ans =\n",
      "    55\n",
      "Elapsed time is 0.011366 seconds.\n"
     ]
    }
   ],
   "source": [
    "tic\n",
    "myfib(10)\n",
    "toc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "i =\n",
      "    31\n",
      "i =\n",
      "    32\n",
      "i =\n",
      "    33\n",
      "i =\n",
      "    34\n",
      "i =\n",
      "    35\n",
      "i =\n",
      "    36\n",
      "i =\n",
      "    37\n",
      "i =\n",
      "    38\n",
      "i =\n",
      "    39\n",
      "i =\n",
      "    40\n"
     ]
    }
   ],
   "source": [
    "timer = zeros(40,1);\n",
    "for i=1:40\n",
    "    tic\n",
    "    myfib(i);\n",
    "    timer(i) = toc;\n",
    "    if i>30\n",
    "        i\n",
    "    end\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [],
   "source": [
    "loglog(timer(10:end))"
   ]
  },
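  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The recursive `myfib` recomputes the same values over and over, so its number of function calls grows exponentially with `n`, and we saw earlier that function calls are relatively costly in MATLAB. A loop-based sketch (the array name `f` is illustrative) computes the same values with one addition per term:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n = 40;\n",
    "tic\n",
    "f = zeros(n,1);\n",
    "f(1) = 1;\n",
    "f(2) = 1;\n",
    "for k=3:n\n",
    "    f(k) = f(k-1) + f(k-2);   % reuse the two previous terms instead of recursing\n",
    "end\n",
    "toc\n",
    "f(10)   % 55, the same value as myfib(10)\n",
    "f(n)    % 102334155"
   ]
  },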
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Algorithmic improvements\n",
    "\n",
    "*https://en.wikipedia.org/wiki/Fenwick_tree* <br>\n",
    "\"A flat array of N values can either store the values or the prefix sums. In the first case, computing prefix sums requires linear time; in the second case, updating the array values requires linear time (in both cases, the other operation can be performed in constant time).\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nums=randi(100,10,1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = numel(nums);\n",
    "total = nums;\n",
    "for i=2:N\n",
    "    total(i) = total(i-1) + nums(i);\n",
    "end\n",
    "total"
   ]
  },
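  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Fenwick tree linked above balances the two costs: both updating a value and querying a prefix sum take $\\mathcal{O}(\\log N)$ time. The cell below is a minimal sketch, not a tuned implementation. The names `tree`, `idx`, `q`, and `psum` are introduced here for illustration, and `j - bitand(j,j-1)` isolates the lowest set bit of `j`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nums = randi(100,10,1);\n",
    "N = numel(nums);\n",
    "\n",
    "% Build: add each value into every tree node that covers it\n",
    "tree = zeros(N,1);\n",
    "for k=1:N\n",
    "    j = k;\n",
    "    while j <= N\n",
    "        tree(j) = tree(j) + nums(k);\n",
    "        j = j + (j - bitand(j,j-1));   % move to the next covering node\n",
    "    end\n",
    "end\n",
    "\n",
    "% Point update: set nums(idx) to 99 in O(log N)\n",
    "idx = 5;\n",
    "delta = 99 - nums(idx);\n",
    "nums(idx) = 99;\n",
    "j = idx;\n",
    "while j <= N\n",
    "    tree(j) = tree(j) + delta;\n",
    "    j = j + (j - bitand(j,j-1));\n",
    "end\n",
    "\n",
    "% Prefix-sum query: sum of nums(1:q) in O(log N)\n",
    "q = 8;\n",
    "psum = 0;\n",
    "j = q;\n",
    "while j > 0\n",
    "    psum = psum + tree(j);\n",
    "    j = bitand(j,j-1);                 % drop the lowest set bit\n",
    "end\n",
    "psum == sum(nums(1:q))                 % check against the direct sum"
   ]
  },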
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# **Advanced Vectorization Methods**\n",
    "\n",
    "Sometimes we have two sets of data and need to combine every element of one with every element of the other. Imagine you have a series of partially full boxes and a variety of items you could pack in them, but no box may exceed a maximum weight."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "myboxes = [9, 11, 7, 20, 1, 19];\n",
    "items = [1, 3, 4, 10, 19];\n",
    "max_weight = 20;\n",
    "\n",
    "res = zeros(numel(myboxes),numel(items));\n",
    "c = 1;\n",
    "for i=myboxes\n",
    "    res(c,:)=(i + items);\n",
    "    c=c+1;\n",
    "end\n",
    "res\n",
    "res<=max_weight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "repmat(myboxes',1,numel(items))+repmat(items,numel(myboxes),1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "https://www.mathworks.com/help/matlab/ref/bsxfun.html\n",
    "\n",
    "- Binary\n",
    "- Singleton\n",
    "- Expansion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bsxfun(@plus,myboxes',items)"
   ]
  },
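  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since R2016b, MATLAB applies the same singleton expansion implicitly for arithmetic and relational operators, so the `bsxfun` call can usually be written with plain operators. A short sketch reusing the box example (redefined so the cell runs on its own); `fits` is an illustrative name:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "myboxes = [9, 11, 7, 20, 1, 19];\n",
    "items = [1, 3, 4, 10, 19];\n",
    "max_weight = 20;\n",
    "\n",
    "% A 6x1 column plus a 1x5 row expands to a 6x5 matrix of every box/item pair\n",
    "fits = (myboxes' + items) <= max_weight\n",
    "\n",
    "% Same values as the explicit bsxfun call\n",
    "isequal(myboxes' + items, bsxfun(@plus,myboxes',items))"
   ]
  },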
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another \"advanced\" technique: Use linear algebra when you can (and know how):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
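    "cool = magic(5) % assumed example matrix: the original definition is not shown; any 5x5 matrix works with A below"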
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Can we write a short program using loops to find the \"neighbor sum\" for each element?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Is there a better way?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A=toeplitz([[1 1] zeros(1,5-2)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cool\n",
    "A*cool*A"
   ]
  },
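  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible loop-based answer, as a sketch for checking the matrix product. It assumes \"neighbor sum\" means the sum over each element's 3x3 neighborhood (the element itself included, truncated at the edges), which is what the product with the tridiagonal matrix of ones computes. `X` is a stand-in sample matrix and `loopsum` is an illustrative name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = magic(5);                          % assumed sample matrix\n",
    "n = size(X,1);\n",
    "A = toeplitz([1 1 zeros(1,n-2)]);      % ones on the main and adjacent diagonals\n",
    "\n",
    "loopsum = zeros(n);\n",
    "for r=1:n\n",
    "    for c=1:n\n",
    "        rows = max(r-1,1):min(r+1,n);  % clip the neighborhood at the edges\n",
    "        cols = max(c-1,1):min(c+1,n);\n",
    "        loopsum(r,c) = sum(sum(X(rows,cols)));\n",
    "    end\n",
    "end\n",
    "\n",
    "isequal(loopsum, A*X*A)                    % matches the linear algebra version\n",
    "isequal(loopsum, conv2(X,ones(3),'same'))  % and a 2-D convolution with a 3x3 box"
   ]
  },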
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# **Instrumentation/metrics**\n",
    "Program bottlenecks and code performance will often defy your intuition, so it's important to *empirically measure* what's going on.\n",
    "\n",
    "Consider a simple example: based on what we've learned, which of these is faster?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tic\n",
    "for t=1:100000\n",
    "    total = 0;\n",
    "    for i=1:10000\n",
    "        total = total + i;\n",
    "    end\n",
    "end\n",
    "toc\n",
    "\n",
    "tic\n",
    "for t=1:100000\n",
    "    total = sum(1:10000);\n",
    "end\n",
    "toc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "total"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For non-trivial programs this measurement approach is insufficient. Let's take a look at a \"real\" program and see how we can profile it in MATLAB."
   ]
  },
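  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a hedged starting point: `timeit` gives more reliable timings than a single `tic`/`toc` pair because it calls the function several times, and the `profile` command records where time is spent function by function. The workload below (`work`, an anonymous function made up for this sketch) stands in for a real program."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "work = @() sum(sin(1:1e6));   % hypothetical stand-in workload\n",
    "t = timeit(work)              % typical time in seconds for one call\n",
    "\n",
    "profile on\n",
    "for k=1:20\n",
    "    work();\n",
    "end\n",
    "profile off\n",
    "p = profile('info');          % struct holding the profiling results\n",
    "numel(p.FunctionTable)        % number of functions the profiler recorded\n",
    "% profile viewer              % opens the interactive report in desktop MATLAB"
   ]
  },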
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# **Recommended starting places for algorithms:**\n",
    "\n",
    "https://en.wikipedia.org/wiki/Introduction_to_Algorithms (CLRS)\n",
    "\n",
    "Handbook of Mathematical Functions aka Abramowitz and Stegun:\n",
    "\n",
    "https://en.wikipedia.org/wiki/Abramowitz_and_Stegun\n",
    "and\n",
    "https://dlmf.nist.gov/"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "MATLAB",
   "language": "matlab",
   "name": "imatlab"
  },
  "language_info": {
   "codemirror_mode": "octave",
   "file_extension": ".m",
   "mimetype": "text/x-matlab",
   "name": "matlab",
   "pygments_lexer": "matlab",
   "version": "9.12.0.1884302 (R2022a)"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}