{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Polynomial Chaos: A technique for modeling uncertainty\n", "\n", "### Emily Gorcenski\n", "### @EmilyGorcenski\n", "\n", "[Download talk at emilygorcenski.com/post/infoshare-2022](emilygorcenski.com/post/infoshare-2022)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Content Notes\n", "\n", "- A lot of math, including measure theory and advanced probability theory\n", "- The cool stuff comes at the end\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Polynomial Chaos is a way of representing random variables/random processes in terms of orthogonal polynomials\n", "\n", "- Can be thought of a special case of Kosambi-Karhunen-Loève\n", "- Can equivalently be thought of as PCA over an orthogonal vector space of polynomials" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### KKL Transform\n", "$X_t = \\sum_{i=0}^\\infty Z_i e_i(t)$\n", "\n", "#### PCA\n", "$X = \\sum_{i=0}^N \\langle \\phi_i, X \\rangle \\phi_i$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Polynomial Chaos\n", "$X = \\sum_{i=0}^\\infty X_i \\Phi_i(\\zeta)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Polynomial Chaos lets us represent annoying uncertainty in terms that we can control\n", "\n", "- Allows for acceleration of Monte Carlo simulation\n", " - Shifts the burden from computing stochastic solutions to computing deterministic parameters\n", "- Allows us to handle distributions without convenient forms\n", " - Good for generating synthetic data!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The Cameron-Martin theorem guarantees that the Polynomial Chaos representation converges to an $L^2$ functional in the $L^2$ sense: we can therefore use it to represent a random process belonging to any distribution with a finite second moment (aka finite variance)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Simpler Terms: We can represent random variables of arbitrary distributions using a Polynomial Chaos expansion about a random variable **of our choice**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "There exists a natural relationship between orthogonal polynomials and probability distributions.\n", "\n", "Hermite Polynomials:\n", "\n", "$$ \\langle H_i(\\zeta)\\, H_j(\\zeta) \\rangle = \\int_{-\\infty}^\\infty H_i(\\zeta) H_j(\\zeta) \\color{red}{e^{-\\frac{\\zeta^2}{2}}}\\, d\\zeta = \\color{red}{\\sqrt{2\\pi}} i! \\delta_{ij}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "This looks very Gaussian! We can then use the Polynomial Chaos representation to write **any** finite-variance r.v. $k$ as a series of Hermite polynomials about a Gaussian r.v.\n", "\n", "$$k = \\sum_{i=0}^\\infty k_i H_i(\\zeta)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The $k_i$ are deterministic coefficients. 
We use the Galerkin projection to compute them:\n", "\n", "$$k_i = \\frac{\\langle k H_i(\\zeta) \\rangle}{\\langle H_i^2(\\zeta) \\rangle} = \\frac{1}{\\langle H_i^2 \\rangle} \\int_{-\\infty}^\\infty k H_i(\\zeta) e^{-\\frac{\\zeta^2}{2}}\\, d\\zeta.$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's use the fact that $k$ and $\\zeta$ are fully dependent and perform an inverse sample transform." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Using the CDFs of both random variables, transform them to the same **uniform** random variable supported on $[0,1]$.\n", "\n", "$$u = G(\\zeta) = F(k).$$\n", "\n", "$$k = F^{-1}(u) \\stackrel{\\small{\\textrm{def}}}{=} h(u),$$\n", "$$\\zeta = G^{-1}(u) \\stackrel{\\small{\\textrm{def}}}{=} l(u).$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Under this transform, since $du = \\frac{1}{\\sqrt{2\\pi}} e^{-\\frac{\\zeta^2}{2}}\\, d\\zeta$, the integral becomes:\n", "\n", "$$\\int_{-\\infty}^\\infty k H_i(\\zeta) e^{-\\frac{\\zeta^2}{2}}\\, d\\zeta \\Longrightarrow \\sqrt{2\\pi} \\int_0^1 h(u) H_i(l(u))\\, du$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can then compute the coefficients $k_i$ with a fairly basic integral:\n", "\n", "$$k_i = \\frac{\\sqrt{2\\pi}}{\\langle H_i^2 \\rangle}\\int_0^1 h(u) H_i(l(u))\\, du.$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "What about that inner product? Three options:\n", "\n", "- Use the closed form representation from the orthogonality condition: $ \\langle H_i^2(\\zeta) \\rangle = \\sqrt{2\\pi} i!$;\n", "- Choose a scaling of $\\zeta$ such that the orthogonality condition is just the Kronecker delta;\n", "- Just compute the integral.\n", "\n", "Option 2 is the cleanest, but tedious. So let's go with option 3, using a numerical integrator, e.g. the trapezoidal rule or Gaussian quadrature."
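] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Before the code-heavy part, here is a minimal sketch of the inverse sample transform on its own (this cell and the next are an added aside, not part of the original derivation). It uses `scipy.stats`, and the exponential target is just an illustrative choice." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Minimal sketch (added aside): the inverse sample transform k = F^{-1}(G(zeta)).\n", "# G is the standard normal CDF; F^{-1} is the exponential ppf, chosen only as an example target.\n", "import numpy as np\n", "from scipy.stats import norm, expon\n", "\n", "zeta_demo = np.random.normal(size=10000)\n", "u_demo = norm.cdf(zeta_demo)              # u = G(zeta) is uniform on (0, 1)\n", "k_demo = expon.ppf(u_demo)                # k = F^{-1}(u) follows the target distribution\n", "print(np.mean(k_demo), np.std(k_demo))    # both should be close to 1 for a unit-rate exponential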
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Ok less words more code" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Reminder: we want to compute $k_i$ so that we can assemble a random variable of an arbitrary distribution using Gaussian random variables with zero mean and standard deviation $1$ (aka standard normal):\n", "\n", "$$k = \\sum_{i=0}^\\infty k_i H_i(\\zeta)$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import numpy as np\n", "import numpy.polynomial.hermite_e as H\n", "from scipy.stats import norm, ks_2samp, skew\n", "from matplotlib import pyplot as plt\n", "from IPython.display import display\n", "from sdv.tabular import GaussianCopula, CTGAN, TVAE\n", "import pandas as pd\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def Herm(n):\n", " coeffs = [0] * (n+1)\n", " coeffs[n] = 1\n", " return coeffs\n", "\n", "# compute the inner product of hermite polynomials and return a function\n", "# to evaluate this inner product at a point x\n", "def inner_product(h1, h2):\n", " return lambda x: H.hermeval(x, H.hermemul(h1, h2))\n", "\n", "# Use trapezoid integration to compute the integral of a function f\n", "# over the support [a, b] using n partitions\n", "def trapezoid_int(f, a, b, n=100):\n", " P = [a + i * (b - a) / n for i in range(0, n + 1)]\n", " F = [1 / 2 * np.abs(P[i+1] - P[i]) * (f(P[i+1]) + f(P[i])) for i in range(0, len(P)-1)]\n", " return sum(F)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Define a handful of inverse cumulative distribution functions, from Wikipedia\n", "def unif_icdf(params):\n", " a = params[0]\n", " b = params[1]\n", " return lambda u : u * (b - a) + a\n", "\n", "def expo_icdf(params):\n", " return lambda u : -np.log(1-u)\n", "\n", "def norm_icdf(params):\n", " return lambda u : norm.ppf(u, loc=0, scale=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def approximate_rv_coeffs(P, h):\n", " # initialize lists for output to make syntax more canonical with the math\n", " ki = [0] * P\n", " \n", " # Set up Gauss-Hermite quadrature\n", " m = P**2\n", " x, w = H.hermegauss(m)\n", " \n", " # Compute the coefficients, and also build out k in the same pass\n", " for i in range(0, P):\n", " # compute the inner product with numerical integration\n", " ip = sum([inner_product(Herm(i), Herm(i))(x[idx]) * w[idx] for idx in range(m)])\n", " # compute the integral\n", " integrand = lambda u : h(u) * H.hermeval(norm.ppf(u, loc=0, scale=1), Herm(i))\n", " ki[i] = np.sqrt(2*np.pi) / ip * trapezoid_int(integrand, 0.001, 1-0.001, 1000)\n", " return ki" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def generate_rv(ki, S):\n", " # build out k termwise\n", " return sum([ki[i] * H.hermeval(S, Herm(i)) for i in range(len(ki))])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Generate a bunch of Gaussian random variables to use\n", "N = 5000\n", "S = np.random.normal(loc=0, scale=1, size=N)" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "h = unif_icdf([0,1])\n", "ki_uniform = approximate_rv_coeffs(13, h)\n", "k = generate_rv(ki_uniform, S)\n", "out = plt.hist(k, bins=50)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# what are the mean and standard deviation of a standard uniform distribution\n", "# mean = 0.5\n", "# std = 1 / √12 ~= 0.289\n", "\n", "ki_uniform[:2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "h = expo_icdf([])\n", "ki_expo = approximate_rv_coeffs(9, h)\n", "k_expo = generate_rv(ki_expo, S)\n", "out = plt.hist(k_expo, bins=50)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Must we use Gaussian random variables?\n", "\n", "No! The Wiener-Askey scheme is rad.\n", "\n", "| | Distribution | Family | Support |\n", "|:------------|:------------------|:-----------|:-----------------------|\n", "| Continuous | Gaussian | Hermite | $(-\\infty, \\infty)$ |\n", "| | Uniform | Legendre | $[a,b]$ |\n", "| | Gamma | Laguerre | $[0,\\infty)$ |\n", "| | Beta | Jacobi | $[a,b]$ |\n", "| Discrete | Poisson | Charlier | $\\{0,1,\\ldots\\}$ |\n", "| | Negative Binomial | Miexner | $\\{0,1,\\ldots\\}$ |\n", "| | Binomial | Krawtchouk | $\\{0,1,\\ldots\\, N\\}$ |\n", "| | Hypergeometric | Hahn | $\\{0,1,\\ldots\\, N\\}$ |" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "$$\\langle \\Phi_i(\\zeta) \\Phi_j(\\zeta) \\rangle = \\int_S \\Phi_i(\\zeta) \\Phi_j(\\zeta) \\color{red}{w(\\zeta)\\, d\\zeta}$$\n", "\n", "$$w(\\zeta)\\, d\\zeta = \\frac{d\\mu}{d\\zeta}d\\zeta$$\n", "\n", "Generalize to\n", "\n", "$$\\langle \\Phi_i(\\zeta) \\Phi_j(\\zeta) \\rangle = \\int_S \\Phi_i(\\zeta) \\Phi_j(\\zeta) d\\mu \\Longrightarrow \\sum_{n\\in S} \\Phi_i\\left(\\zeta_n\\right)\\Phi_j\\left(\\zeta_n\\right)w\\left(\\zeta_n\\right)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### So let's have a practical example" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Suppose we want to solve a differential equation:\n", "\n", "$$\\frac{dy(k; t)}{dt} = -ky(k; t).$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# The basic RK4 ODE solver\n", "def RK4(f, ic, tspan, h):\n", " t = [x * h for x in range(int((max(tspan) - min(tspan)) / h) + 1)]\n", " y = [ic]\n", " h2 = h / 2\n", " for i in range(0, len(t)-1):\n", " y1 = f(t[i], y[i])\n", " y2 = f(t[i] + h2, y[i] + np.multiply(h2, y1))\n", " y3 = f(t[i] + h2, y[i] + np.multiply(h2, y2))\n", " y4 = f(t[i] + h, y[i] + np.multiply(h, y3))\n", " y.append(y[i] + np.multiply(h/6, y1 + y4 + np.multiply(2, y2 + y3)))\n", " return t, y" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "k = 1\n", "A = np.matrix([[-k]])\n", "ode = lambda t, x: A * x\n", "ic = np.matrix([1])\n", "t, y = RK4(ode, ic, [0, 10], .1)\n", "y1 = [yt.item() for yt in y]\n", "plt.plot(t, y1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now suppose we let $k$ be a random parameter with 
support $S$, pdf $f(k)$ and mean $\\bar{k}$.\n", "\n", "$$\\bar{y}(t) \\stackrel{?}{=} y_0 e^{-\\bar{k}t}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$\\bar{y}(t) = y_0 \\int_S f(k)e^{-kt}\\, dk$$\n", "\n", "We'll run a Monte Carlo simulation, $N = 1000$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "y_dmc = []\n", "for idx in range(1000):\n", "    k = -np.random.uniform(0,1)\n", "    B = np.matrix([[k]])\n", "    icb = np.matrix([1])\n", "    odeb = lambda t, x: B * x\n", "    t, y = RK4(odeb, icb, [0, 10], .01)\n", "    y_dmc.append([yt.item() for yt in y])\n", "\n", "plt.plot(t, np.mean(y_dmc, axis=0))\n", "plt.plot(t, np.percentile(y_dmc, axis=0, q=95))\n", "plt.plot(t, np.percentile(y_dmc, axis=0, q=5))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Important note! The mean solution is not the same as the solution for the mean of $k$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "plt.plot(t, np.mean(y_dmc, axis=0))\n", "\n", "B = np.matrix([[-0.5]])\n", "icb = np.matrix([1])\n", "odeb = lambda t, x: B * x\n", "t, y = RK4(odeb, icb, [0, 10], .01)\n", "y_detmean = [yt.item() for yt in y]\n", "plt.plot(t, y_detmean, 'r--')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can use Polynomial Chaos to make this easier.\n", "\n", "$$k = \\sum_{i=0}^P k_i \\Phi_i(\\zeta)$$\n", "$$y(t) = \\sum_{i=0}^P y_i(t) \\Phi_i(\\zeta)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "$$\\sum_{i=0}^P \\frac{dy_i}{dt} \\Phi_i(\\zeta) = -\\left(\\sum_{i=0}^P k_i \\Phi_i(\\zeta)\\right)\\left(\\sum_{i=0}^P y_i \\Phi_i(\\zeta)\\right)$$\n", "\n", "Perform the Galerkin projection and gather terms.\n", "\n", "$$\\frac{dy_l(t)}{dt} = -\\frac{1}{\\langle \\Phi_l^2(\\zeta)\\rangle} \\sum_{i=0}^P\\sum_{j=0}^P \\langle \\Phi_i \\Phi_j \\Phi_l \\rangle k_i y_j(t)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- $\\langle \\Phi_i \\Phi_j \\Phi_l \\rangle$: This is easily computed with Gauss-Hermite quadrature\n", "- $k_i$: We've already computed these!\n", "\n", "Result: we have a small system of coupled deterministic linear ODEs for the coefficients $y_l(t)$ to solve!"
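] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To connect this to the code that follows (this bridging step is an addition): collecting the coefficients $y_l(t)$ into a vector $\\mathbf{y}(t)$, the projected system is the constant-coefficient linear ODE\n", "\n", "$$\\frac{d\\mathbf{y}}{dt} = A\\,\\mathbf{y}, \\qquad A_{lj} = -\\frac{1}{\\langle \\Phi_l^2 \\rangle} \\sum_{i} \\langle \\Phi_i \\Phi_j \\Phi_l \\rangle\\, k_i,$$\n", "\n", "which is exactly the matrix assembled in the next cell."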
] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def triple_product(h1, h2, h3):\n", " return lambda x : H.hermeval(x, H.hermemul(h1,\n", " H.hermemul(h2, h3)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "P = 5\n", "A = np.matrix(np.zeros((P,P)))\n", "m = P**3\n", "x, w = H.hermegauss(m)\n", "\n", "for l in range(P):\n", " ip = sum([inner_product(Herm(l), Herm(l))(x[idx]) * w[idx] for idx in range(m)])\n", " for j in range(P):\n", " for i in range(P):\n", " tp = sum([triple_product(Herm(i), Herm(j), Herm(l))(x[idx]) * w[idx] for idx in range(m)])\n", " A[l,j] += - 1 / ip * tp * ki_uniform[i]\n", "\n", "ode = lambda t, x: A * x\n", "ic = np.matrix(np.zeros((P,1)))\n", "ic[0] = 1\n", "t, y = RK4(ode, ic, [0, 10], .01)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "plt.plot(t, np.mean(y_dmc, axis=0), 'b-')\n", "plt.plot(t, np.mean(y_dmc, axis=0)+3*np.var(y_dmc, axis=0), 'g-')\n", "plt.plot(t, np.mean(y_dmc, axis=0)-3*np.var(y_dmc, axis=0), 'r-')\n", "\n", "y_pc_mean = [x[0,0] for x in y]\n", "y_pc_var = [x[1,0]**2 for x in y]\n", "plt.plot(t, y_pc_mean, 'b--')\n", "plt.plot(t, np.add(y_pc_mean, np.multiply(3, y_pc_var)), 'g--')\n", "plt.plot(t, np.add(y_pc_mean, np.multiply(-3, y_pc_var)), 'r--')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# y_pc = []\n", "# for zeta in S[:500]:\n", "# y_pc.append([[H.hermeval(zeta, np.squeeze(x.tolist())) for i in range(P)] for x in y])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# let's generate the mean and confidence intervals for \n", "y_pc = [[[H.hermeval(zeta, np.squeeze(x.tolist())) for i in range(P)] for x in y] for zeta in S[:500]]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "plt.plot(t, np.mean(y_dmc, axis=0), 'b-')\n", "plt.plot(t, np.percentile(y_dmc, axis=0, q=95), 'g-')\n", "plt.plot(t, np.percentile(y_dmc, axis=0, q=5), 'r-')\n", "plt.plot(t,np.mean(y_pc, axis=0), 'b--')\n", "plt.plot(t, np.percentile(y_pc, axis=0, q=95), 'g--')\n", "plt.plot(t, np.percentile(y_pc, axis=0, q=5), 'r--')\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Ok, but solving elementary differential equations isn't so useful. Let's do something a little more useful. How can we model arbitrary distributions?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Generate a bunch of Gaussian random variables to use\n", "N = 5000\n", "weird_function = lambda x : x**2 + np.exp(x)*np.sin(2*x) + 1/6.*np.sqrt(x)\n", "weird = weird_function(np.sort(np.random.uniform(size=N)))\n", "out = plt.hist(weird, bins=50)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def basis(size, index):\n", " arr = np.zeros(size)\n", " arr[index] = 1.0\n", " return arr\n", "\n", "order = 13\n", "gaussian = np.sort(np.random.normal(loc=0, scale=1, size=N))\n", "A = np.array([[H.hermeval(zeta, basis(order, idx)).item() for idx in range(order)] for zeta in gaussian])\n", "m, _, _, _ = np.linalg.lstsq(A, weird, rcond=None)\n", "out = plt.hist(H.hermeval(gaussian, m), bins=50)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Ok, but what about real world data?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "my_data = np.genfromtxt('transactions.csv', delimiter=',')\n", "transactions = np.sort(my_data[1:,2])\n", "out = plt.hist(transactions, bins=50)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "order = 15\n", "N = np.size(transactions)\n", "gaussian = np.sort(np.random.normal(loc=0, scale=1, size=N))\n", "A = np.array([[H.hermeval(zeta, basis(order, idx)).item() for idx in range(order)] for zeta in gaussian])\n", "m, _, _, _ = np.linalg.lstsq(A, transactions, rcond=None)\n", "new_gaussian = np.sort(np.random.normal(loc=0, scale=1, size=N))\n", "out = plt.hist(H.hermeval(new_gaussian, m), bins=50)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "ks_2samp(transactions, H.hermeval(new_gaussian, m), mode=\"exact\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "{\n", " \"sample\": {\n", " \"mean\": np.mean(transactions),\n", " \"std\": np.std(transactions)\n", " },\n", " \"gPC\":{\n", " \"mean\": m[0],\n", " \"std\": m[1]\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "df = pd.read_csv(\"transactions.csv\")\n", "df.drop(columns=['date', 'store_nbr'], inplace=True)\n", "model = GaussianCopula()\n", "model.fit(df)\n", "sdv_data = model.sample(10*N)['transactions']\n", "out = plt.hist(sdv_data, bins=50)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "{\n", " \"sample\": {\n", " \"mean\": np.mean(transactions),\n", " \"std\": np.std(transactions)\n", " },\n", " \"gPC\": {\n", " \"mean\": m[0],\n", " \"std\": m[1]\n", " },\n", " \"sdv\": {\n", " \"mean\": np.mean(sdv_data),\n", " \"std\": np.std(sdv_data)\n", " }\n", "}" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": 
"ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }