{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Polynomial Chaos: A technique for modeling uncertainty\n", "\n", "### Emily Gorcenski\n", "### @EmilyGorcenski\n", "\n", "[Download talk at emilygorcenski.com/post/infoshare-2022](emilygorcenski.com/post/infoshare-2022)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Content Notes\n", "\n", "- A lot of math, including measure theory and advanced probability theory\n", "- The cool stuff comes at the end\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Polynomial Chaos is a way of representing random variables/random processes in terms of orthogonal polynomials\n", "\n", "- Can be thought of a special case of Kosambi-Karhunen-Loève\n", "- Can equivalently be thought of as PCA over an orthogonal vector space of polynomials" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### KKL Transform\n", "$X_t = \\sum_{i=0}^\\infty Z_i e_i(t)$\n", "\n", "#### PCA\n", "$X = \\sum_{i=0}^N \\langle \\phi_i, X \\rangle \\phi_i$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Polynomial Chaos\n", "$X = \\sum_{i=0}^\\infty X_i \\Phi_i(\\zeta)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Polynomial Chaos lets us represent annoying uncertainty in terms that we can control\n", "\n", "- Allows for acceleration of Monte Carlo simulation\n", " - Shifts the burden from computing stochastic solutions to computing deterministic parameters\n", "- Allows us to handle distributions without convenient forms\n", " - Good for generating synthetic data!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The Cameron-Martin theorem guarantees that the Polynomial Chaos representation converges to an $L^2$ functional in the $L^2$ sense: we can therefore use it to represent a random process belonging to any distribution with a finite second moment (aka finite variance)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Simpler Terms: We can represent random variables of arbitrary distributions using a Polynomial Chaos expansion about a random variable **of our choice**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "There exists a natural relationship between orthogonal polynomials and probability distributions.\n", "\n", "Hermite Polynomials:\n", "\n", "$$ \\langle H_i(\\zeta)\\, H_j(\\zeta) \\rangle = \\int_{-\\infty}^\\infty H_i(\\zeta) H_j(\\zeta) \\color{red}{e^{-\\frac{\\zeta^2}{2}}}\\, d\\zeta = \\color{red}{\\sqrt{2\\pi}} i! \\delta_{ij}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "This looks very Gaussian! We can then use the Polynomial Chaos representation to write **any** finite-variance r.v. $k$ as a series of Hermite polynomials about a Gaussian r.v.\n", "\n", "$$k = \\sum_{i=0}^\\infty k_i H_i(\\zeta)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The $k_i$ are deterministic coefficients. 
We use the Galerkin projection to compute them:\n", "\n", "$$k_i = \\frac{\\langle k H_i(\\zeta) \\rangle}{\\langle H_i^2(\\zeta) \\rangle} = \\frac{1}{\\langle H_i^2 \\rangle} \\int_{-\\infty}^\\infty k H_i(\\zeta) e^{-\\frac{\\zeta^2}{2}}\\, d\\zeta.$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's use the fact that $k$ and $\\zeta$ are fully dependent and perform an inverse sample transform." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Using the CDFs of both random variables, transform them to the same **uniform** random variable supported on $[0,1]$.\n", "\n", "$$u = G(\\zeta) = F(k).$$\n", "\n", "$$k = F^{-1}(u) \\stackrel{\\small{\\textrm{def}}}{=} h(u),$$\n", "$$\\zeta = G^{-1}(u) \\stackrel{\\small{\\textrm{def}}}{=} l(u).$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Under this transform, since $du = \\frac{1}{\\sqrt{2\\pi}} e^{-\\frac{\\zeta^2}{2}}\\, d\\zeta$, the integral becomes:\n", "\n", "$$\\int_{-\\infty}^\\infty k H_i(\\zeta) e^{-\\frac{\\zeta^2}{2}}\\, d\\zeta \\Longrightarrow \\sqrt{2\\pi} \\int_0^1 h(u) H_i(l(u))\\, du$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can then compute the coefficients $k_i$ with a fairly basic integral:\n", "\n", "$$k_i = \\frac{\\sqrt{2\\pi}}{\\langle H_i^2 \\rangle}\\int_0^1 h(u) H_i(l(u))\\, du.$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "What about that inner product? Three options:\n", "\n", "- Use the closed form representation from the orthogonality condition: $ \\langle H_i^2(\\zeta) \\rangle = \\sqrt{2\\pi} i!$;\n", "- Choose a scaling of $\\zeta$ such that the orthogonality condition is just the Kronecker delta;\n", "- Just compute the integral.\n", "\n", "Option 2 is the cleanest, but tedious. So let's go with option 3, using a numerical integrator, e.g. the trapezoidal rule or Gaussian quadrature."
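] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Before the code-heavy part, here is a minimal sketch of the inverse sample transform on its own (this cell and the next are an added aside, not part of the original derivation). It uses `scipy.stats`, and the exponential target is just an illustrative choice." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Minimal sketch (added aside): the inverse sample transform k = F^{-1}(G(zeta)).\n", "# G is the standard normal CDF; F^{-1} is the exponential ppf, chosen only as an example target.\n", "import numpy as np\n", "from scipy.stats import norm, expon\n", "\n", "zeta_demo = np.random.normal(size=10000)\n", "u_demo = norm.cdf(zeta_demo)              # u = G(zeta) is uniform on (0, 1)\n", "k_demo = expon.ppf(u_demo)                # k = F^{-1}(u) follows the target distribution\n", "print(np.mean(k_demo), np.std(k_demo))    # both should be close to 1 for a unit-rate exponential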
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Ok less words more code" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Reminder: we want to compute $k_i$ so that we can assemble a random variable of an arbitrary distribution using Gaussian random variables with zero mean and standard deviation $1$ (aka standard normal):\n", "\n", "$$k = \\sum_{i=0}^\\infty k_i H_i(\\zeta)$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import numpy as np\n", "import numpy.polynomial.hermite_e as H\n", "from scipy.stats import norm, ks_2samp, skew\n", "from matplotlib import pyplot as plt\n", "from IPython.display import display\n", "from sdv.tabular import GaussianCopula, CTGAN, TVAE\n", "import pandas as pd\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def Herm(n):\n", " coeffs = [0] * (n+1)\n", " coeffs[n] = 1\n", " return coeffs\n", "\n", "# compute the inner product of hermite polynomials and return a function\n", "# to evaluate this inner product at a point x\n", "def inner_product(h1, h2):\n", " return lambda x: H.hermeval(x, H.hermemul(h1, h2))\n", "\n", "# Use trapezoid integration to compute the integral of a function f\n", "# over the support [a, b] using n partitions\n", "def trapezoid_int(f, a, b, n=100):\n", " P = [a + i * (b - a) / n for i in range(0, n + 1)]\n", " F = [1 / 2 * np.abs(P[i+1] - P[i]) * (f(P[i+1]) + f(P[i])) for i in range(0, len(P)-1)]\n", " return sum(F)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Define a handful of inverse cumulative distribution functions, from Wikipedia\n", "def unif_icdf(params):\n", " a = params[0]\n", " b = params[1]\n", " return lambda u : u * (b - a) + a\n", "\n", "def expo_icdf(params):\n", " return lambda u : -np.log(1-u)\n", "\n", "def norm_icdf(params):\n", " return lambda u : norm.ppf(u, loc=0, scale=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def approximate_rv_coeffs(P, h):\n", " # initialize lists for output to make syntax more canonical with the math\n", " ki = [0] * P\n", " \n", " # Set up Gauss-Hermite quadrature\n", " m = P**2\n", " x, w = H.hermegauss(m)\n", " \n", " # Compute the coefficients, and also build out k in the same pass\n", " for i in range(0, P):\n", " # compute the inner product with numerical integration\n", " ip = sum([inner_product(Herm(i), Herm(i))(x[idx]) * w[idx] for idx in range(m)])\n", " # compute the integral\n", " integrand = lambda u : h(u) * H.hermeval(norm.ppf(u, loc=0, scale=1), Herm(i))\n", " ki[i] = np.sqrt(2*np.pi) / ip * trapezoid_int(integrand, 0.001, 1-0.001, 1000)\n", " return ki" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def generate_rv(ki, S):\n", " # build out k termwise\n", " return sum([ki[i] * H.hermeval(S, Herm(i)) for i in range(len(ki))])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Generate a bunch of Gaussian random variables to use\n", "N = 5000\n", "S = np.random.normal(loc=0, scale=1, size=N)" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "h = unif_icdf([0,1])\n", "ki_uniform = approximate_rv_coeffs(13, h)\n", "k = generate_rv(ki_uniform, S)\n", "out = plt.hist(k, bins=50)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# what are the mean and standard deviation of a standard uniform distribution\n", "# mean = 0.5\n", "# std = 1 / √12 ~= 0.289\n", "\n", "ki_uniform[:2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "h = expo_icdf([])\n", "ki_expo = approximate_rv_coeffs(9, h)\n", "k_expo = generate_rv(ki_expo, S)\n", "out = plt.hist(k_expo, bins=50)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Must we use Gaussian random variables?\n", "\n", "No! The Wiener-Askey scheme is rad.\n", "\n", "| | Distribution | Family | Support |\n", "|:------------|:------------------|:-----------|:-----------------------|\n", "| Continuous | Gaussian | Hermite | $(-\\infty, \\infty)$ |\n", "| | Uniform | Legendre | $[a,b]$ |\n", "| | Gamma | Laguerre | $[0,\\infty)$ |\n", "| | Beta | Jacobi | $[a,b]$ |\n", "| Discrete | Poisson | Charlier | $\\{0,1,\\ldots\\}$ |\n", "| | Negative Binomial | Miexner | $\\{0,1,\\ldots\\}$ |\n", "| | Binomial | Krawtchouk | $\\{0,1,\\ldots\\, N\\}$ |\n", "| | Hypergeometric | Hahn | $\\{0,1,\\ldots\\, N\\}$ |" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "$$\\langle \\Phi_i(\\zeta) \\Phi_j(\\zeta) \\rangle = \\int_S \\Phi_i(\\zeta) \\Phi_j(\\zeta) \\color{red}{w(\\zeta)\\, d\\zeta}$$\n", "\n", "$$w(\\zeta)\\, d\\zeta = \\frac{d\\mu}{d\\zeta}d\\zeta$$\n", "\n", "Generalize to\n", "\n", "$$\\langle \\Phi_i(\\zeta) \\Phi_j(\\zeta) \\rangle = \\int_S \\Phi_i(\\zeta) \\Phi_j(\\zeta) d\\mu \\Longrightarrow \\sum_{n\\in S} \\Phi_i\\left(\\zeta_n\\right)\\Phi_j\\left(\\zeta_n\\right)w\\left(\\zeta_n\\right)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### So let's have a practical example" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Suppose we want to solve a differential equation:\n", "\n", "$$\\frac{dy(k; t)}{dt} = -ky(k; t).$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# The basic RK4 ODE solver\n", "def RK4(f, ic, tspan, h):\n", " t = [x * h for x in range(int((max(tspan) - min(tspan)) / h) + 1)]\n", " y = [ic]\n", " h2 = h / 2\n", " for i in range(0, len(t)-1):\n", " y1 = f(t[i], y[i])\n", " y2 = f(t[i] + h2, y[i] + np.multiply(h2, y1))\n", " y3 = f(t[i] + h2, y[i] + np.multiply(h2, y2))\n", " y4 = f(t[i] + h, y[i] + np.multiply(h, y3))\n", " y.append(y[i] + np.multiply(h/6, y1 + y4 + np.multiply(2, y2 + y3)))\n", " return t, y" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "k = 1\n", "A = np.matrix([[-k]])\n", "ode = lambda t, x: A * x\n", "ic = np.matrix([1])\n", "t, y = RK4(ode, ic, [0, 10], .1)\n", "y1 = [yt.item() for yt in y]\n", "plt.plot(t, y1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now suppose we let $k$ be a random parameter with 
support $S$, pdf $f(k)$ and mean $\\bar{k}$.\n", "\n", "$$\\bar{y}(t) \\stackrel{?}{=} y_0 e^{-\\bar{k}t}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$\\bar{y}(t) = y_0 \\int_S f(k)e^{-kt}\\, dk$$\n", "\n", "We'll run a Monte Carlo simulation, $N = 1000$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "y_dmc = []\n", "for idx in range(1000):\n", "    k = -np.random.uniform(0,1)\n", "    B = np.matrix([[k]])\n", "    icb = np.matrix([1])\n", "    odeb = lambda t, x: B * x\n", "    t, y = RK4(odeb, icb, [0, 10], .01)\n", "    y_dmc.append([yt.item() for yt in y])\n", "\n", "plt.plot(t, np.mean(y_dmc, axis=0))\n", "plt.plot(t, np.percentile(y_dmc, axis=0, q=95))\n", "plt.plot(t, np.percentile(y_dmc, axis=0, q=5))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Important note! The mean solution is not the same as the solution for the mean of $k$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "plt.plot(t, np.mean(y_dmc, axis=0))\n", "\n", "B = np.matrix([[-0.5]])\n", "icb = np.matrix([1])\n", "odeb = lambda t, x: B * x\n", "t, y = RK4(odeb, icb, [0, 10], .01)\n", "y_detmean = [yt.item() for yt in y]\n", "plt.plot(t, y_detmean, 'r--')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can use Polynomial Chaos to make this easier.\n", "\n", "$$k = \\sum_{i=0}^P k_i \\Phi_i(\\zeta)$$\n", "$$y(t) = \\sum_{i=0}^P y_i(t) \\Phi_i(\\zeta)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "$$\\sum_{i=0}^P \\frac{dy_i}{dt} \\Phi_i(\\zeta) = -\\left(\\sum_{i=0}^P k_i \\Phi_i(\\zeta)\\right)\\left(\\sum_{i=0}^P y_i \\Phi_i(\\zeta)\\right)$$\n", "\n", "Perform the Galerkin projection and gather terms.\n", "\n", "$$\\frac{dy_l(t)}{dt} = -\\frac{1}{\\langle \\Phi_l^2(\\zeta)\\rangle} \\sum_{i=0}^P\\sum_{j=0}^P \\langle \\Phi_i \\Phi_j \\Phi_l \\rangle k_i y_j(t)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- $\\langle \\Phi_i \\Phi_j \\Phi_l \\rangle$: This is easily computed with Gauss-Hermite quadrature\n", "- $k_i$: We've already computed these!\n", "\n", "Result: we have a small system of coupled deterministic linear ODEs for the coefficients $y_l(t)$ to solve!"
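] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To connect this to the code that follows (this bridging step is an addition): collecting the coefficients $y_l(t)$ into a vector $\\mathbf{y}(t)$, the projected system is the constant-coefficient linear ODE\n", "\n", "$$\\frac{d\\mathbf{y}}{dt} = A\\,\\mathbf{y}, \\qquad A_{lj} = -\\frac{1}{\\langle \\Phi_l^2 \\rangle} \\sum_{i} \\langle \\Phi_i \\Phi_j \\Phi_l \\rangle\\, k_i,$$\n", "\n", "which is exactly the matrix assembled in the next cell."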
] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def triple_product(h1, h2, h3):\n", " return lambda x : H.hermeval(x, H.hermemul(h1,\n", " H.hermemul(h2, h3)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "P = 5\n", "A = np.matrix(np.zeros((P,P)))\n", "m = P**3\n", "x, w = H.hermegauss(m)\n", "\n", "for l in range(P):\n", " ip = sum([inner_product(Herm(l), Herm(l))(x[idx]) * w[idx] for idx in range(m)])\n", " for j in range(P):\n", " for i in range(P):\n", " tp = sum([triple_product(Herm(i), Herm(j), Herm(l))(x[idx]) * w[idx] for idx in range(m)])\n", " A[l,j] += - 1 / ip * tp * ki_uniform[i]\n", "\n", "ode = lambda t, x: A * x\n", "ic = np.matrix(np.zeros((P,1)))\n", "ic[0] = 1\n", "t, y = RK4(ode, ic, [0, 10], .01)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "plt.plot(t, np.mean(y_dmc, axis=0), 'b-')\n", "plt.plot(t, np.mean(y_dmc, axis=0)+3*np.var(y_dmc, axis=0), 'g-')\n", "plt.plot(t, np.mean(y_dmc, axis=0)-3*np.var(y_dmc, axis=0), 'r-')\n", "\n", "y_pc_mean = [x[0,0] for x in y]\n", "y_pc_var = [x[1,0]**2 for x in y]\n", "plt.plot(t, y_pc_mean, 'b--')\n", "plt.plot(t, np.add(y_pc_mean, np.multiply(3, y_pc_var)), 'g--')\n", "plt.plot(t, np.add(y_pc_mean, np.multiply(-3, y_pc_var)), 'r--')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# y_pc = []\n", "# for zeta in S[:500]:\n", "# y_pc.append([[H.hermeval(zeta, np.squeeze(x.tolist())) for i in range(P)] for x in y])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# let's generate the mean and confidence intervals for \n", "y_pc = [[[H.hermeval(zeta, np.squeeze(x.tolist())) for i in range(P)] for x in y] for zeta in S[:500]]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "plt.plot(t, np.mean(y_dmc, axis=0), 'b-')\n", "plt.plot(t, np.percentile(y_dmc, axis=0, q=95), 'g-')\n", "plt.plot(t, np.percentile(y_dmc, axis=0, q=5), 'r-')\n", "plt.plot(t,np.mean(y_pc, axis=0), 'b--')\n", "plt.plot(t, np.percentile(y_pc, axis=0, q=95), 'g--')\n", "plt.plot(t, np.percentile(y_pc, axis=0, q=5), 'r--')\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Ok, but solving elementary differential equations isn't so useful. Let's do something a little more useful. How can we model arbitrary distributions?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Generate a bunch of Gaussian random variables to use\n", "N = 5000\n", "weird_function = lambda x : x**2 + np.exp(x)*np.sin(2*x) + 1/6.*np.sqrt(x)\n", "weird = weird_function(np.sort(np.random.uniform(size=N)))\n", "out = plt.hist(weird, bins=50)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def basis(size, index):\n", " arr = np.zeros(size)\n", " arr[index] = 1.0\n", " return arr\n", "\n", "order = 13\n", "gaussian = np.sort(np.random.normal(loc=0, scale=1, size=N))\n", "A = np.array([[H.hermeval(zeta, basis(order, idx)).item() for idx in range(order)] for zeta in gaussian])\n", "m, _, _, _ = np.linalg.lstsq(A, weird, rcond=None)\n", "out = plt.hist(H.hermeval(gaussian, m), bins=50)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Ok, but what about real world data?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "my_data = np.genfromtxt('transactions.csv', delimiter=',')\n", "transactions = np.sort(my_data[1:,2])\n", "out = plt.hist(transactions, bins=50)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "order = 15\n", "N = np.size(transactions)\n", "gaussian = np.sort(np.random.normal(loc=0, scale=1, size=N))\n", "A = np.array([[H.hermeval(zeta, basis(order, idx)).item() for idx in range(order)] for zeta in gaussian])\n", "m, _, _, _ = np.linalg.lstsq(A, transactions, rcond=None)\n", "new_gaussian = np.sort(np.random.normal(loc=0, scale=1, size=N))\n", "out = plt.hist(H.hermeval(new_gaussian, m), bins=50)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "ks_2samp(transactions, H.hermeval(new_gaussian, m), mode=\"exact\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "{\n", " \"sample\": {\n", " \"mean\": np.mean(transactions),\n", " \"std\": np.std(transactions)\n", " },\n", " \"gPC\":{\n", " \"mean\": m[0],\n", " \"std\": m[1]\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "df = pd.read_csv(\"transactions.csv\")\n", "df.drop(columns=['date', 'store_nbr'], inplace=True)\n", "model = GaussianCopula()\n", "model.fit(df)\n", "sdv_data = model.sample(10*N)['transactions']\n", "out = plt.hist(sdv_data, bins=50)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "{\n", " \"sample\": {\n", " \"mean\": np.mean(transactions),\n", " \"std\": np.std(transactions)\n", " },\n", " \"gPC\": {\n", " \"mean\": m[0],\n", " \"std\": m[1]\n", " },\n", " \"sdv\": {\n", " \"mean\": np.mean(sdv_data),\n", " \"std\": np.std(sdv_data)\n", " }\n", "}" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": 
"ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }