Lecture 20: Random Variables, Sampling Variability

  1. The model should fit our training data well
  2. The new athletes

Random Variables {21, 21} -> 21

  • X is a function

    • argument is a sample: element of the domain
    • returns number: element of the range
  • Random Variables: X_1, X_2, X_3

  • P(X=x)

    • X: random Variable: function
    • x: what the function may return: number
    • chance X returns x

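  • discrete: add up the probabilities of the values in the range of interest
  • in a discrete distribution we can also think of probability as area (the bars of a probability histogram)
    • P(a <= X <= b)

  • in a continuous distribution, P(a <= X <= b) is the area under the density curve between a and b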
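
The idea of a random variable as a function on samples, and of adding up probabilities to get P(X = x) and P(a <= X <= b), can be sketched in a few lines. The two-dice sample space and the helper names `prob_X_equals` and `prob_X_between` are assumptions made for illustration; they are not from the lecture.

```python
from fractions import Fraction
from itertools import product

# Assumed example: all 36 equally likely outcomes of two fair dice.
samples = list(product(range(1, 7), repeat=2))
P = {s: Fraction(1, 36) for s in samples}

def X(s):
    """Random variable: takes a sample, returns a number (the sum of the dice)."""
    return s[0] + s[1]

def prob_X_equals(x):
    """P(X = x): add up P(s) over all samples s with X(s) == x."""
    return sum(P[s] for s in samples if X(s) == x)

def prob_X_between(a, b):
    """P(a <= X <= b): add up P(X = x) for the integer values x from a to b."""
    return sum(prob_X_equals(x) for x in range(a, b + 1))

# The probabilities of all possible values of X add up to 1.
total = sum(prob_X_equals(x) for x in range(2, 13))

print(prob_X_equals(7))       # 1/6
print(prob_X_between(6, 8))   # 4/9
print(total)                  # 1
```

Exact fractions are used so that the sums can be compared without rounding error.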

  • Bernoulli(p)
    • indicator variable I has value 1 if event happens and 0 if not
    • P(I = 1) = p
    • P(I = 0) = 1-p
  • Binomial(n, p) $$ P(X = k) ~ = ~ \binom{n}{k} p^k(1-p)^{n-k}, ~~~~ 0 \le k \le n $$
# with scipy
from scipy import stats

# chance of 50 heads in 100 tosses of a fair coin
stats.binom.pmf(50, 100, 0.5)
  • Uniform
unif_density = stats.uniform.pdf(x)    # uniform (0, 1) density
  • Normal $$ f(x) ~ = ~ \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{1}{2}\big{(}\frac{x-\mu}{\sigma}\big{)}^2}, ~~~ -\infty < x < \infty $$
norm_density = stats.norm.pdf(x, 50, 5)      # normal density with mean 50 and SD 5

  • there is no elementary closed form for the area under the normal curve (the normal CDF), so it is computed numerically
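
As a cross-check on the formulas above, here is a minimal sketch that evaluates them with only the Python standard library. The function names `binom_pmf`, `normal_pdf`, and `normal_cdf` are hypothetical helpers, not scipy's API. The normal area is written in terms of the error function `math.erf`, which is itself evaluated numerically.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), straight from the formula."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma):
    """Normal density f(x) with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal X, expressed through the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Chance of 50 heads in 100 tosses of a fair coin (about 0.0796).
p_50_heads = binom_pmf(50, 100, 0.5)

# Density at the mean for mean 50, SD 5: 1 / (5 * sqrt(2 * pi)).
density_at_mean = normal_pdf(50, 50, 5)

# Area within one SD of the mean (about 0.68).
within_one_sd = normal_cdf(55, 50, 5) - normal_cdf(45, 50, 5)

print(p_50_heads, density_at_mean, within_one_sd)
```

With scipy available, `stats.binom.pmf`, `stats.norm.pdf`, and `stats.norm.cdf` should give the same values up to floating-point error.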

Expectation

  • weighted average of possible values
  • weights: probabilities

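Two equivalent ways to compute it:

  • one sample at a time: E[X] = sum over all samples s of X(s) * P(s)
    • needs the samples, with P(s) and X(s) for each
  • one value at a time: E[X] = sum over all x of x * P(X=x)
    • needs only the distribution P(X=x) instead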
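
Both ways of computing the expectation can be checked against each other on a small case. This is a sketch under an assumed example (three tosses of a fair coin, X = number of heads); the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumed example: three tosses of a fair coin, 8 equally likely samples.
samples = list(product("HT", repeat=3))
P = {s: Fraction(1, 8) for s in samples}

def X(s):
    """Number of heads in the sample s."""
    return s.count("H")

# One sample at a time: sum over all samples of X(s) * P(s).
e_by_samples = sum(X(s) * P[s] for s in samples)

# One value at a time: first collect the distribution P(X = x) ...
dist = {}
for s in samples:
    dist[X(s)] = dist.get(X(s), 0) + P[s]

# ... then sum over all x of x * P(X = x).
e_by_values = sum(x * p for x, p in dist.items())

print(e_by_samples, e_by_values)   # 3/2 3/2
```

The second form touches only four values (0, 1, 2, 3) instead of eight samples, which is why the distribution is usually the more convenient starting point.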

Properties

  • E[X+Y] = E[X] + E[Y]
  • S = X + Y
  • S(s) = X(s) + Y(s)
  • S(s) * P(s) = X(s)P(s) + Y(s)P(s)
  • sum both sides over all samples s: sum of S(s)P(s) = sum of X(s)P(s) + sum of Y(s)P(s), which is E[S] = E[X] + E[Y]

  • example with E[X] = 2 and E[X^2] = 13:
    • E[X-5] = E[X] - E[5] = 2 - 5 = -3
    • E[(X-5)(X-5)] = E[X^2] - E[10X] + E[25] = 13 - 20 + 25 = 18
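
The arithmetic in this example can be verified on a concrete distribution. The distribution below is hypothetical: it is chosen only because it has E[X] = 2 and E[X^2] = 13, the values the example's arithmetic uses. The helper `expect` is likewise an illustrative name.

```python
from fractions import Fraction

# Hypothetical distribution chosen so that E[X] = 2 and E[X^2] = 13:
# X is -1 or 5, each with chance 1/2.
dist = {-1: Fraction(1, 2), 5: Fraction(1, 2)}

def expect(g):
    """E[g(X)] = sum over all x of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())

e_x = expect(lambda x: x)                 # 2
e_x2 = expect(lambda x: x**2)             # 13
e_shift = expect(lambda x: x - 5)         # -3
e_sq = expect(lambda x: (x - 5) ** 2)     # 18

# Linearity gives the same answers from E[X] and E[X^2] alone.
print(e_shift == e_x - 5)                 # True
print(e_sq == e_x2 - 10 * e_x + 25)       # True
```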

Variance and SD

  • Var[X] = E[(X - E[X])^2]

  • pull out the term

  • D_S = S - mu_S

  • D_S = D_X + D_Y

  • Var[S] = E[D_S^2] = Var[X] + Var[Y] + 2E[D_X D_Y]

  • \( E(D_X D_Y) = E((X - \mu_X)(Y - \mu_Y)) \)

  • covariance

  • Var[S] = Var[X+Y] = Var[X] + Var[Y] only if the covariance is zero, which holds when X and Y are independent
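
The role of the covariance term can be checked numerically. This is a sketch under an assumed example (two fair dice); `E`, `var`, and `cov` are hypothetical helpers that sum over the sample space, and `M` (the larger of the two dice) is introduced only to get a variable that depends on X.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair dice, 36 equally likely samples.
samples = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)

def E(f):
    """Expectation of f: sum over all samples of f(s) * P(s)."""
    return sum(f(s) * P for s in samples)

def var(f):
    """Var[f] = E[(f - E[f])^2]."""
    mu = E(f)
    return E(lambda s: (f(s) - mu) ** 2)

def cov(f, g):
    """Covariance: E[(f - mu_f)(g - mu_g)]."""
    mu_f, mu_g = E(f), E(g)
    return E(lambda s: (f(s) - mu_f) * (g(s) - mu_g))

X = lambda s: s[0]      # first die
Y = lambda s: s[1]      # second die, independent of X
M = lambda s: max(s)    # larger of the two dice, depends on X

# Independent case: the covariance is 0, so the variances add.
indep_ok = cov(X, Y) == 0 and var(lambda s: X(s) + Y(s)) == var(X) + var(Y)

# Dependent case: the covariance is not 0 and the extra term is needed.
S = lambda s: X(s) + M(s)
dep_ok = cov(X, M) != 0 and var(S) == var(X) + var(M) + 2 * cov(X, M)

print(var(X), indep_ok, dep_ok)   # 35/12 True True
```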


Random Variable

  • A random variable is a function mapping outcomes (samples) to real numbers
    • \( X: \Omega \to \mathbb{R} \)