using Printf, Statistics, StatsBase, Random, Distributions
include("jlFiles/printmat.jl")
Random.seed!(678) #set the random number generator to this starting point
using Plots
gr(size=(480,320))
default(fmt = :svg)
This exam explores how autocorrelation ought to change how we test statistical hypotheses.
Code a function for simulating $T$ observations from an AR(1) series
$ y_t = (1-\rho)\mu + \rho y_{t-1} + \varepsilon_t \sigma $ where $\varepsilon_t$ is N(0,1).
That is, generate $y_1,y_2,...,y_T$ from this formula.
To make also the starting value ($y_0$) random, simulate $T+100$ data points, but then discard the first 100 values of $y_t$.
Generate a single "sample" using (T,ρ,σ,μ) = (500,0,3,2)
. Calculate and report the average (mean) and the first 5 autocorrelations (hint: autocor()
) of this sample. Redo a 2nd time, but with ρ=0.75
.
function SimAR1(T,ρ,σ,μ)
y = fill(NaN, T + 100)
e = rand(Normal(0, 1), T + 100)
y[1] = 1
for i = 2:T + 100
y[i] = (1-ρ)μ + ρ*y[i-1] + e[i]*σ
end
return y[101:end]
end
y1 = SimAR1(500,0,3,2)
printmat("average from one sample with ρ=0", mean(y1))
printmat("autocorrelations with ρ=0")
printmat(1:5, autocor(y1)[2:6])
y2 = SimAR1(500,0.75,3,2)
printmat("average from one sample with ρ=0.75", mean(y2))
printmat("autocorrelations with ρ=0.75")
printmat(1:5, autocor(y2)[2:6])
Do a Monte Carlo simulation. Use the parameters (T,ρ,σ,μ) = (500,0,3,2)
.
Generate a sample with $T$ observations and calculate the average. Repeat $M=10,000$ times and store the estimated averages in a vector of length $M$. (The rest of the question uses the symbol $\mu_i$ to denote the average from sample $i$.)
What is average $\mu_i$ across the $M$ estimates? (That is, what is $\frac{1}{M}\sum\nolimits_{i=1}^{M}\mu_i$?) Report the result.
What is the standard deviation of $\mu_i$ across the $M$ estimates? Compare with the theoretical standard deviation (see below). Report the result.
Does the distribution of $\mu_i$ look normal? Plot a histogram and compare with the theoretical pdf (see below).
says that the sample average of an iid ("independently and identically distributed") data series is normally distributed with a mean equal to the true (population) mean $\mu$ and a standard deviation equal to $s=\sigma_y/\sqrt{T}$ where $\sigma_y$ is the standard deviation of $y$.
To compare with our simulation results, you could estimate $\sigma_y$ from a single simulation with very many observations (say 10'000).
M = 10_000
(T,ρ1,σ,μ) = (500,0,3,2)
μi1 = fill(NaN, M)
σi1 = fill(NaN, M)
# Monte Carlo simulation
for i = 1:M
y = SimAR1(T, ρ1, σ, μ)
μi1[i] = mean(y)
σi1[i] = std(y)
end
# Theoretical results
y1 = SimAR1(10_000,ρ1,σ,μ)
σy1 = std(y1)
s1 = σy1/sqrt(T)
printmat("Average across the simulations:", mean(μi1))
println("Std across the samples (with ρ=0) and in theory:")
printmat(["simulations", std(μi1)], ["theory", s1])
histogram(μi1,bins=0:0.1:4,normalize=true,legend=false,title="Histogram of 10000 averages with ρ=0")
plot!(μi1->pdf(Normal(μ, s1), μi1))
Redo task 2, but now use ρ=0.75
(the other parameters are unchanged).
M = 10_000
(T,ρ2,σ,μ) = (500,0.75,3,2)
μi2 = fill(NaN, M)
σi2 = fill(NaN, M)
# Monte Carlo simulation
for i = 1:M
y = SimAR1(T, ρ2, σ, μ)
μi2[i] = mean(y)
σi2[i] = std(y)
end
# Theoretical results
y2 = SimAR1(10_000,ρ2,σ,μ)
σy2 = std(y2)
s2 = σy2/sqrt(T)
printmat("Average across the simulations:", mean(μi2))
println("Std across the samples (with ρ=0.75) and in theory:")
printmat(["simulations", std(μi2)], ["theory", s2])
histogram(μi2,bins=0:0.1:4,normalize=true,legend=false,title="Histogram of 10000 averages with ρ=0.75")
plot!(μi2->pdf(Normal(μ, s2), μi2))
You decide to test the hypothesis that $\mu=2$. Your decision rule is
With this decision rule, you are clearly assuming that the theoretical result (definition of $s$) is correct.
Estimate both $\mu_i$ and $\sigma_y$ from each sample.
In what fraction of the $M$ simulation do you reject your hypothesis when $\rho=0$ and when $\rho=0.75$? For the other parameters, use (T,σ,μ) = (500,3,2)
(same as before).
(T,σ,μ) = (500,3,2)
rejection1 = 0
rejection2 = 0
# Count how many times we reject the hypothesis
for i = 1:M
# rejections for ρ = 0
s1 = σi1[i]/sqrt(T)
if abs((μi1[i] - 2)/s1) > 1.645
rejection1 += 1
end
# rejections for ρ = 0.75
s2 = σi2[i]/sqrt(T)
if abs((μi2[i] - 2)/s2) > 1.645
rejection2 += 1
end
end
println("Frequency of rejections:")
printmat(["with ρ=0 ", rejection1 / (M)], ["with ρ=0.75 ", rejection2 / (M)])