**Initial setup**

Assuming $X_1, \dots, X_n \in \mathbb{R}^m$ are iid, sampled from $\mathcal{N}(\mu, V)$, one can define the estimators for the sample mean $\hat{\mu} := \frac{1}{n} X^T 1_n$ and sample covariance $\hat{V} := \frac{1}{n-1}(X - 1_n\hat{\mu}^T)^T(X - 1_n\hat{\mu}^T)$, where $X := (X_i)_{i=1}^{n} \in \mathbb{R}^{n \times m}$. $\hat{\mu}$ and $\hat{V}$ are independent,

$$\hat{\mu} \sim \mathcal{N}\left(\mu, \frac{1}{n}V\right), \hspace{0.2cm} \hat{V} \sim \mathcal{W}\left(\frac{1}{n-1}V, n - 1\right)$$

where $\mathcal{W}(\cdot, \cdot)$ denotes a Wishart distribution.
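As a quick numerical sanity check of the setup above (an illustration of mine, not from any referenced paper), the matrix forms of $\hat{\mu}$ and $\hat{V}$ agree with the standard routines:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 3
mu = np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((m, m))
V = A @ A.T + m * np.eye(m)  # an arbitrary positive-definite covariance

X = rng.multivariate_normal(mu, V, size=n)  # rows are the X_i

# hat{mu} = (1/n) X^T 1_n
mu_hat = X.T @ np.ones(n) / n
# hat{V} = (1/(n-1)) (X - 1_n hat{mu}^T)^T (X - 1_n hat{mu}^T)
Xc = X - np.outer(np.ones(n), mu_hat)
V_hat = Xc.T @ Xc / (n - 1)

# These coincide with NumPy's built-in mean and (unbiased) covariance:
assert np.allclose(mu_hat, X.mean(axis=0))
assert np.allclose(V_hat, np.cov(X, rowvar=False))
```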

**Missing values scenario**

The initial setup applies only when there are no missing values, or when they can be recovered with high probability by some matrix completion procedure. If neither holds, one can use a different unbiased estimator, such as the one constructed in $(1.3)$ of *High-dimensional covariance matrix estimation with missing observations*.

Specifically, they model missing values by defining $Y_1, \dots, Y_n \in \mathbb{R}^m$ as $Y_{ij} := \delta_{ij}X_{ij}$, where the $(\delta_{ij})_{1 \leq i \leq n, 1 \leq j \leq m}$ are Bernoulli random variables with parameter $\delta$ which are independent of $X$. The $\delta_{ij}$ can be interpreted as *masking*, where $\delta_{ij} := 0$ implies that $X_{ij}$ cannot be observed, and hence by convention $Y_{ij} := 0$ (the authors assume mean-zero vectors).

Define $\Sigma_{n}^{\delta} := \frac{1}{n} \sum_{i=1}^{n}Y_i \otimes Y_i$, where $\otimes$ is the usual tensor product, assuming $0$-mean. The authors show that an unbiased estimator of the covariance matrix can be constructed as such (eq. $(1.4)$ in the paper)

$$\tilde{\Sigma}_{n} := (\delta^{-1} - \delta^{-2})\,\mathrm{diag}(\Sigma_{n}^{\delta}) + \delta^{-2}\Sigma_{n}^{\delta}$$
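To see the unbiasedness at work, here is a hedged Monte Carlo sketch (function names and parameters are mine, not the paper's): averaging $\tilde{\Sigma}_n$ over many independent masked data sets should recover the true covariance $V$, assuming mean-zero Gaussian $X_i$ as the authors do.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, delta = 200, 4, 0.7
A = rng.standard_normal((m, m))
V = A @ A.T + np.eye(m)  # true covariance; the mean is zero

def sigma_tilde(Y, delta):
    """Unbiased covariance estimator from masked observations (rows of Y)."""
    S = Y.T @ Y / Y.shape[0]  # Sigma_n^delta = (1/n) sum_i Y_i Y_i^T
    return (delta**-1 - delta**-2) * np.diag(np.diag(S)) + delta**-2 * S

# Average the estimator over many independent data sets.
reps = 2000
acc = np.zeros((m, m))
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(m), V, size=n)
    mask = rng.random((n, m)) < delta  # delta_ij ~ Bernoulli(delta), indep. of X
    acc += sigma_tilde(mask * X, delta)

# The Monte Carlo mean should be close to V, up to simulation noise:
# E[S] has off-diagonal entries delta^2 V_jk and diagonal entries delta V_jj,
# and the (delta^{-1} - delta^{-2}) diagonal correction undoes the mismatch.
assert np.allclose(acc / reps, V, atol=0.3)
```

The intuition is that an off-diagonal entry of $Y_iY_i^T$ survives only when both coordinates are unmasked (probability $\delta^2$), while a diagonal entry needs only one coordinate (probability $\delta$), which is why the diagonal gets its own correction term.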

**Actual question**

The authors of the aforementioned paper do not assume that the $X_i$'s are Gaussian, as in the initial setup. With that in mind, and perhaps some additional restrictions, could something be said about the distribution of the new unbiased estimator $\tilde{\Sigma}_n$? Is anyone aware of any results on this?