## pr.probability – Finding a function that converts distribution A into distribution B

Variable $$x$$ comes from distribution $$p(x)$$, and variable $$y$$ comes from distribution $$q(y)$$.

The objective is to find a function $$f$$ such that $$f(y)=x$$.

Given a set of samples of $$x$$ and a sample $$y$$, how can I find an $$f$$ that converts $$y$$ into the domain of $$x$$?

Or, without an explicit function $$f$$, can I compute the value of $$f(y)$$, i.e. map a value of $$y$$ into the distribution $$p(x)$$?
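One common construction for such an $$f$$ (my suggestion, not stated in the question) is the probability integral transform: $$f = F_x^{-1} \circ F_y$$, where $$F_x, F_y$$ are the CDFs. With only samples available, both can be replaced by their empirical versions. A minimal sketch, with made-up example distributions:

```python
import random
from bisect import bisect_right

random.seed(0)

def fit_transport(x_samples, y_samples):
    """Empirical probability integral transform: u = F_y(y), then f(y) = F_x^{-1}(u)."""
    xs, ys = sorted(x_samples), sorted(y_samples)
    def f(y):
        u = bisect_right(ys, y) / len(ys)        # empirical CDF of y
        k = min(int(u * len(xs)), len(xs) - 1)   # empirical quantile of x
        return xs[k]
    return f

# example: x ~ N(0, 1), y ~ Exp(1); f maps y-values into the x-distribution
x_samples = [random.gauss(0, 1) for _ in range(10_000)]
y_samples = [random.expovariate(1) for _ in range(10_000)]
f = fit_transport(x_samples, y_samples)
mapped = [f(y) for y in y_samples]
```

The mapped values should then be distributed approximately like the $$x$$ samples, without ever writing down $$f$$ in closed form.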

## machine learning – How to choose the probability distribution and its parameters in maximum likelihood estimation

I’m reading the book “Mathematics for Machine Learning” (a free book, available online). I’m on Section 8.3, which explains `maximum likelihood estimation` (MLE).
This is my understanding of how MLE works in machine learning:

Say we have a dataset of vectors $$(x_1, x_2, \dots, x_n)$$ and corresponding labels $$(y_1, y_2, \dots, y_n)$$, which are real numbers; finally, we have a model with parameters $$\theta$$. MLE is a way to find the best parameters $$\theta$$ for the model, so that the model maps $$x_n$$ to $$\hat{y}_n$$ with $$\hat{y}_n$$ as close to $$y_n$$ as possible.

For each $$x_n$$ and $$y_n$$ we have a probability distribution $$p(y_n|x_n,\theta)$$. Basically, it estimates how likely our model with parameters $$\theta$$ is to output $$y_n$$ when we feed it $$x_n$$ (and the bigger the probability, the better).

We then take a logarithm of each of the estimated probabilities and sum up all the logarithms, like this:
$$\sum_{n=1}^N \log p(y_n|x_n,\theta)$$

The bigger this sum the better our model with parameters $$theta$$ explains the data, so we have to maximize the sum.

What I don’t understand is how we choose the probability distribution $$p(y_n|x_n,\theta)$$ and its parameters. In the book there is Example 8.4, where they choose the distribution to be a Gaussian with zero-mean noise, $$\epsilon_n \sim \mathcal{N}(0,\,\sigma^{2})$$. They then assume the linear model $$x_n^T\theta$$ is used for prediction, so:
$$p(y_n|x_n,\theta) = \mathcal{N}(y_n \,|\, x_n^T\theta,\,\sigma^{2})$$
and I don’t understand why they replaced the zero mean with $$x_n^T\theta$$; also, where do we get the variance $$\sigma^{2}$$?

So this is my question: how do we choose the probability distribution and its parameters? In the example above the distribution is Gaussian, but it could be any other distribution, and different distributions have different types and numbers of parameters. Also, as I understand it, each pair $$(x_n, y_n)$$ has its own probability distribution $$p(y_n|x_n,\theta)$$, which complicates the problem even more.
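To make the Gaussian choice concrete, here is a small numeric sketch (my own, with made-up scalar data, not from the book): once we assume $$p(y_n|x_n,\theta)=\mathcal{N}(y_n|x_n\theta,\sigma^2)$$, maximizing the log-likelihood sum over $$\theta$$ is exactly least squares, and $$\sigma^2$$ itself can be estimated by MLE as the mean squared residual.

```python
import random
import math

random.seed(1)

# synthetic data from the assumed model: y_n = x_n * theta_true + noise
theta_true, sigma_true = 2.0, 0.5
xs = [random.uniform(-1, 1) for _ in range(500)]
ys = [theta_true * x + random.gauss(0, sigma_true) for x in xs]

def log_likelihood(theta, sigma):
    # sum over n of log N(y_n | x_n * theta, sigma^2)
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - theta * x) ** 2 / (2 * sigma ** 2)
               for x, y in zip(xs, ys))

# for a Gaussian likelihood, the maximizing theta is the least-squares fit
theta_mle = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
# and the MLE of sigma^2 is the mean squared residual
sigma2_mle = sum((y - theta_mle * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

This is why the book replaces the zero mean with $$x_n^T\theta$$: the noise $$\epsilon_n$$ has mean zero, so $$y_n = x_n^T\theta + \epsilon_n$$ has mean $$x_n^T\theta$$.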

I would really appreciate your help. Note that I’m just learning the math for machine learning and am not very skilled yet. If you need any additional info, please ask in the comments.

Thanks!

## Weight distribution within a box

I have a large box weighing 1920, with an uneven distribution of weight within it.
If I were to divide the large box along horizontal line L, the top section would weigh 600 and the bottom section would weigh 1320.
If I were to divide the large box along vertical line K, the left section would weigh 1560 and the right would weigh 360.

The large box is made up of 256 smaller boxes, each with an integer weight between 0 and 15 inclusive.
No row or column of the large box contains two boxes of the same weight.

If I were to divide the box along both lines L and K, what would the weight of the top left portion be?

Is this question even solvable? I’ve been struggling to analyze an array of numbers that essentially behaves this way, and if I can’t find a way to solve this problem, I’m barking up the wrong tree.

The uneven distribution is what trips me up. In my example I know the answer is supposed to be 408, but I can’t figure out how I could even get there.
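One way to see why this is hard: the split weights along L and K alone do not determine the quadrant weight in general. A toy 2×2 sketch (my own, ignoring the distinct-weights-per-row/column constraint of the actual puzzle) shows two grids with identical split weights but different top-left values:

```python
# two 2x2 grids with identical top/bottom and left/right split weights
A = [[1, 2],
     [3, 4]]
B = [[2, 1],
     [2, 5]]

def splits(g):
    # (top, bottom, left, right) totals for a 2x2 grid
    top = sum(g[0])
    bottom = sum(g[1])
    left = g[0][0] + g[1][0]
    right = g[0][1] + g[1][1]
    return top, bottom, left, right
```

Both grids report the same four split weights, yet the top-left entries differ (1 vs 2), so any solution for the 256-box case must lean on the extra distinctness constraint, not on the marginals alone.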

## reference request – Is there a distribution f(z) that returns non-zero if and only if z is a positive integer

I am looking for a distribution $$f(z)$$ with $$z\in\mathbb{C}$$ that is non-zero only when $$z$$ is a positive integer, namely:
$$f(z)=\begin{cases} 0 & z\notin\mathbb{N} \\ \text{non-zero} & z\in\mathbb{N} \end{cases}$$
Does such a distribution exist? If so, could you give me some examples?
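If distributions on the real line are acceptable (the question asks about $$z\in\mathbb{C}$$, which is a stronger setting), one standard candidate is a sum of Dirac deltas at the positive integers, a one-sided Dirac comb:

```latex
f(z) \;=\; \sum_{n=1}^{\infty} \delta(z - n),
\qquad
\langle f, \varphi \rangle \;=\; \sum_{n=1}^{\infty} \varphi(n)
```

As a tempered distribution this is supported exactly on $$\mathbb{N}$$; the pairing converges for any test function $$\varphi$$ of sufficient decay (e.g. Schwartz class).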

## Flipping a distribution on the Y-axis, i.e. evaluating it at -x instead of x

the title says it all:

I want to flip a distribution on the y-axis (that is, evaluate it at the negative of the input compared to the standard case).

I want to use this to create a composite (spliced) distribution with a fat left tail, but my distribution of choice (the Pareto distribution) is right-facing.
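One way to realize the flip in code (a minimal sketch assuming the standard Pareto with scale $$x_m = 1$$): the density of the mirrored variable $$-X$$ at $$x$$ is simply the original density evaluated at $$-x$$, and sampling is negation.

```python
import random

def pareto_pdf(x, alpha):
    # standard Pareto (x_m = 1): density alpha / x^(alpha+1) for x >= 1
    return alpha / x ** (alpha + 1) if x >= 1 else 0.0

def flipped_pareto_pdf(x, alpha):
    # mirror on the y-axis: evaluate the original density at -x
    return pareto_pdf(-x, alpha)

def flipped_pareto_sample(alpha):
    # sampling the mirrored distribution: draw right-facing, then negate
    return -random.paretovariate(alpha)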

Thank you for the help.

## pr.probability – To prove a relation involving a probability distribution

I’m reading a book and have encountered a relation that seems to me impossible to prove; I would like to be sure whether that is the case. The author gives a probability function as
$$p_n = \frac{e^{-c_1 n - c_2/n}}{Z},$$
where $$c_1$$ and $$c_2$$ are constants, $$Z$$ is a normalization factor, and $$n \geq 3$$. Then, defining $$\alpha = \sum_{n=3}^{\infty} p_n (n - 6)^2$$, the author claims one can show that

$$\alpha + p_6 = 1, \qquad 0.66 < p_6 < 1,$$
$$\alpha\, p_6^2 = \frac{1}{2\pi}, \qquad 0.34 < p_6 < 0.66.$$

How is such a thing possible in the first place, given that these relations do not even depend on $$c_1$$ and $$c_2$$?
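Since an unconditional identity would have to hold for every choice of constants, a quick numeric probe (my own sketch; the constants below are arbitrary, not from the book) makes the claim easy to test: compute $$Z$$, $$p_6$$, and $$\alpha$$ for a couple of $$(c_1, c_2)$$ pairs and compare.

```python
import math

def alpha_and_p6(c1, c2, N=2000):
    # truncate the infinite sum at N; terms decay like e^(-c1 * n), so N=2000 is ample
    w = [math.exp(-c1 * n - c2 / n) for n in range(3, N)]
    Z = sum(w)                                  # normalization factor
    p = [wi / Z for wi in w]                    # p_n for n = 3, ..., N-1
    alpha = sum(pn * (n - 6) ** 2 for n, pn in zip(range(3, N), p))
    return alpha, p[6 - 3]                      # p[3] corresponds to n = 6

# two arbitrary parameter choices; if alpha + p6 = 1 were unconditional,
# it would have to hold for both
a1, p6_1 = alpha_and_p6(0.5, 1.0)
a2, p6_2 = alpha_and_p6(2.0, 5.0)
```

Comparing `a1 + p6_1` and `a2 + p6_2` against 1 for several constants shows directly whether the relation can hold without conditions on $$c_1, c_2$$.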

## Reference for entropy of a binomial distribution

On Wikipedia, the entropy of the binomial distribution $$\mathrm{Binomial}(n,p)$$ is written as
$$\frac{1}{2} \ln\!\left(2 \pi e\, n p (1-p)\right) + O(1/n).$$ Can anyone name a reference for what exactly the $$O(1/n)$$ term is?

## st.statistics – Distribution of unbiased estimator of covariance matrix with missing values

Initial setup

Assuming $$X_1, \dots, X_n \in \mathbb{R}^m$$ are iid, sampled from $$\mathcal{N}(\mu, V)$$, one can define the estimators for the sample mean, $$\hat{\mu} := \frac{1}{n} X^T 1_n$$, and the sample covariance, $$\hat{V} := \frac{1}{n-1} (X - 1_n\hat{\mu}^T)^T(X - 1_n\hat{\mu}^T)$$, where $$X := (X_i)_{i=1}^{n} \in \mathbb{R}^{n \times m}$$. $$\hat{\mu}$$ and $$\hat{V}$$ are independent,

$$\hat{\mu} \sim \mathcal{N}\!\left(\mu, \tfrac{1}{n}V\right), \qquad \hat{V} \sim \mathcal{W}\!\left(\tfrac{1}{n-1}V,\, n - 1\right),$$

where $$\mathcal{W}(\cdot, \cdot)$$ denotes a Wishart distribution.

Missing values scenario

The initial setup only applies when one either has no missing values or can recover them with high probability via some matrix-completion procedure. If neither applies, one can use another unbiased estimator, as constructed in $$(1.3)$$ of High-dimensional covariance matrix estimation with missing observations.

Specifically, they model missing values by defining $$Y_1, \dots, Y_n \in \mathbb{R}^m$$
via $$Y_{ij} := \delta_{ij}X_{ij}$$, where the $$(\delta_{ij})_{1 \leq i \leq n,\, 1 \leq j \leq m}$$ are Bernoulli random variables with parameter $$\delta$$, independent of $$X$$. The $$\delta_{ij}$$ can be interpreted as masking: $$\delta_{ij} = 0$$ implies that $$X_{ij}$$ cannot be observed, and hence by convention $$Y_{ij} := 0$$ (the authors assume zero mean vectors).

Define $$\Sigma_{n}^{\delta} := \frac{1}{n} \sum_{i=1}^{n} Y_i \otimes Y_i$$, where $$\otimes$$ is the usual tensor product, assuming zero mean. The authors show that an unbiased estimator of the covariance matrix can be constructed as follows (eq. $$(1.4)$$ in the paper):

$$\tilde{\Sigma}_{n} := (\delta^{-1} - \delta^{-2})\,\mathrm{diag}(\Sigma_{n}^{\delta}) + \delta^{-2}\Sigma_{n}^{\delta}.$$
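Not an answer to the distributional question, but for intuition, here is a small Monte Carlo sketch (my own, with a hypothetical 2-dimensional Gaussian, $$V_{11}=V_{22}=1$$, $$V_{12}=0.5$$, and $$\delta = 0.7$$) confirming that the corrected estimator is unbiased:

```python
import random
import math

random.seed(0)

def sample_x():
    # draw from N(0, V) with V = [[1, .5], [.5, 1]] via a Cholesky factor
    g1, g2 = random.gauss(0, 1), random.gauss(0, 1)
    return (g1, 0.5 * g1 + math.sqrt(0.75) * g2)

delta, N = 0.7, 200_000
s = [[0.0, 0.0], [0.0, 0.0]]            # accumulates Sigma_n^delta
for _ in range(N):
    x = sample_x()
    # Bernoulli(delta) mask: unobserved entries are set to 0 by convention
    y = tuple(xi if random.random() < delta else 0.0 for xi in x)
    for i in range(2):
        for j in range(2):
            s[i][j] += y[i] * y[j] / N

# unbiased correction, eq. (1.4): (1/d - 1/d^2) diag + (1/d^2) full matrix
est = [[(delta ** -2) * s[i][j]
        + ((delta ** -1) - delta ** -2) * (s[i][j] if i == j else 0.0)
        for j in range(2)] for i in range(2)]
```

With enough samples, `est` recovers $$V$$ entrywise; note the diagonal ends up scaled by $$\delta^{-1}$$ and the off-diagonal by $$\delta^{-2}$$, matching $$\mathbb{E}[Y_{ii}^2] = \delta V_{ii}$$ and $$\mathbb{E}[Y_i Y_j] = \delta^2 V_{ij}$$.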

Actual question

The authors of the aforementioned paper do not assume that the $$X_i$$’s are Gaussian, as in the initial setup. With that in mind, and possibly under some additional restrictions, can something be said about the distribution of the new unbiased estimator $$\tilde{\Sigma}_n$$? Is anyone aware of any results on this?


## Confidence interval for the variance with non-independently drawn samples

A random variable $$X \sim N(\mu, \sigma^2)$$. One day, we draw a sample $$X_1, X_2, \dots, X_n$$ from this population, which gives an estimate of the actual distribution of $$X$$.
What I would like to calculate next is a one-tailed 95% confidence interval for the variance of the sampling distribution of $$X$$ with samples of size $$7$$. However, these samples are not drawn independently. I know that for iid samples $$(n-1)\frac{S^2}{\sigma^2} \sim \chi^{2}(n-1)$$. Is this still valid for non-independently drawn samples?
Supposing it is, with $$n = 7$$ we get $$6\frac{S^2}{\sigma^2} \sim \chi^{2}(6)$$. So my question boils down to which value of $$\sigma$$ to use. Should I use an unbiased estimate or a biased estimate? Or are there other estimates that are practically more useful in such a case?
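For the iid case, the pivot's distribution is easy to confirm by simulation (a sketch with arbitrary $$\mu, \sigma$$; it says nothing about the dependent case, which is the open part of the question):

```python
import random
import statistics

random.seed(0)
mu, sigma = 0.0, 2.0
vals = []
for _ in range(20_000):
    xs = [random.gauss(mu, sigma) for _ in range(7)]
    s2 = statistics.variance(xs)       # unbiased sample variance S^2
    vals.append(6 * s2 / sigma ** 2)   # the pivot (n - 1) S^2 / sigma^2

# chi-square with 6 degrees of freedom has mean 6 and variance 12,
# which the simulated pivot values should match
```

Under dependence, the pivot's mean and variance generally drift away from those of $$\chi^2(6)$$, which is exactly why the iid assumption matters here.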