## st.statistics – Infer the policy in a Markov decision process (statistical question)

I am dealing with the following situation: in a Markov decision process (MDP), each state has a finite number of possible actions, and each action induces a discrete probability distribution over the successor states. We define a policy that chooses one of these actions for each state (that is, the policy is stationary and cannot contain probabilistic mixtures of different actions). We would like to choose this policy so that a friendly agent who knows the underlying MDP, including the actions available in each state, and who can observe the transitions of the induced Markov chain, can guess with high probability which action we chose in each state. My question is this: given a (probably more than sufficient) set of observations generated by the induced Markov chain, how should the friendly agent guess the action taken in each state? In other words, given a finite number of discrete probability distributions and a set of observations, how do we select the distribution from which the data most likely originated? I realize this is probably not a hard question, but I have no experience in this field.
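The standard answer to "which of finitely many distributions generated this data?" is maximum likelihood: score each candidate distribution by the log-likelihood of the observed successors and pick the best. A minimal Ruby sketch, with made-up distributions and observations (the names `a1`, `s1`, etc. are hypothetical, not from the question):

```ruby
# Maximum-likelihood selection among a finite set of discrete distributions.
# Each action maps to a successor-state distribution; all numbers are
# illustrative only.
ACTIONS = {
  a1: { s1: 0.7, s2: 0.3 },          # successor distribution under action a1
  a2: { s1: 0.2, s2: 0.5, s3: 0.3 }, # successor distribution under action a2
}

# Observed successor states for one particular state of the chain.
OBSERVATIONS = [:s1, :s1, :s2, :s1, :s2]

def most_likely_action(actions, observations)
  actions.max_by do |_name, dist|
    # Log-likelihood of the observations under this distribution;
    # a successor with probability 0 rules the action out (-Infinity).
    observations.sum { |s| Math.log(dist.fetch(s, 0.0)) }
  end.first
end

puts most_likely_action(ACTIONS, OBSERVATIONS) # => a1
```

With more observations per state the log-likelihood gap grows, so the guess becomes reliable; this is exactly the classification the friendly agent would perform for each state independently.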

## stats – Using statistical analysis and tests on user data

I have a master's degree in HCI. As a graduate student, I had to take statistics courses at the university. In these courses, we were introduced to tests such as t-tests, and we had to run them on participant datasets (for example, datasets of user performance on prototypes) to compare different prototypes.

My question is: are these tests actually used in the real projects you have encountered so far in your career?

## Statistical inference – A basic question about a randomized test involving Type I error.

I have a fundamental question in the context of statistical hypothesis testing, specifically randomized tests. Suppose that I have two actions (alternatives) concerning a certain unknown parameter $$\theta \in \Theta$$: the null hypothesis ($$H_0$$) and the alternative hypothesis ($$H_1$$).

In this case, the sample space is $$(0,15) \subset \mathbb{R}$$. We know that the critical function is given by
$$\phi(x) = P(\text{reject } H_0 \mid x \text{ observed})$$

I do not know exactly whether this definition really involves a conditional probability. Suppose I have the following critical function:

$$\phi(x) = \begin{cases} 0, & x \in (0,2) \\ p, & x \in (2,10) \\ 1, & x \in (10,15) \end{cases}$$

I can understand why

$$P(\text{reject } H_0 \mid H_0 \text{ is true}) = 0 \cdot P(x \in (0,2)) + p \cdot P(x \in (2,10)) + 1 \cdot P(x \in (10,15))$$

The right-hand side looks very much like an expectation, but I cannot understand why.
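For concreteness, the right-hand side can be evaluated numerically. The sketch below assumes, purely for illustration, that $$X$$ is uniform on $$(0,15)$$ under $$H_0$$ and that $$p = 0.5$$; neither value is given in the question.

```ruby
# Evaluating the Type I error of the randomized test phi, assuming
# (purely for illustration) a uniform null on (0, 15) and p = 0.5.
P_RANDOMIZE = 0.5

# P(X in (a, b)) under the assumed uniform null distribution on (0, 15)
def prob(a, b)
  (b - a) / 15.0
end

# Term-by-term evaluation of the formula from the question.
ALPHA = 0 * prob(0, 2) + P_RANDOMIZE * prob(2, 10) + 1 * prob(10, 15)

puts ALPHA # 0.6
```

Note that the sum has the form "value of $$\phi$$ times probability of that region", summed over the regions, which is precisely the weighted average $$E_{H_0}[\phi(X)]$$.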

## beginner – A function that calculates the average daily work time in different months

This is a task from a Ruby course in which I am currently enrolled:
Ruby Race – Page

It is precisely one of the tasks of the first week.

The given data structure looks like this:

```ruby
[ { work: "item 1", date: "2017-04-26", time: 20 },
  { work: "item 2", date: "2017-04-27", time: 27 },
  ... ]
```

You must write a function that calculates the average daily work time for the different months.
That means: the average daily working time in April, the average in May, in June, etc.

A data structure to work on was given, along with the expected result for it: `{"2017-04" => 40, "2017-05" => 14}`.

I was able to write a function that passes all the unit tests.
Here it is:

```ruby
#!/usr/bin/ruby

tasks = [
  { work: "item 1", date: "2017-04-26", time: 20 },
  { work: "item 2", date: "2017-04-27", time: 27 },
  { work: "item 3", date: "2017-04-27", time: 33 },
  { work: "item 4", date: "2017-05-05", time: 20 },
  { work: "item 5", date: "2017-05-06", time: 12 },
  { work: "item 6", date: "2017-05-14", time: 10 },
]
# Expected result: {"2017-04" => 40, "2017-05" => 14}

def work_per_month(tasks)
  days_aggregate = {}

  tasks.each do |task|
    key = task[:date]

    if days_aggregate.key?(key)
      days_aggregate[key][0] += task[:time]
    else
      arr = []
      arr[0] = task[:time]
      days_aggregate[key] = arr
    end
  end

  months_aggregate = {}

  days_aggregate.each do |key, task|
    parts = key.split("-")
    k = "#{parts[0]}-#{parts[1]}"

    if months_aggregate.key?(k)
      months_aggregate[k][0] = months_aggregate[k][0] + task[0]
      months_aggregate[k][1] = months_aggregate[k][1] + 1
    else
      arr = []
      arr[0] = task[0]
      arr[1] = 1
      months_aggregate[k] = arr
    end
  end

  avg_hours_month = {}

  months_aggregate.each do |key, data|
    avg_hours_month[key] = data[0] / data[1]
  end

  avg_hours_month
end

puts work_per_month(tasks) # => {"2017-04"=>40, "2017-05"=>14}
```

Please take into account that I started programming in Ruby just a week ago.

It works and it passes the tests. But I am aware that it is clumsy.

Is there a more elegant way to solve the described task?

Without this sequence of loops?
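One idiomatic possibility (not the course's reference solution, just a sketch) is to let `group_by` do the aggregation: group tasks by month, then by day, and average the daily totals.

```ruby
TASKS = [
  { work: "item 1", date: "2017-04-26", time: 20 },
  { work: "item 2", date: "2017-04-27", time: 27 },
  { work: "item 3", date: "2017-04-27", time: 33 },
  { work: "item 4", date: "2017-05-05", time: 20 },
  { work: "item 5", date: "2017-05-06", time: 12 },
  { work: "item 6", date: "2017-05-14", time: 10 },
]

def work_per_month(tasks)
  # "2017-04-26"[0, 7] => "2017-04", i.e. the month key
  tasks.group_by { |t| t[:date][0, 7] }.transform_values do |month_tasks|
    # Total time per day within the month, then the average over days.
    daily_totals = month_tasks.group_by { |t| t[:date] }
                              .values
                              .map { |day| day.sum { |t| t[:time] } }
    daily_totals.sum / daily_totals.size
  end
end

p work_per_month(TASKS) # => {"2017-04"=>40, "2017-05"=>14}
```

This keeps the same integer division as the original, so it produces the same expected result for the given data.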

## technology – storing statistical data in the database

I have several similar threads (primitive bots) in the application. The bots perform actions until they reach their goal. I want to record each step of a thread in the database for later analysis.

Each bot has a unique identifier, and I will also provide the records with timestamps. So a record looks like this:

```
record
  bot_id
  area_id
  x_position
  y_position
  group_id
  timestamp
```

However, I do not want to block a thread while it stores data in the database. I have therefore decided to send messages to a message broker, and then use a separate process to read these messages and store them in the database.

However, I cannot decide which technology to use. As far as I know, any would work for my case – any broker, any database (relational or not). I really like experimenting with different tools.

So, what could you recommend that would be effective for my case?
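Whatever broker you pick, the decoupling itself looks the same. In the sketch below, Ruby's thread-safe `Queue` stands in for a real broker and an in-memory array stands in for the database; both are placeholders for illustration, not a technology recommendation.

```ruby
require "json"

# Bot threads push serialized records to a queue and continue immediately;
# a single consumer drains the queue and persists the records.
QUEUE  = Queue.new
STORED = []  # placeholder for the real database

consumer = Thread.new do
  while (msg = QUEUE.pop)
    break if msg == :shutdown
    STORED << JSON.parse(msg)  # in a real system: INSERT into the database
  end
end

# A bot thread records a step without blocking on the database.
record = { bot_id: 1, area_id: 7, x_position: 3, y_position: 4,
           group_id: 2, timestamp: Time.now.to_i }
QUEUE << JSON.generate(record)

QUEUE << :shutdown
consumer.join
puts STORED.size # 1
```

A real broker (RabbitMQ, Kafka, Redis streams, etc.) adds durability and lets the consumer run as a separate process, but the producer/consumer shape is the same.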

## beginner – Statistical methods using PHP (mean, co / variance, standard deviation, bias, correlation)

### Functionality

This class contains a list of basic statistical functions such as mean, variance, standard deviation, skewness, etc., and everything works correctly. It allows these functions to be run on stock charts for the last 30 days.

It extends a parent class that estimates near-future prices of a stock list using a 60-second delayed API feed.

Would you be so kind as to review it for performance, efficiency, mathematical correctness, and coding best practices?

### Code

```php
// Configuration class for the path and other constants
require_once __DIR__ . "/ConstEQ.php";

/**
 * This is an extended class with basic statistical methods
 * for stocks.
 */
class ST extends EQ implements ConstEQ
{
    /**
     * @return a number equal to the mean of the values of an array
     */
    public static function getMean($array)
    {
        if (count($array) == 0) {
            return ConstEQ::NEAR_ZERO_NUMBER;
        } else {
            return array_sum($array) / count($array);
        }
    }

    /**
     * @return a number normalized to between -1 and 1
     */
    public static function getNormalize($value, $min, $max)
    {
        if ($max - $min != 0) {
            $normalized = 2 * (($value - $min) / ($max - $min)) - 1;
        } else {
            $normalized = 2 * ($value - $min) - 1;
        }
        return $normalized;
    }

    /**
     * @return a number between 0.0 and 1 for any input from -inf to inf
     */
    public static function getSigmoid($t)
    {
        return 1 / (1 + pow(M_EULER, -$t));
    }

    /**
     * @return a number equal to the squared deviation from the mean
     */
    public static function getMeanSquare($x, $mean)
    {
        return pow($x - $mean, 2);
    }

    /**
     * @return a number equal to the standard deviation of the values of an array
     */
    public static function getStandardDeviation($array)
    {
        if (count($array) < 2) {
            return ConstEQ::NEAR_ZERO_NUMBER;
        } else {
            return sqrt(array_sum(array_map("ST::getMeanSquare", $array, array_fill(0, count($array), array_sum($array) / count($array)))) / (count($array) - 1));
        }
    }

    /**
     * @return a number equal to the covariance of the values of two arrays
     */
    public static function getCovariance($valuesA, $valuesB)
    {
        // Trim both arrays to the same size, if their sizes differ
        $no_keys = min(count($valuesA), count($valuesB));
        $valuesA = array_slice($valuesA, 0, $no_keys);
        $valuesB = array_slice($valuesB, 0, $no_keys);

        // If the arrays are too small
        if ($no_keys < 2) {
            return ConstEQ::NEAR_ZERO_NUMBER;
        }

        // Use the library function if available
        if (function_exists('stats_covariance')) {
            return stats_covariance($valuesA, $valuesB);
        }

        $meanA = array_sum($valuesA) / $no_keys;
        $meanB = array_sum($valuesB) / $no_keys;

        $add = 0;
        for ($pos = 0; $pos < $no_keys; $pos++) {
            $valueA = $valuesA[$pos];
            if (!is_numeric($valueA)) {
                trigger_error('Non-numeric value in array A at position ' . $pos . ', value=' . $valueA, E_USER_WARNING);
                return false;
            }

            $valueB = $valuesB[$pos];
            if (!is_numeric($valueB)) {
                trigger_error('Non-numeric value in array B at position ' . $pos . ', value=' . $valueB, E_USER_WARNING);
                return false;
            }

            $difA = $valueA - $meanA;
            $difB = $valueB - $meanB;
            $add += ($difA * $difB);
        }

        return $add / $no_keys;
    }

    /**
     * @return a number equal to the skewness of the array values
     */
    public static function getSkewness($values)
    {
        $numValues = count($values);
        if ($numValues == 0) {
            return 0.0;
        }

        // Use the php_stats library function if available
        if (function_exists('stats_skew')) {
            return stats_skew($values);
        }

        $mean = array_sum($values) / floatval($numValues);

        $add2 = 0;
        $add3 = 0;
        foreach ($values as $value) {
            if (!is_numeric($value)) {
                return false;
            }

            $dif   = $value - $mean;
            $add2 += ($dif * $dif);
            $add3 += ($dif * $dif * $dif);
        }

        $variance = $add2 / floatval($numValues);

        if ($variance == 0) {
            return ConstEQ::NEAR_ZERO_NUMBER;
        } else {
            return ($add3 / floatval($numValues)) / pow($variance, 3 / 2.0);
        }
    }

    /**
     * @return a number equal to the kurtosis of the array values
     */
    public static function getKurtosis($values)
    {
        $numValues = count($values);
        if ($numValues == 0) {
            return 0.0;
        }

        // Use the php_stats library function if available
        if (function_exists('stats_kurtosis')) {
            return stats_kurtosis($values);
        }

        $mean = array_sum($values) / floatval($numValues);

        $add2 = 0;
        $add4 = 0;
        foreach ($values as $value) {
            if (!is_numeric($value)) {
                return false;
            }
            $dif   = $value - $mean;
            $dif2  = $dif * $dif;
            $add2 += $dif2;
            $add4 += ($dif2 * $dif2);
        }

        $variance = $add2 / floatval($numValues);
        if ($variance == 0) {
            return ConstEQ::NEAR_ZERO_NUMBER;
        } else {
            return ($add4 * $numValues) / ($add2 * $add2) - 3.0;
        }
    }

    /**
     * @return a number equal to the correlation of two arrays
     */
    public static function getCorrelation($arr1, $arr2)
    {
        $correlation = 0;

        $k     = ST::sumProductMeanDeviation($arr1, $arr2);
        $ssmd1 = ST::sumSquareMeanDeviation($arr1);
        $ssmd2 = ST::sumSquareMeanDeviation($arr2);

        $product = $ssmd1 * $ssmd2;

        $res = sqrt($product);
        if ($res == 0) {
            return ConstEQ::NEAR_ZERO_NUMBER;
        }
        $correlation = $k / $res;

        if ($correlation == 0) {
            return ConstEQ::NEAR_ZERO_NUMBER;
        } else {
            return $correlation;
        }
    }

    /**
     * @return a number equal to the sum of the products of mean deviations over the array values
     */
    public static function sumProductMeanDeviation($arr1, $arr2)
    {
        $sum = 0;
        $num = count($arr1);

        for ($i = 0; $i < $num; $i++) {
            $sum = $sum + ST::productMeanDeviation($arr1, $arr2, $i);
        }
        return $sum;
    }

    /**
     * @return a number equal to the product of the mean deviations of one array item
     */
    public static function productMeanDeviation($arr1, $arr2, $item)
    {
        return ST::meanDeviation($arr1, $item) * ST::meanDeviation($arr2, $item);
    }

    /**
     * @return a number equal to the sum of the squared mean deviations of the array values
     */
    public static function sumSquareMeanDeviation($arr)
    {
        $sum = 0;
        $num = count($arr);

        for ($i = 0; $i < $num; $i++) {
            $sum = $sum + ST::squareMeanDeviation($arr, $i);
        }
        return $sum;
    }

    /**
     * @return a number equal to the squared mean deviation of one array item
     */
    public static function squareMeanDeviation($arr, $item)
    {
        return ST::meanDeviation($arr, $item) * ST::meanDeviation($arr, $item);
    }

    /**
     * @return a number equal to the sum of the mean deviations of the array values
     */
    public static function sumMeanDeviation($arr)
    {
        $sum = 0;
        $num = count($arr);

        for ($i = 0; $i < $num; $i++) {
            $sum = $sum + ST::meanDeviation($arr, $i);
        }
        return $sum;
    }

    /**
     * @return a number equal to the mean deviation of one array item
     */
    public static function meanDeviation($arr, $item)
    {
        $average = ST::average($arr);
        return $arr[$item] - $average;
    }

    /**
     * @return a number equal to the average of the values in the array
     */
    public static function average($arr)
    {
        $sum = ST::sum($arr);
        $num = count($arr);
        return $sum / $num;
    }

    /**
     * @return a number equal to the sum of an array
     */
    public static function sum($arr)
    {
        return array_sum($arr);
    }

    /**
     * @return an array of coefficients for 7 levels of volatility
     */
    public static function getCoefParams($overall_market_coeff)
    {
        $daily_coef = 0.9 + ($overall_market_coeff / 10);

        $coefs = array(
            ConstEQ::LEVEL_VOLATILITY_COEF_1 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_2 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_3 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_4 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_5 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_6 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_7 * $daily_coef,
        );

        return $coefs;
    }

    /**
     * @return true or false for the is_numeric test of each element of an array
     */
    public static function isNumber($arr)
    {
        foreach ($arr as $b) {
            if (!is_numeric($b)) {
                return false;
            }
        }
        return true;
    }
}
```

## I will analyze your traffic for a week and give you a complete statistical report for $10

For 1 week, I will analyze the traffic of your website or blog,
and I will give you a video guide explaining what the statistics mean.

This service is ideal for any internet marketer wishing to try out a traffic provider
for the first time. I will tell you whether your traffic is quality or bots.

It's great if you have a website or blog
in a niche that performs well.

This is an excellent service for anyone who wants to know exactly
where the traffic came from.

Add-ons:

- Analyze your traffic for 2 weeks: $15
- Analyze your traffic for 3 weeks: $30
- Analyze your traffic for 1 month: $50

* includes the price of the service


## st.statistics – Perform a statistical analysis on a dataset with many null responses

I am currently trying to perform statistical analyses on some data to see whether there is a useful conclusion for a research project I am working on. However, I have run into a problem: there are a lot of null responses in my dataset (because the person could not answer the question).

(The first rows of the dataset were shown here.)

These data record the time it took respondents to answer the various questions I asked, but the problem is that respondents were not always able to answer. I am comparing the times of people in category L to those in category R (last column).

Is there a generally accepted method for including these null responses in the results? I thought of penalizing them with a value such as twice the mean of the other results, but that would ruin $$\sigma$$ (and it also affects $$\mu$$).
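To see that distortion concretely, here is a small sketch with made-up numbers showing how imputing twice the mean shifts both $$\mu$$ and $$\sigma$$ (the data and the number of null responses are purely illustrative):

```ruby
# Made-up response times plus two respondents who could not answer.
TIMES  = [4.0, 5.0, 6.0, 5.0]
N_NULL = 2

def mean(a)
  a.sum / a.size
end

# Sample standard deviation (n - 1 denominator).
def sd(a)
  m = mean(a)
  Math.sqrt(a.sum { |x| (x - m)**2 } / (a.size - 1))
end

PENALTY = 2 * mean(TIMES)             # 10.0, the proposed penalty value
IMPUTED = TIMES + [PENALTY] * N_NULL  # dataset with penalized null responses

puts mean(TIMES).round(2)   # 5.0
puts mean(IMPUTED).round(2) # 6.67
puts sd(TIMES).round(2)     # 0.82
puts sd(IMPUTED).round(2)   # 2.66
```

Both the mean and the standard deviation move substantially, which is exactly the concern raised above; this is why censoring-aware methods are usually preferred over ad hoc penalties.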

## statistical physics – how to filter the measurement noise from a dataset

I need to find the best way to filter white noise out of a dataset $$(x_0, y_0), \ldots, (x_n, y_n)$$ with $$n > 1000$$, where

$$x$$: time

$$y$$: physical quantity

The noise comes from the sensor that takes the measurements. We know that the noise follows a centered normal law (a Gaussian distribution with expectation equal to zero, $$m = 0$$), but we have no information about the value of the variance ($$\sigma$$ unknown).

So, to deal with this problem, I thought of linear regression, but I do not know which model to use.

In fact, the measurements correspond to the thermograph of a piece of electrical equipment, either operational or on standby (not operational, but exposed to the heat of its environment).

Note that the measurements are not taken with a constant sampling period; for example, the value $$x_3 - x_2$$ differs from $$x_4 - x_3$$.

So my question is:
how should I handle the white noise?

PS: I'm sorry for my bad English.
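One simple possibility (a sketch, not necessarily the best model for a thermograph) is a Nadaraya-Watson kernel smoother: each point is replaced by a distance-weighted average of its neighbors, where the weights depend on the distance in $$x$$, so a non-constant sampling period is handled naturally. The bandwidth and the toy signal below are arbitrary illustrations.

```ruby
# Gaussian-kernel smoother for irregularly sampled data.
H = 0.5 # kernel bandwidth; to be tuned to the actual data

def smooth(xs, ys, h = H)
  xs.map do |x0|
    # Weight each sample by its distance in x from x0, not by its index,
    # so irregular sampling intervals are handled naturally.
    weights  = xs.map { |x| Math.exp(-((x - x0)**2) / (2.0 * h * h)) }
    weighted = weights.zip(ys).sum { |w, y| w * y }
    weighted / weights.sum
  end
end

# Irregularly sampled noisy signal (made up for the demo).
XS = [0.0, 0.3, 0.9, 1.0, 1.8, 2.2, 3.1]
YS = XS.map { |x| x + (rand - 0.5) * 0.2 } # "temperature" = trend + white noise

puts smooth(XS, YS).map { |v| v.round(2) }.inspect
```

Because the noise has zero mean, averaging neighbors cancels much of it without needing to know $$\sigma$$; the bandwidth trades off noise suppression against blurring of real changes (for instance, the switch between operational and standby).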

## How to design a multi-year statistical database?

I am designing a database for name statistics (how many people have been given a particular name). The data consists of names and numbers indicating how many men and women received each name during a given period (mostly 15-year periods, but this varies). The data is rather simple, but I am stuck on the schema.

Here are two options (very similar) that I consider:

1) Just one big table:
`(Name, Men, Women, Timeperiod)`
The time period would probably be split into start and end columns for easier querying. As for the primary key, I could use an auto-incremented identifier or just the combination of name and period start.

2) Names in a separate table (where they would form the primary key), with a second table containing the actual statistics (and therefore looking like the table in option 1). I have read that a one-column table is not necessarily bad design, but I do not know whether it makes sense or adds value here.

The options I have excluded are:

1) A column for each period, because I would eventually have to update the schema. It just seems like a terrible design.

2) Separate tables for each period. Since the periods are not that short, I would not end up with that many tables.

So, how would you recommend I tackle this? Is there an approach I have not considered? I know it is a simple thing and I should probably stop overthinking it and just pick an approach. Nevertheless, I would like a second opinion first, because I am quite new to databases.
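For what it's worth, option 1 with split period columns could look like the following sketch (column names and types are illustrative only, not prescribed by the question):

```sql
-- One row per (name, period); the composite key replaces a surrogate id.
CREATE TABLE name_stats (
    name         VARCHAR(100) NOT NULL,
    period_start SMALLINT     NOT NULL,  -- e.g. 1900
    period_end   SMALLINT     NOT NULL,  -- e.g. 1914
    men          INTEGER      NOT NULL DEFAULT 0,
    women        INTEGER      NOT NULL DEFAULT 0,
    PRIMARY KEY (name, period_start)
);

-- Typical query: totals for one name across all periods.
SELECT name, SUM(men) AS men, SUM(women) AS women
FROM name_stats
WHERE name = 'Maria'
GROUP BY name;
```

Splitting the period into `period_start` and `period_end` keeps range queries simple (`WHERE period_start <= 1920 AND period_end >= 1910`), and a separate names table only pays off once other attributes of a name need storing.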