## Statistical inference – A basic question about a randomized test involving Type I error.

I have a fundamental question in the context of hypothesis testing, specifically randomized tests. Suppose that I have two actions (alternatives) regarding a certain unknown parameter $$\theta \in \Theta$$: the null hypothesis ($$H_0$$) and the alternative hypothesis ($$H_1$$).

In this case, the sample space is $$(0,15) \subset \mathbb{R}$$. We know that the critical function is given by
$$\phi(x) = P(\text{reject } H_0 \mid x \text{ observed})$$

I do not know whether this definition really involves a conditional probability. Suppose I have the following critical function

$$\phi(x) = \begin{cases} 0, & x \in (0,2) \\ p, & x \in (2,10) \\ 1, & x \in (10,15) \end{cases}$$

I cannot understand why

$$P(\text{reject } H_0 \mid H_0 \text{ is true}) = 0 \times P(x \in (0,2)) + p \times P(x \in (2,10)) + 1 \times P(x \in (10,15))$$

The right-hand side looks a lot like an expectation, but I cannot see why.
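
One way to read the identity, as a sketch under two assumptions that are not stated in the question: the observation has a density $$f_\theta$$ on $$(0,15)$$ (so the endpoints 2 and 10 carry no probability), and the randomization that rejects with probability $$\phi(x)$$ does not depend on $$\theta$$. Averaging the conditional rejection probability over the sample space (law of total probability) then gives

$$P_\theta(\text{reject } H_0) = E_\theta[\phi(x)] = \int_0^{15} \phi(x)\, f_\theta(x)\, dx = 0 \times P_\theta(x \in (0,2)) + p \times P_\theta(x \in (2,10)) + 1 \times P_\theta(x \in (10,15))$$

Evaluated at a $$\theta$$ for which $$H_0$$ is true, this is the Type I error probability, which is why the right-hand side has the form of an expectation of $$\phi$$.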

## beginner – A statistical function that calculates the average duration of work in different months

This is a task from a Ruby course in which I am currently enrolled:
Ruby Race – Page

Specifically, it is one of the tasks of the first week.

The idea is as follows:
`[{work: "item 1", date: "2017-04-26", time: 20}, {work: "item 2", date: "2017-04-27", time: 27}, ...`

You must write a function that calculates the average daily work time for the different months.
That is: the average daily working time in April, the average in May, in June, and so on.

A data structure to work on was given, along with the expected result for it: `{"2017-04" => 40, "2017-05" => 14}`.

I was able to write a function that passes all the unit tests.
Here it is:

```ruby
#!/usr/bin/ruby

tasks = [
  {work: "item 1", date: "2017-04-26", time: 20},
  {work: "item 2", date: "2017-04-27", time: 27},
  {work: "item 3", date: "2017-04-27", time: 33},
  {work: "item 4", date: "2017-05-05", time: 20},
  {work: "item 5", date: "2017-05-06", time: 12},
  {work: "item 6", date: "2017-05-14", time: 10},
]
# Expected result: {"2017-04" => 40, "2017-05" => 14}

def work_per_month(tasks)
  # Sum the time per day: date => [total time]
  days_aggregate = {}

  tasks.each do |task|
    key = task[:date]

    if days_aggregate.key?(key)
      days_aggregate[key][0] = days_aggregate[key][0] + task[:time]
    else
      arr = []
      arr[0] = task[:time]
      days_aggregate[key] = arr
    end
  end

  # Sum the daily totals per month: "YYYY-MM" => [total time, number of days]
  months_aggregate = {}

  days_aggregate.each do |key, task|
    parts = key.split("-")
    k = "#{parts[0]}-#{parts[1]}"

    if months_aggregate.key?(k)
      months_aggregate[k][0] = months_aggregate[k][0] + task[0]
      months_aggregate[k][1] = months_aggregate[k][1] + 1
    else
      arr = []
      arr[0] = task[0]
      arr[1] = 1
      months_aggregate[k] = arr
    end
  end

  # Average per month (integer division)
  avg_hours_month = {}

  months_aggregate.each do |key, data|
    avg_hours_month[key] = data[0] / data[1]
  end

  avg_hours_month
end

puts work_per_month(tasks) # Returns {"2017-04" => 40, "2017-05" => 14}
```

Please take into account that I started programming in Ruby just a week ago.

It works and it passed the tests. But I am aware that it is clumsy.

Is there a more elegant way to solve the described task, without this sequence of loops?
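
For comparison, here is a minimal sketch of the same computation using `group_by` and `transform_values` (available since Ruby 2.4). The name `work_per_month_grouped` is hypothetical, and the sketch assumes, as in the sample data, that `:date` is always a `"YYYY-MM-DD"` string and that integer division is the intended rounding.

```ruby
tasks = [
  {work: "item 1", date: "2017-04-26", time: 20},
  {work: "item 2", date: "2017-04-27", time: 27},
  {work: "item 3", date: "2017-04-27", time: 33},
  {work: "item 4", date: "2017-05-05", time: 20},
  {work: "item 5", date: "2017-05-06", time: 12},
  {work: "item 6", date: "2017-05-14", time: 10},
]

# Hypothetical name; returns the average daily total per month.
def work_per_month_grouped(tasks)
  # Total time per calendar day, e.g. {"2017-04-27" => 60}
  daily_totals = tasks
    .group_by { |task| task[:date] }
    .transform_values { |day_tasks| day_tasks.sum { |task| task[:time] } }

  # Group the daily totals by "YYYY-MM" and average them (integer division)
  daily_totals
    .group_by { |date, _total| date[0, 7] }
    .transform_values { |days| days.sum { |_date, total| total } / days.size }
end

result = work_per_month_grouped(tasks)
puts result.inspect
```

Each step maps onto one of the original loops: the first `group_by` replaces the per-day aggregation, the second replaces the per-month aggregation, and the final `transform_values` replaces the averaging loop.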

## technology – storing statistical data in the database

I have several similar threads (primitive robots) in the application. The robots perform actions until they reach the goal. I want to record each step of the thread in the database for later analysis.

Each bot has a unique identifier, and I will also give each record a timestamp. So the record looks like this:

```
record
  bot_id
  area_id
  x_position
  y_position
  group_id
  timestamp
```

However, I do not want to block the thread while it is trying to store the data in the database. I've therefore decided to send messages to a message broker, and then use a separate process to read these messages and store them in the database.

However, I cannot decide which technology to use. As far as I know, anything would work for my case: I can use any broker and any database (relational or not). I really like to experiment with different tools.

So, what could you recommend that would be effective for my case?
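
To make the decoupling concrete, here is a minimal sketch of the pattern described above, with one deliberate substitution: an in-process `Queue` (Ruby's thread-safe queue) stands in for the message broker, and a plain array stands in for the database. Neither is a recommendation of a specific tool, and the field values are made up for illustration.

```ruby
# Record layout taken from the question; the values below are invented.
record_type = Struct.new(:bot_id, :area_id, :x_position, :y_position, :group_id, :timestamp)

queue  = Queue.new # stand-in for the broker (assumption: in-process only)
stored = []        # stand-in for the database table (assumption)

# The single writer drains the queue, so bot threads never wait on storage.
writer = Thread.new do
  # Queue#pop blocks until a record arrives and returns nil once the
  # queue has been closed and emptied, which ends the loop.
  while (record = queue.pop)
    stored << record # a real writer would INSERT here, ideally in batches
  end
end

bots = 3.times.map do |bot_id|
  Thread.new do
    5.times do |step|
      # Publishing is a cheap enqueue; the bot continues immediately.
      queue << record_type.new(bot_id, 1, step, step * 2, 0, Time.now.to_f)
    end
  end
end

bots.each(&:join)
queue.close
writer.join
puts stored.size
```

With a real broker the shape stays the same: bots publish and move on, and a separate consumer owns all database writes.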

## beginner – Statistical methods using PHP (mean, co/variance, standard deviation, skewness, correlation)

### Functionality

This class contains a list of basic statistical functions such as mean, variance, standard deviation, skewness, etc., and everything works as expected. These functions are applied to the last 30 days of stock chart data.

It extends a parent class that estimates very near-future prices for a list of equities using chart data from an API with a 60-second delay.

Would you be so kind as to review it for performance, efficiency, math, or coding best practices?

### Code

```php
// Configuration class for the path and other constants
require_once __DIR__ . "/ConstEQ.php";

/**
 * This is an extended class with basic statistical methods
 * Stock
 */
class ST extends EQ implements ConstEQ
{

    /**
     *
     * @return a number equal to the mean of the values of an array
     */
    public static function getMean($array)
    {
        if (count($array) == 0) {
            return ConstEQ::NEAR_ZERO_NUMBER;
        } else {
            return array_sum($array) / count($array);
        }
    }

    /**
     *
     * @return a number normalized to the range -1 to 1
     */
    public static function getNormalize($value, $min, $max)
    {
        if ($max - $min != 0) {
            $normalized = 2 * (($value - $min) / ($max - $min)) - 1;
        } else {
            $normalized = 2 * (($value - $min)) - 1;
        }
        return $normalized;
    }

    /**
     *
     * @return a number between 0 and 1 for any input from -inf to inf
     */
    public static function getSigmoid($t)
    {
        // exp() uses base e; M_EULER is the Euler-Mascheroni constant (~0.577), not e
        return 1 / (1 + exp(-$t));
    }

    /**
     *
     * @return a number equal to the squared deviation of a value from the mean
     */
    public static function getMeanSquare($x, $mean)
    {return pow($x - $mean, 2);}

    /**
     *
     * @return a number equal to the standard deviation of the values of an array
     */
    public static function getStandardDeviation($array)
    {
        if (count($array) < 2) {
            return ConstEQ::NEAR_ZERO_NUMBER;
        } else {
            // sample standard deviation: sum of squared deviations divided by (n - 1)
            $mean = array_sum($array) / count($array);
            $means = array_fill(0, count($array), $mean);
            return sqrt(array_sum(array_map("ST::getMeanSquare", $array, $means)) / (count($array) - 1));
        }
    }

    /**
     *
     * @return a number equal to the covariance of the values of two arrays
     */
    public static function getCovariance($valuesA, $valuesB)
    {
        // trim both arrays to the same size, if their sizes differ
        $no_keys = min(count($valuesA), count($valuesB));
        $valuesA = array_slice($valuesA, 0, $no_keys);
        $valuesB = array_slice($valuesB, 0, $no_keys);

        // if the arrays are too small
        if ($no_keys < 2) {return ConstEQ::NEAR_ZERO_NUMBER;}

        // Use the library function if available
        if (function_exists('stats_covariance')) {return stats_covariance($valuesA, $valuesB);}

        $meanA = array_sum($valuesA) / $no_keys;
        $meanB = array_sum($valuesB) / $no_keys;
        $add = 0.0; // initialize the accumulator before using +=

        for ($pos = 0; $pos < $no_keys; $pos++) {
            $valueA = $valuesA[$pos];
            if (!is_numeric($valueA)) {
                trigger_error('Non-numeric value in array A at position ' . $pos . ', value=' . $valueA, E_USER_WARNING);
                return false;
            }

            $valueB = $valuesB[$pos];
            if (!is_numeric($valueB)) {
                trigger_error('Non-numeric value in array B at position ' . $pos . ', value=' . $valueB, E_USER_WARNING);
                return false;
            }

            $difA = $valueA - $meanA;
            $difB = $valueB - $meanB;
            $add += ($difA * $difB);
        }

        return $add / $no_keys;
    }

    /**
     *
     * @return a number equal to the skewness of the array values
     */
    public static function getSkewness($values)
    {
        $numValues = count($values);
        if ($numValues == 0) {return 0.0;}

        // Use the php_stats library function if available
        if (function_exists('stats_skew')) {return stats_skew($values);}

        $mean = array_sum($values) / floatval($numValues);
        $add2 = 0.0; // sum of squared deviations
        $add3 = 0.0; // sum of cubed deviations

        foreach ($values as $value) {
            if (!is_numeric($value)) {return false;}

            $dif = $value - $mean;
            $add2 += ($dif * $dif);
            $add3 += ($dif * $dif * $dif);
        }

        $variance = $add2 / floatval($numValues);

        if ($variance == 0) {return ConstEQ::NEAR_ZERO_NUMBER;} else {return ($add3 / floatval($numValues)) / pow($variance, 3 / 2.0);}
    }

    /**
     *
     * @return a number equal to the kurtosis of the array values
     */
    public static function getKurtosis($values)
    {
        $numValues = count($values);
        if ($numValues == 0) {return 0.0;}

        // Use the php_stats library function if available
        if (function_exists('stats_kurtosis')) {return stats_kurtosis($values);}

        $mean = array_sum($values) / floatval($numValues);
        $add2 = 0.0; // sum of squared deviations
        $add4 = 0.0; // sum of fourth-power deviations

        foreach ($values as $value) {
            if (!is_numeric($value)) {return false;}
            $dif = $value - $mean;
            $dif2 = $dif * $dif;
            $add2 += $dif2;
            $add4 += ($dif2 * $dif2);
        }

        $variance = $add2 / floatval($numValues);
        if ($variance == 0) {return ConstEQ::NEAR_ZERO_NUMBER;} else {return ($add4 * $numValues) / ($add2 * $add2) - 3.0;}
    }

    /**
     *
     * @return a number equal to the correlation of two arrays
     */
    public static function getCorrelation($arr1, $arr2)
    {
        $correlation = 0;

        $k = ST::sumProductMeanDeviation($arr1, $arr2);
        $ssmd1 = ST::sumSquareMeanDeviation($arr1);
        $ssmd2 = ST::sumSquareMeanDeviation($arr2);

        $product = $ssmd1 * $ssmd2;

        $res = sqrt($product);
        if ($res == 0) {return ConstEQ::NEAR_ZERO_NUMBER;}
        $correlation = $k / $res;

        if ($correlation == 0) {return ConstEQ::NEAR_ZERO_NUMBER;} else {return $correlation;}
    }

    /**
     *
     * @return a number equal to the sum of the products of the mean deviations of each pair of array values
     */
    public static function sumProductMeanDeviation($arr1, $arr2)
    {
        $sum = 0;
        $num = count($arr1);

        for ($i = 0; $i < $num; $i++) {$sum = $sum + ST::productMeanDeviation($arr1, $arr2, $i);}
        return $sum;
    }

    /**
     *
     * @return a number equal to the product of the mean deviations of one pair of array values
     */
    public static function productMeanDeviation($arr1, $arr2, $item)
    {return (ST::meanDeviation($arr1, $item) * ST::meanDeviation($arr2, $item));}

    /**
     *
     * @return a number equal to the sum of the squared mean deviations of the array values
     */
    public static function sumSquareMeanDeviation($arr)
    {
        $sum = 0;
        $num = count($arr);

        for ($i = 0; $i < $num; $i++) {$sum = $sum + ST::squareMeanDeviation($arr, $i);}
        return $sum;
    }

    /**
     *
     * @return a number equal to the squared mean deviation of one array value
     */
    public static function squareMeanDeviation($arr, $item)
    {
        return ST::meanDeviation($arr, $item) * ST::meanDeviation($arr, $item);
    }

    /**
     *
     * @return a number equal to the sum of the mean deviations of each array value
     */
    public static function sumMeanDeviation($arr)
    {
        $sum = 0;
        $num = count($arr);

        for ($i = 0; $i < $num; $i++) {$sum = $sum + ST::meanDeviation($arr, $i);}
        return $sum;
    }

    /**
     *
     * @return a number equal to the deviation of one array value from the mean
     */
    public static function meanDeviation($arr, $item)
    {
        $average = ST::average($arr); return $arr[$item] - $average;
    }

    /**
     *
     * @return a number equal to the mean of the values in the array
     */
    public static function average($arr)
    {
        $sum = ST::sum($arr);
        $num = count($arr); return $sum / $num;
    }

    /**
     *
     * @return a number equal to the sum of an array
     */
    public static function sum($arr)
    {return array_sum($arr);}

    /**
     *
     * @return an array of coefficients for 7 levels of volatility
     */
    public static function getCoefParams($Overall_market_coeff)
    {
        $daily_coef = 0.9 + ($Overall_market_coeff / 10);

        $coefs = array(
            ConstEQ::LEVEL_VOLATILITY_COEF_1 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_2 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_3 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_4 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_5 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_6 * $daily_coef,
            ConstEQ::LEVEL_VOLATILITY_COEF_7 * $daily_coef,
        );

        return $coefs;
    }

    /**
     * @return true or false for the is_numeric test of every value in the array
     */
    public static function isNumber($arr)
    {
        foreach ($arr as $b) {
            if (!is_numeric($b)) {
                return false;
            }
        }
        return true;
    }

}
``````
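
For cross-checking the math, here is a short independent sketch (in Ruby, not part of the class above) of the mean, the sample standard deviation with an n - 1 denominator, and the Pearson correlation built from sums of mean deviations. The function names are hypothetical, and unlike the PHP code there are no guards for empty or constant input.

```ruby
def mean(values)
  values.sum.to_f / values.size
end

# Sample standard deviation: squared deviations summed, divided by n - 1
def sample_std(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
end

# Pearson correlation: sum of products of mean deviations divided by the
# square root of the product of the two sums of squared mean deviations
def correlation(xs, ys)
  mx = mean(xs)
  my = mean(ys)
  k = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  k / Math.sqrt(xs.sum { |x| (x - mx)**2 } * ys.sum { |y| (y - my)**2 })
end

xs = [2, 4, 4, 4, 5, 5, 7, 9]
ys = xs.map { |x| 2 * x + 1 } # exact linear relation, so correlation should be 1

puts mean(xs)
puts sample_std(xs)
puts correlation(xs, ys)
```

Feeding the same arrays to `getMean`, `getStandardDeviation`, and `getCorrelation` should give matching values if the PHP functions behave as their docblocks describe.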

## I will analyze your traffic for a week and give you a complete statistical report for \$10

For 1 week, I will analyze the traffic to your website or blog,
and I will give you a video guide explaining the meaning of the statistics.

This service is ideal for any internet marketer trying a traffic provider
for the first time. I will tell you whether your traffic is quality traffic or bots.

It is great if you have a website or blog
and want to know which area works best.

This is an excellent service for anyone who wants to know exactly
where their traffic came from.

• Analyze your traffic for 2 weeks: \$15

• Analyze your traffic for 3 weeks: \$30

• Analyze your traffic for 1 month: \$50

* includes the price of the service


## st.statistics – Perform a statistical analysis on a dataset with many null responses

I'm currently trying to perform statistical analyses on some data to see whether any useful conclusion can be drawn for a research project I'm working on. However, I have run into a problem: there are a lot of null responses in my dataset (because the person could not answer the question).

(This is the top block)

These data record the time it took respondents to answer the various questions I asked, but the problem is that respondents were not always able to answer the questions. I am comparing the times of people in category L with those in category R (last column).

Is there a generally accepted method for including these null responses in the results? I thought of penalizing them with a number such as twice the average of the other results, but that would ruin $$\sigma$$ (and it also affects $$\mu$$).

## statistical physics – how to filter the measurement noise from a dataset

I need to find the best way to filter white noise from a dataset $$(x_0, y_0), \ldots, (x_n, y_n)$$ with $$n > 1000$$

$$x$$: time

$$y$$: physical quantity

The noise comes from the sensor that takes the measurements. We know that the noise follows a normal (Gaussian) distribution with expectation equal to zero ($$m = 0$$), but we have no information about its standard deviation ($$\sigma$$ unknown).

So, to deal with this problem, I thought of linear regression, but I do not know which model to use.

In fact, the measurements are temperature readings from a piece of electrical equipment, which may be operational or on standby (not operating, but exposed to the heat of its environment).

Note that the measurements are not taken with a constant sampling period. For example, the value $$x_3-x_2$$ differs from $$x_4-x_3$$.

So my question is:
How to handle white noise?

PS: I'm sorry for my bad English.
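
As one simple baseline for irregularly sampled data, here is a minimal sketch of a centered time-window average. This is a swapped-in smoothing technique, not the linear regression mentioned in the question. The name `window_smooth` and the window half-width are assumptions; the half-width would need tuning against how fast the temperature really changes.

```ruby
# Centered time-window average for irregularly sampled points.
# points: array of [x, y] pairs; half_width: window half-width in units of x.
# Every point whose x lies within half_width of the current x contributes equally,
# so uneven spacing only changes how many samples fall into each window.
def window_smooth(points, half_width)
  points.map do |x, _y|
    window = points.select { |xi, _yi| (xi - x).abs <= half_width }
    [x, window.sum { |_xi, yi| yi } / window.size.to_f]
  end
end

# Made-up sample with uneven spacing, for illustration only
points = [[0.0, 1.0], [1.0, 3.0], [2.5, 5.0], [10.0, 7.0]]
smoothed = window_smooth(points, 1.5)
smoothed.each { |x, y| puts "#{x}\t#{y}" }
```

Averaging k samples of zero-mean noise reduces its standard deviation by roughly a factor of the square root of k, at the cost of blurring real changes that are faster than the window. The sketch is quadratic in the number of points, which is acceptable for a few thousand samples.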

## How to design a multi-year statistical database?

I am designing a database for name statistics (how many people have received this name). The data includes names and numbers indicating how many men and women have received this name during a given period (mainly 15 years, but this varies). The data is rather simple, but I remain stuck in the schema.

Here are two options (very similar) that I consider:

1) Just a big table.
`(Name, Men, Women, Timeperiod)`
The time period would probably be split into start and end columns for easier querying. As for the primary key, I could have an auto-incremented identifier or just use a combination of name and start of period.

2) I would have the names in a separate table (where they would form the primary key), and the other table would contain the actual statistics (and would therefore look like the table in option 1). I've read that a single-column table is not necessarily bad design, but I do not know whether it makes sense or adds value here.

The options I have excluded are:

1) Have a column for each period. I would eventually have to update the schema, which just seems like a terrible design.

2) Have separate tables for each period. Because the periods are not that short, I would not end up with that many tables.

So, how would you recommend that I tackle this? Is there an approach that I have not considered? I know it's a simple thing and I should probably stop overthinking and choose an approach. Nevertheless, I would like a second opinion first because I am quite new to databases.

## dnd 5e – How would you go about finding the statistical likelihood of known spells being used?

In 5e, there is a lot of variation in how a character has spells at their disposal, ranging from prepared casters to spells-known casters. I like to take things apart, understand how they work and compare.

• assuming a casting stat of 18

• assuming a level of 11

• assuming basic examples for comparison, using these four classes: Cleric, Sorcerer, Paladin and Ranger.

Known spells will be assigned as follows: you learn one spell for each available slot level until you reach the highest known level, then start again with 1st-level spells.

Cleric
Known spells: 15

• 1st level: 3 known, 4 slots

• 2nd level: 3 known, 3 slots

• 3rd level: 3 known, 3 slots

• 4th level: 2 known, 3 slots

• 5th level: 2 known, 2 slots

• 6th level: 2 known, 1 slot

Sorcerer
Known spells: 12

• 1st level: 2 known, 4 slots

• 2nd level: 2 known, 3 slots

• 3rd level: 2 known, 3 slots

• 4th level: 2 known, 3 slots

• 5th level: 2 known, 2 slots

• 6th level: 2 known, 1 slot

Ranger
Known spells: 7

• 1st level: 3 known, 4 slots

• 2nd level: 2 known, 3 slots

• 3rd level: 2 known, 3 slots

Paladin
Known spells: 9

• 1st level: 3 known, 4 slots

• 2nd level: 3 known, 3 slots

• 3rd level: 3 known, 3 slots

I'm trying to understand how to compare known spells to usable slots between classes. What would be the best way to weight each known spell by the slots in which it can be used, so that you can find the average afterwards and compare classes?

Something like (KS1 × TUS) / TKS, where KS1 is the number of known 1st-level spells, TUS is the total number of usable slots, and TKS is the total number of known spells.
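
As a purely arithmetic illustration, the proposed ratio can be evaluated on the numbers listed above. This is a sketch, not a claim that the ratio is the right measure. The class labels follow the four classes named in the question; treating the two three-level blocks as Ranger (7 known) and Paladin (9 known) is an assumption, and `first_level_factor` is a hypothetical name.

```ruby
# [known spells, slots] per spell level, starting at 1st level, as listed above.
spell_tables = {
  "Cleric"   => [[3, 4], [3, 3], [3, 3], [2, 3], [2, 2], [2, 1]],
  "Sorcerer" => [[2, 4], [2, 3], [2, 3], [2, 3], [2, 2], [2, 1]],
  "Ranger"   => [[3, 4], [2, 3], [2, 3]], # assumed label
  "Paladin"  => [[3, 4], [3, 3], [3, 3]]  # assumed label
}

# The ratio proposed in the question: (KS1 * TUS) / TKS
def first_level_factor(levels)
  ks1 = levels.first[0]                   # known 1st-level spells
  tus = levels.sum { |_known, slots| slots } # total usable slots
  tks = levels.sum { |known, _slots| known } # total known spells
  (ks1 * tus).to_f / tks
end

factors = spell_tables.transform_values { |levels| first_level_factor(levels).round(2) }
factors.each { |name, value| puts format("%-8s %.2f", name, value) }
```

Because a 1st-level spell can be cast from any slot, TUS here counts every slot; extending the idea to higher-level spells would mean counting only the slots of that level and above.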

## dnd 5e – When is the Tavern Brawler feat better than an ability score increase to Str or Dex?

The two options are surprisingly close, statistically.

Mechanically, Tavern Brawler is statistically better if you fight enemies with an AC of 18 or higher, while the ASI is better against an AC of 17 or less. However, the difference never exceeds 0.4 DPR.