architecture – Predicting the cost of change

Ah, the ever-elusive cost of change! It is one of those things that, in retrospect, seems obvious, but that no one could have predicted at the outset. If there ever was a thorny problem in software development, predicting the cost of change might be it.

Books sometimes like to discuss impractical things, such as what happens inside black holes or unsolved math problems. Just because something is impractical does not mean it has no effect on you. If you want a slightly simplified explanation of why you cannot measure the cost of change in advance, bear with me …

Software is composed of building blocks, and these blocks can be (and are) modified. Changing these blocks takes time and, since time is expensive and irrevocable (you do not get it back), the cost of change is strongly related to the cost of time, and it is not constant through time either. The same change, made at two different points in time, can have a considerably different cost; that is, the cost of change varies over time, even for identical changes.

In other words, the cost of change C for a software building block x is a function of both the block and time, that is, C = C(x, t). Given a limited range of costs, a (virtually) infinite number of problems, and an infinite number of ways to divide a problem into small solution blocks x, it is obvious that collisions exist. In simple terms, two totally different changes in two totally different systems can take exactly the same time, or can take different amounts of time yet end up costing the exact same number of … cost units!

Cost units??? Yes, as if it were not complicated enough, the cost is not always quantified in currency. Often the absolute cost is not what matters, but rather the cost ratio: given two changes, how much more (or less) will change 1 take compared to change 2? As you can see, there are dozens of ways to frame the problem and still no solution; we only make it more and more complex.

In short, the reason it is very impractical to make detailed assumptions and analyses in advance about the cost of changes to a not-yet-implemented system is that you cannot detail anything until you know what it is you are describing (hence the thorny problem).

There are things you can do, though, such as keeping a good, if not great, record of what you do as you go. A reasonably good organization of the software into identifiable modules, plus detailed documentation of what was changed, when, and how long the change took, gives you what every analyst loves: data. As your "project" progresses, you will get a first-hand picture of what is robust, what is going well, what needs to be reworked, and so on.

Imagine a chart starting at t = 0 and recording, for every component of your software (classes, if you wish), the modifications it required and their implementation times across the different periods. Robust and reliable components will stand out easily, and you can almost predict which components will be painful to change because of their "accumulated inertia" (their heaviness, in terms of adaptability): they always take too long whenever a change is due. While reflecting on this, however, do not forget that to build this chart you first had to solve your problem … which brings us back to the simple answer to your question:
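If you want a concrete picture of what such a change log could look like, here is a minimal sketch (the module names, dates, and hours are invented purely for illustration):

import pandas as pd

# Hypothetical change log: which module changed, when, and how long it took (hours).
changes = pd.DataFrame([
    {"module": "billing", "date": "2019-01-10", "hours": 2.5},
    {"module": "billing", "date": "2019-03-02", "hours": 6.0},
    {"module": "reports", "date": "2019-02-14", "hours": 1.0},
    {"module": "billing", "date": "2019-05-20", "hours": 9.5},
    {"module": "reports", "date": "2019-06-01", "hours": 1.5},
])
changes["date"] = pd.to_datetime(changes["date"])

# Modules whose changes keep getting slower hint at "accumulated inertia".
summary = changes.groupby("module")["hours"].agg(["count", "mean", "sum"])
print(summary.sort_values("sum", ascending=False))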

Software solves problems, and we (and the books) cannot predict the cost of changes to your own solution (software) to your own problem, because we have not solved your problem yet, much less in your own way!

Mathematics – Predicting the time of the projectile's flight

I'm trying to predict mathematically how long a projectile will fly until it reaches its target. I'm trying to implement this formula:

t = [V * sin(α) + √((V * sin(α))² + 2 * g * h)] / g

where I know V (velocity of the projectile), α (launch angle), and h (launch height). The problem is that it gives strange results, and I'm not sure what's wrong. Here is my attempt (in UE4):

float Time = Velocity * FMath::Sin(AngleDeg) + FMath::Sqrt(FMath::Square(Velocity * FMath::Sin(AngleDeg)) + (2.f * Gravity * Height));
Time /= Gravity;

For example, with these test inputs:
Gravity = 9.8;
Height = 10;
Velocity = 10;
AngleDeg = 10;
it gives 0.97 seconds, but the result should be 1.61 according to this calculator: https://www.omnicalculator.com/physics/projectile-motion … When I try different inputs, the result is always wrong. I also took the formula from that website, so I am not sure what is wrong. Thanks for any advice.
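For reference, here is the same formula evaluated in plain Python with those inputs (a minimal check; note it converts the angle to radians before taking the sine, which is worth comparing against what FMath::Sin expects):

import math

V = 10.0         # launch speed
angle_deg = 10.0
h = 10.0         # launch height
g = 9.8

a = math.radians(angle_deg)  # the formula expects the angle in radians
t = (V * math.sin(a) + math.sqrt((V * math.sin(a))**2 + 2 * g * h)) / g
print(t)  # ≈ 1.61 s, matching the calculator's result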

deep learning – predicting time series with variable input length

My thesis focuses on predicting cancer progression in mice. I collected data from 35 mice, measuring tumor volume every day from the onset of cancer until the death of the mouse. The time of death varies between 50 and 72 days, so I have 35 time series of different lengths.

I have to predict the evolution of tumor volume over time. I want to use regression, but I do not know how to fit a model to 35 time series of different lengths.

Note that I cannot truncate my series to a common length, because I would lose important information about the cancer's behavior.

Any suggestions for my problem?

java – Predicting the position of the robot mars

The description

A robot lands on Mars, which happens to be a Cartesian grid. We give the robot instructions such as LFFFRFFFRRFFF, where "L" is "turn 90 degrees left", "R" is "turn 90 degrees right", and "F" is "move forward one space". Write the robot's control code so that it ends up at the correct destination, and include unit tests.

Here is an example of an output with the command "FF":

[0, 2]

Code

class Robot {
    private int x;
    private int y;
    private int currentDirection;

    Robot() {
        this(0, 0);
    }

    Robot(int x, int y) {
        this.x = x;
        this.y = y;
        currentDirection = 0;
    }

    public void move(String move) {
        for (char ch : move.toCharArray()) {
            if (ch == 'R') currentDirection += 1;
            if (ch == 'L') currentDirection -= 1;

            currentDirection = currentDirection % 4;

            if (ch != 'F') continue;

            System.out.println(currentDirection);
            if (currentDirection == 0) {
                y += 1;
            } if (currentDirection == 1) {
                x += 1;
            } if (currentDirection == 2) {
                y -= 1;
            } if (currentDirection == 3) {
                x -= 1;
            }
        }
    }

    public void reset() {
        x = 0;
        y = 0;
        currentDirection = 0;
    }

    public String position() {
        return x + ":" + y;
    }
}

class Main {
    public static void main(String[] args) {
        Robot robot = new Robot();
        robot.move("FF");
        System.out.println(robot.position()); // 0,2
        robot.reset();
        System.out.println(robot.position()); // 0,0
        robot.move("FFRF");
        System.out.println(robot.position()); // 1,2
        robot.reset();
        robot.move("FFRRRFF");
        System.out.println(robot.position()); // -2,2
    }
}

How can I make this code more object oriented?

Python – Predicting Credit Card Defaults

I have this code to predict credit card defaults, and it works fine, but I am posting it here to see whether anyone can make it more efficient or more compact. It is quite long, but please bear with me.

# Import the necessary libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# Extract the data from the .csv file.
file = r'C:\Users\alhut\OneDrive\Desktop\default credit card\creditcard_default.csv'
dataset = pd.read_csv(file, index_col='ID')

dataset.rename (columns = lambda x: x.lower (), inplace = True)


# Preparing the data using dummy features (one-hot encoding). The base values are: other_education, female, not_married.
dataset['grad_school'] = (dataset['education'] == 1).astype('int')
dataset['university'] = (dataset['education'] == 2).astype('int')
dataset['high_school'] = (dataset['education'] == 3).astype('int')
dataset.drop('education', axis=1, inplace=True) # Removes the education column because all of its information is in the columns above.

dataset['male'] = (dataset['sex'] == 1).astype('int')
dataset.drop('sex', axis=1, inplace=True)

dataset['married'] = (dataset['marriage'] == 1).astype('int')
dataset.drop('marriage', axis=1, inplace=True)

# For the pay features, <= 0 means that the payment was not delayed.
pay_features = ['pay_0','pay_2','pay_3','pay_4','pay_5','pay_6']
for p in pay_features:
dataset.loc[dataset[p]<=0, p] = 0

dataset.rename(columns={'default_payment_next_month':'default'}, inplace=True) # Renames last column for convenience.


# Importing objects from sklearn to help with the predictions.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix, precision_recall_curve
from sklearn.preprocessing import RobustScaler


# Scaling and fitting the x and y variables and creating the x and y test and train variables.
target_name = 'default'
X = dataset.drop('default', axis=1)
robust_scaler = RobustScaler()
X = robust_scaler.fit_transform(X)
y = dataset[target_name]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=123, stratify=y)


# Creating a confusion matrix.
def CMatrix(CM, labels=['pay','default']):
    df = pd.DataFrame(data=CM, index=labels, columns=labels)
    df.index.name='TRUE'
    df.columns.name='PREDICTION'
    df.loc['TOTAL'] = df.sum()
    df['Total'] = df.sum(axis=1)
    return df



# Preparing a pandas DataFrame to analyze models (evaluation metrics).
metrics = pd.DataFrame(index=['accuracy', 'precision', 'recall'],
                        columns=['NULL','LogisticReg','ClassTree','NaiveBayes'])


#######################
# The Null Model.
y_pred_test = np.repeat(y_train.value_counts().idxmax(), y_test.size)
metrics.loc['accuracy','NULL'] = accuracy_score(y_pred=y_pred_test, y_true=y_test)
metrics.loc['precision','NULL'] = precision_score(y_pred=y_pred_test, y_true=y_test)
metrics.loc['recall','NULL'] = recall_score(y_pred=y_pred_test, y_true=y_test)

CM = confusion_matrix(y_pred=y_pred_test, y_true=y_test)
CMatrix(CM)


# A. Logistic Regression.
# 1- Import the estimator object (model).
from sklearn.linear_model import LogisticRegression

# 2- Create an instance of the estimator.
logistic_regression = LogisticRegression(n_jobs=-1, random_state=15)

# 3- Use the training data to train the estimator.
logistic_regression.fit(X_train, y_train)

# 4- Evaluate the model.
y_pred_test = logistic_regression.predict(X_test)
metrics.loc['accuracy','LogisticReg'] = accuracy_score(y_pred=y_pred_test, y_true=y_test)
metrics.loc['precision','LogisticReg'] = precision_score(y_pred=y_pred_test, y_true=y_test)
metrics.loc['recall','LogisticReg'] = recall_score(y_pred=y_pred_test, y_true=y_test)

# Confusion Matrix.
CM = confusion_matrix(y_pred=y_pred_test, y_true=y_test)
CMatrix(CM)


# B. Classification Trees.
# 1- Import the estimator object (model).
from sklearn.tree import DecisionTreeClassifier

# 2- Create an instance of the estimator.
class_tree = DecisionTreeClassifier(min_samples_split=30, min_samples_leaf=10, random_state=10)

# 3- Use the training data to train the estimator.
class_tree.fit(X_train, y_train)

# 4- Evaluate the model.
y_pred_test = class_tree.predict(X_test)
metrics.loc['accuracy','ClassTree'] = accuracy_score(y_pred=y_pred_test, y_true=y_test)
metrics.loc['precision','ClassTree'] = precision_score(y_pred=y_pred_test, y_true=y_test)
metrics.loc['recall','ClassTree'] = recall_score(y_pred=y_pred_test, y_true=y_test)

# Confusion Matrix.
CM = confusion_matrix(y_pred=y_pred_test, y_true=y_test)
CMatrix(CM)


# C. Naive Bayes Classifier
# 1- Import the estimator object (model).
from sklearn.naive_bayes import GaussianNB

# 2- Create an instance of the estimator.
NBC = GaussianNB()

# 3- Use the training data to train the estimator.
NBC.fit(X_train, y_train)

# 4- Evaluate the model.
y_pred_test = NBC.predict(X_test)
metrics.loc['accuracy','NaiveBayes'] = accuracy_score(y_pred=y_pred_test, y_true=y_test)
metrics.loc['precision','NaiveBayes'] = precision_score(y_pred=y_pred_test, y_true=y_test)
metrics.loc['recall','NaiveBayes'] = recall_score(y_pred=y_pred_test, y_true=y_test)

# Confusion Matrix.
CM = confusion_matrix(y_pred=y_pred_test, y_true=y_test)
CMatrix(CM)


#######################
# Comparing the models with percentages.
100*metrics


# Comparing the models with a bar graph.
fig, ax = plt.subplots(figsize=(8,5))
metrics.plot(kind='barh', ax=ax)
ax.grid();


# Adjusting the precision and recall values for the logistic regression model and the Naive Bayes Classifier model.
precision_nb, recall_nb, thresholds_nb = precision_recall_curve(y_true=y_test, probas_pred=NBC.predict_proba(X_test)[:,1])
precision_lr, recall_lr, thresholds_lr = precision_recall_curve(y_true=y_test, probas_pred=logistic_regression.predict_proba(X_test)[:,1])


# Plotting the new values for the logistic regression model and the Naive Bayes Classifier model.
fig, ax = plt.subplots(figsize=(8,5))
ax.plot(precision_nb, recall_nb, label='NaiveBayes')
ax.plot(precision_lr, recall_lr, label='LogisticReg')
ax.set_xlabel('Precision')
ax.set_ylabel('Recall')
ax.set_title('Precision-Recall Curve')
ax.hlines(y=0.5, xmin=0, xmax=1, color='r')
ax.legend()
ax.grid();


# Creating a confusion matrix for modified Logistic Regression Classifier.
fig, ax = plt.subplots(figsize=(8,5))
ax.plot(thresholds_lr, precision_lr[1:], label='Precision')
ax.plot(thresholds_lr, recall_lr[1:], label='Recall')
ax.set_xlabel('Classification Threshold')
ax.set_ylabel('Precision, Recall')
ax.set_title('Logistic Regression Classifier: Precision-Recall')
ax.hlines(y=0.6, xmin=0, xmax=1, color='r')
ax.legend()
ax.grid();


# Adjusting the threshold to 0.2.
y_pred_proba = logistic_regression.predict_proba(X_test)[:,1]
y_pred_test = (y_pred_proba >= 0.2).astype('int')

# Confusion matrix.
CM = confusion_matrix(y_pred=y_pred_test, y_true=y_test)
print('Recall:', str(100 * recall_score(y_pred=y_pred_test, y_true=y_test)) + '%')
print('Precision:', str(100 * precision_score(y_pred=y_pred_test, y_true=y_test)) + '%')
CMatrix(CM)


#########################
# Define a function to make individual predictions.
def make_ind_prediction(new_data):
    data = new_data.values.reshape(1, -1)
    data = robust_scaler.transform(data)
    prob = logistic_regression.predict_proba(data)[0][1]
    if prob >= 0.2:
        return 'Will default.'
    else:
        return 'Will pay.'


# Make individual predictions using given data.
from collections import OrderedDict
new_customer = OrderedDict([('limit_bal', 4000), ('age', 50), ('bill_amt1', 500),
                            ('bill_amt2', 35509), ('bill_amt3', 689), ('bill_amt4', 0),
                            ('bill_amt5', 0), ('bill_amt6', 0), ('pay_amt1', 0), ('pay_amt2', 35509),
                            ('pay_amt3', 0), ('pay_amt4', 0), ('pay_amt5', 0), ('pay_amt6', 0),
                            ('male', 1), ('grad_school', 0), ('university', 1), ('high_school', 0),
                            ('married', 1), ('pay_0', -1), ('pay_2', -1), ('pay_3', -1),
                            ('pay_4', 0), ('pay_5', -1), ('pay_6', 0)])

new_customer = pd.Series(new_customer)
make_ind_prediction(new_customer)
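As one possible direction for making the script more compact (just a sketch, reusing only names already defined in the code above), the three repeated fit-and-score blocks could be folded into a small helper:

# Sketch of a reusable evaluation helper; it assumes X_train, X_test, y_train,
# y_test, metrics and CMatrix from the script above are already defined.
def evaluate(name, estimator):
    estimator.fit(X_train, y_train)
    y_pred = estimator.predict(X_test)
    metrics.loc['accuracy', name] = accuracy_score(y_pred=y_pred, y_true=y_test)
    metrics.loc['precision', name] = precision_score(y_pred=y_pred, y_true=y_test)
    metrics.loc['recall', name] = recall_score(y_pred=y_pred, y_true=y_test)
    return CMatrix(confusion_matrix(y_pred=y_pred, y_true=y_test))

evaluate('LogisticReg', LogisticRegression(n_jobs=-1, random_state=15))
evaluate('ClassTree', DecisionTreeClassifier(min_samples_split=30, min_samples_leaf=10, random_state=10))
evaluate('NaiveBayes', GaussianNB())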

optimization – What is the correct term / theory for predicting binary variables based on their continuous value?

I am working on a linear programming problem with about 3500 binary variables. As a general rule, it takes IBM Cplex about 72 hours to reach an objective with a gap of about 15-20% from the best bound. In the solution, about 85 to 90 of the binaries take the value 1, the rest being zero. The objective value is about 20 to 30 million. I created an algorithm in which I predict (i.e., fix the values of) 35 binaries to 1 and leave the remaining ones to be solved by Cplex. This reduced the time needed to reach the same objective to around 24 hours (the best bound is slightly compromised). I have tried this approach on other problems of the same type and it worked there too. I call this approach "probabilistic prediction", but I do not know the standard mathematical term for it.

Here is the algorithm:

Let y = ContinousObjective(AllBinariesSet);
WriteValuesOfTheContinousSolution();
Let count = 0;
Let processedJourneys = EmptySet;
while (count < 35)
{
    Let maxBinary = AllBinariesSet.ExceptWith(processedJourneys).Max(); // Has the maximum value between 0 & 1 (usually less than 0.6)
    processedJourneys.Add(maxBinary);
    maxBinary = 1;
    Let z = y;
    y = ContinousObjective(AllBinariesSet);
    if (z > y + 50000)
    {
        // Reset maxBinary
        maxBinary.LowerBound = 0;
        maxBinary.UpperBound = 1;
        y = z;
    }
    else
    {
        WriteValuesOfTheContinousSolution();
        count = count + 1;
    }
}

In my opinion, this works because the solution matrix is very sparse and there are many good solutions.

python 3.x – Tensorflow model for predicting dice game decisions

For my first ML project, I modeled a dice game called Ten Thousand, or Farkle, depending on who you ask, as an extremely over-engineered way to build a computer player. You can find the complete game (with a perfectly good player implemented in about 15 lines of logic) here.

As a brief explanation of the game, 1s and 5s are always valid scoring dice. Other numbers only score as part of 1) three or more of a kind, 2) a straight, or 3) three pairs. I would like my model to predict which dice should be kept for a given roll. So far it does a great job of figuring out that 1s and 5s are keepers, but I have not been able to get it much beyond that despite all my tinkering so far.
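The labels in the code below come from a choose_dice function that is not shown here (it lives in the linked game code). Purely to illustrate the rules just described, a simple rule-based keeper mask might look roughly like this sketch (a simplified stand-in, not the actual function from the post):

from collections import Counter

def keepers(roll):
    """Return a 0/1 mask of which dice in a six-die roll are worth keeping,
    following the rules described above (a simplified illustration)."""
    counts = Counter(roll)
    if sorted(roll) == [1, 2, 3, 4, 5, 6]:                     # a straight scores everything
        return [1] * 6
    if len(counts) == 3 and all(c == 2 for c in counts.values()):
        return [1] * 6                                          # three pairs score everything
    # 1s and 5s always score; other faces only as three or more of a kind.
    return [1 if die in (1, 5) or counts[die] >= 3 else 0 for die in roll]

print(keepers([1, 5, 2, 2, 3, 6]))  # [1, 1, 0, 0, 0, 0]
print(keepers([4, 4, 4, 2, 5, 6]))  # [1, 1, 1, 0, 1, 0]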

I am looking for advice on how to improve the predictions to include scoring dice other than 1s and 5s. I have tried increasing the proportion of these situations in the training set, increasing and decreasing the complexity of the model, both in structure and with various regularization methods, and even using convolutional layers.

Specifically, are the RMSProp optimizer and sigmoid cross-entropy loss appropriate here?

Setting things up.

import tensorflow as tf
import numpy as np
import pandas as pd

import collections
from itertools import combinations_with_replacement as combinations
from itertools import permutations as perms

import matplotlib.pyplot as plt
from tensorflow.keras import layers, Model
from tensorflow.data import Dataset


tf.enable_eager_execution()
tfe = tf.contrib.eager

I generate my data by simply making it up, making sure to include many examples of the special scoring situations.

def make_some_features(numbers, clip):
    features = set()
    combs = (combo for combo in combinations(numbers, 6))
    for i, comb in enumerate(combs):
        if i % clip == 0:  # Keep a reasonable size
            for perm in perms(comb):
                features.add(perm)
    return features


# I have played around with these values and expect a similar or better proportion of special examples.
features = make_some_features(list(range(1, 7)), 3)

# Make some rolls of three pairs.
special_features = set()
for _ in range(1000):
    half = [np.random.randint(1, 6) for _ in range(3)]
    half += half
    for perm in perms(half):
        special_features.add(perm)

# We can't make as many of these as with the straights.
for perm in perms([1, 2, 3, 4, 5, 6]):
    special_features.add(perm)

all_features = [np.array(feature) for feature in special_features]
all_features += [np.array(feature) for feature in features]
all_labels = [choose_dice(feature) for feature in special_features]
all_labels += [choose_dice(feature) for feature in features]

I put all of this in a pandas DataFrame to make it easier to shuffle and partition into training, validation, and test sets.

def create_dataset(features, labels):
    dice = pd.Series(features)
    labels = pd.Series(labels)
    dataset = pd.DataFrame({'dice': dice,
                            'labels': labels})
    return dataset


all_dice = create_dataset(all_features, all_labels)
all_dice = all_dice.reindex(np.random.permutation(all_dice.index))

train_dice = all_dice.head(10000)
val_dice = train_dice.tail(5000)
test_dice = all_dice.tail(1936)

I one-hot encode the features and reshape the label tensor.

def pre_process_features(dice: pd.DataFrame) -> list:
    rolls = []
    for roll in dice['dice']:
        roll = np.array(roll)
        roll -= 1
        roll = tf.one_hot(roll, depth=6, axis=-1)
        rolls.append(roll)
    return rolls


def pre_process_labels(dice: pd.DataFrame) -> list:
    labels = [tf.reshape(tf.convert_to_tensor([label]), (6, 1)) for label in dice['labels']]
    return labels

Model, optimization, loss and gradient functions.

model = tf.keras.Sequential([
    layers.Dense(6, activation=tf.nn.relu, input_shape=(6, 6),
                 kernel_regularizer=tf.keras.regularizers.l2(regularization_rate)),
    layers.Dense(64, activation=tf.nn.relu,
                 kernel_regularizer=tf.keras.regularizers.l2(regularization_rate)),
    layers.Dense(128, activation=tf.nn.relu,
                 kernel_regularizer=tf.keras.regularizers.l2(regularization_rate)),
    # layers.Dense(256, activation=tf.nn.relu,
    #              kernel_regularizer=tf.keras.regularizers.l2(regularization_rate)),
    layers.Dense(32, activation=tf.nn.relu,
                 kernel_regularizer=tf.keras.regularizers.l2(regularization_rate)),
    layers.Dense(1)])

optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate)
global_step = tf.train.get_or_create_global_step()


def loss(model, features, labels):
    logits = model(features)
    if logits.shape == (1, 6, 1):
        logits = tf.squeeze(logits, [0])
    standard_loss = tf.losses.sigmoid_cross_entropy(logits=logits, multi_class_labels=labels)
    return standard_loss


def grad(model, features, labels):
    with tf.GradientTape() as tape:
        loss_value = loss(model, features, labels)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)

Training loop.

train_loss = []
train_accuracy = []
val_loss = []
val_accuracy = []

val_features, val_labels = iter(val_features), iter(val_labels)
val_feature, val_label = next(val_features), next(val_labels)
for epoch in range(num_epochs):
    epoch_loss_ave = tfe.metrics.Mean("loss")
    epoch_val_loss_average = tfe.metrics.Mean("loss")
    epoch_accuracy = tfe.metrics.Accuracy('acc')
    epoch_val_accuracy = tfe.metrics.Accuracy('acc')

    for feature, label in zip(train_features, train_labels):
        feature = tf.convert_to_tensor(feature.numpy().reshape(1, 6, 6))

        loss_value, grads = grad(model, feature, label)
        optimizer.apply_gradients(zip(grads, model.variables), global_step)
        epoch_loss_ave(loss_value)

        guessed_label = decode_label(model(feature))
        epoch_accuracy(guessed_label, decode_label(label))

        val_loss_value = loss(model, val_feature, val_label)
        epoch_val_loss_average(val_loss_value)

        val_guess_label = decode_label(model(val_feature))
        epoch_val_accuracy(val_guess_label, decode_label(val_label))

    train_loss.append(epoch_loss_ave.result())
    train_accuracy.append(epoch_accuracy.result())

    val_loss.append(epoch_val_loss_average.result())
    val_accuracy.append(epoch_val_accuracy.result())

    if epoch % 20 == 0:
        print(f'Epoch {epoch} Loss: {epoch_loss_ave.result()} Accuracy: {epoch_accuracy.result()}')
        print(f'Validation Loss: {epoch_val_loss_average.result()} Accuracy: {epoch_val_accuracy.result()}')

Tests and a few predictions for random game inputs.

test_results = []
test_accuracy = tfe.metrics.Accuracy('acc')

for feature, label in zip(test_features, test_labels):
    guessed_label = decode_label(model(feature))
    test_accuracy(guessed_label, decode_label(label))
print(f'Test accuracy: {test_accuracy.result()}')

for _ in range(25):
    roll = np.array([np.random.randint(0, 5) for _ in range(6)])
    turn = tf.one_hot(roll, depth=6, dtype=np.int32)
    roll += 1
    answer = choose_dice(roll)
    print(f'Roll: {roll}')
    print(f'Dice that should be kept: {answer}')
    turn = tf.convert_to_tensor(turn.numpy().reshape((1, 6, 6)), dtype=tf.float32)
    predictions = model.predict(turn)
    tf.nn.softmax(predictions)
    predicted_label = []
    for prediction in predictions[0]:
        if prediction[0] > 0.:
            predicted_label.append(1.)
        else:
            predicted_label.append(0.)
    print(f'Dice predicted to be kept: {predicted_label}')

The complete code can be found here.