I’m confused about the steps to go from a simple linear regression to logistic regression.

If we have a dataset consisting of a column of x values and a column of y values (the values we want to predict), then we can run a simple linear regression to get a predictive model such that y_pred = B1x + c, where B1 is the coefficient for our inputs, x, and c is the intercept of the line.

Now let’s say y is categorical, so it is either 1 or 0: 1 if the event occurs, and 0 if it does not. Many of the videos I’ve watched tell me to think of y_pred as a probability even though it’s not. Treating it as a probability makes no sense because, assuming the regression line has positive slope, y_pred grows without bound for very large values of x. For small values of x, depending on the regression line, the predicted values may also be negative. A probability has to stay between 0 and 1, so neither result is valid, and we throw linear regression out and try something else.
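To make that concrete for myself, here is a minimal sketch, assuming NumPy and a toy dataset I made up purely for illustration (the x and y values are not from any real source). It fits the straight line y_pred = B1x + c to 0/1 outcomes and evaluates it far from the data:

```python
import numpy as np

# Toy data, invented for illustration: x is the input, y is a 0/1 outcome.
x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

# Ordinary least squares fit of y_pred = B1*x + c.
# np.polyfit returns coefficients from highest degree to lowest.
B1, c = np.polyfit(x, y, deg=1)

def y_pred(x_new):
    """Prediction from the fitted straight line."""
    return B1 * x_new + c

print(y_pred(-5.0))  # below 0 for this data
print(y_pred(20.0))  # above 1 for this data
```

For this particular data the line has positive slope, so the prediction at x = -5 is negative and the prediction at x = 20 is larger than 1, which is exactly why y_pred cannot be read as a probability.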

As a next step, they say to start calling y_pred “z” and think of a function that can take the z values from our linear regression output and map them to values between 0 and 1. A sigmoid does this well and is described as P = 1/(1+e^(-z)). If we now make a new column of data, P, using all our z values, then we have a column that tells us the probability of a 1 (and so 1 - P for a 0) based on the independent variable. But to fit the data better we think of P = 1/(1+e^(-z)) as ln(P/(1-P)) = z, as they are equivalent. Then we perform something called maximum likelihood estimation to get a new coefficient for x and a new intercept c so the curve fits the data better.
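Here is how I currently picture those steps, as a rough sketch rather than what any particular library does. It assumes NumPy, the same made-up toy data, and plain gradient ascent as a stand-in for whatever optimizer is normally used; the function names, step count, and learning rate are my own choices for illustration:

```python
import numpy as np

def sigmoid(z):
    """Map any real z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log odds: the inverse of the sigmoid, ln(p / (1 - p))."""
    return np.log(p / (1.0 - p))

# Toy data, invented for illustration only.
x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

def log_likelihood(B1, c):
    """Bernoulli log-likelihood of the 0/1 data under P = sigmoid(B1*x + c)."""
    p = np.clip(sigmoid(B1 * x + c), 1e-12, 1 - 1e-12)  # avoid log(0)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_mle(steps=20000, lr=0.05):
    """Maximize the log-likelihood by simple gradient ascent (assumed optimizer)."""
    B1, c = 0.0, 0.0
    for _ in range(steps):
        p = sigmoid(B1 * x + c)
        # Gradient of the mean log-likelihood with respect to B1 and c.
        B1 += lr * np.mean((y - p) * x)
        c += lr * np.mean(y - p)
    return B1, c

B1, c = fit_mle()
p_hat = sigmoid(B1 * x + c)  # fitted probabilities, one per x

print(B1, c)
print(p_hat)
print(logit(p_hat))  # matches B1*x + c up to rounding
```

In this sketch the coefficients are chosen so the 0/1 data are as likely as possible under P, not by least squares on y. Taking the log odds of the fitted probabilities gives back the straight line B1x + c, which is the sense in which the two forms of the equation are equivalent.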

I think my confusion is this: why do we care about log odds? Why not just pass z through the sigmoid, fit it, and use that to get probabilities for varying x values? Am I just so lost that I’m going in circles with misunderstandings?