I’d like to model a dependent variable which is a continuous value between 0 and 1 (see the figure). It is not a proportion of counts; it’s the concentration of a drug needed to kill a disease.
Here is what I have tried, along with my thoughts:
1. GLM with binomial family and logit link, i.e. logistic regression. Using the logit link to keep the mean in [0, 1] makes sense, but the binomial random component does not, since my response is continuous and must be bounded between 0 and 1. As you can see, the in-sample prediction on the training data looks quite different from the real label distribution. Also, there is a strange offset in my predictions.
2. Beta regression also does not work, because some of my labels are exactly 0, and the beta distribution is only defined on the open interval (0, 1).
3. I tried different links and different regularizers, with and without an intercept, but nothing seems to work.
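For reference, approach 1 corresponds roughly to the sketch below. It is a minimal NumPy-only illustration on synthetic data (the data, the coefficients, and the helper name `fit_fractional_logit` are all made up for this example, not my real setup). It fits a logit-link mean model by iteratively reweighted least squares, treating the binomial likelihood only as an estimating equation for the mean, so fractional labels and exact zeros are accepted.

```python
import numpy as np


def fit_fractional_logit(X, y, n_iter=50, tol=1e-10):
    """Fit E[y | x] = sigmoid(X @ beta) via Newton/IRLS steps.

    Hypothetical helper: uses the Bernoulli score equations as a
    quasi-likelihood, so y may be any value in [0, 1], including 0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        # Working weights mu * (1 - mu), floored for numerical stability.
        w = np.clip(mu * (1.0 - mu), 1e-12, None)
        # Newton step: solve (X' W X) delta = X' (y - mu).
        delta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        beta += delta
        if np.max(np.abs(delta)) < tol:
            break
    return beta


# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])      # intercept + one feature
true_mu = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * x)))
y = rng.beta(true_mu * 5.0, (1.0 - true_mu) * 5.0)
y[rng.random(500) < 0.1] = 0.0                 # inject some exact zeros

beta_hat = fit_fractional_logit(X, y)
pred = 1.0 / (1.0 + np.exp(-X @ beta_hat))     # in-sample predicted means
```

Note that `pred` is a conditional mean, so its histogram will generally be narrower than the histogram of the labels themselves; that alone could explain part of the mismatch between the two distributions.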
Hence, I would be happy if someone could help me with the following questions:
1. Does it make sense to have a *compound* model, i.e. one model that predicts whether the response is 0 or >0, and, in the case of >0, another model that predicts the label using a beta regression?
2. Logistic regression seems to be the go-to approach when the labels are binary. However, mine are not. How can I determine a proper family for my problem?
3. Are GLMs not useful at all for my problem?
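To make question 1 concrete, the compound model I have in mind could be written as a zero-inflated beta density. This is only a sketch: the symbols $\pi$, $\mu$, $\phi$, $\gamma$ and $\beta$ are my own notation, and it assumes there are no labels exactly equal to 1.

```latex
f(y \mid x) =
\begin{cases}
  \pi(x), & y = 0, \\[4pt]
  \bigl(1 - \pi(x)\bigr)\,
  \dfrac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\bigl((1-\mu)\phi\bigr)}\,
  y^{\mu\phi - 1}(1 - y)^{(1-\mu)\phi - 1}, & 0 < y < 1,
\end{cases}
\qquad
\operatorname{logit}\pi(x) = x^\top\gamma, \quad
\operatorname{logit}\mu(x) = x^\top\beta .
```

Under this formulation the first part is an ordinary logistic regression for $P(y = 0 \mid x) = \pi(x)$, the second part is a beta regression fitted only to the non-zero labels, and the overall predicted mean is $E[y \mid x] = \bigl(1 - \pi(x)\bigr)\,\mu(x)$.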