Spark implementation of survival regression (AFTSurvivalRegression):
- Model: Accelerated Failure Time (AFT).
- Parametric: Uses the Weibull distribution for the survival time.
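- Optimization: Spark chooses AFT over the proportional hazards model because it is easier to parallelize: each instance contributes to the objective function independently. Fitting is formulated as a convex optimization problem (minimizing the negative log-likelihood), and L-BFGS is the optimization method used.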
- R/SparkR users: When fitting an AFTSurvivalRegressionModel without an intercept on a dataset with a constant nonzero column, Spark MLlib outputs zero coefficients for such columns. This behavior differs from R's survival::survreg. (from Spark 2.0.2 documentation)
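The "easier to parallelize" point can be made concrete. The Weibull AFT loss is a plain sum over instances, so each data partition can compute its own partial loss and gradient, and the partial results can be added in any order. Below is a minimal pure-Python sketch of that idea. It is not Spark's implementation. The function names, the instance layout (features, time, event indicator) and the toy data are all hypothetical. The per-instance formula follows the Weibull AFT negative log-likelihood, parameterized in beta and log(sigma).

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical instance layout: (features x, observed time t > 0, event indicator delta).
# delta = 1.0: the event was observed at t; delta = 0.0: the time is right-censored.
Instance = Tuple[Sequence[float], float, float]


def instance_loss_grad(beta: Sequence[float], log_sigma: float,
                       inst: Instance) -> Tuple[float, List[float]]:
    """Negative log-likelihood of ONE instance under a Weibull AFT model.

    With eps = (log t - x . beta) / sigma the contribution is
        delta * log(sigma) - delta * eps + exp(eps)
    Returns (loss, gradient), the gradient taken w.r.t. (beta..., log_sigma).
    """
    x, t, delta = inst
    sigma = math.exp(log_sigma)
    eps = (math.log(t) - sum(xj * bj for xj, bj in zip(x, beta))) / sigma
    e = math.exp(eps)
    loss = delta * log_sigma - delta * eps + e
    d_eps = e - delta                            # d loss / d eps
    grad = [d_eps * (-xj / sigma) for xj in x]   # d eps / d beta_j = -x_j / sigma
    grad.append(delta - d_eps * eps)             # d eps / d log_sigma = -eps
    return loss, grad


def partition_loss_grad(beta: Sequence[float], log_sigma: float,
                        part: Sequence[Instance]) -> Tuple[float, List[float]]:
    """Sum the per-instance terms of one partition (a 'map' step)."""
    loss, grad = 0.0, [0.0] * (len(beta) + 1)
    for inst in part:
        l, g = instance_loss_grad(beta, log_sigma, inst)
        loss += l
        grad = [a + b for a, b in zip(grad, g)]
    return loss, grad


def combine(a: Tuple[float, List[float]],
            b: Tuple[float, List[float]]) -> Tuple[float, List[float]]:
    """Merge two partial results (a 'reduce' step); the order does not matter."""
    return a[0] + b[0], [x + y for x, y in zip(a[1], b[1])]


# Hypothetical toy data: an explicit intercept column (1.0) plus one covariate.
data = [
    ([1.0, 0.5], 2.0, 1.0),
    ([1.0, 1.5], 5.0, 1.0),
    ([1.0, 2.0], 7.0, 0.0),
    ([1.0, 0.2], 1.2, 1.0),
    ([1.0, 1.0], 3.5, 0.0),
    ([1.0, 2.5], 9.0, 1.0),
]
beta, log_sigma = [0.1, 0.3], 0.2

full = partition_loss_grad(beta, log_sigma, data)
split = combine(partition_loss_grad(beta, log_sigma, data[:3]),
                partition_loss_grad(beta, log_sigma, data[3:]))
print(full[0], split[0])  # equal up to floating-point rounding
```

A (loss, gradient) function like `partition_loss_grad` is what a quasi-Newton optimizer such as L-BFGS consumes at every iteration. In Spark the analogous sums are aggregated across the cluster rather than over a Python list. Outside Spark, one could presumably pass the same function to SciPy's `scipy.optimize.minimize(..., jac=True, method="L-BFGS-B")` after packing beta and log(sigma) into one vector. That pairing is an untested assumption, not something the source describes.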
Think of the outcome as the time until an event of interest occurs, such as the occurrence of a disease, ...
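Formally, the AFT model is a log-linear model for that time. The notation below (t_i for the observed time, x_i for the covariates, delta_i = 1 if the event was observed and 0 if the time is right-censored) is introduced here for illustration. The formulation follows the Weibull AFT log-likelihood as presented in the Spark ML guide. The last step uses delta log f_0(eps) + (1 - delta) log S_0(eps) = delta eps - e^eps.

```latex
% Log-linear model for the survival time
\log t_i = x_i^\top \beta + \sigma \varepsilon_i,
\qquad \varepsilon_i = \frac{\log t_i - x_i^\top \beta}{\sigma}

% Weibull lifetime <=> extreme-value distribution for \log t_i
S_0(\varepsilon) = \exp\!\left(-e^{\varepsilon}\right), \qquad
f_0(\varepsilon) = e^{\varepsilon}\exp\!\left(-e^{\varepsilon}\right)

% Log-likelihood (terms in \log t_i that do not depend on \beta, \sigma are dropped)
\iota(\beta,\sigma)
 = \sum_{i=1}^{n}\Big[-\delta_i\log\sigma + \delta_i\log f_0(\varepsilon_i) + (1-\delta_i)\log S_0(\varepsilon_i)\Big]
 = -\sum_{i=1}^{n}\Big[\delta_i\log\sigma - \delta_i\varepsilon_i + e^{\varepsilon_i}\Big]
```

The quantity minimized is the negative log-likelihood, -iota(beta, sigma), optimized with respect to beta and log(sigma).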