We’re happy to launch the beta program of process modeling on Invert. With this release, we provide you with a starting point for your experiment design.
You can now explore relationships between process parameters and key performance indicators through our custom ML prediction model. This capability is available for process metadata and timeseries endpoints across all of your runs.
Beta program
Please note that this release is a beta and we’re working rapidly to improve and expand the scope of the product offering. We’d love to hear your feedback and are looking forward to working with you and your teams closely to improve the product.
Release Notes
Compared to V1, we’ve added:
Two more models: Elastic Net Linear Regression and Gaussian Process Regressor
Rich visualization tools to help you get the most out of your data, via the models. In particular, the Gaussian Process Regressor comes with two powerful visualization tools: the Partial Dependence plot, and the Braid plot—more below!
Models
We now offer three different models for selection:
Gradient Boosted Trees Model - This model employs an ensemble of decision trees, where each one makes a prediction (“if x1 > 0.5 and x2 < 0.2, then predict y=2”). The trees are collectively tallied to render an aggregate decision. Gradient boosting refines this process: newer trees in the ensemble learn from the errors of previous ones to improve overall accuracy.
Elastic Net Linear Model - This approach enhances linear regression with regularization—a technique addressing the limitations of traditional linear regression, such as sensitivity to outliers. Regularization, including Ridge and Lasso methods, compels simpler, highly interpretable models with fewer and smaller coefficients. Our Elastic Net implementation finds the optimal balance of Ridge and Lasso regularization.
Gaussian Process Model - Imagine a scenario where multiple functions could fit your data, and you wish to understand the uncertainty in their predictions. The Gaussian Process Regressor (GPR) is your solution. It treats each input variable as coming from a normal distribution and considers their covariance to select the best-fitting functions. GPR can quantify uncertainty because we can look at the distribution of predictions across those candidate functions.
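As a rough sketch of these three model families, here is how they might be trained with their scikit-learn equivalents. The data and parameters below are invented for illustration, and Invert's actual implementation may differ:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 2))                   # two process parameters
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0.0, 0.05, 50)   # a hypothetical KPI

models = {
    "gradient_boosted_trees": GradientBoostingRegressor(random_state=0),
    # l1_ratio blends Lasso (l1) and Ridge (l2) regularization
    "elastic_net": ElasticNet(alpha=0.01, l1_ratio=0.5),
    "gaussian_process": GaussianProcessRegressor(),
}
for name, model in models.items():
    model.fit(X, y)
    print(name, round(model.score(X, y), 3))   # R^2 on the training set
```

All three share the same fit/predict workflow; they differ in how they trade off flexibility, interpretability, and uncertainty estimation.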
Scalers
Often, it’s helpful to re-scale the data before training and predicting any sort of model. We now include two scalers to help you rescale your data:
Standard Scaler - Subtracts the mean and divides by the standard deviation for each input variable.
MinMax Scaler - Subtracts the minimum value and divides by the range (maximum minus minimum) for each input variable.
Note that you won’t have to work with the scalers directly: you’ll continue to work at your original data scale.
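The two scalers behave like their scikit-learn counterparts; a minimal sketch (the product applies scaling internally, so this is only to show the arithmetic):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])

standard = StandardScaler().fit_transform(X)   # (x - mean) / std
minmax = MinMaxScaler().fit_transform(X)       # (x - min) / (max - min)

print(minmax.ravel())   # [0.  0.333...  0.667...  1.]
```

After standard scaling the column has mean 0 and unit standard deviation; after min-max scaling it spans [0, 1].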
Model accuracy
Model accuracy can be impacted by several factors including:
Class Balance: It’s crucial to have a balanced number of samples where possible. Class imbalance arises when the data distribution is skewed toward a single value, which can lead to a biased model. Consider picking runs that represent all possible ranges of your design space before training the model.
Feature Selection: Choosing features that are relevant to the prediction task will significantly improve the model’s output. In the future, we will expand our offering to include better feature selection.
Data Quality: In general, the quality of the model output directly reflects the quality of the input data. Things to keep in mind are correct data types, no empty fields, and properly formatted values.
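Before training, these three factors can be spot-checked in a few lines with pandas. The run data and column names below are purely hypothetical:

```python
import pandas as pd

# Hypothetical run data; column names are illustrative only.
runs = pd.DataFrame({
    "temperature": [30.0, 31.5, 30.2, 45.0, 44.8],
    "yield_bin":   ["low", "low", "low", "high", "high"],
})

print(runs["yield_bin"].value_counts())   # class balance: 3 low vs 2 high
print(runs.isna().sum())                  # empty fields per column
print(runs.dtypes)                        # are the data types as expected?
```

If `value_counts` shows one class dominating, or `isna` reveals gaps, consider adding runs or cleaning the data before training.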
Results
Once your model is trained on the selected data, the individual model page provides an interactive results interface: entering values triggers a real-time update of the prediction in the top right corner and of the relevant analysis tiles.
Note:
Elastic Net models will predict a specific value
Gaussian Process Regressor will provide a prediction with an uncertainty estimate
Gradient boosted trees will predict a binned range
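The difference in output shapes can be sketched with scikit-learn analogues: a single number, a mean with a standard deviation, and a bin label. The data and bin names here are invented:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import ElasticNet

X = np.array([[0.1], [0.3], [0.6], [0.9]])
y = np.array([1.0, 1.6, 2.5, 3.2])
bins = np.array(["low", "low", "high", "high"])   # target binned into ranges

# Elastic Net: one specific value
point = ElasticNet(alpha=0.01).fit(X, y).predict([[0.5]])
# GPR: a mean plus a standard deviation (the uncertainty)
mean, std = GaussianProcessRegressor().fit(X, y).predict([[0.5]], return_std=True)
# Gradient boosted trees on a binned target: a bin label
bin_label = GradientBoostingClassifier().fit(X, bins).predict([[0.5]])
```

The GPR's `return_std=True` is what powers the uncertainty shown in the results interface.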
Visualizations
Target Distribution A visualization of how the training set and test set cover different parts of the target values, as well as the live prediction.
Available for Models: Gradient Boosted Tree
Prediction Distribution A real-time, probabilistic prediction distribution for the values entered into the Prediction Table.
Available for Models: Gradient Boosted Tree
Confusion Matrix A tabular visualization of which groups lead to the most discrepancy between the actual and predicted target values.
Contour Value Plot A visualization of how a target value changes over two inputs. The input variables can be toggled in the Prediction Table.
Available for Models: Gradient Boosted Tree, Elastic Net, Gaussian Process
Feature Importances The magnitude and direction in how different input variables affect the final target value.
Available for Models: Gradient Boosted Tree, Elastic Net
Regression Training and Performance Scatter How the trained model performed on both the training and test sets, as well as where the table prediction lies.
Available for Models: Elastic Net, Gaussian Process
Training Summary and Hyperparameter Selection A summary of how the model training went as well as the hyperparameters selected.
Available for Models: Elastic Net
Partial Dependence Plot The marginal effect of one input variable on the predicted target, i.e. how sensitive the prediction is to that variable.
Available for Models: Gaussian Process
Braid Plot The prediction and uncertainty of the Gaussian Process regression versus an input variable with the other input variables held constant.
Available for Models: Gaussian Process
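A braid-style curve can be approximated by sweeping one input while holding the others fixed and recording the GPR mean and standard deviation. This is a sketch on hypothetical data; the Braid plot in the product is interactive:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(30, 2))
y = np.sin(4.0 * X[:, 0]) + X[:, 1]

gpr = GaussianProcessRegressor().fit(X, y)

sweep = np.linspace(0.0, 1.0, 50)   # vary the first input
held = np.full_like(sweep, 0.5)     # hold the second input constant
mean, std = gpr.predict(np.column_stack([sweep, held]), return_std=True)
# Plotting `mean` against `sweep` with a band of mean +/- std gives the
# braid-style view of prediction and uncertainty.
```

The band is narrow near training points and widens where the model has seen little data, which is exactly the uncertainty the Braid plot surfaces.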
