
Process Models - Updates!

How do I create a process model?

Written by Masaki Yamada
Updated over 2 years ago

We’re happy to launch the beta program for process modeling on Invert. This release gives you a starting point for your experiment design.

You can now explore relationships between process parameters and key performance indicators through our custom ML prediction model. This capability is available for process metadata and timeseries endpoints across all of your runs.

Beta program

Please note that this release is a beta and we’re working rapidly to improve and expand the scope of the product offering. We’d love to hear your feedback and are looking forward to working with you and your teams closely to improve the product.

Release Notes

Compared to V1, we’ve added:

  • Two more models: Elastic Net Linear Regression and Gaussian Process Regressor

  • The ability to scale inputs using MinMax or Standard scalers

  • Rich visualization tools to help you get the most out of your data, via the models. In particular, the Gaussian Process Regressor comes with two powerful visualization tools: the Partial Dependence plot, and the Braid plot—more below!

Models

We now offer three different models for selection:

  • Gradient Boosted Trees Model - This model employs an ensemble of decision trees, where each one makes a prediction (“if x1 > 0.5 and x2 < 0.2, then predict y=2”). The trees are collectively tallied to render an aggregate decision. Gradient boosting refines this process: newer trees in the ensemble learn from the errors of previous ones to improve overall accuracy.

  • Elastic Net Linear Model - This approach extends linear regression with regularization, a technique that addresses limitations of ordinary linear regression such as overfitting and unstable coefficients when inputs are correlated. Regularization penalizes large coefficients: Ridge shrinks them toward zero, while Lasso can set some of them to exactly zero. The result is a simpler, highly interpretable model with fewer and smaller coefficients. Our Elastic Net implementation finds the optimal balance of Ridge and Lasso regularization.

  • Gaussian Process Model - Imagine a scenario where many different functions could fit your data, and you want to understand the uncertainty in their predictions. The Gaussian Process Regressor (GPR) handles this case. It treats the target values at any set of input points as jointly normally distributed, and uses a covariance function (kernel) to describe how strongly the targets at nearby inputs are related. Conditioning on your data narrows this down to the functions that fit best. GPR provides an uncertainty estimate because we can look at the spread of what those functions predict.
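
To make the gradient boosting idea above concrete, here is a minimal sketch in plain Python. It is not Invert's implementation: it assumes a single input, one-split trees ("stumps"), and a squared-error regression target, whereas the product's Gradient Boosted Trees model reports binned ranges. The function names, data values, and the `learning_rate` setting are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split tree: pick the threshold with the lowest squared error."""
    best = None
    # Skip the largest x as a threshold; it would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted_stumps(xs, ys, n_trees=20, learning_rate=0.5):
    """Each new stump is trained on the errors left by the stumps before it."""
    base = mean(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return mean([(y - predict(model, x)) ** 2 for x, y in zip(xs, ys)])


# Hypothetical data: one process parameter and one KPI.
parameter = [1, 2, 3, 4, 5, 6, 7, 8]
kpi = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

one_tree = fit_boosted_stumps(parameter, kpi, n_trees=1)
many_trees = fit_boosted_stumps(parameter, kpi, n_trees=20)
print(mean_squared_error(one_tree, parameter, kpi))
print(mean_squared_error(many_trees, parameter, kpi))
```

On this toy data the training error with 20 stumps is lower than with a single stump, which is the "learn from the errors of previous trees" behavior described above. Training error alone says nothing about accuracy on new runs.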

Scalers

Often, it’s helpful to rescale the data before training a model and making predictions with it. We now include two scalers to help you rescale your data:

  • Standard Scaler - Subtracts the mean and divides by the standard deviation for each input variable.

  • MinMax Scaler - Subtracts the minimum value and divides by the range (maximum minus minimum) for each input variable, so that each variable falls between 0 and 1.

Note that the scalers don’t change how you work with a model: you enter values and read predictions at your original data scale.
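
As a rough illustration of what the two scalers compute, here is a minimal sketch in plain Python using the conventional definitions: standard scaling subtracts the mean and divides by the standard deviation, and min-max scaling subtracts the minimum and divides by the range (maximum minus minimum). The function names and temperature values are hypothetical. The sketch assumes the population standard deviation and a column that is not constant; which variant Invert uses internally is not stated here.

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values], (mu, sigma)


def standard_unscale(scaled, params):
    """Map standardized values back to the original scale."""
    mu, sigma = params
    return [s * sigma + mu for s in scaled]


def minmax_scale(values):
    """Subtract the minimum and divide by the range (max minus min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)


def minmax_unscale(scaled, params):
    """Map min-max scaled values back to the original scale."""
    lo, hi = params
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical input column, for example a temperature setpoint per run.
temperatures = [30.0, 32.0, 34.0, 36.0, 38.0]

standardized, standard_params = standard_scale(temperatures)
normalized, minmax_params = minmax_scale(temperatures)

print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standard_unscale(standardized, standard_params))
```

Keeping the fitted parameters next to the scaled values is what allows results to be mapped back, so that values can be entered and read at the original data scale.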

Model accuracy

Model accuracy can be impacted by several factors including:

  1. Class Balance: It's crucial to have a balanced number of samples, if possible. Class imbalances arise when the data distribution is skewed toward a single value or a narrow range of values, which can bias the model. Consider picking runs that represent the full range of your design space before training the model.

  2. Feature Selection: Choosing features that are relevant to the prediction task will significantly improve the model’s output. In the future, we will expand our offering to include better feature selection.

  3. Data Quality: In general, the quality of the model output depends directly on the quality of the data. Things to keep in mind are correct data types, no empty fields, and properly formatted values.
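
One way to check balance before training is to bin a column and count how many runs fall into each bin. The sketch below is a hypothetical helper in plain Python, not a feature of the product. The equal-width bins, the `min_share` threshold of 10%, and the titer values are assumptions made for illustration, and the column is assumed not to be constant.

```python
from collections import Counter


def bin_counts(values, n_bins=4):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = Counter()
    for v in values:
        # The maximum value lands exactly on the upper edge; keep it in the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return [counts.get(i, 0) for i in range(n_bins)]


def underrepresented_bins(counts, min_share=0.1):
    """Return the indices of bins holding less than min_share of all samples."""
    total = sum(counts)
    return [i for i, c in enumerate(counts) if c / total < min_share]


# Hypothetical target values from nine runs: eight are clustered, one is far away.
titers = [1.0, 1.1, 1.2, 1.3, 1.1, 1.2, 1.0, 1.4, 5.0]

counts = bin_counts(titers)
print(counts)                         # [8, 0, 0, 1]
print(underrepresented_bins(counts))  # [1, 2]
```

Here almost all runs sit in the lowest bin and the middle of the range is empty, which suggests adding runs from that region before training.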

Results

Once your model is trained on the selected data, the individual model page provides an interactive results interface. Entering values there updates the prediction in the top right corner and the relevant analysis tiles in real time.

Note:

  • Elastic Net models will predict a specific value

  • Gaussian Process Regressor will provide a prediction along with its uncertainty

  • Gradient boosted trees will predict a binned range

Visualizations

  • Target Distribution A visualization of how the training set and test set cover different parts of the target value range, along with the live prediction.

    Available for Models: Gradient Boosted Tree

  • Prediction Distribution A real-time, probabilistic prediction distribution for the values entered into the Prediction Table.

    Available for Models: Gradient Boosted Tree

  • Confusion Matrix A tabular visualization of which groups lead to the most discrepancy between the actual and predicted target values.

  • Contour Value plot A visualization of how a target value changes over two inputs. The input variables can be toggled in the Prediction Table.

    Available for Models: Gradient Boosted Tree, Elastic Net, Gaussian Process

  • Feature Importances The magnitude and direction of each input variable’s effect on the final target value.

    Available for Models: Gradient Boosted Tree, Elastic Net

  • Regression Training and Performance Scatter How well the trained model performed on both the training and test sets, as well as where the prediction from the Prediction Table lies.

    Available for Models: Elastic Net, Gaussian Process

  • Training Summary and Hyperparameter Selection A summary of how the model training went as well as the hyperparameters selected.

    Available for Models: Elastic Net

  • Partial Dependence Plot How the predicted target responds to one input variable relative to another, i.e. how sensitive the prediction is to one versus the other.

    Available for Models: Gaussian Process

  • Braid Plot The prediction and uncertainty of the Gaussian Process regression versus an input variable with the other input variables held constant.

    Available for Models: Gaussian Process
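
For readers unfamiliar with the Confusion Matrix tile listed above, the following plain-Python sketch shows how such a table can be built when predictions are grouped into bins. It is an illustration under assumptions, not the product's code: the bin edges, the sample values, and the function names are hypothetical. Rows are the actual bins and columns are the predicted bins.

```python
def to_bin(value, edges):
    """Return the index of the bin containing value.

    edges are the interior cut points in ascending order, so two edges
    define three bins: below the first, between them, and at or above the last.
    """
    index = 0
    for edge in edges:
        if value >= edge:
            index += 1
    return index


def confusion_matrix(actual, predicted, edges):
    """Count how often each actual bin (row) was predicted as each bin (column)."""
    n_bins = len(edges) + 1
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for a, p in zip(actual, predicted):
        matrix[to_bin(a, edges)][to_bin(p, edges)] += 1
    return matrix


# Hypothetical target values and model predictions for six runs.
edges = [2.0, 4.0]  # bins: low (< 2), medium (2 to < 4), high (>= 4)
actual = [1.0, 1.5, 2.5, 3.0, 4.5, 5.0]
predicted = [1.2, 2.2, 2.8, 4.1, 4.4, 3.5]

matrix = confusion_matrix(actual, predicted, edges)
for row in matrix:
    print(row)
# [1, 1, 0]
# [0, 1, 1]
# [0, 1, 1]
```

Counts on the diagonal are runs whose predicted bin matches the actual bin. Off-diagonal counts show which groups are confused with which, for example the one "low" run predicted as "medium".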
