
What is AutoML (Automated Machine Learning)?


AutoML is steadily gaining popularity (see Forbes), driven not least by its many successes in practical analyses. In a world in which more and more devices produce data and are networked with each other, the volume of data grows disproportionately. AutoML is therefore urgently needed to extract knowledge from this rapidly growing data in time. We expect AutoML to become even more important in the coming years and its analysis methods to deliver even more precise and faster results. The data scientist's role will not disappear; rather, its focus will shift to more specialized or sophisticated analysis techniques. In short: AutoML saves time and money, because you do not need a large team of data science and machine learning experts. It is also one of the easiest and cheapest ways to enter the world of artificial intelligence and machine learning.

Features of AutoML


So what is AutoML?

Automated Machine Learning (AutoML) automates the end-to-end process of applying machine learning to real-world problems. In a typical machine learning application, experts must apply appropriate data preprocessing, feature engineering, feature extraction, and feature selection methods to make the data set suitable for machine learning. Following these preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of the final model. Since many of these steps are beyond the abilities of non-experts, AutoML has been proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning. Automating the end-to-end process offers simpler solutions, faster creation of those solutions, and models that often outperform hand-designed models.
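To make "algorithm selection and hyperparameter optimization" more concrete, here is a minimal sketch in plain Python. It is not how any particular AutoML product works: the two toy models, the candidate list, and the function name auto_select are assumptions made for illustration, and the search is a simple exhaustive loop, whereas real systems search far larger spaces with smarter strategies.

```python
def mean_model(train, params):
    """Baseline algorithm: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def knn_model(train, params):
    """k-nearest-neighbour regressor on one numeric feature; k is the hyperparameter."""
    k = params["k"]

    def predict(x):
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def mse(model, data):
    """Mean squared error of a fitted model on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Hypothetical search space: every (algorithm, hyperparameters) pair to try.
BUILDERS = {"mean": mean_model, "knn": knn_model}
CANDIDATES = [("mean", {})] + [("knn", {"k": k}) for k in range(1, 11)]


def auto_select(train, valid):
    """Fit every candidate on train and keep the one with the lowest validation error."""
    best = None
    for name, params in CANDIDATES:
        score = mse(BUILDERS[name](train, params), valid)
        if best is None or score < best[0]:
            best = (score, name, params)
    return best


# Toy data: y = x^2, split alternately into training and validation points.
data = [(x / 10, (x / 10) ** 2) for x in range(60)]
train, valid = data[::2], data[1::2]
score, name, params = auto_select(train, valid)
print(name, params, round(score, 4))
```

The point of the sketch is only the loop: the choice of algorithm and the choice of its hyperparameters are treated as one search problem, scored on held-out data.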

 

Comparison of the traditional machine learning workflow and the AutoML workflow (source).

Objectives of automation:

Automated machine learning can cover different phases of the machine learning process:

  • Automated data preparation and ingestion (from raw data and various formats)
  • Automated column type recognition, e.g., Boolean, discrete numeric, continuous numeric, or text recognition.
  • Automated column intent recognition; e.g., target/label, numeric feature, categorical text feature, or free text feature.
  • Automated task recognition; e.g., binary classification, regression, clustering or ranking
  • Automated Feature Engineering
    • feature selection
    • feature extraction
    • Meta-learning and transfer learning
    • Detection and handling of skewed data or missing values
  • Automated model selection
  • Hyperparameter optimization of the learning algorithm and featurization
  • Automated pipeline selection under time, storage and complexity constraints
  • Automated selection of evaluation metrics / validation procedures
  • Automated problem checking
    • Detection of data leakage
    • Detection of misconfigurations
  • Automated analysis of the achieved results
  • User interfaces and visualizations for automated machine learning
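As an illustration of two items from the list above, column type recognition and task recognition, here is a rough sketch in plain Python. The rules, the token set, and the function names infer_column_type and infer_task are assumptions chosen for this example; real AutoML tools use considerably more robust heuristics, and the "multiclass classification" fallback is an addition that the list does not name.

```python
# Hypothetical set of strings treated as Boolean values.
BOOLEAN_TOKENS = {"true", "false", "yes", "no", "0", "1"}


def _is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_column_type(values):
    """Guess a column type from raw string values, ignoring empty cells."""
    cleaned = [v.strip().lower() for v in values if v.strip()]
    if not cleaned:
        return "unknown"
    if all(v in BOOLEAN_TOKENS for v in cleaned):
        return "boolean"
    if all(_is_number(v) for v in cleaned):
        if all(float(v).is_integer() for v in cleaned):
            return "discrete numeric"
        return "continuous numeric"
    return "text"


def infer_task(target_values):
    """Guess the learning task from the target column (None means no target)."""
    if target_values is None:
        return "clustering"
    kind = infer_column_type(target_values)
    distinct = len({v.strip().lower() for v in target_values if v.strip()})
    if kind == "boolean" or distinct == 2:
        return "binary classification"
    if kind == "continuous numeric":
        return "regression"
    return "multiclass classification"


print(infer_column_type(["yes", "no", "Yes"]))
print(infer_column_type(["1.5", "2", "3"]))
print(infer_task(["spam", "ham", "spam"]))
```

A crude rule set like this misclassifies plenty of real columns (for example, integer counts used as a regression target), which is exactly why this step benefits from automation that has been tuned on many data sets.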

Below is a list of AutoML tools and vendors:

  • AutoWEKA is an approach for simultaneously selecting a machine learning algorithm and its hyperparameters; combined with the WEKA package, it automatically provides good models for a variety of data sets.
  • Auto-sklearn extends the AutoWEKA approach to the Python library scikit-learn and is a drop-in replacement for regular scikit-learn classifiers and regressors.
  • TPOT is a data science assistant that optimizes machine learning pipelines using genetic programming.
  • H2O AutoML provides automated model selection and ensembling for the H2O machine learning and data analysis platform.
  • TransmogrifAI is an AutoML library that runs on Spark.
  • MLBox is an AutoML library with three components: preprocessing, optimization, and prediction.
  • Google Cloud AutoML is a machine learning product suite that allows even developers with little knowledge in this area to train high-quality models tailored to their specific needs.
  • Azure Automated ML is based on a breakthrough from Microsoft Research. The approach combines ideas from collaborative filtering and Bayesian optimization to search an enormous space of possible machine learning pipelines intelligently and efficiently.

(Note: The list represents only a small selection of providers.)
