AutoML in 15 Minutes. From Hypothesis to Production-Ready Model

This guide is a must-read for AI leaders and engineering managers seeking to strategically implement Automated Machine Learning (AutoML) within their organizations. AutoML offers numerous benefits that can address common challenges faced by these professionals, including boosting efficiency, scaling machine learning, and minimizing human error.

TL;DR

1. Dive Deeper into the World of AutoML. Ready to explore the vast landscape of AutoML, discover the latest tools and libraries, and unlock the secrets of the most innovative solutions? Learn More here about the cutting-edge frameworks, libraries and tools that are revolutionizing the machine learning scene!

2. AutoML vs. Classical ML: The Game-Changer. Curious about how AutoML boosts efficiency and reduces manual labor? Discover the Difference in our in-depth comparison guide and see what you gain by switching to AutoML!

3. Hands-On with AutoML: Try It Yourself. Want to get hands-on experience with AutoML? Dive into an example and dissect what happens under the hood. Experiment Now and witness the magic of automated machine learning!

Business Impact of AutoML

AutoML is revolutionizing how businesses approach AI by significantly reducing time-to-market for AI features, improving cost efficiency in AI development, and augmenting AI teams when hiring is a challenge.

Reducing Time-to-Market for AI Features

AutoML accelerates the development and deployment of AI models by automating tasks like hyperparameter tuning, feature selection, and model selection. This streamlined process allows businesses to quickly integrate AI solutions into their operations, enhancing decision-making and operational efficiency. By leveraging AutoML, companies can realize the value of AI faster, enabling them to respond more rapidly to market changes and customer needs.

Improving Cost Efficiency in AI Development

AutoML optimizes costs by reducing the need for large machine learning teams, making AI more accessible to smaller companies. It automates repetitive tasks, allowing data scientists to focus on strategic tasks rather than manual model tuning. This not only reduces hiring costs but also accelerates the development cycle, leading to significant cost savings and improved ROI.

Augmenting AI Teams

AutoML serves as a powerful tool to augment existing teams. It enables non-experts to build and deploy machine learning models, democratizing access to AI and ensuring that businesses can maintain a competitive edge even with limited resources.

Why AutoML?

AutoML is a subfield of AI and ML that focuses on developing tools and libraries to automate various aspects of the ML pipeline, enhancing usability and performance. As machine learning adoption grows, so does the demand for off-the-shelf solutions that require minimal expertise. AutoML aims to bridge this gap, making ML accessible to non-experts while optimizing results.

What motivates the use of Automated ML tools and code libraries? Building an effective ML pipeline requires significant expertise, and there is no single “correct” way to structure an ML analysis.

Each time a new problem or dataset is encountered, countless possible ML pipelines can be designed, each addressing key challenges differently:

✔ How should the data be cleaned and transformed?
✔ What features should be engineered and selected?
✔ Which algorithms, methods, or models should be used?
✔ What hyperparameters will yield optimal performance?

With so many elements and options to consider, the risk of errors is high. AutoML helps streamline this process, reducing complexity and minimizing mistakes while improving efficiency.

AutoML in Action – A Technical Perspective

AutoML: Enterprise vs Open Source

Among the many available AutoML solutions, it is important to distinguish between two key categories: enterprise and open-source.

Table 1. Examples of enterprise and open source AutoML solutions

Enterprise solutions are proprietary, often offering various pay-to-use business models, guaranteed minimum performance, and dedicated user support. However, a major drawback is the lack of transparency—the specifics of how the analysis is conducted are hidden behind a proprietary wall, making it difficult to validate or fine-tune the process.

Alternatively, there is a much larger variety of open-source options, developed with diverse goals, features, and user needs in mind. These solutions often offer more flexibility and transparency, allowing users to inspect and modify the underlying code. While open-source tools may not always come with the same level of dedicated support or guaranteed performance as enterprise solutions, they provide the opportunity for greater customization and experimentation, making them ideal for organizations with specific needs or those looking for cost-effective solutions. These solutions often come with different licenses, such as MIT, Apache, or GPL, offering varying levels of freedom in usage, modification, and distribution.

Figure 1. Different licences of open source tools and libraries

In the following sections I will focus on open source AutoML solutions.

AutoML: Tool vs Libraries

When it comes to AutoML, solutions can be broadly categorized into tools or libraries, and understanding the difference between the two can help in selecting the best solution for specific needs and goals.

AutoML tools are often no-code or low-code platforms with user-friendly graphical interfaces, designed for users with minimal technical expertise. These tools automate the entire pipeline, from data preprocessing to model deployment, making them ideal for non-technical users or teams looking for quick solutions without needing deep ML knowledge.
AutoML libraries, on the other hand, are typically aimed at technical users (e.g., data scientists and engineers) and provide more flexibility and control over the machine learning process. These libraries integrate into existing codebases and offer programmatic access to different stages of the ML pipeline, allowing users to fine-tune their models and algorithms to better suit specific use cases.

Understanding the distinction between tools and libraries will help you choose the right solution, whether you’re looking for a simple, accessible platform or a more customizable and robust solution for advanced use cases. Here’s a concise summary of the pros and cons of each, which can help you decide what suits your needs best:

Table 2. Pros and cons of AutoML tools and libraries

Whether to opt for tools or libraries depends on your specific needs:

Tools: Ideal for quick deployment and minimal technical expertise.
Libraries: Suitable for advanced customization and integration into existing codebases.

Table 3. AutoML tools and libraries

AutoML vs classical ML approach

AutoML automates various stages of the machine learning pipeline, from data preprocessing to model selection and hyperparameter tuning. But how does it work under the hood? Let’s break it down into key components and compare how AutoML simplifies what traditionally needs to be done in a classical ML approach.

1. Data Preprocessing & Feature Engineering

In a classical ML approach, data preprocessing requires careful manual intervention:

Data cleaning – Handling missing values, outliers, and duplicates.
Feature engineering – Manually identifying and creating new features based on domain knowledge.
Scaling & encoding – Manually selecting and applying techniques like normalization or one-hot encoding for categorical variables.

With AutoML frameworks, these tasks are automated:

✔ Automated feature selection – Automatically identifying the most relevant features based on the data and model performance.
✔ Data cleaning & transformation – Automatically handling missing values, encoding categorical variables, scaling features, and performing transformations (e.g., log transforms, normalization).
✔ Feature generation – Automatically creating new features through techniques like polynomial expansion or embeddings.
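To make the preprocessing steps above concrete, here is a minimal sketch of three of them (mean imputation, min-max scaling, and one-hot encoding) written by hand in plain Python. The helper names and the toy `mileage` and `fuel` columns are made up for illustration. An AutoML framework chooses and applies steps like these for you, and its actual implementation will differ.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode categories as 0/1 indicator rows, one column per sorted category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


# Toy columns, invented for this example.
mileage = [10.0, None, 30.0, 20.0]
fuel = ["diesel", "petrol", "diesel", "electric"]

mileage_scaled = min_max_scale(impute_mean(mileage))
fuel_encoded, fuel_categories = one_hot(fuel)
```

Even for two columns this takes three design decisions (how to impute, how to scale, how to encode), which is the manual effort AutoML is meant to remove.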

2. Model Selection & Architecture Search

In the classical ML approach, model selection requires manually evaluating different algorithms, which involves:

Choosing algorithms – Selecting from various models (e.g., Random Forest, SVM, Logistic Regression).
Tuning parameters – Testing multiple configurations manually.
Selecting the right model architecture – For deep learning, manually designing neural network architectures.

AutoML frameworks automate model selection, architecture search, and performance optimization:

✔ Traditional ML models – AutoML tools can automatically evaluate multiple models (e.g., Random Forest, XGBoost, SVM) and select the best performing one.
✔ Neural Architecture Search (NAS) – AutoML tools can automatically discover optimal deep learning architectures.
✔ Meta-learning – AutoML uses past experimentation to guide the model selection process, speeding up searches based on previous results.
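At its core, automated model selection is a loop: fit every candidate, score it on held-out data, keep the best. The sketch below illustrates that loop with two deliberately trivial candidates on made-up one-dimensional data. The function names and the data are assumptions for illustration only. Real frameworks search over far richer model families and usually add ensembling.

```python
def fit_majority(X, y):
    """Baseline: always predict the most frequent training label."""
    majority = max(set(y), key=y.count)
    return lambda x: majority


def fit_threshold(X, y):
    """Predict 1 when x exceeds the midpoint between the two class means."""
    mean0 = sum(x for x, label in zip(X, y) if label == 0) / y.count(0)
    mean1 = sum(x for x, label in zip(X, y) if label == 1) / y.count(1)
    cut = (mean0 + mean1) / 2
    return lambda x: 1 if x > cut else 0


def accuracy(model, X, y):
    """Fraction of samples the model labels correctly."""
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)


def select_best(candidates, X_train, y_train, X_val, y_val):
    """Fit each candidate, score it on validation data, return the winner."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(X_train, y_train)
        scores[name] = accuracy(model, X_val, y_val)
    best = max(scores, key=scores.get)
    return best, scores


# Toy data, invented for this example.
X_train, y_train = [1.0, 2.0, 3.0, 7.0, 8.0], [0, 0, 0, 1, 1]
X_val, y_val = [1.5, 2.5, 7.5, 9.0], [0, 0, 1, 1]

candidates = {"majority": fit_majority, "threshold": fit_threshold}
best_name, scores = select_best(candidates, X_train, y_train, X_val, y_val)
```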

3. Hyperparameter Optimization

In the classical ML approach, hyperparameter tuning is often performed manually or with basic grid/random search:

Grid search – Exhaustively testing every combination of values from a predefined hyperparameter grid.
Random search – Randomly sampling hyperparameters to find the best configuration.

AutoML frameworks automate hyperparameter optimization, improving efficiency:

✔ Grid Search / Random Search – While basic, AutoML tools perform this automatically.
✔ Bayesian Optimization – More advanced methods using probabilistic models to search the hyperparameter space efficiently.
✔ Evolutionary Algorithms – Optimization using genetic algorithms to find the optimal hyperparameters.
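The two basic strategies are easy to compare in a few lines of Python. In the sketch below, `validation_score` is a made-up stand-in for "train a model and return its validation score". The hyperparameter names, grid values, and bounds are likewise assumptions chosen for illustration. Bayesian and evolutionary methods are not shown. They replace the blind sampling step with a strategy that learns from earlier trials.

```python
import itertools
import random


def validation_score(learning_rate, depth):
    """Stand-in objective (made up): peaks at learning_rate=0.1, depth=6."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        cfg = dict(zip(names, values))
        score = validation_score(**cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def random_search(bounds, n_trials, seed=0):
    """Evaluate n_trials configurations sampled uniformly within the bounds."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {
            "learning_rate": rng.uniform(*bounds["learning_rate"]),
            "depth": rng.randint(*bounds["depth"]),
        }
        score = validation_score(**cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


grid = {"learning_rate": [0.01, 0.1, 0.3], "depth": [2, 6, 10]}
grid_best, grid_score = grid_search(grid)

bounds = {"learning_rate": (0.01, 0.3), "depth": (2, 10)}
rand_best, rand_score = random_search(bounds, n_trials=20)
```

Grid search costs one training run per combination (9 here), so it grows multiplicatively with every added hyperparameter. Random search has a fixed budget but no guarantee of hitting the best region.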

4. Model Evaluation & Deployment

In the classical ML approach, model evaluation and deployment require manual setup:

Cross-validation – Splitting data into folds and validating performance manually.
Early stopping – Deciding when to stop training to prevent overfitting.
Deploying models – Setting up pipelines for model deployment into production systems.

AutoML frameworks automate these tasks, enabling rapid deployment:

✔ Cross-validation & early stopping – AutoML tools automatically perform cross-validation and implement early stopping to ensure robust, generalizable models.
✔ Model explainability – Many AutoML tools integrate interpretability methods like SHAP, LIME, or attention mechanisms to explain model decisions.
✔ Auto-deployment – Automatically generating deployment pipelines, including APIs, Docker containers, or optimized inference pipelines.
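Cross-validation and early stopping are the two items above that are simple enough to sketch directly. The code below is a plain-Python illustration under my own assumptions (contiguous, unshuffled folds and a patience-based stopping rule), not the logic of any particular AutoML library.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, val
        start = stop


def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training stops.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return len(val_losses) - 1


folds = list(k_fold_indices(n_samples=7, k=3))

# Validation losses per epoch, invented for this example.
stop_epoch = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5], patience=2)
```

With `patience=2`, training stops at epoch 4 and never reaches the lower loss at epoch 6. This is the kind of trade-off an AutoML framework settles through its defaults.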

AutoML solutions can automate some or all steps of the ML pipeline, depending on the complexity and scope of the tool or library being used.

Single-step AutoML solutions focus on automating one stage, such as hyperparameter tuning or data preprocessing.
Multi-step AutoML solutions, the most common kind, automate several key steps such as data preparation and model selection.
Full-pipeline AutoML solutions automate the entire ML workflow, saving significant time and effort.

Figure 2. AutoML tools and libraries supporting different stages of typical ML pipeline

AutoML libraries vary significantly based on the types of data they support. Most of the solutions analyzed primarily focus on tabular data, with time-series data being supported somewhat less frequently. Image and text data are even less commonly handled by these tools. Of the libraries considered, only three support tabular, time-series, text, image, and multimodal data alike: AutoKeras, AutoGluon, and FEDOT.

Use Case: From Hypothesis to Production in 15 Minutes?

Since AutoML is designed to provide fast, out-of-the-box solutions, I decided to test whether it’s possible to create a simple PoC in just 15 minutes. For demonstration purposes, I use the Car Damage Detection dataset from Kaggle, which contains images of both damaged and undamaged cars.

Figure 3. Examples of damaged and undamaged cars from Car Damage Detection dataset

I decided to use AutoGluon as my AutoML library because it covers most stages of the ML pipeline and supports a broad range of input data types.

In traditional machine learning, the next stage would require defining the problem, selecting a model, choosing training parameters, and fine-tuning them. However, since I’m using AutoML, I expect choosing the library to be the final analysis and decision needed before preparing the demonstration.

Let’s Dig into the Code – Car Damage Detection

In the Car Damage Detection dataset, images are split into ‘training’ and ‘validation’ sets, each containing ‘whole’ and ‘damaged’ cars. AutoGluon requires the data to be formatted as a Pandas DataFrame, linking each image path to its corresponding class label.
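A minimal sketch of this formatting step is shown below. The folder names passed in, the `image` and `label` column names, and the accepted file suffixes are assumptions and should be adjusted to the actual layout of the downloaded dataset. `pandas` is imported lazily so the path-collection helper can be used on its own.

```python
from pathlib import Path

# Assumed image file types; extend if the dataset uses other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_image_records(split_dir, class_dirs):
    """Return a list of {'image': path, 'label': class name} dicts.

    `class_dirs` maps each class label to the sub-folder of `split_dir`
    that holds its images (folder names are assumptions, see above).
    """
    records = []
    for label, folder in class_dirs.items():
        for path in sorted(Path(split_dir, folder).iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                records.append({"image": str(path), "label": label})
    return records


def to_dataframe(records):
    """Convert the records into the DataFrame layout described above."""
    import pandas as pd  # requires pandas to be installed

    return pd.DataFrame(records, columns=["image", "label"])


# Example usage (paths are hypothetical):
# classes = {"damaged": "damaged", "whole": "whole"}
# train_df = to_dataframe(collect_image_records("data/training", classes))
# val_df = to_dataframe(collect_image_records("data/validation", classes))
```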

Once the data is loaded and formatted correctly, I can start training the classifier immediately with just two lines of code. For demonstration purposes, I set a time limit of 10 minutes.
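The training step might look like the sketch below. It assumes AutoGluon's multimodal package is installed and that the label column is named `label`. The wrapper functions and the name `time_limit_seconds` are mine, added only to keep the example self-contained. The two lines referred to above are the `MultiModalPredictor(...)` and `fit(...)` calls.

```python
def time_limit_seconds(minutes):
    """Convert a training budget in minutes to the seconds passed to fit()."""
    return int(minutes * 60)


def train_classifier(train_df, label_column="label", minutes=10):
    """Train an image classifier on a DataFrame of image paths and labels."""
    # Imported lazily so this module loads even where AutoGluon is not installed.
    from autogluon.multimodal import MultiModalPredictor

    # The "two lines": create the predictor, then fit it within the time budget.
    predictor = MultiModalPredictor(label=label_column)
    predictor.fit(train_data=train_df, time_limit=time_limit_seconds(minutes))
    return predictor


def validation_accuracy(predictor, val_df):
    """Score a fitted predictor on held-out data and return the metric scores."""
    return predictor.evaluate(val_df, metrics=["accuracy"])
```

Training on images for 10 minutes is realistic mainly with a GPU available. On CPU only, the same budget will cover far fewer training steps.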

Under the hood, the data is first analyzed to define the problem.
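To give a feel for what "defining the problem" can mean, here is a deliberately simplified heuristic that guesses the task type from the label column alone. It is an illustration, not AutoGluon's actual logic. The function name and the `max_classes` threshold are assumptions.

```python
def infer_problem_type(labels, max_classes=20):
    """Guess the task type from the label values (simplified heuristic)."""
    unique = set(labels)
    if len(unique) < 2:
        raise ValueError("need at least two distinct label values")
    # Treat labels as numeric only if every value is an int or float (not bool).
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in unique
    )
    # Many distinct numeric values suggest a continuous target.
    if numeric and len(unique) > max_classes:
        return "regression"
    return "binary" if len(unique) == 2 else "multiclass"


# The car damage labels take two values, so the task is binary classification.
car_task = infer_problem_type(["whole", "damaged", "whole", "damaged"])
```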

Next, the model architecture, loss function, and validation metric best suited to my problem are automatically selected.

The training continues until it reaches the time limit set for the training run, achieving a 0.82 accuracy score on the validation set. The training time was short, and I didn’t expect high metrics in this case. If the results are unsatisfactory, AutoGluon offers two ways to address it: training for a longer period with the same parameters (suitable for non-ML experts) or customizing the model and training process for users with more domain knowledge.

For those who want more control over the modeling and training process, AutoGluon offers extensive customization flexibility. The documentation provides guidance on customizing a wide range of configuration parameters, such as model type, pretrained checkpoint version, learning rate, batch size, weight decay, and more.

AutoML: is it worth it?

AutoML has undoubtedly delivered real value by bringing efficiency and democratization to machine learning. However, its impact depends on the use case, expectations, and implementation. While it’s not just hype, AutoML isn’t a silver bullet for all ML challenges. For mission-critical, high-stakes, or highly specialized applications, human expertise remains irreplaceable.

Where AutoML Delivers Real Value 🚀

✔ Lowering the Entry Barrier – Enables non-experts to build ML models without deep data science knowledge.
✔ Speeding Up Development – Automates tasks like hyperparameter tuning, feature selection, and model selection.
✔ Optimizing Costs – Reduces the need for large ML teams in certain applications, making ML more accessible to smaller companies.
✔ Handling Repetitive Tasks – Useful for Auto-Feature Engineering, Auto-Training, and Auto-Tuning, freeing up time for data scientists.

Where AutoML Falls Short 🤔

❌ Limited Customization – For complex ML tasks, domain-specific expertise is still required.
❌ “Black Box” Models – Many AutoML tools lack transparency, making debugging and explainability difficult.
❌ Computational Cost – Running exhaustive searches for optimal models can be resource-intensive.
❌ Not Always Business-Ready – While it automates modeling, it doesn’t fully handle data preprocessing, domain adaptation, or deployment.

Choose AutoML When:

Speed is Key: You need to quickly develop and deploy models.
AI Expertise is Limited: Your team lacks extensive machine learning knowledge.
Use Cases are Standard: Common tasks like fraud detection or spam filtering are involved.

Choose Custom ML When:

High Customization is Required: You need to tailor models to specific, complex business needs.
Domain Expertise is Essential: Deep understanding of the industry or specific data is necessary.
Strict Compliance is Needed: Transparency and explainability are crucial for regulatory compliance.

What’s Next?

As AutoML continues to evolve, it will play an increasingly vital role in democratizing machine learning and enhancing business efficiency. Stay ahead of the curve by leveraging AutoML for standard tasks and reserving custom ML development for your most complex and critical applications.

References

[1] AutoML.org

[2] A Brief Introduction to Automated Machine Learning (AutoML): https://www.youtube.com/watch?v=IjX0phz3LLE (accessed 30.01.2025)

[3] A list of open-source and commercial Automated Machine Learning tools: https://github.com/askery/automl-list?tab=readme-ov-file (accessed 30.01.2025)

[4] AutoGluon documentation: https://auto.gluon.ai/stable/index.html (accessed 31.01.2025)
