Machine learning is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. From driving cars to translating speech, machine learning is driving an explosion in the capabilities of artificial intelligence, helping software make sense of the messy and unpredictable real world.
But what is machine learning and what makes the current boom in machine learning possible?
What is Machine Learning?
Why do we need to care about machine learning?
“A breakthrough in machine learning would be worth ten Microsofts.”
— Bill Gates
Machine learning algorithms allow computers to program themselves. If programming is automation, then machine learning automates the process of automation. Writing software is the bottleneck because there are not enough good developers, so machine learning lets the data do the work rather than the people. In that sense, ML is a way to make programming scalable.
- Traditional Programming: Data and programs are run on the computer to produce the output.
- Machine Learning: Data and output are run on the computer to create a program. This program can be used in traditional programming.
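The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not a production technique: the temperature-conversion task, the sample inputs, and the helper names `fit_line` and `learned_program` are all assumptions chosen for the example. The first function is a hand-written rule; the second half recovers an equivalent rule from data and outputs alone, using ordinary least squares.

```python
# Traditional programming: a person writes the rule; data goes in, output comes out.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Machine learning: data and outputs go in, and a "program" comes out.
# Here the program is just two learned numbers (illustrative data only).
inputs = [-10.0, 0.0, 15.0, 30.0, 100.0]
outputs = [celsius_to_fahrenheit(c) for c in inputs]
slope, intercept = fit_line(inputs, outputs)


def learned_program(celsius):
    """The fitted rule, usable like any hand-written function."""
    return slope * celsius + intercept


print(slope, intercept, learned_program(37.0))
```

The learned rule can then be called from ordinary code, which is the sense in which the resulting program "can be used in traditional programming".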
Machine learning is like farming or gardening. Seeds are the algorithms, nutrients are the data, the gardener is you, and plants are the programs.
Applications of Machine Learning
Sample applications of machine learning:
- Web search: Ranking pages based on what you are most likely to click on.
- Computational biology: Rationally designing drugs on the computer based on past experiments.
- Finance: Deciding who to send which credit card offers to, evaluating the risk on credit offers, and deciding where to invest money.
- E-commerce: Predicting customer churn and detecting whether or not a transaction is fraudulent.
- Space exploration: Space probes and radio astronomy.
- Robotics: Handling uncertainty in new environments, for example in autonomous systems such as self-driving cars.
- Information extraction: Ask questions over databases across the web.
- Social networks: Using machine learning to extract value from data on relationships and preferences.
- Debugging: Applying machine learning to computer science problems such as debugging, a labor-intensive process. A model could suggest where the bug is likely to be.
Key Elements of Machine Learning
There are tens of thousands of machine learning algorithms and hundreds of new algorithms are developed every year.
Every machine learning algorithm has three components:
- Representation: how to represent knowledge. Examples include decision trees, sets of rules, instances, graphical models, neural networks, support vector machines, model ensembles, and others.
- Evaluation: the way to evaluate candidate programs (hypotheses). Examples include accuracy, precision and recall, squared error, likelihood, posterior probability, cost, margin, entropy, K-L divergence, and others.
- Optimization: the way candidate programs are generated, known as the search process. Examples include combinatorial optimization, convex optimization, and constrained optimization.
All machine learning algorithms are combinations of these three components, which makes them a framework for understanding all algorithms.
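To make the three components concrete, here is a minimal sketch that picks one option for each: a straight line as the representation, mean squared error as the evaluation, and gradient descent (a simple convex optimization method) as the optimization. The data points, learning rate, and step count are assumptions chosen for illustration; many other combinations are possible.

```python
# Representation: a straight line, y = w * x + b, described by two parameters.
def predict(params, x):
    w, b = params
    return w * x + b


# Evaluation: mean squared error of a candidate over the training examples.
def mean_squared_error(params, examples):
    return sum((predict(params, x) - y) ** 2 for x, y in examples) / len(examples)


# Optimization: gradient descent searches the parameter space for lower error.
def gradient_descent(examples, learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict((w, b), x) - y) * x for x, y in examples) / n
        grad_b = sum(2 * (predict((w, b), x) - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up training data that follows y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
start_error = mean_squared_error((0.0, 0.0), examples)
params = gradient_descent(examples)
final_error = mean_squared_error(params, examples)
print(params, start_error, final_error)
```

Swapping any one piece, for example replacing the line with a decision tree or the squared error with likelihood, gives a different algorithm built from the same three-part frame.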
Types of Machine Learning Algorithms
There are four types of machine learning:
- Supervised learning (also called inductive learning): Training data includes desired outputs. For example, each email is labeled as spam or not spam, so the learning is supervised.
- Unsupervised learning: Training data does not include desired outputs. An example is clustering. It is hard to tell what is good learning and what is not.
- Semi-supervised learning: Training data includes a few desired outputs.
- Reinforcement learning: Rewards from a sequence of actions. It is popular with AI researchers, and it is the most ambitious type of learning.
1. What is Supervised Learning?
This approach basically teaches machines by example.
During training for supervised learning, systems are exposed to large amounts of labeled data, for example, images of handwritten figures annotated to indicate which number they correspond to. Given sufficient examples, a supervised-learning system would learn to recognize the clusters of pixels and shapes associated with each number and eventually recognize handwritten numbers, reliably distinguishing between the numbers 9 and 4 or 6 and 8.
However, training these systems typically requires huge amounts of labeled data, with some systems needing to be exposed to millions of examples to master a task.
As a result, the datasets used to train these systems can be vast, with Google’s Open Images Dataset having about nine million images, its labeled video repository YouTube-8M linking to seven million labeled videos, and ImageNet, one of the early databases of this kind, having more than 14 million categorized images. The size of training datasets continues to grow, with Facebook announcing it had compiled 3.5 billion images publicly available on Instagram, using hashtags attached to each image as labels. Using one billion of these photos to train an image-recognition system yielded a record accuracy of 85.4 percent on ImageNet’s benchmark.
The laborious process of labeling the datasets used in training is often carried out using crowdworking services, such as Amazon Mechanical Turk, which provides access to a large pool of low-cost labor spread across the globe. For instance, ImageNet was put together over two years by nearly 50,000 people, mainly recruited through Amazon Mechanical Turk. However, Facebook’s approach of using publicly available data to train systems could provide an alternative way of training systems using billion-strong datasets without the overhead of manual labeling.
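The idea of teaching by labeled example can be sketched with a nearest-neighbor classifier, one of the simplest supervised methods. This is an illustrative assumption, not how large image-recognition systems are built: the 3x3 "images", their labels, and the function name `nearest_neighbor_predict` are all made up for the example.

```python
import math


def nearest_neighbor_predict(training_examples, features):
    """Return the label of the training example closest to `features`."""
    _, closest_label = min(
        training_examples,
        key=lambda example: math.dist(example[0], features),
    )
    return closest_label


# Made-up 3x3 "images" flattened to 9 pixels (1 = ink, 0 = blank),
# each annotated with the figure it represents.
training_examples = [
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), "one"),
    ((1, 1, 0, 0, 1, 0, 0, 1, 0), "one"),
    ((1, 1, 1, 1, 0, 1, 1, 1, 1), "zero"),
    ((0, 1, 1, 1, 0, 1, 1, 1, 0), "zero"),
]

# An unseen image: a vertical stroke with one stray pixel.
unseen = (0, 1, 0, 0, 1, 0, 0, 1, 1)
print(nearest_neighbor_predict(training_examples, unseen))
```

With only four labeled examples the classifier is fragile, which hints at why real systems need so much labeled data.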
2. What is Unsupervised Learning?
In contrast, unsupervised learning tasks algorithms with identifying patterns in data, trying to spot similarities that split that data into categories.
An example might be Airbnb clustering together houses available to rent by neighborhood, or Google News grouping together stories on similar topics each day.
The algorithm isn’t designed to single out specific types of data; it simply looks for data that can be grouped by its similarities, or for anomalies that stand out.
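Clustering of this kind can be sketched with k-means, a common unsupervised method. The sketch below is a simplified version under stated assumptions: the points stand in for hypothetical listing coordinates, the centroids are initialized naively from the first `k` points, and the iteration count is fixed rather than checked for convergence. No labels are supplied anywhere.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, assignments)."""
    centroids = list(points[:k])  # naive initialization: the first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments


# Hypothetical (x, y) positions of rental listings in two neighborhoods.
listings = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.5)]
centroids, assignments = k_means(listings, k=2)
print(assignments)
print(centroids)
```

The algorithm only reports which points belong together; deciding what each group means is left to the person reading the result.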
3. What is Semi-supervised Learning?
The importance of huge sets of labeled data for training machine-learning systems may diminish over time, due to the rise of semi-supervised learning.
As the name suggests, the approach mixes supervised and unsupervised learning. The technique relies upon using a small amount of labeled data and a large amount of unlabelled data to train systems. The labeled data is used to partially train a machine-learning model, and then that partially trained model is used to label the unlabelled data, a process called pseudo-labeling. The model is then trained on the resulting mix of the labeled and pseudo-labeled data.
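The three pseudo-labeling steps can be sketched as follows. This is a minimal illustration under assumptions: the model is a nearest-centroid classifier chosen for brevity, the data points and the labels "a" and "b" are made up, and every pseudo-label is accepted, whereas practical systems often keep only confident predictions.

```python
import math


def train_centroids(examples):
    """Fit a nearest-centroid model: one mean point per label."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(axis) / len(rows) for axis in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# A small labeled set and a larger unlabeled set (illustrative data only).
labeled = [((1.0, 1.0), "a"), ((9.0, 9.0), "b")]
unlabeled = [(1.5, 2.0), (2.0, 1.0), (8.0, 9.5), (9.5, 8.0), (4.0, 4.5)]

# Step 1: partially train a model on the labeled data.
partial_model = train_centroids(labeled)

# Step 2: use the partially trained model to label the unlabeled data.
pseudo_labeled = [(features, predict(partial_model, features)) for features in unlabeled]

# Step 3: train on the mix of labeled and pseudo-labeled data.
final_model = train_centroids(labeled + pseudo_labeled)
print(pseudo_labeled)
print(final_model)
```

Any mistake made in step 2 is baked into step 3, so the quality of the partially trained model matters a great deal.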
The viability of semi-supervised learning has been boosted recently by Generative Adversarial Networks (GANs), machine-learning systems that can use labeled data to generate completely new data, for example creating new images of Pokemon from existing images, which in turn can be used to help train a machine-learning model.
Were semi-supervised learning to become as effective as supervised learning, then access to huge amounts of computing power may end up being more important for successfully training machine-learning systems than access to large, labeled datasets.
4. What is Reinforcement Learning?
A way to understand reinforcement learning is to think about how someone might learn to play an old school computer game for the first time, when they aren’t familiar with the rules or how to control the game. While they may be a complete novice, eventually, by looking at the relationship between the buttons they press, what happens on screen, and their in-game score, their performance will get better and better.
An example of reinforcement learning is Google DeepMind’s Deep Q-network, which has beaten humans in a wide range of vintage video games. The system is fed pixels from each game and determines various information about the state of the game, such as the distance between objects on the screen. It then considers how the state of the game and the actions it performs in-game relate to the score it achieves.
Over many cycles of playing the game, the system eventually builds a model of which actions will maximize the score in which circumstances, for instance, in the case of the video game Breakout, where the paddle should be moved to in order to intercept the ball.
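The loop of acting, observing the score, and updating can be sketched with tabular Q-learning. This is a deliberately swapped-in, simpler technique, not DeepMind's Deep Q-network, which replaces the table with a neural network reading pixels. Everything here is an assumption for illustration: the five-position corridor "game", the reward of 1 for reaching the right end, the purely random exploration, and the hyperparameters.

```python
import random

N_STATES = 5          # corridor positions 0..4; reaching position 4 ends the episode
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right
GOAL = N_STATES - 1


def step(state, action_index):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn a table of action values from randomly played episodes."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore at random; Q-learning still learns greedy action values.
            action = rng.randrange(len(ACTIONS))
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[next_state])
            # Nudge the estimate toward reward plus discounted future value.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


def greedy_actions(q):
    """Best-scoring action index for each non-terminal position."""
    return [max(range(len(ACTIONS)), key=lambda a: q[s][a]) for s in range(GOAL)]


q_table = train()
print(greedy_actions(q_table))
```

The table plays the role of the model described above: for each circumstance (position) it records which action is expected to lead to the highest score.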
As you can see, there are many exciting opportunities for machine learning over the next decade. Many of them are already in use or will be in use soon, so you can expect machine learning to become an increasingly important part of effective online business.