Machine learning: what is it really?

This post is written by Brian Godsey, Panopticon Labs’ Chief Data Scientist. If you find this excerpt interesting and want to learn more, look for a special promotional code at the end of the post, good for a 40% discount on Brian’s new book, Think Like a Data Scientist:

I’ve been using machine learning for over a decade, first for national defense, then in academia, and later at analytic software start-ups. See my bio for more information. Since joining Panopticon in 2014, I’ve encountered many folks in the video game industry—many of them technically or artistically brilliant people—who have little or no experience with machine learning and what it can do. At Panopticon, machine learning is the bread and butter of our analytics; it’s how we highlight the bad guys in a sea of engaged players. So I take every opportunity to bridge that gap for anyone curious about how machine learning can make video games safer, fairer, and more fun. Below is an excerpt from my recently published book, Think Like a Data Scientist, that lays the groundwork for understanding what machine learning is, what it can do, what makes it so complex, and why there’s such a wide range of algorithms and tools available. I hope it conveys both the power and the complexity of applying machine learning to problems such as in-game security.

——————————

In the world of analytic software development, machine learning is all the rage these days. Not that it hasn’t been popular for a long time now, but in the last few years I’ve seen the first few products come to market that claim to “bring machine learning to the masses,” or something like that. It sounds great on some level, but on another level it sounds like they’re asking for trouble. I don’t think most people know how machine learning works or how to notice if it has gone wrong. If you’re new to machine learning, I’d like to emphasize that machine learning, in most of its forms, is a tricky tool that shouldn’t be considered a magic solution to anything. There’s a reason why it takes years or decades of academic research to develop a completely new machine learning technique, and it’s the same reason why most people wouldn’t yet understand how to operate it: machine learning is extremely complex.

The term machine learning is used in many contexts and has a somewhat fluid meaning. Some people use it to refer to any statistical methods that can draw conclusions from data, but that’s not the meaning I use. I use the term machine learning to refer to the classes of somewhat abstract algorithms that can draw conclusions from data but whose models—if you want to call them that—are difficult to dissect and understand. In a sense, only the machine can understand its own model. Sure, with most machine learning methods, you can dig into the innards of the machine’s generated model and learn which variables are most important and how they relate to each other, but in that way the machine’s model begins to feel like a data set unto itself—without reasonably sophisticated statistical analysis, it’s tough to get a handle on how the machine’s model even works. That’s why many machine learning tools are called black box methods.

There’s nothing wrong with having a black box that takes in data and produces correct answers. But it can be challenging to produce such a box and to confirm that its answers continue to be correct, and it’s nearly impossible to look inside the finished box and debug it. Machine learning is great, but probably more than any other class of statistical methods, it requires great care to use successfully.

I’ll stop short of giving lengthy explanations of machine learning concepts, because countless good references are available both on the internet and in print. I will, however, give some brief explanations of some of the key concepts to put them in context.

Feature extraction is a process by which you convert your data points into more informative versions of themselves. To get the best results, it’s crucial to extract good features every time you do machine learning—except, maybe, when doing deep learning. Each feature of a data point should show its best side(s) to a machine learning algorithm if the data point hopes to be classified or predicted correctly in the future. For example, in credit card fraud detection, one possible feature to add to a credit card transaction is the amount by which the transaction exceeds the card’s normal transaction amount; alternatively, the feature could be the percentile of the transaction size compared to all recent transactions on that card. In general, good features are those that common sense tells you might be informative in differentiating good from bad, or any two classes from one another. There are also many valuable features that don’t make common sense, but you always have to be careful in determining whether these are truly valuable or merely artifacts of the training data set.
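To make the fraud example concrete, here is a minimal sketch of those two features, assuming the transactions sit in a pandas DataFrame with hypothetical card_id and amount columns; the column names and values are my own illustration, not from the book.

```python
import pandas as pd

# Hypothetical transaction data; the column names and values are illustrative only.
transactions = pd.DataFrame({
    "card_id": ["A", "A", "A", "A", "B", "B", "B"],
    "amount":  [20.0, 35.0, 18.0, 400.0, 60.0, 55.0, 900.0],
})

# Feature 1: how far each transaction is above the card's typical (median) amount.
typical_amount = transactions.groupby("card_id")["amount"].transform("median")
transactions["amount_above_typical"] = transactions["amount"] - typical_amount

# Feature 2: the transaction's percentile rank among that card's recent transactions.
transactions["amount_percentile"] = transactions.groupby("card_id")["amount"].rank(pct=True)

print(transactions)
```

Either derived column would give a classifier a far better view of "unusually large for this card" than the raw amount alone.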

Here are a few of the most popular machine learning algorithms that you would apply to the feature values you extracted from your data points:

  • Random forest—This is a funny name for a useful method. A decision tree is a series of yes/no questions that ends in a decision. A random forest is a collection of randomly generated decision trees that favors trees and branches that correctly classify data points. This is my go-to machine learning method when I know I want machine learning but don’t have a good reason to choose a different one. It’s versatile, and it’s not too difficult to diagnose problems with it (a brief code sketch follows this list).
  • Support vector machine (SVM)—This was quite popular a few years ago, and now it has settled into the niches where it’s particularly useful as the next machine learning fads pass through. SVMs are designed to classify data points into one of two classes. They manipulate the data space, turning it and warping it in order to drive a wedge between two sets of data points that are known to belong to the two different classifications. SVMs focus on the boundary between the two classes, so if you have two classes of data points, with each class tending to stick together in the data space, and you’re looking for a method to divide the two classes with maximal separation (if possible), then an SVM is for you.
  • Boosting—This is a tricky one to explain, and my limited experience doesn’t provide all the insights I probably need. But I know that boosting was a big step forward in machine learning of certain types. If you have a bunch of so-so machine learning models (weak learners), boosting might be able to combine them intelligently to result in a good machine learning model (a strong learner). Because boosting combines the outputs of other machine learning methods, it’s often called a meta-algorithm. It’s not for the faint of heart.
  • Neural network—The heyday of the neural network seemed to be the last decades of the twentieth century, until the advent of deep learning. In their earlier popular incarnation, artificial neural networks (the more formal name) were perhaps the blackest of black boxes. They seemed to be designed not to be understood. But they worked well, in some cases at least. Neural networks consist of layers upon layers of one-way valves (or neurons), each of which transforms its inputs in some arbitrary way. The neurons are connected to each other in a large network that leads from the input data to the output prediction or classification, and all of the computational work of fitting the model involves weighting and reweighting the connections between neurons in clever ways to optimize the results.
  • Deep learning—This is a new development in this millennium. Loosely speaking, deep learning refers to the idea that you might not need to worry much about feature extraction because, with enough computational power, the algorithm might be able to find its own good features and then use them to learn. More specifically, deep learning techniques are layered machine learning methods that, on a low level, do the same types of learning that other methods do, but then, on a higher level, they generate abstractions that can be applied generally to recognize important patterns in many forms. Today, the most popular deep learning methods are based on neural networks, causing a sort of revival in the latter.
  • Artificial intelligence—I’m including this term because it’s often conflated with machine learning and rightly so. There’s no fundamental difference between machine learning and artificial intelligence, but with artificial intelligence comes the connotation that the machine is approaching the intellectual capabilities of a human. In some cases, computers have already surpassed humans in specific tasks—famously, chess or Jeopardy!, for instance—but they’re nowhere near the general intelligence of an average human on a wide variety of day-to-day tasks.
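Several of the methods above share a common fit-and-predict interface in libraries such as scikit-learn. The snippet below is a minimal sketch of the random forest and SVM entries on synthetic data; the data set and parameter values are illustrative assumptions of mine, not recommendations from the book.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic two-class data standing in for feature values extracted from real data points.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# A random forest: many randomly grown decision trees voting together.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# An SVM: looks for the boundary that separates the two classes with maximal margin.
svm = SVC(kernel="rbf").fit(X, y)

# Peeking inside the forest's black box: the relative importance of each feature.
print("feature importances:", forest.feature_importances_)

# Both models expose the same predict interface for classifying new data points.
print("forest predictions:", forest.predict(X[:5]))
print("svm predictions:   ", svm.predict(X[:5]))
```

Note how little the calling code reveals about what either model actually learned; that opacity is exactly the black-box quality discussed above.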

HOW IT WORKS

Each specific machine learning algorithm is different. Data goes in, answers come out; you have to do a good deal of work to confirm that you didn’t make any mistakes, that you didn’t over-fit, that the data was properly train-test separated, and that your predictions, classifications, or other conclusions remain valid when brand-new data comes in.
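As one illustration of that checking work, here is a minimal sketch, on synthetic data and with arbitrary parameter choices of my own, of train-test separation and a basic over-fitting check:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data standing in for extracted feature values and known labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Separate training data from test data before fitting anything at all.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = RandomForestClassifier(random_state=1).fit(X_train, y_train)

# A large gap between training and test accuracy is a classic sign of over-fitting.
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))

# Cross-validation on the training set gives a more stable estimate of performance.
print("cross-val accuracy:", cross_val_score(model, X_train, y_train, cv=5).mean())
```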

WHEN TO USE IT

Machine learning, in general, can do things that no other statistical methods can. I prefer to try a statistical model first in order to understand the system and its relationship to the data; if that model falls short in terms of results, then I begin to think about ways to apply machine learning techniques without giving up too much of the awareness I had with the statistical model. I wouldn’t say that machine learning is my last resort, but I do usually favor the intuitiveness and insight of a well-formed statistical model until I find it lacking. Head straight for machine learning if you know what you’re doing and you have a complex problem that’s nowhere near linear, quadratic, or any of the other common variable relationships found in statistical models.
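A sketch of that workflow, using synthetic data and arbitrary parameters of my own rather than an example from the book: fit an interpretable statistical model first, and reach for machine learning only if it falls short.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data with a deliberately non-linear relationship.
X, y = make_friedman1(n_samples=600, noise=1.0, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2)

# Step 1: an interpretable statistical baseline.
linear = LinearRegression().fit(X_train, y_train)
print("linear model R^2: ", linear.score(X_test, y_test))

# Step 2: try machine learning only if the baseline's results fall short.
forest = RandomForestRegressor(n_estimators=200, random_state=2).fit(X_train, y_train)
print("random forest R^2:", forest.score(X_test, y_test))
```

If the linear model’s score is already good enough, keeping it preserves the awareness and interpretability mentioned above.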

WHAT TO WATCH OUT FOR

Don’t trust the machine’s model or its results until you’ve verified them—completely independently of the machine learning implementation—with test-train separation as well as some completely new data that neither you nor the machine has ever seen before. Data snooping—the practice of looking at data before you formally analyze it and letting what you see bias how you analyze it—can be a problem if you didn’t already do test-train separation before you snooped.
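One simple guard against snooping, sketched below under my own assumptions rather than as a prescription from the book, is to set the test rows aside before any exploratory looking at all, and to touch them only once, at the very end.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix and labels; in practice these would be your own data.
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Split FIRST, before any plotting, summarizing, or feature tinkering.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

# Explore and engineer features using the training portion only...
print("training class balance:", y_train.mean())

# ...then evaluate on the untouched test portion exactly once, at the end.
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```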

——–

Think Like a Data Scientist is available now. Use discount code godsey40 for 40% off.