Azure Machine Learning Studio Project Startup Strategy

Let’s get started with a quote from Tom Dietterich, Emeritus Professor of Computer Science at Oregon State University: “The goal of machine learning is to build computer systems that can adapt and learn from their experience.” I think his quote helps clear up the confusion between machine learning and artificial intelligence. Many mistakenly use these terms interchangeably. Artificial Intelligence (AI) is the broad concept of giving machines tasks that require the ability to “think”. Machine Learning (ML) is a current application of AI, built on the idea that machines can learn to recognize patterns in data. Since we haven’t hit the singularity yet, it is up to us mere humans to figure out how best to apply this new technology to help us out.

THInc.IT will use Azure Machine Learning Studio with a very forward-thinking client who would like to apply this new technology to some specific prediction problems. When considering this project, we realized that we must first establish the object of the model and then define the problem domain within which that model must function.

Here are some prerequisite questions to ask:

  1. How much of the future is predictable with what we know?
    Are we working with an objective within a problem domain that is predictable with what we know? If the outcome doesn’t depend on anything we can observe, no model will predict it. Seems simple, but often model objectives miss the mark.
  2. Do we have a large amount of historical data that is useful?
    To train a model to learn, we need enough data to provide historical patterns that lead to predictability. If you study weather models, you learn that predictions depend on patterns in history. There are multiple models for weather, each with its own strengths and weaknesses. Often the differences between models depend on the weight each gives to a specific prediction feature. They all rely on patterns in history to inform future predictions.
  3. Are we solving the problem in other ways today?
    Base your machine learning projects on institutional experience. Many companies already use brain power or other tools to manually build these forecasts. Base the model’s algorithm and general process on these existing processes and you will reduce the amount of trial and error. The more an organization understands the factors that affect future performance, the more capable the model will be in supporting that effort.
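
One way to put questions 2 and 3 to work is to backtest a candidate model against the forecast the organization already produces. The sketch below is only an illustration in plain Python, not something Azure Machine Learning Studio does for you: the data is made up, and the function names (naive_forecast standing in for the existing manual process, moving_average_forecast standing in for a candidate model) are assumptions chosen for the example.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def naive_forecast(history):
    """Hypothetical stand-in for the existing manual process: repeat the last value."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Hypothetical candidate model: average of the most recent `window` values."""
    return mean(history[-window:])


def backtest(series, forecaster, start):
    """Forecast each point from `start` onward using only the data before it."""
    actual = series[start:]
    predicted = [forecaster(series[:i]) for i in range(start, len(series))]
    return mean_absolute_error(actual, predicted)


# Made-up monthly figures, for illustration only.
monthly_units = [100, 104, 98, 110, 107, 101, 115, 111, 105, 120, 116, 109]

baseline_mae = backtest(monthly_units, naive_forecast, start=3)
candidate_mae = backtest(monthly_units, moving_average_forecast, start=3)

print(f"existing process MAE: {baseline_mae:.2f}")
print(f"candidate model MAE:  {candidate_mae:.2f}")
```

If the candidate can’t beat the error of the process you already run, that is a hint that either the history isn’t as useful as hoped or the objective isn’t predictable with what you know.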

Object of the Model

The best way to build a machine learning model is to have the target objective always in focus. Consider the objective. Is it specific enough? Can it be evaluated? Is it useful? Many times, a model developer must travel multiple paths to find the components of the objective, then pull the components together into a whole. That is, build out the overall model by decomposing the objective into smaller, simpler parts. Test each part, then assemble the parts to achieve the broader objective. I will be writing a blog post in the near future about this process.
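
As a rough illustration of decomposing an objective, suppose the objective is next month’s revenue. That objective, the split into units and price, and every function name below are hypothetical; they are not taken from the client project. The point is only the shape: small parts, each tested on its own, then assembled.

```python
def predict_units(past_units):
    """Part 1 (hypothetical): forecast units sold as the average of past months."""
    return sum(past_units) / len(past_units)


def predict_price(past_prices):
    """Part 2 (hypothetical): assume next month's price equals the latest price."""
    return past_prices[-1]


def predict_revenue(past_units, past_prices):
    """Assembled objective: forecast units multiplied by forecast price."""
    return predict_units(past_units) * predict_price(past_prices)


# Check each part on its own before trusting the whole.
assert predict_units([10, 20, 30]) == 20
assert predict_price([4.0, 5.0]) == 5.0

# Then check the assembled objective.
assert predict_revenue([10, 20, 30], [4.0, 5.0]) == 100.0
```

Each part answers the three questions above by itself: it is specific, it can be evaluated, and it is useful to the larger objective.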

Problem Domain

The most important aspect of a machine learning strategy is to understand the domain in which the problem exists. Many people define this as feature engineering or simply assembling the information the model needs to learn. To me, it is more than that. To understand the problem domain is to understand the specific factors that might throw off the accuracy of the model. Organizations that understand how the domain in which they operate affects their predictability need to articulate those concerns and how they address them.

Azure Machine Learning Studio gives every organization access to business intelligence capabilities that were once limited to organizations with a great deal of maturity. The studio itself is very easy to set up and use. I recommend investing time on Khan Academy or revisiting some textbooks on statistical analysis. These concepts will feature heavily in model development and training, so get back in touch with them, or learn them, before attempting your first machine learning project.