Man has always been enamored of and mystified by human intelligence. The ability to learn and to recognize patterns in seemingly unrelated things has contributed a great deal to our success on this planet. Researchers have long been on the hunt for the holy grail: replicating this intelligence in a machine and, in the process, understanding what it is that makes the human mind what it is. In a series of seminal papers, Alan Turing, who is widely considered to be the father of theoretical computer science and artificial intelligence, laid out the basics of the Turing machine, one of the first formal definitions of the general-purpose computer. The natural next question was whether a computer could think, and this gave birth to the Turing Test. These ideas on general computing have influenced most of the general-purpose languages that we use today.

## When do we need machine learning?

But this has not been enough. Though general-purpose computer languages are Turing complete, that does not solve the whole problem of human intelligence, because a computer can only solve those problems for which the algorithm is known. What if the algorithm is unknown? This is where machine learning comes in.

Let’s take an example. Jack and Mary are a newlywed couple who want to buy a place in the nearby suburbs where they can raise a family. Buying a house is one of the largest and most expensive purchases most people ever make. So Jack and Mary want an inexpensive way to monitor the price of the property they are interested in. Now, the price depends on various factors, such as the area of the land and the locality it is in, and evaluating them correctly normally takes a real estate expert. However, the expert may have biases when advising us. How can we know that we can trust that person? If only we had an algorithm or a mathematical formula that could give the correct answer, that would be perfect. This kind of problem does not seem to have any such formula. So, in this case, we will turn to machine learning.

Please note that what is explained here falls under supervised learning, a subcategory of machine learning, and does not show everything machine learning can do. That said, the aim of this post is not to cover every facet of machine learning but to give the reader some “idea” or “feel” of what machine learning is all about and how learning happens.

Coming back to the topic, machine learning is appropriate when:

* A pattern exists

* The pattern cannot be defined mathematically, or at least no one knows how to right now

* Data exists

Let’s take a look at these points. For us to apply machine learning, a pattern must exist. If the data is truly random, then no amount of machine learning will help. We can always fit a pattern to any set of data, but as soon as new random data comes in, our predictions will fall apart. A natural follow-up question is how to know beforehand whether there is a pattern in the data. One of the great ironies of machine learning is that we don’t. We get a ‘feel’ for the data, apply our techniques based on that, and hope that it works. That sounds more like ‘magic’ than the ‘science’ in ‘Data Science’.
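To see why fitting random data does not help, here is a minimal sketch. It assumes a deliberately simple "memorizing" model (a one-nearest-neighbor lookup, chosen only for illustration) and made-up data where inputs and outputs are drawn independently, so no pattern links them.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Inputs and outputs are drawn independently: there is no pattern to learn.
train = [(rng.random(), rng.random()) for _ in range(50)]
test = [(rng.random(), rng.random()) for _ in range(50)]


def nearest_neighbor_predict(x, memory):
    """Predict the y of the memorized point whose x is closest to the query."""
    return min(memory, key=lambda pair: abs(pair[0] - x))[1]


def memorizer_error(samples, memory):
    """Mean squared error of the memorizing model on the given samples."""
    total = sum((nearest_neighbor_predict(x, memory) - y) ** 2 for x, y in samples)
    return total / len(samples)


train_error = memorizer_error(train, train)  # the model reproduces what it memorized
test_error = memorizer_error(test, train)    # fresh random data is not predicted well
print(train_error, test_error)
```

The model fits the data it has seen perfectly, yet that tells us nothing about new points, which is exactly the trap described above.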

Now for the second point. The pattern should not be so obvious that it can easily be defined mathematically. If it were, we could just write a program and be done with it. Using a machine learning model in such cases is ill-advised, because a learned model always carries a certain amount of uncertainty. It will not give us the accuracy that we get by simply coding the known formula. So go with machine learning only if the pattern cannot be pinned down exactly.

Lastly, the data must exist. If there is no data, then there is nothing else to do. Period.

### The Process

We are going to describe a home as a vector, with each component of the vector corresponding to some attribute of the home. The first component may be the area of the house, the second the distance of the house from the nearest store, and so on.

[a1, a2, a3, …, an], where

* a1 -> area of the house

* a2 -> distance of the house from the nearest store

* a3 -> previous selling price of the house

* … and so on, up to an
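As a rough illustration, the vector above could be built in code like this. The class name `House`, the field names, the units, and the sample numbers are all hypothetical and chosen only for the sketch.

```python
from dataclasses import dataclass


@dataclass
class House:
    # Field names and units are illustrative assumptions, not from a real data set.
    area_sqft: float          # a1: area of the house
    store_distance_km: float  # a2: distance of the house from the nearest store
    previous_price: float     # a3: previous selling price of the house

    def to_vector(self) -> list[float]:
        """Return the attributes in a fixed order, so position i always means the same thing."""
        return [self.area_sqft, self.store_distance_km, self.previous_price]


# Made-up example values.
house = House(area_sqft=1500.0, store_distance_km=0.8, previous_price=250_000.0)
x = house.to_vector()
print(x)  # [1500.0, 0.8, 250000.0]
```

Keeping the order of components fixed matters, because a learning algorithm only sees positions in the vector, not attribute names.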

Such a house will have an optimum price, which we can call the value of the house. If we call the above vector x, then we assume there exists some function f such that f(x) = y, where y is the value of the house. Machine learning aims to find f, the ideal formula. The issue is that f is unknown and could be anything: it could be any one of the set of all possible functions, which is infinite. So how do we find f? We cannot write it down directly. Instead, using the data set that we have, we will formulate x and y and then try to guess, using machine learning methods, a function g such that g ≈ f (g is an approximation of f). So while f is unknown to us, g is very much known: it is the final hypothesis, picked from a set of candidate functions called the hypothesis set. We are hoping that g is a good estimate of f.
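One way to make "g is a good estimate of f" concrete is to measure how far g's predictions fall from the observed values. The sketch below assumes mean squared error as that measure, and uses made-up (area, price) pairs and two hypothetical candidate functions.

```python
# Made-up (area, price) pairs standing in for observations of the unknown f.
data = [(1000.0, 200_000.0), (1500.0, 300_000.0), (2000.0, 400_000.0)]


def mean_squared_error(g, samples):
    """Average squared gap between g(x) and the observed y."""
    return sum((g(x) - y) ** 2 for x, y in samples) / len(samples)


def g_rough(area):
    return 150.0 * area  # hypothetical guess: 150 per unit of area


def g_better(area):
    return 200.0 * area  # hypothetical guess: 200 per unit of area


print(mean_squared_error(g_rough, data))   # large: this guess is far from the data
print(mean_squared_error(g_better, data))  # 0.0 on this toy data
```

We never see f itself. We only see how well each candidate g agrees with the data we have.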

### The Hypothesis Set and Learning Algorithm

We can have the whole thing in the form of the following diagram.

So, we start by defining x and y and asking whether there is a function f such that f(x) = y. Then we take the data set and apply a learning algorithm to produce a function g that is a good approximation of f. There is another crucial ingredient in the mix: the hypothesis set. The learning algorithm and the hypothesis set are the tools that we choose in order to arrive at the final solution. The hypothesis set is a set of functions that share common properties that make each of them a viable candidate for the final hypothesis. We apply the learning algorithm to the hypothesis set and it chooses a specific function from that set. For example, the hypothesis set could be linear regression and the learning algorithm gradient descent, or the hypothesis set could be neural networks and the learning algorithm backpropagation. In other words, the hypothesis set is a set of similar functions that have been shown to give good results under specific conditions, and the learning algorithm is the procedure that does the actual searching. It searches through the space of possible hypotheses and produces the specific function that we will use.
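The split between the two tools can be sketched in a few lines. Here the hypothesis set is assumed to be the lines g(x) = w * x, and the learning algorithm is a deliberately naive search over a handful of candidate weights (not gradient descent). The function names and the data are hypothetical.

```python
def make_hypothesis(w):
    """One member of the hypothesis set: the line g(x) = w * x."""
    return lambda x: w * x


def learn(samples, candidate_weights):
    """A naive learning algorithm: try every candidate and keep the one with least error."""
    def squared_error(w):
        g = make_hypothesis(w)
        return sum((g(x) - y) ** 2 for x, y in samples)

    best_w = min(candidate_weights, key=squared_error)
    return best_w, make_hypothesis(best_w)


samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # made-up data
weights = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # the part of the hypothesis set we search
best_w, g = learn(samples, weights)
print(best_w)  # 2.0
```

Swapping in a different hypothesis set or a smarter search changes the tools, but the overall shape stays the same: search a family of functions and return one of them.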

### An example of machine learning

Now, working with the example of predicting home values, let’s take a simple hypothesis set: linear regression. For simplicity, let’s take only one dimension of x, that is, one feature: the area of the house.

So, the input is x = (x1, x2, x3, …, xN), the areas of N houses.

The output y = (y1, y2, y3, …, yN) holds the corresponding values of those houses. Say this data set can be plotted in the graph shown below. Please note that this graph is possible only because we are taking a single feature of the data set. In many cases the data set has a very large number of dimensions, so we cannot visualize it in a graph and need to resort to other methods to get a feel for the data.

Now, we will try to fit a straight line to the data. Since we have taken the hypothesis set to be linear regression, the family of equations that we will consider is shown below.

We can convert this equation into matrix form, which can be written as follows:

where h is the output vector, and Θ and X are the model weights and the input matrix respectively. To start, we initialize Θ to random values. This gives us a random line.
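For reference, a common textbook way to write the single-feature linear family and its matrix form is sketched here. The symbols θ0 (intercept) and θ1 (slope), the subscript on h, and the leading column of ones in X are assumptions about notation, following the usual convention rather than anything fixed by this post.

```latex
% Single-feature linear hypothesis (assumed notation)
h_\Theta(x) = \theta_0 + \theta_1 x

% Matrix form over all N houses (assumed convention: a column of ones for the intercept)
h = X\Theta, \qquad
X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix}, \qquad
\Theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}
```

Each choice of Θ picks out one line, so the hypothesis set is simply all possible values of Θ.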

Now that we have the hypothesis set, we need to decide on a learning algorithm. The learning algorithm is the procedure we use to search the hypothesis set. In this case, we will choose gradient descent. We run through the whole training set and compute the value of h for each value of x. Gradient descent then updates the weight vector Θ so that, on the next pass, the error between h and the real values of y is smaller. After iterating through the training set in this way, possibly hundreds of thousands of times, we arrive at a line, that is, a value of Θ, that represents the final hypothesis g.
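The loop described above can be sketched in plain Python. This is a minimal version under several assumptions: batch gradient descent on the mean squared error, a two-element Θ (intercept and slope), a hand-picked learning rate and iteration count, and made-up data with areas scaled to thousands of square feet so the updates stay stable.

```python
import random


def predict(theta, x):
    """Hypothesis h(x) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x


def gradient_descent(xs, ys, learning_rate=0.1, iterations=10_000, seed=0):
    """Fit a line to (xs, ys) by batch gradient descent on the mean squared error."""
    rng = random.Random(seed)
    theta = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]  # random starting line
    n = len(xs)
    for _ in range(iterations):
        # Error of the current line on every training example.
        errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of (1 / 2n) * sum(error^2) w.r.t. theta[0] and theta[1].
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        # Step against the gradient so the next pass has a smaller error.
        theta[0] -= learning_rate * grad0
        theta[1] -= learning_rate * grad1
    return theta


# Made-up data: area in thousands of square feet, price in thousands.
areas = [1.0, 1.5, 2.0, 2.5]
prices = [200.0, 300.0, 400.0, 500.0]
theta = gradient_descent(areas, prices)
print(theta)  # intercept close to 0, slope close to 200
```

The returned Θ is the final hypothesis g for this toy data. On real data the learning rate and the number of iterations usually need tuning, and features on very different scales are often normalized first.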

Machine learning is slowly making inroads into every aspect of our lives, and soon it will be imperative for us as developers to be able to understand data and build models. I hope that through this post you have been able to grasp some of the beauty of this interesting field.