Yes! It is a huge world. A really, really huge world in which you have infinitely many ways to achive a goal unless there are some restrictions which always exist fortunately. You may find it little bit murky, little bit daunting, little bit overwhelming, and a little bit scary in the beginning. You may not understand anything about it, you will probabily stumble over and over again, and you definitely get lost in the way. If, however, you have a passion for it, you will see that all those things will be the underlying force behind your progress, and ultimately your success.
Yes, I am talking about Data Science. From now on, I will write my topics in English and they will cover my journey of data science. I want to write about it because there are ever incresing demand for data science and there are also lots of people like me(in the beginning of my journey) who don’t know where to start the journey! Well, let’s start.
Everything started with the book Elements of Statistical Learning, Data Mining, Inference, and Prediction. During Analytics Edge course on edX from MIT, one of the successful students said that we should read and understand at least the first eight chapters of this book in order to be competent at data science. Then I decided to read the book. At the first glance, the book looked very attractive and fun to read, but all those good thoughts immediately vanished after a copule of pages. That didn’t happen due to the book, but happened due to my lack of basic knowledge about certain subjects. I want to go over them chapter by chapter. Since the first chapter is an introduction, I skip it. All the subjects from chapter 2 to chatper 8 are the building blocks of every statistical and machine learning method.
The second chapter of the book is Overview of Supervised Learning. It starts with basic
terminologies which we will encounter frequently in this journey. It assumes that you have basic knowledge of calculus, statistics, probability theory, and linear algebra. If you don’t, then you are in trouble! Fortunately, we are living in a world where we are just a couple of clicks away from anything we want to learn. There are lots of invaluable sources on the Internet.
– If you want to learn calculus, or refresh your knowledge of calculus, Khan Academy is a perfect place.
– If you want to learn statistics and linear algebra, there are lots of good courses on Coursera and edX. For linear algebra Linear Algebra from MIT given by Prof. Gilbert Strang is an amazing course. It covers all the subjects of linear algebra you need to know for data science. Especially spaces and matrix decompositions are the fundamentals.
– If you learn probability theory, there is a very good course called Introduction to Probability – The Science of Uncertainty. It follows the book Introduction to Probability, 2nd Edition, which I love much.
In order to understand this chapter and the followings, you will need at least basic understanding of those subjects. You will frequently encounter “differentiating w.r.t, integrating, probability distribution, probability density function, least squares, bias, variance”. If you are not familiar with those, then you can get familiar with them using the sources I wrote above. Furthermore, this chapter puts all the statistical and machine learning methods on a spectrum on which traditional least squares regression represents one end and K-Nearest Neighbor represents the other end, and any other algorithms are put in-between. The least squares regression is the most rigid one and has the lowest variance but potentially has higher bias, whereas KNN has the lowest bias but potentially higher variance. All those terms may make no sense for now, but they make sense as we progress.
The third chaper is Linear Methods for Regression. We will encounter regression
problems frequently in which we want to estimate a continuous outcome or response from various inputs. To understand the building blocks of regression, you must have an understanding of linear algebra, or vector algebra, or matrix algebra, all of which are used interchangeably. All of the computation and algorithms are created using linear algebra. It also covers the best subset selection and shrinkage methods, again frequently used in machine learning. For example, it is always said that correlated features are bad for regression. But if I say that “it is bad because the algorithm that minimizes the squared errors need to computer the inverse of feature matrix, and for this feature matrix to be invertable it has to be full rank. In other words, the columns of it have to be linearly independent. If the columns are not linearly independent (two or more features are highly correlated), then it probabily will have no unique inverse.”, doesn’t it make much sense? This sentence includes lots of linear algebra and differentiation. If you do not understand it, you need to go over the source I wrote, if you want to be a real data scientist I guess. Yeah, it comes with a price. Understanding of shrinkage is also very important because it helps prevent overfitting. Ridge Regression, for example, adds a positive constant to the diagonal entries of your input matrix before inversion. This makes the problem nonsingular even if original input matrix is not of full rank. The LASSO(Least Absolute Shrinkage and Subset Selection) is another regularization method that performs both subset selection and shrinkage. If you want to know where and why to use them, you need to get your hands dirty.
The fourt chapter is Linear Methods for Classification. This time, linear models are developed for classification tasks. This chapter includes Linear Discriminant Analysis(LDA), Quadratic Discriminant Analysis(QDA), Logistic Regression, and Separating Hyperplanes(Rosenblatt’s Perceptron and Optimal Separating Hyper Plane). These subjects require at least elementary knowledge of statistics and probability theory. Logistic Regression and LDA have much things in common, but the main difference lies in the way they are fitted. If you want to understand it, you need to know math. And If you understand it, you will know when to use which, which makes you faster and wiser. This covers Bayes Theorem form probability.
Chapters 5, 6, 7, and 8 are build upon these four chapters. I will go over them in the next post. Until then, challenge yourself. Try to understand what those chapters say in essence.