CAIRN-INT.INFO : International Edition

This book is for anyone wishing to discover data science. As the subtitle indicates, it includes an introduction to machine learning [1] through practice. The book is composed of three parts, the last of which is devoted exclusively to practical case studies – an effective teaching tool presenting the practical aspects of machine learning algorithms – while the first two parts establish the fundamental elements.

The book is made up of relatively short, titled information sheets or sections, which makes it easy to move between the different concepts. The first sheets define terms – a useful service even for experienced researchers. The sheet on logistic regression, for example, offers a different perspective from those currently presented in statistics or social science methods manuals.

The book teaches methods to beginners but is also useful in substantiating ongoing analyses. Its three parts correspond chronologically to the three major phases of producing an analysis: problem definition, method choice, and implementation.

In Part I, which is quite short, the authors specify the types of problems the book is designed to solve as well as the tools needed to do so. This part includes a chapter on programs that also helps clarify the book’s coverage.

Part II presents machine learning analysis methods over 14 chapters. It follows the usual order of books introducing a complex topic, separately presenting standard analytic methods and more contemporary ones. The latter include random forests, gradient boosting, and the support vector machine (SVM), all of which may raise questions for novices. The section on random forests offers a wonderfully clear presentation that covers all the essentials in no more than ten pages.

Part III is the most important, though it relies heavily on the other two, especially Part I. It describes in detail a substantial number of practical cases chosen by the authors from the Kaggle web platform, and so gives readers direct access to the data sets and programs. The methods presented in Part II are supplemented here with practical methods, including a presentation of online learning (p. 219), which can be used if the data set is large enough.

The authors also point out the strengths and weaknesses of the two components of data science, namely statistics and information technology, explaining how the former helps in understanding developments in the latter. They rightly favor wider diffusion of visualization methods – a point on which there can be no disagreement. But it would have been useful to specify how the methods complement each other and the different types of assistance they can provide. It is also regrettable that French scientific language was not used: English terms are everywhere despite the existence of French equivalents.

The far-ranging bibliography, emphasizing recent books and articles but including classic, fundamental studies, is another of the book’s strengths. Relevant references are conveniently listed directly after each information sheet or section. Overall, the book is an excellent toolbox for newcomers to machine learning and is also quite pleasant to read. Readers will benefit from the authors’ long professional experience as both teachers and users of the methods they present.


  • [1] Machine or statistical learning refers to the development, design, application and analysis of methods that enable machines (in the broad sense) to process data systematically and so to use classic algorithms to carry out difficult tasks or resolve problems.
Elisabeth Morand
Electronic distribution for I.N.E.D © I.N.E.D. All rights reserved for all countries. It is prohibited, without the prior written consent of the publisher, to reproduce this article in whole or in part (in particular by photocopying), to store it in a database, or to communicate it to the public in any form or by any means whatsoever.