Data Mining: Change vs Continuity
Big Data, Artificial Intelligence, and Machine Learning are no longer topics that reside exclusively within Information Systems and Data Mining. Their significant social and business benefits are now widely recognized, and these topics have entered the public vernacular. This increased public awareness has further emphasized the role of data mining as the key business discipline in achieving these benefits. In this new paradigm, the volume, variety (structured, semi-structured, and unstructured), and velocity of data pose new challenges when determining the right data mining approach. These challenges are compounded by the recent breakthroughs in artificial intelligence, which deliver significantly improved performance across a variety of tasks and which have themselves come about through the use of Big Data. More data and better algorithms: what does it all mean for the data scientist? Does the role of the data scientist need to be reinvented?
In answering these questions, it is helpful to look at other disciplines which have undergone great change. One good example is the field of medicine, which has always been in flux, yet a main body of knowledge has remained fundamentally constant and forms the core subjects at any medical school. The point is not to compare data science and medicine in terms of their specialized areas of expertise, but to note that data science also has a body of knowledge which has remained constant since its birth at the credit card companies of the 1950s.
In Data Mining for Managers, industry veteran Richard Boire presents a four-step approach that underlies all data mining projects and exercises. These four steps are:
- Identification of the Business Problem
- Creation of the Analytical File
- Applying the Data Mining Techniques
- Measurement and Implementation
Each of these steps is explored in great detail with case studies across all industries. From the learning obtained through these rich examples and case studies, the reader gains a much fuller appreciation of the importance of each stage within the data mining process.
Software tools are providing more automated ways of conducting various tasks. This simply allows the data miner to perform more work in a much shorter timeframe. For example, in the area of predictive analytics, we are observing the emergence of many automated modelling tools that can build dozens of different mathematical models on the fly. The data scientist then needs to determine the best model based on his or her knowledge of how to evaluate models. However, the selection of that model will also depend on how explainable the solution is to the business end user, as well as on the level of comfort in implementing it. The data miner will also need to be very knowledgeable about the analytical file that was used as input to these automated routines. It is therefore imperative that the data miner rely on the foundational principles and tasks of data mining in order to leverage these tools more fully. An automated tool in the hands of an inexperienced data miner is a recipe for disaster, as that person does not have the rigor and discipline of data mining which is so clearly laid out in Boire’s book.
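To make the model selection point concrete, here is a minimal sketch in Python of one way a data miner might trade off accuracy against explainability once an automated tool has produced its candidates. It is not taken from Boire’s book: the `Candidate` class, the `choose_model` function, the `tolerance` parameter, and the scores shown are all hypothetical and made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    validation_score: float  # hypothetical holdout score, higher is better
    explainable: bool        # can the business end user follow the model?


def choose_model(candidates, tolerance=0.01):
    """Prefer an explainable model if it scores within `tolerance` of the best."""
    best_score = max(c.validation_score for c in candidates)
    # Keep only the models whose score is close enough to the top performer.
    close_enough = [
        c for c in candidates if best_score - c.validation_score <= tolerance
    ]
    # Among those, favour the explainable ones; fall back to all if none exist.
    explainable = [c for c in close_enough if c.explainable]
    pool = explainable or close_enough
    return max(pool, key=lambda c: c.validation_score)


# Illustrative, invented scores standing in for an automated tool's output.
candidates = [
    Candidate("gradient_boosting", 0.842, explainable=False),
    Candidate("logistic_regression", 0.836, explainable=True),
    Candidate("decision_tree", 0.801, explainable=True),
]

chosen = choose_model(candidates)
print(chosen.name)
```

With these invented numbers the sketch passes over the slightly stronger but opaque model in favour of the explainable one. How large `tolerance` should be is a business judgment, which is exactly where the data miner's experience comes in.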
Even in the creation of the analytical file, the ability to “code” in order to develop this file is no longer mission-critical. Certain software companies now provide tools where no programming code is required to create an analytical file or to conduct basic analytics. But once again, the data miner needs to understand the process of how to “work” data in order to create an analytical file, a process which is emphasized very strongly in Boire’s book.
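As a rough illustration of what “working” data into an analytical file can involve, the sketch below rolls raw transaction records up into one row per customer. It is an assumption-laden example rather than the method described in the book: the `build_analytical_file` function, the field names, and the sample transactions are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw records: (customer_id, purchase_date, amount).
transactions = [
    ("C001", date(2023, 1, 5), 40.0),
    ("C001", date(2023, 3, 9), 60.0),
    ("C002", date(2023, 2, 1), 25.0),
]


def build_analytical_file(transactions, as_of):
    """Summarize raw transactions into one row of variables per customer."""
    per_customer = defaultdict(list)
    for customer_id, when, amount in transactions:
        per_customer[customer_id].append((when, amount))

    rows = []
    for customer_id, records in sorted(per_customer.items()):
        amounts = [amount for _, amount in records]
        last_purchase = max(when for when, _ in records)
        rows.append({
            "customer_id": customer_id,
            "n_purchases": len(amounts),                      # frequency
            "total_spend": sum(amounts),                      # monetary value
            "days_since_last": (as_of - last_purchase).days,  # recency
        })
    return rows


analytical_file = build_analytical_file(transactions, as_of=date(2023, 4, 1))
for row in analytical_file:
    print(row)
```

A no-code tool may perform the same roll-up behind the scenes, but the data miner still has to decide which variables to derive and as of which date.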
The data mining role will evolve so that the emphasis falls on those hybrids who can understand both data and mathematical output and can best determine how to apply this knowledge in solving a given problem. This is the high-growth area of data mining. Not only that, but it represents a discipline of the future that will be critical for anyone seeking potential opportunities in the C-suite. Through many case studies and exercises, Boire’s book is well placed to provide real-life learning that emphasizes the hybrid approach. This book is a must-read for anyone who is considering a career in the exciting and ever-evolving area of data mining.
Richard Boire, author of Data Mining for Managers, is a recognized authority on predictive analytics and is among the top five experts in this field in Canada. This expertise has evolved into international speaking assignments and workshop seminars in the U.S., England, Eastern Europe, and Southeast Asia. He has also chaired numerous conferences on this topic within Canada and is the current Predictive Analytics World Conference Chair within Canada.