Enormous challenges and opportunities lie on the horizon for those of us involved in data science, and they are wound too tightly together to be clearly distinguished. On the one hand, students, business people, and even the general public are interested in the field as never before. One need not be very old to remember the days when statistics was seen as a dry discipline of interest to few. On the other hand, the replication crisis continues to unfold and appears ever more pervasive. This has the potential to produce a crisis of faith in data science and in science more generally.
Any teacher of an introductory course is implicitly in the business of selling the discipline. Our salesmanship must be honest. It is thus paramount to communicate to students what these techniques can do as well as what they cannot, and further to communicate that even this boundary is not well understood. Students must also understand that data analytics is first a way to understand data and only then a means of testing hypotheses.
There is another important element to this. While many disciplines and business concerns use data science to support decisions, the relationship between the methods themselves and the decisions lies outside the boundaries of the science, strictly considered. This is where domain knowledge, or an “area of application” in more traditional statistical parlance, comes in. Having worked with data both in industry and in academia, I have spent a great deal of time thinking about this relationship, and I endeavor to communicate its difficulty to students.
Ultimately, students must understand that data-analytic work starts and supports conversations but does not end them. Even results that may seem decisive are only so with respect to some pre-existing model of the world. It is important to understand, then, that it is the analysis and the model together, not the data analysis alone, that are decisive. The mystique of data science lends itself to magical thinking. As teachers, we must combat this thinking in the name of intellectual and professional honesty.
For this reason, I understand data science as a Socratic discipline and an applied branch of the philosophy of science. In some sense, the very framing of the replication crisis is misleading. While there has clearly been a great deal of truly problematic work, even the most carefully done work will sometimes fail to replicate. Moreover, a single failure to replicate is hardly decisive. We need instead to think of models and methods as parts of an ongoing conversation, each encapsulating some of our current thinking. It is never appropriate to promulgate an “official scientific position.” Therefore, our goal as teachers of data science is to inculcate in students an attitude toward data analytics that is simultaneously careful, conservative, and ecumenical, one that supports decisions only as mediated by conversation. This is both an immense challenge and an immense privilege.