Thoughts on exploratory data analysis

Can developing critical thinking, beyond purely statistical issues, help improve data analysis practice, especially in exploratory settings?

gdsp
good practice
data science
critical thinking
Published March 10, 2022

Good Data Science Practice: Moving Toward a Code of Practice for Drug Development. Statistics in Biopharmaceutical Research, April 2022. DOI:10.1080/19466315.2022.2063172

There is growing interest in data science and in the challenges that scientists can solve by applying it. This interest stems in part from the promise of “extracting value from data.” The pharmaceutical industry is no different in this regard, as reflected by the advances and excitement surrounding data science. Data science brings new perspectives, new methods, new skill sets and wider use of new data modalities. For example, there is a belief that extracting value from data integrated from multiple sources and modalities, using advances in statistics, machine learning, informatics and computation, can answer fundamental questions. These questions span a variety of themes, including disease understanding, drug and target discovery, and trial design. By answering them, we can not only increase knowledge and understanding but, more importantly, inform decision making: accelerating drug development through data-driven prioritization, increasingly precise and accurate measurements, optimized trial designs and operational excellence. However, there are obstacles to overcome if data science is to live up to this promise and deliver a positive impact. These obstacles include reaching consensus on the definition of data science, its relationship to existing fields such as statistics and computing science, what the day-to-day practice of data science should involve, and what constitutes “good” practice. In this article, we cover these themes, highlighting issues with scientific practice from five perspectives, and argue that data science will not be immune to them, especially in its exploratory, investigative, and innovative activities. We propose a definition of data science as a coming together, but also a refocusing, of established disciplines, leading to a framework for good practice.
In doing so, we aim to begin a dialogue on good data science practice in the context of drug development, where there is no industry view or consensus.