Data Integrity

AI models are only as good as the data used to train them. In biology, this often means large, sensitive datasets, from genomic sequences to patient health records; for engineers, it could mean data on the structural health of a treasured building. Anyone using these data has a duty to ensure that they are collected ethically, anonymised where required, and used only with explicit, informed consent.
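What anonymisation involves is easy to gloss over. As a minimal sketch, assuming tabular records with hypothetical field names, direct identifiers can be dropped and stable IDs replaced with salted hashes; a real project would add governance, access controls and stronger guarantees such as k-anonymity on top.

```python
import hashlib

# Hypothetical patient records; all field names and values are illustrative.
records = [
    {"patient_id": "P-0001", "name": "A. Patient", "age": 34, "genotype": "AA"},
    {"patient_id": "P-0002", "name": "B. Patient", "age": 51, "genotype": "AG"},
]

SALT = "replace-with-a-secret-salt"  # stored separately from the data in practice

def pseudonymise(record: dict) -> dict:
    """Drop direct identifiers and replace the stable ID with a salted hash."""
    token = hashlib.sha256((SALT + record["patient_id"]).encode()).hexdigest()[:12]
    return {"pseudonym": token, "age": record["age"], "genotype": record["genotype"]}

print([pseudonymise(r) for r in records])
```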
The Responsibility of the Researcher
Data provenance – the origin, history and documented changes made to a dataset across its lifetime.
Beyond consent lies the deeper issue of data provenance. Scraped or crowd-sourced data from unverified sources can compromise reliability and propagate misinformation. As researchers, we must assess the source of our data, how they were generated, and whether they were produced with the necessary scientific rigour and care.
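Provenance is easier to defend when it is recorded alongside the data rather than reconstructed later. A minimal sketch of such a record follows; the `ProvenanceRecord` fields and the commented usage are illustrative assumptions, not a standard schema.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    source: str                                   # origin of the dataset
    collected_by: str                             # who generated or curated it
    sha256: str                                   # fingerprint of the exact file used
    history: list = field(default_factory=list)   # documented changes over its lifetime

    def log_change(self, description: str) -> None:
        """Append a timestamped entry describing a change to the data."""
        self.history.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "change": description,
        })

def fingerprint(path: str) -> str:
    """Hash a data file so the exact version used can be verified later."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical usage, assuming a local file "variants.csv" exists:
# record = ProvenanceRecord(
#     source="https://example.org/cohort-release-2",
#     collected_by="Sequencing core, consented cohort",
#     sha256=fingerprint("variants.csv"),
# )
# record.log_change("Removed rows with missing genotype calls")
# print(json.dumps(asdict(record), indent=2))
```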
Additionally, AI can obscure the human agency behind research findings. If an algorithm generates a result or suggests a pattern, it remains the researcher's responsibility to interrogate that output. Blind trust in AI predictions, especially in high-stakes settings such as drug discovery or assessing the structural integrity of a high-rise building, can lead to harmful consequences. Equally, the development, training and validation of new models must be reported transparently, so that their limitations are clear and misapplication can be avoided.
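One concrete way to interrogate a model's output, rather than trust it blindly, is to compare it against held-out labels and a trivial baseline before acting on it. The sketch below illustrates the idea with invented numbers; in a real study the baseline, metric and data would be chosen to match the task.

```python
# Sanity-check a model's predictions against held-out labels and a
# trivial baseline before trusting them. All numbers are invented.
held_out_labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_predictions = [1, 0, 1, 0, 0, 1, 1, 0]

def accuracy(preds, labels):
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

model_acc = accuracy(model_predictions, held_out_labels)

# Majority-class baseline: always predict the most common label.
majority = max(set(held_out_labels), key=held_out_labels.count)
baseline_acc = accuracy([majority] * len(held_out_labels), held_out_labels)

print(f"model accuracy:    {model_acc:.2f}")
print(f"baseline accuracy: {baseline_acc:.2f}")
if model_acc <= baseline_acc:
    print("Warning: no better than a trivial baseline; interrogate before use.")
```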