Reproducible research and the information economy

eResearch and the information economy

The information economy refers to the modern, continually evolving economic system in which information, knowledge, and data are the primary drivers of productivity, growth, and innovation. In this economy, the creation, distribution, and consumption of information carry more value than they do in traditional industries. The information economy relies on technological advancements, particularly in information and communication technologies (ICTs), to enable the efficient processing, storage, and sharing of data. Tech companies, digital service providers, and knowledge-intensive industries are typically seen as its key players. As biologists, however, we often overlook how our information pipelines and knowledge-sharing approaches might benefit from principles that are now deeply ingrained in almost every aspect of our daily lives.

Embracing Technological Advancements: A Pathway to Enhanced Research and Collaboration

Over the years, I have enthusiastically adopted various technological advancements, recognising their potential to elevate my research impact both locally and globally and to keep pace with the evolving global landscape. However, I have observed that not all scientists share my enthusiasm for technology, leading to a sense of alienation among some colleagues who prefer traditional research methods where buckets and spades still rule.

It appears that, for some individuals, particularly in fields such as biology or ecology, there is a belief that focusing solely on their discipline-specific subject matter is sufficient and that insights from Computer Science Departments hold little relevance. This narrow perspective, in my view, is limiting and stifles creativity.

By embracing technology, we can not only broaden our horizons but also enhance our research capabilities and expand collaboration. We must remain open-minded, explore the potential of interdisciplinary learning, and leverage technology to maximise the possibilities in our respective fields.

The Interconnected Nature of Science and Technology: An Ongoing Journey

The practice of science has changed dramatically in recent years, driven in part by the growth in computing power described by Moore’s Law, and we are now tackling global issues across vast timescales. This transformation is largely attributable to the availability of vast amounts of data, which has necessitated the development of efficient algorithms to establish connections, access subsets, and distil complex information using supervised and unsupervised data-analytical techniques.

Concurrently, this data explosion has spurred the advancement of hardware capable of handling the computational, memory, and data transfer demands of big data. While it is debatable whether hardware development has facilitated the collection of increasing amounts of data or vice versa, the ultimate takeaway remains the same: technological progress is relentless, and the practice of science must adapt swiftly to keep up. By acknowledging this interconnected nature of science and technology, we can work with agility, ensuring we remain at the forefront of scientific discovery and innovation.

Embracing Modern Technologies Across Disciplines for a Future-Ready Workforce

Modern technologies are indispensable for those of us who work with data, from extensive datasets in climate change research or computational linguistics to small-scale studies. My disregard for traditional disciplinary boundaries has enabled me to stay informed about relevant advancements, driving my determination to develop this website, The Tangled Bank. My motivation is further fuelled by the concern that many colleagues are failing to maintain the interest that continuous advancement requires.

A reluctance to embrace change affects not only us but also has a domino effect on postgraduate and undergraduate students. By not nurturing the required skills in students, academics hinder their ability to become well-rounded graduates equipped for the modern workplace and to develop transferable skills that transcend disciplinary boundaries. It is crucial to remember that many graduates, particularly those with Bachelor and Honours degrees, will pursue careers unrelated to their original fields of study. Yet they still want a degree that equips them with skills they can use wherever their future careers take them.

To foster a future-ready workforce, it is necessary that we embrace technological advancements and cultivate adaptable, interdisciplinary skill sets in the next generation of graduates.

Exemplifying the Importance of Reproducible Research and eResearch Frameworks

Consider the challenge of conducting reproducible research, which, when addressed, can resolve many eResearch framework issues. A typical PhD student spends a few months writing their thesis, which often serves as the sole evidence of degree completion. However, the majority of the learning and methodological expertise developed during the rest of the degree remains undocumented and eventually forgotten. This wealth of knowledge is rarely shared, leading to repeated dead-ends in knowledge transfer as new candidates embark on similar journeys.

Most research neglects the full data lifecycle, focusing mainly on the initial steps. The failure to share behind-the-scenes solutions results in non-reproducible research, making the scientific process opaque and fostering public mistrust. This opacity hinders collaboration among supervisors and co-investigators, increases error-proneness, and scales poorly as datasets and complexities grow. Additionally, the research process becomes less efficient due to inadequate documentation of data selection, filtering justifications, metadata tracking, data versions, and processing changes.
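The kind of documentation listed above (data versions, filtering justifications, and processing changes) can be captured with very little code. The following is a minimal sketch in Python, not a prescribed tool: the function names `fingerprint` and `filter_with_log`, the example temperature data, and the -99.0 sentinel value are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(raw_bytes: bytes) -> str:
    """Return a SHA-256 digest that identifies one exact version of a dataset."""
    return hashlib.sha256(raw_bytes).hexdigest()


def filter_with_log(rows, keep, justification, log):
    """Apply a filter and append a record of what was done and why."""
    kept = [row for row in rows if keep(row)]
    log.append({
        "step": len(log) + 1,
        "justification": justification,
        "rows_before": len(rows),
        "rows_after": len(kept),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return kept


# Hypothetical raw data: temperature readings (degrees C) by site.
raw_csv = b"site,temp_c\nA,14.2\nB,-99.0\nC,15.1\nD,16.4\n"
rows = [
    dict(zip(("site", "temp_c"), line.split(",")))
    for line in raw_csv.decode().strip().splitlines()[1:]
]

# The provenance record ties every processing step to one data version.
provenance = {"data_sha256": fingerprint(raw_csv), "steps": []}
clean = filter_with_log(
    rows,
    keep=lambda r: float(r["temp_c"]) > -50.0,
    justification="Drop -99.0 sentinel values that mark failed sensor reads",
    log=provenance["steps"],
)

print(json.dumps(provenance, indent=2))
```

Saving a record like this alongside the analysis means a supervisor or future student can see which version of the data was used, which rows were discarded, and why.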

Together, these shortcomings make the research process extremely wasteful, because the full complexity of what a typical student learns is not preserved. Addressing them is essential to promote reproducible research, enhance collaboration, and build public trust in science, ultimately contributing to a more efficient and transparent research process.

Promoting Reproducible Research through Lab Notebooks and Proper Workflow Management

Many solutions exist to address research reproducibility, but I find lab notebooks written in R Markdown in RStudio (for R users) or in JupyterLab or Jupyter Notebook (for Python users) particularly effective. These notebooks integrate code with text, so tables and figures update automatically when new data arrive. Versions can be tracked with Git, with the repository hosted on a service such as GitHub. My students are proficient in this approach, which helps ensure their work is reproducible.
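To illustrate what automatic updating means in practice, here is a minimal sketch in Python of the idea behind a notebook cell: the table is never typed by hand but is generated from the data each time the document is rendered. The functions `summarise` and `to_markdown_table` and the example data are hypothetical; real notebooks would typically rely on a data-analysis library and the notebook's own rendering.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarise(csv_text: str) -> dict:
    """Group measurements by site and return (count, mean) for each site."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["site"]].append(float(row["temp_c"]))
    return {
        site: (len(values), statistics.mean(values))
        for site, values in sorted(groups.items())
    }


def to_markdown_table(summary: dict) -> str:
    """Render the summary as a markdown table for inclusion in a report."""
    lines = ["| Site | n | Mean temp (deg C) |", "|---|---|---|"]
    for site, (n, mean) in summary.items():
        lines.append(f"| {site} | {n} | {mean:.2f} |")
    return "\n".join(lines)


# Hypothetical data: the second version adds one new measurement for site B.
data_v1 = "site,temp_c\nA,14.0\nA,15.0\nB,16.5\n"
data_v2 = data_v1 + "B,17.5\n"

# The same code produces an up-to-date table for whichever data it is given.
report_v1 = to_markdown_table(summarise(data_v1))
report_v2 = to_markdown_table(summarise(data_v2))

print(report_v1)
print()
print(report_v2)
```

Because the table is derived from the data rather than copied into the text, adding the new measurement changes the reported count and mean for site B without any manual editing.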

I advocate for the widespread adoption of lab notebooks at universities, making them a prerequisite for thesis submission in applicable disciplines. The thesis itself can be a reproducible document written in markdown and typeset to various formats such as PDF, HTML, MS Word, or eBook. This method also incorporates proper bibliography management.

This reproducible workflow complies with funding instruments that require data and code sharing, reproducibility, and open publication in line with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It is already prevalent in disciplines like ecology. While this example focuses on paper or thesis writing, technology impacts research practice across disciplines, commerce, arts, and law. A comprehensive overview is beyond our scope, but the examples provided illustrate the broader possibilities.

Citation

BibTeX citation:
@online{smit_a_j,
  author = {Smit, A. J.},
  title = {Reproducible Research and the Information Economy},
  url = {http://tangledbank.netlify.app/pages/reproducible_research.html},
  langid = {en}
}
For attribution, please cite this work as:
Smit, A. J. Reproducible research and the information economy. http://tangledbank.netlify.app/pages/reproducible_research.html.