Data Science and Scientific Workflows

Content

The amount of data generated in scientific projects is increasing rapidly, partly because new data-driven evaluation methods allow better and more precise analysis of scientific data. In addition, linking data sets yields new insights. Both developments require data to be organized systematically, and the necessary knowledge of data science and computer science is needed equally for computer simulations and for experimental investigations. Preparing, classifying (e.g. in an electronic laboratory notebook) and structuring data are necessary steps for its reuse. The lecture introduces the principles and software tools for the corresponding scientific workflows: Python and its libraries, Jupyter notebooks, shell scripts, and documentation with git tools. It also gives an overview of database systems in materials research and of the FAIR data principles (Findability, Accessibility, Interoperability and Reusability).

Objectives:

Students will be able to

- organize and document data electronically
- handle simple and hierarchical data formats
- work with software management tools (git, GitLab)
- record scientific workflows in detail and ensure their traceability
- use Python-based libraries for data handling and analysis

Detailed lecture content:

  1. Introduction: the need for basic knowledge of data science and computer science
  2. Programming and programming paradigms in Python
  3. Software and data management: local and central management (git, GitLab)
  4. Automating tasks: from scripts to workflows (with many examples from simulation and experiment)
  5. Data processing
  6. Electronic laboratory notebook
  7. Data management requirements for publicly funded projects

Exercise:

The lecture material is consolidated in the exercises (1 SWS, i.e. one contact hour per week during the semester).

Mode of examination:

  • Project on a topic from one of the following areas:
    • Materials simulation and workflows
    • Data organization and analysis (from experiment or simulation)
  • Presentation of the project in a 15-minute talk, followed by questions
  • Examination prerequisite: successful start of the project work

Language of instruction: German
Bibliography

  • Handbuch Data Science, Hanser Verlag
  • Effective Computation in Physics, Scopatz & Huff, O’Reilly 2015
  • Python Data Science Handbook, J. VanderPlas, O’Reilly 2016