Project Manager:
Prof. Dr. Peter Uhrig

Pose Estimation on Russian International News Media

Principal Investigators:
Prof. Dr. Peter Uhrig
Affiliation:
FAU Erlangen-Nürnberg
HPC Platform used:
NHR@FAU: Alex and Fritz

As multimodal communication analysis continues to evolve, high-performance computing (HPC) is playing a transformative role in enabling large-scale annotation and data processing. In the context of the DFG/AHRC-funded research project "World Futures Multimodal Viewpoint Construction by Russian International Media", a research team led by Anna Wilson (University of Oxford) and Peter Uhrig (FAU Erlangen-Nürnberg) has developed an innovative framework for automating speech, text, and gesture annotation. This interdisciplinary effort leverages state-of-the-art AI techniques, supported by the scalable HPC infrastructure provided by the Erlangen National High-Performance Computing Centre (NHR@FAU) in the project "Pose Estimation on Russian International News Media".
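Pose estimation typically yields per-frame body keypoints, which downstream gesture annotation then interprets. As a minimal sketch of that post-processing step (assuming COCO-style 17-keypoint output; the indices, threshold, and normalization are illustrative choices, not the project's actual pipeline), one can flag frames with rapid wrist movement as candidate gesture strokes for human annotators:

```python
# Hedged sketch: post-processing pose-estimation output for gesture
# annotation. Assumes each frame is a list of 17 COCO-style keypoints
# (x, y, confidence). All parameter choices are illustrative.

LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6   # standard COCO indices
LEFT_WRIST, RIGHT_WRIST = 9, 10

def shoulder_width(frame):
    (lx, ly, _), (rx, ry, _) = frame[LEFT_SHOULDER], frame[RIGHT_SHOULDER]
    return ((lx - rx) ** 2 + (ly - ry) ** 2) ** 0.5

def wrist_motion(frames, wrist=RIGHT_WRIST):
    """Per-frame wrist displacement, normalized by shoulder width so
    the measure is comparable across camera distances and shot sizes."""
    motion = []
    for prev, cur in zip(frames, frames[1:]):
        px, py, _ = prev[wrist]
        cx, cy, _ = cur[wrist]
        dist = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
        motion.append(dist / max(shoulder_width(cur), 1e-6))
    return motion

def candidate_strokes(frames, threshold=0.15):
    """Frame indices where normalized wrist motion exceeds a threshold:
    a crude proxy for gesture strokes, handed to human annotators."""
    return [i + 1 for i, m in enumerate(wrist_motion(frames)) if m > threshold]
```

Normalizing by shoulder width is one common way to make motion measures robust to varying framing in broadcast footage; HPC resources come into play when such per-frame processing is run over thousands of hours of video.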

Leveraging HPC to extend research potential in the humanities

Principal Investigators:
Umut Başaran, Florian Barth, George Dogaru, Prof. Dr. Philipp Wieder
Affiliation:
Georg-August-Universität Göttingen
HPC Platform used:
NHR@Göttingen

Within Text+, the NFDI consortium dedicated to building and providing infrastructure for the digital humanities, high-performance computing (HPC) is gaining ground quickly. With the arrival of large language models (LLMs), the motivation for providing HPC infrastructure increased decisively. As a consequence, a first HPC service was established, paving the way for developments in several other areas where access to HPC enables solutions that would otherwise not be feasible. Examples of HPC use in Text+ are the Text+ LLM service and the NLP tool MONAPipe.

Project Manager:
Nicolas Flores-Herr

OpenGPT-X - Evaluating the Performance of Large Language Models

Principal Investigators:
René Jäkel
Affiliation:
Technische Universität Dresden
HPC Platform used:
NHR@TUD Barnard + Alpha + Capella

OpenGPT-X has set itself the goal of creating and training open large language models (LLMs) for European languages. Existing language models focus primarily on English and hence perform unfavourably when used for the other commonly spoken European languages.
From large-scale benchmarking of multilingual LLMs to introducing Teuken-7B models, our research uncovers how tokenization and balanced datasets enhance cross-lingual performance. Join us in exploring transparent and reproducible innovations shaping the future of multilingual AI.
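One way tokenization affects cross-lingual performance is "fertility": the average number of tokens a tokenizer produces per word. A vocabulary trained mostly on English tends to shatter words of other languages into many small pieces, making those languages more expensive to process. The sketch below illustrates the metric with two toy tokenizers; these stand-ins are purely illustrative and are not OpenGPT-X or Teuken-7B code:

```python
# Hedged sketch: tokenizer "fertility" = tokens produced per word.
# Lower fertility for a language means that language is represented
# more compactly, which tends to improve cross-lingual efficiency.

def fertility(tokenize, text):
    words = text.split()
    tokens = [t for w in words for t in tokenize(w)]
    return len(tokens) / len(words)

# Toy "English-centric" tokenizer: falls back to single characters for
# non-ASCII words, as poorly adapted vocabularies tend to do.
def ascii_centric(word):
    return [word] if word.isascii() else list(word)

# Toy "multilingual" tokenizer: keeps every word whole.
def multilingual(word):
    return [word]

german = "Hochleistungsrechnen für mehrsprachige Sprachmodelle"
```

Here the multilingual toy tokenizer reaches the ideal fertility of 1.0 on the German sample, while the ASCII-centric one exceeds it because it shatters "für" into characters; at LLM scale, such differences translate directly into training and inference cost per language.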

Project Manager:
Dr. José Calvo Tello

Semi-Automatic Subject Classification with Basisklassifikation

Principal Investigators:
Dr. José Calvo Tello
Affiliation:
Georg-August-Universität Göttingen
HPC Platform used:
NHR@Göttingen

The goal of this project is to use algorithms to predict classes of the library classification system "Basisklassifikation" (which translates as "basic classification"). A library classification system is a taxonomy of predefined classes representing disciplines, subdisciplines, themes, or types of publication. Subject librarians assign one or more of these classes to each publication, allowing both end users and retrieval systems to use this annotated information to find publications. The input data consist mainly of bibliographic metadata, such as the title, the name of the publisher, the year of publication, and the language of the publication. The algorithms suggest several classes, which are then reviewed by subject librarians.
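The semi-automatic workflow described above can be sketched as follows. This is a deliberately minimal bag-of-words scorer, not the project's actual model; the class notations and training titles are invented examples used only to show the suggest-then-review pattern:

```python
# Hedged sketch of semi-automatic subject classification: a simple
# bag-of-words scorer suggests candidate Basisklassifikation classes
# from bibliographic metadata; a subject librarian reviews the
# suggestions. Class notations and titles below are illustrative only.

from collections import Counter, defaultdict

def train(records):
    """records: list of (title, class_notation) pairs from already
    classified publications. Counts word/class co-occurrences."""
    word_class = defaultdict(Counter)
    for title, klass in records:
        for word in title.lower().split():
            word_class[word][klass] += 1
    return word_class

def suggest(word_class, title, k=3):
    """Return up to k candidate classes, best-scoring first, for a
    subject librarian to confirm or reject."""
    scores = Counter()
    for word in title.lower().split():
        scores.update(word_class.get(word, Counter()))
    return [klass for klass, _ in scores.most_common(k)]
```

In practice one would replace this scorer with a trained multi-label classifier over richer metadata (publisher, year, language), but the human-in-the-loop structure stays the same: the algorithm proposes several classes, and the librarian makes the final assignment.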