Investigating Emergent Abilities and Their Normative Implications in Language Models by Simulating Psychology Experiments

IRIS A-1

Project description

Large language models (LLMs) are currently at the forefront of intertwining AI systems with human life and communication. Owing to rapid technological advances and their extreme versatility, LLMs now have millions of users and are on the cusp of becoming the default technology for information retrieval, content generation, problem-solving, and more. It is therefore of great importance to scrutinize their capabilities and reflect on them critically. Because current LLMs exhibit increasingly complex and novel behavior, new abilities can be detected by treating the models as participants in simulated psychology experiments that were originally designed to test humans. For this purpose, the project aims to introduce and conceptualize a new field of research called “machine psychology”. First, it will describe the links that can be forged between different fields of psychology and machine behavior research, as well as the many open research questions that empirical studies can tackle. Second, the project will define methodological rules that are central to the field of machine psychology, focusing in particular on policies for prompt design. Third, the project will conduct exemplary machine psychology studies and investigate the pitfalls that arise when an LLM’s behavior is interpreted through rich psychological concepts and terms. In this context, the project will focus especially on machine behavior that is normatively relevant, such as moral decision-making, deception abilities, or biases. Fourth, the project will critically reflect on the ethical implications of these potential new abilities for society.
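
To make the basic study design concrete, the sketch below illustrates in Python how a single test item originally written for human participants could be administered to an LLM as a prompt and its free-text reply scored. The bat-and-ball item from the Cognitive Reflection Test, the query_llm helper, and the keyword-based scoring are illustrative assumptions for this sketch, not elements taken from the project description.

```python
# Hypothetical sketch: administering one psychology test item to an LLM "participant".
# query_llm() is a stand-in for whichever model API the actual studies use.

def query_llm(prompt: str) -> str:
    """Stand-in for a real model call; replace with the API used in the study."""
    return "The ball costs 10 cents."  # canned reply so the sketch runs end to end


def score_crt_item() -> dict:
    # Bat-and-ball item from the Cognitive Reflection Test: the intuitive but wrong
    # answer is 10 cents; the reflective, correct answer is 5 cents.
    prompt = (
        "A bat and a ball cost $1.10 in total. "
        "The bat costs $1.00 more than the ball. "
        "How much does the ball cost?"
    )
    reply = query_llm(prompt).lower()

    # Crude keyword-based scoring of the free-text reply.
    if "0.05" in reply or "5 cents" in reply:
        label = "reflective (correct)"
    elif "0.10" in reply or "10 cents" in reply:
        label = "intuitive error"
    else:
        label = "unclassified"
    return {"prompt": prompt, "reply": reply, "label": label}


if __name__ == "__main__":
    print(score_crt_item())
```

In an actual study, many such items would be run across repeated trials and prompt variations, which is where the project’s proposed rules for prompt design come in.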

Project information

Project title Investigating Emergent Abilities and Their Normative Implications in Language Models by Simulating Psychology Experiments
Project leader Dr. Thilo Hagendorff 
Project number IRIS A-1