AI Safety, Ethics, and Society

The blog post below is partly adapted from the book's preface.

We are excited to announce the launch of AI Safety, Ethics, and Society, a new textbook and online course. The course aims to provide an introduction to how current AI systems work, why many experts are concerned that continued advances in AI could pose severe societal-scale risks, and how society can manage and mitigate these risks.

The textbook and other course materials are available to read online here and are forthcoming in print with Taylor & Francis. We will be running an online course based on the textbook from July to October 2024; apply here by May 31st to take part.

Key features

Interdisciplinary approach. Ensuring that AI systems are safe is more than just a machine learning problem: it is a societal challenge that cuts across traditional disciplinary boundaries. A full understanding of the risks posed by AI requires knowledge from several disparate academic disciplines, which have so far not been combined in a single text. This book was written to fill that gap and adequately equip readers to analyze AI risk. It takes a holistic approach, drawing on insights from engineering, economics, and other relevant fields. The goal is to foster a thoughtful and nuanced understanding of AI safety, equipping readers to engage with the technical, ethical, and governance challenges that we will need to meet in order to harness advanced AI in a beneficial way. The book does not assume prior technical knowledge of AI.

Time-tested, flexible frameworks. To have a solid grasp of the challenges of AI safety, it is important to consider the broader context within which AI systems are being developed and applied. The decisions of, and interplay between, AI developers, policymakers, militaries, and other actors will play an important role in shaping this context. Since AI influences many different spheres, this book focuses on time-tested, formal frameworks that provide multiple lenses for thinking about AI, the relevant actors, and AI's impacts. The frameworks and concepts used are highly general and are useful for reasoning about various forms of intelligence, ranging from individual human beings to corporations, states, and AI systems. While some sections of the book focus more directly on AI risks that have already been identified and discussed, others set out a systematic introduction to ideas from game theory, complex systems, international relations, and more. We hope that providing these flexible conceptual tools will help readers adapt to the ever-changing landscape of AI risks.

Table of contents

1. Overview of Catastrophic AI Risks: Diverse sources of societal-scale risks from advanced AI, such as malicious use, accidents, rogue AI, and the role of AI racing dynamics and organizational risks

2. AI Fundamentals: Basics of modern AI systems and deep learning, scaling laws, and their implications for AI safety

3. Single-Agent Safety: Technical challenges in building safe AI, including opaqueness, proxy gaming, and adversarial attacks, and their consequences for managing AI risks

4. Safety Engineering: Robust approaches to analyzing and mitigating risks throughout the broader sociotechnical systems of which AI is a part, and approaches to managing low-probability, high-impact risks

5. Complex Systems: Analysis of AI systems, and the societies in which they operate, as complex systems; interventions to improve safety need to account for the unique properties of such systems

6. Beneficial AI and Machine Ethics: Opportunities and challenges in identifying beneficial goals and values and instilling these in AIs

7. Collective Action Problems: The role of game theory and bargaining theory in understanding competitive pressures in AI development and obstacles to effective governance; challenges in building cooperative AI systems

8. Governance: Options for AI governance at the corporate, national, and international levels; trade-offs between centralized and decentralized access to advanced AI

Outline

The textbook’s content falls into three sections: AI and Societal-Scale Risks, Safety, and Ethics and Society. In the AI and Societal-Scale Risks section, we outline major categories of AI risks and introduce some key features of modern AI systems. In the Safety section, we discuss how to make individual AI systems safer. But if we can make them safe, how should we direct them? To answer this, we turn to the Ethics and Society section and discuss how to build AI systems that promote our most important values. In this section, we also explore the numerous challenges that emerge when there are multiple AI systems or multiple AI developers with competing interests.

Figure 1. Left: a "feature visualization", a synthetic image optimized to strongly activate a particular neuron in a neural network. Right: a collection of natural images that strongly activate a particular neuron. These are examples of the approaches discussed in the book for making the internal operations of AI systems more transparent and interpretable.
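
For readers curious how feature visualization works in practice, here is a minimal sketch of one common recipe, activation maximization: starting from random noise, gradient ascent adjusts an input image to increase a chosen unit's activation. The toy network, optimizer settings, and channel index below are illustrative assumptions, not the setup behind the figure.

```python
# A minimal sketch of feature visualization via activation maximization.
# The tiny untrained CNN, optimizer settings, and channel index are
# illustrative assumptions; in practice this is run on a trained model.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy convolutional network standing in for a trained vision model.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
)
model.eval()

# Start from random noise and ascend the gradient of one unit's activation.
image = torch.randn(1, 3, 64, 64, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

channel = 5  # which feature map ("neuron") to visualize
for _ in range(200):
    optimizer.zero_grad()
    activation = model(image)[0, channel].mean()  # mean activation of the channel
    (-activation).backward()  # maximizing activation = minimizing its negative
    optimizer.step()

print(f"final mean activation: {model(image)[0, channel].mean().item():.3f}")
```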

The AI and Societal-Scale Risks section starts with an informal overview of AI risks, which summarizes many of the key concerns discussed in this book. We outline some scenarios in which AI systems could cause catastrophic outcomes. We split risks across four categories: malicious use, AI arms race dynamics, organizational risks, and rogue AIs. These categories can be loosely mapped onto the risks discussed in more depth in the Governance, Collective Action Problems, Safety Engineering, and Single-Agent Safety chapters, respectively. However, this mapping is imperfect, as many of the risks and frameworks discussed in the textbook are more general and cut across scenarios. Nonetheless, we hope that the scenarios in this first chapter give readers a more concrete picture of the risks explored in this book. The AI Fundamentals chapter gives an accessible, non-mathematical explanation of current AI systems, setting out concepts in machine learning, deep learning, scaling laws, and so on. This provides the necessary foundations for the discussion of the safety of individual AI systems in the next section.
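
To give a taste of one such concept, scaling laws describe how a model's loss tends to fall smoothly, roughly as a power law, as parameters, data, or compute increase. The sketch below illustrates the functional form only; the coefficients are invented for the example rather than taken from the book or any real model.

```python
# A rough numerical illustration of a neural scaling law: loss falling as a
# power law in parameter count, L(N) = a * N**(-alpha) + c. The coefficients
# below are invented for illustration, not fitted to any real model.
a, alpha, c = 400.0, 0.3, 1.7  # hypothetical coefficients

def loss(n_params: float) -> float:
    """Predicted loss for a model with n_params parameters."""
    return a * n_params ** -alpha + c

for n in (1e6, 1e8, 1e10, 1e12):
    print(f"{n:8.0e} parameters -> predicted loss {loss(n):.3f}")
```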

The Safety section aims to provide an overview of core challenges in safely building advanced AI systems. It draws on insights both from machine learning research and from general theories of safety engineering and complex systems, which provide a powerful lens for understanding these issues. In Single-Agent Safety, we explore challenges in making individual AI systems safer, such as bias, transparency, and emergence. In Safety Engineering, we discuss principles for creating safer organizations and how these may apply to those developing and deploying AI. A robust safety culture at organizations developing AI is crucial so that they do not prioritize profit at the expense of safety. Next, in Complex Systems, we show that analyzing AIs as complex systems helps us to better understand the difficulty of predicting how they will respond to external pressures or controlling the goals that may emerge within them. More generally, this chapter provides us with a useful vocabulary for discussing diverse systems of interest.

The Ethics and Society section focuses on how to instill beneficial objectives and constraints in AI systems, and how to enable effective collaboration between stakeholders to mitigate risks. In the Beneficial AI and Machine Ethics chapter, we introduce the challenge of giving AI systems objectives that will reliably lead to beneficial outcomes for society, and discuss various proposals along with the challenges they face. In Collective Action Problems, we use game theory to illustrate the many ways in which multiple agents (humans, AIs, and groups of humans and AIs) can fail to secure good outcomes and come into conflict. We also consider the evolutionary dynamics shaping AI development and how these drive AI risks. These frameworks help us to understand the challenges of managing competitive pressures between AI developers, militaries, or AI systems themselves. Finally, in the Governance chapter, we discuss strategic variables such as the rate at which AI systems evolve and how widely access to powerful AI systems is distributed. We introduce a variety of potential paths for managing AI risks, including corporate governance, national regulation, and international coordination.
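
As a flavor of this game-theoretic lens, the sketch below works through a one-shot Prisoner's Dilemma between two hypothetical AI developers: whatever the other does, each is individually better off cutting corners on safety, yet mutual corner-cutting leaves both worse off than mutual caution. The payoff numbers are illustrative assumptions, not figures from the book.

```python
# A minimal sketch of a collective action problem: a one-shot Prisoner's
# Dilemma between two AI developers choosing to invest in safety
# ("cooperate") or cut corners to race ahead ("defect"). Payoffs are
# illustrative assumptions.
COOPERATE, DEFECT = 0, 1

# payoffs[(row_choice, col_choice)] = (row player's payoff, column player's payoff)
payoffs = {
    (COOPERATE, COOPERATE): (3, 3),  # both invest in safety
    (COOPERATE, DEFECT):    (0, 5),  # the defector races ahead
    (DEFECT,    COOPERATE): (5, 0),
    (DEFECT,    DEFECT):    (1, 1),  # both cut corners: worst collective outcome
}

def best_response(opponent_choice: int) -> int:
    """Return the action that maximizes the row player's own payoff."""
    return max((COOPERATE, DEFECT),
               key=lambda mine: payoffs[(mine, opponent_choice)][0])

# Whatever the other developer does, defecting pays more individually...
assert best_response(COOPERATE) == DEFECT
assert best_response(DEFECT) == DEFECT
# ...so (DEFECT, DEFECT) is the equilibrium, even though mutual cooperation
# would leave both better off: (3, 3) versus (1, 1).
print("equilibrium payoffs:", payoffs[(DEFECT, DEFECT)])
print("cooperative payoffs:", payoffs[(COOPERATE, COOPERATE)])
```

This is the basic structure of a collective action problem: individually rational choices aggregate into a collectively poor outcome.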

Figure 2. Reasons for agents to cooperate with each other. Analysis of incentives to collaborate and of collective action problems helps to illuminate some of the fundamental challenges of AI governance, as well as potential future challenges in a world with many highly autonomous AI systems.

Online course

The course will be delivered online over 9 weeks of interactive small-group discussions supported by facilitators, along with accompanying readings and lecture videos. Participants will then complete a 4-week personal project that will extend their knowledge and support their next steps, such as conducting a brief research exercise on a relevant topic.

The course is aimed at students and professionals who would like to explore the core challenges in ensuring that increasingly powerful AI systems are safe, ethical, and beneficial to society. It will be particularly helpful for those looking to better understand how they could contribute via research or other careers. The course is free, is designed to be accessible to a non-technical audience, and can be taken alongside work or other studies, requiring around 5 hours per week. Further details are available here.

Apply by May 31st to join the course, or read the textbook for free online.
