AF - Searching for Search by Nicholas Kees Dupuis


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Searching for Search, published by Nicholas Kees Dupuis on November 28, 2022 on The AI Alignment Forum.

Thanks to Dan Braun, Ze Shen Chin, Paul Colognese, Michael Ivanitskiy, Sudhanshu Kasewa, and Lucas Teixeira for feedback on drafts. This work was carried out while at Conjecture.

This post is a loosely structured collection of thoughts and confusions about search and mesaoptimization, and about how to look for them in transformers. We've been thinking about this for a while and still feel confused. Hopefully this post makes others more confused so they can help.

Mesaoptimization

We can define mesaoptimization as internal optimization, where "optimization" describes the structure of computation within a system, not just its behavior. This kind of optimization seems particularly powerful, and many alignment researchers consider it one of the biggest concerns in alignment. Despite its importance, we still understand very little about it. For starters, it's not clear what internal optimization actually means.

People have proposed several definitions of optimization which fit well when thinking about "behavioral optimization," where an agent acts to optimize its environment. One of the clearest and most popular definitions comes from Alex Flint: an optimizing system is a physical process in which the configuration of some part of the universe moves predictably towards a small set of target configurations from any point in a broad basin of optimization, despite perturbations during the optimization process. In this framing, an agent and its environment together form an optimization process, where the agent acts upon the environment such that the system as a whole robustly converges to a set of target states.
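Flint's definition can be illustrated with a toy simulation (a sketch of my own, not from the post): a noisy gradient-descent process whose state predictably ends up near a small target set from a wide range of starting points, despite random perturbations at every step.

```python
import random

def step(x, lr=0.1, noise=0.05):
    """One update of a toy optimizing system: gradient descent on
    f(x) = (x - 3)**2, plus a random perturbation."""
    grad = 2 * (x - 3)
    return x - lr * grad + random.uniform(-noise, noise)

def run(x0, n_steps=200):
    """Run the system from initial configuration x0."""
    x = x0
    for _ in range(n_steps):
        x = step(x)
    return x

# From any point in a broad basin, the configuration moves
# predictably towards the small target set {3}, despite
# perturbations during the optimization process.
for x0 in [-50.0, 0.0, 42.0]:
    print(x0, "->", round(run(x0), 2))
```

Note that nothing in this description cares how the convergence is implemented internally; it is a purely behavioral characterization, which is exactly the gap the post goes on to discuss.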
But we run into problems when we try to map this behavioral definition onto a definition about the structure of a process's internal computation. When we say mesaoptimization, we seem to mean something different from just that the computation converges to a smaller target. For example, an image classifier takes a large set of initial configurations of images, including a lot of noise and irrelevant details, and layer by layer narrows it down to a probability distribution concentrated on a single class prediction. There is a sense that this is not the kind of optimization we are concerned about when we talk about mesaoptimization.

Mesaoptimization was originally defined in Risks from Learned Optimization as internal search: We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system.

An advantage of this framing is that we do have some idea of what we mean by "search" and have concrete examples of things which unambiguously qualify. Ideally, we'd like to point to the more general class of computation we're worried about, but once you start thinking about what this general class might look like, it quickly becomes clear that we don't even know what "search" is. The space of search algorithms also seems much larger and more flexible than the examples we usually think of would imply. At the moment we have very little idea what kinds of algorithms we should expect neural networks to learn, nor do we have a good picture of which algorithms in particular we should be concerned about when we think of misalignment.
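The Risks from Learned Optimization definition has two moving parts: a search space of candidate elements, and an explicitly represented objective used to score them. A minimal sketch of a system that unambiguously qualifies (hypothetical names and scoring rule, chosen for illustration) is just enumerate-and-argmax:

```python
def objective(plan):
    """An explicitly represented objective: prefer plans whose
    resulting value is close to a goal of 10, at low step cost."""
    value, cost = plan
    return -abs(value - 10) - 0.1 * cost

def internal_search(candidates):
    """Search through a space of candidate plans for the element
    that scores highest according to the objective."""
    return max(candidates, key=objective)

# Search space: (resulting value, number of steps) pairs.
candidates = [(v, c) for v in range(0, 21) for c in range(1, 6)]
best = internal_search(candidates)
print(best)  # -> (10, 1): hits the goal value at minimal cost
```

The point of the definition is that both the candidate space and the objective exist as identifiable structures inside the system, which is what makes "search" seem more tractable to look for than optimization-in-general.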
If the intuition that "search" is somehow central to internal optimization holds, then becoming less confused about what learned search looks like should be central to making risks from internal optimization more concrete.

What is Search?

We have examples of processes which most woul...
