Stephen Lieberman
Through 1023AI

AI Safety and Alignment Leadership for the Real World

Safety That Holds
Under Real-World Pressure

I work with organizations deploying consequential AI at the moment when models stop behaving like deterministic software and emergent, real-world risks become the primary safety challenge. That is where emergence foresight becomes a leadership requirement for AI safety in the real world.

My work focuses on emergent capabilities and emergent risks, and on the gap between how alignment is evaluated and how it holds under real deployment pressure. I address the technical, the organizational, and the human conditions that determine whether AI safety survives at scale.

Available for fractional and in-house executive roles through 1023AI.


Stephen Lieberman

20+ years leading technical and operational teams

Senior leadership on DoD and VA programs in $100B+ funding environments

Principal Investigator for multimillion-dollar defense and security programs

Extensive peer-reviewed publication and presentation history, with 8 highly influential citations

Executive leadership across government, academia, nonprofit, and industry


Consequential AI creates a new leadership problem

Scaling does not simply improve performance. It changes the operating conditions.

As capable systems scale, they produce emergent capabilities that were not designed, emergent risks that were not anticipated, and interaction effects with human systems that no evaluation framework fully captures. The gap between what the model was tested on and what it actually does — in messy organizational contexts, across human-AI teams, under real-world deployment pressure — is where the most consequential safety failures live.

This is why safety at scale is not just a research problem. It is a leadership problem.

Static controls, compliance checklists, and point-in-time evaluations are designed for systems with stable behavior and enumerable failure modes. They were not designed for systems whose capabilities and risks emerge at scale, shift with deployment context, or interact with human behavior in ways that are not visible during testing.

Capable AI becomes hardest to govern at exactly the point it becomes most consequential.

The robustness gap

Most organizations still confuse model evaluation with real-world safety.

A system can appear safe in theory and still fail under real deployment pressure.

I use the term robustness gap to describe the distance between nominal safety and real-world resilience.

In consequential AI environments, safety claims are exposed to shifting incentives, changing contexts, adversarial pressure, organizational fragmentation, and downstream effects that do not appear in controlled settings. What looks robust in a lab can become brittle in the field.

This is also where AI iatrogenics becomes dangerous. In medicine, iatrogenics refers to harm caused by the treatment itself. AI safety has its own version. A narrow intervention can reduce one visible risk while creating new harms elsewhere, distorting incentives, increasing brittleness, or destabilizing the broader system.

In consequential AI systems, emergent misalignment is the most significant expression of the robustness gap. Alignment that appeared solid at one capability level quietly breaks down completely as the system becomes more capable. The model was not misaligned. It became misaligned, through a process that standard evaluation pipelines were not designed to detect. This is not a theoretical concern. It is the predictable result of treating alignment as a property of a model at a point in time, rather than as a property of an emergent complex system across its full deployment lifecycle.

Closing the robustness gap requires more than compliance frameworks and evaluation pipelines. It requires leadership that understands how emergence, nonlinearity, and sociotechnical systems behave in the real world. Emergent capabilities create new risk surfaces. Misalignment can develop quietly as those capabilities grow. Leadership must address the institutional and human conditions under which consequential AI safety either holds or disappears.

The Gap

Nominal safety: what evaluation environments measure
Deployment reality: shifting incentives, adversarial pressure, context drift
Organizational complexity: fragmentation, competing priorities, downstream effects
Real-world resilience: safety that survives at scale

How I approach AI safety and alignment

I approach consequential AI as a systems problem, a leadership problem, and a human problem.

Complex adaptive systems

Consequential AI systems do not operate in isolation. They interact with people, organizations, incentives, feedback loops, and other systems in ways that produce behavior no single component was designed to generate. Safety is a property of the whole sociotechnical environment, not just the model.

Emergent capabilities and risks

Both capabilities and risks emerge through scale, interaction, and deployment context, not through design alone. This includes emergent misalignment: alignment that held at one capability level degrading as the system grows more capable. Safety strategy has to be designed for a moving target.

Epistemic uncertainty

Leaders deploying consequential AI make real decisions under genuine uncertainty. That uncertainty is not a temporary gap to be closed by better evaluation. At scale, in messy organizational contexts, and across human-AI teams, uncertainty is a structural feature of the domain.

Robustness gap

The key question is not whether a system looks safe under evaluation conditions. The key question is whether safety holds under deployment pressure, across human-AI teams, inside organizational structures that were not designed with this system in mind. Emergent misalignment is the most serious version of this failure.

AI iatrogenics

Poorly designed safety interventions create new failure modes in the real-world systems they enter. A narrow technical fix can reduce one visible risk while distorting organizational incentives, increasing brittleness elsewhere, or producing harms that were invisible during evaluation. The human systems that AI enters are part of the risk surface.

Safety at scale

The real test is whether safety survives growth, speed, strategic pressure, and social consequence. That standard cannot be met by evaluation frameworks alone. It requires leadership that can govern the whole system as it scales.

Emergence foresight

Emergence foresight is the capacity to reason about what a system might become, not just what it currently is. As consequential AI systems scale, the risks that matter most are often not the ones visible during development. Governing for the capability horizon, not just the current deployment state, is what emergence foresight requires in practice.

Emergent AI safety

Safety itself can be treated as an emergent property of the broader sociotechnical system, not a fixed specification applied to the model. It has to be cultivated across technical architecture, organizational design, human-AI teaming, and institutional governance simultaneously. Emergent AI safety is the governing frame that makes that cultivation possible as a leadership discipline.

Why Stephen Lieberman

My background is not conventional. That is precisely the point.

Organizations deploying consequential AI need more than a policy specialist, more than an ethicist, and more than a narrow technical reviewer. They need leadership that can move between model behavior, executive judgment, institutional design, and real-world consequence. That is the work I have been preparing for across more than two decades.

The messy reality of human-AI teaming, the ways that real organizations adopt, resist, misuse, over-rely on, and adapt to capable AI, is not separable from the safety problem. At deployment scale, it is the safety problem.

Mission-critical technical and operational leadership

I have spent more than 20 years leading technical and operational teams across government, defense, academia, nonprofit, and industry. That includes senior leadership on Department of Defense and Department of Veterans Affairs programs within funding environments exceeding $100 billion – among the largest defense and federal health program environments in the country – spanning enterprise architecture, decision-support systems, security and compliance, electronic health records, cloud systems, and data strategy.

Defense, security, and international systems experience

At the Naval Postgraduate School, I served as a Department of Defense civilian program leader and Principal Investigator responsible for programs focused on defense and security technology, modeling and simulation, collaboration platforms, and decision-support systems. My work included counterterrorism, counterinsurgency, peacekeeping operations, and international collaboration across highly complex stakeholder environments.

Recognized leadership in high-consequence environments

Under Secretary of Defense Michael G. Vickers recognized my technical leadership with an official letter of commendation for creating "a ground-breaking tool that will benefit the U.S. government and our allies as we continue to combat terror." I have led programs with multimillion-dollar budgets, helped expand R&D investment, and worked directly with senior leaders across defense, government, and institutional settings.

Deep research foundation in complex systems

My research background spans modeling and simulation, agent-based modeling, network theory, human behavior forecasting, sociotechnical systems, cognitive neuroscience, and human-computer interaction. I have published and presented dozens of scholarly works in these domains across the world, with an h-index of 7, more than 100 citations, and 8 highly influential citations (Semantic Scholar).

Human systems as core variables in AI safety

Most AI safety frameworks treat human systems as context rather than as a core variable. That framing misses something consequential. The organizational dynamics, institutional incentives, social structures, and cultural conditions surrounding consequential AI are not peripheral to the safety problem. In many real deployments, they determine whether safety holds or fails. Safety interventions that ignore these dimensions do not simply miss a variable. They create new failure modes by assuming a stability that does not exist in practice.

Sociotechnical and human-centered disciplinary grounding

My approach draws on sociotechnical systems theory, organizational behavior, industrial psychology, human-centered design, industrial engineering, and macro social work and social welfare policy. Sociotechnical systems theory holds that technical and social systems co-evolve and cannot be governed in isolation. Organizational behavior and industrial psychology illuminate how people actually act inside institutions under real pressure and competing incentives. Macro social work frames intervention at the level of systems, institutions, and structural conditions, which is precisely the level at which consequential AI governance must operate.

The Grand Challenge to Harness Technology for Social Good

I am currently advancing AI safety research through the Doctor of Social Work program at the University of Southern California, where my work supports the Grand Challenge to Harness Technology for Social Good. The DSW is a practice-focused doctorate designed to situate research inside real institutional and community contexts rather than purely theoretical frameworks, and it is an unconventional path for an AI safety leader for a deliberate reason. The most significant gaps in consequential AI governance are not purely technical. They are organizational, institutional, and deeply human. My work focuses on the intersection of AI complexity, organizational strategy, regulatory policy, and human systems.

Executive leadership that is operational, not theoretical

I have served as a strategic and operational executive across technically demanding organizations since 2005. As President and Executive Director of an innovative California technology nonprofit, I led the organization through a decade of sustained growth, directing strategy, governance, board leadership, fundraising, and operations with direct executive accountability throughout. As CEO of Youbilee, an advisory firm, and as a C-suite operational executive at a pioneering internet video company, I built and ran organizations under real constraints with real accountability. As a leader in quantitative financial analysis and trading operations, I specialized in high-dimensional risk modeling and the execution of US equity index derivatives, building and operating in an environment where the cost of being wrong was immediate, measurable, and unforgiving. That reasoning structure transfers directly to consequential AI safety: how tail risks behave in nonlinear systems, how correlated failures propagate, and how to act decisively when the full distribution of outcomes is not knowable in advance.

That background spans defense programs within $100B+ DoD and VA funding environments, counterterrorism and stability operations modeling trusted by U.S. military commanders in Iraq and Afghanistan, a global collaboration platform connecting security professionals across more than 100 countries, and eight years of consistently profitable quantitative trading, including through the extreme volatility of March 2020. The track record is not theoretical.

Why 1023AI

1023AI is the advisory vehicle for my AI safety and alignment work.

The name references Avogadro's number (6.022 × 10²³), the scale at which immense collections of microscopic interactions forge emergent macroscopic behavior. That is not a metaphor for AI. It is a description of what actually happens. Scaling does not simply improve performance. It changes what the system is, what it can do, and what it can get wrong. Beyond a certain scale, aggregate behavior changes qualitatively, demanding a different approach and requiring new ways of understanding.

The European Commission's official Guidelines under the EU AI Act arrive at the same order of magnitude, establishing 10²³ floating-point operations (FLOPs) of training compute as the indicative threshold at which AI models qualify as general-purpose AI, triggering mandatory regulatory oversight. That convergence is not coincidental. It marks the boundary where AI generality becomes real, emergent capabilities and emergent risks become the dominant safety challenge, and governance must cross the same threshold the model does. Safety at that scale requires leadership that understands emergence, not just evaluation. That is what my work is about.
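As a rough illustration of that convergence in scale (the side-by-side pairing is mine, for exposition; N_A denotes Avogadro's constant):

\[
N_A \approx 6.022 \times 10^{23} \ \text{per mole}
\qquad \text{vs.} \qquad
10^{23} \ \text{FLOPs of training compute}
\]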

When to bring me in

AI safety and alignment become a leadership function when

Emergent capabilities are scaling faster than governance can track

Safety needs executive ownership, not downstream review or point-in-time compliance

Research, policy, product, legal, and operations need a credible senior integrator

Leadership needs a technically serious voice who also understands institutions, human systems, and real-world consequence

Human-AI teaming is creating accountability gaps that evaluation frameworks were not designed to catch

Deployment risk extends beyond benchmark performance into emergent behavior, organizational dynamics, and systemic impact

The organization wants a safety leader who understands emergent AI and can also run the operation

If you are deploying consequential AI, safety cannot stay downstream

I am open to fractional and in-house executive leadership roles with organizations deploying consequential AI that want to treat safety, alignment, and governance as core leadership functions.

If your team is working through emergent capabilities, emergent risks, robustness gaps, or safety at scale, I welcome a confidential conversation.

Start a Confidential Conversation

Your message goes directly to my private inbox. I treat every conversation as confidential.