Stephen Lieberman
Through 1023AI

Safety and Alignment Leadership

AI Safety for the Real World

Stephen Lieberman, AI safety and alignment leader

Stephen Lieberman builds the governance architectures that determine whether AI safety survives contact with operational reality.

Most organizations deploying AI at scale already have safety policies. What they lack is leadership that can hold the line between those policies and what actually happens when capable systems meet real institutional pressure. That is the work I do: identifying where alignment is likely to break down before it does, and building the technical and organizational conditions that make safety durable.

Available for fractional, remote, and in-house executive roles through 1023AI.

AI Safety and Alignment Leadership through 1023AI

20+ years leading technical and operational teams
Senior leadership on DoD/VA programs in $100B+ environments
Principal Investigator for multimillion-dollar defense programs
H-index 7 · 100+ citations · 8 highly influential (Semantic Scholar)
Executive leadership across government, academia, nonprofit, and industry

Selected organizations

U.S. Department of Veterans Affairs
U.S. Department of Defense
U.S. Department of State
Federal Emergency Management Agency
Naval Postgraduate School
University of Southern California
United States Army
Defense Manpower Data Center
University of Connecticut
Northrop Grumman

The Challenge

Deploying AI at scale creates a leadership problem that evaluation frameworks cannot solve

As capable systems scale, they produce emergent capabilities that were not designed, emergent risks that were not anticipated, and interaction effects with human systems that no evaluation framework fully captures. The gap between what the model was tested on and what it actually does, in messy organizational contexts, across human-AI teams, under real-world deployment pressure, is where the most serious safety failures live.

This is why safety at scale is not just a research problem. It is a leadership problem.

Static controls, compliance checklists, and point-in-time evaluations are designed for systems with stable behavior and enumerable failure modes. They were not designed for systems whose capabilities and risks emerge at scale, shift with deployment context, or interact with human behavior in ways that are invisible during testing.

Capable AI becomes hardest to govern at exactly the point it becomes most consequential.

The robustness gap

A system can appear safe in theory and still fail under real deployment pressure. I use the term robustness gap to describe the distance between nominal safety and real-world resilience. In high-stakes AI deployment environments, safety claims are exposed to shifting incentives, changing contexts, adversarial pressure, organizational fragmentation, and downstream effects that do not appear in controlled settings.

In AI systems deployed at scale, emergent misalignment is the most significant expression of the robustness gap. Alignment that appeared solid at one capability level quietly breaks down as the system becomes more capable, through a process that standard evaluation pipelines were not designed to detect. Closing this gap requires leadership that understands how emergent properties, nonlinearity, and sociotechnical systems behave in the real world.

The Gap

Nominal safety: what evaluation environments measure
Deployment reality: shifting incentives, adversarial pressure, context drift
Organizational complexity: fragmentation, competing priorities, downstream effects
Real-world resilience: safety that survives at scale

This is also where AI iatrogenics becomes dangerous. In medicine, iatrogenics refers to harm caused by the treatment itself. A narrow intervention can reduce one visible risk while creating new harms elsewhere, distorting incentives, increasing brittleness, or destabilizing the broader system.

AI Iatrogenics

AI iatrogenics occurs when a safety intervention introduces new failure modes elsewhere in the system: it reduces one visible risk while creating harms that were not present before, distorting incentives, or destabilizing the broader sociotechnical environment. In high-stakes AI deployment, iatrogenic failure is often invisible until scale amplifies it.

The Orchestration Requirement

No two organizations face the same version of this problem. The specific failure modes in your deployment environment are a function of your technical architecture, your institutional incentives, your capability trajectory, and the human systems operating around your models. Arriving with a predefined playbook is precisely the wrong response.

My role is to move quickly from diagnosis to structure, mapping where your safety posture is genuinely robust and where it is nominally compliant but operationally brittle. The interventions that follow are built for your organization's specific sociotechnical conditions, not for a generic deployment scenario. That is what it means to govern AI as an adaptive system rather than a static product.

When to bring me in

AI safety and alignment become a leadership function when:

Novel capabilities and risks are scaling faster than existing governance structures can track
Safety needs executive ownership, not just downstream review
You need a credible senior integrator across research, policy, product, legal, and operations
Your leadership needs a technically serious voice that also understands institutional dynamics
Human-AI teaming is creating accountability gaps that compliance frameworks were not built for
Your board or senior leadership needs safety framed in terms of institutional and operational risk, not just model performance
You are preparing a formal safety case or Responsible Scaling Policy that must survive board-level scrutiny, and you need someone who has operated in those environments
You are approaching a significant capability threshold and need safety leadership embedded in the decision process before that threshold is crossed, not after
You require governance structures that adapt to shifting capabilities without stalling organizational momentum

If more than one of these describes your situation, a conversation is worth having.

Start a Confidential Conversation →

Approach

How I approach AI safety and alignment

My work produces two things simultaneously: technical systems, and the organizational architectures that make those systems safe under real-world pressure. They are developed together, because the technical system shapes what the human system can do, and the human system shapes what the technical system needs to be.

I approach AI deployed at scale as a systems problem, a leadership problem, and a human problem. In practice, that means working from principles most safety reviews treat separately.

Complex Adaptive Systems

AI deployed at scale does not operate in isolation. It interacts with organizations, incentives, feedback loops, and people in ways that produce behavior no single component was designed to generate. Safety is a property of the whole sociotechnical environment, not just the model.

Epistemic Uncertainty

Leaders deploying AI at scale make consequential decisions under genuine uncertainty. At the frontier, that uncertainty is not a gap to be closed by better evaluation. It is a structural feature of the domain. My work is grounded in the discipline of reasoning clearly about what we do not yet know, and building governance structures that remain sound as the picture continues to change.

Emergent Foresight

Emergent foresight is the capacity to govern for what a system is becoming, not just what it currently is. Capability levels shift. Risk surfaces expand. The alignment properties that held at one threshold can degrade quietly at the next. Closing that gap requires anticipatory leadership, not retrospective review.

Safety at Scale

The real test is whether safety survives growth, speed, strategic pressure, and social consequence. That standard cannot be met by evaluation frameworks alone. It requires leadership that can govern the whole sociotechnical system as it scales, holding technical architecture, organizational design, human-AI teaming, and institutional governance in view simultaneously.

About

Stephen Lieberman

I lead AI safety and alignment work for organizations where capable AI is moving beyond what standard governance was built to handle. With more than 20 years leading technical and operational teams in mission-critical environments, I bring the institutional discipline and operational judgment that high-stakes AI governance demands.

The core argument: capable AI cannot be governed as if it were ordinary software. As systems scale, they move beyond what traditional analytic methods can model, producing emergent capabilities, emergent risks, and interaction effects with human systems that no evaluation framework fully captures. Closing that gap is a leadership problem before it is a technical one.

Through 1023AI, I work with organizations navigating the transition from controlled research conditions to real-world deployment at scale.

Twenty years of high-consequence work across defense, financial markets, and research is precisely the preparation this problem demands.

Mission-critical technical and operational leadership

More than 20 years leading technical and operational teams across government, defense, academia, nonprofit, and industry. Senior leadership on Department of Defense and Veterans Affairs programs within funding environments exceeding $100 billion, spanning enterprise architecture, decision-support systems, security and compliance, electronic health records, cloud systems, and data strategy. Led programs with multimillion-dollar budgets and worked directly with senior leaders across defense, government, and institutional settings.

Defense, security, and international systems

At the Naval Postgraduate School, served as a DoD civilian program leader and Principal Investigator for programs in defense technology, modeling and simulation, collaboration platforms, and decision-support systems. Work included counterterrorism, counterinsurgency, peacekeeping operations, and international collaboration across more than 100 countries.

That track record earned direct recognition from senior defense leadership.

“Steve, you and your team have performed superbly. Your collective skills, resourcefulness, and creativity have created a ground-breaking tool that will benefit the U.S. government and our allies.”

Michael G. Vickers

Under Secretary of Defense for Intelligence, U.S. Department of Defense

The Grand Challenge to Harness Technology for Social Good

Currently advancing AI safety research through the Doctor of Social Work program at the University of Southern California, supporting the Grand Challenge to Harness Technology for Social Good. The DSW is a practice-focused doctorate designed for real institutional contexts. The most significant gaps in real-world AI safety governance are not purely technical; they are organizational, institutional, and deeply human.

Human systems as core variables in AI safety

Most AI safety frameworks treat human systems as context rather than as a core variable. That framing misses something significant. Organizational dynamics, institutional incentives, and social structures determine whether safety holds or fails in deployment. Interventions that ignore these dimensions do not simply miss a variable. They can introduce failure modes that were not present in the original system and cannot be fully anticipated in advance.

Sociotechnical and human-centered disciplinary grounding

My approach draws on sociotechnical systems theory, organizational behavior, industrial psychology, human-centered design, and macro social work. These disciplines illuminate how people actually act inside institutions under real pressure, and how to intervene at the level of systems and structural conditions, which is precisely the level at which real-world AI governance must operate.

Executive leadership that is operational, not theoretical

Strategic and operational executive since 2005. President and Executive Director of a California technology nonprofit through a decade of sustained growth. CEO and C-suite roles across advisory, technology, and media. Quantitative trading in high-dimensional risk modeling, where the cost of being wrong is immediate and measurable. That environment trained a specific discipline in structuring decisions under genuine uncertainty where the feedback loop is unforgiving.

Strategic Integration

The First 90 Days

When I step into an organization as a fractional or in-house executive, the first priority is clarity, not process. Before any governance framework can hold, I need to understand the gap between your stated safety position and your operational reality.

Days 1–30

Contextual Discovery

I conduct structured conversations with the board and with research, product, and operations teams, not to audit, but to map the lived experience of safety inside the institution. Where does the policy stop and the workaround begin? Where are the accountability gaps that no one has named? This surfaces the friction points where governance is weakest under real pressure.

Days 31–60

Sociotechnical Alignment

Working from the discovery findings, I identify the specific robustness gaps in your deployment pipeline and design interventions that address both technical safeguards and the organizational conditions that determine whether those safeguards hold. Complex adaptive systems fail at the seams between components. That is where I focus.

Days 61–90

Operational Safety Case

I deliver a prioritized safety case, a living document grounded in your actual capability thresholds, your specific risk surface, and the institutional conditions I have observed directly. This is not a compliance checklist. It is a durable foundation for governing AI as your systems continue to scale.

Why 1023AI

The name references Avogadro's number (6.022 × 10²³), the scale at which immense collections of microscopic interactions give rise to emergent macroscopic behavior. That is not a metaphor for AI. It is a description of what actually happens. Scaling does not simply improve performance. It changes what the system is, what it can do, and what it can get wrong. Beyond a certain scale, aggregate behavior changes qualitatively, demanding a different approach.

The European Commission's official Guidelines under the EU AI Act arrive at the same number, setting 10²³ floating-point operations of training compute as the indicative threshold at which AI models qualify as General Purpose AI, triggering mandatory regulatory oversight. That convergence is not coincidental. It marks the boundary where AI generality becomes real, emergent capabilities and emergent risks become the dominant safety challenge, and governance must cross the same threshold the model does. Safety at that scale requires leadership that understands emergence, not just evaluation. That is what my work is about.

Start a Confidential Conversation

If your organization is navigating the gap between evaluated safety and real-world resilience, or the human and institutional conditions that determine whether safety holds at scale, reach out. Every engagement is shaped by the specific organization, its specific challenges, and the specific sociotechnical system it is operating within.

Safety at scale. That is what I do.

Your message goes directly to my private inbox. I treat every conversation as confidential.