The State of AI Cybersecurity in 2026: Deploy Fast, Secure Later, Pay Eventually
Table of Contents
Organizations deployed AI systems throughout 2023, 2024, and 2025 at a pace that defensive guidance, security tooling, and operational practices did not match. The result in 2026 is a large, poorly instrumented attack surface connected to real business systems, with defenses that are still being assembled.
I want to be specific about what concerns me and why. This is not a general warning about AI risks. This is a description of what the actual attack surface looks like, where the gaps are documented, and what organizations need to address.
Why This Gap Exists
Traditional software security matured over roughly three decades. Decades of incident response experience, vulnerability research, tooling development, and hard-won operational knowledge produced the frameworks, products, and practices that modern security programs build on.
Enterprise generative AI reached millions of production deployments in roughly two years.
The disciplines that make software security work, threat modeling for specific architectures, hardened deployment patterns, mature incident response playbooks, established audit and observability practices, did not have time to develop before organizations began deploying AI at scale. The guidance arrived after the deployment. The tooling arrived after the guidance. The operational expertise is still developing.
This is not blame. It is an explanation for why the gaps are structural rather than accidental.
The Four Layers of AI Security
Much of the confusion in AI security discussions comes from treating governance documents, threat taxonomy, engineering guidance, and operational controls as if they are the same thing. They are not.
Layer 1 is governance. NIST AI RMF, ISO/IEC 42001, and the EU AI Act operate at the organizational and process level. They describe how to manage AI risk, structure oversight, and document accountability. They are governance frameworks, not technical controls.
Layer 2 is threat taxonomy. MITRE ATLAS documents adversarial tactics against AI systems. The OWASP Top 10 for LLMs and the OWASP Agentic AI Top 10 enumerate specific attack classes. These documents name the attacks. They do not prescribe defenses.
Layer 3 is engineering guidance. Google SAIF, Microsoft AI SDL, OWASP AI Exchange, and NIST AI 600-1 provide guidance on how to build and deploy AI securely. NIST AI 600-1 is substantially more specific than the base AI RMF, covering prompt injection, data poisoning, and information hazards for generative AI deployments. Even AI 600-1 is a risk management profile. It identifies what needs addressing. It does not specify which products or technical implementations to use.
Layer 4 is operations. Monitoring, incident response, runtime controls, logging, least privilege, evaluation pipelines, and access governance are operational practices. They require organizational process, not just documentation.
Most organizations have incomplete coverage at layers 3 and 4. That is where almost all of the operational risk lives.
What Is In Production
Enterprise AI in 2026 is not just chatbots. The systems in production include:
- RAG systems pulling from internal document repositories, wikis, databases, and customer records
- Customer-facing support agents with access to account information and case management systems
- Internal productivity assistants integrated with email, calendars, file systems, and communication platforms
- Code review and generation tools with access to source repositories
- Automated agents running scheduled workflows with credentials to internal APIs
- Document, contract, and financial data processors
- AI models embedded in fraud detection, hiring, and access control decisions
Each system represents a different attack surface. A RAG system over your internal knowledge base is simultaneously an information disclosure risk and a prompt injection target. An agent with email access and a persistent credential is an autonomous process with real leverage over real systems.
Security teams were often not involved in the decision to deploy these systems. They are frequently discovering existing AI deployments through audit rather than design review.
AI Is Now on Both Sides
The same AI capabilities available to your security team are available to attackers.
AI-assisted development reduces the time required to adapt public vulnerability disclosures into working proof-of-concepts and operational tooling. LLMs are effective at exploit adaptation, scripting, and vulnerability research. The speed of moving from reading a CVE to having functional code has dropped for anyone using these tools, including attackers.
AI-generated phishing content produces emails with better grammar, more convincing context, and fewer detectable errors than many human-written attacks. The formatting signals and linguistic patterns your users were trained to spot are less reliable when the content is AI-generated.
Voice cloning for vishing campaigns impersonates executives and colleagues in real-time calls. The barrier to entry for targeted social engineering dropped as voice synthesis quality improved and access costs fell.
Deepfake video for business email compromise and executive fraud has moved from theoretical to operational. Financial fraud using AI-generated video of executives authorizing transactions has been documented across multiple sectors since 2024. Your awareness training was built for a different threat model.
Prompt Injection and Context Poisoning
Understanding prompt injection is the starting point for understanding AI system security.
A language model follows instructions embedded in its context window. The context window includes the system prompt, conversation history, tool outputs, and retrieved documents. The model cannot reliably distinguish instructions from the application developer from instructions an attacker embedded in content the model is processing. This is the core of prompt injection as OWASP defines it.
Direct prompt injection targets the model’s input directly. The user provides text designed to override system instructions.
Indirect prompt injection is more serious for enterprise deployments. Your RAG agent retrieves a document from your knowledge base. That document contains instructions telling the agent to take a different action. Your summarization tool processes a web page containing hidden directives. Your support bot reads a customer attachment containing instructions. The agent processes the instructions and acts on them.
Context poisoning is a broader category that deserves its own treatment. Attackers do not need to compromise your model to compromise your AI system. They need to get malicious content into your model’s context. This includes poisoned RAG documents, poisoned memory entries, maliciously crafted email content your agent processes, adversarial PDFs, and attacker-controlled web pages your browsing agent visits. These are distinct from model poisoning. The model is fine. The context is not.
Defense-in-depth reduces this risk. Input filtering removes known injection patterns. Output validation checks whether responses contain unexpected content or unauthorized actions. Privilege-limited tool scopes limit what a successful injection achieves. Sandboxed execution restricts unintended system access. Human approval gates on consequential actions add verification.
None of these defenses close the attack class. OWASP, NIST, Anthropic, OpenAI, and Microsoft all recommend layered approaches because no single control is sufficient.
Design for the assumption that prompt injection will succeed against some percentage of inputs. Limit the consequences accordingly.
AI Agents, Permission Boundaries, and the Blast Radius Problem
Agents differ from chatbots in one operationally critical way: they take actions.
An agent connected to your email, GitHub, Jira, Slack, Salesforce, AWS, and internal APIs is an autonomous process with access to the same systems your most connected employees use. A successful prompt injection against this agent does not produce an unwanted text response. It produces an unwanted action on a real system.
The blast radius of a compromise is determined by what the agent has access to. Most current agent deployments hold access far above what any individual task requires. An agent that needs to read a Jira ticket should not also have write access to your GitHub main branch. An agent processing support requests should not hold credentials to your billing system.
AI authorization is a distinct problem from user authorization. Traditional applications ask whether a user is authorized for an action. Agent architectures require asking whether this agent is authorized to perform this specific action for this specific user at this specific time, based on the current context. That is a different verification model and most current agent deployments do not implement it.
Tool poisoning is distinct from prompt injection and deserves explicit attention. A malicious or compromised tool connected to an agent returns descriptions or data designed to alter the agent’s behavior. Researchers increasingly treat tool poisoning as a separate attack class from prompt injection, particularly with MCP-based integrations.
Human approval workflows are supposed to be the backstop for consequential agent actions. Organizations are discovering they also face approval fatigue. When agents regularly request approval for routine actions, users begin approving automatically without reviewing the request. The backstop becomes a formality.
AI Identity Is an Enterprise Security Problem
Agents hold credentials. OAuth tokens, API keys, service account credentials, and cloud IAM roles all appear in AI agent deployments. These are non-human identities with real access.
Current enterprise identity management was built for humans and applications. Non-human identities for AI agents do not fit cleanly into existing governance models.
Specific gaps in current deployments:
- Agent credentials are often long-lived and not rotated on schedules comparable to service accounts
- Agent token scopes are frequently broader than required by the tasks the agent performs
- Audit logging for actions taken under agent identities varies widely
- Credential leakage through prompts is a documented risk. An agent that includes its API keys in context or outputs exposes them to anyone who reads the output or retrieves the conversation. This is often a larger exposure than system prompt content leakage.
- Agents obtaining additional credentials through tool calls create identity chains that are difficult to audit
Govern your agent identities the same way you govern privileged service accounts. That currently requires deliberate effort because most identity governance tools do not have native support for AI agent identity patterns.
Persistent Agent Memory Creates Long-Horizon Attack Surface
Agents with persistent memory present an attack surface that does not exist in stateless systems.
An attacker who can inject into an agent’s memory, or who can influence what the agent stores, builds a position that persists across sessions. The attack does not need to succeed in a single interaction. Influence accumulated in memory over days or weeks shapes future agent behavior. This is sometimes called a long-horizon or sleeper-context attack.
Very little operational guidance exists for this specific risk. Organizations deploying agents with persistent memory storage need to:
- Treat memory stores as high-value data that requires access controls
- Validate memory content before agents act on it
- Build the ability to audit and roll back memory state into their architecture
The Model Supply Chain Is Not Treated Like Software Supply Chain
Organizations downloading pre-trained models from public repositories are accepting executable AI artifacts and supporting code from external sources. The scrutiny applied to these downloads does not typically match what those same organizations apply to npm, PyPI, or Maven packages.
Specific risks in model repositories:
- PyTorch pickle-format model files execute arbitrary Python code during loading. This has been exploited in documented supply chain attacks. SafeTensors is the format designed to address this specifically, and organizations should prefer it when it is available.
- Malicious model loaders that install dependencies or execute setup code alongside the model
- Models trained on poisoned datasets producing subtly incorrect outputs in specific contexts
- Models with embedded backdoors that activate under trigger conditions
- Repository name-squatting to deliver malicious models under familiar names
Few organizations maintain a software bill of materials covering their AI systems. Most cannot tell you what base model a production system started from, what version of the training data was used for fine-tuning, or whether the weights in deployment match the weights that were last evaluated. That level of traceability is a prerequisite for meaningful supply chain security. It is not prevalent today.
Shadow AI Creates Uncontrolled Data Flows
Personal consumer AI accounts are where your data is moving without controls.
ChatGPT Enterprise, Claude Enterprise, Microsoft Copilot for M365, and similar enterprise offerings include contractual protections for customer data and typically do not use input for training. Personal ChatGPT, personal Claude, personal Gemini, and similar consumer accounts do not provide these guarantees by default.
Employees using personal accounts to process work documents are moving legal strategy documents, customer records, source code, financial projections, personnel decisions, and internal communications through pipelines your organization does not control. Security teams frequently do not have accurate information about the volume of this activity or what categories of data are involved.
Your DLP controls do not catch data that moves through a web browser to a consumer AI service. Your data retention policies do not apply to conversation history on a third-party platform. Your regulatory obligations under GDPR, HIPAA, SOX, and sector-specific rules do not change based on whether the data left accidentally or through a browser tab.
Discovering the actual scope before building controls is the necessary first step. What you assume about this problem is almost certainly an underestimate.
AI Systems Leak Data in Ways Traditional Applications Do Not
RAG over-retrieval returns documents to users who should not have access to them. An employee asks a question. The retrieval component returns a document from a restricted segment of the knowledge base. The answer includes information from that document. The access control failure occurred at the retrieval layer, not the application layer. Many RAG deployments were built without enforcing document-level permissions matching the source system.
System prompt leakage leads to revealing the operational instructions built into your AI product. Prompts that ask the model to repeat or describe its system instructions, or that construct inputs designed to surface that content, yield operational details meant to stay internal. This is an active attack technique. System prompts should be treated as confidential. Credential leakage through prompts, where agent credentials appear in model context or outputs, is typically a more serious exposure than system prompt content alone.
Multi-tenant AI isolation failures occur when fine-tuned models trained on multiple customers’ data surface one customer’s information in another customer’s context. This is a documented risk category for multi-tenant SaaS AI products. The isolation requirements for AI systems differ from traditional multi-tenant applications.
Model memorization causes models to reproduce content from training data verbatim. Modern frontier models have substantially improved over earlier demonstrations, but the risk is not eliminated, particularly in models fine-tuned on small or insufficiently de-duplicated private datasets. Organizations fine-tuning on sensitive internal data need to assess memorization risk for the specific training approach and data they use.
Organizations Lack Visibility at Inference Time
Traditional security programs have extensive log coverage of their infrastructure. Most AI deployments do not have equivalent coverage of their AI components.
Monitoring a deployed language model or agent requires different telemetry than monitoring an application server. Organizations need to collect, where permitted by applicable regulations and internal policy:
- Prompt and output content in a format suitable for policy review and anomaly detection
- Tool invocation logs for agents, including tool names, arguments, and responses
- Retrieval logs for RAG systems, including queries executed, documents returned, and access control decisions
- Classification signals for jailbreak and injection attempts
- Output consistency monitoring to detect behavioral drift across model versions
- Latency patterns that may indicate context-stuffing attempts
Many organizations that deployed AI in 2023 and 2024 built traditional request/response logs around their AI components. They have HTTP status codes and latency metrics. The telemetry needed to detect or investigate an AI security incident often does not exist in those environments.
Before an incident is not the time to discover this. The preparation is architectural, not reactive.
AI Incident Response Requires Its Own Playbooks
Your existing IR playbooks cover endpoints, networks, applications, and identity. They do not cover AI-specific scenarios.
Questions your IR team will face that current playbooks do not address:
- How do you determine whether a model was poisoned during a fine-tuning run
- How do you scope the blast radius from a successful indirect injection against an agent that held write access to multiple systems
- How do you assess whether training or fine-tuning data was exfiltrated during a supply chain compromise of your ML pipeline
- How do you establish a behavioral baseline for a model to compare against post-incident behavior
- How do you respond when a model update from a third-party provider introduces behavior that appears intentional rather than accidental
- How do you determine whether an agent’s memory store was manipulated over time
These scenarios require preparation before they occur. You need telemetry in place before the incident. You need model behavior baselines documented before you need to compare against them. You need to know what normal agent activity looks like before you need to identify abnormal patterns.
Evaluation Pipelines Are Becoming Standard Engineering Practice
AI security increasingly relies on structured evaluation before deployment rather than only on post-deployment monitoring.
Pre-deployment evaluation for security includes:
- Prompt injection testing against established injection datasets and your specific use case
- Jailbreak benchmarking against published adversarial prompt suites
- Adversarial robustness assessment for models making consequential decisions
- Regression testing between model versions to identify behavioral changes
- Policy evaluation against documented acceptable use requirements
- Red teaming exercises conducted by humans working to defeat specific defenses
NIST, CISA, Anthropic, OpenAI, Microsoft, and Google increasingly expect this for high-risk deployments. Few organizations outside large enterprises have built the teams or processes to execute it. Building minimal evaluation pipelines before high-consequence deployments is achievable at smaller scale than most organizations assume. OWASP AI Exchange and the published evaluation frameworks from major AI providers are reasonable starting points.
Your AI Asset Inventory Is Probably Incomplete
Traditional asset management covers servers, endpoints, applications, and network devices. AI systems introduce asset categories that most inventories do not yet track.
An AI system inventory needs to cover:
- Every model in deployment, including base models, fine-tuned variants, and embedding models
- All prompts and system instructions operating in production
- Every agent running in your environment and the tools it holds access to
- All RAG indexes, including what data sources they draw from and what access control is enforced at retrieval
- All vector databases and their contents
- Every MCP server or equivalent integration framework connected to any agent
- External AI services your applications call, including what data passes to each
These are configuration items and they need to be tracked as such. You cannot govern what you have not inventoried. Most organizations doing AI security work in 2026 are discovering assets they did not know existed.
Where the Defense Actually Stands
The defensive tooling, frameworks, and practices for AI security are in active and meaningful development. The current state is early and improving.
NIST AI 600-1 extends the base AI RMF for generative AI with substantially more operational specificity about prompt injection, data poisoning, information hazards, and deployment considerations. Organizations maintaining AI compliance postures should be reading AI 600-1, not only the base framework.
MITRE ATLAS provides adversarial TTP documentation in ATT&CK format for AI systems. Pre-deployment threat modeling against ATLAS produces more precise security requirements than working from governance frameworks alone.
OWASP LLM Top 10 and OWASP Agentic AI Top 10 provide the most operationally specific public guidance currently available. Neither is a certification. Both are more directly useful to practitioners building and securing AI systems than most formal curricula.
AI red teaming before deployment is increasingly expected for high-risk deployments by NIST, CISA, and major AI providers. It is not yet standard at most organizations. The external AI red teaming market exists and is growing for organizations without internal capability.
AI-specific security products covering monitoring, guardrails, authorization, and runtime policy enforcement are in early commercial stages. The category is real and worth evaluating. Because the market is early, marketing claims frequently exceed demonstrated capability. Evaluate against your specific deployment architecture rather than against vendor-described scenarios.
Google SAIF, Microsoft AI SDL, and OWASP AI Exchange provide engineering guidance that sits at layer 3 and addresses how to build and deploy AI securely rather than only how to document risk.
Runtime policy engines that enforce tool-level authorization, output filtering, and access governance represent an emerging product category worth watching. They address the authorization gap for agent deployments that traditional IAM does not cover.
What You Should Do
Inventory what is deployed. Know what is running, what data it accesses, what credentials it holds, what tools it calls, and what actions it takes. This is the prerequisite for everything else.
Treat AI agents as privileged accounts. Apply least privilege. Scope credentials to the minimum access required for each task. Audit what every agent holds access to and remove what is not required.
Implement AI-specific observability before deployment, not after an incident. Prompt and output logging (subject to applicable privacy and regulatory requirements), tool invocation logging, and retrieval logging are the minimum telemetry for security analysis.
Assess your shadow AI exposure. Find out which AI services employees use for work tasks. Determine what categories of data are moving through personal accounts. Build policy and controls based on actual findings.
Enforce document-level access controls in RAG systems. If your retrieval layer does not enforce the access rules of your source systems, fix it before it surfaces a restricted document to an unauthorized user.
Audit your model supply chain. Document every base model in use. Prefer SafeTensors over pickle formats. Apply supply chain scrutiny to model artifacts comparable to what you apply to software dependencies.
Govern agent identities. Manage agent OAuth tokens and API keys with the same lifecycle, scope review, and rotation practices you apply to privileged service accounts.
Build AI-specific IR runbooks now. Define before an incident how you would investigate AI-specific scenarios, what evidence you need, and what your response options are.
Run evaluation before deploying AI to high-consequence contexts. Start with available public frameworks if you do not have internal tooling.
Do not treat governance compliance as a security posture. Governance frameworks describe process and risk management. They do not describe technically defensive systems. Both are required.
References
- NIST AI Risk Management Framework (AI RMF 1.0), 2023
- NIST AI 600-1: Generative AI Profile, 2024
- OWASP Top 10 for Large Language Model Applications, 2025
- OWASP Agentic AI Top 10
- OWASP AI Exchange
- MITRE ATLAS: Adversarial Threat Landscape for AI Systems
- Google Secure AI Framework (SAIF)
- Microsoft AI Security SDL
- CISA Guidance on AI Cybersecurity, 2024
- ISO/IEC 42001:2023 Artificial Intelligence Management Systems
- EU AI Act, Regulation (EU) 2024/1689
- SafeTensors format documentation, Hugging Face


