In a widely shared tweet, Meta's head of AI Safety demonstrated how agents can run amok. Any OpenClaw deployment therefore needs careful security planning before it goes live.
OpenClaw is an open-source, self-hosted AI agent (formerly known as ClawdBot/MoltBot) that runs persistently on your machine with broad access to files, terminal, email, calendar, and the internet.
OpenClaw includes limited built-in security controls. The runtime can ingest untrusted text, download and execute skills (code) from external sources, and perform actions using the credentials assigned to it. This effectively shifts the execution boundary from static application code to dynamically supplied content, without equivalent controls around identity, input handling, or privilege scoping.
Indirect prompt injection collapses the boundary between data and control. OpenClaw's broad visibility and operational reach become an attack surface: any content that enters the agent's context can carry instructions, so every upstream system is a potential delivery vector for agent compromise. Even if only you message the bot, prompt injection can still happen via any untrusted content the bot reads, such as web search results, browser pages, emails, docs, attachments, or pasted logs.
Skills can bundle scripts alongside markdown instructions, meaning execution can happen outside the MCP tool boundary entirely. Security researchers found a vulnerable third-party skill that facilitated active data exfiltration.
Three risks materialize quickly in an unguarded deployment: credentials and accessible data may be exposed or exfiltrated; the agent's persistent memory can be modified; and the host environment can be compromised if the agent is induced to retrieve and execute malicious code.
The Gateway broadcasts its presence via mDNS, which in full mode can expose sensitive operational details including the full filesystem path to the CLI binary, hostname information, and SSH availability.
If several people can message one tool-enabled agent, each of them can steer that same permission set. Run separate gateways per trust boundary.
Never expose your OpenClaw Gateway without authentication. Set a strong auth token and enable HTTPS. Bind the Gateway to localhost or reach it only over a VPN. If it must be reachable remotely, put Nginx or Caddy in front of it as a reverse proxy with TLS termination, and add rate limiting and IP allowlisting.
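As a rough illustration of the first two points, the sketch below generates a strong random token and checks that a bind address is loopback-only before startup. It does not use any OpenClaw API; the function names (`generate_auth_token`, `is_loopback_bind`, `check_gateway_settings`) and the 32-character minimum are assumptions made for this example.

```python
import ipaddress
import secrets


def generate_auth_token(num_bytes: int = 32) -> str:
    """Return a URL-safe random token drawn from the OS CSPRNG."""
    return secrets.token_urlsafe(num_bytes)


def is_loopback_bind(host: str) -> bool:
    """True only for 'localhost' or a literal loopback IP address."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as unsafe in this sketch.
        return False


def check_gateway_settings(host: str, token: str, min_token_chars: int = 32) -> list[str]:
    """Collect human-readable problems; an empty list means both checks passed."""
    problems = []
    if not is_loopback_bind(host):
        problems.append(f"bind address {host!r} is not loopback-only")
    if len(token) < min_token_chars:
        problems.append(f"auth token is shorter than {min_token_chars} characters")
    return problems


if __name__ == "__main__":
    token = generate_auth_token()
    for problem in check_gateway_settings("0.0.0.0", token):
        print("WARNING:", problem)
```

A check like this could run in a deployment script so that a Gateway bound to all interfaces, or one protected by a short token, is flagged before it ever starts.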
OpenClaw should be deployed only in a fully isolated environment such as a dedicated virtual machine or separate physical system, using dedicated non-privileged credentials with access only to non-sensitive data.
Only install skills from the official ClawHub marketplace. Block external skills and only allow pre-vetted, manually reviewed code. Disable high-risk tools like shell execution, browser control, and web fetching if they aren't needed.
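One way to enforce "pre-vetted, manually reviewed code" is to pin each reviewed skill to a content hash and refuse anything that does not match. The sketch below is a minimal version of that idea, assuming a skill is simply a directory of files; `skill_digest` and `is_skill_approved` are hypothetical helpers, not part of OpenClaw or ClawHub.

```python
import hashlib
from pathlib import Path


def skill_digest(skill_dir: Path) -> str:
    """SHA-256 over every file in the skill directory (relative paths and contents)."""
    h = hashlib.sha256()
    files = sorted(p for p in skill_dir.rglob("*") if p.is_file())
    for path in files:
        rel = path.relative_to(skill_dir).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length prefixes keep path and content boundaries unambiguous.
        h.update(len(rel).to_bytes(4, "big"))
        h.update(rel)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def is_skill_approved(skill_dir: Path, approved_digests: set[str]) -> bool:
    """True only if the directory's current digest was recorded at review time."""
    return skill_digest(skill_dir) in approved_digests
```

Because the digest covers bundled scripts as well as the markdown instructions, a skill whose files change after review no longer matches its approved hash and has to be reviewed again.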
Model choice matters: older or legacy models can be less robust against prompt injection and tool misuse. OpenClaw recommends using Anthropic Claude Opus 4.6 (or the latest Opus) because it's strong at recognizing prompt injections.
Log every command execution, API call, file access, and decision. Store logs somewhere the agent can't modify them — ideally a separate logging server or SIEM system.
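Shipping logs off-host is the primary control. As a complementary measure, records can also be hash-chained so that edits to earlier entries are detectable. The sketch below shows that technique under stated assumptions: the record fields (`ts`, `event`, `detail`, `prev`, `hash`) and the function names are invented for this example and are not an OpenClaw log format.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash a record body using a stable JSON serialization."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: str, detail: dict) -> dict:
    """Append a record that commits to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"ts": time.time(), "event": event, "detail": detail, "prev": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev") != prev_hash or entry.get("hash") != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this only makes tampering visible. It does not prevent it, and an attacker who can rewrite the whole log can rebuild the chain, which is why the records still belong on a separate logging server or SIEM.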
Add explicit instructions to your SOUL.md file to treat content inside <user_data> tags as data only, implement output validation before execution, and require human approval for sensitive actions.
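The tagging and approval steps can be sketched in a few lines. The example below wraps untrusted text in <user_data> tags, escaping angle brackets so the content cannot close the tag early, and gates sensitive actions behind a human approval callback. The action names in `SENSITIVE_ACTIONS` and the `approve` and `execute` callbacks are hypothetical and stand in for whatever your deployment actually uses.

```python
import html
from typing import Callable

# Hypothetical action names; adjust to the tools your agent actually exposes.
SENSITIVE_ACTIONS = {"shell_exec", "send_email", "delete_file"}


def wrap_untrusted(text: str) -> str:
    """Wrap text in <user_data> tags, escaping &, < and > inside it."""
    return f"<user_data>\n{html.escape(text, quote=False)}\n</user_data>"


def requires_approval(action: str) -> bool:
    """True for actions that must be confirmed by a human first."""
    return action in SENSITIVE_ACTIONS


def run_action(
    action: str,
    approve: Callable[[str], bool],
    execute: Callable[[str], str],
) -> str:
    """Execute the action, unless it is sensitive and the human declines."""
    if requires_approval(action) and not approve(action):
        return "denied"
    return execute(action)
```

Escaping matters because an email or web page containing a literal closing tag could otherwise break out of the data region. Tagging remains a mitigation rather than a guarantee, so the approval gate is what limits the damage if an injection still succeeds.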
Reach out to us and we'll help you deploy OpenClaw securely.
contact@dheemai.com