Shadow AI Is the New Shadow IT: How to Find LLM Services Hiding on Your Network

[Image: Shadow AI LLM services running on enterprise networks, showing hidden AI servers and unsecured access risks]

Last month, I discovered something that stopped me cold during a routine penetration test. A developer had spun up an Ollama server to experiment with local AI models. Nothing unusual about that, except the server was publicly accessible with no authentication. The models it hosted had been trained on internal company data.

This scenario plays out thousands of times daily across enterprise networks. Recent research from Cisco found more than 14,000 Ollama server instances publicly accessible on the internet, and twenty percent of those actively host models susceptible to unauthorized access. BankInfoSecurity separately reported more than 10,000 Ollama servers running with no authentication layer.

The security question has fundamentally shifted. It is no longer “are we running AI?” but “where is AI running that we do not know about?”

The Rise of Shadow AI Infrastructure

Remember when shadow IT meant employees signing up for unauthorized SaaS applications? Shadow AI is that problem on steroids. Developers under pressure to deliver AI features spin up local LLM servers to boost their productivity, often without realizing they have exposed sensitive infrastructure to anyone who knows where to look.

The challenge compounds because the AI serving ecosystem has exploded. Ollama is just one platform among dozens. There is vLLM, LiteLLM, Hugging Face Text Generation Inference, LocalAI, LM Studio, and many more. Each has different API signatures, default ports, and response patterns. Security teams trying to discover unauthorized AI deployments face a fragmented landscape with no unified detection approach.

During assessments, I have seen organizations with dozens of AI services running that security teams had no idea existed. Engineering teams deploy these tools because they genuinely help with productivity. But without proper governance, each deployment becomes a potential entry point for attackers.

What LLM Service Fingerprinting Actually Means

Before diving into solutions, let me clarify what we are actually trying to detect here. LLM service fingerprinting identifies what server software is running on a network endpoint. This is distinct from determining which AI model generated a piece of text.

Think of it as answering a cascade of questions during an assessment. First, what ports are open? Then, what service runs on this port? Next, is this HTTP service an LLM? Only after answering these questions can you move to deeper analysis like vulnerability testing or prompt injection attempts.

The challenge is that manual detection across all these platforms is painfully slow. Each requires different knowledge:

- Ollama runs on port 11434 and responds to /api/tags with a JSON list of models.
- vLLM typically uses port 8000 with OpenAI-compatible endpoints.
- LiteLLM sits on port 4000 as a proxy to multiple backends.
- LocalAI defaults to port 8080 with its own endpoint structure.

Manually checking every possibility during an assessment burns time and risks missing services entirely.
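To make the fragmentation concrete, here is a minimal Go sketch that checks a single host against those default signatures. The ports and endpoint paths are the documented defaults from the list above; the target host, timeout, and output are purely illustrative:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// signature pairs a platform with its documented default port and an
// endpoint that tends to identify it. Everything else in this sketch
// (target host, timeout, output) is illustrative.
type signature struct {
	platform string
	port     int
	path     string
}

func main() {
	sigs := []signature{
		{"Ollama", 11434, "/api/tags"},  // returns a JSON list of local models
		{"vLLM", 8000, "/v1/models"},    // OpenAI-compatible model listing
		{"LiteLLM", 4000, "/v1/models"}, // proxy, also OpenAI-compatible
		{"LocalAI", 8080, "/v1/models"}, // exposes an OpenAI-style API too
	}

	client := &http.Client{Timeout: 3 * time.Second}
	host := "target.example.com" // hypothetical target

	for _, s := range sigs {
		url := fmt.Sprintf("http://%s:%d%s", host, s.port, s.path)
		resp, err := client.Get(url)
		if err != nil {
			continue // closed, filtered, or not speaking HTTP
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			fmt.Printf("possible %s instance at %s\n", s.platform, url)
		}
	}
}
```

Even this toy version shows the shape of the problem: every new platform means another port, another path, and another response format to special-case.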

Why Passive Detection Falls Short

You might think existing tools like Shodan handle this problem already. They help, but with significant limitations.

Shodan queries are passive, meaning they rely on database entries that lag behind real-time deployments. When a developer spins up a new Ollama instance this morning, it might not appear in Shodan for days or weeks. During an active assessment, that gap matters.

Shodan also requires a subscription for full access, creating cost barriers for some teams. More importantly, most Shodan queries focus on specific platforms. A search optimized for Ollama will miss vLLM instances entirely.

For comprehensive discovery, you need active probing that checks for multiple platforms simultaneously and returns results in seconds.
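To see why active probing can return results that fast, here is a hedged extension of the earlier sketch that fans the same checks out in goroutines. It reuses the signature type from above; the concurrency pattern is a generic Go idiom, not a description of any particular tool's internals:

```go
import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// probeAll runs every signature check concurrently, so a sweep across
// all known platforms completes in roughly one timeout rather than the
// sum of them. Reuses the signature type from the sketch above.
func probeAll(host string, sigs []signature) {
	client := &http.Client{Timeout: 3 * time.Second}
	var wg sync.WaitGroup
	for _, s := range sigs {
		wg.Add(1)
		go func(s signature) {
			defer wg.Done()
			url := fmt.Sprintf("http://%s:%d%s", host, s.port, s.path)
			resp, err := client.Get(url)
			if err != nil {
				return // unreachable; nothing to report
			}
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				fmt.Printf("possible %s instance at %s\n", s.platform, url)
			}
		}(s)
	}
	wg.Wait()
}
```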

Introducing Julius: A Practical Solution

This detection gap led Praetorian to develop Julius, an open-source tool specifically designed for LLM service fingerprinting. I have been using it during assessments for the past few months, and it has fundamentally changed how quickly I can map AI infrastructure.

Julius detects 17 different AI platforms through active HTTP probing. It is written in Go and compiles to a single binary with no external dependencies, which means you can drop it onto any system and start scanning immediately.

The basic usage is straightforward. Point Julius at a target URL, and it tells you what AI service is running:

```
julius probe https://target.example.com:11434
```

Within seconds, you get a table showing the detected service, a confidence score, the service category, and any models deployed on that endpoint.

What makes this approach powerful is the specificity scoring system. Many LLM platforms implement OpenAI-compatible APIs, which creates detection ambiguity: a successful response from /v1/models could come from vLLM, LiteLLM, LocalAI, or several other platforms, so evidence unique to a single platform has to count for more than evidence shared across many.
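The intuition behind specificity weighting is easy to sketch. In the toy Go example below, the endpoint-to-platform table and the 1/n weighting are invented for illustration only; this is not Julius's actual signature database or scoring model:

```go
package main

import "fmt"

func main() {
	// Which platforms are known to expose which endpoints.
	// Invented for illustration; not Julius's signature database.
	endpoints := map[string][]string{
		"/api/tags":  {"Ollama"},
		"/v1/models": {"Ollama", "vLLM", "LiteLLM", "LocalAI"},
	}

	// Endpoints that answered on a hypothetical target.
	observed := []string{"/api/tags", "/v1/models"}

	// An endpoint unique to one platform is worth 1.0; one shared by
	// n platforms is worth only 1/n to each of them.
	scores := map[string]float64{}
	for _, ep := range observed {
		platforms := endpoints[ep]
		for _, p := range platforms {
			scores[p] += 1.0 / float64(len(platforms))
		}
	}

	for platform, score := range scores {
		fmt.Printf("%-8s %.2f\n", platform, score)
	}
}
```

Shared endpoints still contribute evidence, but only a platform-specific signal like /api/tags can push one candidate's score decisively above the rest.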
