The Deployment Reality: Cloud vs. VPC vs. On-Premise Document AI

Uri Merhav

Updated Mar 31st, 2026 · 12 min read

The sales pitch for document AI is simple: upload documents to our API, get structured data back. What the pitch omits is where that processing actually happens, who has access during processing, and whether the deployment model survives a security audit.

For enterprise buyers in government, healthcare, and financial services, deployment architecture is not a technical detail. It is the primary evaluation criterion. A document AI solution with 99% accuracy is worthless if it cannot be deployed in a compliant environment.

This article examines why multi-tenant SaaS fails enterprise security assessments, what true air-gapped deployment requires, and how to map document types to appropriate deployment models using the DocuPipe Data Residency Decision Matrix.

For the broader context on enterprise document AI infrastructure, see the Enterprise Document AI Infrastructure hub article.

Why Multi-Tenant SaaS Fails Government and Healthcare Audits

Multi-tenant SaaS is the default deployment model for most document AI vendors. Multiple customers share the same infrastructure, the same processing queues, and often the same model instances. This architecture optimizes vendor economics at the expense of customer security.

Shared Infrastructure Creates Audit Failures

When a government agency or healthcare system conducts a vendor security assessment, they ask specific questions about data isolation:

Where are documents stored during processing?
Which other customers share that storage?
How is network traffic segmented between tenants?
Who at the vendor can access documents during support operations?

Multi-tenant vendors struggle to answer these questions satisfactorily. The honest answer is that documents from many customers flow through shared components. Logical isolation exists, but physical isolation does not.

For FedRAMP authorization, this matters enormously. The shared responsibility model requires clear boundaries between customer and vendor responsibilities. When infrastructure is shared, those boundaries blur. Auditors see risk.

Telemetry Aggregation Exposes Metadata

Even when document contents are encrypted, multi-tenant architectures leak metadata through shared telemetry. Logging systems capture processing events across all customers. Metrics dashboards aggregate performance data from all tenants. Alerting systems trigger on patterns that span customer boundaries.

This metadata exposure creates risks that encryption does not address:

Processing patterns reveal business cycles and document volumes
Error rates indicate document complexity and potential data quality issues
Schema usage exposes what types of data customers are extracting
API call patterns reveal integration architectures

Sophisticated adversaries do not need document contents. Metadata provides sufficient intelligence for competitive analysis, regulatory arbitrage, or targeted attacks.
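
To make this exposure concrete, the sketch below builds a per-tenant profile from a shared event log that contains no document contents at all. The event fields (`tenant`, `day`, `schema`, `status`) and the sample records are hypothetical and are not taken from any real vendor's telemetry; the point is only that volumes, schema usage, and error rates fall out of metadata alone.

```python
from collections import defaultdict

# Hypothetical shared telemetry: event metadata only, no document contents.
EVENTS = [
    {"tenant": "acme", "day": "2026-03-30", "schema": "invoice", "status": "ok"},
    {"tenant": "acme", "day": "2026-03-31", "schema": "invoice", "status": "ok"},
    {"tenant": "acme", "day": "2026-03-31", "schema": "invoice", "status": "error"},
    {"tenant": "acme", "day": "2026-03-31", "schema": "w2", "status": "ok"},
    {"tenant": "globex", "day": "2026-03-31", "schema": "claim", "status": "ok"},
]


def profile_tenants(events):
    """Summarize what metadata alone reveals about each tenant."""
    raw = defaultdict(lambda: {"daily": defaultdict(int), "schemas": set(),
                               "errors": 0, "total": 0})
    for event in events:
        entry = raw[event["tenant"]]
        entry["daily"][event["day"]] += 1      # document volume per day
        entry["schemas"].add(event["schema"])  # what kinds of data are extracted
        entry["total"] += 1
        if event["status"] == "error":
            entry["errors"] += 1
    return {
        tenant: {
            "daily_volume": dict(entry["daily"]),
            "schemas": sorted(entry["schemas"]),
            "error_rate": entry["errors"] / entry["total"],
        }
        for tenant, entry in raw.items()
    }


if __name__ == "__main__":
    for tenant, summary in profile_tenants(EVENTS).items():
        print(tenant, summary)
```

Anyone with read access to the shared log, including vendor staff, can run this kind of aggregation, which is why encrypting document contents does not close the gap.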

Compliance Certification Gaps

Multi-tenant SaaS vendors typically hold baseline certifications like SOC 2 Type II and ISO 27001. These certifications validate that the vendor has security controls in place. They do not validate that those controls meet every customer's specific requirements.

Government agencies need FedRAMP authorization at Moderate or High baselines. Healthcare organizations need HIPAA Business Associate Agreements with specific technical safeguards. Financial institutions need controls aligned with FFIEC guidance and state banking regulations.

When a vendor's multi-tenant architecture cannot demonstrate the required isolation, certification gaps emerge. The vendor may be "working toward" FedRAMP, but authorization commonly takes a year or more. The vendor may sign a BAA, but their architecture does not support the required access controls.

These gaps force enterprise buyers to either accept risk or seek alternative solutions with compliant deployment models.

Defining True Air-Gapped AI and DoD IL6 Requirements

Air-gapping is the most misunderstood concept in secure deployment. Vendors claim air-gapped capabilities when they mean VPC isolation or private endpoints. These are not the same thing.

What Air-Gapping Actually Means

A true air-gapped system has no network path to external systems. This is physical isolation, not logical isolation. No firewall rules can create an air gap. No VPN can bridge an air gap without compromising it.
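
A software check cannot prove an air gap, but it can catch an obvious violation. The sketch below flags any route whose destination falls outside the enclave's own address space, such as a default route. The function name, the sample routes, and the enclave range are hypothetical; treat this as a supplementary audit step under those assumptions, not a substitute for physical verification.

```python
import ipaddress


def find_external_routes(routes, enclave_networks):
    """Return route destinations not contained in any enclave network."""
    allowed = [ipaddress.ip_network(net) for net in enclave_networks]
    violations = []
    for destination in routes:
        network = ipaddress.ip_network(destination)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so compare versions first.
        inside = any(
            network.version == candidate.version and network.subnet_of(candidate)
            for candidate in allowed
        )
        if not inside:
            violations.append(destination)
    return violations


if __name__ == "__main__":
    # Hypothetical routing table exported from a host inside the enclave.
    routes = ["10.20.0.0/24", "10.20.1.0/24", "0.0.0.0/0"]
    enclave = ["10.20.0.0/16"]
    print(find_external_routes(routes, enclave))
```

An empty result only means the exported table contains no external destinations; it says nothing about cables, radios, or other paths that never appear in a routing table.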

Air-gapped requirements include:

Physical network separation:

Dedicated network infrastructure with no connections to other networks
No routing tables that include external destinations
Physical verification that cables do not connect to external systems

Update mechanisms:

Software updates delivered via removable media
Cross-domain transfer procedures with content inspection
Manual verification of update integrity before installation

Operational constraints:

No wireless capabilities on any system component
No Bluetooth, Wi-Fi, or cellular radios
USB ports disabled or physically removed except for approved transfers

Personnel requirements:

Cleared operators for classified environments
Multi-person integrity for sensitive operations
Access logging with physical verification

For document AI, air-gapping means the entire processing stack runs within the isolated environment. OCR models, extraction models, validation logic, databases, and user interfaces must all operate without any external dependencies.
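
Verifying update integrity before installation can be partly scripted. The sketch below compares every file on the transfer media against a manifest of expected SHA-256 digests and reports anything missing, altered, or unlisted. It assumes the manifest arrived through a separate trusted channel; the function names and manifest layout are hypothetical. It covers integrity only and does not replace content inspection or signature checks in a cross-domain transfer procedure.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle_dir, manifest):
    """Check a bundle against {relative_path: expected_sha256_hex}.

    Returns a list of problems; an empty list means every listed file matched
    and no unlisted files were found.
    """
    root = Path(bundle_dir)
    problems = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif not hmac.compare_digest(sha256_of(path), expected.lower()):
            problems.append(f"hash mismatch: {name}")
    # Files on the media that the manifest does not mention are also suspicious.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            if relative not in manifest:
                problems.append(f"unlisted: {relative}")
    return problems
```

An operator would run `verify_bundle` on the mounted media and install only when the returned list is empty, with a second person confirming the result where multi-person integrity applies.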

DoD Impact Level Requirements

The Department of Defense defines Impact Levels (IL) that specify security requirements for cloud systems handling different data classifications.

IL4: Controlled Unclassified Information (CUI)

Deployment in FedRAMP Moderate authorized environments
Logical separation from commercial workloads
U.S. data centers with access limited to U.S. persons
Suitable for most unclassified DoD workloads

IL5: Higher Sensitivity CUI and Mission Data

Deployment in FedRAMP High authorized environments
Physically separated infrastructure from commercial cloud
National security background checks for all personnel
Dedicated government cloud regions (AWS GovCloud, Azure Government, Google Distributed Cloud)

IL6: Classified Information up to SECRET

Deployment in government-owned facilities or approved contractor facilities
Air-gapped networks with no external connectivity
Personnel with active SECRET clearances or higher
Continuous monitoring by government security operations

The path from IL4 to IL6 brings a steep increase in complexity and cost at each step. Organizations processing classified documents cannot use standard commercial cloud regions, regardless of the vendor's other certifications.

GovCloud Is Not Air-Gapped

A common misconception is that deploying in AWS GovCloud or Azure Government provides air-gapped security. This is incorrect.

GovCloud regions provide:

Physical separation from commercial cloud regions
Access restricted to vetted U.S. entities
Infrastructure operated by screened U.S. persons
FedRAMP High authorization for the underlying platform

GovCloud regions do not provide:

Network isolation from the internet
Freedom from external dependencies
Air-gapped operational characteristics
Automatic IL6 authorization

GovCloud is appropriate for IL4 and IL5 workloads. IL6 workloads require true air-gapped deployment in government-owned or approved contractor facilities.
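
The distinction can be written down as a simple ceiling check. The sketch below encodes the highest Impact Level each environment can host as described in this article; the environment names are hypothetical labels, and the table is a coarse planning aid rather than an authorization decision, since a workload still needs its own authorization even on an authorized platform.

```python
from enum import IntEnum


class ImpactLevel(IntEnum):
    IL4 = 4
    IL5 = 5
    IL6 = 6


# Highest impact level each environment can host, per the descriptions above.
# None means the environment is not suitable for IL4 or higher.
MAX_IMPACT_LEVEL = {
    "commercial_multi_tenant": None,
    "govcloud": ImpactLevel.IL5,
    "air_gapped_facility": ImpactLevel.IL6,
}


def can_host(environment, level):
    """Return True if the environment's ceiling is at or above the required level."""
    ceiling = MAX_IMPACT_LEVEL[environment]
    return ceiling is not None and level <= ceiling


if __name__ == "__main__":
    for env in MAX_IMPACT_LEVEL:
        supported = [lvl.name for lvl in ImpactLevel if can_host(env, lvl)]
        print(env, supported)
```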

The DocuPipe Data Residency Decision Matrix

Figure: Cloud vs. on-premise deployment comparison.

Not every document requires air-gapped processing. Over-securing low-risk documents wastes resources and creates operational friction. Under-securing high-risk documents creates compliance violations and potential breaches.

The Data Residency Decision Matrix maps document characteristics to appropriate deployment models.

Document Classification Criteria

Evaluate each document type against these criteria:

Data sensitivity:

Public: Information available to anyone
Internal: Information restricted to organization members
Confidential: Information restricted to specific roles
Classified: Information requiring government security clearances

Regulatory scope:

Unregulated: No specific compliance requirements
Industry regulated: Subject to HIPAA, GLBA, PCI-DSS, or similar
Government regulated: Subject to FedRAMP, ITAR, EAR, or similar
Classified: Subject to national security classification guidelines

Breach impact:

Low: Embarrassment but no material harm
Medium: Financial penalties or operational disruption
High: Regulatory action, litigation, or significant financial loss
Critical: National security impact or existential business risk

Deployment Model Mapping

Based on classification criteria, map documents to deployment models:

Cloud SaaS (multi-tenant):

Public data with no regulatory requirements
Internal data with low breach impact
Documents where processing speed and cost efficiency are primary concerns
Example: Marketing materials, public filings, general correspondence

Dedicated Cloud (single-tenant):

Confidential data with industry regulation
Internal data with medium breach impact
Documents requiring audit trails but not physical isolation
Example: Customer contracts, employee records, financial transactions

VPC Deployment (customer-controlled):

Confidential data with government regulation
Any data with high breach impact
Documents requiring customer-managed encryption keys
Example: Protected health information, personally identifiable information, trade secrets

On-Premise Deployment:

Classified or controlled data
Data subject to data sovereignty requirements
Documents that cannot leave organizational control under any circumstances
Example: Government contracts, defense-related technical data, classified intelligence

Air-Gapped Deployment:

Classified information at SECRET or above
Documents requiring IL6 processing
Materials subject to SCIF requirements
Example: Intelligence reports, weapons system documentation, compartmented programs
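
One way to apply the matrix consistently is to treat each criterion as setting a minimum deployment tier and then take the strictest. The sketch below is one interpretation of the mapping above, not DocuPipe's implementation: the class names, the per-criterion floors, and the two boolean flags (`sovereignty_constrained`, `secret_or_above`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    CLASSIFIED = 3


class Regulation(IntEnum):
    UNREGULATED = 0
    INDUSTRY = 1
    GOVERNMENT = 2
    CLASSIFIED = 3


class BreachImpact(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


class Deployment(IntEnum):
    CLOUD_SAAS = 0
    DEDICATED_CLOUD = 1
    VPC = 2
    ON_PREMISE = 3
    AIR_GAPPED = 4


# Assumed minimum tier demanded by each criterion (an interpretation of the matrix).
SENSITIVITY_FLOOR = {
    Sensitivity.PUBLIC: Deployment.CLOUD_SAAS,
    Sensitivity.INTERNAL: Deployment.CLOUD_SAAS,
    Sensitivity.CONFIDENTIAL: Deployment.DEDICATED_CLOUD,
    Sensitivity.CLASSIFIED: Deployment.ON_PREMISE,
}
REGULATION_FLOOR = {
    Regulation.UNREGULATED: Deployment.CLOUD_SAAS,
    Regulation.INDUSTRY: Deployment.DEDICATED_CLOUD,
    Regulation.GOVERNMENT: Deployment.VPC,
    Regulation.CLASSIFIED: Deployment.ON_PREMISE,
}
BREACH_FLOOR = {
    BreachImpact.LOW: Deployment.CLOUD_SAAS,
    BreachImpact.MEDIUM: Deployment.DEDICATED_CLOUD,
    BreachImpact.HIGH: Deployment.VPC,
    BreachImpact.CRITICAL: Deployment.ON_PREMISE,
}


@dataclass(frozen=True)
class DocumentProfile:
    sensitivity: Sensitivity
    regulation: Regulation
    breach_impact: BreachImpact
    sovereignty_constrained: bool = False
    secret_or_above: bool = False


def recommend(profile):
    """Return the least restrictive deployment tier that satisfies every criterion."""
    if profile.secret_or_above:
        return Deployment.AIR_GAPPED
    floors = [
        SENSITIVITY_FLOOR[profile.sensitivity],
        REGULATION_FLOOR[profile.regulation],
        BREACH_FLOOR[profile.breach_impact],
    ]
    if profile.sovereignty_constrained:
        floors.append(Deployment.ON_PREMISE)
    return max(floors)


if __name__ == "__main__":
    phi = DocumentProfile(Sensitivity.CONFIDENTIAL, Regulation.INDUSTRY, BreachImpact.HIGH)
    print(recommend(phi).name)
```

Taking the maximum across criteria means a single high-risk attribute is enough to move a document type to a stricter tier, which matches the "any data with high breach impact" rule in the VPC tier.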

DocuPipe Deployment Options

DocuPipe supports the full spectrum of deployment models:

Cloud processing for organizations comfortable with managed infrastructure and baseline certifications.

VPC deployment for organizations requiring customer-controlled environments with private endpoints and customer-managed keys.

On-premise deployment for organizations requiring complete infrastructure ownership with no external dependencies.

Air-gapped deployment for government and defense organizations requiring true network isolation with containerized delivery of the complete processing stack.

Each deployment model provides the same extraction capabilities. The differences are in security controls, operational procedures, and infrastructure ownership.

Conclusion

Deployment architecture determines whether document AI can be used at all in enterprise environments. Accuracy benchmarks and feature comparisons are irrelevant if the solution cannot be deployed in a compliant configuration.

Organizations should evaluate document AI vendors on deployment capabilities before evaluating extraction quality. A vendor who cannot meet deployment requirements is not a viable option, regardless of other attributes.

What is the difference between VPC and air-gapped deployment?

VPC deployment provides network isolation through private endpoints and customer-managed infrastructure within a cloud environment. It suits IL4/IL5 workloads when hosted in an appropriately authorized government cloud region. Air-gapped deployment requires true physical network separation with no routable path to external systems, software updates via removable media, and often cleared personnel. It is required for IL6 and classified information processing.

Is AWS GovCloud the same as air-gapped?

No. GovCloud provides physical separation from commercial regions and access restricted to U.S. entities, but it is not air-gapped. GovCloud systems still have internet connectivity and external dependencies. True air-gapping requires an environment with no network path to external systems; for IL6, that means government-owned or approved contractor facilities.

How do I decide which deployment model to use?

Use the Data Residency Decision Matrix: evaluate each document type by data sensitivity (public to classified), regulatory scope (unregulated to classified), and breach impact (low to critical). Then match these classifications to deployment models: cloud SaaS for low-risk documents, VPC for confidential data, and on-premise or air-gapped for classified or sovereignty-constrained documents.
