Privacy policy
Guardian handles sensitive immigration and tax documents — SSN, passport numbers, EINs, lease agreements, tax returns. This page describes exactly what we collect, where it lives, who else touches it, and what you can ask us to do with it.
1. What we collect
Account data: email, password hash (bcrypt), sign-in timestamps. A Mixpanel analytics event is also recorded on each page load under an anonymized visitor ID, which is linked to your account only after you sign in.
Documents you upload: PDFs, images, text files, CSVs placed in your data room. Full content is stored — we extract text server-side (or locally via the MCP server when you use Claude Code / Desktop / Codex) for classification and retrieval.
Extracted facts: doc type, dates, form numbers, counterparties, amounts. Scoped to your account.
Compliance checks: findings, deadlines, risks derived from your documents. Scoped to your account.
Gmail access (optional): if you connect Gmail through the MCP server (`gmail_*` tools), Guardian uses Google OAuth2 with the scopes gmail.readonly, gmail.compose, gmail.modify. Messages are read on demand by your local MCP process; Guardian servers never see Gmail content unless you explicitly upload it via the tools.
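For reference, the three short scope names above correspond to full Google OAuth2 scope URLs. The snippet below is only an illustrative sketch of how an MCP client might list them; the variable names are hypothetical and not taken from Guardian's code.

```python
# Hypothetical constant names; the scope URLs themselves are Google's published Gmail API scopes.
GMAIL_SCOPE_BASE = "https://www.googleapis.com/auth/"

# Short names as listed in this policy, with a summary of what each one permits.
GMAIL_SCOPE_PURPOSES = {
    "gmail.readonly": "read messages and settings, no changes",
    "gmail.compose": "create and manage drafts, send mail",
    "gmail.modify": "read and write mail, excluding immediate permanent deletion",
}

GMAIL_SCOPES = [GMAIL_SCOPE_BASE + name for name in GMAIL_SCOPE_PURPOSES]

for scope in GMAIL_SCOPES:
    print(scope)
```

You can review or withdraw these grants at any time from your Google account's third-party access settings.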
2. Where your data lives
Application host: Fly.io (SOC 2 Type II, 2024). Data is stored on an encrypted persistent volume in the ewr (US East) region.
Database: SQLite on the Fly volume for early deployments; production deployments use Neon Postgres (SOC 2 Type II, AES-256 at rest, TLS in transit).
Vector index (RAG): ChromaDB, stored on the same encrypted volume. Embeddings are generated by OpenAI's embedding API and contain a compressed numeric representation of document chunks.
Local data (MCP): when you use the MCP server, document parsing runs on your device. Files you do not explicitly upload stay on your device.
3. Sub-processors
Guardian uses the following third parties to deliver the service:
- Fly.io — application hosting, storage volumes, TLS termination
- Neon — managed Postgres (production tier)
- Anthropic — Claude API, used when you ask compliance questions or generate correspondence
- OpenAI — embeddings (RAG index) and occasional reasoning / tool calls
- Google — Gmail API (only if you connect it)
- Mixpanel — product analytics (page views, feature usage). No document content is sent.
Sub-processors receive only the minimum data required for the operation you invoke. LLM prompts include document text only when you request a check or question that needs it.
4. How we use your data
We use your data only to:
- Run compliance checks and cross-references you request
- Generate forms (Form 8843, 1040-NR, FBAR, 5472, I-983) and mailing kits
- Answer your questions about your own documents
- Share a case package with a collaborator you explicitly designate (via share tokens)
- Improve the product — aggregated, non-PII metrics only
We do not sell your data. We do not use it to train third-party models. We do not read it for advertising.
5. Authentication tokens
The gdn_oc_* API tokens you generate at /connect are stored as SHA-256 hashes. Raw token values are shown to you once at creation and never persisted in plaintext. You can rotate or revoke at any time from the same page.
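To illustrate what "stored as SHA-256 hashes" means in practice, here is a minimal sketch of hash-only token storage. It is not Guardian's actual implementation: the function names and the token length are assumptions, and only the `gdn_oc_` prefix and the choice of SHA-256 come from this policy.

```python
import hashlib
import hmac
import secrets

TOKEN_PREFIX = "gdn_oc_"  # prefix named in this policy


def create_token() -> tuple[str, str]:
    """Return (raw_token, stored_hash); only the hash would be persisted."""
    # 32 random bytes is an assumed length, chosen for illustration.
    raw = TOKEN_PREFIX + secrets.token_urlsafe(32)
    stored_hash = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return raw, stored_hash


def verify_token(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare against the stored hash in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)


raw, stored = create_token()
print(verify_token(raw, stored))        # the token shown once at creation verifies
print(verify_token(raw + "x", stored))  # any other value does not
```

Because only the hash is kept, a leaked copy of the token table would not reveal usable token values, and a lost token cannot be recovered, only replaced.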
Data-room share tokens are short-lived, scope-limited JWTs (HMAC-SHA256). They embed only the case folder path, recipient label, and expiry (default 14 days, configurable). They do not grant write access.
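The sketch below shows how an HMAC-SHA256 (HS256) JWT carrying a folder path, recipient label, and expiry could be issued and checked using only the Python standard library. It is an illustration under stated assumptions, not Guardian's code: the claim names `folder` and `recipient` and both function names are hypothetical; `exp` is the standard JWT expiry claim.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_share_token(secret: bytes, folder: str, recipient: str,
                      ttl_days: int = 14, now: Optional[float] = None) -> str:
    """Build a signed token; the 14-day default mirrors the policy text."""
    issued = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"folder": folder, "recipient": recipient,
              "exp": int(issued + ttl_days * 86400)}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_share_token(secret: bytes, token: str,
                       now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    current = time.time() if now is None else now
    try:
        head, body, sig = token.split(".")
        expected = hmac.new(secret, f"{head}.{body}".encode("ascii"),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig)):
            return None
        claims = json.loads(_b64url_decode(body))
    except ValueError:  # malformed token, bad base64, or bad JSON
        return None
    if claims.get("exp", 0) <= current:
        return None
    return claims
```

In this scheme a token signed with one secret fails verification under any other secret, which is why rotating the signing secret revokes every outstanding share token at once.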
6. Retention
Uploaded documents: kept until you delete them from your data room or close your account.
Analytics events: kept for 12 months then aggregated.
Backups: encrypted database backups held for 30 days then destroyed.
Deleted accounts: purged from the production database within 30 days; purged from backups on normal backup rotation.
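As a worked example of how the two 30-day windows combine, the sketch below computes the latest dates by which a deleted account would be gone. It assumes the worst case, in which a backup containing the account is taken on the day the production purge completes; the function and constant names are hypothetical.

```python
from datetime import date, timedelta

PRODUCTION_PURGE_DAYS = 30   # "purged from the production database within 30 days"
BACKUP_RETENTION_DAYS = 30   # "backups held for 30 days then destroyed"


def latest_purge_dates(deletion_requested: date) -> dict[str, date]:
    """Worst-case dates under the assumption described above."""
    production = deletion_requested + timedelta(days=PRODUCTION_PURGE_DAYS)
    backups = production + timedelta(days=BACKUP_RETENTION_DAYS)
    return {"production": production, "backups": backups}


dates = latest_purge_dates(date(2025, 1, 1))
print(dates["production"])  # 2025-01-31
print(dates["backups"])     # 2025-03-02
```

Under that assumption, the upper bound from deletion request to removal from all backups is 60 days.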
7. Your rights
You can:
- Access any data we hold about you — email the address below
- Export everything as a zip: use the `download-all` feature on your data-room share page
- Delete individual documents from the data room
- Revoke MCP / OpenClaw tokens from /connect
- Revoke outstanding share tokens: email the address below. Revocation works by rotating the JWT signing secret, which invalidates every outstanding share token at once
- Delete your account — email us; full deletion within 30 days
If you are in the EU, UK, California, or another jurisdiction with specific data-subject rights (GDPR, UK-GDPR, CCPA), the options above are how you exercise your rights of access, portability, erasure, and objection.
8. Security
In transit, Guardian uses TLS 1.3. At rest, volumes and databases are AES-256 encrypted. Access to production is limited to the project maintainer via Fly.io SSH (SSH key auth, no passwords). API authentication uses Bearer tokens hashed with SHA-256 before storage.
We do not have independent SOC 2 certification as of this date — the decision log records why and under what conditions that changes. Our infrastructure providers (Fly.io, Neon) do.
If you discover a security issue, please email us before disclosing publicly. We will respond within 48 hours and coordinate disclosure.
9. Children
Guardian is not intended for users under 13. If you believe a child has created an account, contact us and we will delete it.
10. Changes to this policy
Material changes will be announced on this page with a new effective date. You will be notified by email at least 14 days before a change that narrows your rights or expands our data use.
11. Contact
Policy questions, data subject requests, account deletion, security disclosures: fretin13@gmail.com.
Source of truth for this policy: github.com/LEE-CHENYU/compliance-os.