1. Overview

GenogramAI uses AI (Google Gemini) to help users build and analyze family genograms. Before any data is sent to the AI, we apply HIPAA Safe Harbor de-identification to remove all 18 categories of identifiers that the rule lists. As a result, no individually identifiable health information leaves your device for AI processing.

This policy documents exactly what data is sent, what is redacted, and how we maintain an auditable trail of compliance.

2. De-identification Method: HIPAA Safe Harbor

We follow the Safe Harbor method defined in 45 CFR 164.514(b)(2), which requires the removal of 18 specific identifiers. Our implementation addresses every category:

Safe Harbor Identifier	Our Approach
Names	Replaced with Person_1, Person_2, etc.
Geographic data (below state)	City, address, birthPlace, deathPlace stripped; only state/country sent
Dates (except year)	Birth/death month and day stripped; only year sent
Phone numbers	Not collected
Fax numbers	Not collected
Email addresses	Never sent to AI
SSN	Not collected
Medical record numbers	Not collected
Health plan beneficiary numbers	Not collected
Account numbers	Not collected
Certificate/license numbers	Not collected
Vehicle identifiers	Not collected
Device identifiers	Not collected
Web URLs	Not sent to AI
IP addresses	Not sent to AI
Biometric identifiers	Not collected
Full-face photos	Not sent to AI (image processing extracts only shapes/lines)
Any unique identifying number	Internal IDs replaced with short tokens (n1, n2, etc.)

3. What Data Is Sent to AI

Only de-identified, non-identifying attributes are transmitted to the AI model:

Sent to AI (Safe Fields)

Gender
Birth year (year only)
Death year (year only)
Living/deceased status
Index person flag
Occupation
Education level
Religion
Social class
Country code
State/province
Sexual orientation
Heritage label
Twin type
Pet species
Relationship type
Emotional connection type
Child connection type

Never Sent (Redacted)

First name
Last name
Maiden name
Middle name
Nickname
Alternative/changed name
Birth month & day
Birth place
Death month & day
Death place
Cause of death
City
Street address/location
Burial place
Notes (raw text is never sent; names are replaced with Person_N tokens before transmission)
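
One way to enforce the split between the two lists above is an allowlist: copy only the named safe fields and drop everything else by default. The sketch below is illustrative only. The field names (birthYear, educationLevel, and so on) and the pickSafeFields helper are assumptions made for this example, not the product's actual schema.

```typescript
// Hypothetical field names for a person record; the real schema is not
// shown in this policy.
const SAFE_FIELDS = [
  "gender", "birthYear", "deathYear", "isDeceased", "isIndexPerson",
  "occupation", "educationLevel", "religion", "socialClass",
  "countryCode", "state", "sexualOrientation", "heritage", "twinType",
] as const;

type SafeField = (typeof SAFE_FIELDS)[number];
type PersonRecord = Record<string, unknown>;

// Copy only allowlisted keys. Anything not listed (names, places, cause of
// death, or a field added later) is dropped without needing a rule for it.
function pickSafeFields(
  person: PersonRecord,
): Partial<Record<SafeField, unknown>> {
  const out: Partial<Record<SafeField, unknown>> = {};
  for (const field of SAFE_FIELDS) {
    if (person[field] !== undefined) out[field] = person[field];
  }
  return out;
}

// The result keeps birthYear and state; firstName, city, and causeOfDeath
// are not copied.
const safeView = pickSafeFields({
  firstName: "Maria",
  birthYear: 1950,
  city: "Austin",
  state: "TX",
  causeOfDeath: "stroke",
});
```

An allowlist fails closed: a field that nobody classified is withheld rather than sent.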

4. Technical Implementation

4.1 Name Anonymization

Every person in the genogram is assigned a sequential anonymous identifier (Person_1, Person_2, etc.) before any data is transmitted. A mapping table is maintained only in the client's browser memory and is never persisted or transmitted. After the AI responds, anonymous identifiers are converted back to real names for display.
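
As a rough illustration of the mapping table described above, the following sketch assigns sequential aliases and keeps both directions of the mapping in plain in-memory Map objects. The NamedPerson shape and the buildAliasTable name are assumptions for this example and are not taken from the GenogramAI codebase.

```typescript
// Hypothetical minimal person shape for illustration.
interface NamedPerson {
  id: string;
  firstName: string;
  lastName: string;
}

interface AliasTable {
  aliasById: Map<string, string>;   // internal id -> "Person_N"
  nameByAlias: Map<string, string>; // "Person_N" -> display name
}

// Assign Person_1, Person_2, ... in input order. Both maps are ordinary
// in-memory objects; nothing here writes to storage or the network.
function buildAliasTable(people: NamedPerson[]): AliasTable {
  const aliasById = new Map<string, string>();
  const nameByAlias = new Map<string, string>();
  people.forEach((person, index) => {
    const alias = `Person_${index + 1}`;
    aliasById.set(person.id, alias);
    nameByAlias.set(alias, `${person.firstName} ${person.lastName}`.trim());
  });
  return { aliasById, nameByAlias };
}
```

Only the alias values would be placed in the outgoing request; the table itself stays in the browser for later re-identification.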

4.2 Date Truncation

All dates are truncated to year-only precision before transmission. Birth month, birth day, death month, and death day are stripped entirely. Only birth year and death year are sent. HIPAA Safe Harbor permits the year to be retained, with one exception: date elements (including the year) that indicate an age over 89 must be aggregated into a single category of age 90 or older.
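
A minimal sketch of year-only truncation follows, assuming dates are stored as ISO-style strings such as "1950-03-14" or "1950". The storage format and the toYearOnly name are assumptions for this example.

```typescript
// Return the four-digit year from an ISO-style date string, or null when
// the value is missing or not in the expected shape. Returning null for
// unrecognized input means a malformed date is withheld, not passed through.
function toYearOnly(date: string | null | undefined): number | null {
  if (!date) return null;
  const match = /^(\d{4})(?:-|$)/.exec(date.trim());
  return match ? Number(match[1]) : null;
}
```

With this sketch, toYearOnly("1950-03-14") and toYearOnly("1950") both yield 1950, while free-form text such as "March 1950" yields null.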

4.3 Geographic Generalization

Geographic data is limited to state/province and country level. City names, street addresses, ZIP codes, birth places, death places, and burial places are never transmitted. Only the country code (ISO 3166-1) and state/province are sent when available.
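
The generalization step can be pictured as building a new object that has room for only two fields. In the sketch below, the RawLocation shape, the generalizeLocation name, and the use of two-letter (alpha-2) country codes are assumptions; the policy specifies ISO 3166-1 but not which code form is used.

```typescript
// Hypothetical raw location as entered by the user.
interface RawLocation {
  street?: string;
  city?: string;
  zip?: string;
  state?: string;
  countryCode?: string;
}

// The outgoing shape has no slot for anything finer than state/province.
interface GeneralizedLocation {
  state?: string;
  countryCode?: string;
}

function generalizeLocation(location: RawLocation): GeneralizedLocation {
  const out: GeneralizedLocation = {};
  if (location.state) out.state = location.state;
  // Shape check only (two letters); this does not validate that the code
  // is actually assigned in ISO 3166-1.
  if (location.countryCode && /^[A-Za-z]{2}$/.test(location.countryCode)) {
    out.countryCode = location.countryCode.toUpperCase();
  }
  return out;
}
```

Because the result is constructed from scratch rather than by deleting keys from the input, street, city, and ZIP values cannot be carried over by accident.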

4.4 Free-Text Anonymization

User-entered notes and free-text prompts are processed through a token-replacement system that substitutes any real names found in the text with their corresponding Person_N identifiers before transmission. This prevents accidental PHI disclosure through narrative text.
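
The token replacement described above could look roughly like the sketch below. It is an assumption-laden illustration: the anonymizeText and escapeRegExp names are invented here, matching is case-insensitive on whole words, and only names already known to the genogram are replaced.

```typescript
// Escape regex metacharacters so a name is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every known name in `text` with its Person_N alias.
// `aliasByName` maps a real name to its alias (hypothetical input shape).
function anonymizeText(text: string, aliasByName: Map<string, string>): string {
  const aliasByLowerName = new Map<string, string>();
  aliasByName.forEach((alias, name) => {
    const trimmed = name.trim();
    if (trimmed.length > 0) aliasByLowerName.set(trimmed.toLowerCase(), alias);
  });
  if (aliasByLowerName.size === 0) return text;

  // Longest names first so "Mary Ann" wins over "Mary" at the same position.
  const alternatives = Array.from(aliasByLowerName.keys())
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");

  // The lookarounds require a non-letter, non-digit boundary on both sides,
  // so "Ana" is not replaced inside "Mariana".
  const pattern = new RegExp(
    `(?<![\\p{L}\\p{N}])(?:${alternatives})(?![\\p{L}\\p{N}])`,
    "giu",
  );

  // A single pass means already-inserted aliases are never rescanned.
  // If the lookup misses because of a case-folding mismatch, redact.
  return text.replace(
    pattern,
    (match) => aliasByLowerName.get(match.toLowerCase()) ?? "[REDACTED]",
  );
}
```

A replacement of this kind only catches names it has been told about. Misspellings, or names of people who are not in the genogram, would pass through unchanged in this sketch.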

4.5 Internal ID Obfuscation

Internal database identifiers (UUIDs) are replaced with short sequential tokens (n1, n2, n3, etc.) before transmission, so the transmitted data cannot be cross-referenced with stored records by ID value.
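
A sketch of the ID substitution, under the assumption that the genogram is held as a list of nodes and a list of edges that refer to nodes by ID. The GraphNode and GraphEdge shapes and the shortenIds name are hypothetical.

```typescript
// Hypothetical graph shapes for illustration.
interface GraphNode {
  id: string;
}
interface GraphEdge {
  source: string;
  target: string;
  type: string;
}

// Replace every internal ID with n1, n2, ... and rewrite edges to match.
function shortenIds(nodes: GraphNode[], edges: GraphEdge[]) {
  const tokenById = new Map<string, string>();
  nodes.forEach((node, index) => tokenById.set(node.id, `n${index + 1}`));

  const toToken = (id: string): string => {
    const token = tokenById.get(id);
    // The message deliberately omits the ID so it cannot leak through logs.
    if (token === undefined) throw new Error("Edge references an unknown node");
    return token;
  };

  return {
    nodes: nodes.map((node) => ({ id: toToken(node.id) })),
    edges: edges.map((edge) => ({
      source: toToken(edge.source),
      target: toToken(edge.target),
      type: edge.type,
    })),
    tokenById, // kept locally to translate AI output back to real IDs
  };
}
```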

5. Audit Trail

Every AI API call generates an immutable audit log entry that records:

What function was called (e.g., streamEditGenogramWithChat, streamGenogramInsights)
What fields were sent (the safe field categories)
What fields were redacted (the PHI field categories)
De-identification confirmations: names anonymized, dates truncated, geography generalized
Request metadata: prompt character count (not content), response time, success status
Timestamp and user context

The audit log never stores the actual prompt content, AI responses, or any PHI. It only records metadata proving that de-identification was applied before each AI call.

Audit Log Schema

ai_deid_audit_logs
├── id (auto-increment)
├── created_at (timestamp)
├── user_id (UUID, nullable)
├── ai_function (text) — which AI function was called
├── model (text) — e.g., "gemini-2.5-flash"
├── ai_version (text) — e.g., "8.0"
├── node_count (int) — number of people in genogram
├── edge_count (int) — number of relationships
├── fields_sent (text[]) — safe field categories sent
├── fields_redacted (text[]) — PHI field categories redacted
├── names_anonymized (boolean) — always true
├── dates_truncated (boolean) — always true
├── geo_generalized (boolean) — always true
├── prompt_length (int) — character count only
├── response_time_ms (int)
├── was_successful (boolean)
└── user_agent (text)
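
For readers who prefer types to diagrams, the schema above can be mirrored as a TypeScript interface, together with a small builder that records the prompt length and never the prompt text. The AuditInput shape and the buildAuditEntry name are assumptions for this sketch. The id column is omitted because it is assigned by the database.

```typescript
// Mirrors the ai_deid_audit_logs columns (id is database-assigned).
interface AiDeidAuditLog {
  created_at: string;
  user_id: string | null;
  ai_function: string;
  model: string;
  ai_version: string;
  node_count: number;
  edge_count: number;
  fields_sent: string[];
  fields_redacted: string[];
  names_anonymized: true; // literal type: the schema says "always true"
  dates_truncated: true;
  geo_generalized: true;
  prompt_length: number;
  response_time_ms: number;
  was_successful: boolean;
  user_agent: string;
}

// Hypothetical caller-side input; `prompt` is used only to measure length.
interface AuditInput {
  userId: string | null;
  aiFunction: string;
  model: string;
  aiVersion: string;
  nodeCount: number;
  edgeCount: number;
  fieldsSent: string[];
  fieldsRedacted: string[];
  prompt: string;
  responseTimeMs: number;
  wasSuccessful: boolean;
  userAgent: string;
}

function buildAuditEntry(
  input: AuditInput,
  now: Date = new Date(),
): Readonly<AiDeidAuditLog> {
  const entry: AiDeidAuditLog = {
    created_at: now.toISOString(),
    user_id: input.userId,
    ai_function: input.aiFunction,
    model: input.model,
    ai_version: input.aiVersion,
    node_count: input.nodeCount,
    edge_count: input.edgeCount,
    fields_sent: [...input.fieldsSent],
    fields_redacted: [...input.fieldsRedacted],
    names_anonymized: true,
    dates_truncated: true,
    geo_generalized: true,
    // String length in UTF-16 code units; the text itself is not stored.
    prompt_length: input.prompt.length,
    response_time_ms: input.responseTimeMs,
    was_successful: input.wasSuccessful,
    user_agent: input.userAgent,
  };
  // Freezing only guards the in-memory object (shallowly); immutability of
  // stored rows has to be enforced by the log store itself.
  return Object.freeze(entry);
}
```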

6. Clinical Mode (Clinical Plan)

For users on the Clinical plan, GenogramAI offers a Clinical Mode with zero-knowledge encryption:

All genogram data is encrypted locally using AES-256 encryption with a device-bound key
The encryption key never leaves your device — true zero-knowledge architecture
Data is stored only on the user's device — never synced to cloud storage
AI features still apply the same de-identification before any AI calls
This provides defense-in-depth: even if de-identification had a gap, the data at rest is encrypted with a key only you control

7. Data Flow Summary

User enters data or sends a request

Names, dates, locations, and notes are stored locally in the browser.

De-identification applied

Names → Person_N tokens. Dates → year only. Geography → state/country only. IDs → short tokens. Notes → anonymized.

Audit log recorded

An immutable entry is logged with proof of de-identification (field lists, boolean confirmations, metadata only).

De-identified data sent to AI

Only safe fields (gender, birth year, occupation, etc.) are transmitted to Google Gemini.

AI response re-identified locally

Person_N tokens in the response are mapped back to real names in the browser only.
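
The final re-identification step can be sketched as a single regular-expression pass over the AI response. The reidentify name and the nameByAlias map are assumptions for this example. Matching the whole token at once matters: replacing "Person_1" with a plain string search would also corrupt "Person_10".

```typescript
// Map Person_N tokens in an AI response back to display names.
// `nameByAlias` is the in-browser table, e.g. "Person_1" -> "Maria Lopez".
function reidentify(response: string, nameByAlias: Map<string, string>): string {
  // \d+ is greedy, so "Person_10" is matched as one token and is never
  // treated as "Person_1" followed by "0". Unknown tokens are left as-is.
  return response.replace(
    /\bPerson_\d+/g,
    (token) => nameByAlias.get(token) ?? token,
  );
}
```

Leaving unknown tokens untouched keeps the output readable if the model mentions an alias that is not in the table.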

8. Important Disclaimer

GenogramAI applies de-identification before transmission so that the data sent to AI services does not constitute PHI under HIPAA. De-identified data is not subject to HIPAA requirements per 45 CFR 164.514(a).

GenogramAI does not currently hold a Business Associate Agreement (BAA) with Google for Gemini API services. We consider a BAA unnecessary because data that has been properly de-identified under Safe Harbor is not PHI. We nonetheless review and improve our de-identification processes on an ongoing basis to maintain a high standard of patient privacy.

For users handling actual patient data in clinical settings, we recommend using Clinical Mode (Clinical plan) for zero-knowledge encryption — your data stays local and the encryption key never leaves your device.

9. Contact

For questions about our de-identification practices or to report a privacy concern, contact us at support@genogramai.com.

HIPAA De-identification Policy