HIPAA De-identification Policy
How GenogramAI protects patient health information when using AI features
1. Overview
GenogramAI uses AI (Google Gemini) to help users build and analyze family genograms. Before any data is sent to the AI, we apply HIPAA Safe Harbor de-identification to remove all 18 categories of identifiers that make health information individually identifiable. This ensures that no protected health information (PHI) leaves your device for AI processing.
This policy documents exactly what data is sent, what is redacted, and how we maintain an auditable trail of compliance.
2. De-identification Method: HIPAA Safe Harbor
We follow the Safe Harbor method defined in 45 CFR 164.514(b)(2), which requires the removal of 18 specific identifiers. Our implementation addresses every category:
| Safe Harbor Identifier | Our Approach |
|---|---|
| Names | Replaced with Person_1, Person_2, etc. |
| Geographic data (below state) | City, address, birthPlace, deathPlace stripped; only state/country sent |
| Dates (except year) | Birth/death month and day stripped; only year sent |
| Phone numbers | Not collected |
| Fax numbers | Not collected |
| Email addresses | Never sent to AI |
| SSN | Not collected |
| Medical record numbers | Not collected |
| Health plan beneficiary numbers | Not collected |
| Account numbers | Not collected |
| Certificate/license numbers | Not collected |
| Vehicle identifiers | Not collected |
| Device identifiers | Not collected |
| Web URLs | Not sent to AI |
| IP addresses | Not sent to AI |
| Biometric identifiers | Not collected |
| Full-face photos | Not sent to AI (image processing extracts only shapes/lines) |
| Any unique identifying number | Internal IDs replaced with short tokens (n1, n2, etc.) |
3. What Data Is Sent to AI
Only the de-identified attributes listed below are transmitted to the AI model:
Sent to AI (Safe Fields)
- Gender
- Birth year (year only)
- Death year (year only)
- Living/deceased status
- Index person flag
- Occupation
- Education level
- Religion
- Social class
- Country code
- State/province
- Sexual orientation
- Heritage label
- Twin type
- Pet species
- Relationship type
- Emotional connection type
- Child connection type
Never Sent (Redacted)
- First name
- Last name
- Maiden name
- Middle name
- Nickname
- Alternative/changed name
- Birth month & day
- Birth place
- Death month & day
- Death place
- Cause of death
- City
- Street address/location
- Burial place
- Notes in their original form (notes are sent only after names are replaced with Person_N tokens; see 4.4)
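
The split between safe and redacted fields can be pictured as an allowlist filter. The following is a minimal sketch, not GenogramAI's actual code: the field names (`birth_year`, `is_index_person`, and so on) and the helper `filter_safe_fields` are hypothetical stand-ins for the categories listed above.

```python
# Hypothetical field names standing in for the "Sent to AI" categories.
SAFE_FIELDS = frozenset({
    "gender", "birth_year", "death_year", "is_deceased", "is_index_person",
    "occupation", "education_level", "religion", "social_class",
    "country_code", "state", "sexual_orientation", "heritage", "twin_type",
})


def filter_safe_fields(person: dict) -> tuple[dict, list[str]]:
    """Return (payload, redacted).

    payload  -- only the allowlisted fields of the person record
    redacted -- sorted names of every field that was dropped
    """
    payload = {key: value for key, value in person.items() if key in SAFE_FIELDS}
    redacted = sorted(key for key in person if key not in SAFE_FIELDS)
    return payload, redacted
```

An allowlist is used here rather than a denylist so that any field added to the data model later is dropped by default until it is explicitly marked safe.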
4. Technical Implementation
4.1 Name Anonymization
Every person in the genogram is assigned a sequential anonymous identifier (Person_1, Person_2, etc.) before any data is transmitted. A mapping table is maintained only in the client's browser memory and is never persisted or transmitted. After the AI responds, anonymous identifiers are converted back to real names for display.
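
As an illustration of the mapping described above, here is a minimal sketch of an in-memory pseudonym table. The class `PseudonymMap` and its method names are hypothetical, and the sketch assumes each person already has a stable internal ID to key on.

```python
import re


class PseudonymMap:
    """In-memory mapping between people and Person_N tokens (never persisted)."""

    def __init__(self) -> None:
        self._token_by_id: dict[str, str] = {}
        self._name_by_token: dict[str, str] = {}

    def assign(self, person_id: str, display_name: str) -> str:
        """Return the Person_N token for a person, creating it on first use."""
        if person_id not in self._token_by_id:
            token = f"Person_{len(self._token_by_id) + 1}"
            self._token_by_id[person_id] = token
            self._name_by_token[token] = display_name
        return self._token_by_id[person_id]

    def reidentify(self, text: str) -> str:
        """Swap Person_N tokens in an AI response back to real names."""
        # \d+ consumes the whole number, so Person_1 never matches inside
        # Person_10. Unknown tokens are left unchanged.
        return re.sub(
            r"Person_\d+",
            lambda m: self._name_by_token.get(m.group(0), m.group(0)),
            text,
        )
```

Keying on the internal ID rather than the name keeps two relatives who share a name from collapsing into a single token.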
4.2 Date Truncation
All dates are truncated to year-only precision before transmission. Birth month, birth day, death month, and death day are stripped entirely. Only birth year and death year are sent, consistent with the Safe Harbor rule that every element of a date except the year must be removed.
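
A minimal sketch of year-only truncation follows. It assumes, purely for illustration, that a date may arrive as a free-form string containing a four-digit year; the helper `year_only` is hypothetical, and the real data model may instead store year, month, and day as separate fields.

```python
import re
from typing import Optional


def year_only(date_text: Optional[str]) -> Optional[int]:
    """Extract a standalone four-digit year; return None if there is none."""
    if not date_text:
        return None
    # The lookarounds reject four digits that are part of a longer digit run.
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", date_text)
    return int(match.group(1)) if match else None
```

Returning `None` when no year can be found means an unparseable date is omitted rather than passed through unchanged.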
4.3 Geographic Generalization
Geographic data is limited to state/province and country level. City names, street addresses, ZIP codes, birth places, death places, and burial places are never transmitted. Only the country code (ISO 3166-1) and state/province are sent when available.
4.4 Free-Text Anonymization
User-entered notes and free-text prompts are processed through a token-replacement system that substitutes any real names found in the text with their corresponding Person_N identifiers before transmission. This prevents accidental PHI disclosure through narrative text.
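
The token replacement described above might look like the following sketch. The function `anonymize_text` and the shape of its `tokens_by_name` argument are assumptions made for illustration. Note that this technique only catches names already present in the mapping.

```python
import re


def anonymize_text(text: str, tokens_by_name: dict[str, str]) -> str:
    """Replace each known name in text with its Person_N token."""
    # Longest names first, so "Maria Lopez" is replaced before "Maria".
    for name in sorted(tokens_by_name, key=len, reverse=True):
        if not name:
            continue  # an empty name would match at every word boundary
        token = tokens_by_name[name]
        # \b limits matches to whole words; matching ignores case.
        text = re.sub(
            r"\b" + re.escape(name) + r"\b",
            lambda _match, t=token: t,
            text,
            flags=re.IGNORECASE,
        )
    return text
```

For example, with `{"Maria Lopez": "Person_1", "Maria": "Person_1", "John": "Person_2"}`, the note "Maria Lopez raised john alone" becomes "Person_1 raised Person_2 alone", while a surname such as "Johnson" is left untouched because of the word-boundary check.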
4.5 Internal ID Obfuscation
Internal database identifiers (UUIDs) are replaced with short sequential tokens (n1, n2, n3, etc.) before transmission. This prevents any possibility of cross-referencing records via ID values.
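
The ID substitution can be sketched as follows. The record shapes are assumptions: nodes are taken to be dicts with an `id` key, and edges dicts with `source` and `target` keys. The helper `shorten_ids` is hypothetical.

```python
def shorten_ids(
    nodes: list[dict], edges: list[dict]
) -> tuple[list[dict], list[dict], dict[str, str]]:
    """Replace node UUIDs with n1, n2, ... in nodes and in edge endpoints.

    Returns new node and edge lists plus a short-token -> UUID map that
    stays local so responses can be mapped back. Inputs are not mutated.
    An edge that references an unknown node ID raises KeyError.
    """
    short_by_uuid = {node["id"]: f"n{i}" for i, node in enumerate(nodes, start=1)}
    out_nodes = [{**node, "id": short_by_uuid[node["id"]]} for node in nodes]
    out_edges = [
        {
            **edge,
            "source": short_by_uuid[edge["source"]],
            "target": short_by_uuid[edge["target"]],
        }
        for edge in edges
    ]
    uuid_by_short = {short: uuid for uuid, short in short_by_uuid.items()}
    return out_nodes, out_edges, uuid_by_short
```

Because the tokens are assigned by position in the current request, the same person can receive a different token in a later request, so tokens carry no stable meaning outside a single call.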
5. Audit Trail
Every AI API call generates an immutable audit log entry that records:
- What function was called (e.g., streamEditGenogramWithChat, streamGenogramInsights)
- What fields were sent (the safe field categories)
- What fields were redacted (the PHI field categories)
- De-identification confirmations: names anonymized, dates truncated, geography generalized
- Request metadata: prompt character count (not content), response time, success status
- Timestamp and user context
The audit log never stores the actual prompt content, AI responses, or any PHI. It only records metadata proving that de-identification was applied before each AI call.
Audit Log Schema
Table: ai_deid_audit_logs

| Column | Type | Description |
|---|---|---|
| id | auto-increment | |
| created_at | timestamp | |
| user_id | UUID, nullable | |
| ai_function | text | Which AI function was called |
| model | text | e.g., "gemini-2.5-flash" |
| ai_version | text | e.g., "8.0" |
| node_count | int | Number of people in genogram |
| edge_count | int | Number of relationships |
| fields_sent | text[] | Safe field categories sent |
| fields_redacted | text[] | PHI field categories redacted |
| names_anonymized | boolean | Always true |
| dates_truncated | boolean | Always true |
| geo_generalized | boolean | Always true |
| prompt_length | int | Character count only |
| response_time_ms | int | |
| was_successful | boolean | |
| user_agent | text | |
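
To make the metadata-only principle concrete, here is a minimal sketch of how an entry matching this schema could be assembled. The `AuditEntry` class and `build_audit_entry` helper are hypothetical. The `id` column is omitted on the assumption that the database assigns it, and the three boolean flags default to true only to mirror the schema's "always true" note.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    ai_function: str
    model: str
    ai_version: str
    node_count: int
    edge_count: int
    fields_sent: tuple
    fields_redacted: tuple
    prompt_length: int
    response_time_ms: int
    was_successful: bool
    user_id: Optional[str] = None
    user_agent: str = ""
    names_anonymized: bool = True
    dates_truncated: bool = True
    geo_generalized: bool = True
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def build_audit_entry(prompt: str, **metadata) -> dict:
    """Build a log row that records the prompt's length, never its content."""
    return asdict(AuditEntry(prompt_length=len(prompt), **metadata))
```

The prompt is reduced to a character count inside `build_audit_entry`, so the returned row has no field in which prompt text could be stored.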
6. Clinical Mode (Unlimited Plan)
For users on the Unlimited plan, GenogramAI offers a Clinical Mode that provides an additional layer of protection:
- All genogram data is encrypted locally using AES-256 encryption
- Data is stored only on the user's device — never synced to cloud storage
- AI features still apply the same de-identification before any AI calls
- This provides defense-in-depth for data at rest: genogram data stored on the device stays encrypted, independently of the de-identification applied to AI requests
7. Data Flow Summary
Step 1: User enters data or sends a request. Names, dates, locations, and notes are stored locally in the browser.
Step 2: De-identification applied. Names → Person_N tokens. Dates → year only. Geography → state/country only. IDs → short tokens. Notes → anonymized.
Step 3: Audit log recorded. An immutable entry is logged with proof of de-identification (field lists, boolean confirmations, metadata only).
Step 4: De-identified data sent to AI. Only safe fields (gender, birth year, occupation, etc.) are transmitted to Google Gemini.
Step 5: AI response re-identified locally. Person_N tokens in the response are mapped back to real names in the browser only.
8. Important Disclaimer
GenogramAI applies de-identification before transmission so that the data sent to AI services does not constitute PHI under HIPAA. De-identified data is not subject to HIPAA requirements per 45 CFR 164.514(a).
GenogramAI does not currently hold a Business Associate Agreement (BAA) with Google for Gemini API services. This is not required because properly de-identified data under Safe Harbor is not PHI. However, we continuously review and improve our de-identification processes to maintain the highest standards of patient privacy.
For users handling actual patient data in clinical settings, we recommend using Clinical Mode (Unlimited plan) for the additional protection of local AES-256 encryption with no cloud sync.
9. Contact
For questions about our de-identification practices or to report a privacy concern, contact us at support@genogramai.com.