This project provides a modular AI system that emulates a user's personality using psychometric modeling, memory streams, and narrative identity. It aims to serve as a personal assistant, memory companion, and expressive interface—all under user control.
For more detailed architecture, schema examples, ethical principles, and tutorials, visit the GitHub Wiki. If you'd prefer to ask questions or get into a philosophical argument, check in with the Digital Persona Helper Bot.
See the Mission Statement for the project’s guiding principles. New features and pull requests should be checked against this statement to ensure ethical use, user control, and safe handling of personality data.
A "Digital Persona" is an AI clone that mirrors your thinking style, goals, and values. It differs from generic chatbots by learning from your actual data—emails, notes, journals—to reflect your true voice and behavior. It can:
- Simulate realistic conversations in your voice
- Recall and reason over memories
- Assist with daily tasks while reflecting your preferences
- Offer psychologically grounded reflections using MBTI, Big Five, Dark Triad, and other validated models
Ethical Guardrails: Guided by four principles:
- Do No Harm
- Respect User Autonomy
- Integrity and Self-Protection
- Honest Identity
Directory Structure:
- `schema/context/` → JSON-LD context extensions
- `schema/schemas/` → JSON Schema files for trait and test validation plus interview results
- `schema/ontologies/` → Markdown definitions for traits and goals
- `schema/utils/` → Helper utilities for trait conversions
- `schema/tests/` → Unit tests for schema validation
- `src/digital_persona/` → Python package containing utilities
- `src/digital_persona/interview.py` → Interview assistant that derives personality traits from unstructured user data
- `src/frontend/` → Static HTML and CSS for the basic web interface
- `scripts/` → Helper scripts like `start-services.py` used by the devcontainer
- `docs/` → Research papers used as additional prompt context (available in the container, otherwise in the wiki)
Prompt Engineering: Prompts use structured memory, personality traits, and psychological insight to produce deeply personalized responses.
Semantic Standards: Includes trait schemas, tagging ontologies, and alignment with scientific models (MMPI, BFI, etc.).
Security & Privacy:
- Private vs. Public memory tagging
- User-controlled memory deletion and editing
- Output filters to prevent impersonation or data leaks
The project is designed for interactive local development using either OpenAI or Ollama models.
OpenAI Setup:
- Set `OPENAI_API_KEY` in your environment.
- Optionally set `OPENAI_MODEL` (e.g., `gpt-4o`).
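To sanity-check your OpenAI configuration before running the tools, a short script like the following works. This assumes the official `openai` Python package; it is a quick diagnostic, not necessarily how this project talks to the API internally.

```python
import os

from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment automatically.
client = OpenAI()
model = os.getenv("OPENAI_MODEL", "gpt-4o")

reply = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Reply with a single word: ready"}],
)
print(reply.choices[0].message.content)
```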
Ollama Setup:
- Install Ollama locally.
- Run a model (e.g., `ollama run llama3`).
- Set environment variables:

  ```bash
  export OLLAMA_HOST=http://localhost:11434
  export OLLAMA_MODEL=llama3
  ```
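To confirm the Ollama server is reachable with those variables set, you can hit its REST API directly (again a quick diagnostic, not part of this project's code):

```python
import os

import requests

host = os.getenv("OLLAMA_HOST", "http://localhost:11434")
model = os.getenv("OLLAMA_MODEL", "llama3")

# /api/generate is Ollama's standard completion endpoint; stream=False
# returns one JSON object instead of streamed chunks.
resp = requests.post(
    f"{host}/api/generate",
    json={"model": model, "prompt": "Say hello", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```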
Environment variables:
- `OPENAI_API_KEY` – API key for OpenAI models when using the `openai` provider.
- `OPENAI_MODEL` – optional model name (e.g., `gpt-4o`).
- `LLM_PROVIDER` – set to `ollama` or `openai` to override provider auto-detection.
- `OLLAMA_HOST` – base URL of your Ollama server (default `http://localhost:11434`).
- `OLLAMA_MODEL` – model name served by Ollama (e.g., `llama3`).
- `PERSONA_DIR` – directory where the API stores encrypted profile and memory files.
- `PERSONA_KEY` – optional symmetric key for encryption. If unset, a key is created in `<PERSONA_DIR>/.persona.key`.
- `PLAINTEXT_MEMORIES` – set to `true` to disable encryption during development.
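The exact provider auto-detection logic lives in the source, but a plausible sketch of the documented precedence (explicit `LLM_PROVIDER` first, then falling back based on which credentials are present) looks like this:

```python
import os

def detect_provider() -> str:
    """Illustrative only: LLM_PROVIDER overrides, otherwise infer from env."""
    explicit = os.getenv("LLM_PROVIDER")
    if explicit in ("openai", "ollama"):
        return explicit
    # An OpenAI key suggests the openai provider; otherwise assume a
    # local Ollama server at OLLAMA_HOST (default http://localhost:11434).
    return "openai" if os.getenv("OPENAI_API_KEY") else "ollama"
```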
Install Dependencies:
- Run `poetry install --with dev --extras media` to set up the project locally.
Run Dev CLI:
- Use the CLI directly or within the devcontainer: `digital-persona-interview data/my_notes.txt -p openai` or `-p ollama`.
- Add `--dry-run` to simulate answers from the model.
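If you prefer driving the CLI from a script or test harness, a thin `subprocess` wrapper is enough; the flags are exactly the ones shown above.

```python
import subprocess

# Dry-run interview against the Ollama provider; --dry-run makes the
# model simulate answers instead of prompting a human.
subprocess.run(
    ["digital-persona-interview", "data/my_notes.txt", "-p", "ollama", "--dry-run"],
    check=True,
)
```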
Devcontainer Notes:
- The container automatically runs `scripts/start-services.py` (via `poetry run` and `nohup`) so the API server and ingest loop keep running in the background.
- If they fail to start, run `~/.local/bin/poetry run python scripts/start-services.py >/tmp/services.log 2>&1 &`.
- Logs are written to `/tmp/uvicorn.log`, `/tmp/ingest.log`, and `/tmp/services.log`. The ingest loop prints a message each time it processes a file so you can watch that log to confirm activity.
- Copy `.devcontainer/.env.example` to `.devcontainer/.env` to provide your API keys and other settings. The container loads this file automatically via a Docker `--env-file` argument.
- Add your markdown files to `docs/` for inclusion in the runtime prompt context.
Run the Ingest Loop:
- Execute `digital-persona-ingest` to poll the `input` folder and convert new files into JSON memories (a stripped-down sketch follows the media notes below).
- Place any text, image, audio, or video files you want processed into `PERSONA_DIR/input` (defaults to `./persona/input`).
- Install optional media dependencies with `pip install -e .[media]` to enable image, audio, and video processing (the devcontainer installs them automatically).
- If you want local audio transcription, also install `pip install -e .[speech]` (or `poetry install --with speech`) and set `TRANSCRIBE_PROVIDER=whisper`.
- Ensure the `ffmpeg` binary is available on your PATH for video extraction (preinstalled in the devcontainer).
- After cloning the repo, run `git lfs install` so the sample media files are fetched correctly.
- Image files are detected automatically; EXIF metadata is stored and a short caption is generated so they can be used during interviews.
- HEIC/HEIF photos are converted to JPEG for captioning. The converter uses `pillow-heif` when available or falls back to `ffmpeg`.
- Image metadata may include GPS coordinates and the original timestamp if present in EXIF headers.
- Audio files are transcribed using the OpenAI API by default. Set `TRANSCRIBE_PROVIDER=whisper` to use a local Whisper model instead.
- Audio metadata captures duration, sample rate, and channel count when available.
- Video files are processed by extracting a preview frame and audio track. The frame is captioned and the audio is transcribed, summarized, and tagged with sentiment.
- Video metadata includes duration, resolution, and frame rate extracted via `ffprobe`.
- Captions, summaries, and sentiment default to Ollama models. Set `CAPTION_PROVIDER=openai` to use OpenAI APIs instead (or rely on automatic fallback when Ollama fails). Use `CAPTION_MODEL` to select the Ollama model and `OPENAI_MODEL` to choose the OpenAI model when that provider is used.
- Input text is sanitized to remove injection phrases, and HTML or JSON is converted to clean plain text before ActivityStreams memories are created.
- Non-text inputs are transcribed or captioned by the ingest loop so the interview script can reason over them.
- Files that fail to process are moved to `PERSONA_DIR/troubleshooting` for manual review.
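Conceptually, the ingest loop is a poller over `PERSONA_DIR/input`. A stripped-down sketch is below; the real `digital-persona-ingest` additionally sanitizes text, captions or transcribes media, encrypts output, and moves failures to `troubleshooting`.

```python
import json
import time
from pathlib import Path

input_dir = Path("persona/input")
memory_dir = Path("persona/memory")
memory_dir.mkdir(parents=True, exist_ok=True)

while True:
    for src in sorted(input_dir.glob("*.txt")):
        # Wrap the raw text in a minimal memory record and remove the input.
        memory = {"content": src.read_text()}
        (memory_dir / f"{src.stem}.json").write_text(json.dumps(memory))
        src.unlink()
        print(f"processed {src.name}")
    time.sleep(5)
```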
API Usage:
- The `/pending` and `/start_interview` endpoints operate on files in `PERSONA_DIR/memory` produced by the ingest loop.
- Each memory is a JSON object with a `content` field used for interview questions.
- The object also stores a relative `source` path to the processed original file so you can reference images or audio later.
- Non-text media should be ingested first so a text summary is available.
- Completed memories are moved to `PERSONA_DIR/archive` after `/complete_interview` so they won't be processed twice.
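During development with `PLAINTEXT_MEMORIES=true` (encryption disabled), you can inspect pending memories directly. The `content` and `source` fields are the ones documented above; anything else depends on the schema.

```python
import json
from pathlib import Path

memory_dir = Path("persona/memory")
for path in sorted(memory_dir.glob("*.json")):
    memory = json.loads(path.read_text())
    print(path.name)
    print("  content:", memory["content"][:60], "...")
    print("  source:", memory.get("source"))
```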
The `data/` folder contains example files for testing ingestion. Binary media files aren't stored in the repository. Generate them locally with `python scripts/generate_samples.py`:
- `my_notes.txt` – snippet of email, journal, and chat messages
- `sample_page.html` – short blog-style page describing a weekend hike
- `sample_data.json` – example daily schedule in JSON form
- `sample_image.jpg` – generated 10×10 sky-blue image
- `sample_audio.wav` – generated one-second sine wave
- `sample_video.mp4` – generated one-second red-square video with audio (uses AAC encoding for broad compatibility)
Copy any of these files into `PERSONA_DIR/input` and run the ingest loop to see how different media types are processed.
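For example, to queue the text sample for ingestion (assuming the default `PERSONA_DIR` of `./persona`):

```python
import shutil
from pathlib import Path

# Create the input folder if needed and drop in a sample file.
input_dir = Path("persona/input")
input_dir.mkdir(parents=True, exist_ok=True)
shutil.copy("data/my_notes.txt", input_dir / "my_notes.txt")
```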
The interview script analyzes text (emails, journal entries, etc.), generates reflection questions, and compiles a JSON profile.
```python
from digital_persona.interview import PersonalityInterviewer

data = """Email: I'm looking forward to the team retreat next month.
Journal: I've been worried about meeting deadlines but remain optimistic."""
interviewer = PersonalityInterviewer(num_questions=3)
questions = interviewer.generate_questions(data)
print("\n".join(questions))
```

Question sample:
Q: How do you manage approaching deadlines?
A: I set priorities and talk with the team.
Q: What steps do you take to resolve conflicts at work?
A: I try to consider everyone’s viewpoint.
Q: What hobbies help you relax?
A: Hiking and reading help me unwind.
JSON output:
```json
{
"unstructuredData": "Email: I'm looking forward to the team retreat next month.\nJournal: I've been worried about meeting deadlines but remain optimistic.",
"userID": "anon-1234",
"interview": [
{"question": "How do you manage approaching deadlines?", "answer": "I set priorities and talk with the team."},
{"question": "What steps do you take to resolve conflicts at work?", "answer": "I try to consider everyone's viewpoint."},
{"question": "What hobbies help you relax?", "answer": "Hiking and reading help me unwind."}
],
"traits": {
"openness": 0.63,
"conscientiousness": 0.72,
"extraversion": 0.55,
"agreeableness": 0.68,
"neuroticism": 0.40,
"honestyHumility": 0.59,
"emotionality": null
},
"darkTriad": {
"narcissism": null,
"machiavellianism": null,
"psychopathy": null
},
"mbti": {"mbti": null},
"mmpi": {
"hypochondriasis": null,
"depression": null,
"hysteria": null,
"psychopathicDeviate": null,
"masculinityFemininity": null,
"paranoia": null,
"psychasthenia": null,
"schizophrenia": null,
"hypomania": null,
"socialIntroversion": null
},
"goal": {"description": null, "status": null, "targetDate": null},
"value": {"valueName": null, "importance": null},
"narrative": {
"eventRef": null,
"narrativeTheme": null,
"significance": null,
"copingStyle": null
},
"psychologicalSummary": "Assigned openness=0.63 for your interest in new ideas, conscientiousness=0.72 because you plan tasks carefully, extraversion=0.55 since you enjoy team activities, agreeableness=0.68 due to collaborative comments, and neuroticism=0.40 reflecting only mild worry",
"timestamp": "2024-05-04T15:32:10Z"
}
```

The API stores memories and processed uploads in encrypted JSON files. When `digital_persona.api` starts up, it calls `secure_storage.get_fernet()` with `PERSONA_DIR` as the base directory. This loads a key from the `PERSONA_KEY` environment variable if set; otherwise a key is created or reused in `<PERSONA_DIR>/.persona.key`. Reads and writes of memory entries, completed output files, and processed uploads go through `save_json_encrypted()` and related helpers so data remains encrypted at rest. Old plain JSON files are still read correctly.
All files under `PERSONA_DIR/processed`, `PERSONA_DIR/output`, and `PERSONA_DIR/archive` are encrypted with the same Fernet key. Binary images, audio, and video are stored as encrypted bytes, while JSON memories use `save_json_encrypted()`. The API decrypts memory files on demand so the interview logic can still read them.
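For reference, the underlying pattern is standard `cryptography` Fernet; the helpers in `secure_storage` wrap something like the following (illustrative, not the module's exact code):

```python
import json

from cryptography.fernet import Fernet

# In the project the key comes from PERSONA_KEY or <PERSONA_DIR>/.persona.key;
# here we generate one purely for demonstration.
key = Fernet.generate_key()
fernet = Fernet(key)

record = {"content": "example memory"}
token = fernet.encrypt(json.dumps(record).encode())    # bytes stored on disk
restored = json.loads(fernet.decrypt(token).decode())  # decrypted on demand
assert restored == record
```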
The research notes that structured stores work best as a canonical source of truth with a vector index built for fast semantic lookups (see `docs/Memory-Architecture-in-Digital-Clones,-Generative-Agents,-and-Personal-AIs.md`). The API decrypts each memory on demand using the Fernet key and can cache embeddings locally to retrieve relevant entries efficiently. Both the JSON store and any search index should remain encrypted as advised in the security guidelines (see `docs/Ensuring-Safe,-Ethical,-and-Legal-Implementation-of-the-Digital-Persona-Project.md`).
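A local embedding cache for semantic lookups can be as simple as a matrix plus cosine similarity. This sketch assumes the embeddings have already been computed and decrypted into memory:

```python
import numpy as np

def top_k(query_vec: np.ndarray, embeddings: np.ndarray, k: int = 5) -> list[int]:
    """Return indices of the k memories most similar to the query vector."""
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_vec)
    scores = (embeddings @ query_vec) / np.clip(norms, 1e-9, None)
    return np.argsort(scores)[::-1][:k].tolist()
```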
You can temporarily decrypt a persona for debugging inside the devcontainer:
```bash
digital-persona-decrypt decrypted/
```

This command writes plaintext copies of your processed uploads and archived memories under `decrypted/` using `PERSONA_KEY` (or the key saved in `<PERSONA_DIR>/.persona.key`). Delete the folder when done to keep your data private.
MIT — see the LICENSE file for details.
Explore research and schema docs at the digital-persona Wiki. Markdown files in `docs/` (available inside the container) power the site and help explain schemas, interviews, and integrations.