Video summary
I Let AI Analyze 5 Years of My Journals… Here's what it found
Main summary
Key takeaways
Goal / Concept
The creator wants to understand themselves better by analyzing 5 years of personal journals using an AI workflow, rather than relying on memory (which they describe as biased).
The proposed approach:
- Convert handwritten journal pages into machine-readable text
- Run an LLM Q&A workflow over the extracted content to ask reflective questions
Input Formats
- Regular handwritten journal entries (not daily; cadence varies weekly/monthly)
- Sketch journaling (drawings plus short text)
Vision-Language Model (VLM) Approach
What they plan to do with VLMs
Use a vision-language model to:
- Read handwritten text (transcribe it)
- Interpret drawings and convert sketches/diagrams into textual descriptions
Why not “older OCR”
They contrast this with OCR, arguing:
- OCR historically performs best on typed text
- Handwriting generally requires modern deep learning–based vision models / VLMs rather than classic OCR
Privacy / Deployment Strategy
- They avoid using hosted commercial APIs for private journal data (they mention providers like OpenAI and Anthropic but reject them for privacy reasons).
- Instead, they run open-source models locally on a laptop so the journal content stays on-device.
- They use Ollama to download and run models locally.
- Model size is constrained by hardware; they note that smaller variants are needed to fit memory limits on an M1 MacBook Pro.
Implementation Pipeline
- Scan journal pages into images (total scan time: ~1 hour)
- Create a local Python project with a virtual environment
- Use a structured prompt to force consistent, parseable output:
- The model must return exactly two sections:
- “Transcription”
- Copy every word exactly
- Use “illegible” for unreadable words
- “Description”
- List drawings/sketches/diagrams
- Use “none” if there are no drawings
- “Transcription”
- The model must return exactly two sections:
- Save per-image results into
extracted_content.json - Concatenate extracted results into a single large context file for Q&A (e.g.,
journal context.txt)
Model Testing & Results (Key Analysis)
Attempt 1: LLaVA (large model)
- Worked poorly due to resource constraints
- The laptop crashed repeatedly
Attempt 2: LLaVA 53 (smaller/faster model)
- Produced hallucinations
- It could “make up” content, reducing trust (e.g., “Did I write that?”)
Attempt 3: Qwen 3 VL
- Stable and accurate
- Successfully transcribed and interpreted pages
Performance notes
- Roughly 2–3 minutes per image, depending on text density
- Handwriting quality declined over the years, but the chosen model still handled it reliably
Q&A Layer (LLM Over Extracted Journal Content)
- They build a terminal app in
ask_questions.pywith a streaming UI - They use a local Llama 3.2 model (open-source from Meta) for Q&A
System prompt behavior
- Answer using only the journal content
- If the answer isn’t present, explicitly say so
Context handling
- They increase the context window to reduce truncation
- Truncation would risk incomplete answers
Examples of Output Themes
-
Motivations
- Personal growth/self-improvement/learning
- Creativity/artistic expression
- Financial independence/freedom
- Helping others and positive impact
- Relationships and meaningful connections
-
Recurring struggles
- Overthinking / analysis paralysis
- Self-doubt / negative self-talk
- Balancing work and personal life
-
Ikigai-style purpose exercise (ikigai = framework for purpose)
- The model suggests directions such as:
- Teaching/coaching/writing/content creation
- Artistic sharing (potentially later in life)
- AI/data science consulting
- The model suggests directions such as:
Overall Takeaway
The creator finds the results reassuring and “weirdly refreshing” because:
- The model reflects patterns from the journal text itself
- There’s no emotional attachment
- It reduces the effects of memory editing or external opinions
They also provide the code in a GitHub repo (link mentioned in the video description).
Main Speakers / Sources
- Main speaker: The video creator (first-person narrator), who scanned journals and built the scripts
Technical sources / models mentioned
- Ollama (local model runner)
- VLMs: LLaVA (including “LLaVA 53”), Qwen 3 VL
- LLM for Q&A: Llama 3.2 (Meta; used locally)
- Optional hosted providers mentioned: OpenAI, Anthropic