a local-first, undetectable AI desktop assistant that listens to meetings and on-screen content, then provides real-time answers using Google's Gemini API. Built with Python and designed to be invisible to others in video calls.
- Screen Context Capture: OCR technology reads text from your screen (presentations, code, chat, etc.)
- Audio Transcription: Real-time speech-to-text using Whisper for meeting audio
- AI-Powered Responses: Google Gemini API provides context-aware answers
- Translucent Overlay UI: Always-on-top, draggable interface that never blocks your work
- ⌨Global Hotkeys: Quick access with customizable keyboard shortcuts
- Privacy-First: All processing happens locally except AI API calls
- Invisible to Others: Never joins meetings or appears in screen shares
The assistant appears as a translucent overlay in the top-right corner of your screen, providing instant AI-powered responses based on your current context.
- Python 3.9 or higher
- Windows 10/11 (primary support)
- Google Gemini API key (free tier available)
git clone <repository-url>
cd hintly
pip install -r requirements.txt
- Option A: Install Tesseract OCR
- Windows: Download from GitHub
- Add to PATH:
C:\Program Files\Tesseract-OCR
- Option B: Use screen-ocr (automatic fallback)
- Ensure your microphone is working
- For system audio capture on Windows, enable "Stereo Mix" in sound settings
- Get a free API key from Google AI Studio
- Set the environment variable:
Windows:
set GEMINI_API_KEY=your_api_key_here
Linux/Mac:
export GEMINI_API_KEY=your_api_key_here
python cluely_assistant.py
- Toggle Overlay:
Ctrl + Alt + C
(default) - Voice Trigger:
Ctrl + Alt + V
(optional) - Drag: Click and drag the overlay to reposition
- Minimize: Click the
−
button to hide - Close: Click the
×
button to exit
- Start the assistant - It will appear in the top-right corner
- Join a meeting - The assistant listens to audio and captures screen content
- Ask questions - Type queries about the meeting, presentation, or screen content
- Get instant answers - AI provides context-aware responses
- "What's the main topic of this meeting?"
- "Summarize the key points from the presentation"
- "What code is currently on my screen?"
- "What questions should I ask about this topic?"
- "Help me understand this technical discussion"
Edit config.py
to customize:
- Hotkeys: Change keyboard shortcuts
- UI Settings: Adjust overlay size, opacity, position
- Audio Settings: Modify recording duration, sample rate
- OCR Settings: Change capture intervals, confidence thresholds
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Screen Capture│ │ Audio Capture │ │ Gemini API │
│ (OCR) │ │ (Whisper) │ │ (AI) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
┌─────────────────┐
│ Main Assistant │
│ (Orchestrator)│
└─────────────────┘
│
┌─────────────────┐
│ Overlay UI │
│ (PyQt5) │
└─────────────────┘
- Local Processing: Screen capture and audio transcription happen locally
- Selective Sharing: Only your query and necessary context are sent to Gemini
- No Meeting Participation: The assistant never joins calls or appears in recordings
- Data Retention: Audio transcripts are kept only in memory and cleared regularly
-
"GEMINI_API_KEY not set"
- Ensure the environment variable is set correctly
- Restart your terminal after setting the variable
-
OCR not working
- Install Tesseract OCR or ensure screen-ocr is available
- Check that text is visible and not too small
-
Audio not capturing
- Check microphone permissions
- Enable "Stereo Mix" for system audio capture
- Ensure Whisper model downloaded successfully
-
Overlay not appearing
- Check if another application is blocking it
- Try the hotkey
Ctrl + Alt + C
- Restart the application
Check cluely_assistant.log
for detailed error information.
hintly/
├── cluely_assistant.py # Main application entry point
├── config.py # Configuration settings
├── screen_capture.py # OCR and screen capture
├── audio_capture.py # Audio recording and transcription
├── gemini_client.py # Google Gemini API integration
├── overlay_ui.py # PyQt5 overlay interface
├── hotkey_manager.py # Global hotkey handling
├── requirements.txt # Python dependencies
└── README.md # This file
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is open source. Please respect the privacy and security guidelines when using or modifying this software.
- Inspired by Cluely
- Built with Google Gemini API
- Uses OpenAI Whisper for speech recognition
- PyQt5 for the user interface
For issues and questions:
- Check the troubleshooting section
- Review the logs
- Open an issue on GitHub
Note: This is a local-first implementation designed for personal use. Always respect privacy and security best practices when using AI assistants.