PDF FileLoader ignores images — how to include them in RAG with Kotaemon? #736
Unanswered
ThomasESEO asked this question in Q&A
Replies: 0 comments
Hello,
I would like to know if it's possible to perform RAG (Retrieval-Augmented Generation) with Kotaemon on PDF files that include text, tables, and images.
I'm currently running everything locally and offline, using Ollama for both the LLM and the embedding model.
The issue I'm facing is that when I load a PDF containing images using the default FileLoader, only the text and tables are extracted—images are completely ignored.
I also tried running docling /path/to/my/doc.pdf outside of Kotaemon. The images are exported as base64-encoded data embedded in the output, but the models can't interpret them in that form.
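To make the base64 problem concrete, here is a minimal sketch of how those embedded images could be pulled back out of the converted document and prepared for a vision-capable model. It rests on several assumptions: that the docling output is Markdown with images inlined as `data:image/...;base64,...` URIs, that a multimodal model (the name `llava` is only a placeholder) is available in Ollama, and that the request body follows the shape of Ollama's `/api/generate` endpoint with its `images` field. The helper names are hypothetical, and none of this is part of Kotaemon's FileLoader.

```python
import base64
import re

# Matches Markdown images whose target is an inline data URI, e.g.
# ![Image](data:image/png;base64,iVBORw0...)
# Assumption: the converted document embeds images in this form.
DATA_URI_IMAGE = re.compile(
    r"!\[[^\]]*\]\(data:image/(?P<fmt>[A-Za-z0-9.+-]+);base64,"
    r"(?P<data>[A-Za-z0-9+/=\s]+)\)"
)


def extract_embedded_images(markdown: str) -> list[tuple[str, bytes]]:
    """Return (format, raw_bytes) for every inline base64 image found."""
    images = []
    for match in DATA_URI_IMAGE.finditer(markdown):
        # Drop any whitespace that line wrapping may have put in the payload.
        payload = re.sub(r"\s+", "", match.group("data"))
        images.append((match.group("fmt"), base64.b64decode(payload)))
    return images


def build_caption_request(image_bytes: bytes, model: str = "llava") -> dict:
    """Build a JSON body asking a vision model to describe one image.

    Assumption: the body is POSTed to a local Ollama server's
    /api/generate endpoint; the model name is a placeholder.
    """
    return {
        "model": model,
        "prompt": "Describe this image from a PDF document in detail.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
```

The description returned by the model could then replace the data URI in the text before indexing, so the image content becomes retrievable as ordinary text. Whether this is a good fit depends on how well the local vision model handles charts and diagrams.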
Do you have any solution or recommendation to include image content in the processing pipeline? I'd really appreciate any insight or direction.
Thanks in advance!