PDF FileLoader ignores images — how to include them in RAG with Kotaemon? #736

ThomasESEO · 2025-04-16T07:49:28Z

Hello,

I would like to know if it's possible to perform RAG (Retrieval-Augmented Generation) with Kotaemon on PDF files that include text, tables, and images.

I'm currently running everything locally and offline, using Ollama for both the LLM and the embedding model.

The issue I'm facing is that when I load a PDF containing images using the default FileLoader, only the text and tables are extracted—images are completely ignored.

I also tried running `docling /path/to/my/doc.pdf` outside of Kotaemon. The images do get extracted, but only as base64-encoded data, which the models cannot actually interpret.
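One possible workaround, as a minimal sketch rather than a Kotaemon feature, is to turn each embedded image into text before indexing. It assumes that docling's Markdown output embeds every picture inline as `![...](data:image/...;base64,...)` and that a vision-capable model is pulled in the local Ollama instance (`llava` here is an assumption). The sketch decodes the embedded images, asks the vision model to describe each one through Ollama's `/api/generate` endpoint (its `images` field takes base64 strings), and swaps the image for that description so the regular text embedding model can index it. The function names `extract_images`, `ollama_caption` and `replace_images_with_captions` are hypothetical.

```python
import base64
import json
import re
import urllib.request
from typing import Callable, List, Tuple

# Assumption: docling embeds images in its Markdown output as inline data URIs,
# e.g. ![Image](data:image/png;base64,iVBORw0...)
# Group 1 = MIME type, group 2 = base64 payload.
DATA_URI_IMAGE = re.compile(
    r"!\[[^\]]*\]\(data:(image/[A-Za-z0-9.+-]+);base64,([A-Za-z0-9+/=\s]+)\)"
)


def extract_images(markdown: str) -> List[Tuple[str, bytes]]:
    """Return (mime_type, raw_bytes) for every inline base64 image."""
    return [
        (m.group(1), base64.b64decode(m.group(2)))
        for m in DATA_URI_IMAGE.finditer(markdown)
    ]


def ollama_caption(b64_image: str, model: str = "llava",
                   host: str = "http://localhost:11434") -> str:
    """Ask a local Ollama vision model to describe one image.

    Assumption: a vision model (default "llava") is available in the local
    Ollama instance at the default port.
    """
    payload = {
        "model": model,
        "prompt": ("Describe this image from a document in detail, "
                   "including any visible text, numbers or chart values."),
        "images": [b64_image],
        "stream": False,  # return a single JSON object instead of a stream
    }
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def replace_images_with_captions(
        markdown: str,
        caption: Callable[[str], str] = ollama_caption) -> str:
    """Replace each inline image with a plain-text description of it."""
    def _sub(m: re.Match) -> str:
        b64 = re.sub(r"\s+", "", m.group(2))  # drop any line-wrapping whitespace
        return f"[Image description: {caption(b64).strip()}]"

    return DATA_URI_IMAGE.sub(_sub, markdown)
```

Usage would look like `replace_images_with_captions(open("doc.md").read())` on a Markdown file produced by docling; the resulting text-only file could then be uploaded like any other document. Captioning trades some fidelity (fine chart details can be lost) for compatibility with text-only embedding models.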

Do you have any solution or recommendation for including image content in the processing pipeline? I'd really appreciate any insight or direction.

Thanks in advance!

Replies: 0 comments

Select a reply

Uh oh!

PDF FileLoader ignores images — how to include them in RAG with Kotaemon? #736

Uh oh!

Uh oh!

ThomasESEO Apr 16, 2025

Replies: 0 comments

ThomasESEO
Apr 16, 2025