gangadharrr
Ported the Python file-encoding detection logic using jschardet so that the loader can detect a file's encoding, and tested it with a legacy UTF-16LE-encoded file for raw text. This PR fixes the raw-text representation issue in the TextLoader load() method, which creates documents in raw format.
Python helpers.py:
https://github.com/langchain-ai/langchain/blob/b075eab3e0af9a578af80c6e38f869419e770b5c/libs/community/langchain_community/document_loaders/helpers.py#L19
Python TextLoader (text.py):
https://github.com/langchain-ai/langchain/blob/b075eab3e0af9a578af80c6e38f869419e770b5c/libs/community/langchain_community/document_loaders/text.py#L13
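
For readers who want the general idea, here is a minimal sketch of detect-then-decode. It is not the code in this PR: the function names `detectEncodingFromBom` and `decodeWithDetection` are hypothetical, and a simple byte-order-mark check stands in for a statistical detector such as jschardet so that the example has no dependencies. It assumes a runtime with a global `TextDecoder` (modern Node.js or a browser).

```typescript
// Hypothetical helpers for illustration only; not the PR's implementation.
type DetectedEncoding = "utf-8" | "utf-16le";

// Stand-in for a real detector: looks only at the byte-order mark (BOM).
// A file without a BOM is assumed to be UTF-8, which a statistical
// detector would handle more carefully.
function detectEncodingFromBom(bytes: Uint8Array): DetectedEncoding {
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf-16le";
  }
  return "utf-8";
}

// Decode raw file bytes using the detected encoding instead of a
// hard-coded UTF-8. TextDecoder drops a leading BOM by default.
function decodeWithDetection(bytes: Uint8Array): string {
  const encoding = detectEncodingFromBom(bytes);
  return new TextDecoder(encoding).decode(bytes);
}

// Usage: a UTF-16LE buffer with BOM (FF FE) containing the text "hi".
const legacyBytes = new Uint8Array([0xff, 0xfe, 0x68, 0x00, 0x69, 0x00]);
const text = decodeWithDetection(legacyBytes);
```

Decoding the same bytes as UTF-8 would yield replacement characters and interleaved NULs, which is the kind of garbled raw text the linked issue describes.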

Fixes # (issue)

vercel bot commented Mar 30, 2025

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| langchainjs-docs | ❌ Failed (Inspect) | | 💬 Add feedback | Mar 30, 2025 5:31am |

1 Skipped Deployment

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| langchainjs-api-refs | ⬜️ Ignored (Inspect) | | | Mar 30, 2025 5:31am |

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Mar 30, 2025
@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Mar 30, 2025
@gangadharrr
Author

Hey team, I tried to fix the docs deployment in the ways I know, but I'm still not sure why the docs build is failing. LMK if any changes need to be done from my side. Happy Learning!

@christian-bromann
Member

@gangadharrr thanks for your contribution 🙏 Our team has been focused on LangChain v1 and unfortunately couldn't give PRs as much attention as they need. We are now making an effort to pick up where we left off. I will take this PR and discuss it with the team. Please stay tuned!

Development

Successfully merging this pull request may close these issues.

TextLoader is restricted to UTF-8 file encoding format and doesn't support dynamic encoding similar to Python Version