A command-line utility built in C++ that creates a searchable index from a collection of text files.
This project reads multiple source files, processes the text to build an in-memory inverted index, and then lets a user interactively search for words to list every document in which they appear.
- Multi-File Indexing: Processes any number of specified text files.
- Text Normalization: Cleans words by converting them to lowercase and removing all punctuation to ensure accurate matching.
- Efficient Indexing: Uses `std::map` to build an inverted index, mapping each unique word to a `std::set` of the files it appears in.
- Interactive Query Mode: After building the index, the program enters a loop where the user can perform multiple searches.
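
The normalization and indexing features above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `normalize`, `addDocument`, and `InvertedIndex` are hypothetical, and the sketch assumes words are separated by whitespace.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical alias: each word maps to the set of files that contain it.
using InvertedIndex = std::map<std::string, std::set<std::string>>;

// Hypothetical helper: strip punctuation, then convert to lowercase.
std::string normalize(std::string word) {
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               word.end());
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}

// Hypothetical helper: record every word of `text` under `docName`.
// Assumes whitespace-separated tokens.
void addDocument(InvertedIndex& index, const std::string& docName,
                 const std::string& text) {
    std::istringstream stream(text);
    std::string token;
    while (stream >> token) {
        std::string word = normalize(token);
        if (!word.empty()) {  // skip tokens that were pure punctuation
            index[word].insert(docName);
        }
    }
}
```

Because `std::set` ignores duplicate insertions, a word that occurs several times in one file still yields a single entry for that file.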
- STL Containers: Extensive use of `std::vector`, `std::map`, and `std::set` to manage data.
- File I/O: Reading and processing multiple files using `std::ifstream`.
- String Manipulation: Parsing text, cleaning words, and using helper functions with `std::string`.
- Modern C++: Use of range-based `for` loops, the `auto` keyword, and lambda functions in helper algorithms.
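
As an illustration of how these pieces might fit together, the sketch below reads a `std::vector` of filenames with `std::ifstream`, using a range-based `for` loop and `auto`. The function name `indexFiles` is hypothetical, and skipping unreadable files with a warning is an assumption about error handling. Word cleaning is left out to keep the example short.

```cpp
#include <fstream>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: read each file token by token and record which files
// contain each raw token. Normalization is intentionally omitted here.
std::map<std::string, std::set<std::string>>
indexFiles(const std::vector<std::string>& filenames) {
    std::map<std::string, std::set<std::string>> index;
    for (const auto& name : filenames) {
        std::ifstream input(name);
        if (!input) {
            // Assumption: an unreadable file is reported and skipped.
            std::cerr << "Warning: could not open " << name << '\n';
            continue;
        }
        std::string token;
        while (input >> token) {
            index[token].insert(name);
        }
    }
    return index;
}
```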
- A C++ compiler that supports C++11 or newer (e.g., g++, Clang, MSVC).
- Create Sample Files: Before running, create a few text files (e.g., `doc1.txt`, `doc2.txt`) in the same directory and add some text to them. Make sure the filenames match those in the `main.cpp` file.
- Compile the Program: Open a terminal in the project directory and compile the source code: `g++ main.cpp -o indexer`
- Run the Program: Execute the compiled program: `./indexer`
- Search: The program will build its index and then prompt you to enter a word to search for. Follow the on-screen instructions to perform queries.