A command-line web crawler built in JavaScript that analyzes the internal linking structure of websites.
- Crawls websites and analyzes internal links
- Generates a report showing the number of internal links to each page
- Handles both absolute and relative URLs
- Supports HTTP and HTTPS protocols
- Tracks progress with a running crawl count
- Interactive crawling that pauses for confirmation every 25 pages
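Handling both absolute and relative URLs typically comes down to normalizing every link to a common host-plus-path key before counting it. A minimal sketch of such a normalizer, using the WHATWG `URL` API (the helper name `normalizeURL` is an illustration, not necessarily the project's actual function):

```javascript
// Hypothetical helper: normalize absolute and relative URLs to a
// host + path key so links to the same page count together.
function normalizeURL(urlString, baseURL) {
  // The URL constructor resolves relative URLs against the base,
  // and leaves absolute URLs unchanged.
  const url = new URL(urlString, baseURL);
  let fullPath = `${url.hostname}${url.pathname}`;
  // Strip a trailing slash so /path and /path/ map to the same key.
  if (fullPath.endsWith('/')) {
    fullPath = fullPath.slice(0, -1);
  }
  return fullPath;
}
```

With this approach, `normalizeURL('/path/', 'https://example.com')` and `normalizeURL('https://example.com/path', ...)` both yield `example.com/path`.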
- Clone the repository:

  ```
  git clone https://github.com/aryan55254/web-crawler-cli.git
  ```

- Navigate to the project directory:

  ```
  cd web-crawler-cli
  ```

- Install dependencies:

  ```
  npm install
  ```
Run the crawler by providing a website URL as an argument:
```
npm start https://example.com
```
The crawler will:
- Start crawling from the provided URL
- Only crawl pages within the same domain
- Generate a report showing internal linking structure
- Ask for confirmation every 25 pages crawled
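The same-domain restriction can be expressed as a hostname comparison. A minimal sketch, assuming a helper named `isSameDomain` (the name is an illustration, not the project's actual API):

```javascript
// Hypothetical helper: only URLs whose hostname matches the starting
// URL's hostname are considered internal and eligible for crawling.
function isSameDomain(currentURL, baseURL) {
  return new URL(currentURL).hostname === new URL(baseURL).hostname;
}
```

Any link that fails this check is still counted if it appears on a crawled page, but it is never fetched.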
```
=========================================
REPORT
=========================================
Found 5 links to page : example.com/path2
Found 4 links to page : example.com/path3
Found 3 links to page : example.com
Found 2 links to page : example.com/path4
Found 1 links to page : example.com/path
=========================================
REPORT END
=========================================
```
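The report ordering above (highest inbound-link count first) can be produced by sorting the crawl results before printing. A sketch, assuming the crawler accumulates a plain object mapping normalized URL to link count (an assumption about its internal data shape):

```javascript
// Sort pages by inbound-link count, highest first.
function sortPages(pages) {
  return Object.entries(pages).sort((a, b) => b[1] - a[1]);
}

// Print the report in the format shown above.
function printReport(pages) {
  console.log('=========================================');
  console.log('REPORT');
  console.log('=========================================');
  for (const [page, count] of sortPages(pages)) {
    console.log(`Found ${count} links to page : ${page}`);
  }
  console.log('=========================================');
  console.log('REPORT END');
  console.log('=========================================');
}
```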
The project includes unit tests for core functionality. Run tests with:

```
npm test
```
This project is licensed under the MIT License - see the LICENSE file for details.