A web crawler CLI built in JavaScript. It crawls a website and reports the site's internal linking structure.

aryan55254/web-crawler-cli

Web Crawler CLI

A command-line web crawler built in JavaScript that analyzes the internal linking structure of websites.

Features

  • Crawls websites and analyzes internal links
  • Generates a report showing the number of internal links to each page
  • Handles both absolute and relative URLs
  • Supports HTTP and HTTPS protocols
  • Tracks progress with a running count of crawled pages
  • Crawls interactively, pausing for confirmation every 25 pages
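
As a rough illustration of how absolute and relative URLs can be handled uniformly, the sketch below uses the WHATWG `URL` class that ships with Node.js. The helper names `normalizeURL` and `resolveLink` are hypothetical and are not taken from this repository; the "host/path" key format is an assumption based on the example report.

```javascript
// Hypothetical helpers for illustration; the names are not from the repository.

// Reduce a URL to "host/path" so that http/https variants and trailing
// slashes map to the same key (assumed from the report format).
function normalizeURL(urlString) {
  const url = new URL(urlString);
  const hostPath = `${url.hostname}${url.pathname}`;
  return hostPath.endsWith('/') ? hostPath.slice(0, -1) : hostPath;
}

// Resolve an href (absolute or relative) against the page it was found on.
// Returns null when the result is not an HTTP or HTTPS URL, or cannot be parsed.
function resolveLink(href, baseURL) {
  try {
    const url = new URL(href, baseURL);
    const isWeb = url.protocol === 'http:' || url.protocol === 'https:';
    return isWeb ? url.href : null;
  } catch {
    return null;
  }
}

console.log(normalizeURL('https://example.com/path/'));
console.log(resolveLink('/path2', 'https://example.com/path'));
```

Passing the page URL as the second argument to `new URL` is what lets a relative href such as `/path2` resolve to a full address, while an absolute href simply ignores the base.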

Installation

  1. Clone the repository:
git clone https://github.com/aryan55254/web-crawler-cli.git
  2. Navigate to the project directory:
cd web-crawler-cli
  3. Install dependencies:
npm install

Usage

Run the crawler by providing a website URL as an argument:

npm start https://example.com

The crawler will:

  • Start crawling from the provided URL
  • Only crawl pages within the same domain
  • Generate a report showing internal linking structure
  • Ask for confirmation every 25 pages crawled
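
The behaviour listed above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the repository's actual code: `crawl`, `pageKey`, `getLinks`, and `confirmContinue` are hypothetical names, and `getLinks` stands in for whatever fetches a page and extracts its links as absolute URLs.

```javascript
// Hypothetical sketch; function and parameter names are illustrative only.

// "host/path" key without a trailing slash (assumed report format).
function pageKey(urlString) {
  const url = new URL(urlString);
  const key = `${url.hostname}${url.pathname}`;
  return key.endsWith('/') ? key.slice(0, -1) : key;
}

// getLinks(url): async function returning the absolute URLs linked from a page.
// confirmContinue(n): async function returning false to stop the crawl.
async function crawl(startURL, getLinks, options = {}) {
  const { pauseEvery = 25, confirmContinue = async () => true } = options;
  const startHost = new URL(startURL).hostname;
  const counts = {};          // page key -> number of internal links found to it
  const visited = new Set();  // page keys already crawled
  const queue = [startURL];

  while (queue.length > 0) {
    const current = queue.shift();
    const key = pageKey(current);
    if (visited.has(key)) continue;
    visited.add(key);

    for (const link of await getLinks(current)) {
      if (new URL(link).hostname !== startHost) continue; // skip other domains
      const linkKey = pageKey(link);
      counts[linkKey] = (counts[linkKey] || 0) + 1;
      if (!visited.has(linkKey)) queue.push(link);
    }

    // Ask whether to go on after every `pauseEvery` crawled pages.
    if (visited.size % pauseEvery === 0 && !(await confirmContinue(visited.size))) {
      break;
    }
  }
  return counts;
}
```

Injecting `getLinks` and `confirmContinue` keeps the traversal logic separate from network access and terminal prompts, which also makes it straightforward to exercise with an in-memory site map.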

Example Output

=========================================
                 REPORT                  
=========================================
Found 5 links to page : example.com/path2
Found 4 links to page : example.com/path3
Found 3 links to page : example.com
Found 2 links to page : example.com/path4
Found 1 links to page : example.com/path
=========================================
              REPORT END                 
=========================================
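
A report in this shape could be produced from a map of page keys to link counts by sorting in descending order and printing one line per page. The sketch below assumes such a map; `sortPages` and `formatReport` are hypothetical names, and the banner width is only an approximation of the example above.

```javascript
// Hypothetical helpers; names and banner width are illustrative assumptions.

// Turn { page: count } into [page, count] pairs, highest count first.
function sortPages(counts) {
  return Object.entries(counts).sort((a, b) => b[1] - a[1]);
}

// Build the report text, one "Found N links to page : X" line per page.
function formatReport(counts) {
  const rule = '='.repeat(41);
  const lines = [rule, '                 REPORT', rule];
  for (const [page, hits] of sortPages(counts)) {
    lines.push(`Found ${hits} links to page : ${page}`);
  }
  lines.push(rule, '              REPORT END', rule);
  return lines.join('\n');
}

console.log(formatReport({
  'example.com/path': 1,
  'example.com/path2': 5,
  'example.com': 3,
}));
```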

Running Tests

The project includes unit tests for core functionality. Run tests with:

npm test

Dependencies

  • jsdom - For parsing HTML and extracting links
  • jest - For running tests

License

This project is licensed under the MIT License - see the LICENSE file for details.
