A command-line web crawler built in JavaScript that analyzes the internal linking structure of websites.
- Crawls websites and analyzes internal links
- Generates a report showing the number of internal links to each page
- Handles both absolute and relative URLs
- Supports HTTP and HTTPS protocols
- Tracks progress with a running crawl count
- Interactive crawling that pauses for confirmation every 25 pages
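Handling both absolute and relative URLs typically comes down to normalizing every link to a common host-plus-path key before counting it. A minimal sketch of such a normalizer, using the WHATWG `URL` API (the helper name `normalizeURL` is an illustration, not necessarily the project's actual function):

```javascript
// Hypothetical helper: normalize absolute and relative URLs to a
// host + path key so links to the same page count together.
function normalizeURL(urlString, baseURL) {
  // The URL constructor resolves relative URLs against the base,
  // and leaves absolute URLs unchanged.
  const url = new URL(urlString, baseURL);
  let fullPath = `${url.hostname}${url.pathname}`;
  // Strip a trailing slash so /path and /path/ map to the same key.
  if (fullPath.endsWith('/')) {
    fullPath = fullPath.slice(0, -1);
  }
  return fullPath;
}
```

With this approach, `normalizeURL('/path/', 'https://example.com')` and `normalizeURL('https://example.com/path', ...)` both yield `example.com/path`.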
- Clone the repository:

  ```
  git clone https://github.com/aryan55254/web-crawler-cli.git
  ```

- Navigate to the project directory:

  ```
  cd web-crawler-cli
  ```

- Install dependencies:

  ```
  npm install
  ```
Run the crawler by providing a website URL as an argument:
```
npm start https://example.com
```
The crawler will:
- Start crawling from the provided URL
- Only crawl pages within the same domain
- Generate a report showing internal linking structure
- Ask for confirmation every 25 pages crawled
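The same-domain restriction can be expressed as a hostname comparison. A minimal sketch, assuming a helper named `isSameDomain` (the name is an illustration, not the project's actual API):

```javascript
// Hypothetical helper: only URLs whose hostname matches the starting
// URL's hostname are considered internal and eligible for crawling.
function isSameDomain(currentURL, baseURL) {
  return new URL(currentURL).hostname === new URL(baseURL).hostname;
}
```

Any link that fails this check is still counted if it appears on a crawled page, but it is never fetched.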
```
=========================================
REPORT
=========================================
Found 5 links to page : example.com/path2
Found 4 links to page : example.com/path3
Found 3 links to page : example.com
Found 2 links to page : example.com/path4
Found 1 links to page : example.com/path
=========================================
REPORT END
=========================================
```
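The report ordering above (highest inbound-link count first) can be produced by sorting the crawl results before printing. A sketch, assuming the crawler accumulates a plain object mapping normalized URL to link count (an assumption about its internal data shape):

```javascript
// Sort pages by inbound-link count, highest first.
function sortPages(pages) {
  return Object.entries(pages).sort((a, b) => b[1] - a[1]);
}

// Print the report in the format shown above.
function printReport(pages) {
  console.log('=========================================');
  console.log('REPORT');
  console.log('=========================================');
  for (const [page, count] of sortPages(pages)) {
    console.log(`Found ${count} links to page : ${page}`);
  }
  console.log('=========================================');
  console.log('REPORT END');
  console.log('=========================================');
}
```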
The project includes unit tests for core functionality. Run tests with:

```
npm test
```
This project is licensed under the MIT License - see the LICENSE file for details.