.halguru-webscraping.yaml

Represents the configuration settings for a website crawler or scraper. Defines parameters such as the name of the website, the starting URL, maximum allowed levels and pages, specific URL patterns to process, and connectors required for linking external components like LLMs and file systems.

StartUrl: https://www.example.com
MaxLevel: 5
MaxPages: 5
Pages: []
UrlsStartWith: []

Properties

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| StartUrl | Url | ✔️ | The starting URL for the website. |
| MaxLevel | Integer | ✔️ | The maximum level allowed for processing or operations in the website. |
| MaxPages | Integer | ✔️ | The maximum number of pages to process for the website. |
| Pages | List | ✔️ | The collection of website pages configuration. |
| UrlsStartWith | List | ✔️ | The collection of URL prefixes used to filter and process relevant URLs. |
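
To illustrate how these properties fit together, here is a minimal sketch of a configuration that limits processing to one section of a site. The URLs and limits are hypothetical values chosen for illustration. Treating `UrlsStartWith` as a prefix filter follows its description above; `Pages` is left empty because the structure of its items is not described in this section.

```yaml
# Hypothetical example values; adjust to the site being scraped.
StartUrl: https://www.example.com/docs/
MaxLevel: 3          # maximum level to process
MaxPages: 100        # maximum number of pages to process
Pages: []            # item structure is not documented in this section
UrlsStartWith:
  - https://www.example.com/docs/   # only URLs with this prefix are considered relevant
```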

Technical Information

| Property | Value |
| --- | --- |
| Path | .halguru-webscraping.yaml: |
| Internal Root Type | WebScrapingConfiguration |
| File Extension | .halguru-webscraping.yaml |
| JSON Schema | halguru-webscraping-schema.json |
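
Editors that use the YAML language server can check a file against a JSON Schema through a modeline comment on the first line. The sketch below assumes the schema file sits next to the configuration file; its actual location or URL is not stated here, so the relative path is a placeholder.

```yaml
# yaml-language-server: $schema=./halguru-webscraping-schema.json
# The schema path above is an assumption; point it at wherever the schema is published.
StartUrl: https://www.example.com
MaxLevel: 5
MaxPages: 5
Pages: []
UrlsStartWith: []
```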

Last updated: 2025-10-13
Autogenerated: Yes
AI powered: Yes
Core version: 1.66.0