.halguru-webscraping.yaml

Represents the configuration settings for a website crawler or scraper. Defines parameters such as the name of the website, the starting URL, maximum allowed levels and pages, specific URL patterns to process, and connectors required for linking external components like LLMs and file systems.

StartUrl: https://www.example.com
MaxLevel: 5
MaxPages: 5
Pages: []
UrlsStartWith: []

Field List

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| StartUrl | Url | ✔️ | The starting URL for the website. |
| MaxLevel | Integer | ✔️ | The maximum level allowed for processing or operations in the website. |
| MaxPages | Integer | ✔️ | The maximum number of pages to process for the website. |
| Pages[] | ObjectList | ✔️ | The collection of website pages configuration. |
| UrlsStartWith[] | SimpleList | ✔️ | The collection of URL prefixes used to filter and process relevant URLs. |
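
As an illustration of the fields above, the following sketch shows a configuration with `UrlsStartWith` populated. Only the field names and types come from the field list; every value is hypothetical, and the reading that the scraper processes only URLs beginning with one of the listed prefixes is an assumption based on the field description. `Pages` is left empty because the structure of its entries is not described on this page.

```yaml
# Hypothetical values for illustration; only the field names come from the field list.
StartUrl: https://www.example.com/docs/
MaxLevel: 2        # assumed to cap how many levels below StartUrl are processed
MaxPages: 50       # upper bound on the number of pages processed
Pages: []          # entry structure is not documented here, so left empty
UrlsStartWith:     # assumed to act as an allow-list of URL prefixes
  - https://www.example.com/docs/
  - https://www.example.com/blog/
```

All five fields are marked as required, so the empty list is kept for `Pages` rather than omitting the key.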

Technical Information

| Property | Value |
| --- | --- |
| Path | .halguru-webscraping.yaml: |
| Internal Root Type | WebScrapingHalGuru |
| File Extension | .halguru-webscraping.yaml |
| JSON Schema | halguru-webscraping-schema.json |
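
Since a JSON Schema is listed for this file type, editors built on the YAML language server can validate the file against it through a modeline comment. The sketch below assumes the schema file sits next to the configuration file. Where the schema is actually published is not stated on this page, so adjust the path or URL accordingly.

```yaml
# yaml-language-server: $schema=./halguru-webscraping-schema.json
# The schema location above is an assumption; point it at wherever the schema actually lives.
StartUrl: https://www.example.com
MaxLevel: 5
MaxPages: 5
Pages: []
UrlsStartWith: []
```

The modeline is an ordinary YAML comment, so standard YAML parsers ignore it.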

Last updated: 2026-03-19
Autogenerated: Yes
AI powered: Yes
Core version: 1.93.0