.halguru-webscraping.yaml
Represents the configuration settings for a website crawler or scraper. Defines parameters such as the name of the website, the starting URL, maximum allowed levels and pages, specific URL patterns to process, and connectors required for linking external components like LLMs and file systems.
StartUrl: https://www.example.com
MaxLevel: 5
MaxPages: 5
Pages: []
UrlsStartWith: []
Properties
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| StartUrl | Url | ✔️ | The URL at which crawling of the website starts. |
| MaxLevel | Integer | ✔️ | The maximum level (link depth) to process for the website. |
| MaxPages | Integer | ✔️ | The maximum number of pages to process for the website. |
| Pages | List | ✔️ | The collection of page configurations for the website. |
| UrlsStartWith | List | ✔️ | The collection of URL prefixes used to filter which URLs are processed. |
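
As an illustration of how the filter properties might be filled in, the fragment below restricts a crawl to two sections of a site. The prefix URLs and the limit values are assumptions chosen for the example, not defaults, and Pages is left empty because the shape of its entries is not described here.

```yaml
# Hypothetical values for illustration only.
StartUrl: https://www.example.com
MaxLevel: 2          # assumed: stop two levels away from StartUrl
MaxPages: 50         # assumed: stop after 50 pages
Pages: []            # entry structure not documented here, so left empty
UrlsStartWith:       # only URLs beginning with one of these prefixes are processed
  - https://www.example.com/docs/
  - https://www.example.com/blog/
```

Validating the file against halguru-webscraping-schema.json is probably the most reliable way to confirm which values are accepted.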
| Property | Value |
| --- | --- |
| Path | .halguru-webscraping.yaml: |
| Internal Root Type | WebScrapingConfiguration |
| File Extension | .halguru-webscraping.yaml |
| JSON Schema | halguru-webscraping-schema.json |
Last updated: 2025-10-13
Autogenerated: Yes
AI powered: Yes
Core version: 1.66.0