Pages
Pages:
-
Summary
Represents the configuration for a web page to be scraped. Its properties control which HTML and text content is included or excluded, and how the extracted data is filtered or processed.
Remarks
Use this class for web scraping: it extracts specific elements or content from web pages according to the configured parameters. Filtering, HTML normalization, and XPath queries are supported.
Properties
Parent models
Summary
- Path: .halguru-website.yaml: Pages[]:
- Internal type: WebsitePage
- Internal root type: WebsiteConfiguration
- JSON Schema for YAML: https://docs.hal.guru/schemas/halguru-website-schema.json
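The path `Pages[]` indicates that `Pages` is a top-level list in `.halguru-website.yaml`, with one entry per page. As a minimal sketch of what that shape implies, the Python snippet below reads the list from an already-parsed configuration and checks that every entry is a mapping. It is an illustration only, not part of the tool: the helper `get_pages`, the in-memory `config` dictionary, and the key `ExampleProperty` are all hypothetical, and no actual `WebsitePage` property names are assumed.

```python
from typing import Any


def get_pages(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the entries under the top-level `Pages` key (path `Pages[]`)."""
    # Treat a missing or empty `Pages` key as "no pages configured".
    pages = config.get("Pages") or []
    if not isinstance(pages, list):
        raise TypeError("`Pages` must be a list of page entries")
    for index, page in enumerate(pages):
        if not isinstance(page, dict):
            raise TypeError(f"Pages[{index}] must be a mapping")
    return pages


# Stand-in for the parsed contents of .halguru-website.yaml.
# `ExampleProperty` is a placeholder, not a documented WebsitePage property.
config = {"Pages": [{"ExampleProperty": "a"}, {"ExampleProperty": "b"}]}
pages = get_pages(config)
```

In practice the dictionary would come from a YAML parser, and the JSON Schema linked above would be the authoritative check on each entry's properties; this sketch only verifies the outer list-of-mappings shape.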