Pages[]

The collection of website page configurations.

Pages:
  - Name: Any text
    TagName: Any text
    Description: Any text
    IncludeHtml: true
    IncludeText: true
    NormalizeWhitespaces: true
    RemoveHtmlTags: true
    RemoveHtmlAttributes: true
    UrlContains: Any text
    ContainsText: Any text
    ContainsXpath: Any text
    ContentXPath: Any text
    DisabledTags: []
    Features: []
    Files: []

Properties

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Name | Text | ✔️ | The name of the web page or a specific element within the web page. |
| TagName | Text | ✔️ | The tag name of an HTML element or feature within a website page. |
| Description | Text | | Details or information about the web page or a specific element within the web page. |
| IncludeHtml | Boolean | ✔️ | Whether the HTML content of the web page is included in the output of the web scraping process. |
| IncludeText | Boolean | ✔️ | Whether the textual content of the web page is included in the output of the web scraping process. |
| NormalizeWhitespaces | Boolean | ✔️ | Whether whitespace is normalized in the text content extracted from an HTML node. |
| RemoveHtmlTags | Boolean | ✔️ | Whether HTML tags are removed from the content of a web page during processing. |
| RemoveHtmlAttributes | Boolean | ✔️ | Whether HTML attributes are removed during the web scraping process. |
| UrlContains | Text | | Filter: only web pages whose URL contains the specified substring are processed. |
| ContainsText | Text | | Filter: only web pages that contain the specified text are processed. |
| ContainsXpath | Text | | If defined, ContainsText is checked only within the content matched by this XPath. |
| ContentXPath | Text | | If not empty, only the HTML tags matched by this XPath are processed. |
| DisabledTags | List | ✔️ | List of HTML tags that are disabled or ignored during processing. |
| Features | List | ✔️ | Collection of features extracted from or associated with the web page, defining the elements or properties of interest within the page. |
| Files | List | ✔️ | Collection of files associated with the web page for processing or extraction. |
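
To make the filter properties more concrete, here is a minimal sketch of a single page entry. Every value in it (the page name, tag name, URL fragment, search text, and XPath expressions) is hypothetical and chosen only for illustration; the comments restate the property descriptions above rather than documenting confirmed runtime behavior. The list properties are left empty because their element types are not described here.

```yaml
# Illustrative values only: names, URL fragment, text, and XPath
# expressions are assumptions, not taken from a real configuration.
Pages:
  - Name: Product detail page
    TagName: product
    Description: Pages that describe a single product.
    IncludeHtml: false
    IncludeText: true
    NormalizeWhitespaces: true
    RemoveHtmlTags: true
    RemoveHtmlAttributes: true
    UrlContains: /products/          # process only URLs containing this substring
    ContainsText: Add to cart        # process only pages containing this text
    ContainsXpath: //main            # check ContainsText only within this XPath
    ContentXPath: //main//article    # take HTML tags only from this XPath
    DisabledTags: []
    Features: []
    Files: []
```

In this sketch, the three filters narrow the set of pages step by step, and ContentXPath then limits what is taken from each matching page. Whether the filters are combined exactly this way should be verified against the JSON schema and the tool's actual behavior.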

Technical Information

| Property | Value |
| --- | --- |
| Path | Pages[] |
| Internal Type | WebScrappingModels.WebScrapingPage |
| Internal Root Type | WebScrapingConfiguration |
| File Extension | .halguru-webscraping.yaml |
| JSON Schema | halguru-webscraping-schema.json |

Last updated: 2025-10-13
Autogenerated: Yes
AI powered: Yes
Core version: 1.66.0