Features[]

A list of features to extract from a specific web page. Each entry defines one element or property of interest within that page.

Pages:
  - Features:
      - Name: Any text
        TagName: Any text
        NameRegex: Any text
        ValueRegex: Any text
        NameXpath: Any text
        ValueXpath: Any text
        IncludeHtml: true
        IncludeText: true
        NormalizeWhitespaces: true
        RemoveHtmlTags: true
        RemoveHtmlAttributes: true

Field Information

| Name | Description |
| --- | --- |
| Title | Features |
| Field Type | ObjectList |
| Required | True |

Field List

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Name | Text | ✔️ | The name of the website feature. |
| TagName | Text | ✔️ | The tag name used to identify or categorize the website feature. |
| NameRegex | Text | | The regular expression that identifies the name component of a website feature. |
| ValueRegex | Text | | The regular expression used to extract value matches from the website's HTML content. |
| NameXpath | Text | | The XPath expression that locates the name of the feature within the website content. |
| ValueXpath | Text | | The XPath expression that locates and extracts the value of the feature within the website's HTML content. |
| IncludeHtml | Boolean | ✔️ | Whether the raw HTML of the feature is extracted and added to the feature's output. |
| IncludeText | Boolean | ✔️ | Whether the extracted plain text is added to the feature's output. |
| NormalizeWhitespaces | Boolean | ✔️ | When enabled, consecutive whitespace characters are collapsed into a single space. |
| RemoveHtmlTags | Boolean | ✔️ | Whether tags are stripped from the raw HTML content for plain-text processing. |
| RemoveHtmlAttributes | Boolean | ✔️ | Whether attributes are stripped from HTML elements in the extracted content. |
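
To make the required and optional columns above more concrete, the fragment below sketches two feature entries. It is only an illustration: the feature names, tag names, XPath and regular expression are all hypothetical, and any other fields that a Pages[] entry may require are omitted. How the scraper combines ValueXpath with ValueRegex is not stated on this page, so each entry uses at most one of them.

```yaml
Pages:
  - Features:
      # Hypothetical entry: value located with an XPath expression.
      - Name: ProductTitle            # assumed feature name
        TagName: product              # assumed tag/category label
        ValueXpath: "//h1[@class='product-title']"
        IncludeHtml: false
        IncludeText: true
        NormalizeWhitespaces: true
        RemoveHtmlTags: true
        RemoveHtmlAttributes: true
      # Hypothetical entry: value matched with a regular expression.
      - Name: Price                   # assumed feature name
        TagName: product              # assumed tag/category label
        ValueRegex: '\d+\.\d{2}'      # single quotes keep backslashes literal in YAML
        IncludeHtml: true
        IncludeText: true
        NormalizeWhitespaces: true
        RemoveHtmlTags: false
        RemoveHtmlAttributes: false
```

Both entries set every field marked as required, while the regex and XPath fields appear only where they are needed. Validating the file against halguru-webscraping-schema.json is the reliable way to confirm which combinations are actually accepted.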

Technical Information

| Property | Value |
| --- | --- |
| Path | Pages[].Features[] |
| Internal Type | WebScrappingModels.FeatureItem |
| Internal Root Type | WebScrapingHalGuru |
| File Extension | .halguru-webscraping.yaml |
| JSON Schema | halguru-webscraping-schema.json |

Last updated: 2026-03-19
Autogenerated: Yes
AI powered: Yes
Core version: 1.93.0