Features[]
A list of features to extract from a specific web page. Each entry defines one element or property of interest on the page and how it is located and cleaned.
Pages:
  Features:
    - Name: Any text
      TagName: Any text
      NameRegex: Any text
      ValueRegex: Any text
      NameXpath: Any text
      ValueXpath: Any text
      IncludeHtml: true
      IncludeText: true
      NormalizeWhitespaces: true
      RemoveHtmlTags: true
      RemoveHtmlAttributes: true
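As a rough illustration, the template above might be filled in as follows for a feature located by XPath. This is only a sketch: the feature name, tag, and XPath expression are hypothetical values chosen for the example, and how the boolean options interact with each other is not stated on this page.

```yaml
# Fragment of a .halguru-webscraping.yaml file; this list sits at Pages[].Features[].
# All values below are hypothetical and only illustrate the shape of an entry.
Features:
  - Name: ProductPrice            # required
    TagName: pricing              # required; used to categorize the feature
    ValueXpath: "//span[@class='price']"   # optional; assumed page markup
    IncludeHtml: false            # required
    IncludeText: true             # required
    NormalizeWhitespaces: true    # required
    RemoveHtmlTags: true          # required
    RemoveHtmlAttributes: true    # required
```

The optional properties (NameRegex, ValueRegex, NameXpath) are left out here because the table below marks only Name, TagName, and the five boolean options as required.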
Properties
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Name | Text | ✔️ | The name of the website feature. |
| TagName | Text | ✔️ | A tag used to identify or categorize the website feature. |
| NameRegex | Text | | Regular expression that identifies the name component of the feature. |
| ValueRegex | Text | | Regular expression that extracts the feature's value from the page's HTML content. |
| NameXpath | Text | | XPath expression that locates the feature's name within the page content. |
| ValueXpath | Text | | XPath expression that locates and extracts the feature's value within the page's HTML content. |
| IncludeHtml | Boolean | ✔️ | Whether the raw HTML of the feature is included in the feature's output. |
| IncludeText | Boolean | ✔️ | Whether the extracted plain text is included in the feature's output. |
| NormalizeWhitespaces | Boolean | ✔️ | When enabled, consecutive whitespace characters are collapsed into a single space. |
| RemoveHtmlTags | Boolean | ✔️ | Whether HTML tags are stripped from the content, leaving plain text. |
| RemoveHtmlAttributes | Boolean | ✔️ | Whether attributes are stripped from HTML elements in the extracted content. |
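The regex-based properties can be sketched the same way. The entry below is a hypothetical example that assumes the target page lists specifications as `<dt>`/`<dd>` pairs. The patterns are illustrative only, and this page does not say whether the tool uses the whole match or a capture group, so treat the parentheses as an assumption.

```yaml
# Hypothetical Features[] entry using the optional regex properties.
Features:
  - Name: Specification
    TagName: specs
    NameRegex: '<dt>([^<]+)</dt>'    # assumed markup; capture-group handling is not documented here
    ValueRegex: '<dd>([^<]+)</dd>'   # assumed markup
    IncludeHtml: true
    IncludeText: true
    NormalizeWhitespaces: true
    RemoveHtmlTags: false
    RemoveHtmlAttributes: true
```

Single-quoted YAML strings are used for the patterns so that any backslashes in a regular expression are passed through unchanged.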
| Property | Value |
| --- | --- |
| Path | Pages[].Features[] |
| Internal Type | WebScrappingModels.WebScrapingFeature |
| Internal Root Type | WebScrapingConfiguration |
| File Extension | .halguru-webscraping.yaml |
| JSON Schema | halguru-webscraping-schema.json |
Last updated: 2025-10-13
Autogenerated: Yes
AI powered: Yes
Core version: 1.66.0