Quick Information

Godot DOM Parser
codeWonderland

For parsing the DOM / HTML of webpages for use in your games / applications

Supported Engine Version: 4.6
Version String: 1.0.0
License Version: MIT
Support Level: community
Modified Date: 11 hours ago
GodotDOMParser

Fetch a URL, parse its HTML, and query the DOM with CSS-like selectors, all in pure GDScript. There are no native dependencies, so it works on every platform Godot supports.

Engine: Godot 4.2+
License: MIT
Status: 0.1.0 (usable; forgiving HTML parser, subset of CSS selectors).

Install

1. Copy the addons/godot_dom_parser/ folder into your project's addons/ directory. (Or install via the AssetLib tab in the editor.)
2. Open Project → Project Settings → Plugins and enable GodotDOMParser.

All public classes register their class_name globally, so you can use DOMParser, DOMDocument, DOMNode, HTMLParser, and CSSSelector from anywhere without preload.

Quick start

extends Node

func _ready() -> void:
    var parser := DOMParser.new()
    add_child(parser)

    var doc: DOMDocument = await parser.fetch("https://example.com")
    if doc == null:
        push_error("fetch failed")
        return

    print("Title: ", doc.get_title())

    for link in doc.query_selector_all("a[href]"):
        print(link.get_attribute("href"), " -> ", link.get_text_content())

Parsing a raw HTML string

var html := "<html><body><p class='hi'>hello <b>world</b></p></body></html>"
var doc := DOMParser.parse_html(html)
print(doc.query_selector("p.hi").get_text_content())  # "hello world"

API

`DOMParser` (Node)

| Member | Description |
| --- | --- |
| `fetch(url: String) -> DOMDocument` | Awaitable. GETs the URL and returns a parsed document, or `null` on error. |
| `static parse_html(html: String) -> DOMDocument` | Parse an HTML string directly. |
| `user_agent: String` | UA string sent with requests. |
| `extra_headers: PackedStringArray` | Extra request headers, `"Name: value"` format. |
| `timeout_seconds: float` | Request timeout. |
| `max_redirects: int` | Redirects to follow. |
| signal `document_loaded(document)` | Emitted after a successful fetch. |
| signal `fetch_failed(error, response_code)` | Emitted on network or HTTP error. |
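
The members above can be combined as in the following sketch. It uses only names from the table; the header value, user agent, and timeout are illustrative, and it assumes the options may be set before the node enters the tree. Since `fetch()` only returns `null` on failure, connecting `fetch_failed` is one way to see the error and response code.

```gdscript
extends Node

func _ready() -> void:
    var parser := DOMParser.new()
    # Request options from the member table; the values are placeholders.
    parser.user_agent = "MyGame/1.0"
    parser.extra_headers = PackedStringArray(["Accept-Language: en"])
    parser.timeout_seconds = 10.0
    parser.max_redirects = 3
    add_child(parser)

    # fetch() returns null on error; the signal carries the details.
    parser.fetch_failed.connect(_on_fetch_failed)
    parser.document_loaded.connect(_on_document_loaded)

    var doc: DOMDocument = await parser.fetch("https://example.com")
    if doc == null:
        return
    print("Fetched from: ", doc.source_url)

func _on_document_loaded(document) -> void:
    print("Loaded: ", document.get_title())

# Argument types are not documented in the table, so they are left untyped.
func _on_fetch_failed(error, response_code) -> void:
    push_error("fetch failed: error=%s, HTTP %s" % [error, response_code])
```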

`DOMDocument` (extends `DOMNode`)

| Member | Description |
| --- | --- |
| `source_url: String` | URL this document was fetched from (if any). |
| `raw_html: String` | The original HTML text. |
| `get_document_element()` | The `<html>` element (or first element child). |
| `get_head()` / `get_body()` | Convenience accessors. |
| `get_title() -> String` | Text of the `<title>` element. |

`DOMNode`

| Member | Description |
| --- | --- |
| `tag_name: String` | Lowercase tag (e.g. `"div"`). Empty for text/comment. |
| `attributes: Dictionary` | Attribute map (keys lowercased). |
| `children: Array[DOMNode]` | Child nodes. |
| `parent: DOMNode` | Parent (may be `null`). |
| `text: String` | Text content for text/comment nodes. |
| `is_element()` / `is_text()` / `is_void()` | Type predicates. |
| `get_attribute(name, default="")` | Read attribute. |
| `has_attribute(name)` / `set_attribute(name, value)` / `remove_attribute(name)` | Attribute CRUD. |
| `get_id()` / `get_classes()` / `has_class(cls)` | Shortcuts. |
| `get_text_content()` | Concatenated text of this node and descendants. |
| `get_inner_html()` / `get_outer_html()` | Serialize back to HTML. |
| `append_child(n)` / `remove_child(n)` / `remove()` | Tree mutation. |
| `get_element_by_id(id)` | First descendant element with that `id`. |
| `get_elements_by_tag_name(tag)` | All descendant elements with that tag (`"*"` for all). |
| `get_elements_by_class_name(cls)` | All descendant elements with that class. |
| `query_selector(sel)` | First descendant matching the selector. |
| `query_selector_all(sel)` | All descendants matching the selector. |
| `matches(sel)` | Does this node match the selector? |
| `walk()` / `walk_elements()` | Pre-order traversal helpers. |
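
As a rough illustration of the lookup and attribute helpers, the sketch below parses a small inline document and reads from it. The HTML is made up for the example. It assumes that `get_element_by_id()` and `query_selector()` return `null` when nothing matches, which the table does not state.

```gdscript
extends Node

func _ready() -> void:
    var html := "<html><body><ul id='menu'>" \
        + "<li class='item active'><a href='/a'>A</a></li>" \
        + "<li class='item'><a>B</a></li>" \
        + "</ul></body></html>"
    var doc := DOMParser.parse_html(html)

    # Assumption: a missing id yields null rather than an error.
    var menu: DOMNode = doc.get_element_by_id("menu")
    if menu == null:
        push_error("no #menu element")
        return

    for item in menu.get_elements_by_class_name("item"):
        var link = item.query_selector("a")
        if link == null:
            continue
        # The second argument is the documented default for a missing attribute.
        var href = link.get_attribute("href", "(no href)")
        print(link.get_text_content(), " -> ", href, " active=", item.has_class("active"))
```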

Supported CSS selectors

- Type / universal: `div`, `*`
- ID: `#main`
- Class: `.title`, `.a.b` (multiple)
- Attribute:
  - `[disabled]`: present
  - `[type="text"]`: exact
  - `[class~="hero"]`: whitespace-separated word
  - `[href^="https"]`: prefix
  - `[href$=".pdf"]`: suffix
  - `[href*="foo"]`: substring
  - `[lang|="en"]`: exact or "en-" prefix
- Combinators: descendant (space), child (`>`), adjacent sibling (`+`), general sibling (`~`)
- Selector lists: `a, b, c`
- Pseudo-classes: `:first-child`, `:last-child`, `:only-child`, `:first-of-type`, `:last-of-type`, `:not(<simple>)`

Examples:

doc.query_selector_all("article.post > h2 a[href^='https']")
doc.query_selector_all("ul.nav li:first-child")
doc.query_selector_all("p:not(.muted)")
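
To try the attribute operators without a network request, a sketch like the one below runs a few selectors against an inline document. The HTML is invented for the example, and the code assumes `query_selector_all()` returns an `Array` (so `size()` is available). Going by the operator descriptions above, each selector should match only the first link.

```gdscript
extends Node

func _ready() -> void:
    var html := "<html><body>" \
        + "<a href='https://example.com/report.pdf' lang='en-US'>Report</a>" \
        + "<a href='/local' class='muted'>Local</a>" \
        + "</body></html>"
    var doc := DOMParser.parse_html(html)

    var selectors := [
        "a[href^='https']",  # prefix match
        "a[href$='.pdf']",   # suffix match
        "a[lang|='en']",     # exact "en" or an "en-" prefix
        "a:not(.muted)",     # negation of a simple selector
    ]
    for sel in selectors:
        # Assumption: the result is an Array, so size() can be called on it.
        var found = doc.query_selector_all(sel)
        print(sel, " matched ", found.size(), " element(s)")
```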

Interacting with the DOM

The tree is fully mutable. Changes are reflected by get_outer_html().

var body := doc.get_body()
var new_p := DOMNode.create_element("p")
new_p.set_attribute("class", "added")
new_p.append_child(DOMNode.create_text("injected from Godot"))
body.append_child(new_p)

for node in doc.query_selector_all(".advert"):
    node.remove()

print(doc.get_outer_html())

Limitations

- Not a spec-compliant HTML5 parser. It's forgiving enough for typical pages (void elements, unquoted attributes, implicit `<p>`/`<li>` closing, raw-text for `<script>`/`<style>`), but edge cases in table foster-parenting, `<template>`, and malformed markup are handled heuristically.
- Entity decoding covers the numeric (`&#...;`, `&#x...;`) forms plus a small named-entity table. Uncommon named entities pass through as-is.
- Selectors do not (yet) support `:nth-child(...)`, namespaces, or case-insensitive attribute matching (`[attr=val i]`).
- JavaScript is not executed. If a page renders its content client-side, you'll only see the initial HTML.

Contributing

Bug reports and PRs welcome. If you hit HTML that parses incorrectly, a minimal reproducing snippet is the most useful thing you can send.
