For parsing the DOM / HTML of webpages for use in your games / applications
GodotDOMParser
Fetch a URL, parse its HTML, and query the DOM with CSS-like selectors, all in pure GDScript. No native dependencies; works on every platform Godot supports.
- Engine: Godot 4.2+
- License: MIT
- Status: 0.1.0 (usable; forgiving HTML parser, subset of CSS selectors)
Install
- Copy the `addons/godot_dom_parser/` folder into your project's `addons/` directory. (Or install via the AssetLib tab in the editor.)
- Open Project → Project Settings → Plugins and enable GodotDOMParser.
All public classes register their `class_name` globally, so you can use `DOMParser`, `DOMDocument`, `DOMNode`, `HTMLParser`, and `CSSSelector` from anywhere without `preload`.
Quick start
```gdscript
extends Node

func _ready() -> void:
    # DOMParser is a Node, so it must be in the tree before fetching.
    var parser := DOMParser.new()
    add_child(parser)
    var doc: DOMDocument = await parser.fetch("https://example.com")
    if doc == null:
        push_error("fetch failed")
        return
    print("Title: ", doc.get_title())
    for link in doc.query_selector_all("a[href]"):
        print(link.get_attribute("href"), " -> ", link.get_text_content())
```
Parsing a raw HTML string
```gdscript
var html := "<html><body><p class='hi'>hello <b>world</b></p></body></html>"
var doc := DOMParser.parse_html(html)
print(doc.query_selector("p.hi").get_text_content()) # "hello world"
```
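Because `parse_html` is static and takes a plain string, it can also be fed HTML that is already on disk. The sketch below is one possible way to do that, assuming the file is readable through Godot's `FileAccess`; the helper name `parse_file` and the example path are hypothetical and not part of the plugin.

```gdscript
# Hypothetical helper (not part of GodotDOMParser): read an HTML file and parse it.
func parse_file(path: String) -> DOMDocument:
    # FileAccess.get_file_as_string() returns an empty string if the file
    # cannot be opened, so an empty result is treated as a failure here.
    var html := FileAccess.get_file_as_string(path)
    if html.is_empty():
        push_error("Could not read %s (error %d)" % [path, FileAccess.get_open_error()])
        return null
    return DOMParser.parse_html(html)
```

A call such as `parse_file("res://pages/sample.html")` (path assumed for illustration) returns either a document or `null`, which mirrors the `null`-on-error convention of `fetch`. Note that a genuinely empty file is also reported as a failure by this sketch.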
API
DOMParser (Node)
| Member | Description |
|---|---|
| `fetch(url: String) -> DOMDocument` | Awaitable. GETs the URL and returns a parsed document, or `null` on error. |
| `static parse_html(html: String) -> DOMDocument` | Parse an HTML string directly. |
| `user_agent: String` | UA string sent with requests. |
| `extra_headers: PackedStringArray` | Extra request headers, `"Name: value"` format. |
| `timeout_seconds: float` | Request timeout. |
| `max_redirects: int` | Redirects to follow. |
| `signal document_loaded(document)` | Emitted after a successful fetch. |
| `signal fetch_failed(error, response_code)` | Emitted on network or HTTP error. |
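As a rough illustration of how these members might be combined, the following sketch configures a parser and reacts to its signals instead of checking the return value. The header value, user-agent string, and handler names are placeholders chosen for the example; the signal argument types are not documented above, so the handlers leave them untyped.

```gdscript
extends Node

func _ready() -> void:
    var parser := DOMParser.new()
    parser.user_agent = "MyGame/1.0"  # placeholder UA string
    parser.extra_headers = PackedStringArray(["Accept-Language: en"])  # "Name: value" format
    parser.timeout_seconds = 10.0
    parser.max_redirects = 3
    parser.document_loaded.connect(_on_document_loaded)
    parser.fetch_failed.connect(_on_fetch_failed)
    add_child(parser)
    # The return value is ignored here; results arrive through the signals.
    await parser.fetch("https://example.com")

# `document` is presumably a DOMDocument; left untyped because the
# signal signature above does not state a type.
func _on_document_loaded(document) -> void:
    print("Loaded: ", document.source_url)

func _on_fetch_failed(error, response_code) -> void:
    push_error("Fetch failed: error=%s, response_code=%s" % [error, response_code])
```

Using signals this way keeps error handling in one place when several fetches share the same parser node.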
DOMDocument (extends DOMNode)
| Member | Description |
|---|---|
| `source_url: String` | URL this document was fetched from (if any). |
| `raw_html: String` | The original HTML text. |
| `get_document_element()` | The `<html>` element (or first element child). |
| `get_head()` / `get_body()` | Convenience accessors. |
| `get_title() -> String` | Text of the `<title>` element. |
DOMNode
| Member | Description |
|---|---|
| `tag_name: String` | Lowercase tag (e.g. `"div"`). Empty for text/comment. |
| `attributes: Dictionary` | Attribute map (keys lowercased). |
| `children: Array[DOMNode]` | Child nodes. |
| `parent: DOMNode` | Parent (may be `null`). |
| `text: String` | Text content for text/comment nodes. |
| `is_element()` / `is_text()` / `is_void()` | Type predicates. |
| `get_attribute(name, default="")` | Read attribute. |
| `has_attribute(name)` / `set_attribute(name, value)` / `remove_attribute(name)` | Attribute CRUD. |
| `get_id()` / `get_classes()` / `has_class(cls)` | Shortcuts. |
| `get_text_content()` | Concatenated text of this node and descendants. |
| `get_inner_html()` / `get_outer_html()` | Serialize back to HTML. |
| `append_child(n)` / `remove_child(n)` / `remove()` | Tree mutation. |
| `get_element_by_id(id)` | First descendant element with that id. |
| `get_elements_by_tag_name(tag)` | All descendant elements with that tag (`"*"` for all). |
| `get_elements_by_class_name(cls)` | All descendant elements with that class. |
| `query_selector(sel)` | First descendant matching the selector. |
| `query_selector_all(sel)` | All descendants matching the selector. |
| `matches(sel)` | Does this node match the selector? |
| `walk()` / `walk_elements()` | Pre-order traversal helpers. |
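To show how the lookup and attribute accessors fit together, here is a small sketch that gathers every link under a node into plain dictionaries. The helper name `collect_links` is hypothetical, and the code assumes `get_elements_by_tag_name` returns something iterable (its return type is not stated in the table).

```gdscript
# Hypothetical helper: collect { href, text } pairs for all <a href> descendants.
func collect_links(root: DOMNode) -> Array[Dictionary]:
    var links: Array[Dictionary] = []
    for a in root.get_elements_by_tag_name("a"):
        # Skip anchors without an href (e.g. named anchors).
        if not a.has_attribute("href"):
            continue
        links.append({
            "href": a.get_attribute("href"),
            "text": a.get_text_content().strip_edges(),
        })
    return links
```

Since `DOMDocument` extends `DOMNode`, the same helper should work on a whole document or on any subtree, for example `collect_links(doc.get_body())`.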
Supported CSS selectors
- Type / universal: `div`, `*`
- ID: `#main`
- Class: `.title`, `.a.b` (multiple)
- Attribute:
  - `[disabled]`: present
  - `[type="text"]`: exact
  - `[class~="hero"]`: whitespace-separated word
  - `[href^="https"]`: prefix
  - `[href$=".pdf"]`: suffix
  - `[href*="foo"]`: substring
  - `[lang|="en"]`: exact or `"en-"` prefix
- Combinators: descendant (space), child (`>`), adjacent sibling (`+`), general sibling (`~`)
- Selector lists: `a, b, c`
- Pseudo-classes: `:first-child`, `:last-child`, `:only-child`, `:first-of-type`, `:last-of-type`, `:not(<simple>)`
Examples:
```gdscript
doc.query_selector_all("article.post > h2 a[href^='https']")
doc.query_selector_all("ul.nav li:first-child")
doc.query_selector_all("p:not(.muted)")
```
Interacting with the DOM
The tree is fully mutable. Changes are reflected by get_outer_html().
```gdscript
var body := doc.get_body()

# Build a new <p class="added"> element and append it to <body>.
var new_p := DOMNode.create_element("p")
new_p.set_attribute("class", "added")
new_p.append_child(DOMNode.create_text("injected from Godot"))
body.append_child(new_p)

# Detach every element with class "advert" from the tree.
for node in doc.query_selector_all(".advert"):
    node.remove()

print(doc.get_outer_html())
```
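Attribute mutation follows the same pattern. As a minimal sketch, the function below rewrites root-relative links into absolute ones; the helper name `absolutize_links` and the `base_url` argument are assumptions made for the example, and only links that start with a single `/` are handled.

```gdscript
# Hypothetical helper: turn href="/path" into href="<base_url>/path".
func absolutize_links(doc: DOMDocument, base_url: String) -> void:
    var base := base_url.trim_suffix("/")
    for a in doc.query_selector_all("a[href^='/']"):
        var href: String = a.get_attribute("href")
        # Leave protocol-relative URLs such as "//cdn.example.com/x" untouched.
        if href.begins_with("//"):
            continue
        a.set_attribute("href", base + href)
```

After calling `absolutize_links(doc, "https://example.com")`, serializing with `get_outer_html()` should show the rewritten `href` values.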
Limitations
- Not a spec-compliant HTML5 parser. It's forgiving enough for typical pages (void elements, unquoted attributes, implicit `<p>`/`<li>` closing, raw-text for `<script>`/`<style>`), but edge cases in table foster-parenting, `<template>`, and malformed markup are handled heuristically.
- Entity decoding covers the numeric (`&#...;`, `&#x...;`) forms plus a small named-entity table. Uncommon named entities pass through as-is.
- Selectors do not (yet) support `:nth-child(...)`, namespaces, or case-insensitive attribute matching (`[attr=val i]`).
- JavaScript is not executed. If a page renders its content client-side, you'll only see the initial HTML.
Contributing
Bug reports and PRs welcome. If you hit HTML that parses incorrectly, a minimal reproducing snippet is the most useful thing you can send.