Quick Information
For parsing the DOM of webpages for use in your games / applications
GodotDOMParser
Fetch a URL, parse its HTML, and query the DOM with CSS-like selectors, all in pure GDScript. No native dependencies; works on every platform Godot supports.
- Engine: Godot 4.2+
- License: MIT
- Status: 0.1.0. Usable, with a forgiving HTML parser and a subset of CSS selectors.
Install
- Copy the `addons/godot_dom_parser/` folder into your project's `addons/` directory. (Or install via the AssetLib tab in the editor.)
- Open Project → Project Settings → Plugins and enable GodotDOMParser.
All public classes register their class_name globally, so you can use
DOMParser, DOMDocument, DOMNode, HTMLParser, and CSSSelector from
anywhere without preload.
Quick start
extends Node

func _ready() -> void:
    var parser := DOMParser.new()
    add_child(parser)
    var doc: DOMDocument = await parser.fetch("https://example.com")
    if doc == null:
        push_error("fetch failed")
        return
    print("Title: ", doc.get_title())
    for link in doc.query_selector_all("a[href]"):
        print(link.get_attribute("href"), " -> ", link.get_text_content())
Parsing a raw HTML string
var html := "<html><body><p class='hi'>hello <b>world</b></p></body></html>"
var doc := DOMParser.parse_html(html)
print(doc.query_selector("p.hi").get_text_content()) # "hello world"
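The one-liner above calls get_text_content() directly on the result of query_selector(). A more defensive sketch follows, assuming that query_selector() returns null when nothing matches; the API table does not state this, so treat it as an assumption to verify against the addon source. The selector `p.missing` is a placeholder chosen so that nothing matches.

```gdscript
extends Node

func _ready() -> void:
    var doc := DOMParser.parse_html("<html><body><p class='hi'>hello</p></body></html>")
    # Assumption: query_selector() yields null when there is no match.
    var node: DOMNode = doc.query_selector("p.missing")
    if node == null:
        print("no element matched the selector")
    else:
        print(node.get_text_content())
```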
API
DOMParser (Node)
| Member | Description |
|---|---|
| `fetch(url: String) -> DOMDocument` | Awaitable. GETs the URL and returns a parsed document, or `null` on error. |
| `static parse_html(html: String) -> DOMDocument` | Parses an HTML string directly. |
| `user_agent: String` | User-agent string sent with requests. |
| `extra_headers: PackedStringArray` | Extra request headers, in `"Name: value"` format. |
| `timeout_seconds: float` | Request timeout, in seconds. |
| `max_redirects: int` | Maximum number of redirects to follow. |
| `signal document_loaded(document)` | Emitted after a successful fetch. |
| `signal fetch_failed(error, response_code)` | Emitted on a network or HTTP error. |
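As a rough sketch of how the properties and signals in the table might be combined: the user-agent string, header value, timeout, redirect limit, and handler names below are placeholders chosen for illustration. The arguments of fetch_failed are left untyped because the table does not specify their types.

```gdscript
extends Node

func _ready() -> void:
    var parser := DOMParser.new()
    # Placeholder values; adjust to your project.
    parser.user_agent = "MyGame/1.0"
    parser.extra_headers = PackedStringArray(["Accept-Language: en"])
    parser.timeout_seconds = 10.0
    parser.max_redirects = 3
    parser.document_loaded.connect(_on_document_loaded)
    parser.fetch_failed.connect(_on_fetch_failed)
    add_child(parser)
    # fetch() is awaitable; here the result is handled through the signals instead.
    await parser.fetch("https://example.com")

func _on_document_loaded(document: DOMDocument) -> void:
    print("Loaded: ", document.source_url)

# Argument types are not documented in the table, so they stay untyped here.
func _on_fetch_failed(error, response_code) -> void:
    push_error("Fetch failed: error=%s, response code=%s" % [error, response_code])
```

Whether to await the return value or react to the signals is a matter of taste; the quick start uses the return value, which keeps short scripts linear.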
DOMDocument (extends DOMNode)
| Member | Description |
|---|---|
| `source_url: String` | URL this document was fetched from (if any). |
| `raw_html: String` | The original HTML text. |
| `get_document_element()` | The `<html>` element (or the first element child). |
| `get_head()` / `get_body()` | Convenience accessors. |
| `get_title() -> String` | Text of the `<title>` element. |
DOMNode
| Member | Description |
|---|---|
| `tag_name: String` | Lowercase tag (e.g. `"div"`). Empty for text/comment nodes. |
| `attributes: Dictionary` | Attribute map (keys lowercased). |
| `children: Array[DOMNode]` | Child nodes. |
| `parent: DOMNode` | Parent (may be `null`). |
| `text: String` | Text content for text/comment nodes. |
| `is_element()` / `is_text()` / `is_void()` | Type predicates. |
| `get_attribute(name, default="")` | Read an attribute. |
| `has_attribute(name)` / `set_attribute(name, value)` / `remove_attribute(name)` | Attribute CRUD. |
| `get_id()` / `get_classes()` / `has_class(cls)` | Shortcuts. |
| `get_text_content()` | Concatenated text of this node and its descendants. |
| `get_inner_html()` / `get_outer_html()` | Serialize back to HTML. |
| `append_child(n)` / `remove_child(n)` / `remove()` | Tree mutation. |
| `get_element_by_id(id)` | First descendant element with that id. |
| `get_elements_by_tag_name(tag)` | All descendant elements with that tag (`"*"` for all). |
| `get_elements_by_class_name(cls)` | All descendant elements with that class. |
| `query_selector(sel)` | First descendant matching the selector. |
| `query_selector_all(sel)` | All descendants matching the selector. |
| `matches(sel)` | Whether this node matches the selector. |
| `walk()` / `walk_elements()` | Pre-order traversal helpers. |
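To illustrate the lookup and attribute helpers in the table, here is a small sketch that gathers the `src` of every image in a document. The function name `collect_image_sources` is hypothetical, and the code assumes get_elements_by_tag_name() returns something iterable of DOMNode, which the table implies but does not spell out.

```gdscript
# Hypothetical helper, not part of the addon.
func collect_image_sources(doc: DOMDocument) -> PackedStringArray:
    var sources := PackedStringArray()
    for img in doc.get_elements_by_tag_name("img"):
        # Skip <img> elements that carry no src attribute.
        if img.has_attribute("src"):
            sources.append(str(img.get_attribute("src")))
    return sources
```

The same result could likely be reached with `doc.query_selector_all("img[src]")`, since attribute-presence selectors are listed as supported.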
Supported CSS selectors
- Type / universal: `div`, `*`
- ID: `#main`
- Class: `.title`, `.a.b` (multiple)
- Attribute:
  - `[disabled]`: present
  - `[type="text"]`: exact
  - `[class~="hero"]`: whitespace-separated word
  - `[href^="https"]`: prefix
  - `[href$=".pdf"]`: suffix
  - `[href*="foo"]`: substring
  - `[lang|="en"]`: exact or `"en-"` prefix
- Combinators: descendant (space), child (`>`), adjacent sibling (`+`), general sibling (`~`)
- Selector lists: `a, b, c`
- Pseudo-classes: `:first-child`, `:last-child`, `:only-child`, `:first-of-type`, `:last-of-type`, `:not(<simple>)`
Examples:
doc.query_selector_all("article.post > h2 a[href^='https']")
doc.query_selector_all("ul.nav li:first-child")
doc.query_selector_all("p:not(.muted)")
Interacting with the DOM
The tree is fully mutable. Changes are reflected by get_outer_html().
var body := doc.get_body()
var new_p := DOMNode.create_element("p")
new_p.set_attribute("class", "added")
new_p.append_child(DOMNode.create_text("injected from Godot"))
body.append_child(new_p)
for node in doc.query_selector_all(".advert"):
    node.remove()
print(doc.get_outer_html())
Limitations
- Not a spec-compliant HTML5 parser. It's forgiving enough for typical pages (void elements, unquoted attributes, implicit `<p>`/`<li>` closing, raw text for `<script>`/`<style>`), but edge cases in table foster-parenting, `<template>`, and malformed markup are handled heuristically.
- Entity decoding covers the numeric (`&#...;`, `&#x...;`) forms plus a small named-entity table. Uncommon named entities pass through as-is.
- Selectors do not (yet) support `:nth-child(...)`, namespaces, or case-insensitive attribute matching (`[attr=val i]`).
- JavaScript is not executed. If a page renders its content client-side, you'll only see the initial HTML.
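Until `:nth-child(...)` is supported, one possible workaround is to count element children by hand. The sketch below uses only the `children` property and `is_element()` from the DOMNode table; the helper name `nth_element_child` is hypothetical and not part of the addon.

```gdscript
# Hypothetical helper: returns the n-th (1-based) element child of `parent`,
# or null if there are fewer than n element children.
func nth_element_child(parent: DOMNode, n: int) -> DOMNode:
    var index := 0
    for child in parent.children:
        # Text and comment nodes do not count towards the position.
        if child.is_element():
            index += 1
            if index == n:
                return child
    return null
```

For example, `nth_element_child(doc.query_selector("ul.nav"), 2)` would stand in for `ul.nav > :nth-child(2)`, provided the `ul.nav` lookup itself found an element.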
Contributing
Bug reports and PRs welcome. If you hit HTML that parses incorrectly, a minimal reproducing snippet is the most useful thing you can send.