Godot DOM Parser

An asset by codeWonderland

Quick Information


For parsing the DOM / HTML of webpages for use in your games / applications

Supported Engine Version: 4.6
Version String: 1.0.0
License Version: MIT
Support Level: community
Modified Date: 11 hours ago

GodotDOMParser

Fetch a URL, parse its HTML, and query the DOM with CSS-like selectors, all in pure GDScript. No native dependencies; works on every platform Godot supports.

  • Engine: Godot 4.2+
  • License: MIT
  • Status: 0.1.0 (usable; forgiving HTML parser; supports a subset of CSS selectors).

Install

  1. Copy the addons/godot_dom_parser/ folder into your project's addons/ directory. (Or install via the AssetLib tab in the editor.)
  2. Open Project → Project Settings → Plugins and enable GodotDOMParser.

All public classes register their class_name globally, so DOMParser, DOMDocument, DOMNode, HTMLParser, and CSSSelector can be used from any script without a preload().

Quick start

extends Node

func _ready() -> void:
    var parser := DOMParser.new()
    add_child(parser)

    var doc: DOMDocument = await parser.fetch("https://example.com")
    if doc == null:
        push_error("fetch failed")
        return

    print("Title: ", doc.get_title())

    for link in doc.query_selector_all("a[href]"):
        print(link.get_attribute("href"), " -> ", link.get_text_content())

Parsing a raw HTML string

var html := "<html><body><p class='hi'>hello <b>world</b></p></body></html>"
var doc := DOMParser.parse_html(html)
print(doc.query_selector("p.hi").get_text_content())  # "hello world"
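Because parse_html() needs no network access, it is a convenient way to pull structured data out of bundled or cached HTML. The sketch below assumes made-up markup in which each <li class="item"> carries a hypothetical data-price attribute; extract_prices() is an illustrative helper, not part of the addon.

```gdscript
# Hypothetical helper: maps item labels to prices from sample markup.
# Items without a data-price attribute are skipped by the selector itself.
static func extract_prices(html: String) -> Dictionary:
    var doc := DOMParser.parse_html(html)
    var prices := {}
    for item in doc.query_selector_all("li.item[data-price]"):
        # get_text_content() keeps whitespace from the markup, so trim it.
        var label: String = item.get_text_content().strip_edges()
        prices[label] = item.get_attribute("data-price").to_float()
    return prices
```

For example, extract_prices("<ul><li class='item' data-price='2.5'>Sword</li><li class='item'>Shield</li></ul>") should yield a single entry for "Sword".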

API

DOMParser (Node)

Member | Description
fetch(url: String) -> DOMDocument | Awaitable. Sends a GET request to url and returns the parsed document, or null on error.
static parse_html(html: String) -> DOMDocument | Parses an HTML string directly, without a network request.
user_agent: String | User-Agent string sent with requests.
extra_headers: PackedStringArray | Extra request headers, each in "Name: value" format.
timeout_seconds: float | Request timeout, in seconds.
max_redirects: int | Maximum number of redirects to follow.
signal document_loaded(document) | Emitted after a successful fetch.
signal fetch_failed(error, response_code) | Emitted on a network or HTTP error.
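The properties and signals above can be combined so that a fetch is configured up front and its outcome handled in callbacks rather than by checking for null. This is a minimal sketch: the User-Agent value, header, and URL are placeholders, and the signal argument types are left untyped because the table does not state them.

```gdscript
extends Node

func _ready() -> void:
    var parser := DOMParser.new()
    # Placeholder request settings; adjust for the site you are fetching.
    parser.user_agent = "MyGame/1.0"
    parser.extra_headers = PackedStringArray(["Accept-Language: en"])
    parser.timeout_seconds = 10.0
    parser.max_redirects = 5
    # Connect before fetching so neither signal is missed.
    parser.document_loaded.connect(_on_document_loaded)
    parser.fetch_failed.connect(_on_fetch_failed)
    add_child(parser)
    await parser.fetch("https://example.com")

func _on_document_loaded(document) -> void:
    print("Loaded ", document.source_url, ": ", document.get_title())

func _on_fetch_failed(error, response_code) -> void:
    push_warning("Fetch failed: error=%s, HTTP %s" % [error, response_code])
```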

DOMDocument (extends DOMNode)

Member | Description
source_url: String | URL the document was fetched from, if any.
raw_html: String | The original HTML text.
get_document_element() | The <html> element (or the first element child).
get_head() / get_body() | Convenience accessors for <head> and <body>.
get_title() -> String | Text of the <title> element.
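A typical use of these accessors is summarising a page, for example for a link preview. The sketch below reads the title and the <meta name="description"> content; describe() is a hypothetical helper, and the null check on get_head() is defensive because the table does not say what it returns when a page has no <head>.

```gdscript
# Hypothetical helper: "title | meta description" for a parsed document.
static func describe(doc: DOMDocument) -> String:
    var description := ""
    var head = doc.get_head()  # untyped: no return type is documented
    if head != null:
        var meta = head.query_selector("meta[name='description']")
        if meta != null:
            description = meta.get_attribute("content")
    return "%s | %s" % [doc.get_title(), description]
```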

DOMNode

Member | Description
tag_name: String | Lowercase tag name (e.g. "div"); empty for text and comment nodes.
attributes: Dictionary | Attribute map (keys are lowercased).
children: Array[DOMNode] | Child nodes.
parent: DOMNode | Parent node (may be null).
text: String | Text content of text and comment nodes.
is_element() / is_text() / is_void() | Node-type predicates.
get_attribute(name, default="") | Returns the attribute's value, or default if it is absent.
has_attribute(name) / set_attribute(name, value) / remove_attribute(name) | Check for, set, or remove an attribute.
get_id() / get_classes() / has_class(cls) | Shortcuts for the id and class attributes.
get_text_content() | Concatenated text of this node and all its descendants.
get_inner_html() / get_outer_html() | Serialize the node's contents (inner) or the node itself (outer) back to HTML.
append_child(n) / remove_child(n) / remove() | Tree mutation.
get_element_by_id(id) | First descendant element with that id.
get_elements_by_tag_name(tag) | All descendant elements with that tag ("*" for all).
get_elements_by_class_name(cls) | All descendant elements with that class.
query_selector(sel) | First descendant matching the selector.
query_selector_all(sel) | All descendants matching the selector.
matches(sel) | Returns true if this node matches the selector.
walk() / walk_elements() | Pre-order traversal helpers.
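As an example of combining these members, the sketch below builds an indented outline of a page's h1 to h3 headings. It assumes get_elements_by_tag_name("*") returns elements in document (pre-order) order, which the table implies but does not state; outline() is a hypothetical helper.

```gdscript
# Hypothetical helper: one line per h1/h2/h3, indented by heading level.
static func outline(doc: DOMDocument) -> PackedStringArray:
    var lines := PackedStringArray()
    for el in doc.get_elements_by_tag_name("*"):
        if el.tag_name in ["h1", "h2", "h3"]:
            # "h2" -> 2; indent two spaces per level below h1.
            var level := int(el.tag_name.substr(1))
            lines.append("  ".repeat(level - 1) + el.get_text_content().strip_edges())
    return lines
```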

Supported CSS selectors

  • Type / universal: div, *
  • ID: #main
  • Class: .title, .a.b (multiple classes)
  • Attribute:
    • [disabled]: attribute is present
    • [type="text"]: exact match
    • [class~="hero"]: whitespace-separated word
    • [href^="https"]: prefix
    • [href$=".pdf"]: suffix
    • [href*="foo"]: substring
    • [lang|="en"]: exactly "en", or starting with "en-"
  • Combinators: descendant (space), child (>), adjacent sibling (+), general sibling (~)
  • Selector lists: a, b, c
  • Pseudo-classes: :first-child, :last-child, :only-child, :first-of-type, :last-of-type, :not(<simple selector>)

Examples:

doc.query_selector_all("article.post > h2 a[href^='https']")
doc.query_selector_all("ul.nav li:first-child")
doc.query_selector_all("p:not(.muted)")
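Selectors only search downward, and the API table lists no closest() method. When you already hold a node and need its nearest matching ancestor, matches() plus the parent link is enough. The sketch below is a hypothetical helper that, like the browser method of the same name, also tests the starting node itself.

```gdscript
# Hypothetical helper: nearest element (starting with node itself) that
# matches selector, or null if no ancestor matches.
static func closest(node: DOMNode, selector: String) -> DOMNode:
    var current: DOMNode = node
    while current != null:
        # Skip non-element nodes such as the document itself.
        if current.is_element() and current.matches(selector):
            return current
        current = current.parent
    return null
```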

Interacting with the DOM

The tree is fully mutable. Changes are reflected by get_outer_html().

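var body: DOMNode = doc.get_body()

# Build <p class="added">injected from Godot</p> and append it to <body>.
var new_p: DOMNode = DOMNode.create_element("p")
new_p.set_attribute("class", "added")
new_p.append_child(DOMNode.create_text("injected from Godot"))
body.append_child(new_p)

# Remove every element with the "advert" class.
for node in doc.query_selector_all(".advert"):
    node.remove()

# Serialize the modified tree back to HTML.
print(doc.get_outer_html())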
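The same mutation calls can tidy fetched HTML before you display or store it. The sketch below drops <script> and <style> elements and strips inline on* event-handler attributes. It is a rough cleanup, not a security sanitizer, and cleanup() is a hypothetical helper.

```gdscript
# Hypothetical helper: remove scripts, styles, and inline event handlers.
static func cleanup(doc: DOMDocument) -> void:
    for node in doc.query_selector_all("script, style"):
        node.remove()
    for el in doc.get_elements_by_tag_name("*"):
        # keys() returns a copy, so removing attributes while looping is safe.
        # Attribute keys are lowercased, so "onclick" and "onClick" both match.
        for attr in el.attributes.keys():
            if attr.begins_with("on"):
                el.remove_attribute(attr)
```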

Limitations

  • Not a spec-compliant HTML5 parser. It's forgiving enough for typical pages (void elements, unquoted attributes, implicit <p>/<li> closing, raw-text for <script>/<style>), but edge cases in table foster-parenting, <template>, and malformed markup are handled heuristically.
  • Entity decoding covers the numeric (&#...;, &#x...;) forms plus a small named-entity table. Uncommon named entities pass through as-is.
  • Selectors do not (yet) support :nth-child(...), namespaces, or the case-insensitive attribute flag ([attr=val i]).
  • JavaScript is not executed. If a page renders its content client-side, you'll only see the initial HTML.
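Until :nth-child(...) is supported, the simple integer form can be emulated by counting element children directly. This is a minimal sketch; nth_child() is a hypothetical helper that handles only a plain 1-based index, not An+B expressions.

```gdscript
# Hypothetical helper: the n-th element child (1-based) of parent, or null.
# Text and comment nodes are skipped, matching :nth-child(n) semantics.
static func nth_child(parent: DOMNode, n: int) -> DOMNode:
    var index := 0
    for child in parent.children:
        if child.is_element():
            index += 1
            if index == n:
                return child
    return null
```

For instance, nth_child(row, 2) would stand in for "tr > td:nth-child(2)" on a single table row.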

Contributing

Bug reports and PRs welcome. If you hit HTML that parses incorrectly, a minimal reproducing snippet is the most useful thing you can send.
