
Godot DOM Parser

An asset by codeWonderland

Quick Information


For parsing the DOM of webpages for use in your games / applications

Supported Engine Version: 4.6
Version String: 1.0.0
License Version: MIT
Support Level: community
Modified Date: 8 hours ago

GodotDOMParser

Fetch a URL, parse its HTML, and query the DOM with CSS-like selectors, all in pure GDScript. No native dependencies; works on every platform Godot supports.

  • Engine: Godot 4.2+
  • License: MIT
  • Status: 0.1.0, usable. Forgiving HTML parser, subset of CSS selectors.

Install

  1. Copy the addons/godot_dom_parser/ folder into your project's addons/ directory. (Or install via the AssetLib tab in the editor.)
  2. Open Project → Project Settings → Plugins and enable GodotDOMParser.

All public classes register their class_name globally, so you can use DOMParser, DOMDocument, DOMNode, HTMLParser, and CSSSelector from anywhere without a preload().

Quick start

extends Node

func _ready() -> void:
    var parser := DOMParser.new()
    add_child(parser)

    var doc: DOMDocument = await parser.fetch("https://example.com")
    if doc == null:
        push_error("fetch failed")
        return

    print("Title: ", doc.get_title())

    for link in doc.query_selector_all("a[href]"):
        print(link.get_attribute("href"), " -> ", link.get_text_content())

Parsing a raw HTML string

var html := "<html><body><p class='hi'>hello <b>world</b></p></body></html>"
var doc := DOMParser.parse_html(html)
print(doc.query_selector("p.hi").get_text_content())  # "hello world"
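The same parse_html() entry point also works for HTML that ships with the game or is cached on disk. A minimal sketch, assuming the file is plain UTF-8 text; load_local_document() is a hypothetical helper, not part of the addon:

```gdscript
# Hypothetical helper: reads an HTML file from disk and parses it.
# Returns null if the file doesn't exist.
func load_local_document(path: String) -> DOMDocument:
    if not FileAccess.file_exists(path):
        push_error("HTML file not found: " + path)
        return null
    var html := FileAccess.get_file_as_string(path)
    return DOMParser.parse_html(html)
```

In exported projects, non-resource files such as .html under res:// are only packaged if they match the export preset's non-resource file filter; files written to user:// at runtime avoid that concern.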

API

DOMParser (Node)

| Member | Description |
| --- | --- |
| fetch(url: String) -> DOMDocument | Awaitable. Sends a GET request to the URL and returns the parsed document, or null on error. |
| static parse_html(html: String) -> DOMDocument | Parses an HTML string directly, without a network request. |
| user_agent: String | User-Agent string sent with requests. |
| extra_headers: PackedStringArray | Extra request headers, each in "Name: value" format. |
| timeout_seconds: float | Request timeout, in seconds. |
| max_redirects: int | Maximum number of redirects to follow. |
| signal document_loaded(document) | Emitted after a successful fetch. |
| signal fetch_failed(error, response_code) | Emitted on a network or HTTP error. |
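If you prefer signals to await, the parser's properties and signals can be wired up as follows. This is a sketch: the user agent, header, timeout, and redirect values are illustrative, and because the types of the fetch_failed arguments aren't documented here, the handler leaves them untyped. It also assumes fetch() can be started without await, with its result delivered through the signals.

```gdscript
extends Node

# Placeholder target; replace with the page you actually need.
const TARGET_URL := "https://example.com"

var parser: DOMParser

func _ready() -> void:
    parser = make_parser()
    add_child(parser)
    parser.document_loaded.connect(_on_document_loaded)
    parser.fetch_failed.connect(_on_fetch_failed)
    # Not awaited: the outcome arrives through the signals connected above.
    parser.fetch(TARGET_URL)

# Builds a DOMParser with request settings applied (values are illustrative).
func make_parser() -> DOMParser:
    var p := DOMParser.new()
    p.user_agent = "MyGame/1.0"
    p.extra_headers = PackedStringArray(["Accept-Language: en"])
    p.timeout_seconds = 10.0
    p.max_redirects = 5
    return p

func _on_document_loaded(document: DOMDocument) -> void:
    print("Loaded ", document.source_url, ": ", document.get_title())

# Argument types are not documented, so they are left untyped here.
func _on_fetch_failed(error, response_code) -> void:
    push_warning("fetch failed: %s (HTTP %s)" % [str(error), str(response_code)])
```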

DOMDocument (extends DOMNode)

| Member | Description |
| --- | --- |
| source_url: String | URL this document was fetched from (if any). |
| raw_html: String | The original HTML text. |
| get_document_element() | The <html> element (or, failing that, the first element child). |
| get_head() / get_body() | Convenience accessors for the <head> and <body> elements. |
| get_title() -> String | Text of the <title> element. |
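As a small sketch of the DOMDocument accessors, the hypothetical helper below collects <meta name="..." content="..."> pairs from the head. It assumes get_head() may return null for a document without a <head> and guards for that case either way.

```gdscript
# Collects <meta name="..." content="..."> pairs from the document head.
func get_meta_tags(doc: DOMDocument) -> Dictionary:
    var result := {}
    var head: DOMNode = doc.get_head()
    if head == null:
        return result
    for meta in head.query_selector_all("meta[name]"):
        # Missing content attributes fall back to an empty string.
        result[meta.get_attribute("name")] = meta.get_attribute("content", "")
    return result
```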

DOMNode

| Member | Description |
| --- | --- |
| tag_name: String | Lowercase tag name (e.g. "div"). Empty for text and comment nodes. |
| attributes: Dictionary | Attribute map (keys are lowercased). |
| children: Array[DOMNode] | Child nodes. |
| parent: DOMNode | Parent node (may be null). |
| text: String | Text content of text and comment nodes. |
| is_element() / is_text() / is_void() | Node-type predicates. |
| get_attribute(name, default="") | Reads an attribute, returning default if it is missing. |
| has_attribute(name) / set_attribute(name, value) / remove_attribute(name) | Check, set, and remove attributes. |
| get_id() / get_classes() / has_class(cls) | Shortcuts for the id and class attributes. |
| get_text_content() | Concatenated text of this node and all its descendants. |
| get_inner_html() / get_outer_html() | Serialize the node's children (inner) or the node itself (outer) back to HTML. |
| static create_element(tag) / static create_text(text) | Create a new element or text node, ready to pass to append_child(). |
| append_child(n) / remove_child(n) / remove() | Tree mutation: append or remove a child, or detach this node from its parent. |
| get_element_by_id(id) | First descendant element with the given id. |
| get_elements_by_tag_name(tag) | All descendant elements with the given tag ("*" matches all). |
| get_elements_by_class_name(cls) | All descendant elements with the given class. |
| query_selector(sel) | First descendant matching the selector. |
| query_selector_all(sel) | All descendants matching the selector. |
| matches(sel) | Returns true if this node matches the selector. |
| walk() / walk_elements() | Pre-order traversal helpers. |
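To illustrate the raw tree fields, here is a recursive tag counter that relies only on children, is_element(), and tag_name. It deliberately avoids walk(), whose return type isn't documented here; count_tags() is a hypothetical helper.

```gdscript
# Counts element nodes below `root` (not including root itself) by tag name.
func count_tags(root: DOMNode) -> Dictionary:
    var counts := {}
    _count_into(root, counts)
    return counts

# Recursive helper: Dictionaries are passed by reference, so `counts`
# accumulates across all recursive calls.
func _count_into(node: DOMNode, counts: Dictionary) -> void:
    for child in node.children:
        if child.is_element():
            counts[child.tag_name] = counts.get(child.tag_name, 0) + 1
            _count_into(child, counts)
```

For a flat list of the same elements without recursion, get_elements_by_tag_name("*") is the simpler choice; the explicit version is useful when you need to track depth or prune subtrees.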

Supported CSS selectors

  • Type / universal: div, *
  • ID: #main
  • Class: .title, .a.b (multiple)
  • Attribute:
    • [disabled]: attribute is present
    • [type="text"]: exact match
    • [class~="hero"]: whitespace-separated list contains the word
    • [href^="https"]: starts with (prefix)
    • [href$=".pdf"]: ends with (suffix)
    • [href*="foo"]: contains the substring
    • [lang|="en"]: exactly "en" or starts with "en-"
  • Combinators: descendant (space), child (>), adjacent sibling (+), general sibling (~)
  • Selector lists: a, b, c
  • Pseudo-classes: :first-child, :last-child, :only-child, :first-of-type, :last-of-type, :not(<simple>)

Examples:

doc.query_selector_all("article.post > h2 a[href^='https']")
doc.query_selector_all("ul.nav li:first-child")
doc.query_selector_all("p:not(.muted)")
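Selectors and matches() combine well when part of the filtering is easier to express in code. A sketch that collects PDF links inside <article> elements while skipping anything with the muted class; pdf_links() is a hypothetical helper, and the same filter could likely be written as one selector using :not(.muted).

```gdscript
# Hrefs of PDF links inside <article> elements, skipping links marked .muted.
func pdf_links(doc: DOMDocument) -> PackedStringArray:
    var out := PackedStringArray()
    for a in doc.query_selector_all("article a[href$='.pdf']"):
        if not a.matches(".muted"):
            out.append(a.get_attribute("href"))
    return out
```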

Interacting with the DOM

The tree is fully mutable. Changes are reflected by get_outer_html().

var body := doc.get_body()
var new_p := DOMNode.create_element("p")
new_p.set_attribute("class", "added")
new_p.append_child(DOMNode.create_text("injected from Godot"))
body.append_child(new_p)

for node in doc.query_selector_all(".advert"):
    node.remove()

print(doc.get_outer_html())
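Building on the mutation API, here is a rough cleanup pass that drops <script>/<style> elements and inline event-handler attributes (onclick, onload, ...) before you display or store fetched markup. It is an illustration, not a security-grade sanitizer. strip_active_content() is hypothetical, and like the loop above it assumes query_selector_all() returns a snapshot array, so removing nodes while iterating is safe.

```gdscript
# Removes <script>/<style> elements and on* attributes from the whole document.
func strip_active_content(doc: DOMDocument) -> void:
    for node in doc.query_selector_all("script, style"):
        node.remove()
    for el in doc.query_selector_all("*"):
        # keys() returns a copy, so removing attributes inside the loop is safe.
        # Attribute keys are lowercased by the parser, so "on" catches onClick too.
        for attr_name in el.attributes.keys():
            if attr_name.begins_with("on"):
                el.remove_attribute(attr_name)
```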

Limitations

  • Not a spec-compliant HTML5 parser. It's forgiving enough for typical pages (void elements, unquoted attributes, implicit <p>/<li> closing, raw-text handling for <script>/<style>), but edge cases such as table foster-parenting, <template>, and malformed markup are handled heuristically.
  • Entity decoding covers the numeric (&#...;, &#x...;) forms plus a small named-entity table. Uncommon named entities pass through as-is.
  • Selectors do not (yet) support :nth-child(...), namespaces, or case-insensitive attribute matching ([attr=val i]).
  • JavaScript is not executed. If a page renders its content client-side, you'll only see the initial HTML.
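Two of these limitations are easy to work around in your own code. A sketch under stated assumptions: EXTRA_ENTITIES is a hypothetical table you would fill with whatever entities your pages use (the addon's built-in named-entity list isn't documented here), and nth_element_child() stands in for :nth-child(n), using CSS-style 1-based indexing.

```gdscript
# Hypothetical entity table: fill in whatever your pages actually use.
const EXTRA_ENTITIES := {
    "&hellip;": "…",
    "&trade;": "™",
    "&deg;": "°",
    "&euro;": "€",
}

# Replaces entities the parser left undecoded (apply to get_text_content()).
func decode_extra_entities(text: String) -> String:
    for entity in EXTRA_ENTITIES:
        text = text.replace(entity, EXTRA_ENTITIES[entity])
    return text

# Stand-in for :nth-child(n): the n-th element child (1-based, as in CSS),
# skipping text and comment nodes. Returns null if there are fewer than n.
func nth_element_child(parent: DOMNode, n: int) -> DOMNode:
    var index := 0
    for child in parent.children:
        if child.is_element():
            index += 1
            if index == n:
                return child
    return null
```

Note that decode_extra_entities() cannot distinguish a real entity from text such as "&amp;hellip;" that the parser has already decoded to "&hellip;".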

Contributing

Bug reports and PRs welcome. If you hit HTML that parses incorrectly, a minimal reproducing snippet is the most useful thing you can send.

