WP_HTML_Processor_State

Internal state management for the HTML processor during parsing.

Source: wp-includes/html-api/class-wp-html-processor-state.php
Since: 6.4.0
Access: private


Constants

Insertion Modes

Constant                             Description
INSERTION_MODE_INITIAL               Initial state for full parser
INSERTION_MODE_BEFORE_HTML           Before HTML element
INSERTION_MODE_BEFORE_HEAD           Before HEAD element
INSERTION_MODE_IN_HEAD               Inside HEAD element
INSERTION_MODE_IN_HEAD_NOSCRIPT      Inside NOSCRIPT in HEAD
INSERTION_MODE_AFTER_HEAD            After HEAD, before BODY
INSERTION_MODE_IN_BODY               Inside BODY (main mode)
INSERTION_MODE_IN_TABLE              Inside TABLE element
INSERTION_MODE_IN_TABLE_TEXT         Processing table text
INSERTION_MODE_IN_CAPTION            Inside CAPTION element
INSERTION_MODE_IN_COLUMN_GROUP       Inside COLGROUP element
INSERTION_MODE_IN_TABLE_BODY         Inside TBODY/THEAD/TFOOT
INSERTION_MODE_IN_ROW                Inside TR element
INSERTION_MODE_IN_CELL               Inside TD/TH element
INSERTION_MODE_IN_SELECT             Inside SELECT element
INSERTION_MODE_IN_SELECT_IN_TABLE    SELECT inside TABLE
INSERTION_MODE_IN_TEMPLATE           Inside TEMPLATE element
INSERTION_MODE_AFTER_BODY            After BODY closes
INSERTION_MODE_IN_FRAMESET           Inside FRAMESET element
INSERTION_MODE_AFTER_FRAMESET        After FRAMESET closes
INSERTION_MODE_AFTER_AFTER_BODY      Final state after body
INSERTION_MODE_AFTER_AFTER_FRAMESET  Final state after frameset

Properties

Stack of Open Elements

public $stack_of_open_elements;

Type: WP_HTML_Open_Elements

Tracks which elements are currently open while scanning HTML. The stack grows downwards: the first element added is at the top, and the most recently added element is at the bottom.
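
As a rough illustration of that ordering, the following self-contained sketch models a stack that grows downwards. The class Sketch_Open_Elements and its methods are hypothetical and are not the WP_HTML_Open_Elements API; they only show which end counts as the "top" and which as the "bottom".

```php
<?php
// Hypothetical model, not WP_HTML_Open_Elements: it only illustrates ordering.
final class Sketch_Open_Elements {
	/** @var string[] Node names; index 0 is the "top" (first added). */
	private $stack = array();

	// Newly opened elements are appended at the "bottom" of the stack.
	public function push( string $node_name ): void {
		$this->stack[] = $node_name;
	}

	// Closing an element removes the most recently added (bottommost) node.
	public function pop(): ?string {
		return array_pop( $this->stack );
	}

	// The current node is the bottommost node, or null when the stack is empty.
	public function current_node(): ?string {
		$count = count( $this->stack );
		return $count > 0 ? $this->stack[ $count - 1 ] : null;
	}

	// The topmost node is the first one that was added.
	public function top_node(): ?string {
		return $this->stack[0] ?? null;
	}
}

$open_elements = new Sketch_Open_Elements();
foreach ( array( 'HTML', 'BODY', 'DIV', 'P' ) as $node_name ) {
	$open_elements->push( $node_name );
}

echo $open_elements->top_node(), "\n";     // First added.
echo $open_elements->current_node(), "\n"; // Most recently added.
$open_elements->pop();
echo $open_elements->current_node(), "\n"; // The node above the popped one.
```

Storing the most recent node last keeps push and pop cheap in a PHP array, which is why the "bottom" of the stack maps to the end of the array in this sketch.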


Active Formatting Elements

public $active_formatting_elements;

Type: WP_HTML_Active_Formatting_Elements

Tracks formatting elements for handling mis-nested tags.


Current Token

public $current_token = null;

Type: WP_HTML_Token|null

Refers to the currently matched tag, if any; null when no tag is matched.


Insertion Mode

public $insertion_mode = self::INSERTION_MODE_INITIAL;

Type: string

Current insertion mode for tree construction.


Stack of Template Insertion Modes

public $stack_of_template_insertion_modes = array();

Type: array<string>

Stack for tracking insertion modes when entering TEMPLATE elements.
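
The bookkeeping can be pictured with a small self-contained sketch. Everything here is an assumption made for illustration: the class Sketch_Template_State, its two methods, and the mode values are hypothetical, and the reset on leaving a template is deliberately simplified compared with a full parser.

```php
<?php
// Hypothetical model of the bookkeeping; it is not code from WP_HTML_Processor.
final class Sketch_Template_State {
	const IN_BODY     = 'in-body';     // Illustrative values only.
	const IN_TEMPLATE = 'in-template';

	/** @var string Current insertion mode. */
	public $insertion_mode = self::IN_BODY;

	/** @var string[] The most recently pushed mode is last. */
	public $stack_of_template_insertion_modes = array();

	// A TEMPLATE start tag pushes a mode and switches to it.
	public function enter_template(): void {
		$this->stack_of_template_insertion_modes[] = self::IN_TEMPLATE;
		$this->insertion_mode                      = self::IN_TEMPLATE;
	}

	// A TEMPLATE end tag pops the mode for the template being closed.
	public function leave_template(): void {
		array_pop( $this->stack_of_template_insertion_modes );

		// Simplification: a full parser resets the mode by inspecting the
		// stack of open elements; here we fall back to IN_BODY at depth 0.
		$remaining            = $this->stack_of_template_insertion_modes;
		$this->insertion_mode = empty( $remaining ) ? self::IN_BODY : end( $remaining );
	}
}

$state = new Sketch_Template_State();
$state->enter_template(); // <template>
$state->enter_template(); //   <template>
$state->leave_template(); //   </template>
echo $state->insertion_mode, "\n";                             // Still inside the outer template.
$state->leave_template(); // </template>
echo count( $state->stack_of_template_insertion_modes ), "\n"; // The stack is empty again.
```

A stack is needed rather than a single saved value because TEMPLATE elements can nest, and each level has its own mode to return to.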


Context Node (Deprecated)

public $context_node = null;

Type: null

Deprecated: 6.8.0 — Context node is tracked internally by WP_HTML_Processor.


Encoding

public $encoding = null;

Type: string|null

The recognized encoding of the input byte stream.


Encoding Confidence

public $encoding_confidence = 'tentative';

Type: string

Confidence level: "tentative", "certain", or "irrelevant".


HEAD Element Pointer

public $head_element = null;

Type: WP_HTML_Token|null

Points to the HEAD element once the parser has encountered it.


FORM Element Pointer

public $form_element = null;

Type: WP_HTML_Token|null

Points to the last FORM element that was opened and whose end tag has not yet been seen. Used to associate form controls with their form even in badly nested markup. Ignored inside TEMPLATE elements.
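
A minimal sketch of how such a pointer behaves, assuming a simplified model with no TEMPLATE handling. The class Sketch_Form_Pointer and its methods are hypothetical names introduced for illustration and do not exist in WordPress.

```php
<?php
// Hypothetical model of the pointer; names are illustrative, not WordPress APIs.
final class Sketch_Form_Pointer {
	/** @var string|null Identifier of the open FORM, or null when none is open. */
	public $form_element = null;

	/** @var array<string, string|null> Control name => owning form identifier. */
	public $owners = array();

	// A nested FORM start tag is ignored while another form is still open.
	public function open_form( string $id ): void {
		if ( null === $this->form_element ) {
			$this->form_element = $id;
		}
	}

	// The FORM end tag clears the pointer.
	public function close_form(): void {
		$this->form_element = null;
	}

	// Controls are associated with whatever form the pointer references.
	public function add_control( string $name ): void {
		$this->owners[ $name ] = $this->form_element;
	}
}

// Models: <form id=a><div><form id=b><input name=q></div><input name=r></form><input name=s>
$forms = new Sketch_Form_Pointer();
$forms->open_form( 'a' );
$forms->open_form( 'b' );   // Ignored: form "a" is still open.
$forms->add_control( 'q' );
$forms->add_control( 'r' ); // Still owned by "a" after the DIV closed.
$forms->close_form();
$forms->add_control( 's' ); // No open form, so no owner.

var_dump( $forms->owners );
```

Because ownership follows the pointer rather than element nesting, controls keep a sensible owner even when the surrounding tags are closed out of order.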


Frameset-OK Flag

public $frameset_ok = true;

Type: bool

Indicates whether a FRAMESET element is allowed in the current state. Initially true; set to false once a token that rules out a frameset is seen, such as non-whitespace text in the body.
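
The following sketch shows one way the flag could change for text in the body. The function sketch_frameset_ok_after_text() is hypothetical and covers only the whitespace versus non-whitespace distinction; start tags that also clear the flag, and null-byte handling, are left out.

```php
<?php
// Hypothetical helper; it models only the rule for body text.
function sketch_frameset_ok_after_text( bool $frameset_ok, string $text ): bool {
	// HTML whitespace (space, tab, LF, FF, CR) leaves the flag untouched.
	$is_whitespace_only = strlen( $text ) === strspn( $text, " \t\n\f\r" );

	return $is_whitespace_only ? $frameset_ok : false;
}

$frameset_ok = true;
$frameset_ok = sketch_frameset_ok_after_text( $frameset_ok, "\n  " );
var_dump( $frameset_ok ); // Whitespace only: the flag is unchanged.

$frameset_ok = sketch_frameset_ok_after_text( $frameset_ok, 'Hello' );
var_dump( $frameset_ok ); // Non-whitespace text: a FRAMESET would now be ignored.
```

The flag only ever moves from true to false in this sketch, which matches the description above: once content rules out a frameset, later whitespace does not restore it.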


Methods

__construct()

Creates a new empty state.

public function __construct()

Initializes the stack of open elements and active formatting elements.


Usage Example

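// State is managed internally by WP_HTML_Processor and is not part of the public API.
$processor = WP_HTML_Processor::create_fragment( '<div><p>Hello' );

// A fragment parser starts in the IN_BODY insertion mode.
// Once both tags have been matched, the internal state is roughly:
// - stack_of_open_elements: [HTML, BODY, DIV, P]
// - insertion_mode: IN_BODY
// - active_formatting_elements: []

while ( $processor->next_tag() ) {
    // As the processor parses, state is updated:
    // - Elements are pushed onto and popped from the stacks
    // - Insertion mode changes based on context

    // Breadcrumbs expose the open elements through the public API.
    echo implode( ' > ', $processor->get_breadcrumbs() ), "\n";
}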

Insertion Mode Flow

INITIAL
    └─ (DOCTYPE or implied)
BEFORE_HTML
    └─ (HTML tag or implied)
BEFORE_HEAD
    └─ (HEAD tag or implied)
IN_HEAD
    ├─ IN_HEAD_NOSCRIPT (if NOSCRIPT)
    └─ (HEAD closes or implied)
AFTER_HEAD
    └─ (BODY or FRAMESET or implied)
IN_BODY                     IN_FRAMESET
    ├─ IN_TABLE                 └─ AFTER_FRAMESET
    │   ├─ IN_CAPTION               └─ AFTER_AFTER_FRAMESET
    │   ├─ IN_COLUMN_GROUP
    │   ├─ IN_TABLE_BODY
    │   │   └─ IN_ROW
    │   │       └─ IN_CELL
    │   ├─ IN_SELECT_IN_TABLE (SELECT within table modes)
    │   └─ IN_TABLE_TEXT
    ├─ IN_SELECT (SELECT outside tables)
    ├─ IN_TEMPLATE (recursive)
    └─ (BODY closes)
AFTER_BODY
    └─ AFTER_AFTER_BODY
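
The main left-hand path of the diagram can be walked with a small lookup table. This is only a sketch: the $next_mode array is hypothetical, the keys are shortened labels rather than the constants' actual values, and the table, select, template and frameset branches are left out.

```php
<?php
// Hypothetical transition table for the main path of the diagram only.
// Mode names are shortened labels, not the constants' actual values.
$next_mode = array(
	'INITIAL'     => 'BEFORE_HTML',
	'BEFORE_HTML' => 'BEFORE_HEAD',
	'BEFORE_HEAD' => 'IN_HEAD',
	'IN_HEAD'     => 'AFTER_HEAD',
	'AFTER_HEAD'  => 'IN_BODY',
	'IN_BODY'     => 'AFTER_BODY',
	'AFTER_BODY'  => 'AFTER_AFTER_BODY',
);

// Follow the table from the initial mode until no further transition exists.
$mode = 'INITIAL';
$path = array( $mode );
while ( isset( $next_mode[ $mode ] ) ) {
	$mode   = $next_mode[ $mode ];
	$path[] = $mode;
}

echo implode( ' -> ', $path ), "\n";
```

A real parser chooses the next mode from the token it sees, so a flat table like this captures only the case where every transition is taken in order.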

Encoding Confidence Levels

Level       Meaning
tentative   Encoding may change if contradicting information is found
certain     Encoding is definitely known
irrelevant  No encoding needed (input is already Unicode)
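
To make the difference between the levels concrete, here is a conceptual sketch of how a later encoding declaration might be handled. The function sketch_apply_declared_encoding() and the array shape it works on are hypothetical; they mirror the two encoding properties above but are not part of WP_HTML_Processor_State.

```php
<?php
// Hypothetical helper; not part of WP_HTML_Processor_State.
function sketch_apply_declared_encoding( array $state, string $declared ): array {
	// Only a tentative encoding may be replaced by new information.
	if ( 'tentative' === $state['encoding_confidence'] ) {
		$state['encoding']            = $declared;
		$state['encoding_confidence'] = 'certain';
	}

	// "certain" and "irrelevant" states are returned unchanged.
	return $state;
}

$tentative = array(
	'encoding'            => 'windows-1252',
	'encoding_confidence' => 'tentative',
);
$certain   = array(
	'encoding'            => 'UTF-8',
	'encoding_confidence' => 'certain',
);

print_r( sketch_apply_declared_encoding( $tentative, 'UTF-8' ) );      // Encoding is replaced.
print_r( sketch_apply_declared_encoding( $certain, 'windows-1252' ) ); // Encoding is kept.
```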