WordPress Formatting API Overview
The Formatting API is one of WordPress’s largest and most critical subsystems, handling text transformation, sanitization, and escaping throughout the platform. It lives primarily in wp-includes/formatting.php (6,304 lines) and wp-includes/kses.php (3,046 lines).
Core Concepts
Escaping vs Sanitization
These terms are often confused but serve distinct purposes:
Escaping (Output Safety)
- Applied when displaying data
- Makes data safe for a specific context (HTML, attributes, URLs, JS)
- Does NOT alter the stored data
- Functions: esc_html(), esc_attr(), esc_url(), esc_js(), esc_textarea(), esc_xml()
Sanitization (Input Safety)
- Applied when saving data
- Permanently modifies data to be safe
- Removes or transforms dangerous content
- Functions: sanitize_*(), wp_kses*(), wp_filter_*_kses()
The Golden Rule:
Sanitize early, escape late. Never trust user input. Always escape on output.
The KSES System
KSES ("KSES Strips Evil Scripts") is WordPress’s HTML filtering engine. It:
- Maintains allowlists of safe HTML elements and attributes
- Strips disallowed tags (the text between them is kept)
- Validates attribute values against allowed protocols
- Removes malicious JavaScript and event handlers
// Core KSES contexts
'post' // Full HTML for post content ($allowedposttags)
'data' // Basic HTML ($allowedtags - very limited)
'strip' // Remove ALL HTML
'entities' // Return allowed entity names
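The contexts above can be passed to wp_kses() by name, or replaced with a custom allowlist array. The following is a minimal sketch, assuming WordPress is already loaded (for example inside a plugin or theme file); the sample markup and variable names are illustrative only.

```php
<?php
// Assumes WordPress is loaded; this file is not runnable on its own.

// Untrusted markup with an inline event handler and a script tag.
$dirty = '<p onclick="evil()">Hello <strong>world</strong><script>alert(1)</script></p>';

// 'post' context: filtered against $allowedposttags.
$as_post = wp_kses( $dirty, 'post' ); // same as wp_kses_post( $dirty )

// 'data' context: filtered against the much smaller $allowedtags list,
// so the <p> wrapper is removed as well.
$as_data = wp_kses( $dirty, 'data' );

// Custom allowlist: only <strong>, with no attributes, survives.
$custom = wp_kses( $dirty, array( 'strong' => array() ) );

// Inspect what a context allows (array keyed by tag name).
$data_tags = wp_kses_allowed_html( 'data' );
```

In all three results the onclick attribute and the script tags are gone, but the text that was inside the script element remains, which is why KSES is not a substitute for escaping in non-HTML contexts.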
Autop (Auto-Paragraphing)
The wpautop() function converts double line breaks into HTML paragraphs and single line breaks into <br /> tags. This is fundamental to how WordPress converts plain text content into proper HTML.
Key behaviors:
- Double newlines (\n\n) become <p> tags
- Single newlines become <br /> (unless $br = false)
- Preserves content in <pre>, <script>, <style>, <svg>, <math> tags
- Properly handles block-level elements (won’t wrap them in <p>)
- Applied via the_content filter
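As a rough illustration of the behaviors listed above, the sketch below calls wpautop() directly with and without the $br argument. It assumes WordPress is loaded; the sample text is made up.

```php
<?php
// Assumes WordPress is loaded.

$text = "First paragraph.\n\nSecond paragraph,\nwith a single line break.";

// Default ($br = true): two <p> elements, and the single newline
// inside the second paragraph becomes <br />.
$with_br = wpautop( $text );

// $br = false: paragraphs are still created, but single newlines
// are left as plain newlines.
$without_br = wpautop( $text, false );

// To stop auto-paragraphing of post content, the filter can be removed:
// remove_filter( 'the_content', 'wpautop' );
```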
Texturize (Typography Enhancement)
The wptexturize() function transforms plain text typography into typographically correct HTML entities:
| Plain Text | Transformed |
|---|---|
| "quotes" | “smart quotes” (curly) |
| 'apostrophe' | ‘smart’ |
| -- | en-dash (–) |
| --- | em-dash (—) |
| ... | ellipsis (…) |
| (tm) | trademark (™) |
| 9x9 | 9×9 (multiplication sign) |
Skip protection: Content in <pre>, <code>, <kbd>, <style>, <script>, <tt> tags is NOT texturized.
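The skip list can be observed and extended with a short experiment. This is a sketch assuming WordPress is loaded with the default English locale; the samp tag added to the skip list is an arbitrary choice for illustration.

```php
<?php
// Assumes WordPress is loaded.

$plain = 'She said "hello"... then left --- quickly. <code>"raw" -- text</code>';

// Quotes, the ellipsis, and the dash outside <code> are converted to
// entities; the contents of <code> are left untouched.
$fancy = wptexturize( $plain );

// Extend the skip list: this filter is applied on every call.
add_filter(
	'no_texturize_tags',
	function ( $tags ) {
		$tags[] = 'samp'; // illustrative addition
		return $tags;
	}
);

// Disable texturization entirely. The result is cached on the first
// wptexturize() call, so this needs to be registered early:
// add_filter( 'run_wptexturize', '__return_false' );
```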
Security Model
Trust Levels
WordPress uses capability-based trust:
// Users with the 'unfiltered_html' capability bypass KSES
if ( current_user_can( 'unfiltered_html' ) ) {
    // Full HTML allowed
} else {
    // KSES filters applied
}
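Core applies this check through its own save-time filters, but the same idea can be reused in plugin code. Below is a minimal sketch, assuming WordPress is loaded; myplugin_prepare_html() is a hypothetical helper name, not a core function.

```php
<?php
// Assumes WordPress is loaded. myplugin_prepare_html() is hypothetical.

/**
 * Return HTML unchanged for trusted users, KSES-filtered otherwise.
 *
 * @param string $html Markup submitted by the current user.
 * @return string Markup that is acceptable to store.
 */
function myplugin_prepare_html( $html ) {
	if ( current_user_can( 'unfiltered_html' ) ) {
		// Trusted user: keep the markup as submitted.
		return $html;
	}

	// Everyone else: filter against the post allowlist.
	return wp_kses_post( $html );
}
```

Whether trusted users should really bypass filtering is a policy decision; many plugins filter unconditionally.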
Allowed HTML by Context
Post content ($allowedposttags):
- Extensive HTML5 elements
- MathML support
- Media elements (audio, video, source, track)
- Semantic elements (article, aside, section, nav, header, footer)
- Interactive elements (details, summary, dialog)
- Form elements (limited)
Basic content ($allowedtags):
- Links (<a> with href and title)
- Basic formatting (<b>, <strong>, <em>, <i>)
- Block quotes, code, abbreviations
- No other block-level elements (no <p>, <div>, headings, or lists)
Global attributes (added to all allowed elements):
- class, id, style, title, role
- dir, lang, xml:lang
- aria-*, data-*
Protocol Filtering
URLs are validated against allowed protocols:
$allowed_protocols = array(
    'http', 'https', 'ftp', 'ftps', 'mailto', 'news', 'irc', 'irc6', 'ircs',
    'gopher', 'nntp', 'feed', 'telnet', 'mms', 'rtsp', 'sms', 'svn', 'tel',
    'fax', 'xmpp', 'webcal', 'urn'
);
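To see the protocol check in action, the sketch below runs a few URLs through esc_url() and wp_kses_bad_protocol(). It assumes WordPress is loaded; the URLs are placeholders.

```php
<?php
// Assumes WordPress is loaded.

// The effective list, after any filters have run.
$protocols = wp_allowed_protocols();

// An allowed protocol passes through.
$ok = esc_url( 'https://example.com/page' );

// A disallowed protocol makes esc_url() return an empty string.
$bad = esc_url( 'javascript:alert(1)' );

// The lower-level helper strips the bad protocol instead of
// rejecting the whole string.
$cleaned = wp_kses_bad_protocol( 'javascript:alert(1)', $protocols );

// esc_url() accepts its own protocol list as a second argument,
// here restricting output to web links only.
$http_only = esc_url( 'ftp://example.com/file.zip', array( 'http', 'https' ) );
```

The list can be extended through the kses_allowed_protocols filter. Core caches the result, so a callback is best registered early in the load sequence.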
CSS Filtering
The safecss_filter_attr() function filters inline style attributes, allowing only safe CSS properties and values. It supports:
- Layout: display, position, flex, grid, CSS variables
- Box model: margin, padding, border, width, height
- Typography: font-*, text-*, color
- Backgrounds: background-* (including gradients)
- Modern CSS: calc(), var(), min(), max(), clamp()
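A quick way to check which declarations survive is to pass a style string through the function directly. This is a sketch assuming WordPress is loaded; the style string is invented, and text-shadow in the commented filter is only an illustrative property that may already be allowed in some versions.

```php
<?php
// Assumes WordPress is loaded.

// A mix of ordinary declarations and one property that is not on the allowlist.
$style = 'color: red; margin: 0 auto; behavior: url(evil.htc)';

// Declarations for unknown properties are dropped; the rest are kept.
$safe = safecss_filter_attr( $style );

// The property allowlist is filterable via 'safe_style_css':
// add_filter(
// 	'safe_style_css',
// 	function ( $properties ) {
// 		$properties[] = 'text-shadow'; // illustrative addition
// 		return $properties;
// 	}
// );
```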
Common Patterns
Content Display Pipeline
// Raw content from database
$content = get_post_field('post_content', $post_id);
// Apply filters (wpautop, wptexturize, shortcodes, embeds, blocks)
$content = apply_filters('the_content', $content);
// The filters format the content but do not escape it; it is echoed as HTML and relies on KSES having filtered it at save time
echo $content;
Manual Escaping Pattern
// For attributes
echo '<input value="' . esc_attr($user_input) . '">';
// For HTML content
echo '<p>' . esc_html($user_input) . '</p>';
// For URLs
echo '<a href="' . esc_url($url) . '">';
// For JavaScript (inline)
echo '<div onclick="handleClick(\'' . esc_js($value) . '\')">';
Sanitization Pattern
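// On form submission (WordPress adds slashes to $_POST, so unslash before sanitizing)
$title = sanitize_text_field(wp_unslash($_POST['title']));
$email = sanitize_email(wp_unslash($_POST['email']));
$content = wp_kses_post(wp_unslash($_POST['content']));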
// Then save
update_post_meta($id, 'title', $title);
Performance Considerations
- Static caching: wptexturize() caches its character/replacement arrays in static variables
- Skip filters: Use the run_wptexturize filter to disable texturization site-wide
- KSES is expensive: Run it when saving rather than on every page view; wp_kses_post() is a thin wrapper around wp_kses() with the 'post' context, not a cheaper variant
- Batch operations: wp_kses_post_deep() applies wp_kses_post() recursively to nested arrays
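For the batch case, a nested array can be filtered in one call. The following is a sketch assuming WordPress is loaded; the array shape and values are invented for illustration.

```php
<?php
// Assumes WordPress is loaded.

// Nested input, for example decoded from a settings form.
$submitted = array(
	'intro' => '<p>Hi<script>alert(1)</script></p>',
	'items' => array(
		'<em>one</em>',
		'<img src="x" onerror="evil()">',
	),
);

// Every string value is passed through wp_kses_post();
// keys and nesting are preserved.
$clean = wp_kses_post_deep( $submitted );
```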
Related APIs
- Shortcodes API: do_shortcode() runs on the_content after wpautop() (priority 11 versus 10); shortcodes can be protected from texturization
- Blocks API: Block content parsed separately, has its own sanitization
- Comments API: Uses stricter KSES rules by default
- REST API: Has its own sanitization callbacks per field