I recently stumbled upon an encoding issue in WordPress where the post_content column of the wp_posts table on a user’s site still used the utf8 charset instead of utf8mb4. As a result, comparing the saved content against the post content in $_POST failed whenever the post contained an emoji.
Here’s how to check which charset a given column uses in WordPress:
global $wpdb;
// Let's assume you want to check the `post_content` column of `wp_posts`.
// The $charset value here is typically either `utf8` or `utf8mb4`.
$charset = $wpdb->get_col_charset( $wpdb->posts, 'post_content' );
As for why the charset matters, here’s a quote comparing the utf8 charset to utf8mb4:
“The difference between utf8 and utf8mb4 is that the former can only store 3 byte characters, while the latter can store 4 byte characters. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character. This greatly expands the language usability of WordPress, especially in countries that use Han character sets. Unicode isn’t without its problems, but it’s the best option available.”
Source: https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/
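
To make the 3 byte versus 4 byte distinction concrete, here is a minimal sketch in plain PHP (7.0 or later, for the "\u{...}" escapes). The helper name has_four_byte_chars is hypothetical and not part of WordPress; it only reports whether a string contains a code point above U+FFFF, which is the kind of character a utf8 column cannot hold.

```php
<?php
// Hypothetical helper (not a WordPress function): returns true when $text
// contains a code point above U+FFFF, i.e. a character that needs four
// bytes in UTF-8 and lies outside the Basic Multilingual Plane.
function has_four_byte_chars( string $text ): bool {
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return 1 === preg_match( '/[\x{10000}-\x{10FFFF}]/u', $text );
}

$han      = "\u{6F22}";  // common Han character, inside the BMP
$rare_han = "\u{20BB7}"; // rarer Han character, outside the BMP
$emoji    = "\u{1F600}"; // grinning face emoji, outside the BMP

// strlen() counts bytes, not characters.
echo strlen( $han ), "\n";      // 3
echo strlen( $rare_han ), "\n"; // 4
echo strlen( $emoji ), "\n";    // 4

var_dump( has_four_byte_chars( $han ) );      // bool(false)
var_dump( has_four_byte_chars( $rare_han ) ); // bool(true)
var_dump( has_four_byte_chars( $emoji ) );    // bool(true)
```

Combined with the $charset value from get_col_charset() above, a condition such as 'utf8' === $charset && has_four_byte_chars( $content ) would flag content that a utf8 column cannot store byte for byte. Note that some emoji (for example U+263A) fit in three bytes, so this is a check for 4 byte characters rather than an emoji detector.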
There’s also a larger story behind why this charset update matters: The Trojan Emoji.