How to Check a Database Column’s Charset

I recently ran into an encoding issue in WordPress: on a user’s site, the post_content column of the wp_posts table still used the utf8 charset instead of utf8mb4. As a result, comparing the saved content against the post content in $_POST failed whenever the post contained an emoji.

So, here’s how to check which charset a given column uses in WordPress:

global $wpdb;

// Let's assume you want to check `post_content` column of `wp_posts`
// $charset is typically `utf8` or `utf8mb4`; it is `false` if the column
// has no charset and a `WP_Error` object if the lookup fails
$charset = $wpdb->get_col_charset( $wpdb->posts, 'post_content' );
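
Since get_col_charset() can return something other than a charset name, it may help to handle each case explicitly. Below is a minimal sketch; describe_col_charset() is a hypothetical helper made up for this post, not part of WordPress, and the messages it returns are only illustrative.

```php
<?php
/**
 * Hypothetical helper (not a WordPress function): turns the value returned
 * by $wpdb->get_col_charset() into a short, human-readable status.
 */
function describe_col_charset( $charset ) {
    // WP_Error only exists inside WordPress; instanceof simply evaluates
    // to false when the class is not defined, so this is safe elsewhere too.
    if ( $charset instanceof WP_Error ) {
        return 'lookup failed';
    }

    // false means the column has no character set (e.g. a numeric column).
    if ( false === $charset ) {
        return 'no charset';
    }

    if ( 'utf8mb4' === $charset ) {
        return 'utf8mb4: 4-byte characters such as emoji can be stored';
    }

    // Anything else (utf8, latin1, ...) deserves a closer look.
    return $charset . ': not utf8mb4, check 4-byte character support';
}

// Inside WordPress, usage could look like this:
// global $wpdb;
// echo describe_col_charset( $wpdb->get_col_charset( $wpdb->posts, 'post_content' ) );
```

Passing the result through a small function like this keeps the false and WP_Error cases from being silently treated as a charset name.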

As for why the charset matters, here’s a quote comparing the utf8 and utf8mb4 charsets:

The difference between utf8 and utf8mb4 is that the former can only store 3 byte characters, while the latter can store 4 byte characters. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character. This greatly expands the language usability of WordPress, especially in countries that use Han character sets. Unicode isn’t without its problems, but it’s the best option available.

Source: https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/
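
To make the 3-byte versus 4-byte distinction concrete, here is a small sketch in plain PHP (7.0 or later, for the "\u{...}" escape). has_four_byte_chars() is a hypothetical helper name; it assumes the input is valid UTF-8 and simply looks for any code point above the Basic Multilingual Plane.

```php
<?php
/**
 * Hypothetical helper: returns true when $text contains at least one
 * character outside the Basic Multilingual Plane (U+10000 and above),
 * i.e. a character that needs 4 bytes in UTF-8.
 */
function has_four_byte_chars( string $text ): bool {
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_match() returns false on invalid UTF-8, which yields false here.
    return 1 === preg_match( '/[\x{10000}-\x{10FFFF}]/u', $text );
}

$emoji = "\u{1F600}"; // grinning face, U+1F600

var_dump( strlen( $emoji ) );                        // int(4): four bytes in UTF-8
var_dump( has_four_byte_chars( "caf\u{E9}" ) );      // bool(false): U+00E9 is in the BMP
var_dump( has_four_byte_chars( "Hello $emoji" ) );   // bool(true)
```

A check like this could be run on incoming post content before comparing it with what was saved in a utf8 column, though whether that fits depends on the plugin or theme doing the comparison.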

There’s also a larger story behind why this charset update matters: The Trojan Emoji.
