Garbled characters and question marks in database data

Are you coding your own website with PHP and MySQL? Did you create or upload some records through phpMyAdmin and trying to show them on your website? Or did you submit some data through your website and are trying to check it in phpMyAdmin?

If so, you may have problems with special characters, i.e. all characters that are not standard letters, numbers or basic punctuation, including letters with accents, non-latin alphabets or emoji. Data submitted through phpMyAdmin may end up as question marks on your website, and data submitted through your site may end up garbled in phpMyAdmin.

This article explains what causes this issue, and how you can fix it.

What issues you may see

The issue most apparent if you interact with your database both through phpMyAdmin and through code on your own website.

For example, if you try to submit data from PHP:

Then it may end up garbled in phpMyAdmin:

image

And if you submit data through phpMyAdmin:

image

It may end up as question marks on your website:

image

What causes this issue

This issue is related to the so-called “charset” of the MySQL database connection. Simply put: the charset determines how the characters in your text are converted to binary data to be sent to the database and stored. MySQL supports many different charsets, each optimized for certain types of data and languages.

For reading and writing of the database data to work correctly, you must use the same (or at least a compatible) charset at all times when interacting with text.

And this is not the case out of the box when using our hosting.

phpMyAdmin always uses the charset utf8mb4. The utf8mb4 charset is generally a good charset to use, it’s based on the very widely used UTF-8 character encoding. UTF-8 is also the encoding used for web pages, and it supports basically all special characters imaginable.

However, our PHP installation does not use utf8mb4, it uses the latin1 charset by default. And utf8mb4 and latin1 are not compatible, which is why you see these issues with broken characters.

How can you fix this issue

To fix this issue, you can change the charset used in PHP to connect to your database.

Both PHP and MySQLi support specifying which charset should be used for the connection, regardless of the server default setting.

It’s important you set this for every database connection on every page, or you may still see issues like this. Having a single PHP file that creates the database connection is extremely helpful for this reason.

IMPORTANT: THIS WILL BREAK DATA PREVIOUSLY INSERTED THROUGH PHP

When you change the connection charset from latin1 to utf8mb4, the text on your website will be shown as it is in phpMyAdmin. If you have already inserted data from PHP in your database before changing the charset, reading it will result in garbled text, like you see in phpMyAdmin.

So if possible, try to fix this issue early during development, when you can safely throw out all invalid data and start over. If your website is already live and is reading and writing real data, please decide for yourself how much you want to keep the data and how much being able to manage it through phpMyAdmin is worth it for you.

Setting the MySQL charset with MySQLi

Setting the database charset from MySQLi is done using the special mysqli_set_charset function. You need to use this function after creating the database connection but before running any queries.

If you’re using MySQLi in object-oriented style, it works like this:

$mysqli = new mysqli('sql999.infinityfree.com', 'if0_12345678', 'hunter2', 'if0_12345678_mydb');
$mysqli->set_charset('utf8mb4');

$result = $mysqli->query('SELECT * FROM my_table');

If you’re using MySQLi the procedural way instead, you can do it like this:

$mysqli = mysqli_connect('sql999.infinityfree.com', 'if0_12345678', 'hunter2', 'if0_12345678_mydb');
mysqli_set_charset($mysqli, 'utf8mb4');

$result = mysqli_query($mysqli, 'SELECT * FROM my_table');

For more information, please see the official documentation: PHP: mysqli::set_charset - Manual

Setting the MySQL charset with PDO

Setting the database charset when using the PDO extension is done by setting the charset parameter in the DSN when creating the database connection. The charset can be added by adding ;charset=utf8mb4 to the end of the DSN, like so:

$dsn = 'mysql:host=sql999.infinityfree.com;dbname=if0_12345678_mydb;charset=utf8mb4'
$pdo = new PDO($dsn, 'if0_12345678', 'hunter2');

$stmt = $pdo->query('SELECT * FROM my_table');

Other questions

Why don’t you just change this on the servers?

If we were to launch a hosting service from scratch, then of course we would use utf8mb4 by default. However, over the years, many websites have been created that do not set the charset, and use the default latin1 encoding.

Changing this default system wide would make all of these old websites switch from latin1 to utf8mb4, which would break all special characters in their databases.

We don’t want to break so many of our users’ websites, which is why we are stuck with this old default.

And no, it’s sadly not possible to change this only for new accounts.

Why don’t you just change this in phpMyAdmin?

Most website software, including WordPress, already explicitly use utf8 or utf8mb4. Sometimes this is the default, and sometimes they don’t support customizing this at all. Custom coded websites where the developer has thought about charsets will also use one of these two charsets.

Given that most people don’t completely custom-code their website, the overwhelming majority of websites are already using utf8 or utf8mb4 encoding. And changing this setting in phpMyAdmin would impact their ability to manage their own database.

And no, it’s also not possible to make this configurable.

Why can’t I fix this by just setting the collation in the database?

Every text-type column in MySQL needs to have a “collation” set. In many cases, this will be set to something like utf8mb4_unicode_ci.

The collation of a column only affects how data is compared, not written or retrieved. This is necessary for things like comparing and sorting data, and affects how for example accents are handled (like whether Bar or Bär are the same).

While you should always choose a collation that matches the charset you are using (or comparing and sorting special characters will not work correctly), you will still need to choose and use a charset for the database connection.

8 Likes