Mysql 字符集(Character) 介绍

字符集介绍

MySQL支持多种字符集,不同字符集有区别;广泛使用是utf8和utf8mb4。

utf8是utf8mb3的别名,使用1~3byte表示一个Unicode字符;gbk占2byte,使用utf8表示latin1只需要1byte。

utf8mb4使用1~4byte表示一个Unicode字符,兼容utf8,是utf8的超集,比utf8能存储更多的字符,比如emoji表情等特殊图形符号字符;有特殊字符存储需求建议使用utf8mb4。

MySQL 5.7 及之前版本默认字符集是latin1, MySQL 8.0 版本默认字符集是utf8mb4。

MySQL支持四种级别字符集比较规则,分别是:服务器级别character_set_server、数据库级别character_set_database、表级别、列级别。

对客户端连接有三个字符集系统变量,用于客户端编码和解码,分别是:character_set_client,character_set_connection,character_set_results

MySQL默认字符集配置

启动配置

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
$ cat /etc/my.cnf
[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
skip-character-set-client-handshake
#忽略应用连接自己设置的字符编码,保持与全局设置一致
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4

查看当前配置

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
mysql> show variables like '%char%';
+--------------------------+-----------------------------------------------+
| Variable_name            | Value                                         |
+--------------------------+-----------------------------------------------+
| character_set_client     | utf8mb4                                       |
| character_set_connection | utf8mb4                                       |
| character_set_database   | utf8mb4                                       |
| character_set_filesystem | binary                                        |
| character_set_results    | utf8mb4                                       |
| character_set_server     | utf8mb4                                       |
| character_set_system     | utf8                                          |
| character_sets_dir       | /mysql-lin/mysql/share/charsets/              |
+--------------------------+-----------------------------------------------+

字符集编码参数说明和作用范围

变量 影响范围 默认编码 说明
character_set_client Global, Session utf8mb4 客户端请求的数据使用的字符集
character_set_connection Global, Session utf8mb4 连接传输过程中使用的字符集
character_set_database Global, Session utf8mb4 数据库的默认编码格式
character_set_filesystem Global, Session binary 访问文件系统上的文件名时使用的编码格式
character_set_results Global, Session utf8mb4 给客户端返回的数据的编码格式
character_set_server Global, Session latin1 Mysql Server的默认编码格式
character_set_system Global utf8 数据库服务器存储使用的编码格式,不可动态变更
character_sets_dir Global dirname 字符集安装的目录,不可动态变更

影响范围Global表示是全局配置,Session表示会影响会话连接传输的数据。 character_set_systemcharacter_sets_dir是MySQL Server自身使用的编码格式,不会影响客户端会话。

在连接参数中使用--default-character-set=utf8或命令行执行set names utf8,会同时修改当前会话的character_set_clientcharacter_set_connectioncharacter_set_results三个字符集配置。

设置字符集

创建数据库时指定字符集

1
CREATE DATABASE `testdb` CHARACTER SET utf8 COLLATE utf8_general_ci;

修改数据库字符集

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
# 设置数据库字符集编码
ALTER DATABASE `testdb` CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

# 示例
mysql> show create database testdb;
+----------+-----------------------------------------------------------------+
| Database | Create Database                                                 |
+----------+-----------------------------------------------------------------+
| testdb   | CREATE DATABASE `testdb` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+-----------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> alter database `testdb` CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Query OK, 1 row affected (0.01 sec)

mysql> show create database testdb;
+----------+--------------------------------------------------------------------+
| Database | Create Database                                                    |
+----------+--------------------------------------------------------------------+
| testdb   | CREATE DATABASE `testdb` /*!40100 DEFAULT CHARACTER SET utf8mb4 */ |
+----------+--------------------------------------------------------------------+
1 row in set (0.00 sec)

创建表指定表及字段字符集

1
CREATE TABLE tb_name( col_name VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_general_ci) CHARACTER SET utf8 COLLATE utf8_general_ci;

修改表及字段字符集

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
# 修改表字符集
alter table `tb_name` default character set utf8mb4;

# 修改某字段字符集
alter table `tb_name` modify col_name varchar(20) character set utf8mb4;

# 同时修改表及字段字符集
alter table `tb_name` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

# 如果某个库里面表比较多 可以拼接出要执行的批量修改语句
SELECT
 CONCAT( 'ALTER TABLE ', TABLE_NAME, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;' ) 
FROM
 information_schema.`TABLES` 
WHERE
 TABLE_SCHEMA = 'testdb';

character set与collation的关系

MySQL5.7中存在41 charset和222种 collation

character set即字符集

collation即比对方法,相同字符集的数据的排序规则

每个character set会对应一定数量的collation。

collation名字的规则分两类:

<character set>_<language/other>_<ci/cs>
<character set>_bin

ci是case insensitive的缩写,cs是case sensitive的缩写,指大小写是否敏感。

查看当前支持的字符集

1
2
3
4
5
6
7
8
9
mysql> show character set;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |

查看utf8字符集的collation排序规则列表

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
mysql> show collation like 'utf8%'
+--------------------------+---------+-----+---------+----------+---------+
| Collation                | Charset | Id  | Default | Compiled | Sortlen |
+--------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci          | utf8    |  33 | Yes     | Yes      |       1 |
| utf8_bin                 | utf8    |  83 |         | Yes      |       1 |
| utf8_unicode_ci          | utf8    | 192 |         | Yes      |       8 |
| utf8_icelandic_ci        | utf8    | 193 |         | Yes      |       8 |
| utf8_latvian_ci          | utf8    | 194 |         | Yes      |       8 |
| utf8_romanian_ci         | utf8    | 195 |         | Yes      |       8 |
| utf8_slovenian_ci        | utf8    | 196 |         | Yes      |       8 |
| utf8_polish_ci           | utf8    | 197 |         | Yes      |       8 |
| utf8_estonian_ci         | utf8    | 198 |         | Yes      |       8 |
| utf8_spanish_ci          | utf8    | 199 |         | Yes      |       8 |
| utf8_swedish_ci          | utf8    | 200 |         | Yes      |       8 |
| utf8_turkish_ci          | utf8    | 201 |         | Yes      |       8 |
| utf8_czech_ci            | utf8    | 202 |         | Yes      |       8 |
| utf8_danish_ci           | utf8    | 203 |         | Yes      |       8 |
| utf8_lithuanian_ci       | utf8    | 204 |         | Yes      |       8 |
| utf8_slovak_ci           | utf8    | 205 |         | Yes      |       8 |
| utf8_spanish2_ci         | utf8    | 206 |         | Yes      |       8 |
| utf8_roman_ci            | utf8    | 207 |         | Yes      |       8 |
| utf8_persian_ci          | utf8    | 208 |         | Yes      |       8 |
| utf8_esperanto_ci        | utf8    | 209 |         | Yes      |       8 |
| utf8_hungarian_ci        | utf8    | 210 |         | Yes      |       8 |
| utf8_sinhala_ci          | utf8    | 211 |         | Yes      |       8 |
| utf8_german2_ci          | utf8    | 212 |         | Yes      |       8 |
| utf8_croatian_ci         | utf8    | 213 |         | Yes      |       8 |
| utf8_unicode_520_ci      | utf8    | 214 |         | Yes      |       8 |
| utf8_vietnamese_ci       | utf8    | 215 |         | Yes      |       8 |
| utf8_general_mysql500_ci | utf8    | 223 |         | Yes      |       1 |

utf8编码的默认排序规则为utf8_general_ci

参考