Mysql 字符集(Character) 介绍
字符集介绍
MySQL支持多种字符集,不同字符集有区别;广泛使用是utf8和utf8mb4。
utf8
是utf8mb3的别名,使用1~3byte表示一个Unicode字符;gbk占2byte,使用utf8表示latin1只需要1byte。
utf8mb4
使用1~4byte表示一个Unicode字符,兼容utf8,是utf8的超集,比utf8能存储更多的字符,比如emoji表情等特殊图形符号字符;有特殊字符存储需求建议使用utf8mb4。
MySQL 5.7 及之前版本默认字符集是latin1, MySQL 8.0 版本默认字符集是utf8mb4。
MySQL支持四种级别字符集比较规则,分别是:服务器级别character_set_server
、数据库级别character_set_database
、表级别、列级别。
对客户端连接有三个字符集系统变量,用于客户端编码和解码,分别是:character_set_client
,character_set_connection
,character_set_results
。
MySQL默认字符集配置
启动配置
1
2
3
4
5
6
7
8
9
10
|
$ cat /etc/my.cnf
[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
skip-character-set-client-handshake
#忽略应用连接自己设置的字符编码,保持与全局设置一致
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
|
查看当前配置
1
2
3
4
5
6
7
8
9
10
11
12
13
|
mysql> show variables like '%char%';
+--------------------------+-----------------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /mysql-lin/mysql/share/charsets/ |
+--------------------------+-----------------------------------------------+
|
字符集编码参数说明和作用范围
变量 |
影响范围 |
默认编码 |
说明 |
character_set_client |
Global, Session |
utf8mb4 |
客户端请求的数据使用的字符集 |
character_set_connection |
Global, Session |
utf8mb4 |
连接传输过程中使用的字符集 |
character_set_database |
Global, Session |
utf8mb4 |
数据库的默认编码格式 |
character_set_filesystem |
Global, Session |
binary |
访问文件系统上的文件名时使用的编码格式 |
character_set_results |
Global, Session |
utf8mb4 |
给客户端返回的数据的编码格式 |
character_set_server |
Global, Session |
latin1 |
Mysql Server的默认编码格式 |
character_set_system |
Global |
utf8 |
数据库服务器存储使用的编码格式,不可动态变更 |
character_sets_dir |
Global |
dirname |
字符集安装的目录,不可动态变更 |
影响范围Global表示是全局配置,Session表示会影响会话连接传输的数据。
character_set_system
和character_sets_dir
是MySQL Server自身使用的编码格式,不会影响客户端会话。
在连接参数中使用--default-character-set=utf8
或命令行执行set names utf8
,会同时修改当前会话的character_set_client
、character_set_connection
和character_set_results
三个字符集配置。
设置字符集
创建数据库时指定字符集
1
|
CREATE DATABASE `testdb` CHARACTER SET utf8 COLLATE utf8_general_ci;
|
修改数据库字符集
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
|
# 设置数据库字符集编码
ALTER DATABASE `testdb` CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
# 示例
mysql> show create database testdb;
+----------+-----------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------+
| testdb | CREATE DATABASE `testdb` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+-----------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> alter database `testdb` CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Query OK, 1 row affected (0.01 sec)
mysql> show create database testdb;
+----------+--------------------------------------------------------------------+
| Database | Create Database |
+----------+--------------------------------------------------------------------+
| testdb | CREATE DATABASE `testdb` /*!40100 DEFAULT CHARACTER SET utf8mb4 */ |
+----------+--------------------------------------------------------------------+
1 row in set (0.00 sec)
|
创建表指定表及字段字符集
1
|
CREATE TABLE tb_name( col_name VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_general_ci) CHARACTER SET utf8 COLLATE utf8_general_ci;
|
修改表及字段字符集
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
# 修改表字符集
alter table `tb_name` default character set utf8mb4;
# 修改某字段字符集
alter table `tb_name` modify col_name varchar(20) character set utf8mb4;
# 同时修改表及字段字符集
alter table `tb_name` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
# 如果某个库里面表比较多 可以拼接出要执行的批量修改语句
SELECT
CONCAT( 'ALTER TABLE ', TABLE_NAME, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;' )
FROM
information_schema.`TABLES`
WHERE
TABLE_SCHEMA = 'testdb';
|
character set与collation的关系
MySQL5.7中存在41 charset和222种 collation
character set即字符集
collation即比对方法,相同字符集的数据的排序规则
每个character set会对应一定数量的collation。
collation名字的规则分两类:
<character set>_<language/other>_<ci/cs>
<character set>_bin
ci是case insensitive的缩写,cs是case sensitive的缩写,指大小写是否敏感。
查看当前支持的字符集
1
2
3
4
5
6
7
8
9
|
mysql> show character set;
+----------+---------------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
|
查看utf8字符集的collation排序规则列表
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
|
mysql> show collation like 'utf8%'
+--------------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 |
| utf8_latvian_ci | utf8 | 194 | | Yes | 8 |
| utf8_romanian_ci | utf8 | 195 | | Yes | 8 |
| utf8_slovenian_ci | utf8 | 196 | | Yes | 8 |
| utf8_polish_ci | utf8 | 197 | | Yes | 8 |
| utf8_estonian_ci | utf8 | 198 | | Yes | 8 |
| utf8_spanish_ci | utf8 | 199 | | Yes | 8 |
| utf8_swedish_ci | utf8 | 200 | | Yes | 8 |
| utf8_turkish_ci | utf8 | 201 | | Yes | 8 |
| utf8_czech_ci | utf8 | 202 | | Yes | 8 |
| utf8_danish_ci | utf8 | 203 | | Yes | 8 |
| utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 |
| utf8_slovak_ci | utf8 | 205 | | Yes | 8 |
| utf8_spanish2_ci | utf8 | 206 | | Yes | 8 |
| utf8_roman_ci | utf8 | 207 | | Yes | 8 |
| utf8_persian_ci | utf8 | 208 | | Yes | 8 |
| utf8_esperanto_ci | utf8 | 209 | | Yes | 8 |
| utf8_hungarian_ci | utf8 | 210 | | Yes | 8 |
| utf8_sinhala_ci | utf8 | 211 | | Yes | 8 |
| utf8_german2_ci | utf8 | 212 | | Yes | 8 |
| utf8_croatian_ci | utf8 | 213 | | Yes | 8 |
| utf8_unicode_520_ci | utf8 | 214 | | Yes | 8 |
| utf8_vietnamese_ci | utf8 | 215 | | Yes | 8 |
| utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 |
|
utf8
编码的默认排序规则为utf8_general_ci
参考