MYSQL中count(*)、count(1)、count(col)之间的差异学习心得分享

文章目录[隐藏]

MYISAM引擎表
INNODB引擎表

在昨天的一篇闲聊《说说心里话》中，提到了面试中经常会被面试官问到的一个知识点：MYSQL中count(*)、count(1)、count(col)之间的差异；

如何才能够完美给面试官呈现出一个比较好的答案呢？下面我们就来详细的进行介绍；网上关于count(*)、count(1)、count(col)之间的差异的文章非常多，但是看完网上的回答，你会更加的迷惑，至少我是这样的，所以，我需要通过实验去进行验证；

在SQL查询中一个比较常见查询应当是COUNT操作。对于带WHERE子句的查询没太多可说的，有索引就用索引，没有索引——嘿嘿，累坏机器啊！！！而不带WHERE子句简单COUNT查询的，比如count(*)、count(1)、count(col)之间又有什么区别呢？

下面我们就通过一组实验来验证一下，这三者之间到底有什么区别？

前提说明：

1、实验环境：MYSQL 5.7.19

root@localhost [wjq]>select @@version;
+------------+
| @@version  |
+------------+
| 5.7.19-log |
+------------+
1 row in set (0.00 sec)

2、为了验证是否在不同存储引擎之间是否也存在区别，下面的一组实验分别在MYISAM和INNODB存储引擎表做的测试；

MYISAM引擎表

1、创建一张MYISAM测试表（无主键，无索引）

root@localhost [wjq]>CREATE TABLE `wjq_myisam_count1` (
    -> `id` int,
    -> `c1` varchar(10) DEFAULT NULL,
    -> `c2` varchar(10) DEFAULT NULL
    -> ) ENGINE=MYISAM;
Query OK, 0 rows affected (0.00 sec)

2、创建批量插入数据的存储过程

root@localhost [wjq]>DELIMITER //
root@localhost [wjq]>CREATE PROCEDURE count_insert (x int(10),y int(10),z int(10))
    -> BEGIN
    -> DECLARE i INT DEFAULT x;
    -> WHILE i DO
    -> insert into wjq_myisam_count1 values(i,rand_string(z),rand_string(z));
    -> SET i=i+1;
    -> END WHILE ;
    -> commit;
    -> END //
Query OK, 0 rows affected (0.01 sec)

3、使用存储过程批量插入数据

root@localhost [wjq]>call count_insert(1,100000,8);
Query OK, 0 rows affected (1 min 6.26 sec)

4、分别查看count(*)、count(1)、count(col)的执行计划

explain select count(*) from wjq_myisam_count1\G;
explain select count(1) from wjq_myisam_count1\G;

explain select count(id) from wjq_myisam_count1\G;

通过上面三个执行计划对比发现：

在MYISAM表中，当表中没有主键和索引的时候，count(*)和count(1)的结果都是一样的，Extra列输出了”Select tables optimized away”语句。很明显，MYISAM引擎表已经保存了记录的总数，直接返回结果；而count(col)还需要进行全表扫描。

5、为表中的id列增加主键

root@localhost [wjq]>alter table wjq_myisam_count1 add primary key(`id`);
Query OK, 100002 rows affected (0.15 sec)
Records: 100002  Duplicates: 0  Warnings: 0

6、增加完主键之后，继续查看执行计划

explain select count(*) from wjq_myisam_count1\G;
explain select count(1) from wjq_myisam_count1\G;
explain select count(id) from wjq_myisam_count1\G;

explain select count(c2) from wjq_myisam_count1\G;

通过上面四个执行计划对比发现：

在MYISAM表中，当表中只有主键的时候，count(*)和count(1)的效果都是一样的；但是count(col)出现了区别，也就是:

如果col是主键，那么count(col)和count(*)和count(1)的效果是一样的，直接返回结果；

如果col不是普通列，那么count(col)还是需要进行全表扫描。

7、下面为c1列增加一个普通索引

root@localhost [wjq]>alter table wjq_myisam_count1 add index idx_c1(`c1`);
Query OK, 100002 rows affected (0.27 sec)
Records: 100002  Duplicates: 0  Warnings: 0

8、继续查看相关的执行计划

explain select count(*) from wjq_myisam_count1\G;
explain select count(1) from wjq_myisam_count1\G;
explain select count(id) from wjq_myisam_count1\G;

explain select count(c1) from wjq_myisam_count1\G;

通过上面四个执行计划对比发现：

在MYISAM表中，当表中即有主键由于普通索引的时候，count(*)和count(1)、count(主键列)的效果都是一样的，直接返回结果；count(普通索引列)使用普通索引；如果col是普通列，那么count(col)还是需要进行全表扫描。

INNODB引擎表

1、创建一张INNODB引擎测试表（无主键，无索引）

root@localhost [wjq]>CREATE TABLE `wjq_innodb_count1` (
    -> `id` int,
    -> `c1` varchar(10) DEFAULT NULL,
    -> `c2` varchar(10) DEFAULT NULL
    -> ) ENGINE=INNODB;
Query OK, 0 rows affected (0.09 sec)

2、创建批量插入数据的存储过程

root@localhost [wjq]>DELIMITER //
root@localhost [wjq]>CREATE PROCEDURE count_insert_innodb (x int(10),y int(10),z int(10))
    -> BEGIN
    -> DECLARE i INT DEFAULT x;
    -> WHILE i DO
    -> insert into wjq_innodb_count1 values(i,rand_string(z),rand_string(z));
    -> SET i=i+1;
    -> END WHILE ;
    -> commit;
    -> END //
Query OK, 0 rows affected (0.00 sec)
root@localhost [wjq]>DELIMITER ;

3、调用存储过程进行批量插入数据

root@localhost [wjq]>call count_insert_innodb(1,100000,8);
Query OK, 0 rows affected (1 min 47.67 sec)

4、查询count的执行计划

explain select count(*) from wjq_innodb_count1\G;
explain select count(1) from wjq_innodb_count1\G;
explain select count(id) from wjq_innodb_count1\G;

通过上面三个执行计划对比发现：

在INNODB表中，当表中即没有主键也没有索引的时候，count(*)和count(1)、count(col)的效果都是一样的，都需要进行全表扫描，如果数据量大的时候，查询效果可想而知。

5、为INNODB表的id列增加主键

root@localhost [wjq]>alter table wjq_innodb_count1 add primary key(`id`);
Query OK, 0 rows affected (0.26 sec)
Records: 0  Duplicates: 0  Warnings: 0

6、接着继续查看count查看的执行计划

explain select count(*) from wjq_innodb_count1\G;
explain select count(1) from wjq_innodb_count1\G;
explain select count(id) from wjq_innodb_count1\G;

explain select count(c1) from wjq_innodb_count1\G;

通过上面四个执行计划对比发现：

在INNODB表中，当表中只有主键的时候，count(*)和count(1)、count(主键列)的效果都是一样的，都会走主键索引；count(普通列)如果col是普通列，那么count(col)还是需要进行全表扫描。

7、为INNODB表的c1列增加普通索引

root@localhost [wjq]>alter table wjq_innodb_count1 add index idx_c1(`c1`);
Query OK, 0 rows affected (0.25 sec)
Records: 0  Duplicates: 0  Warnings: 0

8、查看count的执行计划

explain select count(*) from wjq_innodb_count1\G;
explain select count(1) from wjq_innodb_count1\G;
explain select count(id) from wjq_innodb_count1\G;
explain select count(c1) from wjq_innodb_count1\G;

通过上面四个执行计划对比发现：

在INNODB表中，当表中即有主键又有普通索引的时候，count(*)和count(1)、count(主键列)、count(普通索引列)的效果都是一样的，都会走普通索引；如果col是普通列，那么count(col)还是需要进行全表扫描。

到这里我就有一个很大的疑惑，为什么既有主键又有普通索引的情况下， count(*)、 count(1)、 count(主键列)都走普通索引，而不走主键索引呢？详细请参考文末官方文档的说明；

9、删除主键索引，只保留普通索引

root@localhost [wjq]>alter table wjq_innodb_count1 drop primary key;
Query OK, 100001 rows affected (0.82 sec)
Records: 100001  Duplicates: 0  Warnings: 0

10、在只有普通索引的情况下，查看count操作的执行计划

explain select count(*) from wjq_innodb_count1\G;
explain select count(1) from wjq_innodb_count1\G;
explain select count(c1) from wjq_innodb_count1\G;

explain select count(id) from wjq_innodb_count1\G;

通过上面四个执行计划对比发现：

在INNODB表中，当表中只有普通索引的时候，count(*)和count(1)会走普通索引，和count(普通索引列)的效果都是一样的，都会走普通索引；但是如果col是普通列，那么count(col)还是需要进行全表扫描。

如下是MYSQL官方文档中相关的描述：https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count

COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.
For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
Prior to MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index. As of MySQL 5.7.18, InnoDB processes SELECT COUNT(*)statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.
Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.
InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.
For MyISAM tables, COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause.

This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.

COUNT(*)它返回检索的行数的计数，无论它们是否包含NULL值。

对于事务性存储引擎，如InnoDB，存储精确的行数是有问题的。多个事务可能同时发生，每个事务都可能影响计数。InnoDB不保留表中的内部行数，因为并发事务可能同时“看到”不同数量的行。因此，SELECT COUNT(*)语句只计算当前事务可见的行。

在MySQL 5.7.18之前，通过扫描聚簇索引来InnoDB处理 SELECT COUNT(*)语句。从MySQL 5.7.18开始，通过遍历最小的可用二级索引来InnoDB处理SELECT COUNT(*)语句，除非索引或优化器提示指示优化器使用不同的索引。如果不存在辅助索引，则扫描聚簇索引。

SELECT COUNT(*)如果索引记录不完全在缓冲池中，则处理语句需要一些时间。为了更快地计算，请创建一个计数器表，让应用程序根据插入和删除更新它。但是，在数千个并发事务正在启动对同一计数器表的更新的情况下，此方法可能无法很好地扩展。如果要求不是很精确的统计，可以考虑使用 SHOW TABLE STATUS。

InnoDB以相同的方式处理SELECT COUNT(*)和SELECT COUNT(1)操作。没有性能差异。

对于MyISAM表，COUNT(*)如果SELECT从一个表检索，没有检索到其他列，并且没有WHERE子句，则优化为非常快速地返回。

此优化仅适用于MyISAM表，因为为此存储引擎存储了精确的行数，并且可以非常快速地访问。COUNT(1)如果第一列定义为，则仅受相同优化的影响NOT NULL。

相关建议

如果你的COUNT查询只打算在MyISAM引擎上执行，直接使用上面的语句没有性能问题，无需特殊的优化。而如果在InnoDB上执行，或者不确定以后会不会在InnoDB上执行的话，需要考虑一下其中的性能问题。

如果不是要求很精确的统计，安装官方文档的建议是可以使用如下的语句：SHOW TABLE STATUS LIKE ‘tablename’;

对于MyISAM引擎，该查询返回的行数是准确值；对于InnoDB，该查询返回的行数是一个近似值（事实上，对于InnoDB，该查询返回的各字段值绝大多数是近似值）。

当然，由于”SHOW TABLE STATUS”语句是MySQL特有的语句，而不是标准SQL语句。出于某些考量，这个方案无法接受，那么为了性能另一个建议是建立一个计数表，存放各种COUNT计数。