Shell

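A single shell command does the job, but it produces wrong results for directories whose names contain a dot. The command could probably be improved to handle that, but I later found a way to do it in Python instead.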

find -type f -name "*.*" | cut -f3 -d'.' | sort | uniq -c -i  (searches all subfolders of the current folder recursively)

The command is broken down and explained below.

find

find option reference

-type c
File is of type c:
d directory
f regular file
-name pattern
Base of file name (the path with the leading directories removed) matches shell pattern pattern.

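Only files matter here, not folders, so -type f is used. (Note: Linux also has other file types such as symbolic links and block devices, while shortcuts on Windows are themselves regular files.)
Only files that have a suffix should pass the filter, so -name "*.*" is used. (Note: -name "*.*" also matches names such as .name and name. To strictly keep only names with at least one character on each side of a dot, -regex ".*/.+\..+" can be used instead; search for "find regular expressions" for details.)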
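The difference between the loose and the strict filter can be sketched in Python. This is only an illustration: `has_suffix` is a hypothetical helper, and it checks the base name alone, whereas find's -regex is matched against the whole path.

```python
import re

# Hypothetical helper for illustration: the base name must have at least
# one character on each side of a dot.
STRICT = re.compile(r".+\..+")


def has_suffix(basename: str) -> bool:
    """Return True only for names like 'a.txt', not '.name' or 'name.'."""
    return STRICT.fullmatch(basename) is not None


for name in ["1.jpg", ".name", "name.", "noext", "a.tar.gz"]:
    print(name, has_suffix(name))
```

Only 1.jpg and a.tar.gz pass this check; a plain "contains a dot" test would also accept .name and name.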

cut

cut option reference

-f, --fields=LIST
select only these fields; also print any line that contains no delimiter character, unless the -s option is specified
-d, --delimiter=DELIM
use DELIM instead of TAB for field delimiter

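-f3 selects the third field. With -d'.' a path printed by find, such as ./1.jpg, splits into an empty first field, /1 as the second, and jpg as the third. A dot in a directory name shifts the fields, which is why the command fails for such directories.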
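To see why dotted directory names break the one-liner, the field selection can be imitated in Python. This is a minimal sketch: `cut_field` is a hypothetical helper that mimics cut for a single line, not a full reimplementation.

```python
def cut_field(line: str, field: int, delim: str = ".") -> str:
    """Mimic `cut -f<field> -d<delim>` for one line (1-based field index)."""
    if delim not in line:
        # cut prints delimiter-free lines unchanged unless -s is given
        return line
    parts = line.split(delim)
    # cut prints an empty line when the requested field does not exist
    return parts[field - 1] if field <= len(parts) else ""


# Paths as find prints them when started from the current directory
print(cut_field("./dd/e/5.txt", 3))    # txt
print(cut_field("./ll.dir/5.rar", 3))  # dir/5  (the dotted-directory bug)
```

The second call returns part of the directory name together with the file name, so the suffix rar is never counted.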

sort and uniq

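The sort command is straightforward: it sorts the preceding output so that identical lines end up next to each other, which uniq relies on. Because uniq is run with -i here, lines that differ only in case must also be adjacent; if the locale's sort order separates them, sort -f takes care of that.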

uniq reference

uniq - report or omit repeated lines

-c, --count
prefix lines by the number of occurrences
-i, --ignore-case
ignore differences in case when comparing

uniq finds runs of consecutive duplicate lines
-c counts the occurrences
-i ignores case
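Since uniq only merges lines that are next to each other, the sort step matters. The behaviour can be approximated in Python with itertools.groupby, which has the same "adjacent only" rule. `uniq_c` is a hypothetical helper and only a rough analogue of uniq -c.

```python
from itertools import groupby


def uniq_c(lines, ignore_case=False):
    """Rough analogue of `uniq -c`; ignore_case=True stands in for `-i`."""
    key = str.lower if ignore_case else None
    result = []
    # groupby, like uniq, only merges *adjacent* equal items
    for _, group in groupby(lines, key=key):
        items = list(group)
        result.append((len(items), items[0]))
    return result


suffixes = ["jpg", "txt", "jpg", "JPG"]
print(uniq_c(suffixes))  # unsorted: every line is its own run
print(uniq_c(sorted(suffixes, key=str.lower), ignore_case=True))
```

Without sorting, the input yields four runs of length one; after a case-insensitive sort the three jpg entries collapse into a single run.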

Python

import argparse
import os
import re
import sys

parser = argparse.ArgumentParser(description="count suffix from dir")
# Script argument: the directory whose file suffixes should be counted
parser.add_argument('-d', '--directory', required=True, type=str, help='need a full path')
args = parser.parse_args()
directory = args.directory
# Result dict: suffix -> count
res = {}

# Exit if the path does not exist
if not os.path.exists(directory):
    print("dir does not exist")
    sys.exit(1)
# Exit if the path is not a directory
if not os.path.isdir(directory):
    print("need a dir not a file")
    sys.exit(1)

print("suffix : count")
# os.walk yields 3-tuples: the directory, the subdirectories in it,
# and the files in it, for example:
# ('.', ['dd', 'll.dir'], ['1.jpg'])
# ('./dd', ['e'], ['2.jpg'])
# ('./dd/e', [], ['3.jpg', '4.jpg', '5.txt'])
# ('./ll.dir', [], ['5.rar'])
for path, dirs, files in os.walk(directory):
    for file in files:
        file = file.lower()  # normalize everything to lower case
        # Skip names that contain no dot, i.e. have no suffix
        if not re.match(r'[\s\S]*\.[\s\S]*', file):
            continue
        # The suffix is the text after the last dot
        suffix = file.split(".")[-1]
        # Increment the count for this suffix, starting from 0 if unseen
        res[suffix] = res.get(suffix, 0) + 1

# Print the results
for key, count in res.items():
    print(key + " : " + str(count))

Save the code above as a .py file, then run python xxx.py -d <directory> from the command line.
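For comparison, the same counting can be written more compactly with the standard library's collections.Counter and os.path.splitext. This is an alternative sketch, not the script above: `count_suffixes` is a hypothetical name, and it behaves slightly differently because splitext treats names such as .name as having no extension, and names ending in a dot are skipped.

```python
import os
from collections import Counter


def count_suffixes(directory: str) -> Counter:
    """Count lower-cased file extensions below `directory`, recursively."""
    counts = Counter()
    for _path, _dirs, files in os.walk(directory):
        for name in files:
            # splitext only looks at the last dot of the base name;
            # [1:] drops that leading dot from the extension
            ext = os.path.splitext(name.lower())[1][1:]
            if ext:  # skip names without a usable extension
                counts[ext] += 1
    return counts


if __name__ == "__main__":
    # Example: count the suffixes below the current directory
    for suffix, n in count_suffixes(".").most_common():
        print(suffix + " : " + str(n))
```

Because only file names are inspected, a dot in a directory name such as ll.dir does not affect the result.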

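Summary

Topics covered in this post:

Shell
- find command: searching for directories/files
- cut command: splitting strings
- sort command: sorting
- uniq command: finding consecutive duplicate lines
Python
- dict: basic operations
- argparse library: script arguments
- os library: reading directories/files
- re library: regular expression matching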
