Implementing WordCount in Python

GitHub project URL:

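I. WC Project Requirements

wc.exe is a common tool that counts the characters, words, and lines in a text file. This project asks for a command-line program that imitates the existing wc.exe, extends it, and reports the character, word, and line counts of a source file written in some programming language.

Implement a counting program that correctly counts the characters, words, and lines in a program file, offers additional extended features, and can process multiple files quickly.

Specific functional requirements:

The program handles user requests in the following form:

wc.exe [parameter] [file_name]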

Basic features:

wc.exe -c file.c // returns the number of characters in file.c

wc.exe -w file.c // returns the number of words in file.c

wc.exe -l file.c // returns the number of lines in file.c

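Extended features:

-s Recursively process the files in a directory that match the given conditions.

-a Return more detailed data (code lines / blank lines / comment lines).

Blank line: the line consists entirely of spaces or formatting control characters; if it contains code, it has at most one displayable character, for example "{".

Code line: the line contains more than one character of code.

Comment line: the line is not a code line and contains a comment. An interesting case is that some programmers put a comment after a single character:

} //comment

In this case the line counts as a comment line.

[file_name]: a file or directory name; common wildcards are supported.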

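Advanced features:

-x This parameter is used on its own. If it appears on the command line, the program shows a graphical interface in which the user can pick a single file, and the program then displays all of that file's statistics: character count, line count, and so on.

Example requirement:

wc.exe -s -a *.c

Returns the number of code lines, blank lines, and comment lines for all *.c files in the current directory and its subdirectories.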
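The wildcard and -s requirements can be illustrated with a small sketch. It is not taken from the project; the function name find_files and its parameters are hypothetical, and it assumes that shell-style patterns such as *.c are matched against file names only, using the standard fnmatch and os.walk.

```python
import fnmatch
import os


def find_files(top, pattern, recursive=False):
    """Return the sorted paths under `top` whose file name matches `pattern`.

    `pattern` is a shell-style wildcard such as "*.c" (hypothetical helper,
    not part of the original project).
    """
    matches = []
    # os.walk yields `top` itself first, then its subdirectories.
    for root, dirs, files in os.walk(top):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(root, name))
        if not recursive:
            break  # without -s, only the top-level directory is inspected
    return sorted(matches)
```

Under these assumptions, find_files(".", "*.c", recursive=True) would collect the files that the example command wc.exe -s -a *.c is meant to process.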

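II. Approach

Programming language: Python.

Tools and libraries: PyCharm, os, re, tkinter.

New things learned: reading files, writing a simple UI, and using Git Bash.

Approach: judging from the assignment, the main problems are selecting a feature through a command, locating and reading files, extracting information from text, and designing a visual interface. I completed and refined them one feature at a time.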

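III. Design and Implementation Process

Following this approach, I completed command recognition first, then the basic functions, then the extended functions, and finally the GUI.

Command recognition uses the crudest possible method, string slicing: a command is sliced into three parts, for example -c, the file name, and .txt, which are the command keyword, the file name, and the file extension respectively. Next came the functions. Each takes four parameters: the file path path, the file name target, the file extension file_extension, and the mode model. The model parameter exists to recognize the -s parameter. The GUI is built with Python's tkinter. (I originally intended the GUI to call the functions in the function file, but because they are simple I rewrote them directly in the GUI file, which is admittedly a shortcut.)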

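IV. Key Code and Design Notes

1. Feature selection

Known issue: my knowledge of the language is limited, so I could only think of one simple approach.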

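import os
import wf  # the author's function module; the module name is assumed from the wf. prefix

print('Enter the path to operate on:')
path = input()
print('Enter the command:')
str1 = input()
# Position of the last dot, so the slice matches os.path.splitext
point = str1.rfind('.')
# File extension, e.g. ".c"
file_extension = str1[point:]
if str1[0:2] == '-s':
    # File name: everything after "-s -c "
    target = str1[6:]
    # Command keyword
    order = str1[3:5]
    model = '-s'
else:
    # File name: everything after "-c "
    target = str1[3:]
    # Command keyword
    order = str1[0:2]
    model = 'normal'
if order == '-c':
    wf.c(path, target, file_extension, model)
elif order == '-w':
    wf.w(path, target, file_extension, model)
elif order == '-l':
    wf.l(path, target, file_extension, model)
elif order == '-a':
    wf.a(path, target, file_extension, model)
elif order == '-x':
    # Launch the GUI script; this relies on .py files being associated with Python
    os.system("GUI.py")
else:
    print('Invalid input')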
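As a possible alternative to fixed-position slicing, the same command shape could be parsed with the standard argparse module. This is only a sketch under assumptions: the names build_parser and parse_command are hypothetical, and the flags simply mirror the ones listed in the requirements.

```python
import argparse


def build_parser():
    """Build a parser for `wc.exe [parameter] [file_name]` (illustrative only)."""
    parser = argparse.ArgumentParser(prog="wc.exe")
    # Exactly one of the counting modes (or -x) must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    for flag, text in (
        ("-c", "character count"),
        ("-w", "word count"),
        ("-l", "line count"),
        ("-a", "code / blank / comment line counts"),
        ("-x", "open the graphical interface"),
    ):
        mode.add_argument(flag, dest="order", action="store_const",
                          const=flag, help=text)
    parser.add_argument("-s", dest="recursive", action="store_true",
                        help="process matching files in subdirectories too")
    parser.add_argument("file_name", nargs="?",
                        help="file or directory name, wildcards allowed")
    return parser


def parse_command(argv):
    """Return (order, recursive, file_name) for a list of argument strings."""
    args = build_parser().parse_args(argv)
    if args.order != "-x" and args.file_name is None:
        raise ValueError("file_name is required unless -x is given")
    return args.order, args.recursive, args.file_name
```

With this sketch, parse_command(["-s", "-a", "*.c"]) yields the keyword, the recursion flag, and the file pattern without depending on character positions, so extra spaces or a different flag order do not break the parsing.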

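2. Locating the requested file

import os

# Collect every file under path that has the given extension, together with its directory
def file_name(path, extension):
    l1 = []
    l2 = []
    for root, dirs, files in os.walk(path):
        for file in files:
            # Keep files with the given extension; store the file name and its directory in parallel lists
            if os.path.splitext(file)[1] == extension:
                l1.append(file)
                l2.append(root)
    return l1, l2

# Find the file the user selected and return its path
def find_target(path, target, extension, model=''):
    filename, root = file_name(path, extension)
    for count, name in enumerate(filename):
        # Look for the requested file
        if target == name:
            # Return the file's path (absolute only if path itself is absolute)
            return os.path.join(root[count], name)
    # No file with that name was found
    return None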

3. Basic features (including the extended feature -s)

Known issue: the bodies of the three functions are nearly identical and could be merged into a single function to save resources. I did not think of this beforehand and should have considered the design more carefully from the start.

import os
import re

# Function c
def c(path, target, file_extension, model):
    # -s mode
    if model == '-s':
        # Get the matching file names and their directories
        filename, root = file_name(path, file_extension)
        # Process every file in the list
        for i in range(len(filename)):
            file_path = os.path.join(root[i], filename[i])
            with open(file_path, encoding="UTF-8") as file:
                list2 = file.read()
            # Newlines are not counted as characters
            print(filename[i], ':Char number->', len(list2.replace('\n', '')))
    # Normal mode
    else:
        # Get the path of the requested file
        file_path = find_target(path, target, file_extension)
        with open(file_path, encoding="UTF-8") as file:
            list1 = file.read()
        print('Char number->', len(list1.replace('\n', '')))

# Function w
def w(path, target, file_extension, model):
    # A word is a run of ASCII letters; findall avoids the empty strings
    # that re.split leaves at the start or end of the text
    # -s mode
    if model == '-s':
        # Get the matching file names and their directories
        filename, root = file_name(path, file_extension)
        # Process every file in the list
        for i in range(len(filename)):
            file_path = os.path.join(root[i], filename[i])
            with open(file_path, encoding="UTF-8") as file:
                print(filename[i], ':word number->', len(re.findall(r'[a-zA-Z]+', file.read())))
    # Normal mode
    else:
        # Get the path of the requested file
        file_path = find_target(path, target, file_extension)
        with open(file_path, encoding="UTF-8") as file:
            print('word number->', len(re.findall(r'[a-zA-Z]+', file.read())))

# Function l
def l(path, target, file_extension, model):
    # -s mode
    if model == '-s':
        filename, root = file_name(path, file_extension)
        for i in range(len(filename)):
            file_path = os.path.join(root[i], filename[i])
            with open(file_path, encoding="UTF-8") as file:
                print(filename[i], ':line number->', len(file.readlines()))
    # Normal mode
    else:
        file_path = find_target(path, target, file_extension)
        with open(file_path, encoding="UTF-8") as file:
            print('line number->', len(file.readlines()))
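The merge mentioned in the known issue could look roughly like the sketch below. It is not the author's code: count_text and count_file are hypothetical names, and the counting conventions (newlines excluded from the character count, a word being a run of ASCII letters) are assumptions read off the functions above.

```python
import re


def count_text(text):
    """Return (chars, words, lines) for a string, in one pass over the data."""
    chars = len(text.replace('\n', ''))          # newlines are not counted
    words = len(re.findall(r'[a-zA-Z]+', text))  # runs of ASCII letters
    # Count newline-terminated lines, plus a final line without a newline.
    lines = text.count('\n') + (1 if text and not text.endswith('\n') else 0)
    return chars, words, lines


def count_file(file_path, encoding="UTF-8"):
    """Read a file once and return its (chars, words, lines) tuple."""
    with open(file_path, encoding=encoding) as file:
        return count_text(file.read())
```

Each command would then only pick one element of the tuple to print, and the -s loop would call count_file once per file instead of repeating the open-and-read logic three times.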

4. Implementing the extended feature -a

# Function a
def a(path, target, file_extension, model):
    code_line = 0
    blank_line = 0
    comment_line = 0
    if model == '-s':
        filename, root = file_name(path, file_extension)
        for i in range(len(filename)):
            # Reset the counters before the next file
            code_line = 0
            blank_line = 0
            comment_line = 0
            file_path = os.path.join(root[i], filename[i])
            with open(file_path, encoding="UTF-8") as file:
                for line in file.readlines():
                    # Strip surrounding whitespace
                    line = line.strip()
                    # Blank lines
                    if not len(line):
                        blank_line += 1
                    # Comment lines
                    elif line.startswith('#'):
                        comment_line += 1
                    elif line.startswith('//'):
                        comment_line += 1
                    # Code lines
                    elif len(line) > 1:
                        code_line += 1
                    # A single visible character such as "{" is a blank line per the requirements
                    else:
                        blank_line += 1
            print(filename[i], 'blank_line->', blank_line, ', comment_line->', comment_line,
                  ', code_line->', code_line)
    else:
        file_path = find_target(path, target, file_extension)
        with open(file_path, encoding="UTF-8") as file:
            for line in file.readlines():
                # Strip surrounding whitespace
                line = line.strip()
                # Blank lines
                if not len(line):
                    blank_line += 1
                # Comment lines
                elif line.startswith('#'):
                    comment_line += 1
                elif line.startswith('//'):
                    comment_line += 1
                # Code lines
                elif len(line) > 1:
                    code_line += 1
                # A single visible character such as "{" is a blank line per the requirements
                else:
                    blank_line += 1
        print('blank_line->', blank_line, ', comment_line->', comment_line,
              ', code_line->', code_line)
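The requirements in section I also count a line such as "} //comment" as a comment line, which a prefix check with startswith does not catch. A minimal sketch of a classifier that follows those definitions more closely is shown below. The names classify_line and count_categories are hypothetical, and the sketch assumes single-line comment markers only: block comments and markers inside string literals are not handled, and "#" is treated as a comment marker only because the function above does so.

```python
from collections import Counter


def classify_line(line, markers=("//", "#")):
    """Classify one source line as 'blank', 'code' or 'comment'."""
    stripped = line.strip()
    # Locate the earliest comment marker on the line, if there is one.
    positions = [stripped.find(m) for m in markers if m in stripped]
    if positions:
        code_part = stripped[:min(positions)].strip()
        has_comment = True
    else:
        code_part = stripped
        has_comment = False
    if len(code_part) > 1:
        return "code"     # more than one character of code
    if has_comment:
        return "comment"  # not a code line, but it carries a comment
    return "blank"        # empty, or a single visible character such as "{"


def count_categories(lines):
    """Tally the categories for an iterable of lines."""
    return Counter(classify_line(line) for line in lines)
```

Here "} //comment" is classified as a comment line because only one character of code precedes the marker, while "int x = 0; // init" remains a code line.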

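5. Implementing the GUI

Known issue: this was my first time writing a UI and I did not study it in depth. From what classmates say, tkinter is a rather awkward library to use; I will learn more GUI tools later.

import re
from tkinter import Tk, Button, Entry, Label, Listbox, filedialog

# Read the command from the entry box and add the file-selection button
def enterorder():
    order = en.get()
    # command needs a callable; command=choose_file(order) would open the dialog immediately
    btn = Button(tiw, text="choose your file", command=lambda: choose_file(order))
    btn.pack()

# Choose a file
def choose_file(order):
    file_path = filedialog.askopenfilename()
    f(order, file_path)

# Basic operations
def f(order, file_path):
    out_put = Listbox(tiw)
    with open(file_path, 'r', encoding="UTF-8") as file:
        text = file.read()
    if order == '-c':
        str_out = 'Char number-> %d' % len(text.replace('\n', ''))
    elif order == '-w':
        str_out = 'word number-> %d' % len(re.findall(r'[a-zA-Z]+', text))
    elif order == '-l':
        str_out = 'line number-> %d' % len(text.splitlines())
    else:
        # Avoid an undefined str_out for unknown commands
        str_out = 'Invalid input'
    out_put.insert(0, str_out)
    out_put.pack()

tiw = Tk()
tiw.title("wc.exe")
en = Entry(tiw, show=None)
en.pack()
Button(tiw, text="Enter command", command=enterorder).pack()
Label(tiw).pack()
tiw.mainloop()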
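One way to keep the GUI from duplicating the counting logic is to put the counting in a plain function and let the widgets only call it. The sketch below is an assumption-laden illustration, not the author's design: describe and main are hypothetical names, the layout is reduced to one entry, one button, and one list box, and tkinter is imported inside main so that describe can be used and tested without a display.

```python
import re


def describe(order, text):
    """Return the result string for one command applied to a file's text."""
    if order == "-c":
        return "Char number-> %d" % len(text.replace("\n", ""))
    if order == "-w":
        return "word number-> %d" % len(re.findall(r"[a-zA-Z]+", text))
    if order == "-l":
        return "line number-> %d" % len(text.splitlines())
    return "Invalid input"


def main():
    """Build the window; call main() to start the GUI."""
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.title("wc.exe")
    entry = tk.Entry(root)
    entry.pack()
    output = tk.Listbox(root)
    output.pack()

    def choose_file():
        file_path = filedialog.askopenfilename()
        if not file_path:  # the dialog was cancelled
            return
        with open(file_path, encoding="UTF-8") as file:
            output.insert(tk.END, describe(entry.get(), file.read()))

    # The function object itself is passed, so it runs only on click.
    tk.Button(root, text="choose your file", command=choose_file).pack()
    root.mainloop()
```

Because the button receives choose_file without parentheses, the file dialog opens when the button is clicked rather than when the button is created.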

V. Test Run Screenshots

Test files

Path: D:\Project\wc\

Folder structure

#test
#------test.c
#-------test1
#-----------test1.c
#-------test2
#-----------test2.c
# test_funtion_a
# ------ typical.c
# -------1
# -----------typical1.c
# -------2
# -----------typical2.c

File contents

test.c:

hello,how are you？
uhh,not bad.
cool.

test1.c:

hey，nice to meet you.
me too.

test2.c:

hello,how old are you？
what？

1. Basic feature tests

[Screenshots not preserved: test1, test2, test3]

2. Extended feature tests
