Python对FTP上的文件夹进行遍历

2023-05-31 | 阅读：次

遍历FTP服务器上的文件，获取文件列表，通过这个文件列表，我们可以对其进行筛选、编辑，从而可以快速地获取文件的下载链接，通过其它比较成熟的软件批量下载这些文件，从而节省时间。

以下的例子是遍历Pubmed Central 的文件列表。

# -*- coding: utf-8 -*-
"""
Created on Wed May 10 17:35:48 2023

@author: Administrator
"""

import ftplib

def iterFTP(host="ftp.ncbi.nlm.nih.gov",username='',password='', 
            toiterDir = "/pub/pmc/oa_bulk/oa_comm/txt/",
            saveFile = 'data.txt'):
    # 连接FTP服务器
    ftp = ftplib.FTP(host, timeout=60)
    ftp.login(username, password) 
    
    # 切换到要遍历的目录
    ftp.cwd(toiterDir)
    
    # 获取目录中的文件列表
    file_list = ftp.nlst()
    
    # 存储文件列表
    wholeFiles = []
    
    # 递归地遍历子目录
    def walk(ftp, path="/"):
        print(path)
        try:
            ftp.cwd(path)
        except ftplib.error_perm as e:
            # 忽略无法进入的目录
            return

        files = ftp.nlst()
        for file in files:
            if "." in file:
                # 文件
                # print(path + file)
                wholeFiles.append(path + file)
            else:
                # 子目录
                walk(ftp, path + file + "/")
    
    # 开始遍历
    walk(ftp,toiterDir)
    
    # 关闭FTP连接
    ftp.quit()
    
    # 保存遍历的文件列表
    with open(saveFile, 'w', encoding='utf-8') as fh:
        for item in wholeFiles:
            fh.write("%s\n" % item)
    
    
if __name__ == "__main__":
    HOST = "ftp.ncbi.nlm.nih.gov"
    iterFTP(host=HOST,username='',password='', 
                toiterDir = "/pub/pmc/oa_pdf/",
                saveFile = 'PMC_oa_pdf_filesList.txt')

Xiaowei

远志

Python对FTP上的文件夹进行遍历