[Python] How do I list all files of a directory?


Answers

I prefer to use the glob module, since it does pattern matching and expansion.

import glob
print(glob.glob("/home/adam/*.txt"))

This returns a list with the matching files:

['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]
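
If the files sit in nested folders, a recursive pattern may help. The following is a minimal sketch, assuming Python 3.5 or later (where glob.glob accepts recursive=True); the helper name list_txt_recursive is hypothetical and not part of the original answer.

```python
import glob
import os


def list_txt_recursive(root):
    """Return all .txt paths under root, including subdirectories."""
    # '**' matches zero or more directory levels when recursive=True.
    pattern = os.path.join(root, "**", "*.txt")
    return sorted(glob.glob(pattern, recursive=True))
```

Calling list_txt_recursive("/home/adam") would then also pick up .txt files in folders below /home/adam.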
Question

How can I list all files of a directory in Python and add them to a list?




Returning a list of absolute filepaths, does not recurse into subdirectories

import os
L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]



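Getting a list of files

I also made a short video here: Video

os.listdir(): get the files in the current directory (Python 3)

This is the simplest way to get the files in the current directory in Python 3. Use the os module and its listdir() function, and you get the files in that directory (along with any folders it contains, but not the files inside subdirectories; for those you can use walk, which I will cover later).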

>>> import os
>>> arr = os.listdir()
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

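Getting the full path name

As you noticed, the code above does not give you the full path of the files. If you need the absolute path, you can use another function of the os.path module called _getfullpathname, passing it the file you get from os.listdir() as an argument. There are other ways to get the full path, as we will see later (I replaced _getfullpathname with abspath, as suggested by mexmex).

>>> import os
>>> files_path = [os.path.abspath(x) for x in os.listdir()]
>>> files_path
['F:\\documenti\\applications.txt', 'F:\\documenti\\collections.txt']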
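
Note that os.path.abspath resolves a bare file name against the current working directory, so the comprehension above is only right when the listed directory is the current one. A minimal sketch for an arbitrary directory, joining before resolving (the function name absolute_paths is an illustrative assumption):

```python
import os


def absolute_paths(directory="."):
    """Return absolute paths of every entry directly inside directory."""
    # Join with the directory first; abspath alone would resolve each
    # bare name against the current working directory instead.
    return [os.path.abspath(os.path.join(directory, name))
            for name in os.listdir(directory)]
```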

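Get the full path name of files of a given type in all subdirectories with walk

I find this very useful for finding things across many directories; it helped me find a file whose name I could not remember: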

import os

thisdir = os.getcwd()
for r, d, f in os.walk(thisdir):
    for file in f:
        if ".docx" in file:
            print(os.path.join(r, file))
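
The test ".docx" in file also matches names that merely contain that text, such as report.docx.bak. Below is a minimal sketch of a stricter variant that compares the actual extension, ignoring case; find_by_extension is a hypothetical helper name, not something from the answer.

```python
import os


def find_by_extension(root, extension):
    """Yield full paths under root whose extension equals `extension`."""
    wanted = extension.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # splitext returns ('report', '.docx') for 'report.docx'.
            if os.path.splitext(name)[1].lower() == wanted:
                yield os.path.join(dirpath, name)
```

For example, list(find_by_extension(os.getcwd(), ".docx")) collects the matches into a list.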

os.listdir(): get the files in the current directory (Python 2)

>>> import os
>>> arr = os.listdir('.')
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Going up in the directory tree

>>> # method 1
>>> x = os.listdir('..')

>>> # method 2
>>> x = os.listdir('/')

Get files: os.listdir() in a particular directory (Python 2 and 3)

>>> import os
>>> arr = os.listdir('F:\\python')
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Get the files of a particular subdirectory with os.listdir()

import os

x = os.listdir("./content")

os.walk('.') - current directory

>>> import os
>>> arr = next(os.walk('.'))[2]
>>> arr
['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']

glob module - all files

import glob
print(glob.glob("*"))

out:['content', 'start.py']

next(os.walk('.')) and os.path.join('dir', 'file')

>>> import os
>>> arr = []
>>> r, d, f = next(os.walk("F:\\_python"))  # first tuple only: the top directory
>>> for file in f:
...     arr.append(os.path.join(r, file))
...
>>> for f in arr:
...     print(f)
...

>output

F:\_python\dict_class.py
F:\_python\programmi.txt

next(os.walk('F:\\')) - get the full path - list comprehension

>>> [os.path.join(r,file) for r,d,f in [next(os.walk("F:\\_python"))] for file in f]
['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']

os.walk - get the full path - all files in the subdirectories

x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]

>>>x
['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']

os.listdir() - get only txt files

>>> arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
>>> print(arr_txt)
['work.txt', '3ebooks.txt']

glob - get only txt files

>>> import glob
>>> x = glob.glob("*.txt")
>>> x
['ale.txt', 'alunni2015.txt', 'assenze.text.txt', 'text2.txt', 'untitled.txt']

Using glob to get the full path of the files

If I need the absolute path of the files:

>>> import os
>>> from glob import glob
>>> x = [os.path.abspath(f) for f in glob("F:\\*.txt")]
>>> for f in x:
...  print(f)
...
F:\acquistionline.txt
F:\acquisti_2018.txt
F:\bootstrap_jquery_ecc.txt

Other uses of glob

If I want all the files in the directory:

>>> x = glob.glob("*")

Using os.path.isfile to avoid directories in the list

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

> output

['a simple game.py', 'data.txt', 'decorator.py']

Using pathlib (from Python 3.4)

>>> import pathlib

>>> flist = []
>>> for p in pathlib.Path('.').iterdir():
...  if p.is_file():
...   print(p)
...   flist.append(p)
...
error.PNG
exemaker.bat
guiprova.mp3
setup.py
speak_gui2.py
thumb.PNG

If you want to use a list comprehension

>>> flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]
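
pathlib can also descend into subdirectories. The following is a minimal sketch, assuming Python 3.4 or later; files_recursive is a hypothetical name chosen for illustration.

```python
from pathlib import Path


def files_recursive(root="."):
    """Return every regular file below root as a sorted list of Path objects."""
    # rglob('*') yields files and directories at any depth; keep only files.
    return sorted(p for p in Path(root).rglob("*") if p.is_file())
```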

Get all files, and only files, with os.walk

import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)

>>> y
['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']
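
The nested loops above can be collapsed with itertools.chain. This is only a sketch of the same idea under a hypothetical function name (all_file_names); as in the output above, a name that exists in several folders appears more than once.

```python
import os
from itertools import chain


def all_file_names(root="."):
    """Return the bare file names found anywhere under root."""
    # os.walk yields (dirpath, dirnames, filenames); chain the third items.
    return list(chain.from_iterable(files for _, _, files in os.walk(root)))
```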

Get only the files with next and walk in a directory

>>> import os
>>> x = next(os.walk('F://python'))[2]
>>> x
['calculator.bat','calculator.py']

Get only the directories with next and walk in a directory

>>> import os
>>> next(os.walk('F://python'))[1] # for the current dir use ('.')
['python3','others']

Get all the subdirectory names with walk

>>> for r,d,f in os.walk("F:\\_python"):
...  for dirs in d:
...   print(dirs)
...
.vscode
pyexcel
pyschool.py
subtitles
_metaprogramming
.ipynb_checkpoints

os.scandir() from Python 3.5 onward

>>> import os
>>> x = [f.name for f in os.scandir() if f.is_file()]
>>> x
['calculator.bat','calculator.py']

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir. 
# In this case, it shows the files only in the current directory 
# where the script is executed.

>>> import os
>>> with os.scandir() as i:
...  for entry in i:
...   if entry.is_file():
...    print(entry.name)
...
ebookmaker.py
error.PNG
exemaker.bat
guiprova.mp3
setup.py
speakgui4.py
speak_gui2.py
speak_gui3.py
thumb.PNG
>>>

Ex. 1: How many files are there in the subdirectories?

In this example, we count the files contained in a directory and all of its subdirectories.

import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"


print(count("F:\\python"))

> output

F:\python : 12057 files
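
If only the number is needed, returning an integer instead of a formatted string makes the result easier to reuse. A minimal sketch of that variant; count_files is a hypothetical name, not part of the original example.

```python
import os


def count_files(directory):
    """Return the number of files in directory and all its subdirectories."""
    # Each os.walk step yields the file names of one directory.
    return sum(len(filenames) for _, _, filenames in os.walk(directory))
```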

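Ex. 2: How to copy all files from one directory to another?

A script that searches your computer for all files of a given type (default: pptx) and copies them into a new folder.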

import os
import shutil

destination = "F:\\file_copied"
# os.makedirs(destination)


def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = os.path.join(pack[0], f)
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print("------------------------")
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")


for dir in os.listdir():
    "searches for folders that starts with `_`"
    if dir[0] == '_':
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')


> Output

_compiti18\Compito Contabilità 1\conti.txt
_compiti18\Compito Contabilità 1\modula4.txt
_compiti18\Compito Contabilità 1\moduloa4.txt
------------------------
==> Found in: `_compiti18` : 3 files
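
The same idea can be packaged as a function that creates the target folder itself and reports what it copied. This is a sketch under stated assumptions: the name copy_by_extension and its parameters are hypothetical, and the destination is assumed to lie outside the source tree.

```python
import os
import shutil


def copy_by_extension(source, destination, extension=".pptx"):
    """Copy files ending in `extension` from source (recursively) into destination.

    Returns the list of source paths that were copied.
    """
    os.makedirs(destination, exist_ok=True)  # create the target if needed
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            if name.lower().endswith(extension.lower()):
                full_path = os.path.join(dirpath, name)
                # Files sharing a name overwrite each other in destination.
                shutil.copy(full_path, destination)
                copied.append(full_path)
    return copied
```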

Ex. 3: How to get all the file names into a txt file

If you want to create a txt file with all the file names:

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)
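
In the snippet above, filelist.txt is created before os.listdir() runs, so the file ends up listing itself. A minimal sketch that reads the directory first and writes afterwards; write_file_list and its parameters are hypothetical names.

```python
import os


def write_file_list(directory, output_path):
    """Write the names of the entries in directory to output_path, one per line."""
    names = sorted(os.listdir(directory))  # list first, then open the output
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(names) + "\n")
    return names
```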



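You should use the os module to list directory contents. os.listdir(".") returns all the contents of the directory. We iterate over the result and append each item to a list.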

import os

content_list = []

for content in os.listdir("."): # "." means current directory
    content_list.append(content)

print(content_list)



Using generators

import os
def get_files(search_path):
     for (dirpath, _, filenames) in os.walk(search_path):
         for filename in filenames:
             yield os.path.join(dirpath, filename)
list_files = get_files('.')
for filename in list_files:
    print(filename)



Here is a sample one-liner where the source path and the file type can be provided as input. The code returns a list of file names with the csv extension. Use a broader pattern (for example '*') in case all files need to be returned. It also scans the subdirectories recursively.

[y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]

Modify file extensions and source path as needed.
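
A runnable version of the one-liner, with its imports, might look like the sketch below. The function name find_matching and the default pattern are assumptions made for illustration; glob.escape (Python 3.4+) is added so that special characters in directory names are not treated as wildcards.

```python
import os
from glob import escape, glob


def find_matching(source_path, pattern="*.csv"):
    """Return paths under source_path (recursively) whose name matches pattern."""
    # os.walk visits every directory; glob filters the names inside each one.
    return [match
            for dirpath, _dirnames, _filenames in os.walk(source_path)
            for match in glob(os.path.join(escape(dirpath), pattern))]
```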




If you care about performance, try scandir. For Python 2.x, you may need to install it manually. Example:

# python 2.x
import scandir
import sys

de = scandir.scandir(sys.argv[1])
while 1:
    try:
        d = de.next()
        print d.path
    except StopIteration as _:
        break

This saves a lot of time when you need to scan a huge directory: you do not need to buffer a huge list, you just fetch the entries one by one. You can also do it recursively:

def scan_path(path):
    de = scandir.scandir(path)
    while 1:
        try:
            e = de.next()
            if e.is_dir():
                scan_path(e.path)
            else:
                print e.path
        except StopIteration as _:
            break
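
On Python 3 the same recursion can be written with the built-in os.scandir and a generator. This is a minimal sketch, assuming Python 3.6 or later (for the with statement on os.scandir); scan_tree is a hypothetical name.

```python
import os


def scan_tree(path):
    """Yield os.DirEntry objects for every non-directory entry below path."""
    with os.scandir(path) as entries:
        for entry in entries:
            # Do not follow symlinks, so a link loop cannot recurse forever.
            if entry.is_dir(follow_symlinks=False):
                yield from scan_tree(entry.path)
            else:
                yield entry
```

Entries are still produced one by one, so a caller can stop early without reading the whole tree.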



List all files in a directory:

import os
from os import path

files = [x for x in os.listdir(directory_path) if path.isfile(directory_path+os.sep+x)]

Here, you get a list of all files in a directory.




Get the full file paths from a directory and all of its subdirectories

import os

def get_filepaths(directory):
    """
    Return the full paths of all files in a directory tree.
    os.walk() walks the tree top-down; for each directory in the
    tree rooted at directory (including directory itself), it yields
    a 3-tuple (dirpath, dirnames, filenames).
    """
    file_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.

    return file_paths  # Self-explanatory.

# Run the above function and store its results in a variable.   
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")

The path I provided in the function above contains 3 files: two of them in the root directory, and another in a subfolder called "SUBFOLDER". You can now do things like:

print(full_file_paths), which will print the list:

['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']

If you wish, you can open and read the contents, or focus only on files with the extension ".dat", as in the code below:

for f in full_file_paths:
  if f.endswith(".dat"):
    print(f)

/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat




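I really like adamk's answer, which suggests using glob() from the module of the same name. This lets you do pattern matching with *s.

But as others pointed out in the comments, glob() can get tripped up by inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module as well.

As examples: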

from glob import glob

# Return everything under C:\Users\admin that contains a folder called wlp.
glob(r'C:\Users\admin\*\wlp')

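The above is terrible: the path has been hardcoded, and it will only ever work on Windows because of the drive name and the backslashes hardcoded into the path.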

from glob    import glob
from os.path import join

# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))

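The above works better, but it relies on the folder name Users, which is common on Windows and not so common on other operating systems. It also relies on the user having a specific name, admin.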

from glob    import glob
from os.path import expanduser, join

# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))

This works across all platforms.

Another good example that works across platforms and does something a bit different:

from glob    import glob
from os      import getcwd
from os.path import join

# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))

I hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.
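
On Python 3.5 or later, pathlib expresses the same lookup without any explicit path joining. This is a minimal sketch and not part of the original answer; find_named_children is a hypothetical helper name.

```python
from pathlib import Path


def find_named_children(base, name="wlp"):
    """Return entries called `name` located exactly one level below base."""
    # Equivalent in spirit to glob(join(base, '*', name)).
    return sorted(Path(base).glob("*/" + name))


# Usage with the user's home directory, on any platform:
# matches = find_named_children(Path.home())
```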




Here is a simple example:

import os
root, dirs, files = next(os.walk('.'))
for file in files:
    print(file) # In Python 3 use: file.encode('utf-8') in case of error.

Note: Change . to your path value or variable.

Here is an example returning a list of files with absolute paths:

import os
path = '.' # Change this as you need.
abspaths = []
for fn in os.listdir(path):
    abspaths.append(os.path.abspath(os.path.join(path, fn)))
print("\n".join(abspaths))

Documentation: os and os.path for Python 2, os and os.path for Python 3.




Execute findfiles() with a directory as a parameter and it will return a list of all files in it.

import os
def findfiles(directory):
    objects = os.listdir(directory)  # find all objects in a dir

    files = []
    for i in objects:  # check if every object in the folder ...
        if os.path.isfile(os.path.join(directory, i)):  # ... is a file.
            files.append(i)  # if yes, append it.
    return files



Due to the fact that SO 's post (question / answer) limit is 30000 chars ( [Meta.SE]: Knowing Your Limits: What is the maximum length of a question title, post, image and links used? ),
this answer is a continuation of
[SO]: How do I list all files of a directory? (@CristiFati's answer - Part One)


Part Two

Solutions (continued)

Other approaches:

  1. Use Python only as a wrapper

    • Everything is done using another technology
    • That technology is invoked from Python
    • The most famous flavor that I know is what I call the sysadmin approach:

      • Use Python (or any programming language for that matter) in order to execute shell commands (and parse their outputs - in general this approach is to be avoided, since if some command output format slightly differs between OS versions/flavors, the parsing code should be adapted as well; not to mention non EN locales)
      • Some consider this a neat hack
      • I consider it more like a lame workaround ( gainarie ), as the action per se is performed from shell ( cmd in this case), and thus doesn't have anything to do with Python
      • Filtering ( grep / findstr ) or output formatting could be done on both sides, but I'm not going to insist on it. Also, I deliberately used os.system instead of subprocess.Popen
      (py35x64_test) E:\Work\Dev\\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
      dir0
      dir1
      dir2
      dir3
      file0
      file1
      


Final note(s) :

  • I will try to keep it up to date, any suggestions are welcome, I will incorporate anything useful that will come up into the answer(s)



By using os library.

import os
for root, dirs,files in os.walk("your dir path", topdown=True):
    for name in files:
        print(os.path.join(root, name))



If you are looking for a Python implementation of find, this is a recipe I use rather frequently:

from findtools.find_files import (find_files, Match)

# Recursively find all *.sh files in **/usr/bin**
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)

for found_file in found_files:
    print(found_file)

So I made a PyPI package out of it, and there is also a GitHub repository. I hope someone finds this code useful.