python with How do I list all files of a directory?




python read file from directory (24)

Part Two 1

Solutions (continued)

Other approaches:

  1. Use Python only as a wrapper

    • Everything is done using another technology
    • That technology is invoked from Python
    • The most famous flavor that I know is what I call the system administrator approach:

      • Use Python (or any programming language for that matter) in order to execute shell commands (and parse their outputs - in general this approach is to be avoided, since if some command output format slightly differs between OS versions/flavors, the parsing code should be adapted as well; not to mention non EN locales)
      • Some consider this a neat hack
      • I consider it more like a lame workaround (gainarie), as the action per se is performed from shell (cmd in this case), and thus doesn't have anything to do with Python.
      • Filtering (grep / findstr) or output formatting could be done on both sides, but I'm not going to insist on it. Also, I deliberately used os.system instead of subprocess.Popen.
      (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
      dir0
      dir1
      dir2
      dir3
      file0
      file1
      

End of Part Two 1


1. Due to the fact that SO's post (question / answer) limit is 30000 chars ([SE.Meta]: Knowing Your Limits: What is the maximum length of a question title, post, image and links used?), the answer was split in 2 parts. Please make sure to read [SO]: How do I list all files of a directory? (@CristiFati's answer - "Part One") before.

How can I list all files of a directory in Python and add them to a list?


If you care about performance, try scandir. For Python 2.x, you may need to install it manually. Examples:

# python 2.x
import scandir
import sys

de = scandir.scandir(sys.argv[1])
while 1:
    try:
        d = de.next()
        print d.path
    except StopIteration as _:
        break

This saves a lot of time when you need to scan a huge directory, and you do not need to buffer a huge list, just fetch one by one. And also you can do it recursively:

def scan_path(path):
    de = scandir.scandir(path)
    while 1:
        try:
            e = de.next()
            if e.is_dir():
                scan_path(e.path)
            else:
                print e.path
        except StopIteration as _:
                break

Really simple version:

import os
[f for f in os.listdir(os.getcwd) if ...]

import os 
os.listdir(path)

This will return a list of all files and directories in path.

filenames = next(os.walk(path))[2]

This will return only a list of files, not subdirectories.


If you are looking for a Python implementation of find, this is a recipe I use rather frequently:

from findtools.find_files import (find_files, Match)

# Recursively find all *.sh files in **/usr/bin**
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)

for found_file in found_files:
    print found_file

So I made a PyPI package out of it and there is also a GitHub repository. I hope that someone finds it potentially useful for this code.


Since version 3.4 there are builtin iterators for this which are a lot more efficient than os.listdir():

pathlib: New in version 3.4.

>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]

According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.

os.scandir(): New in version 3.5.

>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]

Note that os.walk() uses os.scandir() instead of os.listdir() from version 3.5, and its speed got increased by 2-20 times according to PEP 471.

Let me also recommend reading ShadowRanger's comment below.


import os
import os.path


def get_files(target_dir):
    item_list = os.listdir(target_dir)

    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir,item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list

Here I use a recursive structure.


import dircache
list = dircache.listdir(pathname)
i = 0
check = len(list[0])
temp = []
count = len(list)
while count != 0:
  if len(list[i]) != check:
     temp.append(list[i-1])
     check = len(list[i])
  else:
    i = i + 1
    count = count - 1

print temp

I prefer using the glob module, as it does pattern matching and expansion.

import glob
print(glob.glob("/home/adam/*.txt"))

It will return a list with the queried files:

['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]

Use this function if you want to use a different file type or get the full directory:

import os

def createList(foldername, fulldir = True, suffix=".jpg"):
    file_list_tmp = os.listdir(foldername)
    #print len(file_list_tmp)
    file_list = []
    if fulldir:
        for item in file_list_tmp:
            if item.endswith(suffix):
                file_list.append(os.path.join(foldername, item))
    else:
        for item in file_list_tmp:
            if item.endswith(suffix):
                file_list.append(item)
    return file_list

import os
lst=os.listdir(path)

os.listdir returns a list containing the names of the entries in the directory given by path.


Returning a list of absolute filepaths, does not recurse into subdirectories

L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]

# -** coding: utf-8 -*-
import os
import traceback

print '\n\n'

def start():
    address = "/home/ubuntu/Desktop"
    try:
        Folders = []
        Id = 1
        for item in os.listdir(address):
            endaddress = address + "/" + item
            Folders.append({'Id': Id, 'TopId': 0, 'Name': item, 'Address': endaddress })
            Id += 1         

            state = 0
            for item2 in os.listdir(endaddress):
                state = 1
            if state == 1: 
                Id = FolderToList(endaddress, Id, Id - 1, Folders)
        return Folders
    except:
        print "___________________________ ERROR ___________________________\n" + traceback.format_exc()

def FolderToList(address, Id, TopId, Folders):
    for item in os.listdir(address):
        endaddress = address + "/" + item
        Folders.append({'Id': Id, 'TopId': TopId, 'Name': item, 'Address': endaddress })
        Id += 1

        state = 0
        for item in os.listdir(endaddress):
            state = 1
        if state == 1: 
            Id = FolderToList(endaddress, Id, Id - 1, Folders)
    return Id

print start()

Using generators

import os
def get_files(search_path):
     for (dirpath, _, filenames) in os.walk(search_path):
         for filename in filenames:
             yield os.path.join(dirpath, filename)
list_files = get_files('.')
for filename in list_files:
    print(filename)

I will provide a sample one liner where sourcepath and file type can be provided as input. The code returns a list of filenames with csv extension. Use . in case all files needs to be returned. This will also recursively scans the subdirectories.

[y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]

Modify file extensions and source path as needed.


Getting Full File Paths From a Directory and All Its Subdirectories

import os

def get_filepaths(directory):
    """
    This function will generate the file names in a directory 
    tree by walking the tree either top-down or bottom-up. For each 
    directory in the tree rooted at directory top (including top itself), 
    it yields a 3-tuple (dirpath, dirnames, filenames).
    """
    file_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.

    return file_paths  # Self-explanatory.

# Run the above function and store its results in a variable.   
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")

  • The path I provided in the above function contained 3 files— two of them in the root directory, and another in a subfolder called "SUBFOLDER." You can now do things like:
  • print full_file_paths which will print the list:

    • ['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']

If you'd like, you can open and read the contents, or focus only on files with the extension ".dat" like in the code below:

for f in full_file_paths:
  if f.endswith(".dat"):
    print f

/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat


Get a list of files with Python 2 and 3


I have also made a short video here: Python: how to get a list of file in a directory


os.listdir()

or..... how to get all the files (and directories) in current directory (Python 3)

The simplest way to have the file in the current directory in Python 3 is this. It's really simple; use the os module and the listdir() function and you'll have the file in that directory (and eventual folders that are in the directory, but you will not have the file in the subdirectory, for that you can use walk - I will talk about it later).

>>> import os
>>> arr = os.listdir()
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Using glob

I found glob easier to select file of the same type or with something in common. Look at the following example:

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

Using list comprehension

import glob

mylist = [f for f in glob.glob("*.txt")]

Getting the full path name with os.path.abspath

As you noticed, you don't have the full path of the file in the code above. If you need to have the absolute path, you can use another function of the os.path module called _getfullpathname, putting the file that you get from os.listdir() as an argument. There are other ways to have the full path, as we will check later (I replaced, as suggested by mexmex, _getfullpathname with abspath).

>>> import os
>>> files_path = [os.path.abspath(x) for x in os.listdir()]
>>> files_path
['F:\\documenti\applications.txt', 'F:\\documenti\collections.txt']

Get the full path name of a type of file into all subdirectories with walk

I find this very useful to find stuff in many directories, and it helped me finding a file about which I didn't remember the name:

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if ".docx" in file:
            print(os.path.join(r, file))

os.listdir(): get files in the current directory (Python 2)

In Python 2 you, if you want the list of the files in the current directory, you have to give the argument as '.' or os.getcwd() in the os.listdir method.

>>> import os
>>> arr = os.listdir('.')
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

To go up in the directory tree

>>> # Method 1
>>> x = os.listdir('..')

# Method 2
>>> x= os.listdir('/')

Get files: os.listdir() in a particular directory (Python 2 and 3)

>>> import os
>>> arr = os.listdir('F:\\python')
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Get files of a particular subdirectory with os.listdir()

import os

x = os.listdir("./content")

os.walk('.') - current directory

>>> import os
>>> arr = next(os.walk('.'))[2]
>>> arr
['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']

glob module - all files

import glob
print(glob.glob("*"))

out:['content', 'start.py']

next(os.walk('.')) and os.path.join('dir','file')

>>> import os
>>> arr = []
>>> for d,r,f in next(os.walk("F:\_python")):
>>>     for file in f:
>>>         arr.append(os.path.join(r,file))
...
>>> for f in arr:
>>>     print(files)

>output

F:\\_python\\dict_class.py
F:\\_python\\programmi.txt

next(os.walk('F:\') - get the full path - list comprehension

>>> [os.path.join(r,file) for r,d,f in next(os.walk("F:\\_python")) for file in f]
['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']

os.walk - get full path - all files in sub dirs

x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]

>>>x
['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']

os.listdir() - get only txt files

>>> arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
>>> print(arr_txt)
['work.txt', '3ebooks.txt']

glob - get only txt files

>>> import glob
>>> x = glob.glob("*.txt")
>>> x
['ale.txt', 'alunni2015.txt', 'assenze.text.txt', 'text2.txt', 'untitled.txt']

Using glob to get the full path of the files

If I should need the absolute path of the files:

>>> from path import path
>>> from glob import glob
>>> x = [path(f).abspath() for f in glob("F:\*.txt")]
>>> for f in x:
...  print(f)
...
F:\acquistionline.txt
F:\acquisti_2018.txt
F:\bootstrap_jquery_ecc.txt

Other use of glob

If I want all the files in the directory:

>>> x = glob.glob("*")

Using os.path.isfile to avoid directories in the list

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

> output

['a simple game.py', 'data.txt', 'decorator.py']

Using pathlib from (Python 3.4)

import pathlib

>>> flist = []
>>> for p in pathlib.Path('.').iterdir():
...  if p.is_file():
...   print(p)
...   flist.append(p)
...
error.PNG
exemaker.bat
guiprova.mp3
setup.py
speak_gui2.py
thumb.PNG

If you want to use list comprehension

>>> flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]

*You can use also just pathlib.Path() instead of pathlib.Path(".")

Use glob method in pathlib.Path()

import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

output:

stack_overflow_list.py
stack_overflow_list_tkinter.py

Get all and only files with os.walk

import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)

>>> y
['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']

Get only files with next and walk in a directory

>>> import os
>>> x = next(os.walk('F://python'))[2]
>>> x
['calculator.bat','calculator.py']

Get only directories with next and walk in a directory

>>> import os
>>> next(os.walk('F://python'))[1] # for the current dir use ('.')
['python3','others']

Get all the subdir names with walk

>>> for r,d,f in os.walk("F:\_python"):
...  for dirs in d:
...   print(dirs)
...
.vscode
pyexcel
pyschool.py
subtitles
_metaprogramming
.ipynb_checkpoints

os.scandir() from Python 3.5 on

>>> import os
>>> x = [f.name for f in os.scandir() if f.is_file()]
>>> x
['calculator.bat','calculator.py']

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

>>> import os
>>> with os.scandir() as i:
...  for entry in i:
...   if entry.is_file():
...    print(entry.name)
...
ebookmaker.py
error.PNG
exemaker.bat
guiprova.mp3
setup.py
speakgui4.py
speak_gui2.py
speak_gui3.py
thumb.PNG
>>>

Ex. 1: How many files are there in the subdirectories?

In this example, we look for the number of files that are included in all the directory and its subdirectories.

import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + "files"

print(count("F:\\python"))

> output

>'F:\\\python' : 12057 files'

Ex.2: How to copy all files from a directory to another?

A script to make order in your computer finding all files of a type (default: pptx) and copying them in a new folder.

import os
import shutil
from path import path

destination = "F:\\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = pack[0] + "\\" + f
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print("------------------------")
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

for dir in os.listdir():
    "searches for folders that starts with `_`"
    if dir[0] == '_':
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')


> Output

_compiti18\Compito Contabilità 1\conti.txt
_compiti18\Compito Contabilità 1\modula4.txt
_compiti18\Compito Contabilità 1\moduloa4.txt
------------------------
==> Found in: `_compiti18` : 3 files

Ex. 3: How to get all the files in a txt file

In case you want to create a txt file with all the file names:

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)

Example: txt with all the files of an hard drive

"""We are going to save a txt file with all the files in your directory.
We will use the function walk()

"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding='utf-8') as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\\" + file)
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")

All the file of C:\\ in one text file

This is a shorter version of the previous code. Change the folder where to start finding the files if you need to start from another position. This code generate a 50 mb on text file on my computer with something less then 500.000 lines with files with the complete path.

import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            filewrite.write(f"{r + file}\n")    

A function to search for a certain type of file

import os

def searchfiles(extension='.ttf'):
    "Create a txt file with all the file of a type"
    with open("file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk("C:\\"):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(f"{r + file}\n")

# looking for ttf file (fonts)
searchfiles('ttf')

Referring to the answer by @adamk, here is my os detection method in response to the slash inconsistency comment by @Anti Earth

import sys
import os
from pathlib import Path
from glob import glob
platformtype = sys.platform
if platformtype == 'win32':
    slash = "\\"
if platformtype == 'darwin':
    slash = "/"

# TODO: How can I list all files of a directory in Python and add them to a list?

# Step 1 - List all files of a directory

# Method 1: Find only pre-defined filetypes (.txt) and no subfiles, answer provided by @adamk
dir1 = "%sfoo%sbar%s*.txt" % (slash)
_files = glob(dir1)

# Method 2: Find all files and no subfiles
dir2 = "%sfoo%sbar%s" % (slash)
_files = (x for x in Path("dir2").iterdir() if x.is_file())

# Method 3: Find all files and all subfiles
dir3 = "%sfoo%sbar" % (slash)
_files = (x for x in Path('dir3').glob('**/*') if x.is_file())


# Step 2 - Add them to a list

files_list = []
for eachfiles in _files:
    files_basename = os.path.basename(eachfiles)
    files_list.append(files_basename)

print(files_list)
['file1.txt', 'file2.txt', .... ]

I'm assuming that you want just the basenames in the list.

Refer to this post for pre-defining multiple file formats for Method 1.


def list_files(path):
    # returns a list of names (with extension, without full path) of all files 
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files 

Python 3.5 introduced a new, faster method for walking through the directory - os.scandir().

Example:

for file in os.scandir('/usr/bin'):
    line = ''
    if file.is_file():
        line += 'f'
    elif file.is_dir():
        line += 'd'
    elif file.is_symlink():
        line += 'l'
    line += '\t'
    print("{}{}".format(line, file.name))

Another very readable variant for Python 3.4+ is using pathlib.Path.glob:

from pathlib import Path
folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]

It is simple to make more specific, e.g. only look for Python source files which are not symbolic links, also in all subdirectories:

[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]

By using os library.

import os
for root, dirs,files in os.walk("your dir path", topdown=True):
    for name in files:
        print(os.path.join(root, name))

A one-line solution to get only list of files (no subdirectories):

filenames = next(os.walk(path))[2]

or absolute pathnames:

paths = [os.path.join(path,fn) for fn in next(os.walk(path))[2]]

I really liked adamk's answer, suggesting that you use glob(), from the module of the same name. This allows you to have pattern matching with *s.

But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module, as well.

As examples:

from glob import glob

# Return everything under C:\Users\admin that contains a folder called wlp.
glob('C:\Users\admin\*\wlp')

The above is terrible - the path has been hardcoded and will only ever work on Windows between the drive name and the \s being hardcoded into the path.

from glob    import glob
from os.path import join

# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))

The above works better, but it relies on the folder name Users which is often found on Windows and not so often found on other OSs. It also relies on the user having a specific name, admin.

from glob    import glob
from os.path import expanduser, join

# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))

This works perfectly across all platforms.

Another great example that works perfectly across platforms and does something a bit different:

from glob    import glob
from os      import getcwd
from os.path import join

# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))

Hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.







directory