bash - shell读取文件名 - shell遍历文件名




在Bash中提取文件名和扩展名 (20)

我想分别获取文件名(不带扩展名)和扩展名。

目前为止我发现的最佳解决方案是:

NAME=`echo "$FILE" | cut -d'.' -f1`
EXTENSION=`echo "$FILE" | cut -d'.' -f2`

这是错误的,因为如果文件名包含多个“。”,则它不起作用。 字符。 如果我们说,我有abjs,它会考虑ab.js ,而不是abjs

它可以很容易地在Python中完成

file, ext = os.path.splitext(path)

但如果可能的话,我宁愿不要为此解雇Python解释器。

任何更好的想法?


魔术文件识别

除了这个堆栈溢出问题的很多好的答案之外,我想补充一点:

在Linux和其他unixen下,有一个名为file魔术命令,它通过分析文件的第一个字节来进行文件类型检测。 这是一个非常古老的工具,用于打印服务器的initialy(如果不是为......创建的,我不确定)。

file myfile.txt
myfile.txt: UTF-8 Unicode text

file -b --mime-type myfile.txt
text/plain

标准扩展可以在/etc/mime.types中找到(在我的Debian GNU / Linux桌面上,请参阅man fileman mime.types 。也许你必须安装file实用程序和mime-support软件包):

grep $( file -b --mime-type myfile.txt ) </etc/mime.types
text/plain      asc txt text pot brf srt

你可以创建一个bash函数来确定右扩展。 有一点(不完美)的样本:

file2ext() {
    local _mimetype=$(file -Lb --mime-type "$1") _line _basemimetype
    case ${_mimetype##*[/.-]} in
        gzip | bzip2 | xz | z )
            _mimetype=${_mimetype##*[/.-]}
            _mimetype=${_mimetype//ip}
            _basemimetype=$(file -zLb --mime-type "$1")
            ;;
        stream )
            _mimetype=($(file -Lb "$1"))
            [ "${_mimetype[1]}" = "compressed" ] &&
                _basemimetype=$(file -b --mime-type - < <(
                        ${_mimetype,,} -d <"$1")) ||
                _basemimetype=${_mimetype,,}
            _mimetype=${_mimetype,,}
            ;;
        executable )  _mimetype='' _basemimetype='' ;;
        dosexec )     _mimetype='' _basemimetype='exe' ;;
        shellscript ) _mimetype='' _basemimetype='sh' ;;
        * )
            _basemimetype=$_mimetype
            _mimetype=''
            ;;
    esac
    while read -a _line ;do
        if [ "$_line" == "$_basemimetype" ] ;then
            [ "$_line[1]" ] &&
                _basemimetype=${_line[1]} ||
                _basemimetype=${_basemimetype##*[/.-]}
            break
        fi
        done </etc/mime.types
    case ${_basemimetype##*[/.-]} in
        executable ) _basemimetype='' ;;
        shellscript ) _basemimetype='sh' ;;
        dosexec ) _basemimetype='exe' ;;
        * ) ;;
    esac
    [ "$_mimetype" ] && [ "$_basemimetype" != "$_mimetype" ] &&
      printf ${2+-v} $2 "%s.%s" ${_basemimetype##*[/.-]} ${_mimetype##*[/.-]} ||
      printf ${2+-v} $2 "%s" ${_basemimetype##*[/.-]}
}

这个函数可以设置一个可以在以后使用的Bash变量:

(这是来自@Petesh正确答案的启发):

filename=$(basename "$fullfile")
filename="${filename%.*}"
file2ext "$fullfile" extension

echo "$fullfile -> $filename . $extension"

[从单线程修改为通用bash 函数 ,行为现在与dirnamebasename实用程序一致; 理由补充。]

接受的答案在典型情况下效果很好 ,但边缘情况下失败 ,即:

  • 对于没有扩展名的文件名(在这个答案的其余部分称为后缀 ), extension=${filename##*.}返回输入文件名而不是空字符串。
  • extension=${filename##*.}不包括最初的. ,违背惯例。
    • 盲目前瞻. 不适用于没有后缀的文件名。
  • 如果输入文件名以“ .开头,则filename="${filename%.*}"将是空字符串. 并且不再包含. 字符(例如.bash_profile ) - 违反惯例。

---------

因此, 涵盖所有边缘案例强大解决方案的复杂性需要一个功能 - 请参阅下面的定义; 它可以返回路径的所有组件

示例呼叫:

splitPath '/etc/bash.bashrc' dir fname fnameroot suffix
# -> $dir == '/etc'
# -> $fname == 'bash.bashrc'
# -> $fnameroot == 'bash'
# -> $suffix == '.bashrc'

请注意,输入路径后面的参数是可自由选择的位置变量名称
要跳过那些不在那些之前的变量,指定_ (使用抛出变量$_ )或'' ; 例如,仅提取文件名根目录和扩展名,使用splitPath '/etc/bash.bashrc' _ _ fnameroot extension

# SYNOPSIS
#   splitPath path varDirname [varBasename [varBasenameRoot [varSuffix]]] 
# DESCRIPTION
#   Splits the specified input path into its components and returns them by assigning
#   them to variables with the specified *names*.
#   Specify '' or throw-away variable _ to skip earlier variables, if necessary.
#   The filename suffix, if any, always starts with '.' - only the *last*
#   '.'-prefixed token is reported as the suffix.
#   As with `dirname`, varDirname will report '.' (current dir) for input paths
#   that are mere filenames, and '/' for the root dir.
#   As with `dirname` and `basename`, a trailing '/' in the input path is ignored.
#   A '.' as the very first char. of a filename is NOT considered the beginning
#   of a filename suffix.
# EXAMPLE
#   splitPath '/home/jdoe/readme.txt' parentpath fname fnameroot suffix
#   echo "$parentpath" # -> '/home/jdoe'
#   echo "$fname" # -> 'readme.txt'
#   echo "$fnameroot" # -> 'readme'
#   echo "$suffix" # -> '.txt'
#   ---
#   splitPath '/home/jdoe/readme.txt' _ _ fnameroot
#   echo "$fnameroot" # -> 'readme'  
splitPath() {
  local _sp_dirname= _sp_basename= _sp_basename_root= _sp_suffix=
    # simple argument validation
  (( $# >= 2 )) || { echo "$FUNCNAME: ERROR: Specify an input path and at least 1 output variable name." >&2; exit 2; }
    # extract dirname (parent path) and basename (filename)
  _sp_dirname=$(dirname "$1")
  _sp_basename=$(basename "$1")
    # determine suffix, if any
  _sp_suffix=$([[ $_sp_basename = *.* ]] && printf %s ".${_sp_basename##*.}" || printf '')
    # determine basename root (filemane w/o suffix)
  if [[ "$_sp_basename" == "$_sp_suffix" ]]; then # does filename start with '.'?
      _sp_basename_root=$_sp_basename
      _sp_suffix=''
  else # strip suffix from filename
    _sp_basename_root=${_sp_basename%$_sp_suffix}
  fi
  # assign to output vars.
  [[ -n $2 ]] && printf -v "$2" "$_sp_dirname"
  [[ -n $3 ]] && printf -v "$3" "$_sp_basename"
  [[ -n $4 ]] && printf -v "$4" "$_sp_basename_root"
  [[ -n $5 ]] && printf -v "$5" "$_sp_suffix"
  return 0
}

test_paths=(
  '/etc/bash.bashrc'
  '/usr/bin/grep'
  '/Users/jdoe/.bash_profile'
  '/Library/Application Support/'
  'readme.new.txt'
)

for p in "${test_paths[@]}"; do
  echo ----- "$p"
  parentpath= fname= fnameroot= suffix=
  splitPath "$p" parentpath fname fnameroot suffix
  for n in parentpath fname fnameroot suffix; do
    echo "$n=${!n}"
  done
done

测试代码,锻炼功能:

test_paths=(
  '/etc/bash.bashrc'
  '/usr/bin/grep'
  '/Users/jdoe/.bash_profile'
  '/Library/Application Support/'
  'readme.new.txt'
)

for p in "${test_paths[@]}"; do
  echo ----- "$p"
  parentpath= fname= fnameroot= suffix=
  splitPath "$p" parentpath fname fnameroot suffix
  for n in parentpath fname fnameroot suffix; do
    echo "$n=${!n}"
  done
done

预期输出 - 请注意边缘情况:

  • 一个没有后缀的文件名
  • .开头的文件名.考虑后缀的开始)
  • /结尾的输入路径(尾随/被忽略)
  • 输入路径仅为文件名( .作为父路径返回)
  • 一个文件名多于. -prefixed标记(只有最后一个被认为是后缀):
----- /etc/bash.bashrc
parentpath=/etc
fname=bash.bashrc
fnameroot=bash
suffix=.bashrc
----- /usr/bin/grep
parentpath=/usr/bin
fname=grep
fnameroot=grep
suffix=
----- /Users/jdoe/.bash_profile
parentpath=/Users/jdoe
fname=.bash_profile
fnameroot=.bash_profile
suffix=
----- /Library/Application Support/
parentpath=/Library
fname=Application Support
fnameroot=Application Support
suffix=
----- readme.new.txt
parentpath=.
fname=readme.new.txt
fnameroot=readme.new
suffix=.txt

From the answers above, the shortest oneliner to mimic Python's

file, ext = os.path.splitext(path)

presuming your file really does have an extension, is

EXT="${PATH##*.}"; FILE=$(basename "$PATH" .$EXT)

Here is the algorithm I used for finding the name and extension of a file when I wrote a Bash script to make names unique when names conflicted with respect to casing.

#! /bin/bash 

#
# Finds 
# -- name and extension pairs
# -- null extension when there isn't an extension.
# -- Finds name of a hidden file without an extension
# 

declare -a fileNames=(
  '.Montreal' 
  '.Rome.txt' 
  'Loundon.txt' 
  'Paris' 
  'San Diego.txt'
  'San Francisco' 
  )

echo "Script ${0} finding name and extension pairs."
echo 

for theFileName in "${fileNames[@]}"
do
     echo "theFileName=${theFileName}"  

     # Get the proposed name by chopping off the extension
     name="${theFileName%.*}"

     # get extension.  Set to null when there isn't an extension
     # Thanks to mklement0 in a comment above.
     extension=$([[ "$theFileName" == *.* ]] && echo ".${theFileName##*.}" || echo '')

     # a hidden file without extenson?
     if [ "${theFileName}" = "${extension}" ] ; then
         # hidden file without extension.  Fixup.
         name=${theFileName}
         extension=""
     fi

     echo "  name=${name}"
     echo "  extension=${extension}"
done 

The test run.

$ config/Name\&Extension.bash 
Script config/Name&Extension.bash finding name and extension pairs.

theFileName=.Montreal
  name=.Montreal
  extension=
theFileName=.Rome.txt
  name=.Rome
  extension=.txt
theFileName=Loundon.txt
  name=Loundon
  extension=.txt
theFileName=Paris
  name=Paris
  extension=
theFileName=San Diego.txt
  name=San Diego
  extension=.txt
theFileName=San Francisco
  name=San Francisco
  extension=
$ 

FYI: The complete transliteration program and more test cases can be found here: https://www.dropbox.com/s/4c6m0f2e28a1vxf/avoid-clashes-code.zip?dl=0


In order to make dir more useful (in the case a local file with no path is specified as input) I did the following:

# Substring from 0 thru pos of filename
dir="${fullpath:0:${#fullpath} - ${#filename}}"
if [[ -z "$dir" ]]; then
    dir="./"
fi

This allows you to do something useful like add a suffix to the input file basename as:

outfile=${dir}${base}_suffix.${ext}

testcase: foo.bar
dir: "./"
base: "foo"
ext: "bar"
outfile: "./foo_suffix.bar"

testcase: /home/me/foo.bar
dir: "/home/me/"
base: "foo"
ext: "bar"
outfile: "/home/me/foo_suffix.bar"

Using example file /Users/Jonathan/Scripts/bash/MyScript.sh , this code:

MY_EXT=".${0##*.}"
ME=$(/usr/bin/basename "${0}" "${MY_EXT}")

will result in ${ME} being MyScript and ${MY_EXT} being .sh :

Script:

#!/bin/bash
set -e

MY_EXT=".${0##*.}"
ME=$(/usr/bin/basename "${0}" "${MY_EXT}")

echo "${ME} - ${MY_EXT}"

Some tests:

$ ./MyScript.sh 
MyScript - .sh

$ bash MyScript.sh
MyScript - .sh

$ /Users/Jonathan/Scripts/bash/MyScript.sh
MyScript - .sh

$ bash /Users/Jonathan/Scripts/bash/MyScript.sh
MyScript - .sh

以下是一些替代建议(主要以awk ),包括一些高级用例,例如提取软件包的版本号。

f='/path/to/complex/file.1.0.1.tar.gz'

# Filename : 'file.1.0.x.tar.gz'
    echo "$f" | awk -F'/' '{print $NF}'

# Extension (last): 'gz'
    echo "$f" | awk -F'[.]' '{print $NF}'

# Extension (all) : '1.0.1.tar.gz'
    echo "$f" | awk '{sub(/[^.]*[.]/, "", $0)} 1'

# Extension (last-2): 'tar.gz'
    echo "$f" | awk -F'[.]' '{print $(NF-1)"."$NF}'

# Basename : 'file'
    echo "$f" | awk '{gsub(/.*[/]|[.].*/, "", $0)} 1'

# Basename-extended : 'file.1.0.1.tar'
    echo "$f" | awk '{gsub(/.*[/]|[.]{1}[^.]+$/, "", $0)} 1'

# Path : '/path/to/complex/'
    echo "$f" | awk '{match($0, /.*[/]/, a); print a[0]}'
    # or 
    echo "$f" | grep -Eo '.*[/]'

# Folder (containing the file) : 'complex'
    echo "$f" | awk -F'/' '{$1=""; print $(NF-1)}'

# Version : '1.0.1'
    # Defined as 'number.number' or 'number.number.number'
    echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?'

    # Version - major : '1'
    echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?' | cut -d. -f1

    # Version - minor : '0'
    echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?' | cut -d. -f2

    # Version - patch : '1'
    echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?' | cut -d. -f3

# All Components : "path to complex file 1 0 1 tar gz"
    echo "$f" | awk -F'[/.]' '{$1=""; print $0}'

# Is absolute : True (exit-code : 0)
    # Return true if it is an absolute path (starting with '/' or '~/'
    echo "$f" | grep -q '^[/]\|^~/'

所有用例都使用原始完整路径作为输入,而不依赖于中间结果。


你可以使用POSIX变量的魔力:

bash-3.2$ FILENAME=somefile.tar.gz
bash-3.2$ echo ${FILENAME%%.*}
somefile
bash-3.2$ echo ${FILENAME%.*}
somefile.tar

如果你的文件名是./somefile.tar.gz那么echo ${FILENAME%%.*}会贪婪地删除最长匹配. 你会有空字符串。

(您可以使用临时变量解决该问题:

FULL_FILENAME=$FILENAME
FILENAME=${FULL_FILENAME##*/}
echo ${FILENAME%%.*}

这个site解释更多。

${variable%pattern}
  Trim the shortest match from the end
${variable##pattern}
  Trim the longest match from the beginning
${variable%%pattern}
  Trim the longest match from the end
${variable#pattern}
  Trim the shortest match from the beginning

如何提取fish的文件名和扩展名:

function split-filename-extension --description "Prints the filename and extension"
  for file in $argv
    if test -f $file
      set --local extension (echo $file | awk -F. '{print $NF}')
      set --local filename (basename $file .$extension)
      echo "$filename $extension"
    else
      echo "$file is not a valid file"
    end
  end
end

注意事项:在最后一个点上分割,适用于带点的文件名,但不适用于带点的扩展名。 见下面的例子。

用法:

$ split-filename-extension foo-0.4.2.zip bar.tar.gz
foo-0.4.2 zip  # Looks good!
bar.tar gz  # Careful, you probably want .tar.gz as the extension.

有可能有更好的方法来做到这一点。 随时编辑我的答案,以改善它。

如果你有一套有限的扩展,你会处理,你知道所有的扩展,试试这个:

switch $file
  case *.tar
    echo (basename $file .tar) tar
  case *.tar.bz2
    echo (basename $file .tar.bz2) tar.bz2
  case *.tar.gz
    echo (basename $file .tar.gz) tar.gz
  # and so on
end

并不是第一个例子,但是你必须处理每一个案例,所以它可能会更乏味,这取决于你可以期待多少扩展。


如果文件没有扩展名或没有文件名,这似乎不起作用。 这是我正在使用的; 它只使用内置函数并处理更多(但不是全部)病态文件名。

#!/bin/bash
for fullpath in "[email protected]"
do
    filename="${fullpath##*/}"                      # Strip longest match of */ from start
    dir="${fullpath:0:${#fullpath} - ${#filename}}" # Substring from 0 thru pos of filename
    base="${filename%.[^.]*}"                       # Strip shortest match of . plus at least one non-dot char from end
    ext="${filename:${#base} + 1}"                  # Substring from len of base thru end
    if [[ -z "$base" && -n "$ext" ]]; then          # If we have an extension and no base, it's really the base
        base=".$ext"
        ext=""
    fi

    echo -e "$fullpath:\n\tdir  = \"$dir\"\n\tbase = \"$base\"\n\text  = \"$ext\""
done

这里有一些测试用例:

$ basename-and-extension.sh / /home/me/ /home/me/file /home/me/file.tar /home/me/file.tar.gz /home/me/.hidden /home/me/.hidden.tar /home/me/.. .
/:
    dir  = "/"
    base = ""
    ext  = ""
/home/me/:
    dir  = "/home/me/"
    base = ""
    ext  = ""
/home/me/file:
    dir  = "/home/me/"
    base = "file"
    ext  = ""
/home/me/file.tar:
    dir  = "/home/me/"
    base = "file"
    ext  = "tar"
/home/me/file.tar.gz:
    dir  = "/home/me/"
    base = "file.tar"
    ext  = "gz"
/home/me/.hidden:
    dir  = "/home/me/"
    base = ".hidden"
    ext  = ""
/home/me/.hidden.tar:
    dir  = "/home/me/"
    base = ".hidden"
    ext  = "tar"
/home/me/..:
    dir  = "/home/me/"
    base = ".."
    ext  = ""
.:
    dir  = ""
    base = "."
    ext  = ""

您可以使用basename

例:

$ basename foo-bar.tar.gz .tar.gz
foo-bar

您需要提供基本名称以及将被删除的扩展名,但是如果您始终使用-z执行tar ,那么您知道扩展名将是.tar.gz

这应该做你想做的事情:

tar -zxvf $1
cd $(basename $1 .tar.gz)

您可以使用cut命令删除最后两个扩展名( ".tar.gz"部分):

$ echo "foo.tar.gz" | cut -d'.' --complement -f2-
foo

正如Clayton Hughes在评论中指出的那样,这不适用于问题中的实际例子。 所以作为另一种选择,我建议使用带扩展正则表达式的sed ,如下所示:

$ echo "mpc-1.0.1.tar.gz" | sed -r 's/\.[[:alnum:]]+\.[[:alnum:]]+$//'
mpc-1.0.1

它通过无条件移除最后两个(字母数字)扩展而工作。

[在Anders Lindahl发表评论后再次更新]


我使用以下脚本

$ echo "foo.tar.gz"|rev|cut -d"." -f3-|rev
foo

我认为如果你只是需要文件的名字,你可以试试这个:

FULLPATH=/usr/share/X11/xorg.conf.d/50-synaptics.conf

# Remove all the prefix until the "/" character
FILENAME=${FULLPATH##*/}

# Remove all the prefix until the "." character
FILEEXTENSION=${FILENAME##*.}

# Remove a suffix, in our case, the filename. This will return the name of the directory that contains this file.
BASEDIRECTORY=${FULLPATH%$FILENAME}

echo "path = $FULLPATH"
echo "file name = $FILENAME"
echo "file extension = $FILEEXTENSION"
echo "base directory = $BASEDIRECTORY"

这就是全部= D。


最小和最简单的解决方案(单线)是:

$ file=/blaabla/bla/blah/foo.txt

echo $(basename ${file%.*})

foo


根据Petesh回答,如果只需要文件名,则路径和扩展名都可以在一行中删除,

filename=$(basename ${fullname%.*})

这里是AWK代码。 它可以做得更简单。 但是我对AWK并不擅长。

filename$ ls
abc.a.txt  a.b.c.txt  pp-kk.txt
filename$ find . -type f | awk -F/ '{print $2}' | rev | awk -F"." '{$1="";print}' | rev | awk 'gsub(" ",".") ,sub(".$", "")'
abc.a
a.b.c
pp-kk
filename$ find . -type f | awk -F/ '{print $2}' | awk -F"." '{print $NF}'
txt
txt
txt

通常你已经知道扩展名,所以你可能希望使用:

basename filename .extension

例如:

basename /path/to/dir/filename.txt .txt

我们得到了

filename

$ F = "text file.test.txt"  
$ echo ${F/*./}  
txt  

这可以满足文件名中的多个点和空格,但是如果没有扩展名,它会返回文件名。 虽然容易检查; 只是测试文件名和扩展名是相同的。

自然,此方法不适用于.tar.gz文件。 但是,这可以通过两个步骤来处理。 如果扩展名是gz,则再次检查是否还有tar扩展名。


pax> echo a.b.js | sed 's/\.[^.]*$//'
a.b
pax> echo a.b.js | sed 's/^.*\.//'
js

工作正常,所以你可以使用:

pax> FILE=a.b.js
pax> NAME=$(echo "$FILE" | sed 's/\.[^.]*$//')
pax> EXTENSION=$(echo "$FILE" | sed 's/^.*\.//')
pax> echo $NAME
a.b
pax> echo $EXTENSION
js

顺便说一句,这些命令的工作如下。

NAME的命令替换"." 字符后跟任意数量的非"." 字符直到行的末尾,没有任何内容(即,它从最后的"."到行末尾的所有内容)。 这基本上是一个使用正则表达式欺骗的非贪婪替换。

EXTENSION命令替换任意数量的字符后跟一个"." 字符在行的开头,没有任何东西(例如,它从行的开头到最后一个点,包括所有内容都被删除)。 这是一个贪婪的替代,这是默认的行为。





filenames