



PHP: explode a string, but treat words in quotes as a single word

How can I explode the following string:

Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor

into:

array("Lorem", "ipsum", "dolor sit amet", "consectetur", "adipiscing elit", "dolor")

so that the text inside quotation marks is treated as a single word.

This is what I have so far:

$mytext = "Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor";
$noquotes = str_replace("%22", "", $mytext);
$newarray = explode(" ", $noquotes);

But my code puts every word into its own array element. How do I make the words inside quotation marks count as one word?


You could use preg_match_all(...):

$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing \\"elit" dolor';
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);
print_r($matches);

which will produce:

Array
(
    [0] => Array
        (
            [0] => Lorem
            [1] => ipsum
            [2] => "dolor sit amet"
            [3] => consectetur
            [4] => "adipiscing \"elit"
            [5] => dolor
        )

)

As you can see, it also accounts for escaped quotes inside quoted strings.

EDIT

A short explanation:

"           # match the character '"'
(?:         # start non-capture group 1 
  \\        #   match the character '\'
  .         #   match any character except line breaks
  |         #   OR
  [^\\"]    #   match any character except '\' and '"'
)*          # end non-capture group 1 and repeat it zero or more times
"           # match the character '"'
|           # OR
\S+         # match a non-whitespace character: [^\s] and repeat it one or more times
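
Note that the matches above still carry their surrounding quotes, while the question asked for the bare words. Here is a minimal sketch of one way to strip them afterwards. The helper name split_quoted() is made up for illustration and is not part of the original answer; it also assumes you want escaped characters such as \" unescaped.

```php
<?php
// Hypothetical helper: tokenize with the regex from the answer above,
// then remove the wrapping quotes from quoted tokens.
function split_quoted(string $text): array
{
    preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);

    return array_map(function (string $token): string {
        // Only unwrap tokens that both start and end with a double quote.
        if (strlen($token) >= 2 && $token[0] === '"' && substr($token, -1) === '"') {
            $inner = substr($token, 1, -1);
            // Turn backslash-escaped characters (e.g. \") back into the bare character.
            return preg_replace('/\\\\(.)/s', '$1', $inner);
        }
        return $token;
    }, $matches[0]);
}

$words = split_quoted('Lorem ipsum "dolor sit amet" consectetur "adipiscing \\"elit" dolor');
print_r($words);
```

Unquoted tokens pass through untouched, so a stray quote in the middle of a word is left alone.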

If you need to match %22 instead of double quotes, you could do:

preg_match_all('/%22(?:\\\\.|(?!%22).)*%22|\S+/', $text, $matches);
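
For what it's worth, here is a rough sketch of that %22 pattern applied to the string from the question, with the %22 markers removed afterwards so you end up with plain words. The variable names are just for illustration, and it assumes %22 never appears inside a token as literal text you want to keep.

```php
<?php
$mytext = 'Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor';

// Same pattern as above: a %22...%22 run counts as one token, anything else splits on whitespace.
preg_match_all('/%22(?:\\\\.|(?!%22).)*%22|\S+/', $mytext, $matches);

// Drop the %22 markers from each token (assumption: they are only ever delimiters).
$newarray = array_map(function (string $token): string {
    return str_replace('%22', '', $token);
}, $matches[0]);

print_r($newarray);
```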

This is much easier with str_getcsv().

$test = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor';
var_dump(str_getcsv($test, ' '));

Gives you

array(6) {
  [0]=>
  string(5) "Lorem"
  [1]=>
  string(5) "ipsum"
  [2]=>
  string(14) "dolor sit amet"
  [3]=>
  string(11) "consectetur"
  [4]=>
  string(15) "adipiscing elit"
  [5]=>
  string(5) "dolor"
}
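
One thing to watch with the str_getcsv() approach: every single space is a delimiter, so runs of spaces give you empty elements. A small sketch of how you might filter those out is below. Passing the enclosure and escape arguments explicitly is my own choice here (as far as I know, PHP 8.4 deprecates relying on the default escape value), and the input string is just a made-up variation of the one above.

```php
<?php
$spaced = 'Lorem  ipsum "dolor sit amet"   consectetur';

// Delimiter: space, enclosure: double quote, escape: backslash (all passed explicitly).
$fields = str_getcsv($spaced, ' ', '"', '\\');

// Consecutive spaces produce empty fields; drop them and reindex the array.
$cleaned = array_values(array_filter($fields, function ($field) {
    return $field !== null && $field !== '';
}));

var_dump($cleaned);
```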

You can also try this multi-explode function:

function multiexplode($delimiters, $string)
{
    // Replace every delimiter with the first one, then split on that single delimiter.
    $ready = str_replace($delimiters, $delimiters[0], $string);
    $launch = explode($delimiters[0], $ready);
    return $launch;
}

$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
$exploded = multiexplode(array(",",".","|",":"),$text);

print_r($exploded);
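
If you'd rather not do the replace-then-explode dance, roughly the same split can be done in one call with preg_split(). This is only a sketch: multiexplode_regex() is a name I made up, and like multiexplode() it pays no attention to quotes.

```php
<?php
// Hypothetical alternative to multiexplode(): build an alternation of the
// delimiters and let preg_split() do the splitting.
function multiexplode_regex(array $delimiters, string $string): array
{
    // preg_quote() escapes characters such as "." and "|" that are special in a regex.
    $quoted = array_map(function (string $delimiter): string {
        return preg_quote($delimiter, '/');
    }, $delimiters);

    return preg_split('/' . implode('|', $quoted) . '/', $string);
}

$sample = "here is a sample: this text, and this will be exploded. this also | this one too :)";
$pieces = multiexplode_regex(array(",", ".", "|", ":"), $sample);

print_r($pieces);
```

The pieces keep their leading and trailing spaces; run them through array_map('trim', ...) if you don't want that.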

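I came here with a complex string-splitting problem similar to this one, but none of the answers here did exactly what I wanted, so I wrote my own.

I'm posting it here in case it helps someone else.

It is probably a very slow and inefficient way to do it, but it works for me.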

function explode_adv($openers, $closers, $togglers, $delimiters, $str)
{
    $chars = str_split($str);
    $parts = [];
    $nextpart = "";
    $toggle_states = array_fill_keys($togglers, false); // true = now inside, false = now outside
    $depth = 0;
    foreach($chars as $char)
    {
        if(in_array($char, $openers))
            $depth++;
        elseif(in_array($char, $closers))
            $depth--;
        elseif(in_array($char, $togglers))
        {
            if($toggle_states[$char])
                $depth--; // we are inside a toggle block, leave it and decrease the depth
            else
                // we are outside a toggle block, enter it and increase the depth
                $depth++;

            // invert the toggle block state
            $toggle_states[$char] = !$toggle_states[$char];
        }
        else
            $nextpart .= $char;

        if($depth < 0) $depth = 0;

        if(in_array($char, $delimiters) &&
           $depth == 0 &&
           !in_array($char, $closers))
        {
            $parts[] = substr($nextpart, 0, -1);
            $nextpart = "";
        }
    }
    if(strlen($nextpart) > 0)
        $parts[] = $nextpart;

    return $parts;
}

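Usage is as follows. explode_adv takes 5 arguments:

  1. An array of characters that open a block, e.g. [, (, etc.
  2. An array of characters that close a block, e.g. ], ), etc.
  3. An array of characters that toggle a block, e.g. ", ', etc.
  4. An array of characters that should cause a split into the next part.
  5. The string to process.

This approach may well have flaws; edits are welcome.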




