java - split with




Java/Clojure: multi-character delimiters, keeping the delimiters

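I'm working on a Clojure project, which can interoperate with any Java class, so the answer to my question can be in either Java or Clojure.

Basically, I need to be able to split a string into components based on a given delimiter (which will be more than one character), while also keeping the delimiters.

For example: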

splitting "test:test:test" on ":"  => [ "test" ":" "test" ":" "test" ]
splitting "::test::test::" on "::" => [ "::" "test" "::" "test" "::" ]

I tried Clojure's clojure.string/split, but it doesn't actually return the delimiters. The next closest thing was StringTokenizer, which does return the delimiters but doesn't accept multi-character delimiters.
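To illustrate the StringTokenizer limitation, here is a minimal sketch. The class and method names (TokenizerDemo, tokens) are made up for this example. With returnDelims set to true the delimiters come back as tokens, but the delimiter string is treated as a set of single characters, so "::" is returned as two separate ":" tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of the original question.
final class TokenizerDemo {
    // Collects every token, including delimiters (returnDelims = true).
    static List<String> tokens(String s, String delims) {
        StringTokenizer st = new StringTokenizer(s, delims, true);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Single-character delimiter: works as wanted.
        System.out.println(tokens("test:test:test", ":"));
        // prints [test, :, test, :, test]

        // Two-character delimiter: each ':' is returned on its own.
        System.out.println(tokens("::test::test::", "::"));
        // prints [:, :, test, :, :, test, :, :]
    }
}
```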

Does anyone know of a solution other than just breaking the string into a sequence of characters and running a weird reduce over it?


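Here's a version that builds a regex to match the gaps before and after the delimiter, rather than the delimiter string itself (assuming d contains no regex special characters):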

=> (defn split-with-delim [s d]
     (clojure.string/split s (re-pattern (str "(?=" d ")|(?<=" d ")"))))
#'user/split-with-delim
=> (split-with-delim "test:test:test" ":")
["test" ":" "test" ":" "test"]
=> (split-with-delim "::test::test::" "::")
["" "::" "test" "::" "test" "::"]
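The same lookahead/lookbehind idea can be sketched in Java, this time wrapping the delimiter in Pattern.quote so that regex special characters in d are matched literally. The names LookaroundSplit and splitWithDelim are assumptions made for this example. Note that on Java 8 and later, split does not produce a leading empty string for a zero-width match at the start of the input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class LookaroundSplit {
    // Splits at the zero-width positions just before and just after
    // each literal occurrence of d, so the delimiters are kept.
    static List<String> splitWithDelim(String s, String d) {
        String q = Pattern.quote(d); // treat d as literal text, not a regex
        String regex = "(?=" + q + ")|(?<=" + q + ")";
        return Arrays.asList(s.split(regex));
    }

    public static void main(String[] args) {
        System.out.println(splitWithDelim("test:test:test", ":"));
        // prints [test, :, test, :, test]

        // "." is a regex metacharacter; quoting keeps it literal.
        System.out.println(splitWithDelim("a.b.c", "."));
        // prints [a, ., b, ., c]
    }
}
```

One caveat with this approach: when occurrences of the delimiter overlap (for example ":::" with the delimiter "::"), a split happens at every position where a delimiter starts or ends, which may not be the tokenization you expect.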

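(require '[clojure.string :as str])

;; Split on d (treated as a regex), then put d back between the pieces.
;; interpose only inserts d between elements, so a trailing delimiter is lost.
(defn split-it [s d]
  (interpose d (str/split s (re-pattern d))))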

(split-it "test:test:test" ":")
=> ("test" ":" "test" ":" "test")

(split-it "::test::test::" "::")
=> ("" "::" "test" "::" "test")
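If the leading empty string and the missing trailing delimiter in the output above are a problem, one alternative is to avoid regexes entirely and scan with String.indexOf. The following is a minimal sketch under that assumption, not something taken from the answers here; KeepDelimSplit and split are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration.
final class KeepDelimSplit {
    // Splits s on the literal delimiter d, keeping every delimiter
    // (including leading and trailing ones) and dropping empty pieces.
    static List<String> split(String s, String d) {
        if (d.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> out = new ArrayList<>();
        int from = 0;
        int hit;
        while ((hit = s.indexOf(d, from)) >= 0) {
            if (hit > from) {
                out.add(s.substring(from, hit)); // text before the delimiter
            }
            out.add(d);                          // the delimiter itself
            from = hit + d.length();
        }
        if (from < s.length()) {
            out.add(s.substring(from));          // text after the last delimiter
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("test:test:test", ":"));
        // prints [test, :, test, :, test]
        System.out.println(split("::test::test::", "::"));
        // prints [::, test, ::, test, ::]
    }
}
```

Because the scan moves past each delimiter it finds, occurrences are matched left to right without overlap, and no regex escaping is needed. From Clojure this could be called through normal Java interop once the class is on the classpath.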



