# formal languages – Is string splitting formally defined when the string delimiter is an empty string?

Given sequences $$A = (a_1, \ldots, a_n)$$ and $$B = (b_1, \ldots, b_m)$$, write $$A + B = (a_1, \ldots, a_n, b_1, \ldots, b_m)$$ for their concatenation. Given a sequence of sequences $$C = (C_1, \ldots, C_n)$$, write $$\Sigma(C) = C_1 + \cdots + C_n$$ for the concatenation of all its members.

Define a splitting of a sequence $$X$$ by a sequence $$Y$$ to be a sequence of sequences $$Z = (Z_1, \ldots, Z_k)$$ such that $$Z_i \neq Y$$ and $$Z_i \neq ()$$ for all $$i = 1, \ldots, k$$, and
$$X = \Sigma (Z_1, Y, Z_2, Y, Z_3, \ldots, Y, Z_k).$$

For example, $$((a), (b, c))$$ is a splitting of $$(a, u, v, b, c)$$ by $$(u, v)$$, since $$(a) + (u, v) + (b, c) = (a, u, v, b, c)$$.
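The definition can be checked mechanically. The following is a minimal Python sketch, assuming sequences are modelled as tuples; the name `is_splitting` is hypothetical and chosen only for illustration.

```python
from itertools import chain


def is_splitting(x, y, z):
    """Return True if z is a splitting of x by y in the sense defined above.

    x and y are sequences; z is a sequence of sequences (the pieces Z_1..Z_k).
    Hypothetical helper, written only to illustrate the definition.
    """
    x, y = tuple(x), tuple(y)
    pieces = [tuple(p) for p in z]

    # Every piece must differ from the delimiter and from the empty sequence.
    if any(p == y or p == () for p in pieces):
        return False

    # Build (Z_1, Y, Z_2, Y, ..., Y, Z_k): the delimiter goes between pieces.
    interleaved = []
    for i, p in enumerate(pieces):
        if i > 0:
            interleaved.append(y)
        interleaved.append(p)

    # Sigma: concatenate everything and compare with x.
    return tuple(chain.from_iterable(interleaved)) == x


print(is_splitting(("a", "u", "v", "b", "c"), ("u", "v"), [("a",), ("b", "c")]))
print(is_splitting(("a", "b", "c"), (), [("a", "b"), ("c",)]))
```

Both calls print `True`. The second one already uses the empty sequence as the delimiter, where the interleaved delimiters contribute nothing to the concatenation.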

Splitting by the empty sequence is not unique: $$(a, b, c)$$ may be split by $$()$$ in several ways, among them $$((a),(b),(c))$$, $$((a,b), (c))$$ and $$((a,b,c))$$. From a theoretical point of view this is a rather trivial and uninteresting observation. The implementors of various string libraries still need to deal with splitting by the empty sequence somehow, and as you show, they do.
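To make the non-uniqueness concrete, here is a small Python sketch that enumerates every splitting of a non-empty sequence by the empty sequence. Under the definition above these are exactly the ways of cutting the sequence into non-empty contiguous pieces. The function name `splittings_by_empty` is hypothetical; the final lines show one real library's choice, namely that Python's built-in `str.split` refuses an empty separator.

```python
from itertools import combinations


def splittings_by_empty(x):
    """Yield every splitting of the non-empty sequence x by the empty sequence.

    Hypothetical helper: each splitting corresponds to a choice of cut
    points among the len(x) - 1 interior positions of x.
    """
    x = tuple(x)
    n = len(x)
    for r in range(n):
        for cuts in combinations(range(1, n), r):
            bounds = (0, *cuts, n)
            # Consecutive bounds delimit one non-empty piece each.
            yield tuple(x[i:j] for i, j in zip(bounds, bounds[1:]))


for z in splittings_by_empty(("a", "b", "c")):
    print(z)

# Python's own str.split sidesteps the ambiguity by rejecting "".
try:
    "abc".split("")
except ValueError as err:
    print("str.split rejects the empty separator:", err)
```

For a sequence of length $$n \geq 1$$ this yields $$2^{n-1}$$ splittings, so $$(a, b, c)$$ has four; a library that accepts an empty delimiter has to pick one of them by convention.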
