There are a number of issues here, with different answers.

Can Merkle trees use a commutative operation in general to combine hashes?

Yes, but only if they aren’t intended to commit to the order of the leaves.

Clearly when a commutative operation is used, (A,B,C,D) and (D,C,A,B) will hash to the same thing. This is not a problem if the Merkle root is intended to be a commitment to the (multi)set of leaves, but it is if it is intended to be a commitment to the list.

Could the Bitcoin transaction Merkle tree have used a commutative operation?

Maybe, it’s hard to talk about hypotheticals.

The order of transactions in a block is relevant (transactions can spend outputs created by previous transactions in the same block), so you want to prevent a peer from reordering the transactions to invalidate it without breaking the commitment. Obviously alternative solutions could have been used here, either by encoding the order explicitly, or by performing a topological sort on the set of transactions before verification.

Obviously this cannot be changed anymore in the actual Bitcoin protocol without a very invasive hardfork.

Can you use addition or xor as commutative hash combination function?

Not without security reduction.

Imagine a 2-element Merkle tree with leaf elements A and B. Their hash is H = hash(leafhash(A) + leafhash(B)). An attacker who knows A and B can use a generalized birthday attack to find other leaves C and D such that leafhash(C) + leafhash(D) = leafhash(A) + leafhash(B). Perhaps surprisingly, this only needs ~2128 work if leafhash is a generic (and secure) 256-bit hash function. By doing so, the attacker has managed to perform a second preimage attack on the Merkle tree construction, in only the square root of the time that would normally be expected for second preimage security.

There may be reductions in other security properties too.

Are there other feasible commutative hash combination functions then?

Yes, but they don’t shrink the data.

For example, if the hashes are treated as integers modulo a large prime, then combining child hashes x and y as (x+y,xy) works (because the sum and product uniquely identify the inputs, but not their order). When working in a large characteristic-2 finite field (e.g. GF(2256)), (x+y,x3+y3) also works.

Another much simpler possibility is simply sorting the elements: combining x and y as (min(x,y),max(x,y)) works fine.

If using secure commutative combination functions doesn’t shrink the data, then is there any point?

It means that you can prove an element is in the Merkle tree without revealing its position.

This is a minor bandwidth gain (log2(n) bits for a tree with n elements), reduces some implementation complexity, and may be a slight privacy improvement if the positions are sensitive. In fact, this approach is used in the proposed Taproot script trees (concatenating the hashes after sorting), precisely for the reason above.

Disclaimer: I’m a co-author of that proposal.