The question comes from "How to separate a list of serial numbers into multiple lists with matched prefix?" on Stack Overflow.
Input:
sn = ['bike-001', 'bike-002', 'car/001', 'bus/for/001', 'car/002', 'bus/for/002']
Intended output:
# strings with the same prefix go in the same list, e.g.:
sn1 = ['bike-001', 'bike-002']
sn2 = ['car/001', 'car/002']
sn3 = ['bus/for/001', 'bus/for/002']
The original thread already has a brilliant answer using .startswith(<sub_str>); however, I still want to use regex to solve the problem.
Here's what I've tried: I use re.sub() to get the prefix and re.search() to get the 3-digit serial number. I'd like to know if there is a better way (such as a single regex call) to get the solution.
import re

sn = ['bike-001', 'bike-002', 'car/001', 'bus/for/001', 'car/002', 'bus/for/002']
sn_dict = {}
for item in sn:
    # Remove the 3-digit serial number to get the prefix
    category = re.sub(r'\d{3}', "", item)
    # Extract the 3-digit serial number
    number = re.search(r'\d{3}', item).group()
    # Start a new list the first time a prefix is seen
    if category not in sn_dict:
        sn_dict[category] = []
    sn_dict[category].append(category + number)
After running the script we will have the following sn_dict:
{
    'bike-': ['bike-001', 'bike-002'],
    'car/': ['car/001', 'car/002'],
    'bus/for/': ['bus/for/001', 'bus/for/002']
}
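For comparison, here is a minimal sketch of a one-regex version. It assumes every serial number is some prefix followed by exactly three digits at the end of the string. The names pattern and groups are my own for illustration and do not come from the original thread. A single pattern with one capture group grabs the prefix, and a defaultdict removes the need for the "if category not in sn_dict" check.

```python
import re
from collections import defaultdict

sn = ['bike-001', 'bike-002', 'car/001', 'bus/for/001', 'car/002', 'bus/for/002']

# Assumption: each item is "<prefix><3 digits>" with the digits at the very end.
# Group 1 captures the prefix; the trailing \d{3} is the serial number.
pattern = re.compile(r'(.+?)\d{3}')

groups = defaultdict(list)
for item in sn:
    match = pattern.fullmatch(item)
    if match is None:
        # Skip anything that does not fit the assumed format
        continue
    groups[match.group(1)].append(item)

# Convert back to a plain dict for display
sn_dict = dict(groups)
print(sn_dict)
```

This is not quite the same as the re.sub() approach on unusual input: fullmatch() rejects strings that do not end in three digits, whereas re.sub() would strip every non-overlapping run of three digits it finds anywhere in the string.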