python – Capitalizing the first letter of every word in a string (with arbitrary spacing)

  • remove the unused imports

Your logic is fine, but the implementation is more complicated than it needs to be. Take a piece of paper and write down how YOU would do this by hand, with just a pencil.

read the names individually... make first character capital if it isn't a digit 

Now that you have the basic design, make it more specific, in Python terms:

  • read the names individually: for word in string.split()
  • make the first character capital: word.title()
  • if it isn't a digit: if not word[0].isdigit()

The primary problem which I faced is handling arbitrary spaces

string.split() returns the same list of words whether they are separated by 1 space or 1000 spaces.

Now you have exactly what you need, it is just a matter of putting it together.

For every word in words, capitalize it if the first character isn't a digit; otherwise leave it unchanged:
    return " ".join(word.title() if not word[0].isdigit() else word for word in words.split())

Furthermore, using capitalize() avoids the extra check, since uppercasing a leading digit leaves it unchanged:

    return " ".join(word.capitalize() for word in words.split(' '))
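To see why the digit check matters for title() but not for capitalize(), here is a small sketch. The sample word 12abc is made up for illustration.

```python
# Hypothetical sample word that starts with a digit.
word = "12abc"

# title() uppercases the first letter that follows a non-letter,
# so the 'a' after the digits gets capitalized.
titled = word.title()

# capitalize() uppercases only the first character (a digit here, so
# nothing changes) and lowercases the rest of the string.
capitalized = word.capitalize()

print(titled)       # 12Abc
print(capitalized)  # 12abc
```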

EDIT:

You have to use .split(' ') and NOT .split(): .split() discards every run of whitespace, so the original spacing cannot be restored when the words are joined again.
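The difference between the two calls is easy to check in an interpreter. A minimal sketch, with a made-up input string:

```python
# Hypothetical input with three spaces between the words.
text = "hello   world"

# No argument: any run of whitespace counts as a single separator.
collapsed = text.split()
print(collapsed)  # ['hello', 'world']

# Explicit ' ': every single space is a separator, so consecutive
# spaces produce empty strings in the result.
kept = text.split(' ')
print(kept)  # ['hello', '', '', 'world']

# Joining the second list with ' ' rebuilds the original spacing.
print(' '.join(kept) == text)       # True
print(' '.join(collapsed) == text)  # False
```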


As you mentioned, title() and capitalize() fail when you pass something like

ALLISON heck

Output (both methods also lowercase the rest of each word)

Allison Heck

In that case, you need extra checks. The best approach is to write another function that capitalizes only the first letter and leaves the rest of the word untouched.

Here is what I came up with:

def cap_first(word):
    return word[:1].upper() + word[1:]

The solve function remains the same:

def solve(words):
    return ' '.join(cap_first(word) for word in words.split(' '))
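Putting the two functions together, a quick check might look like the sketch below. The input strings are made up for illustration.

```python
def cap_first(word):
    # Slicing never raises IndexError, so this is safe for the empty
    # strings that split(' ') produces between consecutive spaces.
    return word[:1].upper() + word[1:]


def solve(words):
    # Split on single spaces so the original spacing survives the join.
    return ' '.join(cap_first(word) for word in words.split(' '))


print(solve("ALLISON heck"))  # ALLISON Heck
print(solve("chris   alan"))  # Chris   Alan
print(solve("12abc def"))     # 12abc Def
```

Using word[:1] rather than word[0] is what makes the empty strings harmless: an empty slice simply stays empty.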

Benchmarks

The latter code is certainly more readable and compact, but how does its performance compare to the previous solution?

I will measure the execution time with the time module, as follows:

import time

# solvenew, solveoriginal and names are assumed to be defined already
for iterations in (10 ** 5, 11 ** 5, 10 ** 6, 11 ** 6):
    print(f"\n{iterations} iterations\n")


    start = time.time()
    for _ in range(iterations): solvenew(names)
    print(f"Time taken for new function: {time.time() - start:.3f} s")

    start = time.time()
    for _ in range(iterations): solveoriginal(names)
    print(f"Time taken for original function: {time.time() - start:.3f} s")

Here are the results

#                      Time taken 
#
#     iterations  |  original    |   new 
#   --------------------------------------
#       10 ** 6   |   2.553 s    |  2.106 s
#   --------------------------------------
#       11 ** 6   |   6.203 s    |  5.542 s
#   --------------------------------------
#       10 ** 7   |   32.412 s   |  24.774 s

Feel free to try it yourself
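If you want to repeat the measurement, the standard timeit module is one alternative to hand-written time.time() loops. The sketch below is not the benchmark that produced the numbers above: the function names solve_new and solve_capitalize, the input string, and the call count are all assumptions chosen for illustration.

```python
import timeit


def cap_first(word):
    return word[:1].upper() + word[1:]


def solve_new(words):
    # Approach based on cap_first.
    return ' '.join(cap_first(word) for word in words.split(' '))


def solve_capitalize(words):
    # Stand-in for the implementation you want to compare against;
    # swap in your own function here.
    return ' '.join(word.capitalize() for word in words.split(' '))


names = "chris alan   12abc"  # hypothetical test input

for func in (solve_new, solve_capitalize):
    # repeat=3 runs the measurement three times; taking the minimum
    # reduces noise from other processes.
    best = min(timeit.repeat(lambda: func(names), number=10_000, repeat=3))
    print(f"{func.__name__}: {best:.3f} s for 10,000 calls")
```

Timings depend on the machine and the Python version, so expect different absolute numbers.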