TheDeveloperBlog.com


Python String Examples, Methods


Strings. These contain characters. And characters represent letters, digits, any kind of data. Strings are the basis of much of the world's usable information.


Strings are important. Nearly every Python program handles string data. They are versatile, easily persisted to disk, and fairly fast.


Triple-quoted. This literal syntax form begins and ends with three quotes. Newlines are left as is. In triple-quoted syntax, we do not need to escape quotes.

Note: In this syntax, the Python interpreter scans for three quotes for the start and end of the literal. Three single quotes can be used instead of three double quotes.

Note 2: We must surround a triple-quoted string with the same kind of quotes. Double quotes must be matched with double quotes.

Based on:

Python 3

Python program that uses triple-quotes

# Use a triple-quoted string.
v = """This string has
triple "quotes" on it."""

print(v)

Output

This string has
triple "quotes" on it.

Length, len. With the len built-in, we easily count the number of characters in a string. We cannot count just some characters. A for-loop would be needed for character testing.
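Example: Here is a minimal sketch of len, followed by a for-loop that counts only some characters. The sample string and the names total and vowels are chosen only for illustration.

```python
value = "programming"

# Count all characters with the len built-in.
total = len(value)
print(total)

# Count only the vowels with a for-loop and a test.
vowels = 0
for c in value:
    if c in "aeiou":
        vowels += 1
print(vowels)
```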


Loops. We often loop over characters. This can be done with a for-loop. We can directly loop over a string. Or we can use a for-loop with the range built-in to loop over indexes.

Tip: If you need to get adjacent characters, or test many indexes at once, the for-loop that uses range() is best.

Python program that uses for-loop on strings

s = "abc"

# Loop over string.
for c in s:
    print(c)

# Loop over string indexes.
for i in range(0, len(s)):
    print(s[i])

Output

a    Loop 1
b
c
a    Loop 2
b
c

In example. A string is a sequence of characters. It supports sequence operators, such as the in-keyword. On a string, the in-keyword searches the string.


And: It returns True if the substring is found, and False otherwise. This is a predicate.

Case: Character casing matters. The in-operator returns False if you search for the wrong case of characters.

Python program that uses in-operator

s = "dot net perls"

# Use the in-operator.
if "dot" in s:
    print("dot")

if "perls" in s:
    print("perls")

if " " in s:
    print("space")

if "D" in s:
    # Not reached, case matters.
    print("D")

if "lycurgus" in s:
    # Not reached.
    print("lycurgus")

Output

dot
perls
space

Find, rfind. The in-operator tells us if the string is found. But find() tells the index of the string. We use find, and rfind, to search for strings when we need to know their positions.

Further: Python provides the index and rindex methods. These are similar to find, but behave differently when no string is found: find returns -1, while index raises a ValueError.
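Example: This short sketch calls find, rfind and index on a sample string. The string and variable names are assumptions made for illustration.

```python
value = "cat and cat"

# Find returns the first index, rfind the last one.
first = value.find("cat")
last = value.rfind("cat")
print(first, last)

# Find returns -1 when the substring is not present.
missing = value.find("dog")
print(missing)

# Index raises ValueError in the same situation.
try:
    value.index("dog")
    raised = False
except ValueError:
    raised = True
print(raised)
```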


Add, multiply. A string can be added to another string. And it can be multiplied by a number. This copies that same string into a new string the specified number of times.

Caution: If you multiply a string against a large number, you may get a MemoryError. An if-check to prevent this may be helpful.

Python program that adds, multiplies strings

s = "abc?"

# Add two strings together.
add = s + s
print(add)

# Multiply a string.
product = s * 3
print(product)

Output

abc?abc?
abc?abc?abc?

Slice, substring. We take substrings with the slice syntax. No substring method exists. We cannot assign a slice in a string, but we can concatenate slices together.
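Example: A minimal sketch of slice syntax. Because we cannot assign to a slice, the last step builds a new string from two slices. The sample string is only an illustration.

```python
value = "abcdef"

# Take the first three characters.
start = value[:3]
print(start)

# Take characters from index 2 up to (not including) index 4.
middle = value[2:4]
print(middle)

# Strings are immutable, so concatenate slices into a new string.
changed = value[:2] + "X" + value[3:]
print(changed)
```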


Split, splitlines. Strings often contain many words or parts. Often we need to extract their parts. For this we use split or related methods like splitlines, rsplit or even partition.
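Example: Here we try split, rsplit, splitlines and partition in turn. This is only a sketch, and the input strings are made up for the example.

```python
line = "a,b,c,d"

# Split on a comma delimiter.
parts = line.split(",")
print(parts)

# Rsplit with at most one split, starting from the right.
right = line.rsplit(",", 1)
print(right)

# Splitlines separates on line breaks.
lines = "one\ntwo\nthree".splitlines()
print(lines)

# Partition returns a 3-tuple: before, separator, after.
head, sep, tail = "key=value".partition("=")
print(head, tail)
```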


Tip: We also benchmark split() calls with different syntax. We find the fastest way to call split().


Join. This method combines strings in a list or other iterable collection. It is called with an unusual syntax form. We call it on the delimiter that will come between the iterable's values.

Tip: Join only places delimiters between strings, not at the start or end of the result. This is different from some loop-based approaches.

Python program that uses join on list

# Avoid the name "list" here, as it would shadow the built-in type.
values = ["a", "b", "c"]

# Join with empty string literal.
result = "".join(values)

# Join with comma.
result2 = ",".join(values)

# Display results.
print(result)
print(result2)

Output

abc
a,b,c

Lower, upper. Letters are lowercase or uppercase. Other characters, like digits and whitespace, have no case. The upper() and lower() methods change the case of letters.


Capitalize: We cover the capitalize method as well as the title method, which capitalizes each word.

Islower, isupper: We test if strings are already uppercase or lowercase with the isupper and islower methods.
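Example: A small sketch of the casing methods named above. The sample string is an assumption chosen so that each method gives a different result.

```python
value = "hello World"

# Change the case of all letters.
upper = value.upper()
lower = value.lower()
print(upper)
print(lower)

# Capitalize uppercases the first letter and lowercases the rest.
cap = value.capitalize()
print(cap)

# Title uppercases the first letter of each word.
title = value.title()
print(title)

# Test the casing of existing strings.
print(upper.isupper())
print(value.islower())
```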


Count. Searching strings is a common task. The count() method is a convenient option. It receives one to three arguments. The first argument is the substring we want to count.

Indexes: The second and third arguments to count() are the start index and the end index of the range we are searching. The end index is exclusive.

Note: The substring must be contained completely within those indexes to be counted.

Python program that uses count

value = "finnegans wake"

# Count this substring.
print(value.count("n"))

# Count substring from index 0 up to (not including) index 6.
print(value.count("n", 0, 6))

Output

3
2

Startswith, endswith. All things have a start and an end. Often we need to test the starts and ends of strings. We use the startswith and endswith methods.

Next: We use startswith and endswith on an example string, a place name in New York.

Python that uses startswith, endswith

# Input string.
s = "voorheesville"

if s.startswith("voo"):
    print("1")

if s.endswith("ville"):
    print("2")

if s.startswith("stuy"):
    # Not reached.
    print("3")

Output

1
2

Ljust, rjust. These pad strings. They accept one or two arguments. The first argument is the total length of the result string. The second is the padding character.

Tip: If you specify a number that is too small, ljust and rjust do nothing. They return the original string.

Python that uses ljust, rjust

s = "Paris"

# Justify to left, add periods.
print(s.ljust(10, "."))

# Justify to right.
print(s.rjust(10))

Output

Paris.....
     Paris

Replace. A string cannot be changed in-place. We cannot assign characters. Instead we use the replace method to create a new string.

Arguments: Replace accepts two substring arguments: the "before" and "after" parts that are replaced.

And: The third argument is optional. It is a count. It indicates the maximum number of instances to replace.

Tip: Replace handles one substring at a time (a single character counts as a one-character substring). If you need to replace many different single characters, consider the translate method.

Python that uses replace

value = "aabc"

# Replace a substring with another.
result = value.replace("bc", "yz")
print(result)

# Replace the first occurrence with a substring.
result = value.replace("a", "x", 1)
print(result)

Output

aayz
xabc

Equals. Strings can be compared for exact equality with two equals signs. With this syntax, the characters (and their cases) are compared. If not equal, the test returns False.

Casefold: Newer versions of Python support the casefold method. Similar to lower(), it handles Unicode characters better.

Lower: We can also use the lower method to standardize the casing of strings in our programs.

Python that tests string equality

value = "CAT"

if value == "cat":
    print("A") # Not reached.

if value == "CAT":
    print("B")

if value.casefold() == "cat":
    print("C")

if value.lower() == "cat":
    print("D")

Output

B
C
D

Raw literals. By prefixing a string literal with an r, we specify a raw string. In a raw string, the backslash character does not specify an escape sequence. It is a regular character.

Tip: Raw string literals are ideal for regular expression patterns. In "re" we often use the backslash.

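Here: The "\123" is treated as an octal escape sequence (the letter "S") in the normal string literal, but not in the raw one. The "\d" is not a recognized escape, so it is kept as-is, though recent Python versions warn about it.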

Python that uses raw string

# In a raw string "\" characters do not escape.
raw = r"\directory\123"
val = "\directory\123"

print(raw)
print(val)

Output

\directory\123
\directoryS

Format. This is a built-in function. With it, we take a value (like an integer or a string) and apply formatting to it. We receive the formatted string.

Comma: In the first example, the number 1000 is formatted as "1,000" by using a comma as the format string.

Padding: A ":" character is applied as padding. The "greater than" character, followed by 10s, means right-align in a string of length 10.

Python that uses format

# Format this number with a comma.
result = format(1000, ",")
print(result)

# Align to the right of 10 chars, filling with ":" and as a string.
result = format("cat", ":>10s")
print(result)

Output

1,000
:::::::cat

Ascii. For many English programs, the ASCII character set is sufficient. No letters with accents occur. But sometimes Unicode is needed.

Info: With ascii, a built-in, we escape Unicode characters into ASCII characters.

Python that uses ascii built-in

# This string contains an umlaut.
value = "Düsseldorf"
print(value)

# Display letter with escaped umlaut.
print(ascii(value))

Output

Düsseldorf
'D\xfcsseldorf'

Ascii_letters, lowercase, uppercase. The string module contains helpful constants. We can use these to loop over all lowercase or uppercase ASCII letters, or to build translation tables.

Tip: We import the string module and access these constants on it. This is an easy way to access the constants.

Python that uses ascii_letters constant

import string

# The letters constant is equal to lowercase + uppercase.
print(string.ascii_letters)
print(string.ascii_lowercase)
print(string.ascii_uppercase)

Output

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ

String.digits. This is another constant in the string module. If you ever need to loop over the strings "0" through "9" this is an option. The code is clean and simple.

Python that uses string.digits

import string

# Loop over digits using string.digits constant.
for digit in string.digits:
    print(digit)

Output

0
1
2
3
4
5
6
7
8
9

Punctuation, whitespace. Let us look at two more occasionally-helpful constants in the string module: punctuation and whitespace. These too can be looped over or tested.

Tip: Instead of testing these strings with "in," consider using methods like isspace().

Python that uses string.punctuation, whitespace

import string

# Display punctuation.
print(string.punctuation)

# The space is included in string.whitespace.
print(" " in string.whitespace)

Output

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
True

Ord, chr. Often we have one-character strings. We can convert these strings into integers with ord. And with chr we convert an integer to a 1-char string.
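Example: A minimal sketch that converts a letter to its integer code with ord, then converts a nearby code back with chr. The names code and letter are only for illustration.

```python
# Convert a one-character string to an integer.
code = ord("a")
print(code)

# Convert an integer back to a one-character string.
letter = chr(code + 1)
print(letter)
```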


Whitespace. We can trim, or strip whitespace characters. We introduce the strip method in Python. It optionally accepts an argument, but handles whitespace by default.
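Example: This sketch strips whitespace by default, and then strips a specific character by passing an argument. It also uses the related lstrip and rstrip methods, which the text above does not name. The sample strings are assumptions.

```python
value = "  cat  \n"

# Strip whitespace from both ends.
stripped = value.strip()
print(stripped)

# Lstrip and rstrip handle just one side.
left = value.lstrip()
right = value.rstrip()

# With an argument, strip removes those characters instead.
dots = "..cat..".strip(".")
print(dots)
```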


Textwrap: This is a helpful module included with Python. Textwrap offers a line-breaking algorithm.
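Example: Here is a short sketch of the textwrap module. The width of 20 and the sample sentence are arbitrary choices for the example.

```python
import textwrap

text = "The quick brown fox jumps over the lazy dog."

# Wrap returns a list of lines, each at most 20 characters wide.
lines = textwrap.wrap(text, width=20)
for line in lines:
    print(line)

# Fill returns the same lines joined with newlines.
filled = textwrap.fill(text, width=20)
print(filled)
```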


Translate. With translate, we provide a translation table to map characters to other characters. The maketrans method can be used to create the table. We can also remove characters with it.


Tip: Translate often can be used instead of single-character replace calls. This often leads to faster code.
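Example: A minimal sketch of maketrans and translate. The third argument to maketrans lists characters to remove. The mapping and the sample string are made up for illustration.

```python
value = "abc-123"

# Map "a" to "x" and "b" to "y"; delete the "-" character.
table = str.maketrans("ab", "xy", "-")

result = value.translate(table)
print(result)
```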


Markup. There is no one way to parse HTML. A popular approach is to use the html.parser module. From it, we access the HTMLParser type. We build a class that inherits from HTMLParser.

HTML: For HTML, we add methods to take action when elements are started and closed. Regular expressions may also be used.
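Example: This is one possible sketch of a class that inherits from HTMLParser. The class name TagLogger and its events list are hypothetical. The handle_starttag, handle_endtag and handle_data methods are the hooks that HTMLParser calls.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Hypothetical parser that records tags and text as they are seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called when an element is started.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called when an element is closed.
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Called for text between tags.
        self.events.append(("data", data))

parser = TagLogger()
parser.feed("<p>Hello <b>world</b></p>")
parser.close()

# Keep only the text, which removes the tags.
text = "".join(value for kind, value in parser.events if kind == "data")
print(text)
```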


XML: Similar to HTML, XML is standardized. It is often used to store settings or data, like a text-based database.
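Example: The text above does not name a module for XML. As an assumption, this sketch uses xml.etree.ElementTree from the standard library to read settings from a small, made-up document.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings document used only for this example.
data = "<settings><item name='width'>100</item><item name='height'>50</item></settings>"

# Parse the string into an element tree.
root = ET.fromstring(data)

# Read each item element into a dictionary.
settings = {}
for item in root.findall("item"):
    settings[item.get("name")] = int(item.text)

print(settings)
```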


Ciphers. These are one kind of string algorithm. They change the values of characters in a string. One common cipher is the ROT13 algorithm.
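Example: ROT13 rotates each ASCII letter by 13 places. Here is one way to sketch it with a translation table. The function name rot13 is our own choice, and other implementations are possible.

```python
import string

lower = string.ascii_lowercase
upper = string.ascii_uppercase

# Map each letter to the letter 13 places later, wrapping around.
table = str.maketrans(
    lower + upper,
    lower[13:] + lower[:13] + upper[13:] + upper[:13])

def rot13(value):
    """Apply ROT13 to ASCII letters; other characters are unchanged."""
    return value.translate(table)

encoded = rot13("Hello")
print(encoded)

# Applying ROT13 twice restores the original string.
print(rot13(encoded))
```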


Caesar: Julius Caesar, a Roman general and statesman, used a simple cipher to make confidential messages harder to read. We implement a Caesar cipher.
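Example: A Caesar cipher shifts each letter by a fixed amount. This is a minimal sketch that handles only ASCII letters. The function name caesar and the shift of 3 are assumptions for the example.

```python
def caesar(value, shift):
    """Shift ASCII letters by the given amount; leave other characters alone."""
    result = []
    for c in value:
        if "a" <= c <= "z":
            result.append(chr((ord(c) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= c <= "Z":
            result.append(chr((ord(c) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(c)
    return "".join(result)

secret = caesar("abc xyz", 3)
print(secret)

# A negative shift reverses the cipher.
print(caesar(secret, -3))
```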


StringIO. This is a buffer that can be used to quickly generate a large string. In my benchmarks, StringIO is faster for appending many strings in a loop.
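Example: Here is a small sketch of StringIO from the io module. We write many strings in a loop, then read the whole buffer once with getvalue. It shows usage only and makes no timing claims.

```python
import io

# Create an in-memory text buffer.
buffer = io.StringIO()

# Append many strings in a loop.
for i in range(5):
    buffer.write(str(i))
    buffer.write(",")

# Get the combined string, then release the buffer.
result = buffer.getvalue()
buffer.close()
print(result)
```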


String handling, lists. Python makes string handling simple. It considers text handling one of its core purposes. Strings are often used in lists.


Strings are great. But Python also provides regular expressions, in the re module, for textual processing. These are best used for complex pattern-matching.