Characters represent letters, digits, and other kinds of data. Strings are the basis of much of the world's usable information.
Strings are important. Nearly every Python program handles string data. They are versatile, easily persisted to disk, and fairly fast.
Triple-quoted. This literal syntax form begins and ends with three quotes. Newlines are left as is. In triple-quoted syntax, we do not need to escape quotes.
Note: In this syntax, the Python interpreter scans for three quotes for the start and end of the literal. Three single quotes can be used instead of three double quotes.
Note 2: We must surround a triple-quoted string with the same kind of quotes. Double quotes must be matched with double quotes.
Based on: Python 3

Python program that uses triple-quotes

    # Use a triple-quoted string.
    v = """This string has triple "quotes" on it."""
    print(v)

Output

    This string has triple "quotes" on it.
Length, len. With the len built-in, we easily count the number of characters in a string. We cannot count just some characters. A for-loop would be needed for character testing.
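As a small sketch of the point above, len returns the total character count, while counting only some characters needs a test inside a loop. The sample string and the choice of vowels as the test are illustrative assumptions.

```python
# len() counts every character, including the space.
value = "hello world"
total = len(value)

# To count only some characters, test each one in a for-loop.
vowels = 0
for c in value:
    if c in "aeiou":
        vowels += 1

print(total)   # 11
print(vowels)  # 3
```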
Loops. We often loop over characters. This can be done with a for-loop. We can directly loop over a string. Or we can use a for-loop with the range built-in to loop over indexes.
Tip: If you need to get adjacent characters, or test many indexes at once, the for-loop that uses range() is best.
Python program that uses for-loop on strings

    s = "abc"

    # Loop over string.
    for c in s:
        print(c)

    # Loop over string indexes.
    for i in range(0, len(s)):
        print(s[i])

Output

    a    Loop 1
    b
    c
    a    Loop 2
    b
    c
In example. A string is a sequence of characters. It supports sequence operators, such as the in-operator. On a string, the in-operator searches for a substring.
And: It returns True if the substring is found, and False otherwise. This is a predicate.
Case: Character casing matters. The in-operator returns False if you search for the wrong case of characters.
Python program that uses in-operator

    s = "The Developer Blog"

    # Use the in-operator.
    if "Developer" in s:
        print("Developer")
    if "Blog" in s:
        print("Blog")
    if " " in s:
        print("space")
    if "d" in s: # Not reached, case matters.
        print("d")
    if "lycurgus" in s: # Not reached.
        print("lycurgus")

Output

    Developer
    Blog
    space
Find, rfind. The in-operator tells us if the string is found. But find() tells the index of the string. We use find, and rfind, to search for strings when we need to know their positions.
Further: Python provides the index and rindex methods. These are similar to find and rfind, but they raise a ValueError when the substring is not found, where find and rfind return -1.
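A minimal sketch of these search methods follows. The sample string and variable names are illustrative only.

```python
value = "cat and dog and bird"

# find returns the index of the first match, rfind the last match.
first = value.find("and")
last = value.rfind("and")

# find returns -1 when the substring is not present.
missing = value.find("fish")

# index raises ValueError in the same situation.
try:
    value.index("fish")
    raised = False
except ValueError:
    raised = True

print(first, last, missing, raised)  # 4 12 -1 True
```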
Add, multiply. A string can be added to another string. And it can be multiplied by a number. This copies that same string into a new string the specified number of times.
Caution: If you multiply a string by a large number, you may get a MemoryError. An if-check to prevent this may be helpful.
Python program that adds, multiplies strings

    s = "abc?"

    # Add two strings together.
    add = s + s
    print(add)

    # Multiply a string.
    product = s * 3
    print(product)

Output

    abc?abc?
    abc?abc?abc?
Slice, substring. We take substrings with the slice syntax. No substring method exists. We cannot assign a slice in a string, but we can concatenate slices together.
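The slice syntax can be sketched as follows. The string and the variable names are illustrative. The last statement shows one way to "change" a character by concatenating slices, since assigning into a string is not allowed.

```python
value = "python"

first_three = value[0:3]   # "pyt"
last_two = value[-2:]      # "on"
middle = value[1:-1]       # "ytho"

# Strings are immutable, so build a new string from slices instead.
changed = value[:2] + "T" + value[3:]   # "pyThon"

print(first_three, last_two, middle, changed)
```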
Split, splitlines. Strings often contain many words or parts. Often we need to extract those parts. For this we use split or related methods like splitlines, rsplit or even partition.
Tip: We also benchmark split() calls with different syntax. We find the fastest way to call split().
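Here is a brief sketch of split and its relatives. The input strings are made up for illustration.

```python
line = "apple,banana,cherry"

# split separates on every delimiter.
parts = line.split(",")            # ['apple', 'banana', 'cherry']

# rsplit with a max count splits from the right.
last_split = line.rsplit(",", 1)   # ['apple,banana', 'cherry']

# partition splits once and also returns the separator.
head, sep, tail = line.partition(",")   # 'apple', ',', 'banana,cherry'

# splitlines breaks a string on line boundaries.
text = "one\ntwo\nthree"
lines = text.splitlines()          # ['one', 'two', 'three']

print(parts, last_split, head, tail, lines)
```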
Join. This method combines strings in a list or other iterable collection. It is called with an unusual syntax form. We call it on the delimiter that will come between the iterable's values.
Tip: Join only places delimiters between strings, not at the start or end of the result. This is different from some loop-based approaches.
Python program that uses join on list

    # Avoid naming the variable "list", which would shadow the built-in.
    items = ["a", "b", "c"]

    # Join with empty string literal.
    result = "".join(items)

    # Join with comma.
    result2 = ",".join(items)

    # Display results.
    print(result)
    print(result2)

Output

    abc
    a,b,c
Lower, upper. Letters are lowercase or uppercase. Other characters, like digits and whitespace, have no case. The upper() and lower() methods change the case of letters.
Capitalize: We cover the capitalize method as well as the title method, which capitalizes each word.
Islower, isupper: We test if strings are already uppercase or lowercase with the isupper and islower methods.
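The casing methods mentioned above can be sketched like this. The sample string is an arbitrary choice.

```python
value = "hello World"

upper = value.upper()             # "HELLO WORLD"
lower = value.lower()             # "hello world"
capitalized = value.capitalize()  # "Hello world"
titled = value.title()            # "Hello World"

# islower and isupper test the casing of the letters in a string.
print("abc".islower())   # True
print("ABC".isupper())   # True
print("Abc".isupper())   # False
# A string with no letters is neither lowercase nor uppercase.
print("123".islower())   # False
```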
Count. Searching strings is a common task. The count() method is a convenient option. It receives one to three arguments. The first argument is the substring we want to count.
Indexes: The optional second and third arguments to count() are the start index and the end index of the range we search. As with slices, the end index is exclusive.
Note: From my testing, the substring must be fully within those indexes to be counted.
Python program that uses count

    value = "finnegans wake"

    # Count this substring.
    print(value.count("n"))

    # Count substring in indexes 0 to 6.
    print(value.count("n", 0, 6))

Output

    3
    2
Startswith, endswith. All things have a start and an end. Often we need to test the starts and ends of strings. We use the startswith and endswith methods.
Next: We use startswith and endswith on an example string, a place name in New York.
Python that uses startswith, endswith

    # Input string.
    s = "voorheesville"

    if s.startswith("voo"):
        print("1")
    if s.endswith("ville"):
        print("2")
    if s.startswith("stuy"): # Not reached.
        print("3")

Output

    1
    2
Ljust, rjust. These pad strings. They accept one or two arguments. The first argument is the total length of the result string. The second is the padding character.
Tip: If the length you specify is not greater than the length of the string, ljust and rjust add no padding. They return the original string.
Python that uses ljust, rjust

    s = "Paris"

    # Justify to left, add periods.
    print(s.ljust(10, "."))

    # Justify to right.
    print(s.rjust(10))

Output

    Paris.....
         Paris
Replace. A string cannot be changed in-place. We cannot assign characters. Instead we use the replace method to create a new string.
Arguments: Replace accepts two substring arguments: the "before" and "after" parts that are replaced.
And: The third argument is optional. It is a count. It indicates the maximum number of instances to replace.
Tip: Replace handles one substring per call. If you need to replace many different single characters, please consider the translate method.
Python that uses replace

    value = "aabc"

    # Replace a substring with another.
    result = value.replace("bc", "yz")
    print(result)

    # Replace the first occurrence with a substring.
    result = value.replace("a", "x", 1)
    print(result)

Output

    aayz
    xabc
Equals. Strings can be compared for exact equality with two equals signs. With this syntax, the characters (and their cases) are compared. If they are not equal, the test evaluates to False.
Casefold: Python 3.3 and later support the casefold method. It is similar to lower(), but it handles Unicode characters more thoroughly for caseless comparisons.
Lower: We can also use the lower method to standardize the casing of strings in our programs.
Python that tests string equality

    value = "CAT"

    if value == "cat":
        print("A") # Not reached.
    if value == "CAT":
        print("B")
    if value.casefold() == "cat":
        print("C")
    if value.lower() == "cat":
        print("D")

Output

    B
    C
    D
Raw literals. By prefixing a string literal with an r, we specify a raw string. In a raw string, the backslash character does not specify an escape sequence—it is a regular character.
Tip: Raw string literals are ideal for regular expression patterns. In "re" we often use the backslash.
Here: In the normal string literal, "\123" is treated as an octal escape sequence, which is the character "S". In the raw literal it stays as a backslash followed by the digits 123.
Python that uses raw string

    # In a raw string "\" characters do not escape.
    raw = r"\directory\123"
    val = "\directory\123"

    print(raw)
    print(val)

Output

    \directory\123
    \directoryS
Format. This is a built-in function. With it, we take a value (like an integer or a string) and apply formatting to it. We receive the formatted string.
Comma: In the first example, the number 1000 is formatted as "1,000" by using a comma as the format string.
Padding: A ":" character is applied as padding. The "greater than" character, followed by 10s, means right-align in a string of length 10.
Python that uses format

    # Format this number with a comma.
    result = format(1000, ",")
    print(result)

    # Align to the right of 10 chars, filling with ":" and as a string.
    result = format("cat", ":>10s")
    print(result)

Output

    1,000
    :::::::cat
Ascii. For many English programs, the ASCII character set is sufficient. No letters with accents occur. But sometimes Unicode is needed.
Info: With ascii, a built-in, we get a printable representation of a value in which non-ASCII characters are replaced with escape sequences.
Python that uses ascii built-in

    # This string contains an umlaut.
    value = "Düsseldorf"
    print(value)

    # Display letter with escaped umlaut.
    print(ascii(value))

Output

    Düsseldorf
    'D\xfcsseldorf'
Ascii_letters, lowercase, uppercase. The string module contains helpful constants. We can use these to loop over all lowercase or uppercase ASCII letters, or to build translation tables.
Tip: We import the string module and access these constants on it. This is an easy way to access the constants.
Python that uses ascii_letters constant

    import string

    # The letters constant is equal to lowercase + uppercase.
    print(string.ascii_letters)
    print(string.ascii_lowercase)
    print(string.ascii_uppercase)

Output

    abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
    abcdefghijklmnopqrstuvwxyz
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
String.digits. This is another constant in the string module. If you ever need to loop over the strings "0" through "9" this is an option. The code is clean and simple.
Python that uses string.digits

    import string

    # Loop over digits using string.digits constant.
    for digit in string.digits:
        print(digit)

Output

    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
Punctuation, whitespace. Let us look at two more occasionally-helpful constants in the string module: punctuation and whitespace. These too can be looped over or tested.
Tip: Instead of testing these strings with "in," consider using methods like isspace().
Python that uses string.punctuation, whitespace

    import string

    # Display punctuation.
    print(string.punctuation)

    # The space is included in string.whitespace.
    print(" " in string.whitespace)

Output

    !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    True
Ord, chr. Often we have one-character strings. We can convert these strings into integers with ord. And with chr we convert an integer to a 1-char string.
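A short sketch of ord and chr, using the letter "a" as an arbitrary example:

```python
# ord converts a one-character string to its integer code point.
code = ord("a")           # 97

# chr converts an integer back to a one-character string.
letter = chr(98)          # "b"

# Together they can step from one character to the next.
next_letter = chr(ord("a") + 1)   # "b"

print(code, letter, next_letter)
```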
Whitespace. We can trim, or strip, whitespace characters with the strip method. It optionally accepts an argument, but handles whitespace by default.
Textwrap: This is a helpful module included with Python. Textwrap offers a line-breaking algorithm.
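The following sketch shows strip and its variants, then a call to textwrap.fill. The sample strings and the width of 20 are arbitrary choices for illustration.

```python
import textwrap

value = "  cat \n"

both = value.strip()     # "cat"
left = value.lstrip()    # "cat \n"
right = value.rstrip()   # "  cat"

# With an argument, strip removes those characters instead of whitespace.
dashes = "--cat--".strip("-")   # "cat"

# textwrap.fill inserts newlines so that no line exceeds the width.
sentence = "The quick brown fox jumps over the lazy dog"
wrapped = textwrap.fill(sentence, width=20)
print(wrapped)
```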
Translate. With translate, we provide a translation table to map characters to other characters. The maketrans method can be used to create the table. The table can also specify characters to remove.
Tip: Translate often can be used instead of many single-character replace calls. This often leads to faster code.
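As a minimal sketch, str.maketrans builds a table from two equal-length strings, with an optional third string of characters to delete. The specific mappings here are made up for illustration.

```python
# Map a->x, b->y, c->z, and delete any "!" characters.
table = str.maketrans("abc", "xyz", "!")

result = "aabbcc!".translate(table)
print(result)   # xxyyzz
```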
Markup. There is no one way to parse HTML. A popular approach is to use the html.parser module. From it, we access the HTMLParser type. We build a class that inherits from HTMLParser.
HTML: For HTML, we add methods to take action when elements are started and closed. Regular expressions may also be used.
XML: Similar to HTML, XML is standardized. It is often used to store settings or data, like a text-based database.
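The HTMLParser approach described above might look like the sketch below. The class name TagCollector and its three list attributes are hypothetical names chosen for this example; handle_starttag, handle_endtag and handle_data are the HTMLParser methods being overridden.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Hypothetical subclass that records tags and text as it parses."""

    def __init__(self):
        super().__init__()
        self.started = []
        self.ended = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called when an element is started.
        self.started.append(tag)

    def handle_endtag(self, tag):
        # Called when an element is closed.
        self.ended.append(tag)

    def handle_data(self, data):
        # Called for text between tags.
        self.text.append(data)

parser = TagCollector()
parser.feed("<p>Hello <b>world</b></p>")
parser.close()

print(parser.started)
print(parser.ended)
print("".join(parser.text))
```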
Ciphers. These are one kind of string algorithm. They change the values of characters in a string. One common cipher is the ROT13 algorithm.
Caesar: Julius Caesar, a Roman general and statesman, used a simple cipher to make confidential messages harder to read. We implement a Caesar cipher.
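One possible way to write a Caesar cipher is with a translation table, as sketched here. The function name caesar and its parameters are assumptions for this example, not the only way to implement it. A shift of 13 gives ROT13.

```python
import string

def caesar(text, shift):
    """Shift ASCII letters by `shift` places; other characters are unchanged."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    shift %= 26  # Keeps negative and large shifts in range.
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift])
    return text.translate(table)

secret = caesar("Hello, World", 3)
plain = caesar(secret, -3)
rot13 = caesar("Hello", 13)

print(secret)   # Khoor, Zruog
print(plain)    # Hello, World
print(rot13)    # Uryyb
```

Shifting by the negative of the original amount reverses the cipher, and applying ROT13 twice returns the original text.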
StringIO. This is a buffer that can be used to quickly generate a large string. In my benchmarks, StringIO is faster for appending many strings in a loop.
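A small sketch of building a string with io.StringIO follows. The loop contents are arbitrary, and this example makes no performance claim of its own.

```python
import io

buffer = io.StringIO()

# Append many small pieces to the buffer.
for i in range(5):
    buffer.write(str(i))
    buffer.write(",")

# getvalue returns everything written so far as one string.
result = buffer.getvalue()
buffer.close()

print(result)   # 0,1,2,3,4,
```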
String handling, lists. Python makes string handling simple. It considers text handling one of its core purposes. Strings are often used in lists.
Strings are great. But Python also provides regular expressions, through the re module, for textual processing. Regular expressions are best used for complex pattern-matching.