
 

Python is a great language. It is relatively easy to learn and has an intuitive syntax. Its rich selection of libraries also contributes to its popularity and success.

However, it is not just about the third-party libraries. Base Python also provides numerous methods and functions that speed up and simplify typical data science tasks.

In this article, we will go over 15 built-in string methods in Python. You might already be familiar with some of them, but we will also see some less common ones.

The methods are quite self-explanatory, so I will focus on examples that demonstrate how to use them rather than on explaining what they do.

 

1. Capitalize

 
It makes the first character uppercase and the rest lowercase.

txt = "python is awesome!"

txt.capitalize()
'Python is awesome!'
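Because capitalize lowercases everything after the first character, it also normalizes mixed-case input and lowercases acronyms later in the string. A small sketch (the variable names are only for illustration):

```python
# capitalize() uppercases the first character and lowercases all the others
mixed = "pYTHON IS AWESOME!"
fixed = mixed.capitalize()
print(fixed)  # Python is awesome!

# Acronyms after the first word are lowercased as well
acronym = "nasa and ESA are space agencies"
print(acronym.capitalize())  # Nasa and esa are space agencies
```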

 

2. Upper

 
It makes all the letters uppercase.

txt = "Python is awesome!"

txt.upper()
'PYTHON IS AWESOME!'

 

3. Lower

 
It makes all the letters lowercase.

txt = "PYTHON IS AWESOME!"

txt.lower()
'python is awesome!'
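lower is handy for normalizing inconsistent text before comparing or counting it. A minimal sketch, assuming a hypothetical list of survey answers typed in different cases:

```python
# Hypothetical survey answers with inconsistent capitalization
answers = ["Yes", "YES", "yes", "No", "no"]

# Lowercase every answer so "Yes", "YES" and "yes" become the same value
normalized = [answer.lower() for answer in answers]
print(normalized)  # ['yes', 'yes', 'yes', 'no', 'no']

# Number of distinct answers after normalization
print(len(set(normalized)))  # 2
```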

 

4. Isupper

 
It checks if all the letters are uppercase.

txt = "PYTHON IS AWESOME!"

txt.isupper()
True

 

5. Islower

 
It checks if all the letters are lowercase.

txt = "PYTHON IS AWESOME!"

txt.islower()
False
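Both isupper and islower only look at cased characters, so digits, spaces, and punctuation do not affect the result. However, a string with no letters at all returns False for both. A short sketch:

```python
# Digits, spaces, and punctuation are ignored; only the letters are checked
upper_mixed = "PYTHON 3.9!".isupper()
lower_mixed = "python 3.9!".islower()
print(upper_mixed, lower_mixed)  # True True

# At least one cased character is required, so a string of digits fails both
print("2021".isupper(), "2021".islower())  # False False

# A mix of upper- and lowercase letters fails both checks
print("Python".isupper(), "Python".islower())  # False False
```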

 

The following three methods are similar, so the examples cover all of them together.

 

6. Isnumeric

 
It checks if all the characters are numeric.

 

7. Isalpha

 
It checks if all the characters are letters.

 

8. Isalnum

 
It checks if all the characters are alphanumeric (i.e., letters or numbers).

# Example 1
txt = "Python"

print(txt.isnumeric())
False

print(txt.isalpha())
True

print(txt.isalnum())
True

 

# Example 2
txt = "2021"

print(txt.isnumeric())
True

print(txt.isalpha())
False

print(txt.isalnum())
True

 

# Example 3
txt = "Python2021"

print(txt.isnumeric())
False

print(txt.isalpha())
False

print(txt.isalnum())
True

 

# Example 4
txt = "Python-2021"

print(txt.isnumeric())
False

print(txt.isalpha())
False

print(txt.isalnum())
False
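A few edge cases are easy to trip over when using these checks to validate data: a decimal point or a minus sign makes isnumeric return False, a space makes isalpha return False, and an empty string returns False for all three. A quick sketch:

```python
# A decimal point or a minus sign is not a numeric character
print("3.14".isnumeric())  # False
print("-5".isnumeric())    # False

# Spaces are not letters, so multi-word text fails isalpha
print("Data science".isalpha())  # False

# An empty string returns False for all three methods
empty = ""
print(empty.isnumeric(), empty.isalpha(), empty.isalnum())  # False False False
```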

 

9. Count

 
It counts the number of occurrences of the given character (or substring) in a string.

txt = "Data science"

txt.count("e")
2
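count works with substrings as well as single characters, is case sensitive, and accepts an optional start position. A short sketch using the same string:

```python
txt = "Data science"

# Substrings are counted too
print(txt.count("ce"))  # 1

# Matching is case sensitive: "D" is uppercase, so there is no lowercase "d"
print(txt.count("d"))          # 0
print(txt.lower().count("d"))  # 1

# Only count occurrences from index 2 onward
print(txt.count("a", 2))  # 1
```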

 

10. Find

 
It returns the index of the first occurrence of the given character in a string.

txt = "Data science"

txt.find("a")
1
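If the character is not in the string, find returns -1 instead of raising an error. The closely related index method returns the same result when there is a match but raises a ValueError otherwise. A sketch:

```python
txt = "Data science"

# No match: find() signals this with -1
print(txt.find("z"))  # -1

# index() gives the same position when there is a match...
print(txt.index("a"))  # 1

# ...but raises ValueError instead of returning -1
try:
    txt.index("z")
except ValueError as err:
    print("Not found:", err)
```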

 

We can also find the second or other occurrences of a character.
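find accepts an optional start position, so one way to locate the second occurrence is to start searching just after the first one. A minimal sketch, which also repeats the search in a loop to collect every position:

```python
txt = "Data science"

# Start the second search one position after the first match
first = txt.find("a")
second = txt.find("a", first + 1)
print(first, second)  # 1 3

# Collect every occurrence by searching again until find() returns -1
positions = []
start = txt.find("e")
while start != -1:
    positions.append(start)
    start = txt.find("e", start + 1)
print(positions)  # [8, 11]
```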

 

If we pass a sequence of characters, the find method returns the index where the sequence starts.
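For example, with the same string (note that the search is case sensitive):

```python
txt = "Data science"

# The index of the first character of the matching substring
print(txt.find("science"))  # 5
print(txt.find("sci"))      # 5

# "Science" with a capital S does not occur
print(txt.find("Science"))  # -1
```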

 

11. Startswith

 
It checks if a string starts with the given prefix. We can use this method as a filter in a list comprehension.

mylist = ["John", "Jane", "Emily", "Jack", "Ashley"]

j_list = [name for name in mylist if name.startswith("J")]

j_list
['John', 'Jane', 'Jack']
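startswith also accepts a tuple of prefixes and returns True if the string starts with any of them. A sketch using the same list:

```python
mylist = ["John", "Jane", "Emily", "Jack", "Ashley"]

# Keep names that start with either "J" or "A"
ja_list = [name for name in mylist if name.startswith(("J", "A"))]
print(ja_list)  # ['John', 'Jane', 'Jack', 'Ashley']
```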

 

12. Endswith

 
It checks if a string ends with the given suffix.

txt = "Python"

txt.endswith("n")
True

 

Both the endswith and startswith methods are case sensitive.

txt = "Python"

txt.startswith("p")
False

txt.startswith("P")
True
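When case should not matter, one common approach is to lowercase the string before checking it. The sketch below filters a hypothetical list of file names by extension; like startswith, endswith also accepts a tuple.

```python
# Hypothetical file names with inconsistent extension casing
files = ["sales.csv", "notes.txt", "REPORT.CSV", "data.json"]

# Lowercase before checking so ".CSV" and ".csv" both match
csv_files = [f for f in files if f.lower().endswith(".csv")]
print(csv_files)  # ['sales.csv', 'REPORT.CSV']

# A tuple matches any of the given suffixes
tabular = [f for f in files if f.lower().endswith((".csv", ".json"))]
print(tabular)  # ['sales.csv', 'REPORT.CSV', 'data.json']
```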

 

13. Replace

 
It replaces occurrences of a substring with the given string and returns the result as a new string.

txt = "Python is awesome!"

txt = txt.replace("Python", "Data science")

txt
'Data science is awesome!'
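Python strings are immutable, so replace never modifies the original; that is why the result is assigned back to txt in the example. It replaces every occurrence by default, and an optional third argument limits how many are replaced. A sketch:

```python
txt = "Python and data science and machine learning"

# Replace only the first occurrence
first_only = txt.replace("and", "&", 1)
print(first_only)  # Python & data science and machine learning

# Replace all occurrences; the original string is unchanged
new_txt = txt.replace("and", "&")
print(new_txt)  # Python & data science & machine learning
print(txt)      # Python and data science and machine learning
```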

 

14. Split

 
It splits a string at each occurrence of the specified separator and returns a list of the resulting parts.

txt = "Data science is awesome!"

txt.split()
['Data', 'science', 'is', 'awesome!']

 

By default, it splits at whitespace, but we can split on any character or sequence of characters instead.
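The sketch below splits a date string on a hyphen and also shows a difference worth knowing: with no argument, runs of whitespace count as a single separator, whereas passing " " explicitly produces empty strings between repeated spaces.

```python
date = "2021-05-14"
print(date.split("-"))  # ['2021', '05', '14']

txt = "Data   science"  # three spaces between the words
print(txt.split())      # ['Data', 'science']
print(txt.split(" "))   # ['Data', '', '', 'science']
```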

 

15. Partition

 
It splits a string into 3 parts at the given separator and returns a tuple containing the part before the separator, the separator itself, and the part after it.

txt = "Python is awesome!"
txt.partition("is")
('Python ', 'is', ' awesome!')

txt = "Python is awesome and it is easy to learn."
txt.partition("and")
('Python is awesome ', 'and', ' it is easy to learn.')

 

The partition method always returns exactly 3 parts. If the separator occurs more than once, only the first occurrence is used.

txt = "Python and data science and machine learning"
txt.partition("and")
('Python ', 'and', ' data science and machine learning')
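If the separator does not occur at all, partition still returns a 3-tuple: the whole string followed by two empty strings. Since the result can always be unpacked into three names, it is convenient for simple key=value text. A sketch with a hypothetical setting string:

```python
txt = "Python is awesome!"
print(txt.partition("and"))  # ('Python is awesome!', '', '')

# Hypothetical "key=value" setting
key, sep, value = "color=blue".partition("=")
print(key, value)  # color blue
```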

 

We can also do a similar operation with the split method by limiting the number of splits. However, there are two differences:

  • The split method returns a list, not a tuple
  • The returned list does not include the separator

txt = "Python and data science and machine learning"
txt.split("and", 1)
['Python ', ' data science and machine learning']
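Without the second argument, split breaks the string at every occurrence of the separator, and the limit can also be passed as the maxsplit keyword. The last lines use strip, a method not covered in this list, to remove the leftover spaces around each part. A sketch:

```python
txt = "Python and data science and machine learning"

# Split at every occurrence of "and"
print(txt.split("and"))  # ['Python ', ' data science ', ' machine learning']

# Same limit as split("and", 1), passed as a keyword
print(txt.split("and", maxsplit=1))  # ['Python ', ' data science and machine learning']

# Trim the surrounding spaces from each part
parts = [part.strip() for part in txt.split("and")]
print(parts)  # ['Python', 'data science', 'machine learning']
```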

 

Bonus

 
Thanks to Matheus Ferreira for reminding me of one of the greatest string methods: join. I use the join method too, but I forgot to add it here. It deserves a place on the list as a bonus.
The join method combines the strings in a collection into a single string.

mylist = ["Jane", "John", "Matt", "James"]

"-".join(mylist)

'Jane-John-Matt-James'
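join only accepts strings, so calling it on a list of numbers raises a TypeError. One common workaround is to convert the items first, for example with map(str, ...). A sketch:

```python
years = [2019, 2020, 2021]

# ", ".join(years) would raise TypeError because the items are ints
joined = ", ".join(map(str, years))
print(joined)  # 2019, 2020, 2021
```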

 

Let’s do an example with a tuple as well.

mytuple = ("Data science", "Machine learning")

" and ".join(mytuple)
'Data science and Machine learning'
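join and split also pair nicely: splitting on whitespace and joining the parts with a single space is a compact way to collapse repeated spaces in messy text. A sketch:

```python
messy = "  Data   science  is awesome!  "

# split() drops leading/trailing whitespace and collapses runs of spaces
clean = " ".join(messy.split())
print(clean)  # Data science is awesome!
```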

 

Conclusion

 
When doing data science, we deal with textual data a lot. Moreover, textual data usually requires much more preprocessing than plain numbers. Thankfully, Python’s built-in string methods handle many of these tasks efficiently.

Thank you for reading. Please let me know if you have any feedback.

 
Bio: Soner Yıldırım is a Junior Data Scientist at Invent Analytics and blogger.

Original. Reposted with permission.
