Formatting Functions
Available functions:
remove_stopwords(text): Remove stopwords from the input text using NLTK's stopwords.
remove_numbers(text): Remove numbers from the input text.
remove_whitespace(text): Remove excess whitespace from the input text.
normalize_whitespace(text): Collapse runs of consecutive whitespace in the input text into a single space.
seperate_symbols(text): Separate symbols and words with a space to ease tokenization.
remove_special_characters(text): Remove special characters from the input text.
standardize_text(text): Standardize the formatting of the input text.
tokenize_text(text): Tokenize the input text into individual words.
stem_words(words): Stem the input words using the Porter stemming algorithm.
lemmatize_words(words): Lemmatize the input words using WordNet lemmatization.
pos_tag(text): Perform part-of-speech (POS) tagging on the input text.
remove_profanity_from_text(text): Remove profane words from the input text.
remove_sensitive_info_from_text(text): Remove sensitive information from the input text.
remove_hate_speech_from_text(text): Remove hate speech and other offensive language from the input text.
post_format_text(text): Post-format the text using regular expressions.
Remove stopwords
Remove stopwords from the input text using NLTK's stopwords.
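A minimal sketch of stopword removal, assuming whitespace tokenization and case-insensitive matching. It is not DupliPy's implementation: the stopword set is passed in as a parameter so the example runs without downloading NLTK data.

```python
def remove_stopwords(text, stop_words):
    """Drop every whitespace-separated token whose lowercase form is a stopword."""
    return " ".join(word for word in text.split() if word.lower() not in stop_words)


# A tiny hand-picked set keeps the example self-contained.
example_stop_words = {"the", "is", "a", "on"}
print(remove_stopwords("The cat is on a mat", example_stop_words))  # cat mat
```

To use NLTK's list instead, run `nltk.download("stopwords")` once and pass `set(nltk.corpus.stopwords.words("english"))` as `stop_words`.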
Remove numbers
Remove numbers from the input text.
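One plausible way to remove numbers is a single regular-expression substitution. This sketch is an assumption, not DupliPy's source; it deletes runs of digits only, so decimal points and the spaces around a removed number stay behind.

```python
import re


def remove_numbers(text):
    """Delete every run of digits (\\d is Unicode-aware for str patterns)."""
    return re.sub(r"\d+", "", text)


print(repr(remove_numbers("Order 66 shipped in 2024")))  # 'Order  shipped in '
```

The leftover double space is why number removal is usually followed by a whitespace clean-up step.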
Remove whitespace
Remove excess whitespace from the input text.
Normalize whitespace
Collapse runs of consecutive whitespace in the input text into a single space.
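The two whitespace functions sound alike. The sketch below shows one plausible distinction, assumed here rather than taken from DupliPy's source: `remove_whitespace` also trims the ends of the string, while `normalize_whitespace` only collapses interior runs.

```python
import re


def remove_whitespace(text):
    """Collapse whitespace runs and trim leading/trailing whitespace."""
    return " ".join(text.split())


def normalize_whitespace(text):
    """Replace each run of whitespace (spaces, tabs, newlines) with one space."""
    return re.sub(r"\s+", " ", text)


sample = "  too   many\t\tspaces \n"
print(repr(remove_whitespace(sample)))     # 'too many spaces'
print(repr(normalize_whitespace(sample)))  # ' too many spaces '
```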
Separate symbols
Separate symbols and words with a space to ease tokenization.
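A minimal sketch of symbol separation, assuming a "symbol" is any character that is neither a word character nor whitespace. It uses the conventional spelling `separate_symbols`; the documented DupliPy identifier is `seperate_symbols`, and this is not its actual code. Note that apostrophes are split off too.

```python
import re


def separate_symbols(text):
    """Surround every non-word, non-space character with spaces, then tidy spacing."""
    spaced = re.sub(r"([^\w\s])", r" \1 ", text)
    return " ".join(spaced.split())


print(separate_symbols("Hello,world! It's 5:30."))
# Hello , world ! It ' s 5 : 30 .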
Remove special characters
Remove special characters from the input text.
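The docs do not define "special characters". This sketch assumes they are anything other than ASCII letters, digits, and whitespace; that choice also strips accented letters, so a Unicode-aware class such as `[^\w\s]` may suit multilingual text better.

```python
import re


def remove_special_characters(text):
    """Keep only ASCII letters, digits, and whitespace; drop everything else."""
    return re.sub(r"[^A-Za-z0-9\s]", "", text)


print(remove_special_characters("Price: $5 (50% off!)"))  # Price 5 50 off
```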
Standardize text
Standardize the formatting of the input text.
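What "standardize" covers is not specified here. As an illustration only, this sketch assumes three steps: Unicode NFKC normalization (which folds, for example, full-width letters into ASCII), lowercasing, and whitespace collapsing. DupliPy's own function may do more or less.

```python
import unicodedata


def standardize_text(text):
    """NFKC-normalize, lowercase, and collapse whitespace runs to single spaces."""
    normalized = unicodedata.normalize("NFKC", text)
    return " ".join(normalized.lower().split())


print(standardize_text("  Hello   WORLD\n"))  # hello world
print(standardize_text("\uff21\uff22\uff23"))  # abc (full-width ABC folded by NFKC)
```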
Tokenize text
Tokenize the input text into individual words.
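A dependency-free stand-in for word tokenization: a regular expression that emits runs of word characters and single punctuation marks as separate tokens. This is a swapped-in technique, not DupliPy's tokenizer. NLTK's `word_tokenize` (which needs the Punkt data) handles contractions differently, for example splitting "Don't" into `Do` and `n't`.

```python
import re


def tokenize_text(text):
    """Split text into word tokens and single-character punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenize_text("Don't panic, it's fine."))
# ['Don', "'", 't', 'panic', ',', 'it', "'", 's', 'fine', '.']
```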
Stem words
Stem the input words using the Porter stemming algorithm.
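A minimal sketch of Porter stemming with NLTK's `PorterStemmer`, which needs the `nltk` package but no downloaded corpus data. It illustrates the technique named above and is not necessarily how DupliPy wires it up.

```python
from nltk.stem import PorterStemmer


def stem_words(words):
    """Reduce each word to its Porter stem (stems need not be real words)."""
    stemmer = PorterStemmer()
    return [stemmer.stem(word) for word in words]


print(stem_words(["running", "cats"]))  # ['run', 'cat']
```

Stemming strips suffixes by rule, so it is fast but can produce non-words; lemmatization trades speed for dictionary forms.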
Lemmatize words
Lemmatize the input words using WordNet lemmatization.
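A sketch of WordNet lemmatization via NLTK's `WordNetLemmatizer`, which requires the WordNet corpus (`nltk.download("wordnet")`). The optional `lemmatize` parameter is a hypothetical hook added here so any lemmatizing callable can be swapped in; it is not part of DupliPy's documented signature.

```python
def lemmatize_words(words, lemmatize=None):
    """Map each word to its lemma; defaults to WordNet, which treats words as nouns."""
    if lemmatize is None:
        # Imported lazily so the function can be used with a custom callable
        # even where NLTK or its WordNet data is unavailable.
        from nltk.stem import WordNetLemmatizer

        lemmatize = WordNetLemmatizer().lemmatize
    return [lemmatize(word) for word in words]
```

With the WordNet data installed, `lemmatize_words(["geese", "mice"])` yields dictionary forms such as `goose` and `mouse`. Because `WordNetLemmatizer.lemmatize` assumes nouns unless given `pos="v"` (or another tag), verbs such as "running" are left unchanged by this default.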
POS tag
Perform part-of-speech (POS) tagging on the input text.
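A sketch of POS tagging built on `nltk.pos_tag`, which takes a list of tokens and returns `(token, tag)` pairs using Penn Treebank tags. The tagger model must first be fetched with `nltk.download(...)`; the resource name differs between NLTK versions. The function is named `pos_tag_text` to avoid shadowing NLTK's own `pos_tag`, and its `tagger` parameter is a hypothetical hook for swapping in another tagger.

```python
import re


def pos_tag_text(text, tagger=None):
    """Tokenize text with a simple regex, then tag each token."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if tagger is None:
        import nltk  # needs the downloaded perceptron tagger model

        tagger = nltk.pos_tag
    return tagger(tokens)
```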
Remove profane words from text
This helps keep the text free of inappropriate language.
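A minimal word-list filter, assuming profanity is detected by exact, case-insensitive word matches. `PROFANE_WORDS` is a hypothetical placeholder list; a real filter would load a maintained list, and DupliPy's actual approach is not shown here.

```python
import re

# Hypothetical, deliberately mild blocklist for illustration.
PROFANE_WORDS = {"darn", "heck"}


def remove_profanity(text, profane_words=PROFANE_WORDS):
    """Delete whole-word matches from the blocklist, then tidy the spacing."""
    pattern = r"\b(?:" + "|".join(map(re.escape, sorted(profane_words))) + r")\b"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return " ".join(cleaned.split())


print(remove_profanity("Well HECK that darn thing"))  # Well that thing
```

The `\b` word boundaries stop the filter from mangling innocent words that merely contain a listed word, such as "heckle".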
Remove sensitive information from text
This is useful for depersonalizing text data.
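A regex-based sketch of depersonalization. Which data counts as "sensitive" is an assumption here: only email addresses and US-style phone numbers are covered, and matches are replaced with placeholder tags rather than deleted, so the sentence stays readable. The patterns are hypothetical and not DupliPy's.

```python
import re

# Hypothetical patterns; extend for names, addresses, IDs, and so on.
PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "[PHONE]": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def remove_sensitive_info(text):
    """Replace each match of each pattern with its placeholder tag."""
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text


print(remove_sensitive_info("Mail jane.doe@example.com or call 555-123-4567."))
# Mail [EMAIL] or call [PHONE].
```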
Remove hate speech from text using AI
This function removes whole sentences rather than individual words, because whether a sentence is hateful or offensive depends on its context.
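The sentence-level design can be sketched as: split the text into sentences, ask a classifier about each one, and keep only the sentences it does not flag. The model DupliPy uses is not stated here, so `is_offensive` is a hypothetical callable standing in for it, and the sentence splitter is a simple regex assumption.

```python
import re


def remove_flagged_sentences(text, is_offensive):
    """Drop every sentence the classifier flags; keep the rest in order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if s and not is_offensive(s)]
    return " ".join(kept)


# Keyword stub only for demonstration; a real system would call a trained model.
demo_classifier = lambda sentence: "hate" in sentence.lower()
print(remove_flagged_sentences("I love cats. I hate you all! Dogs are fine too.", demo_classifier))
# I love cats. Dogs are fine too.
```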
Post-format text using regex
This function uses regular expressions to clean up the formatting of text produced by DupliPy's augmentation functions or other processing steps.
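The exact rules `post_format_text` applies are not listed. This sketch assumes the typical clean-up after token-level processing: collapse whitespace and remove the space that ends up before punctuation (roughly the inverse of separating symbols).

```python
import re


def post_format_text(text):
    """Tidy spacing artifacts left behind by token-level processing."""
    text = re.sub(r"\s+", " ", text).strip()      # collapse whitespace runs
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)  # no space before punctuation
    return text


print(post_format_text("Hello , world !  How are you ?"))  # Hello, world! How are you?
```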