Formatting Functions


Last updated 4 months ago


Available functions:

  • `remove_stopwords(text)`: Remove stopwords from the input text using NLTK's stopwords.
  • `remove_numbers(text)`: Remove numbers from the input text.
  • `remove_whitespace(text)`: Remove excess whitespace from the input text.
  • `normalize_whitespace(text)`: Collapse runs of multiple whitespace characters in the input text into a single space.
  • `seperate_symbols(text)`: Separate symbols and words with a space to ease tokenization.
  • `remove_special_characters(text)`: Remove special characters from the input text.
  • `standardize_text(text)`: Standardize the formatting of the input text.
  • `tokenize_text(text)`: Tokenize the input text into individual words.
  • `stem_words(words)`: Stem the input words using the Porter stemming algorithm.
  • `lemmatize_words(words)`: Lemmatize the input words using WordNet lemmatization.
  • `pos_tag(text)`: Perform part-of-speech (POS) tagging on the input text.
  • `remove_profanity_from_text(text)`: Remove profane words from the input text.
  • `remove_sensitive_info_from_text(text)`: Remove sensitive information from the input text.
  • `remove_hate_speech_from_text(text)`: Remove hate speech or offensive speech from the input text.
  • `post_format_text(text)`: Post-format the text using regex.
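
Most of the functions listed above take a string and return a string, so they can be chained one after another. The sketch below shows one way to do that with a small helper. `apply_steps` is a hypothetical name and not part of DupliPy, and the two stand-in steps are plain Python so the example runs without the package installed. With DupliPy available, any of its text-in, text-out formatting functions could be passed in their place.

```python
from functools import reduce
from typing import Callable, Iterable

# Hypothetical helper (not part of DupliPy): apply text -> text steps in order.
def apply_steps(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    return reduce(lambda current, step: step(current), steps, text)

# Stand-in steps; any function that takes a str and returns a str fits here.
steps = [str.strip, str.lower]

cleaned = apply_steps("  Hello World  ", steps)
print(cleaned)  # hello world
```

Functions that return a list, such as the tokenizer, only fit at the end of such a chain.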


Remove stopwords

Remove stopwords from the input text using NLTK's stopwords.

Parameters:
- `text` (str): The input text from which stopwords should be removed.

Returns:
- `str`: The text without stopwords.
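
To illustrate the idea, the sketch below filters words against a stopword set. It is not DupliPy's implementation: `SAMPLE_STOPWORDS` is a tiny hand-written set standing in for NLTK's English stopword list, and `drop_stopwords` is a hypothetical name.

```python
# Tiny hand-written stand-in for NLTK's stopword list (assumption for this sketch).
SAMPLE_STOPWORDS = frozenset({"a", "an", "the", "is", "of", "this"})

def drop_stopwords(text: str, stopwords: frozenset = SAMPLE_STOPWORDS) -> str:
    # Split on whitespace and keep words that are not stopwords (case-insensitive).
    kept = [word for word in text.split() if word.lower() not in stopwords]
    return " ".join(kept)

print(drop_stopwords("This is a test of the system"))  # test system
```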

Remove numbers

Remove numbers from the input text.

Parameters:
- `text` (str): The input text from which numbers should be removed.

Returns:
- `str`: The text without numbers.
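
A minimal sketch of digit removal with the standard library, assuming "numbers" means runs of digits. `strip_digits` is a hypothetical name, and DupliPy's actual handling (for example of decimals or signs) may differ.

```python
import re

def strip_digits(text: str) -> str:
    # Delete every run of digits; surrounding spaces are left untouched.
    return re.sub(r"\d+", "", text)

print(repr(strip_digits("Room 101 has 3 chairs")))  # 'Room  has  chairs'
```

Note the doubled spaces left behind, which is why a whitespace clean-up step commonly follows this one.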

Remove whitespace

Remove excess whitespace from the input text.

Parameters:
- `text` (str): The input text from which excess whitespace should be removed.

Returns:
- `str`: The text with excess whitespace removed.

Normalize whitespace

Collapse runs of multiple whitespace characters in the input text into a single space.

Parameters:
- `text` (str): The input text whose whitespace should be normalized.

Returns:
- `str`: The text with normalized whitespace.
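
This page does not spell out how removing excess whitespace differs from normalizing it. The sketch below shows two common whitespace clean-ups with the standard library, as a point of comparison rather than a description of either DupliPy function. Both helper names are hypothetical.

```python
import re

def collapse_whitespace(text: str) -> str:
    # Replace each run of whitespace (spaces, tabs, newlines) with one space.
    return re.sub(r"\s+", " ", text)

def collapse_and_trim(text: str) -> str:
    # split() with no arguments drops leading/trailing whitespace and splits on runs.
    return " ".join(text.split())

sample = "  a \t b\n\nc "
print(repr(collapse_whitespace(sample)))  # ' a b c '
print(repr(collapse_and_trim(sample)))    # 'a b c'
```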

Separate symbols

Separate symbols and words with a space to ease tokenization.

Parameters:
- `text` (str): The input text in which symbols need to be separated.

Returns:
- `str`: The text with symbols separated from words.
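
One way to picture this step is a regex that pads every symbol with spaces. The sketch below assumes a "symbol" is any character that is neither a word character nor whitespace. `space_out_symbols` is a hypothetical name, and DupliPy's own rule may differ.

```python
import re

def space_out_symbols(text: str) -> str:
    # Surround each character that is neither a word character nor whitespace with spaces.
    spaced = re.sub(r"([^\w\s])", r" \1 ", text)
    # Collapse the extra spaces introduced above.
    return " ".join(spaced.split())

print(space_out_symbols("Hello,world!"))  # Hello , world !
```

After this step, splitting on whitespace yields punctuation marks as separate tokens.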

Remove special characters

Remove special characters from the input text.

Parameters:
- `text` (str): The input text from which special characters should be removed.

Returns:
- `str`: The text with special characters removed.
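
The page does not define which characters count as special. The sketch below assumes everything other than ASCII letters, digits, and whitespace is special. `keep_alphanumerics` is a hypothetical name, not DupliPy's function.

```python
import re

def keep_alphanumerics(text: str) -> str:
    # Drop everything except ASCII letters, digits, and whitespace.
    return re.sub(r"[^A-Za-z0-9\s]", "", text)

print(repr(keep_alphanumerics("Hi! #1 fan :)")))  # 'Hi 1 fan '
```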

Standardize text

Standardize the formatting of the input text.

Parameters:
- `text` (str): The input text which needs to be standardized.

Returns:
- `str`: The standardized text.

Tokenize text

Tokenize the input text into individual words.

Parameters:
- `text` (str): The input text to be tokenized.

Returns:
- `list`: A list of tokens (words) from the input text.
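
To show the string-in, list-out shape of tokenization, here is a small regex tokenizer. It is a simplified stand-in, not the tokenizer DupliPy uses, and `simple_tokenize` is a hypothetical name.

```python
import re

def simple_tokenize(text: str) -> list:
    # A token is either a run of word characters or a single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Tokens are words, right?"))
# ['Tokens', 'are', 'words', ',', 'right', '?']
```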

Stem words

Stem the input words using the Porter stemming algorithm.

Parameters:
- `words` (list): A list of words to be stemmed.

Returns:
- `list`: A list of stemmed words.

Lemmatize words

Lemmatize the input words using WordNet lemmatization.

Parameters:
- `words` (list): A list of words to be lemmatized.

Returns:
- `list`: A list of lemmatized words.

POS tag

Perform part-of-speech (POS) tagging on the input text.

Parameters:
- `text` (str): The input text to be POS tagged.

Returns:
- `list`: A list of tuples containing (word, tag) pairs.
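
The returned list of (word, tag) pairs can be filtered like any other list of tuples. The example below uses hand-written data in that shape. The Penn Treebank-style tags are an assumption and not output captured from DupliPy.

```python
# Hand-written example of the documented return shape: a list of (word, tag) tuples.
# The Penn Treebank-style tags here are an assumption, not output from DupliPy.
tagged = [("The", "DT"), ("cat", "NN"), ("chased", "VBD"), ("mice", "NNS")]

# Keep only the words whose tag marks them as nouns (tags starting with "NN").
nouns = [word for word, tag in tagged if tag.startswith("NN")]
print(nouns)  # ['cat', 'mice']
```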

Remove profane words from text

Remove profane words from the input text, so that the output does not contain inappropriate language.

Parameters:
- `text` (str): The input text to remove profanity from.

Returns:
- `str`: The cleaned output text.

Remove sensitive information from text

Remove sensitive information from the input text. This can be useful for de-identifying text data.

Parameters:
- `text` (str): The input text to remove sensitive information from.

Returns:
- `str`: The cleaned output text.
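
This page does not say which kinds of sensitive information are detected or how. As a rough illustration of regex-based redaction, the sketch below removes only e-mail addresses. `EMAIL_PATTERN` and `strip_emails` are hypothetical names, and the pattern is deliberately simple.

```python
import re

# Deliberately simple e-mail pattern for illustration; real addresses can be more varied.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def strip_emails(text: str) -> str:
    # Remove each match, then tidy the spacing left behind.
    return " ".join(EMAIL_PATTERN.sub("", text).split())

print(strip_emails("Contact bob@example.com for details"))  # Contact for details
```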

Remove hate speech from text using AI

Remove hate speech or offensive speech from the input text. The function removes whole sentences rather than individual words, because whether language is hateful depends on its context.

Parameters:
- `text` (str): The input text to remove hate speech and offensive speech from.

Returns:
- `str`: The cleaned output text.

Post-format text using regex

This function post-formats the text after DupliPy's augmentation or other processes.

Parameters:
- `text` (str): The input text to be post-formatted.
    
Returns:
- `str`: The post-formatted text.