# Formatting Functions

**Available functions:**

* [`remove_stopwords`](#remove-stopwords)`(text)`: Remove stopwords from the input text using NLTK's stopwords.
* [`remove_numbers`](#remove-numbers)`(text)`: Remove numbers from the input text.
* [`remove_whitespace`](#remove-whitespace)`(text)`: Remove excess whitespace from the input text.
* [`normalize_whitespace`](#normalize-whitespace)`(text)`: Normalize runs of whitespace in the input text into a single whitespace character.
* [`seperate_symbols`](#seperate-symbols)`(text)`: Separate symbols and words with a space to ease tokenization.
* [`remove_special_characters`](#remove-special-characters)`(text)`: Remove special characters from the input text.
* [`standardize_text`](#standardize-text)`(text)`: Standardize the formatting of the input text.
* [`tokenize_text`](#tokenize-text)`(text)`: Tokenize the input text into individual words.
* [`stem_words`](#stem-words)`(words)`: Stem the input words using the Porter stemming algorithm.
* [`lemmatize_words`](#lemmatize-words)`(words)`: Lemmatize the input words using WordNet lemmatization.
* [`pos_tag`](#pos-tag)`(text)`: Perform part-of-speech (POS) tagging on the input text.
* [`remove_profanity_from_text`](#remove-profane-words-from-text)`(text)`: Remove profane words from the input text.
* [`remove_sensitive_info_from_text`](#remove-sensitive-information-from-text)`(text)`: Remove sensitive information from the input text.
* [`remove_hate_speech_from_text`](#remove-hate-speech-from-text-using-ai)`(text)`: Remove hate speech or offensive speech from the input text.
* [`post_format_text`](#post-format-text-using-regex)`(text)`: Post-format the text using regex.

***

### Remove stopwords

Remove stopwords from the input text using NLTK's stopwords.

```
Parameters:
- `text` (str): The input text from which stopwords should be removed.

Returns:
- `str`: The text without stopwords.
```
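
The idea can be illustrated without NLTK. The sketch below is not DupliPy's implementation: the function name `remove_stopwords_sketch` and the tiny `EXAMPLE_STOPWORDS` set are assumptions made for illustration, and whitespace splitting stands in for whatever tokenization the package actually uses.

```python
# Tiny illustrative stopword set (an assumption); NLTK's English list is much larger.
EXAMPLE_STOPWORDS = {"a", "an", "the", "is", "this", "of"}


def remove_stopwords_sketch(text: str, stopwords=EXAMPLE_STOPWORDS) -> str:
    """Drop whitespace-separated words whose lowercase form is a stopword."""
    kept = [word for word in text.split() if word.lower() not in stopwords]
    return " ".join(kept)


print(remove_stopwords_sketch("This is a simple example of the idea"))
# prints: simple example idea
```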

### Remove numbers

Remove numbers from the input text.

```
Parameters:
- `text` (str): The input text from which numbers should be removed.

Returns:
- `str`: The text without numbers.
```
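
One plausible way to get this behavior is a single regular-expression substitution. The following is a minimal sketch under that assumption, not the package's code; `remove_numbers_sketch` is a hypothetical name.

```python
import re


def remove_numbers_sketch(text: str) -> str:
    """Delete every run of digits from the text."""
    return re.sub(r"\d+", "", text)


print(repr(remove_numbers_sketch("Room 101 has 2 windows")))
# prints: 'Room  has  windows'
```

Note that a substitution like this leaves the surrounding spaces in place, so a whitespace clean-up step is usually applied afterwards.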

### Remove whitespace

Remove excess whitespace from the input text.

```
Parameters:
- `text` (str): The input text from which excess whitespace should be removed.

Returns:
- `str`: The text with excess whitespace removed.
```

### Normalize whitespace

Normalize runs of whitespace in the input text into a single whitespace character.

```
Parameters:
- `text` (str): The input text from which whitespace should be normalized.

Returns:
- `str`: The text with normalized whitespace.
```
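
As an illustration, collapsing whitespace runs can be done with one regular expression. This sketch assumes that tabs and newlines count as whitespace and that each run is replaced by a single space; the actual function may differ, and `normalize_whitespace_sketch` is a hypothetical name.

```python
import re


def normalize_whitespace_sketch(text: str) -> str:
    """Replace each run of spaces, tabs, or newlines with one space."""
    return re.sub(r"\s+", " ", text)


print(repr(normalize_whitespace_sketch("a  b\t\tc\n\nd")))
# prints: 'a b c d'
```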

### Seperate symbols

Separate symbols and words with a space to ease tokenization.

```
Parameters:
- `text` (str): The input text in which symbols need to be separated.

Returns:
- `str`: The text in which symbols have been separated.
```
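
To show why this helps tokenization, the sketch below pads every character that is neither a word character nor whitespace with spaces. It is an assumption about what "symbol" means here, not the package's implementation, and `separate_symbols_sketch` is a hypothetical name.

```python
import re


def separate_symbols_sketch(text: str) -> str:
    """Surround each non-word, non-space character with spaces."""
    spaced = re.sub(r"([^\w\s])", r" \1 ", text)
    # Collapse the extra spaces introduced by the padding.
    return " ".join(spaced.split())


print(separate_symbols_sketch("Hello,world!"))
# prints: Hello , world !
```

After this step, a plain whitespace split yields words and symbols as separate tokens.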

### Remove special characters

Remove special characters from the input text.

```
Parameters:
- `text` (str): The input text from which special characters should be removed.

Returns:
- `str`: The text with special characters removed.
```
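
Which characters count as "special" is not specified on this page. The sketch below assumes that anything other than ASCII letters, digits, and whitespace is special; `remove_special_characters_sketch` is a hypothetical name, and the real function may keep or drop a different set.

```python
import re


def remove_special_characters_sketch(text: str) -> str:
    """Keep only ASCII letters, digits, and whitespace."""
    return re.sub(r"[^A-Za-z0-9\s]", "", text)


print(remove_special_characters_sketch("Hello, world! #2024"))
# prints: Hello world 2024
```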

### Standardize text

Standardize the formatting of the input text.

```
Parameters:
- `text` (str): The input text which needs to be standardized.

Returns:
- `str`: The standardized text.
```

### Tokenize text

Tokenize the input text into individual words.

```
Parameters:
- `text` (str): The input text to be tokenized.

Returns:
- `list`: A list of tokens (words) from the input text.
```
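
For readers unfamiliar with tokenization, here is a minimal regex-based sketch. It is a simplification and not the tokenizer the package uses: `tokenize_text_sketch` is a hypothetical name, and the rule (runs of word characters, or single punctuation marks) is an assumption chosen for brevity.

```python
import re


def tokenize_text_sketch(text: str) -> list:
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenize_text_sketch("Hello, world!"))
# prints: ['Hello', ',', 'world', '!']
```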

### Stem words

Stem the input words using the Porter stemming algorithm.

```
Parameters:
- `words` (list): A list of words to be stemmed.

Returns:
- `list`: A list of stemmed words.
```

### Lemmatize words

Lemmatize the input words using WordNet lemmatization.

```
Parameters:
- `words` (list): A list of words to be lemmatized.

Returns:
- `list`: A list of lemmatized words.
```

### POS tag

Perform part-of-speech (POS) tagging on the input text.

```
Parameters:
- `text` (str): The input text to be POS tagged.

Returns:
- `list`: A list of tuples containing (word, tag) pairs.
```

### Remove profane words from text

Remove profane words from the input text, so that the result is clean and does not contain inappropriate language.

```
Parameters:
- `text` (str): The input text to remove profanity from.

Returns:
- `str`: The cleaned output text.
```

### Remove sensitive information from text

Remove sensitive information from the input text. This can be useful for depersonalizing text data.

```
Parameters:
- `text` (str): The input text to remove sensitive information from.

Returns:
- `str`: The cleaned output text.
```
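
This page does not say which kinds of information are treated as sensitive. As a rough illustration only, the sketch below strips email addresses and one simple phone-number format with regular expressions. The patterns, the constant names, and `remove_sensitive_info_sketch` are all assumptions; the package may detect other categories or use different rules.

```python
import re

# Illustrative patterns only; real-world data needs broader coverage.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def remove_sensitive_info_sketch(text: str) -> str:
    """Delete matches of each pattern, then tidy the leftover whitespace."""
    for pattern in (EMAIL_PATTERN, PHONE_PATTERN):
        text = pattern.sub("", text)
    return " ".join(text.split())


print(remove_sensitive_info_sketch("Call 555-123-4567 or mail bob@example.org today"))
# prints: Call or mail today
```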

### Remove hate speech from text using AI

Remove hate speech or offensive speech from the input text. Because offensiveness depends on context, this function removes whole sentences rather than individual words.

```
Parameters:
- `text` (str): The input text to remove hate speech and offensive speech from.

Returns:
- `str`: The cleaned output text.
```
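
The sentence-level behavior can be sketched independently of any particular model. In the example below, `remove_flagged_sentences_sketch` and `toy_classifier` are hypothetical names: the classifier is a trivial keyword check standing in for the AI model, whose details are not documented on this page, and the sentence splitter is a simple regex.

```python
import re
from typing import Callable


def remove_flagged_sentences_sketch(text: str, is_offensive: Callable[[str], bool]) -> str:
    """Split text into sentences and drop those the classifier flags."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if s and not is_offensive(s)]
    return " ".join(kept)


def toy_classifier(sentence: str) -> bool:
    """Stand-in for a real model: flags sentences containing 'awful'."""
    return "awful" in sentence.lower()


print(remove_flagged_sentences_sketch("I like tea. You are awful! See you soon.", toy_classifier))
# prints: I like tea. See you soon.
```

Filtering whole sentences keeps the remaining text grammatical, which word-level deletion would not guarantee.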

### Post-format text using regex

This function post-formats the text after DupliPy's augmentation or other processes.

```
Parameters:
- `text` (str): The input text to be post-formatted.

Returns:
- `str`: The post-formatted text.
```
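
The exact rules applied are not listed on this page. As one guess at what regex-based post-formatting could involve, the sketch below collapses whitespace and removes the space that earlier processing may have left before punctuation. `post_format_text_sketch` is a hypothetical name and the rules are assumptions.

```python
import re


def post_format_text_sketch(text: str) -> str:
    """Collapse whitespace, then reattach punctuation to the preceding word."""
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"\s+([,.!?;:])", r"\1", text)


print(post_format_text_sketch("Hello ,  world !  How are you ?"))
# prints: Hello, world! How are you?
```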

