If you make email campaigns, you’ll like Detergent. I made it for myself when I discovered that no tool on the market could encode HTML entities and clean invisible characters at the same time.
Cleaning and Encoding
Detergent cleans, encodes and prepares any text for pasting into HTML code.
When making websites, developers usually set up the CMS and leave the content for the client to take care of. Also, websites support UTF-8.
Email newsletters are very different. For starters, content is added during the build, and the developer has to put all of the final text into the newsletter, because it is sent as soon as it is done.
Secondly, the raw source of an email newsletter consists of ASCII characters. Any symbols outside the ASCII table (the pound sign, for example) have to be encoded. When this encoding goes wrong for any reason, broken symbols appear in place of every special character.
Detergent.io encodes only the characters outside the ASCII table. For example, the pound sign (£) is encoded, but the apostrophe (') is not.
There are two ways to encode special characters: using numeric entities (&#163;) and using named entities (&pound;). It is very important to encode special characters for email HTML using named entities, because they are readable and you can understand them when you read the email’s text before sending.
Detergent encodes special characters into named entities.
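To make the encoding rule concrete, here is a minimal sketch in Python. It is not Detergent's own code (Detergent is a JavaScript tool); the function name `encode_non_ascii` is made up for illustration, and the fallback to a numeric entity when no named one exists is my assumption. It relies only on the `html.entities.codepoint2name` table from Python's standard library.

```python
from html.entities import codepoint2name


def encode_non_ascii(text: str) -> str:
    """Encode every character outside ASCII, preferring named entities."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # ASCII characters (including the plain apostrophe) pass through.
            out.append(ch)
        elif cp in codepoint2name:
            # A named entity exists, e.g. 163 -> "pound".
            out.append(f"&{codepoint2name[cp]};")
        else:
            # Assumption: fall back to a numeric entity when no name exists.
            out.append(f"&#{cp};")
    return "".join(out)


print(encode_non_ascii("Price: £5, that's all"))
# Price: &pound;5, that's all
```

Python iterates over whole code points, so an astral character such as the trigram U+1D306 comes out as one numeric entity rather than two broken halves.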
Detergent uses a few other dependencies, such as:
- curl-quotes - to convert quotes into typographically correct ones
- string.js - to perform actions on the text, especially in the internal function that removes widow words
- typographic-em-dashes - to optionally convert hyphens into em dashes
- typographic-en-dashes - to optionally convert hyphens into en dashes
- unicode-dragon - to clean the stray unpaired surrogates
Detergent’s Cleaning Pipeline
Detergent cleans the text in the left text box and shows the result live in the right text box. You can freely edit the input text, change the settings and see the result instantly.
Here’s Detergent’s cleaning pipeline on Detergent.io:
- Any encoded entities are decoded (not optional)
- Any unpaired surrogates are removed (not optional)
- Any invisible characters that can’t be interpreted as line terminators are removed (not optional)
- Any invisible characters that can be interpreted as line terminators are replaced with <br />, depending on the setting (not optional)
- All special characters (everything outside the ASCII table) are encoded using named HTML entities (optional)
- Tabs are replaced with spaces (not optional)
- White space is collapsed: there can’t be multiple spaces between words, for example (not optional)
- Soft hyphens are removed (optional)
- HTML tags are stripped (not optional), but optionally <i> tags are simplified (any attributes and redundant white space within the tag are stripped, and the tag’s case is converted to lowercase)
- Any variations of <br> tags (including misspelled and erroneous versions) are decoded into a line terminator symbol (not optional)
- White space is trimmed at the start and the end of the input string (not optional)
- Widow words are prevented by replacing the space in front of them with &nbsp;
- Text is improved typographically: quotes, double quotes and dashes are converted into their correct alternatives (optional)
- Line terminator symbols are replaced with <br /> (optionally, depending on the setting)
- White space is collapsed again
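A heavily simplified sketch of such a pipeline in Python might look like the code below. It is an illustration only, not Detergent's implementation. It covers just a handful of the steps and runs the encoding last for simplicity. The names `clean`, `encode_entities`, `INVISIBLE` and the `encode` and `use_br` flags are hypothetical, as is the short list of invisible characters.

```python
import html
import re
from html.entities import codepoint2name

# Assumption: a tiny illustrative set of invisible characters (zero-width
# space, zero-width non-joiner/joiner, word joiner, byte order mark).
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def encode_entities(text: str) -> str:
    """Encode non-ASCII code points, preferring named entities."""
    return "".join(
        ch if ord(ch) < 128
        else f"&{codepoint2name[ord(ch)]};" if ord(ch) in codepoint2name
        else f"&#{ord(ch)};"
        for ch in text
    )


def clean(text: str, encode: bool = True, use_br: bool = True) -> str:
    text = html.unescape(text)                   # decode existing entities
    text = re.sub(r"[\ud800-\udfff]", "", text)  # drop stray surrogates
    text = text.translate(INVISIBLE)             # drop invisible characters
    # Turn <br>, <br/>, <BR /> and similar into a line terminator.
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
    text = text.replace("\t", " ")               # tabs to spaces
    text = re.sub(r" {2,}", " ", text)           # collapse repeated spaces
    text = text.strip()                          # trim both ends
    # Widow prevention: glue the last word to the previous one.
    head, sep, last = text.rpartition(" ")
    if sep:
        text = f"{head}\u00a0{last}"
    if encode:
        text = encode_entities(text)             # U+00A0 becomes &nbsp;
    if use_br:
        text = text.replace("\n", "<br />\n")    # line terminators to <br />
    return text


print(clean("  Price:\t&pound;5 \u200bper   unit  "))
# Price: &pound;5 per&nbsp;unit
```

Decoding first and encoding last means that text which already contains entities is not double-encoded, which is the main reason a decode step sits at the top of a pipeline like this.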
The closest competitor to Detergent is the Email on Acid Character Converter. It is only a special character encoder, so it will not take care of invisible Unicode characters. As an encoder it is a sloppy one: it can’t encode astral characters, such as “𝌆” or emoji. Furthermore, the EoA Character Converter encodes into numeric HTML entities.
In my opinion, such limited functionality is insufficient for a professional email developer’s workflow and Detergent.io should be used instead.
Interestingly, the unpaired surrogate cleaning function in Detergent allows it to clean up the EoA Character Converter’s mangled astral characters. For example, the EoA tool would wrongly convert the already mentioned trigram into ��, a combination of symbols that will appear broken. If you feed this mess into Detergent, it will detect it and remove the mangled astral character completely (thanks to the unicode-dragon JS library).
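To show what surrogates have to do with astral characters, here is a small Python sketch. It does not use unicode-dragon, and the helper names `utf16_units` and `strip_surrogates` are made up for illustration. The first function shows how one astral character splits into two UTF-16 code units. That a tool mangles astral characters by treating each unit as a separate character is my assumption about the general failure mode, not a description of any particular tool. The second function removes surrogate code points from a string.

```python
import re

# Code points U+D800..U+DFFF are reserved for UTF-16 surrogates.
SURROGATE = re.compile(r"[\ud800-\udfff]")


def utf16_units(ch: str) -> list[int]:
    """Return the UTF-16 code units that make up a string."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def strip_surrogates(text: str) -> str:
    """Remove surrogate code points, leaving real characters intact."""
    return SURROGATE.sub("", text)


trigram = "\U0001d306"                          # the astral trigram character
print(utf16_units(trigram))                     # [55348, 57094]
print(strip_surrogates("a\ud834b"))             # ab
print(strip_surrogates(trigram) == trigram)     # True
```

In Python a string is a sequence of code points, so a correctly formed astral character never matches the surrogate range. In JavaScript, where strings are UTF-16, a cleaner has to check whether each surrogate has its matching partner before removing it.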
For the record, Detergent handles all Unicode characters correctly by default, including astral symbols and emoji.
If you need to encode special characters, remove invisible ones or do some operations on text, try Detergent.io.