Converting HTML to plain text is useful when you need to store or display content without its markup. Let’s say you’re working with a rich text editor and need to strip the HTML tags from a string before storing it in the database. There are a few different methods for accomplishing that goal.
How to Convert HTML to Plain Text
- Use .replace(/<[^>]*>/g, '').
- Create a Temporary DOM Element and Retrieve the Text.
- Use the html-to-text NPM Package.
Let’s dive in and see how it works.
3 Methods to Convert HTML to Plain Text
1. Using .replace(/<[^>]*>/g, '')
Using .replace(/<[^>]*>/g, '') is a simple and quick way to remove the tags from the text. This method uses the JavaScript string method .replace(pattern, replacement), which here replaces every HTML tag with an empty string. The g flag signals that the replacement should occur globally, meaning every match found in the string gets replaced, not just the first one.
The drawback of this method is that it only removes tags. HTML entities such as &amp; stay encoded, and the text inside <script> or <style> elements is left in the output. This method still works well if you have simple HTML and want a quick conversion.
var myHTML = "<div><h1>Jimbo.</h1>\n<p>That's what she said</p></div>";
var strippedHtml = myHTML.replace(/<[^>]*>/g, '');
console.log(strippedHtml);
// Jimbo.
// That's what she said
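The entity limitation mentioned above can be seen with a short sketch. The helper name stripTagsAndDecode and the small entity table are assumptions made for illustration; the table covers only a handful of common entities, not the full HTML set.

```javascript
// Hypothetical lookup table: only a few common entities, not the full HTML set.
var NAMED_ENTITIES = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&nbsp;': ' '
};

// Hypothetical helper: strip the tags first, then decode the entities above.
function stripTagsAndDecode(html) {
  var withoutTags = html.replace(/<[^>]*>/g, '');
  // A single pass over the string avoids decoding the same text twice.
  return withoutTags.replace(/&(?:amp|lt|gt|quot|#39|nbsp);/g, function (entity) {
    return NAMED_ENTITIES[entity];
  });
}

var sample = '<p>Tom &amp; Jerry &lt;3 cheese</p>';
console.log(sample.replace(/<[^>]*>/g, '')); // Tom &amp; Jerry &lt;3 cheese
console.log(stripTagsAndDecode(sample));     // Tom & Jerry <3 cheese
```

For anything beyond a few known entities, the DOM and library approaches described in the following sections are likely a better fit than growing the table by hand.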
2. Create a Temporary DOM Element and Retrieve the Text
This approach lets the browser's own HTML parser do the work, so it also decodes HTML entities and handles markup that a regular expression would miss. Create a dummy element and assign the HTML string to its innerHTML. Then read the plain text back from the element's textContent property, falling back to innerText. Because it needs a DOM, it runs in the browser rather than in plain Node.js.
function convertToPlain(html) {
  // Create a new div element
  var tempDivElement = document.createElement("div");
  // Set the HTML content with the given value
  tempDivElement.innerHTML = html;
  // Retrieve the text property of the element
  return tempDivElement.textContent || tempDivElement.innerText || "";
}
var htmlString= "<div><h1>Bears Beets Battlestar Galactica </h1>\n<p>Quote by Dwight Schrute</p></div>";
console.log(convertToPlain(htmlString));
// Expected Result:
// Bears Beets Battlestar Galactica
// Quote by Dwight Schrute
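As a variation on the function above, the browser's DOMParser can parse the string into a separate document instead of a live element. This is a minimal sketch that assumes a browser environment; convertToPlainInert is a name chosen here for illustration and is not part of the original example.

```javascript
// Hypothetical variant of convertToPlain that uses DOMParser.
function convertToPlainInert(html) {
  // parseFromString builds a separate document; scripts inside it are not executed.
  var parsed = new DOMParser().parseFromString(html, "text/html");
  // Read the text of the parsed body, or fall back to an empty string.
  return parsed.body.textContent || "";
}

// DOMParser is a browser API, so only run the demo where it exists.
if (typeof DOMParser !== "undefined") {
  console.log(convertToPlainInert("<p>Fish &amp; chips</p>")); // Fish & chips
}
```

This may be worth considering when the HTML comes from users, since the parsed markup never becomes part of the page.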
3. Use the html-to-text NPM Package
I recently discovered the html-to-text package on NPM. This converter library parses HTML and returns nicely formatted text. It comes with many options for controlling the conversion, including wordwrap, tags, whitespaceCharacters and formatters. You need a project with a package.json to install and use it.
Installation
npm install html-to-text
Usage
const { htmlToText } = require('html-to-text');
const text = htmlToText('<div>Nope It is not Ashton Kutcher. It is Kevin Malone. <p>Equally Smart and equally handsome</p></div>', {
  wordwrap: 130
});
console.log(text); // expected result:
// Nope It is not Ashton Kutcher. It is Kevin Malone.
// Equally Smart and equally handsome
Compare your own results to my example of the project.
Frequently Asked Questions
How do you convert HTML to plain text?
A reliable method for converting HTML to plain text in the browser is to create a temporary DOM element and retrieve the text. Here’s how:
function convertToPlain(html) {
  // Create a new div element
  var tempDivElement = document.createElement("div");
  // Set the HTML content with the given value
  tempDivElement.innerHTML = html;
  // Retrieve the text property of the element
  return tempDivElement.textContent || tempDivElement.innerText || "";
}
var htmlString= "<div><h1>Bears Beets Battlestar Galactica </h1>\n<p>Quote by Dwight Schrute</p></div>";
console.log(convertToPlain(htmlString));
// Expected Result:
// Bears Beets Battlestar Galactica
// Quote by Dwight Schrute
What is the advantage of converting HTML to plain text?
Plain text is easier to store and stays readable in any context, since it carries no markup that has to be rendered or escaped.