3 Ways to Convert HTML to Plain Text

Converting HTML to plain text makes the code readable in any format and makes it easier to store in a database. Here are three ways to do it.

Written by Sanchitha Sharma
Published on Jan. 21, 2025

Converting HTML to plain text strips the markup so the content is easy to store and stays readable wherever it is displayed. Let’s say you’re working with a rich text editor and need to strip the HTML tags from a string before storing it in the database. There are a few different methods for accomplishing that goal.

How to Convert HTML to Plain Text

  1. Use .replace(/<[^>]+>/g, '').
  2. Create a Temporary DOM Element and Retrieve the Text.
  3. Use the html-to-text NPM Package.

Let’s dive in and see how it works.


3 Methods to Convert HTML to Plain Text

1. Using .replace(/<[^>]+>/g, '')

Using .replace(/<[^>]+>/g, '') is a simple and efficient way to remove the tags from the text. It relies on the JavaScript string method .replace(pattern, replacement), which here replaces every HTML tag matched by the regular expression with an empty string. The g flag makes the replacement global, meaning every match in the string is replaced rather than only the first one.
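To make the effect of the g flag concrete, here is a minimal sketch that runs the same pattern with and without it. The sample string and variable names are made up for illustration.

```javascript
// Illustrative input; not taken from a real editor.
const html = '<p>Hello <b>world</b></p>';

// Without the g flag, only the first match (<p>) is replaced.
const firstOnly = html.replace(/<[^>]+>/, '');

// With the g flag, every tag in the string is replaced.
const allTags = html.replace(/<[^>]+>/g, '');

console.log(firstOnly); // Hello <b>world</b></p>
console.log(allTags);   // Hello world
```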

The drawback of this method is that it doesn’t decode HTML entities: sequences such as &amp; or &nbsp; are left in the output as they are. It still works well if you have simple HTML and want a quick conversion.

var myHTML = "<div><h1>Jimbo.</h1>\n<p>That's what she said</p></div>";

// Replace every tag with an empty string
var strippedHtml = myHTML.replace(/<[^>]+>/g, '');

// Jimbo.
// That's what she said
console.log(strippedHtml);
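If the entity drawback matters for your input, one possible workaround is to decode a small set of common entities after stripping the tags. The sketch below is only an illustration: stripTagsAndDecode and ENTITY_MAP are hypothetical names, and the map covers just a handful of entities rather than the full HTML list.

```javascript
// Hypothetical lookup table: a few common entities only, not the full HTML set.
const ENTITY_MAP = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&nbsp;': ' ',
};

// Hypothetical helper: strip the tags first, then decode the entities listed above.
function stripTagsAndDecode(html) {
  const withoutTags = html.replace(/<[^>]+>/g, '');
  // A single pass, so text such as "&amp;lt;" becomes "&lt;" and is not decoded twice.
  return withoutTags.replace(/&(?:amp|lt|gt|quot|#39|nbsp);/g, (match) => ENTITY_MAP[match]);
}

console.log(stripTagsAndDecode('<p>Fish &amp; chips &lt;3</p>')); // Fish & chips <3
```

Any entity that is not in the map passes through unchanged, so this only suits input where you know which entities to expect.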

2. Create a Temporary DOM Element and Retrieve the Text

This is the most efficient way of doing the task in the browser, because the browser’s own HTML parser does the work. Create a dummy element and assign it to a variable. After assigning the HTML string to the innerHTML of the dummy element, read the plain text back from the element’s textContent property (or innerText as a fallback). Because it relies on document, this method needs a DOM to run.

function convertToPlain(html){

    // Create a new div element
    var tempDivElement = document.createElement("div");

    // Set the HTML content with the given value
    tempDivElement.innerHTML = html;

    // Retrieve the text property of the element 
    return tempDivElement.textContent || tempDivElement.innerText || "";
}

var htmlString= "<div><h1>Bears Beets Battlestar Galactica </h1>\n<p>Quote by Dwight Schrute</p></div>";


console.log(convertToPlain(htmlString));
// Expected Result:
// Bears Beets Battlestar Galactica 
// Quote by Dwight Schrute
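Because the function above depends on document, it fails where no DOM exists, such as a plain Node.js script. As a rough sketch, the DOM method can be combined with the regular expression from method 1 as a fallback. The name toPlainText and the choice of fallback are assumptions for illustration, not part of the original example.

```javascript
// Hypothetical helper: the name and the fallback strategy are illustrative assumptions.
function toPlainText(html) {
  // When a DOM is available (for example, in a browser), let the HTML parser do the work.
  if (typeof document !== 'undefined') {
    const tempDivElement = document.createElement('div');
    tempDivElement.innerHTML = html;
    return tempDivElement.textContent || '';
  }

  // No DOM available: fall back to the regex from method 1.
  // Note that this branch does not decode HTML entities.
  return html.replace(/<[^>]+>/g, '');
}

console.log(toPlainText('<p>Hi <b>there</b></p>')); // Hi there
```

The two branches can give different results for input that contains entities, so treat the fallback as a convenience rather than an exact equivalent.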


3. Use the html-to-text NPM Package

I recently discovered the html-to-text package on NPM. It’s a converter library that parses HTML and returns nicely formatted text. It comes with many options for controlling the conversion, including wordwrap, tags, whitespaceCharacters and formatters. You need a package.json file in your project to be able to install and use it.

Installation

npm install html-to-text

Usage

const { htmlToText } = require('html-to-text');

const text = htmlToText('<div>Nope It is not Ashton Kutcher. It is Kevin Malone. <p>Equally Smart and equally handsome</p></div>', {
    wordwrap: 130
});
console.log(text); // expected result: 
// Nope It is not Ashton Kutcher. It is Kevin Malone.

// Equally Smart and equally handsome

Compare your own results to my example of the project.

Frequently Asked Questions

The most efficient method for converting HTML to plain text is to create a temporary DOM element and retrieve the text. Here’s how:

function convertToPlain(html){

    // Create a new div element
    var tempDivElement = document.createElement("div");

    // Set the HTML content with the given value
    tempDivElement.innerHTML = html;

    // Retrieve the text property of the element 
    return tempDivElement.textContent || tempDivElement.innerText || "";
}

var htmlString= "<div><h1>Bears Beets Battlestar Galactica </h1>\n<p>Quote by Dwight Schrute</p></div>";


console.log(convertToPlain(htmlString));
// Expected Result:
// Bears Beets Battlestar Galactica 
// Quote by Dwight Schrute

Converting HTML to plain text is a useful strategy for storing and making code readable in any format.
