
Convert HTML Entities and Special Characters

Learn how to encode or decode HTML special characters, or every character that has an HTML entity, within a string.

  1. htmlspecialchars()
  2. htmlspecialchars_decode()
  3. htmlentities()
  4. html_entity_decode()
  5. Flags
  6. Double Encoding

htmlspecialchars()

<?php
//Syntax
htmlspecialchars(
    string $string,
    int $flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401,
    ?string $encoding = null,
    bool $double_encode = true
): string

  1. $string – The input string.
  2. $flags – A bitmask of one or more flags (see Flags below).
  3. $encoding – The character set of the input. If omitted or null, PHP uses the default_charset configuration option, which is UTF-8 by default. To use a different character set, pass its name, for example, BIG5.
  4. $double_encode – Defaults to true, which encodes everything, including existing HTML entities. If false, PHP leaves existing HTML entities in the string unencoded.
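
As a minimal sketch, the call below passes all four parameters explicitly instead of relying on the defaults, so the result does not depend on the PHP version or on the default_charset setting. The input string and the $snippet and $escaped variable names are only illustrative.

```php
<?php
// Illustrative input containing markup, an ampersand and both kinds of quotes.
$snippet = '<a href="x.php?a=1&b=2">It\'s here</a>';

// Pass every parameter explicitly: flags, character set and double encoding.
$escaped = htmlspecialchars(
    $snippet,
    ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401,
    'UTF-8',
    true
);

echo $escaped, PHP_EOL;
// &lt;a href=&quot;x.php?a=1&amp;b=2&quot;&gt;It&#039;s here&lt;/a&gt;
```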

If you want to display HTML code on a web page, convert the HTML special characters to HTML entities. The htmlspecialchars() function converts the following characters to their HTML entities, so the browser displays them exactly as entered instead of parsing and rendering them as actual HTML.

  • & (ampersand) converts to &amp;
  • ' (single quote) converts to &#039; (only when ENT_QUOTES is set; &apos; with ENT_XML1, ENT_XHTML, or ENT_HTML5)
  • " (double quote) converts to &quot;
  • < (less than) converts to &lt;
  • > (greater than) converts to &gt;

See the following example:

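<?php
 echo htmlspecialchars(' & ');             // &amp;
 echo htmlspecialchars(' " ');             // &quot;
 echo htmlspecialchars(" ' ");             // &#039; (PHP 8.1+, where ENT_QUOTES is the default)
 echo htmlspecialchars(" ' ", ENT_QUOTES); // &#039; (on any PHP version)
 echo htmlspecialchars(" ' ", ENT_COMPAT); // ' (single quote left as is)
 echo htmlspecialchars(' < ');             // &lt;
 echo htmlspecialchars(' > ');             // &gt;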

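Note: The ENT_QUOTES flag escapes both single and double quotes. It is part of the default flags since PHP 8.1; on earlier versions the default was ENT_COMPAT, so pass ENT_QUOTES explicitly if single quotes must be escaped as well.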
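
To compare the quote-handling flags side by side, the following sketch escapes one illustrative string with ENT_COMPAT, ENT_QUOTES, and ENT_NOQUOTES, and then with ENT_QUOTES | ENT_HTML5, which uses &apos; instead of &#039; for single quotes. The flags are passed explicitly, so the output should not depend on the PHP version's defaults.

```php
<?php
$text = 'Say "hi" and \'bye\'';

// ENT_COMPAT: only double quotes are converted.
echo htmlspecialchars($text, ENT_COMPAT), PHP_EOL;   // Say &quot;hi&quot; and 'bye'
// ENT_QUOTES: both kinds of quotes (&#039; under the default ENT_HTML401).
echo htmlspecialchars($text, ENT_QUOTES), PHP_EOL;   // Say &quot;hi&quot; and &#039;bye&#039;
// ENT_NOQUOTES: quotes are left as they are.
echo htmlspecialchars($text, ENT_NOQUOTES), PHP_EOL; // Say "hi" and 'bye'
// ENT_QUOTES with ENT_HTML5: the single quote becomes &apos;.
echo htmlspecialchars($text, ENT_QUOTES | ENT_HTML5), PHP_EOL; // Say &quot;hi&quot; and &apos;bye&apos;
```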

htmlspecialchars_decode()

<?php
 //Syntax
 htmlspecialchars_decode(string $string,
  int $flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401
 ): string

Using this function, you can reverse the effect of the htmlspecialchars() function. The htmlspecialchars_decode() function converts the following HTML entities back to their characters:

  • &amp; converts to &
  • &#039; converts to ' (only when ENT_QUOTES is set)
  • &quot; converts to " (unless ENT_NOQUOTES is set)
  • &lt; converts to <
  • &gt; converts to >

<?php
 echo htmlspecialchars_decode(' &amp; ');  // &
 echo htmlspecialchars_decode(' &lt; ');   // <
 echo htmlspecialchars_decode(' &gt; ');   // >
 echo htmlspecialchars_decode(' &quot; '); // "
 echo htmlspecialchars_decode(' &#039; ');             // ' (PHP 8.1+, where ENT_QUOTES is the default)
 echo htmlspecialchars_decode(' &#039; ', ENT_QUOTES); // ' (on any PHP version)
 echo htmlspecialchars_decode(' &#039; ', ENT_COMPAT); // &#039; (left as is)
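
Because the two functions mirror each other, encoding a string with htmlspecialchars() and decoding it with htmlspecialchars_decode() using the same flags should return the original string. A minimal round-trip sketch, with an illustrative input:

```php
<?php
$original = '<b>Tom & "Jerry"</b> it\'s';

// Encode for display, then decode back using the same flags.
$encoded = htmlspecialchars($original, ENT_QUOTES);
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);

echo $encoded, PHP_EOL;           // &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt; it&#039;s
var_dump($decoded === $original); // bool(true)
```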

htmlentities()

<?php
//Syntax
htmlentities(
    string $string,
    int $flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401,
    ?string $encoding = null,
    bool $double_encode = true
): string

  1. $string – The input string.
  2. $flags – A bitmask of one or more flags (see Flags below).
  3. $encoding – The character set of the input. If omitted or null, PHP uses the default_charset configuration option, which is UTF-8 by default. To use a different character set, pass its name, for example, BIG5.
  4. $double_encode – Defaults to true, which encodes everything, including existing HTML entities. If false, PHP leaves existing HTML entities in the string unencoded.
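
The $encoding parameter matters when the input is not UTF-8. In this sketch, the byte 0xA9, which is the copyright sign in ISO-8859-1, is encoded correctly because the character set is passed explicitly. The sample string is illustrative.

```php
<?php
// 0xA9 is the copyright sign in ISO-8859-1 (Latin-1),
// but on its own it is not a valid UTF-8 sequence.
$latin1 = "\xA9 2024";

// Telling htmlentities() the input character set lets it map the byte to &copy;.
echo htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1'), PHP_EOL; // &copy; 2024
```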

The htmlentities() function is identical to htmlspecialchars(), except that it converts every character that has an HTML named entity, not just the special characters. For example, the copyright symbol ©, the cent sign ¢, or the letter e with a grave accent, è. See the following example:

<?php
 echo htmlentities(' ¢ '); // &cent;
 echo htmlentities(' © '); // &copy;
 echo htmlentities(' è '); // &egrave;

 echo htmlentities(' & ');             // &amp;
 echo htmlentities(' " ');             // &quot;
 echo htmlentities(" ' ");             // &#039; (PHP 8.1+, where ENT_QUOTES is the default)
 echo htmlentities(" ' ", ENT_QUOTES); // &#039; (on any PHP version)
 echo htmlentities(" ' ", ENT_COMPAT); // ' (single quote left as is)
 echo htmlentities(' < ');             // &lt;
 echo htmlentities(' > ');             // &gt;
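
The difference between the two functions is easiest to see on the same input. This sketch, using an illustrative string, runs both functions with identical flags and character set; only htmlentities() converts ¢ and é.

```php
<?php
$price = '5¢ & <b>café</b>';

// htmlspecialchars() converts only &, <, > and quotes.
echo htmlspecialchars($price, ENT_QUOTES, 'UTF-8'), PHP_EOL; // 5¢ &amp; &lt;b&gt;café&lt;/b&gt;
// htmlentities() also converts characters that have named entities.
echo htmlentities($price, ENT_QUOTES, 'UTF-8'), PHP_EOL;     // 5&cent; &amp; &lt;b&gt;caf&eacute;&lt;/b&gt;
```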

html_entity_decode()

<?php
 //Syntax
 html_entity_decode(
  string $string,
  int $flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401,
  ?string $encoding = null
 ): string

Using this function, you can reverse the effect of the htmlentities() function. For example, &copy; converts to the copyright symbol ©, &cent; converts to the cent sign ¢, and &egrave; converts to the letter e with a grave accent, è. See the following example:

<?php
 echo html_entity_decode(' &copy; ');   // ©
 echo html_entity_decode(' &cent; ');   // ¢
 echo html_entity_decode(' &egrave; '); // è

 echo html_entity_decode(' &amp; ');  // &
 echo html_entity_decode(' &lt; ');   // <
 echo html_entity_decode(' &gt; ');   // >
 echo html_entity_decode(' &quot; '); // "
 echo html_entity_decode(' &#039; ');             // ' (PHP 8.1+, where ENT_QUOTES is the default)
 echo html_entity_decode(' &#039; ', ENT_QUOTES); // ' (on any PHP version)
 echo html_entity_decode(' &#039; ', ENT_COMPAT); // &#039; (left as is)
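
html_entity_decode() is not limited to the five special characters. The sketch below, with an illustrative input, shows that htmlspecialchars_decode() leaves &copy; and the numeric entity &#169; unchanged, while html_entity_decode() turns both into ©.

```php
<?php
$html = '&copy; &#169; &lt;';

// htmlspecialchars_decode() only decodes the special-character entities.
echo htmlspecialchars_decode($html, ENT_QUOTES), PHP_EOL;     // &copy; &#169; <
// html_entity_decode() also decodes named and numeric entities.
echo html_entity_decode($html, ENT_QUOTES, 'UTF-8'), PHP_EOL; // © © <
```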

Flags

The functions above accept a bitmask of one or more flags, combined with the | operator, which specify how to handle quotes, invalid code unit sequences, and the document type. Since PHP 8.1 the default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401; before PHP 8.1 it was ENT_COMPAT | ENT_HTML401.

  • ENT_COMPAT – Converts double quotes and leaves single quotes unchanged.
  • ENT_QUOTES – Converts both single and double quotes.
  • ENT_NOQUOTES – Leaves both single and double quotes unconverted.
  • ENT_IGNORE – Silently discards invalid code unit sequences instead of returning an empty string.
  • ENT_SUBSTITUTE – Replaces invalid code unit sequences with the Unicode replacement character U+FFFD instead of returning an empty string.
  • ENT_DISALLOWED – Replaces code points that are invalid for the given document type with the Unicode replacement character instead of leaving them as is.
  • ENT_HTML401 – Handles the code as HTML version 4.01.
  • ENT_XML1 – Handles the code as XML version 1.
  • ENT_XHTML – Handles the code as XHTML.
  • ENT_HTML5 – Handles the code as HTML5.
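
As a sketch of how the invalid-sequence flags behave, the illustrative string below contains the byte 0x80, which is not valid UTF-8 on its own. Without ENT_IGNORE or ENT_SUBSTITUTE the result is an empty string; each flag is combined with ENT_QUOTES using the | operator.

```php
<?php
// "\x80" is a stray continuation byte, i.e. an invalid UTF-8 sequence.
$broken = "a\x80b";

// Neither ENT_IGNORE nor ENT_SUBSTITUTE: the whole result is empty.
var_dump(htmlspecialchars($broken, ENT_QUOTES, 'UTF-8'));                  // string(0) ""
// ENT_IGNORE silently drops the invalid byte.
var_dump(htmlspecialchars($broken, ENT_QUOTES | ENT_IGNORE, 'UTF-8'));     // string(2) "ab"
// ENT_SUBSTITUTE replaces it with U+FFFD (three bytes in UTF-8), giving string(5).
var_dump(htmlspecialchars($broken, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'));
```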

Double encoding

By default, htmlspecialchars() and htmlentities() also encode existing character entities, which is known as double encoding. As a result, &amp; is converted to &amp;amp; and &quot; is converted to &amp;quot;. You can turn off this behavior by setting the double_encode parameter to false, for example with a named argument, as in the following example:

<?php
 echo htmlspecialchars('&'); //&amp;
 echo htmlentities ('&');    //&amp;
 
 echo htmlspecialchars('&amp;'); //&amp;amp;
 echo htmlentities('&amp;');     //&amp;amp;
 
 echo htmlspecialchars('&amp;',
           double_encode:false); //&amp;
 echo htmlentities('&amp;',
           double_encode:false); //&amp;
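
Named arguments require PHP 8.0 or later. As a sketch, the same effect can be obtained by passing false as the fourth positional argument, which also shows that only the existing entity is preserved while a bare & and the angle brackets are still encoded. The input string is illustrative.

```php
<?php
$mixed = 'Fish &amp; Chips & <Peas>';

// Default (double_encode = true): the existing &amp; is encoded again.
echo htmlspecialchars($mixed, ENT_QUOTES, 'UTF-8'), PHP_EOL;
// Fish &amp;amp; Chips &amp; &lt;Peas&gt;

// Fourth positional argument = false: the existing entity is kept,
// while the bare & and the angle brackets are still encoded.
echo htmlspecialchars($mixed, ENT_QUOTES, 'UTF-8', false), PHP_EOL;
// Fish &amp; Chips &amp; &lt;Peas&gt;
```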
