Escaping Strings for HTML
<?php
$input = '<script>alert("I have a bad Föhnwelle...");</script>';

echo htmlspecialchars($input);
/* Prints: &lt;script&gt;alert(&quot;I have a bad Föhnwelle...&quot;);&lt;/script&gt; */

echo htmlentities($input);
/* Prints: &lt;script&gt;alert(&quot;I have a bad F&ouml;hnwelle...&quot;);&lt;/script&gt; */
Here, it is important to remove certain HTML markup. To make a long story short: it is almost impossible to catch every attempt to inject JavaScript into data. Injection is not always done with the <script> tag; it also works through other HTML elements, such as <img onabort="badCode()" />. Therefore, in most cases, all HTML must be removed.
The easiest way to do so is to call htmlspecialchars(); this escapes the characters that have a special meaning in HTML, including the replacement of all < and > characters with &lt; and &gt;. Another option is to call htmlentities(), which uses HTML entities for all characters that have one. The preceding code shows the difference between these two functions: the German ö (o umlaut) is not converted by htmlspecialchars(); however, htmlentities() replaces it with its entity, &ouml;.
With htmlspecialchars() and htmlentities(), the browser displays exactly what the user entered. So if the user entered HTML markup, this very markup is shown as text instead of being rendered. So htmlspecialchars() and htmlentities() please the browser, but might not please the user.
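The escaping call can be wrapped in a small helper so that flags and character set are always stated explicitly. The following is a minimal sketch, not a definitive implementation: the function name e() is hypothetical (it is not a PHP built-in), and the choice of ENT_QUOTES | ENT_SUBSTITUTE with UTF-8 is an assumption that matches the defaults of PHP 8.1 and later.

```php
<?php
// Hypothetical helper (not part of PHP): escapes a value for HTML output.
function e(string $value): string
{
    // ENT_QUOTES also escapes single quotes, which matters inside
    // single-quoted attribute values. ENT_SUBSTITUTE replaces invalid
    // byte sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = "<b>It's</b> a \"test\"";
$escaped = e($comment);
// $escaped is: &lt;b&gt;It&#039;s&lt;/b&gt; a &quot;test&quot;

echo '<p>' . $escaped . '</p>';
```

Stating the flags explicitly keeps the behavior the same across PHP versions, because the default flags of htmlspecialchars() changed in PHP 8.1.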
If, however, you want to prepare strings for use within URLs, you have to use urlencode() to properly encode characters that have a special meaning in URLs or are not allowed there, such as the space character.
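As a brief illustration of URL encoding, the sketch below builds a query string from user-supplied text. The script name search.php and the parameter name q are made up for this example, and the file is assumed to be saved as UTF-8.

```php
<?php
// Example input; assumes this source file is saved as UTF-8.
$term = 'Föhnwelle & more';

// urlencode() turns each space into "+" and "&" into "%26", so the value
// cannot be mistaken for a parameter separator.
// search.php and q are hypothetical names used only for illustration.
$url = 'search.php?q=' . urlencode($term);

// The finished URL still needs HTML escaping when it is placed in an attribute.
echo '<a href="' . htmlspecialchars($url) . '">Search</a>';
```

The two functions serve different contexts: urlencode() protects the value inside the URL, while htmlspecialchars() protects the URL inside the HTML page.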
Removing All HTML Tags
<?php
//Syntax
strip_tags(string $string, array|string|null $allowed_tags = null): string
The function strip_tags() removes all HTML elements from a string. If you want to keep some elements (for example, limited formatting with <b>, <i>, and <br> tags), you provide a list of allowed tags in the second parameter of strip_tags().
The following script shows this; the figure depicts its output. As you can see, all unwanted HTML tags have been removed; however, their contents are still there.
<?php
$text = 'A commonly used web <i>attack</i> is called<br> '
      . 'Cross-Site Scripting <b>XSS</b>.<br> '
      . 'For example:<br> '
      . '<script>alert("Nice try!");</script> '
      . '<img src="explicit.jpg">';

echo strip_tags($text, '<br><i><b>');
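The syntax above also allows an array for the second parameter. The sketch below shows that form, which is available as of PHP 7.4, together with one caveat worth checking in your own setup: strip_tags() leaves the attributes of allowed tags untouched. The input strings are invented for this example.

```php
<?php
// Array form of the allowed tags (as of PHP 7.4): tag names without brackets.
$clean = strip_tags('<p>Hello <b>World</b><script>alert(1)</script></p>', ['b']);
// $clean is: Hello <b>World</b>alert(1)

// Caveat: attributes on allowed tags are kept, including event handlers.
$risky = strip_tags('<b onmouseover="badCode()">hover me</b>', ['b']);
// $risky is unchanged: <b onmouseover="badCode()">hover me</b>

echo $clean, "\n", $risky, "\n";
```

Because event-handler attributes survive on allowed tags, allowing even harmless-looking tags does not by itself rule out script injection.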