Friday, December 21, 2007

Output Character Encoding for Security - PHP

Objective: Allow untrusted data to be displayed by the browser without the threat of malicious JavaScript (i.e. cross-site scripting) or injected HTML characters.

Here's a quick snippet of PHP code that encodes text before it is displayed in the browser. Since we are using multiple layers of security (i.e. defense in depth), the output text should already have been vetted with a white list filter when it was originally supplied to the page (see previous article).

This code will perform two actions.

  1. The supplied data will be converted to UTF-8 (utf8_encode). Unless specifically required, there is no reason for the code to deal with the nuances and issues of other character formats. You may not want to take this action if you are supporting international text, because utf8_encode assumes the input is ISO-8859-1.

  2. We then apply the PHP function htmlentities. This converts the supplied text into HTML entities, which are displayed accurately on the page but not interpreted by the browser. This prevents the supplied text from being interpreted as valid HTML or JavaScript.

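function encode($dirty_data){
    // Convert to UTF-8, then encode HTML characters (both quote styles)
    $utf8_dirty = utf8_encode($dirty_data);
    $encoded_data = htmlentities($utf8_dirty, ENT_QUOTES, 'UTF-8');
    return $encoded_data;
}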


I've created a basic page which accepts a URL argument, applies the encode function, and then displays the data on the page.


<?php echo encode($_GET['arg']); ?>

The following URL contains a basic cross site scripting attack against this page.


By using the encode function the supplied data is safely displayed to the screen as:


Viewing the source of the page will show that the characters have been safely encoded as follows:


(Note that the underscore character _ will not actually be present. I added it so WordPress would stop interpreting the characters for this example.)
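
As a rough illustration of the encoding step, here is a minimal sketch. The helper name encode_for_html and the sample payload are assumptions made for this example (the payload is not the URL from the original post), and the input is assumed to already be valid UTF-8, so the utf8_encode step is left out.

```php
<?php
// Hypothetical helper for illustration; assumes $text is already valid UTF-8.
function encode_for_html(string $text): string
{
    // ENT_QUOTES encodes both single and double quotes.
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// Sample payload chosen for this sketch (not the URL from the original post).
$payload = "<script>alert('xss')</script>";
echo encode_for_html($payload), PHP_EOL;
// prints: &lt;script&gt;alert(&#039;xss&#039;)&lt;/script&gt;
```

The browser renders the encoded string as literal text, so the script tag is shown to the user rather than executed.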

This code can be used to safely display user data in the client browser. It encodes characters to prevent HTML modification and cross-site scripting attacks. However, it does not allow the user to supply any HTML tags, such as <b></b> for bold or <i></i> for italics. If rich text formatting is desired, then I would recommend a more robust filtering solution. Take a look at the OWASP AntiSammy project for more info on safely accepting rich text formatting.

-Michael Coates

Wednesday, December 19, 2007

Cross Site Scripting blacklist vs whitelist

Here are some examples of why blacklists are not effective at protecting against cross-site scripting attacks. It is always recommended to use white list filtering instead. These may be simple examples, but they illustrate the larger issue. Examples are in PHP.

Test 1

Let's start with the basics: filtering for <script> and </script>.

Blacklist Filter

$pattern = '/<script>|<\/script>/i';
$replacement = '';
$clean_data= preg_replace($pattern, $replacement, $dirty_data);

XSS which bypasses filter



The filter replaces each instance of <script> or </script> with nothing. By nesting <script> within itself ( <scrip<script>t> ), the filter removes the inner occurrence of <script> and leaves behind a remaining, and now complete, <script> tag.

<scrip<script>t>alert('xss')</sc</script>ript> --> <script>alert('xss')</script>
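
The bypass can be reproduced with a short sketch. The wrapper name blacklist_strip is an assumption for this example; the pattern and empty replacement are the ones from the Test 1 filter.

```php
<?php
// Same blacklist filter as Test 1, wrapped in a hypothetical helper.
function blacklist_strip(string $dirty_data): string
{
    $pattern = '/<script>|<\/script>/i';
    // preg_replace makes a single pass and does not rescan its own output.
    return preg_replace($pattern, '', $dirty_data);
}

$nested = "<scrip<script>t>alert('xss')</sc</script>ript>";
echo blacklist_strip($nested), PHP_EOL;
// prints: <script>alert('xss')</script>
```

Because the replacement is done in one pass, the fragments on either side of each removed tag join up into a new tag that the filter never sees.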

Test 2

Let's modify the filter to prevent the nesting of 'script' by substituting something for the removed value. Now the nested <scrip<script>t> becomes <scripxt>, which the browser will not execute as JavaScript.

Blacklist Filter

$pattern = '/<script>|<\/script>/i';
$replacement = 'x';
$clean_data= preg_replace($pattern, $replacement, $dirty_data);

XSS which bypasses filter

http://localhost/javascript/blacklist_test.php?arg=<img src="" onerror=alert(/xss/)>


Cross-site scripting attacks don't have to be launched from a simple <script></script>. They can also be executed from an event handler on another tag, such as the onerror attribute of an img tag. As we see here, this doesn't require script tags at all and passes through the filter untouched.

The point of these two simple examples is to demonstrate that for each black list filter there is a crafty method of bypassing the control. If the above attacks hadn't worked, here are a few more.
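
A minimal sketch of both behaviours follows. The wrapper name blacklist_substitute is an assumption for this example; the pattern and the 'x' replacement come from the Test 2 filter.

```php
<?php
// Same blacklist filter as Test 2, wrapped in a hypothetical helper.
function blacklist_substitute(string $dirty_data): string
{
    $pattern = '/<script>|<\/script>/i';
    return preg_replace($pattern, 'x', $dirty_data);
}

// The nested payload from Test 1 is now broken up by the substituted 'x'.
$nested = "<scrip<script>t>alert('xss')</sc</script>ript>";
echo blacklist_substitute($nested), PHP_EOL;
// prints: <scripxt>alert('xss')</scxript>

// The img payload contains no script tags, so the filter leaves it unchanged.
$img = '<img src="" onerror=alert(/xss/)>';
echo blacklist_substitute($img), PHP_EOL;
// prints: <img src="" onerror=alert(/xss/)>
```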

http://localhost/javascript/blacklist_test.php?arg= <BODY ONLOAD=alert('XSS')>
http://localhost/javascript/blacklist_test.php?arg=<IFRAME src="javascript:alert('XSS');"></IFRAME>

Check out to get a better idea of the number of variations possible for XSS attacks


Don't rely on black list filtering. For every regular expression you create, someone will find a crafty way to bypass it. This method puts you in an endless mode of reacting to new attack vectors. Instead, use white list filtering. Only allow the types of characters which are required for the particular portion of the application. If the app needs numbers and letters, then specifically allow text which contains numbers and letters. If there is anything else, strip it.

Here is an example whitelist function for PHP which allows only alphanumeric characters and the underscore:

function whitelist($dirty_data)
{
    $clean_data = '';
    $dirty_array = str_split($dirty_data);
    foreach($dirty_array as $char) {
        // Drop any character that is not a letter, digit, or underscore
        $clean_data .= preg_replace( "/[^a-zA-Z0-9_]/", "", $char );
    }
    return $clean_data;
}
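
To see the whitelist approach against the payloads used earlier, here is a minimal sketch. The function name whitelist_alnum is an assumption for this example, and it applies the same character class to the whole string in one call instead of looping per character.

```php
<?php
// Hypothetical variant of the whitelist: one preg_replace over the whole string.
function whitelist_alnum(string $dirty_data): string
{
    // Keep letters, digits, and underscore; strip everything else.
    return preg_replace('/[^a-zA-Z0-9_]/', '', $dirty_data);
}

$samples = [
    "<scrip<script>t>alert('xss')</sc</script>ript>",
    '<img src="" onerror=alert(/xss/)>',
    'user_42',
];
foreach ($samples as $sample) {
    // Every angle bracket, quote, and slash is removed, so no tag survives.
    echo whitelist_alnum($sample), PHP_EOL;
}
```

Legitimate input such as user_42 comes through unchanged, while both attack strings lose the characters they need to form markup.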

Remember, black lists are less effective and require more work. Always go with white list filtering.

-Michael Coates

Monday, December 17, 2007

Insecurities of PIN and SSN

I was standing in line at a bookstore today waiting to purchase a few Christmas gifts. The woman in front of me made her purchase with a debit card and had to enter the PIN. With nothing else to do while waiting in line except stare forward, I easily observed the 4 digit PIN. 2592 if I still remember correctly. This is just another reminder of how ineffective current security controls are. We establish PINs for the purpose of securing the use of a debit card, but then provide the PIN in a relatively public fashion.

The same holds true for Social Security Numbers. How many times has someone asked you over the phone to verify the last 4 digits of your Social Security Number? Has it occurred to you how pointless this is? In most cases, the caller will loudly reply with the 4 numbers to verify their identity. At this point, everyone in the vicinity knows the last 4 digits of the SSN and likely the person's full name too. If one of these people wanted to call that same number, they would have all of the information needed to impersonate the original caller. Again we have established a secret key used to identify the user, but provide this secret key in a public medium.

Here are a few suggestions to fix these problems. For the PIN, simply use a digital screen to enter the numbers. On each use the ordering of the digits changes, similar to the method used by some online banks. Combine this with a privacy screen which prevents those nearby from seeing the number layout and you have a much more secure solution.
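
The changing digit layout could be sketched roughly as follows. The function name shuffled_keypad and the grid printout are assumptions for this example, not a description of any real terminal.

```php
<?php
// Hypothetical sketch: return the digits 0-9 in a fresh random order.
function shuffled_keypad(): array
{
    $digits = range(0, 9);
    // Fisher-Yates shuffle driven by random_int(), which uses a CSPRNG.
    for ($i = count($digits) - 1; $i > 0; $i--) {
        $j = random_int(0, $i);
        [$digits[$i], $digits[$j]] = [$digits[$j], $digits[$i]];
    }
    return $digits;
}

// Print the layout as rows of three digits, with the last digit on its own row.
$layout = shuffled_keypad();
foreach (array_chunk($layout, 3) as $row) {
    echo implode(' ', $row), PHP_EOL;
}
```

Since the layout differs on each use, an observer who only sees finger positions learns little about which digits were pressed.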

While Social Security Numbers themselves are not a good authentication item, we can still secure the transmission of them until a better solution is put in place. Instead of having the individual say the last 4 digits of the SSN to the other party, require the user to enter the 4 numbers on the phone keypad. The party on the other end can decode the button presses, as we see in many other phone applications. This system would prevent the disclosure of the SSN to people within earshot of the original caller.

Until we start to apply basic security to our most common uses of sensitive information we cannot expect to live without compromises of credit cards and loss of information used for identity theft.

-Michael Coates