ZF2: Zend\Escaper progress

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

ZF2: Zend\Escaper progress

Pádraic Brady
Hi all,

Escaper RFC: http://framework.zend.com/wiki/display/ZFDEV2/RFC+-+Escaper+Class

I've implemented a minimal Zend\Escaper\Escaper class in line with the
RFC at https://github.com/padraic/zf2/tree/rfc/escaper. One small task
remaining for basic escaping is to swap the regexes used with the
correct hex value ranges (where necessary) so we're not too greedy
about what needs escaping.

We had a few discussions once the RFC went up as to dependency
requirements so I've settled on the following pending debate here.

1. It will require PHP to have access to a PCRE version compiled with
UTF-8 support.
2. For all character encodings except UTF-8, an exception will be
thrown where there is no access to either iconv or mbstring.

Personally, I'd expect a production server to easily meet both of
these requirements for PHP 5.3 but there are always exceptions to the
rule. Rather than attempt to create a fallback position which may or
may not have implications on the escaping security, I've opted to just
throw an exception and let users arrange their own backup plan. The
alternative is to replicate iconv's functionality which will likely be
a) a huge amount of code, b) prone to errors, and c) very very slow.
We could also make the component explicitly UTF-8 only however that
seems a bit silly and we have no means to enforce it.

Besides that, I am curious whether anyone has a good method of
detecting when PCRE has no unicode support? It would be nice to throw
an exception for that instead of relying on whatever happens otherwise
(presumably a pcre compile error or something).

Any other feedback welcome too.

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Reply | Threaded
Open this post in threaded view
|

Re: ZF2: Zend\Escaper progress

David Muir-2
Two questions:

What about escapeHtmlAttr()?
Why is escapeUri() renamed to escapeUrl()?

Cheers,
David


On 03/12/2012 10:47 PM, Pádraic Brady wrote:

> Hi all,
>
> Escaper RFC: http://framework.zend.com/wiki/display/ZFDEV2/RFC+-+Escaper+Class
>
> I've implemented a minimal Zend\Escaper\Escaper class in line with the
> RFC at https://github.com/padraic/zf2/tree/rfc/escaper. One small task
> remaining for basic escaping is to swap the regexes used with the
> correct hex value ranges (where necessary) so we're not too greedy
> about what needs escaping.
>
> We had a few discussions once the RFC went up as to dependency
> requirements so I've settled on the following pending debate here.
>
> 1. It will require PHP to have access to a PCRE version compiled with
> UTF-8 support.
> 2. For all character encodings except UTF-8, an exception will be
> thrown where there is no access to either iconv or mbstring.
>
> Personally, I'd expect a production server to easily meet both of
> these requirements for PHP 5.3 but there are always exceptions to the
> rule. Rather than attempt to create a fallback position which may or
> may not have implications on the escaping security, I've opted to just
> throw an exception and let users arrange their own backup plan. The
> alternative is to replicate iconv's functionality which will likely be
> a) a huge amount of code, b) prone to errors, and c) very very slow.
> We could also make the component explicitly UTF-8 only however that
> seems a bit silly and we have no means to enforce it.
>
> Besides that, I am curious whether anyone has a good method of
> detecting when PCRE has no unicode support? It would be nice to throw
> an exception for that instead of relying on whatever happens otherwise
> (presumably a pcre compile error or something).
>
> Any other feedback welcome too.
>
> Paddy
>

Reply | Threaded
Open this post in threaded view
|

Re: ZF2: Zend\Escaper progress

Stewart Lord
In reply to this post by Pádraic Brady
Hi Padraic,

This looks good. I think it will be faster than what we are currently
using in-house.

Your exception message on line 59 seems wrong -- it appears to allow
null, just not empty string.

I'm curious why you are using sprintf, seems like simple concatenation
would work here. Also curious why you are using substr in that manner
instead of str_pad.

For detecting PCRE unicode support I found this, but there were some
reports of it not working 100% of the time:

<?php
if (@preg_match('/\pL/u', 'a') == 1) {
echo "PCRE unicode support is turned on.\n";
} else {
echo "PCRE unicode support is turned off.\n";
}
?>

I will be interested to see how you tweak the regexes.

Stew


On 3/12/12 4:47 AM, Pádraic Brady wrote:

> Hi all,
>
> Escaper RFC: http://framework.zend.com/wiki/display/ZFDEV2/RFC+-+Escaper+Class
>
> I've implemented a minimal Zend\Escaper\Escaper class in line with the
> RFC at https://github.com/padraic/zf2/tree/rfc/escaper. One small task
> remaining for basic escaping is to swap the regexes used with the
> correct hex value ranges (where necessary) so we're not too greedy
> about what needs escaping.
>
> We had a few discussions once the RFC went up as to dependency
> requirements so I've settled on the following pending debate here.
>
> 1. It will require PHP to have access to a PCRE version compiled with
> UTF-8 support.
> 2. For all character encodings except UTF-8, an exception will be
> thrown where there is no access to either iconv or mbstring.
>
> Personally, I'd expect a production server to easily meet both of
> these requirements for PHP 5.3 but there are always exceptions to the
> rule. Rather than attempt to create a fallback position which may or
> may not have implications on the escaping security, I've opted to just
> throw an exception and let users arrange their own backup plan. The
> alternative is to replicate iconv's functionality which will likely be
> a) a huge amount of code, b) prone to errors, and c) very very slow.
> We could also make the component explicitly UTF-8 only however that
> seems a bit silly and we have no means to enforce it.
>
> Besides that, I am curious whether anyone has a good method of
> detecting when PCRE has no unicode support? It would be nice to throw
> an exception for that instead of relying on whatever happens otherwise
> (presumably a pcre compile error or something).
>
> Any other feedback welcome too.
>
> Paddy
>

Reply | Threaded
Open this post in threaded view
|

Re: ZF2: Zend\Escaper progress

Antoine Hedgecock
I have not checked the coding standards for zf2 but should we not follow Matthews example for the kind of error handling ?

E.g prefixing a function with @

Here is his blogpost

Best regards
Antoine Hedgecock
Senior developer / Server technician 
PMG Media Group AB
Tel: currently unavailable

On Mar 13, 2012, at 4:45 AM, Stewart Lord wrote:

Hi Padraic,

This looks good. I think it will be faster than what we are currently using in-house.

Your exception message on line 59 seems wrong -- it appears to allow null, just not empty string.

I'm curious why you are using sprintf, seems like simple concatenation would work here. Also curious why you are using substr in that manner instead of str_pad.

For detecting PCRE unicode support I found this, but there were some reports of it not working 100% of the time:

<?php
if (@preg_match('/\pL/u', 'a') == 1) {
echo "PCRE unicode support is turned on.\n";
} else {
echo "PCRE unicode support is turned off.\n";
}
?>

I will be interested to see how you tweak the regexes.

Stew


On 3/12/12 4:47 AM, Pádraic Brady wrote:
Hi all,

Escaper RFC: http://framework.zend.com/wiki/display/ZFDEV2/RFC+-+Escaper+Class

I've implemented a minimal Zend\Escaper\Escaper class in line with the
RFC at https://github.com/padraic/zf2/tree/rfc/escaper. One small task
remaining for basic escaping is to swap the regexes used with the
correct hex value ranges (where necessary) so we're not too greedy
about what needs escaping.

We had a few discussions once the RFC went up as to dependency
requirements so I've settled on the following pending debate here.

1. It will require PHP to have access to a PCRE version compiled with
UTF-8 support.
2. For all character encodings except UTF-8, an exception will be
thrown where there is no access to either iconv or mbstring.

Personally, I'd expect a production server to easily meet both of
these requirements for PHP 5.3 but there are always exceptions to the
rule. Rather than attempt to create a fallback position which may or
may not have implications on the escaping security, I've opted to just
throw an exception and let users arrange their own backup plan. The
alternative is to replicate iconv's functionality which will likely be
a) a huge amount of code, b) prone to errors, and c) very very slow.
We could also make the component explicitly UTF-8 only however that
seems a bit silly and we have no means to enforce it.

Besides that, I am curious whether anyone has a good method of
detecting when PCRE has no unicode support? It would be nice to throw
an exception for that instead of relying on whatever happens otherwise
(presumably a pcre compile error or something).

Any other feedback welcome too.

Paddy



Reply | Threaded
Open this post in threaded view
|

Re: ZF2: Zend\Escaper progress

weierophinney
Administrator
In reply to this post by Stewart Lord
-- Stewart Lord <[hidden email]> wrote
(on Monday, 12 March 2012, 08:45 PM -0700):
> I'm curious why you are using sprintf, seems like simple
> concatenation would work here.

sprintf generally operates faster than string concat, and is also often
a bit more readable -- you can get the gist of the string, and
understand where substitutions will be made more quickly. It's a
personal preference only, however.

--
Matthew Weier O'Phinney
Project Lead            | [hidden email]
Zend Framework          | http://framework.zend.com/
PGP key: http://framework.zend.com/zf-matthew-pgp-key.asc
Reply | Threaded
Open this post in threaded view
|

Re: ZF2: Zend\Escaper progress

Pádraic Brady
In reply to this post by David Muir-2
On Mon, Mar 12, 2012 at 11:49 PM, David Muir <[hidden email]> wrote:
> Two questions:
>
> What about escapeHtmlAttr()?

Will be added shortly - I'm taking the slightly different route where
it avoids using htmlspecialchars() altogether, offering users a simple
choice - use escapeHtml() where attributes values are definitely
quoted or use escapeHtmlAttr() where quoting may be unreliable or
where you just want to eliminate all doubt.

> Why is escapeUri() renamed to escapeUrl()?

Must have been from the overloaded neurons that wrote my new blog
article on htmlspecialchars() and XSS ;).

http://blog.astrumfutura.com/2012/03/a-hitchhikers-guide-to-cross-site-scripting-xss-in-php-part-1-how-not-to-use-htmlspecialchars-for-output-escaping/

Paddy

> Cheers,
> David
>
>
> On 03/12/2012 10:47 PM, Pádraic Brady wrote:
>> Hi all,
>>
>> Escaper RFC: http://framework.zend.com/wiki/display/ZFDEV2/RFC+-+Escaper+Class
>>
>> I've implemented a minimal Zend\Escaper\Escaper class in line with the
>> RFC at https://github.com/padraic/zf2/tree/rfc/escaper. One small task
>> remaining for basic escaping is to swap the regexes used with the
>> correct hex value ranges (where necessary) so we're not too greedy
>> about what needs escaping.
>>
>> We had a few discussions once the RFC went up as to dependency
>> requirements so I've settled on the following pending debate here.
>>
>> 1. It will require PHP to have access to a PCRE version compiled with
>> UTF-8 support.
>> 2. For all character encodings except UTF-8, an exception will be
>> thrown where there is no access to either iconv or mbstring.
>>
>> Personally, I'd expect a production server to easily meet both of
>> these requirements for PHP 5.3 but there are always exceptions to the
>> rule. Rather than attempt to create a fallback position which may or
>> may not have implications on the escaping security, I've opted to just
>> throw an exception and let users arrange their own backup plan. The
>> alternative is to replicate iconv's functionality which will likely be
>> a) a huge amount of code, b) prone to errors, and c) very very slow.
>> We could also make the component explicitly UTF-8 only however that
>> seems a bit silly and we have no means to enforce it.
>>
>> Besides that, I am curious whether anyone has a good method of
>> detecting when PCRE has no unicode support? It would be nice to throw
>> an exception for that instead of relying on whatever happens otherwise
>> (presumably a pcre compile error or something).
>>
>> Any other feedback welcome too.
>>
>> Paddy
>>
>



--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Reply | Threaded
Open this post in threaded view
|

Re: ZF2: Zend\Escaper progress

Pádraic Brady
In reply to this post by Stewart Lord
On Tue, Mar 13, 2012 at 3:45 AM, Stewart Lord <[hidden email]> wrote:
> Hi Padraic,
>
> This looks good. I think it will be faster than what we are currently using
> in-house.
>
> Your exception message on line 59 seems wrong -- it appears to allow null,
> just not empty string.

I should probably just merge both and use the invalid encoding value
message since they are all not allowed.

> I'm curious why you are using sprintf, seems like simple concatenation would
> work here. Also curious why you are using substr in that manner instead of
> str_pad.

I used sprintf since it better clarifies what the string prototype is
(vs concat which tends to make this less obvious). Personal preference
;). Same for substr-  using str_pad probably is cleaner but it's also
longer since I need to pad right.

> For detecting PCRE unicode support I found this, but there were some reports
> of it not working 100% of the time:
>
> <?php
> if (@preg_match('/\pL/u', 'a') == 1) {
> echo "PCRE unicode support is turned on.\n";
> } else {
> echo "PCRE unicode support is turned off.\n";
> }
> ?>

It should do fine. Pity there no absolutely certain way...

> I will be interested to see how you tweak the regexes.

I hate regex ;). Shouldn't be any large changes though. I know some
alternatives allow for non-ASCII range UTF-8 alphanumerics in addition
but this is not allowed under the OWASP recommendations.

Will know more when I expand the tests to cover the OWASP defined
character ranges.

Paddy


--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Reply | Threaded
Open this post in threaded view
|

Re: ZF2: Zend\Escaper progress

weierophinney
Administrator
-- Pádraic Brady <[hidden email]> wrote
(on Tuesday, 13 March 2012, 05:04 PM +0000):

> On Tue, Mar 13, 2012 at 3:45 AM, Stewart Lord <[hidden email]> wrote:
> > For detecting PCRE unicode support I found this, but there were some reports
> > of it not working 100% of the time:
> >
> > <?php
> > if (@preg_match('/\pL/u', 'a') == 1) {
> > echo "PCRE unicode support is turned on.\n";
> > } else {
> > echo "PCRE unicode support is turned off.\n";
> > }
> > ?>
>
> It should do fine. Pity there no absolutely certain way...

I probably don't need to tell you this, but if you go this route, don't
use error suppression, but instead use a temporary error handler.

--
Matthew Weier O'Phinney
Project Lead            | [hidden email]
Zend Framework          | http://framework.zend.com/
PGP key: http://framework.zend.com/zf-matthew-pgp-key.asc
Reply | Threaded
Open this post in threaded view
|

Re: ZF2: Zend\Escaper progress

Stewart Lord
In reply to this post by Stewart Lord
On 12-03-13 7:31 AM, Pádraic Brady wrote:
> I used sprintf since it better clarifies what the string prototype is
> (vs concat which tends to make this less obvious). Personal preference
> ;). Same for substr-  using str_pad probably is cleaner but it's also
> longer since I need to pad right.

Thanks for the explanation.

> It should do fine. Pity there no absolutely certain way...

Agreed. Maybe someone knows a better way?

>> I will be interested to see how you tweak the regexes.
>
> I hate regex ;). Shouldn't be any large changes though. I know some
> alternatives allow for non-ASCII range UTF-8 alphanumerics in addition
> but this is not allowed under the OWASP recommendations.

It might depend on the context. For attribute escaping (which I'm
assuming you'll add) OWASP says you don't need to escape anything over
256. You only need to escape 0-256 excluding alphanumeric characters.

Not sure how best to exclude >256 from your match. Maybe you could make
a range from [0x100-10FFFF]? Or, you could just have an early exit in
your callback for that case.

> Will know more when I expand the tests to cover the OWASP defined
> character ranges.

Sounds good!

Stew