|
Hi all,
I've been doing a bit of work recently with Composer and find myself, once again, faced with a need to reorganise the use of sequential HTTP requests into something more asynchronous. Since this has become a recurring theme in my life, I'd like to float the idea (at some point - just floating the concept for now since I value opinions) of reworking Zend\Http\Client into something capable of working with parallel HTTP requests. I've been scoping this out based on some existing code I have mixed with some of Zend\Http\Client's classes. Obviously, despite the similar APIs of the underlying classes, it would still be a severe departure from how things normally look with a synchronous client since it requires a pretty significant shift in how responses are handled. For those who are familiar with it, it looks a lot like http.request from node.js.

To address the "why": synchronous requests are simply slow. For a small number of requests it's not a big deal, but once you need to handle more than a few, the time wasted while you wait for responses piles up linearly. Worse, those inefficiencies are then built into everything else which relies on the synchronous client. REST APIs, multiple GETs, and API navigation fall into relying on looped logic to extract data, which may spawn even more blocking requests. In an asynchronous client, the only loop that exists should be the one which is looping across active connections in search of a completed response to take action on (usually via a relevant callback). Since these callbacks are executed as soon as a completed response is located, they can be performed while waiting for other, slower requests to complete.

Bearing in mind this example has expanded closures (you could just as easily use a closure variable or a class/method array to remove the duplication), here's what it might look like applied to my current problem for Composer:

    $requestPool = new Pool;

    $request1 = new Request('http://pear.php.net/rest/c/categories.xml');
    $request1->on('complete', function($response, $pool) {
        $packages = array();
        $content = $response->getContent();
        // Grab package list from categories as $packages and request them too
        foreach ($packages as $package) {
            $request = new Request($package->uri);
            $request->on('complete', function($response, $pool) {
                // do something with the package data ...
            });
            $pool->attach($request); // attach all the new requests to the pool
        }
    });

    $request2 = new Request('http://pear.phpunit.de/rest/c/categories.xml');
    $request2->on('complete', function($response, $pool) {
        $packages = array();
        $content = $response->getContent();
        // Grab package list from categories as $packages and request them too
        foreach ($packages as $package) {
            $request = new Request($package->uri);
            $request->on('complete', function($response, $pool) {
                // do something with the package data ...
            });
            $pool->attach($request); // attach all the new requests to the pool
        }
    });

    $requestPool->attach(array($request1, $request2))->execute();

If you can follow that you can take away four things:

1. The Request Pool persists across the entire operation, ready to pick up new requests and execute them asynchronously.
2. Basic response handling is attached directly to a pooled request via a callback (closure, closure ref or class method call).
3. The existence of a Request Pool might make it very easy to integrate caching, unified header handling, and static responses (i.e. deal with duplicate requests arising from your business logic).
4. It looks freaky (I know!).

The real annoyance I'm solving here is that asynchronous support is pretty scarce in PHP libraries, and when it is introduced it's done half-cocked as a simple batch processor - which still blocks while waiting for a slow response and still needs special handling and other code by the end user. Usually, such support is bare bones, barely encapsulating cURL's functions. Also, the solution I'm working on actually uses PHP streams instead of cURL, making it possible to share a socket pool with synchronous clients to facilitate keep-alives and such.

The suggested API is probably a bit too rough for PHP users since you'd end up attaching callbacks for errors, pre-request, chunk receipts, etc., so the Pool could probably accept default callbacks to narrow that down.

Anyway, food for thought. Might be something worth looking into when we're less busy on ZF2. I'm hacking away at it and will let you know when I have something actually functional to look at.

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

--
List: [hidden email]
Info: http://framework.zend.com/archives
Unsubscribe: [hidden email] |
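
The closure-variable variant mentioned in the post could look roughly like the sketch below. It is only an illustration under stated assumptions: Pool, Request and Response here are hypothetical stand-ins that replay canned bodies from memory rather than doing real non-blocking HTTP, and the example.test URLs and the one-URI-per-line body format are made up for the demo.

```php
<?php
// Hypothetical stand-ins for the proposed classes. They replay canned
// bodies from memory instead of doing real non-blocking HTTP.
class Response
{
    private $content;

    public function __construct($content)
    {
        $this->content = $content;
    }

    public function getContent()
    {
        return $this->content;
    }
}

class Request
{
    public $uri;
    private $callbacks = array();

    public function __construct($uri)
    {
        $this->uri = $uri;
    }

    public function on($event, $callback)
    {
        $this->callbacks[$event] = $callback;
        return $this;
    }

    public function fire($event, Response $response, Pool $pool)
    {
        if (isset($this->callbacks[$event])) {
            call_user_func($this->callbacks[$event], $response, $pool);
        }
    }
}

class Pool
{
    private $queue = array();
    private $canned;

    public function __construct(array $canned)
    {
        $this->canned = $canned; // URI => fake response body
    }

    public function attach($requests)
    {
        foreach (is_array($requests) ? $requests : array($requests) as $request) {
            $this->queue[] = $request;
        }
        return $this;
    }

    public function execute()
    {
        // Callbacks may attach more requests, so drain until the queue is empty.
        while ($request = array_shift($this->queue)) {
            $body = isset($this->canned[$request->uri]) ? $this->canned[$request->uri] : '';
            $request->fire('complete', new Response($body), $this);
        }
    }
}

// One closure per role, shared by every request instead of repeated inline.
$packageData = array();
$onPackage = function ($response, $pool) use (&$packageData) {
    $packageData[] = $response->getContent();
};
$onCategories = function ($response, $pool) use ($onPackage) {
    // Assumed body format for this sketch: one package URI per line.
    foreach (explode("\n", $response->getContent()) as $uri) {
        $request = new Request($uri);
        $request->on('complete', $onPackage);
        $pool->attach($request);
    }
};

$pool = new Pool(array(
    'http://a.example.test/categories' => "http://a.example.test/p1\nhttp://a.example.test/p2",
    'http://b.example.test/categories' => 'http://b.example.test/p3',
    'http://a.example.test/p1' => 'package 1',
    'http://a.example.test/p2' => 'package 2',
    'http://b.example.test/p3' => 'package 3',
));

$initial = array();
foreach (array('http://a.example.test/categories', 'http://b.example.test/categories') as $uri) {
    $request = new Request($uri);
    $initial[] = $request->on('complete', $onCategories);
}
$pool->attach($initial)->execute();
```

The point of the sketch is the shape of the calling code: both category requests share $onCategories, every package request shares $onPackage, and the pool keeps draining whatever the callbacks attach.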
|
P.S. I'm not particularly pushing this for ZF2 in case people are panicking :). It can sit around externally until ZF3 or ZF 2.1 or whenever is suitable. Just raising it because I need it for myself so I'll have a chunk of its code sitting around soon.

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team |
|
I think it'd be great to have, personally. I was looking at ways of doing this the other day, myself. With a lot of service endpoints that people are connecting to, giving them the option of being able to pull that data while the application is doing other things or getting other pieces of data (I wouldn't know how you've implemented it) would help make for a much more responsive application.

Kevin Schroeder
Technology Evangelist
Zend Technologies, Ltd.
www.zend.com
www.twitter.com/kpschrade
www.eschrade.com
Skype: kevin.schroeder

-----Original Message-----
From: Pádraic Brady [mailto:[hidden email]]
Sent: Wednesday, February 08, 2012 3:41 PM
To: Zend Framework Contributors
Subject: [zf-contributors] Re: Asynchronous HTTP Client concept |
|
In reply to this post by Pádraic Brady
Hi Pádraic,
your idea is very interesting! Some time ago I was investigating how to execute multiple pieces of PHP code in parallel and I found some experimental solutions (as you know, PHP is not multi-threaded).

Regarding the multiple HTTP requests, I think we can investigate the possibility of implementing something like the code you proposed using a special "parallel adapter" for Zend\Http\Client (I'm thinking of the cURL extension with its curl_multi_* functions). But I think this approach must be well tested before we propose something for ZF2 or ZF3. Anyway, I think that's a good idea for the future and we should start to investigate.

Regards,
Enrico Zimuel

--
Enrico Zimuel
Senior PHP Engineer | [hidden email]
Zend Framework Team | http://framework.zend.com
Zend Technologies Ltd.
http://www.zend.com |
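
For reference, the curl_multi_* route mentioned above can be sketched as follows. This is a minimal illustration, not the proposed "parallel adapter": fetchAll() is a hypothetical helper name, it assumes the cURL extension is loaded, and it simply returns every body once all transfers have finished (the batch style, without per-response callbacks).

```php
<?php
/**
 * Fetch several URLs concurrently with the cURL multi interface.
 * Returns the response bodies, keyed like the input array.
 * fetchAll() is a hypothetical helper name used for this sketch.
 */
function fetchAll(array $urls)
{
    $multi   = curl_multi_init();
    $handles = array();

    foreach ($urls as $key => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // keep the body in memory
        curl_setopt($handle, CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($multi, $handle);
        $handles[$key] = $handle;
    }

    // Drive all transfers together; curl_multi_select() waits for activity
    // so the loop does not spin while nothing is happening.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(10000); // select reported a failure; back off briefly
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = array();
    foreach ($handles as $key => $handle) {
        $bodies[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }

    return $bodies;
}
```

A call such as fetchAll(array('a' => 'http://pear.php.net/rest/c/categories.xml')) would block until every transfer is done, which is exactly the limitation the callback-based Pool is meant to avoid.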
|
In reply to this post by Pádraic Brady
On Wed, Feb 8, 2012 at 10:32 PM, Pádraic Brady <[hidden email]> wrote:

> Anyway, food for thought. Might be something worth looking into when
> we're less busy on ZF2. I'm hacking away at it and will let you know
> when I have something actually functional to look at.

I hope I don't burst your bubble, but how do you introduce async (parallel) processing to PHP code? Looks great on paper (examples) if it was possible...

As far as I know and researched it a while ago, even with 5.4 there is no cross-platform way to run parallel functions (short of posix-only pcntl_fork).

If it's not multi-threaded (multitasked) then the whole app execution will still hang (wait) until a particular fopen(stream) or curl_exec() finishes ....

--
      __
     /.)\   +48 695 600 936
     \(./   [hidden email] |
|
There are some possibilities with socket_set_nonblock()
http://stackoverflow.com/questions/1432477/can-php-asynchronously-use-sockets |
|
In reply to this post by Pádraic Brady
I would strongly suggest looking at the Facebook Futures model
https://github.com/facebook/libphutil
and the docs explaining how they work and how to use them here:
http://www.phabricator.com/docs/libphutil/article/Using_Futures.html

It is a full implementation of async HTTP calls that we are using in production for our tools. It has reduced page load times significantly; for example, we reduced load times for pages with 15 API calls by 40%.

There is great scope for async operations in PHP, and one of the things that we are going to look at once the ZF2 view/layout rendering system settles is the potential for async view partial loading inside of a layout. For example, a layout with a main content area, a navigation header and a side bar with many components could all be loaded asynchronously.

I'd be happy to help you out, Pádraic, if you want to float implementation ideas for ZF2 or want someone to test it. Definitely check out the Futures library.

Sorry for being so succinct, but I'm still not used to my phone's keyboard.

Sent from my Windows Phone |
|
In reply to this post by Artur Bodera
On Thu, Feb 9, 2012 at 11:00 AM, Artur Bodera <[hidden email]> wrote:

> I hope I don't burst your bubble, but how do you introduce async (parallel)
> processing to PHP code? Looks great on paper (examples) if it was
> possible...
> As far as I know and researched it a while ago, even with 5.4 there is no
> cross-platform way to run parallel functions (short of posix-only
> pcntl_fork).
>
> If it's not multi-threaded (multitasked) then the whole app execution will
> still hang (wait) until a particular fopen(stream) or curl_exec() finishes

I'll explain a bit further because it's important to see where the benefit lies. First off, the Request Pool exists solely in the current PHP process - there is no forking and there are no proc_open() calls to be found. The tasks performed on a Response are indeed blocking. The asynchronous bit is in how the HTTP requests are handled.

To give an inflated example, let's assume that every HTTP response takes 1 second to arrive from a host. 10 such requests using a standard synchronous client would take 10 seconds to complete. 10 similar requests performed asynchronously would take roughly 1 second, a saving of 9 seconds in the current PHP process. In practice, within that remaining 1 second some responses will return very quickly and some very slowly. We can take advantage of the faster responses by processing them on the spot while the other responses are yet to be received. That, by itself, is simply making more efficient use of the time spent waiting for responses - but it gets a lot better when any one response triggers at least one more request (we just shunt it back into the asynchronous pool and keep piling up on the original 9 second saving).

So the tasks are not asynchronous - they are just attached to requests so that network latency doesn't itself hold up executing work. Even if the work involved exceeds the slowest response time, we've still saved the original 9 seconds anyway ;). So in a sense, the task attachment is a bet with no downside. If we lose, we still saved 9 seconds - if we win, we might save a lot more ;).

So what is asynchronous are the HTTP requests themselves. This doesn't need new processes - you just need asynchronous connections. Luckily, PHP allows for those using either the curl_multi_* functions or PHP streams. The streams approach is what I used (it's the default adapter in Zend\Http\Client, after all). You can create async connections using stream_socket_client() for each request with the flags "STREAM_CLIENT_ASYNC_CONNECT|STREAM_CLIENT_CONNECT". To make sure they are non-blocking when reading (i.e. so we don't have fread() waiting for readable responses when there are none), you can use stream_set_blocking($socket, 0). Once you have all the requests, you can watch them for activity by passing them as an array to stream_select(). After that, you just need to loop across all the sockets (the Pool), write the request where needed, and then check them for readable content. Inside that loop, units of work can be executed on responses (they are blocking, of course, but once they finish the loop carries on checking active sockets for content). Response actions can, of course, add new requests to the Pool to be looped over.

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team |
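
The streams approach described above can be sketched roughly as follows. This is a simplified illustration rather than the code being discussed: fetchConcurrently() and its callback signature are hypothetical, it speaks plain HTTP/1.0 over TCP only (no TLS, redirects or keep-alive), and it treats end-of-stream as "response complete".

```php
<?php
/**
 * Fetch several URLs over non-blocking sockets and call
 * $onComplete($url, $rawResponse) as soon as each connection finishes.
 * Returns the number of responses delivered. Hypothetical helper.
 */
function fetchConcurrently(array $urls, $onComplete, $timeout = 10)
{
    $sockets = array(); // id => stream resource
    $pending = array(); // id => request bytes still to be written
    $buffers = array(); // id => response bytes read so far
    $origin  = array(); // id => URL the socket belongs to

    foreach ($urls as $url) {
        $parts = parse_url($url);
        if (!isset($parts['host'])) {
            continue; // skip anything that is not an absolute URL
        }
        $port = isset($parts['port']) ? $parts['port'] : 80;
        $path = (isset($parts['path']) ? $parts['path'] : '/')
              . (isset($parts['query']) ? '?' . $parts['query'] : '');

        // Start connecting without waiting for the handshake to finish.
        $socket = @stream_socket_client(
            'tcp://' . $parts['host'] . ':' . $port, $errno, $errstr, $timeout,
            STREAM_CLIENT_ASYNC_CONNECT | STREAM_CLIENT_CONNECT
        );
        if ($socket === false) {
            continue;
        }
        stream_set_blocking($socket, false);

        $id = (int) $socket;
        $sockets[$id] = $socket;
        $origin[$id]  = $url;
        $buffers[$id] = '';
        // HTTP/1.0 keeps the sketch simple: the server closes when it is done.
        $pending[$id] = "GET $path HTTP/1.0\r\nHost: {$parts['host']}\r\n\r\n";
    }

    $delivered = 0;
    $deadline  = microtime(true) + $timeout;
    while ($sockets && microtime(true) < $deadline) {
        $read   = $sockets;                                // watch everything for data
        $write  = array_intersect_key($sockets, $pending); // only unsent requests
        $except = null;
        if (@stream_select($read, $write, $except, 1) === false) {
            break;
        }
        foreach ($write as $socket) {
            $id = (int) $socket;
            $written = (int) @fwrite($socket, $pending[$id]);
            $pending[$id] = (string) substr($pending[$id], $written);
            if ($pending[$id] === '') {
                unset($pending[$id]); // request fully sent
            }
        }
        foreach ($read as $socket) {
            $id = (int) $socket;
            $chunk = fread($socket, 8192);
            if (is_string($chunk)) {
                $buffers[$id] .= $chunk;
            }
            if (feof($socket)) {
                // Blocking work happens here while other sockets keep filling.
                fclose($socket);
                unset($sockets[$id], $pending[$id]);
                $delivered++;
                call_user_func($onComplete, $origin[$id], $buffers[$id]);
            }
        }
    }
    array_map('fclose', $sockets); // give up on anything past the deadline

    return $delivered;
}
```

Unlike the Pool in the original proposal, this helper does not let a callback attach new requests while the loop is running; doing so would mean moving the socket bookkeeping into an object that the callbacks can reach.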
|
I think this is definitely doable and should be done. For one, there should be an interface defined on HTTP transport adapters (e.g. the cURL one and, if we ever have one, the pecl_http one) designating support for parallel requests. In addition, the default socket adapter could probably be made to support this through async IO, which is possible in PHP, even if not trivial.

Basically, to simplify the API, what I thought about is that instead of sending a single request and receiving a single response, you send a request pool and get back a response list for all requests in the pool once all of them are complete. If we want to make things more complex, we can also set up an event system to handle each finished request as it becomes available.

I am now in the process of rewriting Http\Client, and this is part of my plan, but unfortunately I have little time and things are not going as fast as I want. I do not want to promise that I will do it or that it can be done, but I can promise I'm thinking about it (and have been for some time) :)

Shahar.

On Thu, Feb 9, 2012 at 2:43 PM, Pádraic Brady <[hidden email]> wrote:
|
|
In reply to this post by Enrico Zimuel-2

> your idea is very interesting! Some time ago I was investigating how to
> execute multiple pieces of PHP code in parallel and I found some
> experimental solutions (as you know, PHP is not multi-threaded).

You and me both ;). Most solutions are better suited to personal tools, e.g. for parallelising stuff like PHPUnit, rather than speeding up non-persistent processes like PHP.

> Regarding the multiple HTTP requests, I think we can investigate the
> possibility of implementing something like the code you proposed using
> a special "parallel adapter" for Zend\Http\Client (I'm thinking of
> the cURL extension with its curl_multi_* functions). But I think this
> approach must be well tested before we propose something for ZF2 or ZF3.
> Anyway, I think that's a good idea for the future and we should start to
> investigate.

Certainly. As my second email said, I'm not expecting this to be in master any time soon. I'm just kicking around the concept as I get something down in code for future consideration. I should have something concrete over the weekend, built on top of the Zend\Http\Client source.

> I would strongly suggest looking at the Facebook Futures model
> https://github.com/facebook/libphutil and the docs explaining how they
> work and how to use them here:
> http://www.phabricator.com/docs/libphutil/article/Using_Futures.html
>
> It is a full implementation of async HTTP calls that we are using in
> production for our tools. It has reduced page load times significantly;
> for example, we reduced load times for pages with 15 API calls by 40%.

Thanks for the link, Alex! I dug around the code and it's almost the same idea, right down to the socket functions used ;). Good to know I'm not on the crazy track here. So my take should perform in the same ballpark, though I have no idea how the API differences fit in there. The reason I decided on callbacks was to avoid having the user loop across responses checking their status and then having to deal with the logic needed to loop across additional requests while the originals might still have outstanding responses.
--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team |
|
In reply to this post by Shahar Evron-2
On Thu, Feb 9, 2012 at 4:17 PM, Shahar Evron <[hidden email]> wrote:
> I think this is definitely doable and should be done - for once, there
> should be an interface defined on HTTP transport adapters (e.g. the curl and
> if we ever have one the pecl_http one) designating support for parallel
> request support. In addition, the default socket adapter could probably be
> made to support this though async IO which is possible in PHP, even if not
> trivial.

It's actually pretty trivial on the PHP end - the problem is wrapping it in such a way that we can loop across the open sockets for writes/reads and ensure the loop will end once the pool of open requests is exhausted (we'd still need to preserve those sockets with keep-alives for reuse later). It's the wrapping that is sorely lacking to make this more accessible (Alex pointed out one of the very few examples of good wrapping out there in a Facebook lib earlier in this topic).

> Basically to simplify the API what I thought about is instead of sending a
> single request and receiving a single response, you send a request pool and
> you get a response list for all requests in the pool once all of them were
> complete. If we want to make things more complex, we can also set up an
> event system to handle each finished request as it becomes available.

This is fine too, but you end up losing efficiency. For example, let's say we have a REST API with 5 resources, each of which leads to another 3 resources (the actual data we need). We end up needing two discrete pools of 5, then 15. Since we have to wait for each pool to close (i.e. each pool is like a blocking unit), a slow response in the first 5 could hold up running those of the 15 we already know about from the faster responses (some of which could have completed by now if injected into the original async pool). That's why a single asynchronous pool should, in theory, be faster than sets of discrete pools.
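The 5-then-15 argument can be put into numbers with a small back-of-the-envelope model. The latencies below are invented purely for illustration, and the model is idealised: it assumes unlimited concurrency and ignores processing time, so it only shows the shape of the difference, not real measurements.

```php
<?php
// Illustrative numbers only: made-up latencies in milliseconds.
$parents = [100, 120, 90, 110, 900]; // the fifth first-level response is slow
$children = [                        // three follow-up requests per parent
    [200, 210, 220],
    [200, 250, 190],
    [300, 310, 280],
    [200, 200, 200],
    [150, 160, 170],
];

// Two discrete pools: all 15 follow-ups wait for the slowest of the first 5,
// then the second pool lasts as long as its own slowest request.
$discrete = max($parents) + max(array_map('max', $children));

// One shared pool: each parent's follow-ups start as soon as that parent
// completes, so the total is the longest parent-plus-children chain.
$single = max(array_map(
    fn (int $parent, array $kids): int => $parent + max($kids),
    $parents,
    $children
));

printf("discrete pools: %d ms, single pool: %d ms\n", $discrete, $single);
// prints "discrete pools: 1210 ms, single pool: 1070 ms"
```

In this model the single pool can never be slower than the discrete pools, and it wins whenever the slowest parent does not also own the slowest follow-up request.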
> I am now in the process of rewriting Http\Client, and this is a part of my
> plan, but unfortunately I have little time and things do not go as fast as I
> want. I do not want to promise that I will do it or that it can be done, but
> I can promise I'm thinking about it (and have thought for some time) :)
>
> Shahar.

If it helps, I am writing code to do all this - it relies on bits of Zend\Http\Client, so when the time comes and you want to look into it there'll be something concrete to examine and poke at :P. I would suggest that, given Zend\Http\Client's focus, we may need a separate Zend\Http\Pool class with the asynchronous-friendly API. Socket resources or whatever can be held by some common shared static store as needed so we don't disobey keep-alive connection headers set on either one.

Paddy |
|
In reply to this post by Pádraic Brady
On Thu, Feb 9, 2012 at 1:43 PM, Pádraic Brady <[hidden email]> wrote:
Ah, ok. So async socks :-) That facebook toy also works that way (sprinkled with sneaky proc_open() here and there).

You're right. That might speed things up. Even with 10 requests (out of 100) finishing at the same time, they will not be processed in parallel... but nevertheless, you can parse those results one after another while the remaining 90 requests wait for the network (or remote API).

Also --- I wonder if you've stress-tested it. Is it thread-safe? Of course there are no threads, but what will happen if 2 requests really end EXACTLY at the same time (or you put a counter in a loop and sleep)? Will it maintain cross-listener consistency? This is very important, because if one wants to use it e.g. to call a remote API, you _MUST_ have a consistent result (i.e. a collection of retrieved resources). If it's not consistent, then one request could overwrite the result array of another one (or worse).

Try something like:

    global $result;
    $x = 0; // start the counter explicitly
    do {
        $result[] = $x++;
        usleep(10000);
    } while ($x < 100000);

....in all listeners and then try to run many parallel connections. |
|
Interesting stuff. I was planning on doing a bit of refactoring here as discussed w/ Matthew a while back; however, it seems you guys are planning something quite a bit more ambitious than what I was willing to take on.
I will happily cheer you (Paddy, Shahar, Enrico) on as you move forward with this :)
--
Wil Moore III Best Practices for Working with Open-Source Developers http://www.faqs.org/docs/artu/ch19s02.html Why is Bottom-posting better than Top-posting: http://www.caliburn.nl/topposting.html DO NOT TOP-POST and DO trim your replies: http://linux.sgms-centre.com/misc/netiquette.php#toppost |
|
In reply to this post by Pádraic Brady
Hi all,
today I ran some tests using a similar approach for reading local files (incl. locking): https://gist.github.com/1783820

I used 3 files:
1. 630927 bytes on ext3
2. 11574 bytes on ext3
3. 2285 bytes from a shared NTFS folder (vbox)

The async way is ~36% faster:

php test.php
sync: 0.17657089233398 (64478600)
async: 0.12063097953796 (64478600)

Greetings
Marc |
|
Administrator
|
In reply to this post by Wil Moore III
-- Wil Moore III <[hidden email]> wrote
(on Thursday, 09 February 2012, 03:26 PM -0700):
> Interesting stuff. I was planning on doing a bit of refactoring here as
> discussed w/ Matthew a while back; however, it seems you guys are planning
> something quite a bit more ambitious than what I was willing to take on.
>
> I will happily cheer you (Paddy, Shahar, Enrico) on as you move forward with
> this :)

Or, perhaps, collaborate... ;-)

--
Matthew Weier O'Phinney
Project Lead | [hidden email]
Zend Framework | http://framework.zend.com/
PGP key: http://framework.zend.com/zf-matthew-pgp-key.asc |
|
In reply to this post by Artur Bodera
In terms of tracking responses and conflicts, this is all perfectly safe. Each socket opened is a separate resource (as in is_resource() returns true) and they can be tracked and associated with the correct request/response objects to prevent any accidental interplay. Since they are sockets, those responses only enter PHP's scope when we choose to use fread() to grab a chunk of the response and assign it to a variable or via a response object's setter. There's nothing automatically assigned somewhere that might introduce timing conflicts or race conditions.

I wouldn't worry about proc_open() either - it's not needed for socket handling, though Facebook's lib probably uses it to fake thread behaviour as another speedup. Personally, I wouldn't do that unless it was a very expensive piece of processing I just had to have in the current process's scope somehow; otherwise you're better off using Gearman or something.

I'm pretty sure the approach does have a gotcha somewhere - setting a timeout on sockets was broken when used on the command line, and I assume it still is. There's a way around that, however: track request start times and check for timeout limits after socket fread() calls.

--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team |
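As a rough sketch of the timeout workaround described above (tracking start times and checking limits after the reads), the following keeps one record per socket keyed by its resource id and enforces the limit in userland. The record layout and hasTimedOut() are hypothetical names for illustration only, and a socket pair that never receives data plays the part of an unresponsive server (on Windows, STREAM_PF_INET would be needed instead of STREAM_PF_UNIX).

```php
<?php
// Hypothetical sketch: the array layout and hasTimedOut() are illustrative only.

function hasTimedOut(array $request, float $now): bool
{
    return ($now - $request['startedAt']) > $request['timeout'];
}

// A connected pair stands in for a remote server that never answers.
[$client, $server] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
stream_set_blocking($client, false);

// Keyed by resource id so each socket maps to exactly one request record.
$pending = [
    (int) $client => ['stream' => $client, 'startedAt' => microtime(true), 'timeout' => 0.2, 'body' => ''],
];
$timedOut = [];

while ($pending) {
    $read = array_column($pending, 'stream');
    $write = $except = null;
    stream_select($read, $write, $except, 0, 50000); // wait at most 50 ms

    foreach ($read as $stream) {
        $chunk = fread($stream, 8192);
        if (is_string($chunk)) {
            $pending[(int) $stream]['body'] .= $chunk;
        }
    }

    // Enforce the time limit ourselves instead of relying on a socket timeout.
    $now = microtime(true);
    foreach ($pending as $id => $request) {
        if (hasTimedOut($request, $now)) {
            fclose($request['stream']);
            unset($pending[$id]);
            $timedOut[] = $id;
        }
    }
}
fclose($server);

printf("%d request(s) timed out\n", count($timedOut));
```

Because each chunk is appended only to the record that matches the socket it was read from, and everything runs in one loop in one process, two responses arriving at the same moment are still handled one after the other.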
|
In reply to this post by Wil Moore III
By all means, keep your own ideas moving ;). A request pool will still rely on most of the available classes in Zend\Http - the main changes would be having new adapters to handle pooled sockets/curl_multi and whatever pool-friendly API might get layered on top. Also, as Enrico and I both noted - this is just a concept at the moment and there's no guarantee it could (or even should) be ZF 2.0 ready. It would be even farther down the line before we would see other ZF components using it internally in some way (e.g. Zend\Feed\Reader and Service components).

--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team |
|
For those interested, a quick prototype now exists at https://github.com/padraic/hasty - check the examples directory for an idea of the API.

Comments welcome.

Paddy

--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team |
|
Hi Pádraic,
I took a quick look at your code and noticed that you are using "usleep(30000);" to wait until the next call of "stream_select".

I'm not 100% sure, but wouldn't it be better to use a notification callback to detect when bytes are ready to read or when the stream is ready to write?
-> php.net/manual/context.params.php

Marc |
|
Quite possibly - for now though, it's enough to get a prototype out without killing everyone's CPU with a constantly running loop ;). I'll continue making changes over the next week or so, and I'll see if the notifications will help with this.

Paddy

--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team |
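Separately from the notification callback suggested above, one possible way to avoid a busy loop without a fixed usleep() is to let stream_select() itself do the waiting through its timeout arguments. This is only a sketch of that alternative, not a description of what the prototype does; a local socket pair is used so it runs without a network (on Windows, STREAM_PF_INET would be needed instead of STREAM_PF_UNIX).

```php
<?php
[$reader, $writer] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

// Nothing has been written yet, so this call sleeps inside stream_select()
// for up to 30 ms and then reports zero ready streams.
$read = [$reader];
$write = $except = null;
$idle = stream_select($read, $write, $except, 0, 30000);

fwrite($writer, 'chunk');

// Data is now waiting, so the call returns without using up the timeout.
$read = [$reader];
$write = $except = null;
$ready = stream_select($read, $write, $except, 0, 30000);
$data = fread($reader, 8192);

fclose($reader);
fclose($writer);

var_dump($idle, $ready, $data);
```

The difference from a fixed sleep is that the process wakes up as soon as any watched stream becomes readable, and only waits the full interval when nothing is happening.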
