This seems to be a typical scalability issue.
It is unlikely to be a problem with the framework. You are simply reaching the limit of active users that your current deployment configuration can handle, so you should scale it up.
Scaling is not linear: the more users you have, the more memory, CPU, and I/O they consume, and once you hit the cap for any of those resources on any of the machines, the system chokes and grinds to a halt (typically dropping connections or becoming very slow).
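
To get a feel for why this is not linear, here is a minimal sketch using the textbook M/M/1 queueing formula (mean response time = 1 / (service rate - arrival rate)). This is only an illustrative model, not a measurement of your deployment: the function name `mean_response_time` and the capacity of 100 requests per second are assumptions made up for the example.

```python
def mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (seconds) for an M/M/1 queue: 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        # Saturated: requests arrive at least as fast as they are served,
        # so the queue grows without bound.
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical capacity: requests per second a single server can process.
SERVICE_RATE = 100.0

if __name__ == "__main__":
    for load in (50, 80, 90, 95, 99, 100):
        seconds = mean_response_time(load, SERVICE_RATE)
        print(f"{load:>3} req/s -> {seconds * 1000:.0f} ms")
```

In this model, going from 50 to 90 requests per second multiplies the mean response time by five (20 ms to 100 ms), 99 requests per second gives a full second, and at 100 the queue never drains. A real system has several such limits (memory, CPU, I/O), and whichever one saturates first produces the same cliff.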