Debugging Gateway Errors
Published on by Chris Fidao
You'll sometimes hit Gateway errors, usually 502 Bad Gateway or 504 Gateway Timeout.
These are errors Nginx returns when it sends a request to PHP but PHP either returns an error saying it can't process the request or doesn't respond in time. Typically these are NOT errors occurring in your application; they are (usually) errors hit before the application even processes a request.
What is a Gateway
A Gateway is a thing sitting between the web server (usually Nginx) and your application. For most of us, this is PHP-FPM. Nginx uses the FastCGI protocol to convert a web request into something PHP-FPM can understand. PHP-FPM then runs your application, setting up PHP with the information it needs (setting superglobals, for example).
If PHP-FPM returns an error, Nginx gives us a Gateway error.
Bad Gateway is returned when PHP-FPM returns an error or can't be reached. This is usually one of:
- PHP-FPM is not running (perhaps due to too many errors)
- PHP-FPM has reached its max_children limit and cannot process any more requests
- Some sort of PHP error such as a segfault
Gateway timeout errors typically occur when your app is handling too much traffic. This can correspond to PHP-FPM max_children errors (more requests than it's configured to handle) but mostly occurs when your database is overloaded and can't handle additional connections making queries, so queries take too long to return.
This can also occur if any network connection your application makes is not returning responses in time, but the database is the most common bottleneck.
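One rough way to see whether a dependency is slow to answer is to time how long a plain TCP connection to it takes. The sketch below is only that, a sketch: the function name is made up for illustration, and the host and port you pass in (your database server, say) are assumptions about your setup. A fast connect does not prove queries are fast, but a slow or failed connect is a strong hint.

```python
import socket
import time


def time_tcp_connect(host, port, timeout=2.0):
    """Return the seconds taken to open a TCP connection, or None on failure.

    Hypothetical helper: it only measures the connection handshake,
    not how long a real query would take.
    """
    start = time.monotonic()
    try:
        # create_connection raises an OSError subclass on refusal or timeout
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

For example, time_tcp_connect("127.0.0.1", 3306) would probe a MySQL server on its default port, assuming that is where yours listens.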
Debugging Gateway Errors
The tl;dr for debugging gateway errors is: check the logs. I start from the top of the network request stack and move down. This means the order in which I check things is:
- Nginx logs
- PHP-FPM logs
- Server resource usage
- Application logs
The Nginx logs typically have the least useful data, although they might clue you in to issues of PHP-FPM not running (if Nginx can't find PHP-FPM's socket file, such as /var/run/php-fpm.sock, for example).
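If the Nginx log hints that the socket is the problem, you can check the socket file directly. Here is a minimal sketch, assuming PHP-FPM listens on a Unix socket; the function name and the returned labels are made up for illustration, and the path you pass in depends on your own FPM pool configuration.

```python
import os
import socket
import stat


def check_fpm_socket(path, timeout=1.0):
    """Return a short diagnosis string for a PHP-FPM Unix socket path.

    Hypothetical helper; the labels it returns are illustrative only.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"  # FPM is likely not running, or listens elsewhere
    except PermissionError:
        return "permission-denied"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except PermissionError:
        return "permission-denied"  # e.g. the user can't write to the socket
    except OSError:
        return "not-accepting"  # stale socket file, or nothing is listening
    finally:
        client.close()
    return "accepting"
```

Calling check_fpm_socket("/var/run/php-fpm.sock") as the same user Nginx runs as gives the most honest answer, since socket permissions are a common culprit.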
The FPM logs are usually the most beneficial, since PHP-FPM is the gateway returning an error! Often you'll see an error about hitting (or being close to) the max_children limit. Less often you might see a segfault error (if you see that, you probably have some recursion in your code somewhere).
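When the FPM log is large, a quick tally of the two errors mentioned above saves some scrolling. The following is a sketch under assumptions: the sample lines are illustrative, modeled on the usual shape of PHP-FPM warnings rather than copied from a real server, and the substrings matched ("max_children", "SIGSEGV") may need adjusting for your PHP version and log format.

```python
import re
from collections import Counter

# Assumed substrings; adjust them to match what your own FPM log contains.
PATTERNS = {
    "max_children": re.compile(r"max_children"),
    "segfault": re.compile(r"SIGSEGV"),
}


def summarize_fpm_log(lines):
    """Count log lines that mention each known problem."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Illustrative sample lines, not output from a real server.
SAMPLE_LINES = [
    "[01-Jan-2024 10:00:00] WARNING: [pool www] server reached "
    "pm.max_children setting (5), consider raising it",
    "[01-Jan-2024 10:00:05] WARNING: [pool www] child 1234 exited on "
    "signal 11 (SIGSEGV) after 12.345678 seconds from start",
    "[01-Jan-2024 10:00:06] NOTICE: [pool www] child 1240 started",
]
```

To run it against a real file, pass an open file object, for example summarize_fpm_log(open(path)), where path is wherever your distribution writes the FPM log.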
Server resource usage is what I'd check next. You can use htop or similar to check CPU/RAM usage, and which processes are using them. You should also check disk usage via df -h to see if the disk is out of space.
You may also run out of inodes! Inodes are "index nodes", things used to track files on Linux systems. Since everything is a file (including how Linux handles open network connections!) running out of inodes can be an issue. You can run df -i to see inode usage per filesystem.
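If you'd rather script these two checks (for a cron job or a health endpoint, say), Python's os.statvfs exposes the same block and inode counters that df reads. This is a minimal sketch with made-up function names; the percentages can differ slightly from df's Use% column, since df accounts for blocks reserved for root.

```python
import os


def usage_percent(total, free):
    """Percentage of `total` that is in use, rounded to one decimal."""
    if total == 0:
        # Some filesystems report no inode limit at all
        return 0.0
    return round(100.0 * (total - free) / total, 1)


def disk_and_inode_usage(path="/"):
    """Report disk-block and inode usage for the filesystem holding `path`."""
    st = os.statvfs(path)
    return {
        "disk_used_pct": usage_percent(st.f_blocks, st.f_bfree),
        "inodes_used_pct": usage_percent(st.f_files, st.f_ffree),
    }
```

Either number sitting near 100 is worth chasing before you dig any further into PHP-FPM itself.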
Finally, I check application logs. These MIGHT show errors related to timeouts or database errors, but sometimes the issue is not specific to the application code base. How useful these logs are will vary for Gateway errors.