Host names should not contain underscores (especially when connecting with Apache HttpClient)
On my current project we had a problem connecting to a Tomcat service using HttpClient 3.1. After a lot of hunting I tracked this down to the URI class in Apache HttpClient failing to parse the port number correctly. Several of our servers have been misconfigured with host names that contain underscores, and the underscores cause the URI parsing to fail. (See bug report)
The effect is that, although a connection is attempted to the correct host, the port information is lost and the connection falls back to the standard HTTP port, 80. In some situations this could be dangerous, e.g. when a production server process listens on port 80 and a QA instance listens on a non-standard port of the same machine (not that it would be an especially good idea to run both environments on the same server!).
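The silent fallback can be reproduced with the JDK alone. The sketch below uses java.net.URI as a stand-in, not the HttpClient 3.1 URI class discussed here, and the host names and the effectivePort helper are made up for illustration. java.net.URI also refuses to treat an authority with an underscore as host plus port, so getPort() reports -1, and a caller that maps "no port" to 80 ends up on the wrong port.

```java
import java.net.URI;

final class PortLossDemo {
    static final int DEFAULT_HTTP_PORT = 80;

    // Hypothetical helper: mimics a client that falls back to port 80
    // whenever the parsed URI carries no port.
    public static int effectivePort(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort(); // -1 when no port could be parsed
        return port == -1 ? DEFAULT_HTTP_PORT : port;
    }

    public static void main(String[] args) {
        // Hyphenated host name: the port is parsed as expected.
        System.out.println(effectivePort("http://qa-server:8080/app")); // 8080
        // Underscore in the host name: host and port are not recognised,
        // so the fallback silently kicks in.
        System.out.println(effectivePort("http://qa_server:8080/app")); // 80
    }
}
```

No exception is thrown in either case, which is what makes the problem so hard to spot.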
The interesting aspect of this is that host names with underscores are in fact invalid: the host name rules (RFC 952, as relaxed by RFC 1123) allow only letters, digits and hyphens. However, a lot of software (e.g. browsers) will accept host names with underscores in an attempt to be lenient. So strictly speaking this is not a 'bug' in the parser. The real issue is that the failure to parse the URI should result in an exception which details the failure; at present a connection may be silently made to the wrong port.
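One way to get that loud failure is to validate the URL before handing it to the HTTP client. This is only a sketch under a few assumptions: it relies on java.net.URI rather than on HttpClient's own URI class, the class and method names are invented for the example, and it assumes every service URL is expected to carry an explicit port.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class StrictEndpoint {

    // Hypothetical helper: returns the parsed URI only if both host and an
    // explicit port were recognised; otherwise throws with the reason.
    public static URI requireHostAndPort(String url) {
        URI uri;
        try {
            // parseServerAuthority() insists on a host[:port] authority and
            // throws instead of quietly accepting an unparseable one.
            uri = new URI(url).parseServerAuthority();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(
                    "Cannot parse host/port from " + url + ": " + e.getMessage(), e);
        }
        if (uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("Missing host or explicit port in " + url);
        }
        return uri;
    }

    public static void main(String[] args) {
        System.out.println(requireHostAndPort("http://qa-server:8080/app").getPort());
        try {
            requireHostAndPort("http://qa_server:8080/app");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Calling such a check once, when the configuration is loaded, would have pointed straight at the offending host name instead of leaving us to hunt for a connection on the wrong port.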
Sometimes this leniency in interpreting specifications causes more problems than it solves. If browsers did not accept underscores in host names, our misconfigured servers would likely have been renamed a long time ago, and we would not now have this problem.