Best Practices
How to use messaging
General Recommendations
Keep in mind that the messaging technology currently in use (broker, protocol, client library) may change in the future.
Stick to mainstream features (i.e. the ones available in all major implementations) to avoid vendor lock-in.
Minimize the amount of code you write to use the messaging infrastructure:
- factor out common code
- re-use existing code if it fits your needs
- keep technology-independent code isolated from the messaging-specific code; in particular, the brokers and destinations to use should be easily configurable
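A minimal sketch of keeping the broker and destination out of the application code, assuming they are supplied through environment variables. The variable names (`MSG_BROKER_HOST`, `MSG_BROKER_PORT`, `MSG_DESTINATION`) and the defaults are hypothetical; 61613 is the port ActiveMQ conventionally uses for STOMP.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MessagingConfig:
    """Broker-specific settings, kept apart from the application logic."""
    broker_host: str
    broker_port: int
    destination: str


def load_config(env: Mapping[str, str] = os.environ) -> MessagingConfig:
    # Hypothetical variable names; the defaults are placeholders for local tests.
    return MessagingConfig(
        broker_host=env.get("MSG_BROKER_HOST", "localhost"),
        broker_port=int(env.get("MSG_BROKER_PORT", "61613")),
        destination=env.get("MSG_DESTINATION", "/queue/test"),
    )
```

Switching brokers or destinations then only means changing the environment, not the code.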
Do not (only) trust the documentation: test that things indeed work as you expect.
The following things may happen, so be prepared for them:
- messages can get lost (so you may have to resend them if you really care)
- messages can arrive out of order (so you may have to reorder them if order is important)
- messages can be delivered multiple times (so you may have to filter duplicates)
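Duplicates and reordering can both be handled on the receiving side if the sender numbers its messages. The sketch below assumes a hypothetical sender that stamps consecutive sequence numbers (0, 1, 2...) into a header of its own choosing; it buffers early arrivals, drops anything already seen, and reports gaps that could be turned into resend requests. It keeps unbounded state, so a real receiver would also need a timeout or a size limit.

```python
class InOrderReceiver:
    """Deliver messages in sequence order and drop duplicates.

    Assumes a (hypothetical) sender numbering its messages 0, 1, 2, ...
    """

    def __init__(self):
        self.next_seq = 0   # next sequence number the application expects
        self.pending = {}   # out-of-order messages waiting for a gap to fill

    def receive(self, seq, body):
        """Return the messages that became deliverable, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []       # duplicate: already delivered or already buffered
        self.pending[seq] = body
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self):
        """Sequence numbers of current gaps, i.e. candidates for a resend."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]
```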
Take security requirements into account:
- by default, do not trust the data you receive
- if needed, use cryptography: encryption (for privacy) and signing (to verify the sender)
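As one way to check the sender, here is a sketch using an HMAC with a shared secret (Python's standard `hmac` module). Note that this swaps in a symmetric technique: unlike a public-key signature, it only proves that the sender knows the secret, so every holder of the key can produce valid tags. The secret and the idea of carrying the tag in a header are assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice, load it from a protected location.
SECRET = b"change-me"


def sign(body: bytes, key: bytes = SECRET) -> str:
    """Return a hex HMAC-SHA256 tag (printable ASCII, fits in a header)."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify(body: bytes, tag: str, key: bytes = SECRET) -> bool:
    """Constant-time check; reject the message if this returns False."""
    # Comparing bytes avoids a TypeError if the received tag is not ASCII.
    return hmac.compare_digest(sign(body, key).encode("ascii"),
                               tag.encode("utf-8"))
```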
Messaging should be used to transfer small pieces of information and not for large data transfers.
Messaging Protocol
Based on the experience gained during European Grid projects (EGEE, EGI...) and because of its technology-agnostic nature, the recommended messaging protocol is STOMP. It should be good enough for most use cases and it is supported by most brokers.
If you decide to use another protocol, this may bind you to a specific broker technology. For instance, OpenWire is only supported by ActiveMQ.
Although JMS is an API rather than a messaging protocol, it is probably the best solution for Java programs since most brokers support it. When changing broker technology, you will not have to change your code (as the API is standard) but you will have to change the library (JMS provider) used.
Message Persistence
JMS defines its `NON_PERSISTENT` message delivery mode as follows:
The NON_PERSISTENT mode is the lowest overhead delivery mode because it does not require that the message be logged to stable storage. A JMS provider failure can cause a NON_PERSISTENT message to be lost.
By default, ActiveMQ STOMP messages are non-persistent and therefore lost when the broker restarts. You can mark a message as persistent by adding `persistent:true` to its header.
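To make the header concrete, the sketch below assembles a raw STOMP 1.2 SEND frame (command line, `key:value` headers, blank line, body, NUL byte) with `persistent:true` set. It only builds bytes and does no network I/O; a real client would first open a session with a CONNECT frame. The function name and defaults are illustrative, and `persistent` is a broker-specific header rather than part of the STOMP specification.

```python
from typing import Dict, Optional


def _escape(s: str) -> str:
    # STOMP 1.2 header escaping for backslash, CR, LF and colon.
    return (s.replace("\\", "\\\\").replace("\r", "\\r")
             .replace("\n", "\\n").replace(":", "\\c"))


def stomp_send_frame(destination: str, body: bytes,
                     headers: Optional[Dict[str, str]] = None,
                     persistent: bool = True) -> bytes:
    """Build a STOMP 1.2 SEND frame as raw bytes."""
    all_headers = {"destination": destination,
                   "content-length": str(len(body))}
    if persistent:
        # Broker-specific header (honoured by ActiveMQ), not core STOMP.
        all_headers["persistent"] = "true"
    all_headers.update(headers or {})
    lines = ["SEND"] + [f"{_escape(k)}:{_escape(v)}"
                        for k, v in all_headers.items()]
    return ("\n".join(lines) + "\n\n").encode("utf-8") + body + b"\x00"
```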
Message Header
The message header should be used to hold simple information about the message body (aka metadata). It contains a list of key/value pairs.
Since what is allowed in the header may vary with the protocol and broker used, it is safer to restrict yourself to the minimum that is available everywhere:
- use descriptive keys that do not conflict with well-known ones such as `persistent` or `destination`; ideally, prefix them, as in `glite-role`
- for the keys, use only ASCII letters, digits, dots and dashes (like a host name)
- for the values, use only printable ASCII characters
- avoid using too many keys (a dozen max)
- avoid overly long values (1KB in total for the header is reasonable)
Note: if you plan to use selectors, only use ASCII letters, digits and underscores for the header keys as other characters like dots and dashes may confuse the selector parser.
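The header rules above are easy to enforce before sending. A minimal sketch, meant for application-defined keys; the limits (12 keys, 1024 bytes) are the rough figures from the list rather than hard broker limits, and the size estimate counts one `:` and one newline per entry, as in a STOMP frame.

```python
import re

# Keys: ASCII letters, digits, dots and dashes (like a host name).
KEY_RE = re.compile(r"[A-Za-z0-9.-]+")
# Stricter keys when selectors are used: letters, digits and underscores.
SELECTOR_KEY_RE = re.compile(r"[A-Za-z0-9_]+")
# Values: printable ASCII, from space to tilde.
VALUE_RE = re.compile(r"[ -~]*")


def header_problems(headers: dict, for_selectors: bool = False,
                    max_keys: int = 12, max_bytes: int = 1024) -> list:
    """Return human-readable problems; an empty list means the header is OK."""
    problems = []
    key_re = SELECTOR_KEY_RE if for_selectors else KEY_RE
    for key, value in headers.items():
        if not key_re.fullmatch(key):
            problems.append(f"bad key: {key!r}")
        if not VALUE_RE.fullmatch(value):
            problems.append(f"bad value for {key!r}")
    if len(headers) > max_keys:
        problems.append(f"too many keys: {len(headers)}")
    # One character per byte for valid (ASCII) entries, plus ":" and "\n".
    size = sum(len(k) + len(v) + 2 for k, v in headers.items())
    if size > max_bytes:
        problems.append(f"header too large: {size} bytes")
    return problems
```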
Message Body
The message body should hold what you want to send (aka data).
For simple applications, JSON is recommended: it is human-friendly, widely used and well supported in many programming languages. This is very important because messaging is used to link different software components together: even if all your code is in Java today, another component written in Python may want to read your messages tomorrow.
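A minimal sketch of a JSON body using Python's standard `json` module; any language with a JSON library can read the result. Decoding errors surface as `ValueError`, which fits the advice not to trust received data.

```python
import json


def encode_body(data) -> bytes:
    """Serialize application data as a compact UTF-8 JSON message body."""
    return json.dumps(data, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def decode_body(body: bytes):
    """Parse a received body.

    Raises ValueError (JSONDecodeError or UnicodeDecodeError) on bad input.
    """
    return json.loads(body.decode("utf-8"))
```

If the protocol allows it (STOMP has a `content-type` header), advertising `application/json` is one option.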
If you really need something other than JSON, there is no obvious best format for all use cases. See Wikipedia's Comparison of data-serialization formats for more information.
Message Size
Messages are copied many times (client library, socket buffer, network, memory, disk...) and therefore should be reasonably small.
For the total message size (i.e. header plus body), the 1KB - 10KB range is probably the best and 1MB should be seen as an absolute maximum.
Large messages are incompatible with a high message rate. For instance, a broker that was measured handling 700k msg/s with 20B messages could not handle more than 1.1k msg/s with 256kB messages. These were local benchmarks; over a real network, the network itself could very easily become the bottleneck.
At the other end of the scale, very small messages are inefficient, as 1B of body requires around 100B of header. It is much better to send 1 message of 1KB than 1k messages of 1B.
Applications can easily adjust the message size. They could split large data chunks into several messages or, conversely, merge small data chunks in order to use larger messages.
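Both adjustments can be sketched in a few lines. `split_payload` cuts a large payload into numbered parts (the index and total would travel in hypothetical application headers) and `join_payload` rebuilds it whatever the arrival order; `batch_records` packs small records into JSON arrays that stay under a size budget. The 10KB default follows the range suggested above and is only an assumption.

```python
import json


def split_payload(data: bytes, chunk_size: int = 10 * 1024):
    """Split a payload into (index, total, chunk) parts, one per message."""
    total = max(1, -(-len(data) // chunk_size))  # ceiling division
    return [(i, total, data[i * chunk_size:(i + 1) * chunk_size])
            for i in range(total)]


def join_payload(parts):
    """Rebuild the payload from (index, total, chunk) parts in any order."""
    if not parts:
        raise ValueError("no parts received")
    total = parts[0][1]
    by_index = {index: chunk for index, _total, chunk in parts}  # drops duplicates
    missing = [i for i in range(total) if i not in by_index]
    if missing:
        raise ValueError(f"missing parts: {missing}")
    return b"".join(by_index[i] for i in range(total))


def batch_records(records, max_bytes: int = 10 * 1024):
    """Pack small records into JSON-array bodies of at most max_bytes.

    A single record larger than max_bytes still gets a batch of its own.
    """
    batches, current, size = [], [], 2  # 2 bytes for the enclosing "[]"
    for rec in records:
        encoded = json.dumps(rec, separators=(",", ":"))  # ASCII-only output
        extra = len(encoded) + (1 if current else 0)      # comma separator
        if current and size + extra > max_bytes:
            batches.append(("[" + ",".join(current) + "]").encode())
            current, size, extra = [], 2, len(encoded)
        current.append(encoded)
        size += extra
    if current:
        batches.append(("[" + ",".join(current) + "]").encode())
    return batches
```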
Note also that, in some cases, compression can be very beneficial. For instance, text data can typically be reduced by 90% with gzip. This, however, has a CPU cost.
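A minimal sketch with Python's standard `gzip` module that only compresses bodies large enough to be worth the CPU time, and keeps the original when compression does not help. The 1KB threshold is an assumption, and the returned flag would have to travel in a header so the receiver knows whether to decompress; STOMP can carry the resulting binary body as long as `content-length` is set.

```python
import gzip
from typing import Tuple


def compress_body(body: bytes, min_size: int = 1024) -> Tuple[bytes, bool]:
    """Return (body, compressed_flag), compressing only when it pays off."""
    if len(body) < min_size:
        return body, False          # too small to be worth the CPU cost
    packed = gzip.compress(body)
    if len(packed) >= len(body):
        return body, False          # incompressible data: keep the original
    return packed, True


def decompress_body(body: bytes, compressed: bool) -> bytes:
    """Undo compress_body on the receiving side."""
    return gzip.decompress(body) if compressed else body
```

Since received data should not be trusted, a receiver may also want to cap the decompressed size, as a small gzip body can expand enormously.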
Message Rate
Caveat: the only reliable performance numbers are the ones measured in a realistic environment.
Establishing a session to a broker can be very expensive, especially when using X.509 authentication. Try to minimize the number of sessions by grouping messages before sending them. Long-lived connections can be problematic too, as they consume resources on the broker. All the message rates below are for an existing session.
What matters is the total number of messages that come in and go out of the broker. For a topic with 10 subscribers, each incoming message is delivered 10 times, so the total count is 11 messages (1 in, 10 out).
For small messages, on WAN and using STOMP, an application should stay below 1k msg/s. On LAN and with a binary protocol such as OpenWire, it should stay below 10k msg/s.
For persistent messages (that are copied to disk to survive a service interruption), these numbers should probably be reduced by one order of magnitude.
For big messages (more than 1kB), throughput can easily become a bottleneck. For instance, 1kB messages at 1k msg/s already represent 8Mbit/s of payload, roughly 10Mbit/s at network level.
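The arithmetic of this section can be captured in two helpers: broker load counts every message going in and out (a topic with N subscribers turns one incoming message into 1 + N), and bandwidth converts size and rate into Mbit/s of payload only, ignoring protocol and TCP overhead. Treat them as back-of-the-envelope estimates, not measurements.

```python
def broker_load(in_rate: float, subscribers: int = 1) -> float:
    """Messages per second crossing the broker, counting in and out.

    Use subscribers=1 for a queue, where each message is consumed once.
    """
    return in_rate * (1 + subscribers)


def bandwidth_mbit(msg_size_bytes: int, rate: float) -> float:
    """Payload bandwidth in Mbit/s, ignoring protocol and TCP overhead."""
    return msg_size_bytes * rate * 8 / 1e6


if __name__ == "__main__":
    print(broker_load(1000, subscribers=10))  # 11000 messages/s for the broker
    print(bandwidth_mbit(1000, 1000))         # 8.0 Mbit/s of payload
```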
When not to use messaging
Messaging is very versatile but it should not be seen as the magic bullet that can solve all IT problems. In particular, messaging is usually not the best solution for:
- large data transfers: messages should be small, since most brokers keep them in RAM
- very high message rates: what can be achieved in reality is one or two orders of magnitude lower than the numbers coming from specific lab setups
- time critical applications: brokers do add latency compared to direct communications
- high security environments: messaging adds extra code to be audited and extra services to be secured; brokers usually provide only basic security features and the rest must be added on top: firewall, message encryption...
In case of doubt, other technologies should be investigated to check whether they work better than messaging for a given problem.
Dejan Bosanac described the main messaging anti-patterns: