Correlation IDs — The Key to Customer Support | Support Driven Development
Learn how Correlation IDs revolutionize customer support by linking user requests across systems, enabling faster issue resolution. Discover implementation methods and design decisions you can copy.
In this post, I’ll explain what a Correlation ID is, what it can be used for, and how it can greatly improve the support your customers get from your support team.
I’m fairly new to this topic myself, having worked with the concept for only about a year. My experience comes mainly from business operations, and introducing Correlation IDs to our product has been extremely useful for both our customers and our internal teams, including customer support, business operations and software engineering.
A Correlation ID essentially links every log entry associated with a user’s request (typically an HTTP request) as it travels through a system, which may span multiple components, as in a microservices architecture.
If a platform logs correlation IDs and can conveniently expose the ID to the customer in an error scenario, the customer can pass that value to the support team. That cuts down the back-and-forth between support and the customer, so issues get resolved faster.
Note: This is the 4th post in the series around Support Driven Development. If you have the time and wish to start from the beginning, you can find the intro here.
What is a Correlation ID and how can I use it?
This description is taken from a good article I found that explains how Correlation IDs can be added to your logs:
The idea is simple. When a user-facing service receives a request it’ll create a correlation ID, and:
pass it along in the HTTP header to every other service
include it in every log message
The correlation ID can then be used to quickly find all the relevant log messages for this request
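To make those steps concrete, here is a minimal Python sketch of the idea. It is not the linked article's code or our own implementation: the header name X-Correlation-ID and the helper names (start_request, outgoing_headers, build_logger) are assumptions I made up for illustration, and it uses only the standard library.

```python
import contextvars
import logging
import uuid

# Hypothetical header name; the post does not prescribe one.
CORRELATION_HEADER = "X-Correlation-ID"

# Holds the correlation ID of the request currently being handled.
_correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = _correlation_id.get()
        return True


def start_request(incoming_headers):
    """Reuse the caller's correlation ID, or create one at the user-facing service."""
    cid = incoming_headers.get(CORRELATION_HEADER) or str(uuid.uuid4())
    _correlation_id.set(cid)
    return cid


def outgoing_headers(extra=None):
    """Build the headers to send with every call to another service."""
    headers = dict(extra or {})
    headers[CORRELATION_HEADER] = _correlation_id.get()
    return headers


def build_logger(name, stream=None):
    """Create a logger whose every message includes the correlation ID."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s correlationID=%(correlation_id)s %(message)s"))
    handler.addFilter(CorrelationIdFilter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("LandingPageLogger")
    start_request({})                  # no ID on the incoming request, so one is created
    log.info("action=PasswordReset")   # the ID is added to the log line automatically
    print(outgoing_headers())          # send these headers to the next service
```

A downstream service would call start_request with the headers it received, so it picks up the same ID instead of generating a new one.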
After reading my explanation of why you should have them, read the article here to see how it can be done:
If your platform has a microservices architecture, then this is especially relevant to you. Below are some sample log lines that would have helped our support team find the cause more quickly for the issue we mentioned in the intro post, where the user was trying to reset their password:
2021-03-23T23:14:22+00:00 LandingPageLogger {"correlationID":"b6efab13-d538-44c0-aa34-67c245b531b7", "action":"PasswordReset", "userId":"23423324"}
2021-03-23T23:14:32+00:00 LandingPageLogger {"correlationID":"167ccfee-eb28-4917-9d54-88f30af673c4", "action":"PasswordReset", "userId":"24323432"}
2021-03-23T23:14:41+00:00 MailDeliveryLogger {"correlationID":"167ccfee-eb28-4917-9d54-88f30af673c4", "action":"sendMail", "status":"200"}
2021-03-23T23:14:42+00:00 LandingPageLogger {"correlationID":"102ce5f8-702e-4af2-bd37-e616f29ddf19", "action":"PasswordReset", "userId":"24323422"}
2021-03-23T23:14:43+00:00 MailDeliveryLogger {"correlationID":"b6efab13-d538-44c0-aa34-67c245b531b7", "status":"500", "errorMessage":"invalidCharacter ( ' ) in email"}
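As you can see above, log lines that share a correlation ID are linked. The first and last lines both carry the ID beginning b6efab13, so we can clearly see why the email is not sending: the mail delivery step returned a 500 because of an invalid character in the email address.
If you have an application for indexing log files (e.g. Splunk) and your support team can get the user ID from their support UI, they can search the logs for that user ID, pick up the correlation ID from the matching entry, and then search for that ID to find the reason for the customer's issue.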
With this information, the support team may be able to correct a misconfigured account or raise a defect for the engineering team to fix in an upcoming release and suggest a workaround to the customer in the interim.
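The lookup described above (user ID, then correlation ID, then the failing log entry) can be sketched in a few lines of Python. This is only an illustration run against the sample lines from this post; in practice a log indexing tool does this search for you. The function names are hypothetical, and treating any status starting with 4 or 5 as a failure is my own assumption.

```python
import json
from collections import defaultdict

# The sample log lines from this post.
LOG_LINES = [
    '2021-03-23T23:14:22+00:00 LandingPageLogger {"correlationID":"b6efab13-d538-44c0-aa34-67c245b531b7", "action":"PasswordReset", "userId":"23423324"}',
    '2021-03-23T23:14:32+00:00 LandingPageLogger {"correlationID":"167ccfee-eb28-4917-9d54-88f30af673c4", "action":"PasswordReset", "userId":"24323432"}',
    '2021-03-23T23:14:41+00:00 MailDeliveryLogger {"correlationID":"167ccfee-eb28-4917-9d54-88f30af673c4", "action":"sendMail", "status":"200"}',
    '2021-03-23T23:14:42+00:00 LandingPageLogger {"correlationID":"102ce5f8-702e-4af2-bd37-e616f29ddf19", "action":"PasswordReset", "userId":"24323422"}',
    '2021-03-23T23:14:43+00:00 MailDeliveryLogger {"correlationID":"b6efab13-d538-44c0-aa34-67c245b531b7", "status":"500", "errorMessage":"invalidCharacter ( \' ) in email"}',
]


def parse(line):
    """Split 'timestamp logger {json}' into a single dict."""
    timestamp, logger, payload = line.split(" ", 2)
    entry = json.loads(payload)
    entry["timestamp"] = timestamp
    entry["logger"] = logger
    return entry


def index_by_correlation_id(lines):
    """Group parsed log entries by their correlation ID."""
    by_id = defaultdict(list)
    for line in lines:
        entry = parse(line)
        by_id[entry["correlationID"]].append(entry)
    return by_id


def errors_for_user(lines, user_id):
    """Find the user's requests, then return failing entries that share their correlation IDs."""
    failures = []
    for entries in index_by_correlation_id(lines).values():
        if any(e.get("userId") == user_id for e in entries):
            # Assumption: a status beginning with 4 or 5 means the step failed.
            failures += [e for e in entries
                         if e.get("status", "").startswith(("4", "5"))]
    return failures


if __name__ == "__main__":
    for failure in errors_for_user(LOG_LINES, "23423324"):
        print(failure["logger"], failure["status"], failure["errorMessage"])
```

Note that the mail delivery entry has no userId of its own. The correlation ID is the only thing tying it back to the customer's password reset request.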
Design Decisions
As with any new feature, we had to make a lot of design decisions along the way. While not all of these are in place for my team yet, we’ll be implementing them soon to improve the solution. Below is one of the final versions of the design we settled on.
Some of these may be useful for you too:
When choosing the design for our new error notification, we decided to add a ‘copy’ button beside the correlation ID. This gives the user a good first support experience by making it quick to pass the value on to the support team.
We’ve decided not to translate the words “Correlation ID” in the error message, for now at least. We have a lot of different locales on our platform, but most of our support team speaks English. They are likely to ask for this ID in English, so we want whoever receives that request to know exactly what they are looking for.
We created dashboards in our log indexing tool so that our support team can easily use an ID to find an error. They should now be able to help out a lot more before having to get developers involved.
If you have an error message that hides automatically after a certain amount of time, the user may not be able to take a screenshot or copy the message in time. They may even want to ring support while the message is on the screen. Keeping it from being hidden will mean they can copy the ID at their leisure.
We also made sure the ID is visible in the original error notification and not hidden by default. This means that anyone taking a screenshot of an error message to send to support will automatically have the ID in the screenshot — yet again saving time for support.
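To show how these decisions might fit together, here is a small Python sketch of the data a backend could hand to the UI for an error notification. It is not our actual implementation: the field names, the ErrorNotification class and the example translations are all hypothetical, and the real behaviour (copy button, staying on screen) lives in the front end.

```python
from dataclasses import dataclass, asdict

# Hypothetical translations; only the surrounding message is localized.
MESSAGES = {
    "en": "Something went wrong. Please contact support and quote this ID.",
    "de": "Etwas ist schiefgelaufen. Bitte wenden Sie sich an den Support und nennen Sie diese ID.",
}

# The label is deliberately never translated, so it matches what support asks for.
ID_LABEL = "Correlation ID"


@dataclass(frozen=True)
class ErrorNotification:
    message: str
    id_label: str
    correlation_id: str
    show_copy_button: bool = True   # 'copy' button beside the ID
    auto_dismiss: bool = False      # stays on screen until the user closes it
    id_collapsed: bool = False      # ID visible by default, so it shows up in screenshots


def build_error_notification(correlation_id, locale="en"):
    """Localize the message but keep the 'Correlation ID' label in English."""
    message = MESSAGES.get(locale, MESSAGES["en"])
    return ErrorNotification(message, ID_LABEL, correlation_id)


if __name__ == "__main__":
    notification = build_error_notification(
        "b6efab13-d538-44c0-aa34-67c245b531b7", locale="de")
    print(asdict(notification))
```

Keeping the defaults on the notification itself means a new error screen gets the support-friendly behaviour unless someone deliberately turns it off.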
Well that’s all for today. As always, if you have any suggestions for topics I haven’t covered yet, feel free to leave a comment or drop me a message on any of the social media accounts linked to the page.