Over the last few days, most of us have been getting to grips with this new, albeit temporary, norm of remote working. At SquaredUp, we’ve always been able to work like this as all of the tools we use day-to-day are SaaS products or hosted in Azure, but while we’ve always had the right tools, we’ve never actually had to experience everyone being out of the office at the same time.
Following the announcement by the UK government on March 15th recommending that everyone should work from home to help #flattenthecurve of COVID-19, we had a couple of days of upheaval while people took home their Surface Docks, monitors and keyboards, and even their office chairs in some cases, but we’re pretty much business-as-usual now. However, there are some challenges we’re facing alongside a fairly significant cultural shift now that we’re not saying hello to each other face-to-face each day.
The first thing that I missed was the atmosphere. Sure, I’ve got Spotify, and ThrowBackThursdays is keeping me happy, but what I’m really missing is being surrounded by conversations that give me an idea of what everyone is working on and an opportunity to contribute when people are stuck on something. We weren’t going to let that defeat us, though: our CEO Richard set up a virtual kitchen on Microsoft Teams as a place for people to connect while they’re in their own real kitchens making a cuppa, and it’s proving very popular.
Alongside the interpersonal and social challenges come a number of technical challenges. Many organisations that use SquaredUp with SCOM or Azure are monitoring technology stacks like Citrix, VMware Horizon and Azure Virtual Desktop, and in most cases these systems are in place to serve the percentage of the workforce who are already remote, e.g. field sales, support, or branch offices. But when you go from 10% of your workforce using these systems to 100% overnight, capacity and performance become a real cause for concern, because all of a sudden these systems are a prerequisite for your entire organisation remaining productive. And even when these key systems are SaaS, like most of ours, there’s still a potential problem; it’s just further down the chain. More visibility is needed, and giving that visibility to everyone is the key.
On my first day working from home, Microsoft Teams was having issues, and a number of our Azure VMs that power off overnight didn’t come back on, citing a lack of available capacity (in Azure West Europe… what?). This isn’t something I’ve really concerned myself with in the past, but I had a sudden desire to start monitoring all the 3rd party services we rely on, in an effort to spot any problems quickly and keep the team informed.
The next statement shouldn’t come as a shock given where I work, but I always start with the dashboard and work back from there. Ultimately, a dashboard and the monitoring behind it should be built for its target audience, so once I had figured out what I wanted to show and to whom, I turned my attention to the how.
Luckily, I’m spoilt for choice when it comes to the buffet of monitoring tools in the SquaredUp lab but I decided to keep it simple and visual and rely on SCOM to do the heavy lifting (with a little help from Azure). All of the following examples were built with SquaredUp for SCOM, and I also pulled in some data from Azure Monitor Logs and ServiceNow.
This was my first attempt at a simple all-hands health dashboard that would be understandable by my whole audience. Green equals good 😊
A view like this is typically good for the whole company as it’s easy to grasp what it means, and it answers the question “is it working?” in a pretty straightforward way. This dashboard is powered by SCOM web tests that were set up using SquaredUp’s Enterprise Application Monitoring feature and the tests are being run from servers in Europe and the US. The colour is determined by whether the test succeeds within the time limit I configured, and if an app doesn’t respond, or responds too slowly… green becomes red. Simple.
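To make the green/red rule concrete, here is a minimal Python sketch of the same idea. It is not how SCOM or Enterprise Application Monitoring implements web tests; the function names, the default 5-second limit and the choice of treating any error as "no response" are all assumptions made for illustration.

```python
import time
import urllib.request


def classify(responded: bool, elapsed_s: float, limit_s: float) -> str:
    """Return the tile colour: green only if the app answered within the limit."""
    return "green" if responded and elapsed_s <= limit_s else "red"


def run_web_test(url: str, limit_s: float = 5.0) -> str:
    """Request the URL once and classify the outcome (hypothetical helper)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=limit_s) as response:
            responded = 200 <= response.status < 400
    except OSError:
        # Covers HTTP errors, DNS or connection failures and timeouts.
        responded = False
    return classify(responded, time.monotonic() - start, limit_s)
```

Running something like run_web_test against each app from a server in each region would give one colour per app per location, which is roughly the shape of data the health tiles need.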
The previous approach doesn’t suit everyone though. As a perfect example, a post appeared on Slack moments later saying “hey, is Teams really slow for anyone else?” At this point it’s time to add in a bit of basic performance data for those who want it. I’ve opted to show the response time that my web tests are recording. Because the tests run from locations geographically similar to my users, those response times represent the end-user experience quite accurately.
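As a rough illustration of what "basic performance data" can mean, the sketch below reduces a list of response-time samples to an average, a worst case and a count of slow samples. The sample values, the service names and the 2000 ms threshold are invented for the example; they are not measurements from my tests.

```python
from statistics import mean


def summarise(samples_ms: list[float], slow_ms: float) -> dict:
    """Return the average, the worst case and the number of slow samples."""
    return {
        "avg_ms": round(mean(samples_ms), 1),
        "max_ms": max(samples_ms),
        "slow_count": sum(1 for s in samples_ms if s > slow_ms),
    }


# Made-up readings for two hypothetical services.
readings = {
    "example-app-a": [420, 480, 2900, 510],
    "example-app-b": [300, 310, 295, 305],
}
for service, samples in readings.items():
    print(service, summarise(samples, slow_ms=2000))
```

A single slow sample barely changes the colour of a health tile, but it stands out in the worst case and the slow count, which is why a response-time view is useful alongside the simple status view.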
Instantly, this new dashboard highlights that a couple of the services we use are hitting occasional performance issues. Given that most of these services are hosted by 3rd parties, I can’t really do much with this information in terms of fixing something, but at least it lets my remote users know that an issue isn’t just with them, their laptop or their home internet connection.
The next evolution of this dashboard was to help cut down on repeat requests to the service desk. It’s great that everyone can see the health of the apps they use and whether they’re performing well, but if something is wrong, people still love to open a case with the service desk. Chances are that issues with applications like these aren’t going to be limited to a single person, so exposing recent service desk cases could come in handy.
For those of you who use ServiceNow, this next part is really easy as we already have an integration that can show numbers, tables and donut charts for incidents and change requests. For any other ITSM platform with a REST API, it’s still a simple task to enhance your dashboard with external data.
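For a feel of what pulling incident data over REST involves, here is a hedged Python sketch that builds a query against the ServiceNow Table API and turns the JSON response into table rows. It is not how the SquaredUp integration works internally; the instance name, the keyword filter and the helper names are assumptions, and authentication and the HTTP call itself are left out.

```python
import json
from urllib.parse import urlencode


def incident_query_url(instance: str, keyword: str, limit: int = 10) -> str:
    """Build a Table API URL for active incidents mentioning the keyword."""
    params = {
        "sysparm_query": f"active=true^short_descriptionLIKE{keyword}",
        "sysparm_fields": "number,short_description,opened_at",
        "sysparm_limit": str(limit),
    }
    return (
        f"https://{instance}.service-now.com/api/now/table/incident?"
        + urlencode(params)
    )


def incident_rows(payload: str) -> list[tuple[str, str]]:
    """Turn the JSON body into (number, short_description) rows for a table."""
    records = json.loads(payload)["result"]
    return [(r["number"], r["short_description"]) for r in records]
```

Other ITSM platforms will use different URLs and field names, but the pattern of a filtered GET request followed by a small mapping step is usually about all a dashboard tile needs.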
Now that my dashboard is showing active cases related to our key apps, it’s easy to see that someone already reported an issue with Teams so I won’t bombard the service desk with another ticket. If my issue wasn’t there already though, I’ve made the process of logging a new case really easy by putting a “Log a case” button at the top of the dashboard that jumps me right over to ServiceNow (or any other web address).
The final step I want to take with my end-user dashboard is to publish it somewhere that everyone can see it. SquaredUp makes this nice and easy with a feature called Open Access which, put simply, is an option to share a read-only view of your dashboard with whoever you want. In this case, I’m going to embed it into a SharePoint page that’s visible in Teams. Once shared in embedded format, I get something that looks like this, and Open Access will refresh it every 60 seconds to keep it current.
I’m pretty happy with my dashboard now and for the majority of our workforce it gives them some really useful information when they’re experiencing an issue with a key application. One team that almost certainly wants to see more than this though is the IT team who have other areas they need to keep a watchful eye on.
I’ve started with most of the content from my end-user dashboard, but I’ve also layered in some data from the Network Performance Monitor (NPM) solution in Azure Monitor Logs, which I’m using to perform site-to-site tests for our core network, and I’m also ingesting some basic logs from our OpenVPN server to show stats like latency and active user count. Core infrastructure metrics and alerts are also important at this level, so I’ve picked some of the common things like alerts and lowest disk space.
As a one-stop-shop for the key health and performance data that matters most to keeping my remote workers happy, I think this dashboard does it for me.
To support business continuity during the COVID-19 crisis, we have released a new free edition of SquaredUp dashboards for SCOM. It is completely free for 6 months from your sign-up date, and sign-ups will be available until 1 May 2020.