
Sameer Mhaisekar
DevRel Engineer, SquaredUp & Microsoft MVP


I recently spoke with Vipul Gupta (Senior Software Engineer at Balena) as part of my Observability in the Wild series. Our discussion reinforced a simple truth: observability is not just about collecting logs, metrics, and traces. It’s about clarity, trust, and collaboration in complex systems.

At Balena, Vipul’s team manages hundreds of IoT devices, each with its own embedded Linux OS. That creates a unique challenge: every code change needs to be tested on real hardware using “hardware in the loop” pipelines.
CI/CD at this scale is fragile. Pipelines fail constantly due to dependency changes, hardware flakiness, or transient network issues. GitHub’s native insights weren’t enough to answer critical questions like:
To close those gaps, Vipul’s team built their own observability stack on top of GitHub Actions, using OpenTelemetry, Prometheus, Grafana, and Sentry. This gave them visibility across hundreds of workflows running against fleets of devices.
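To make the Prometheus part of such a stack a little more concrete, here is a minimal sketch of how workflow outcomes could be rendered in the Prometheus text exposition format for scraping. It is not Balena's implementation: the metric name `ci_workflow_runs`, the label names, and the sample runs are all hypothetical, and a real setup would more likely use a Prometheus client library or an exporter.

```python
from collections import Counter


def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_run_counts(runs, metric="ci_workflow_runs"):
    """Render run counts per (workflow, conclusion) as Prometheus text.

    `runs` is a list of dicts with 'name' and 'conclusion' keys.
    The metric and label names are illustrative assumptions.
    """
    counts = Counter((run["name"], run["conclusion"]) for run in runs)
    lines = [
        "# HELP %s Workflow runs in the sampled window." % metric,
        "# TYPE %s gauge" % metric,
    ]
    for (workflow, conclusion), n in sorted(counts.items()):
        labels = 'workflow="%s",conclusion="%s"' % (
            escape_label(workflow),
            escape_label(conclusion),
        )
        lines.append("%s{%s} %d" % (metric, labels, n))
    return "\n".join(lines) + "\n"


# Hypothetical sample data, not real pipeline results.
example_runs = [
    {"name": "hw-tests", "conclusion": "failure"},
    {"name": "hw-tests", "conclusion": "failure"},
    {"name": "lint", "conclusion": "success"},
]

if __name__ == "__main__":
    print(render_run_counts(example_runs), end="")
```

Grafana could then chart a series like this per workflow, which is one way to get a fleet-wide view that a single workflow page in GitHub does not offer.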
Vipul described a shift that resonated with me: most teams discover observability reactively — when something breaks. But real maturity comes when observability helps teams prevent problems instead of chasing them down.
That means:
As he put it: “Without actual work going into observability, it’s just metrics, just shiny dashboards. Teams need to come together around the data to actually execute on solutions.”
Dashboards were a major focus in our conversation. At Balena, the philosophy is one dashboard, one metric. Simple, reliable, and easy to interpret in seconds.
For example: tracking retry counts in GitHub Actions. A single retry doesn’t show up as a “failure,” but over time, retries reveal patterns:
Visualizations like these transform observability from raw data into shared understanding. Teams can trust what they see, act on it, and align around the same view of reality.
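As a rough illustration of the retry-count idea, the sketch below aggregates retries per workflow. It assumes each run is a dictionary carrying the `name` and `run_attempt` fields that GitHub's REST API returns for workflow runs, where `run_attempt` is 1 for a first attempt and higher after a re-run. The function name and sample data are hypothetical, and this is not the code Vipul's team uses.

```python
from collections import defaultdict


def retry_summary(runs):
    """Aggregate re-run counts per workflow.

    Attempts beyond the first are counted as retries, so a run with
    run_attempt == 3 contributes two retries.
    """
    summary = defaultdict(
        lambda: {"runs": 0, "retried_runs": 0, "total_retries": 0}
    )
    for run in runs:
        retries = max(run.get("run_attempt", 1) - 1, 0)
        entry = summary[run["name"]]
        entry["runs"] += 1
        entry["total_retries"] += retries
        if retries:
            entry["retried_runs"] += 1
    return dict(summary)


# Hypothetical sample data, not real pipeline results.
sample_runs = [
    {"name": "hw-tests", "run_attempt": 1, "conclusion": "success"},
    {"name": "hw-tests", "run_attempt": 3, "conclusion": "success"},
    {"name": "lint", "run_attempt": 1, "conclusion": "success"},
]

if __name__ == "__main__":
    for workflow, stats in sorted(retry_summary(sample_runs).items()):
        print(workflow, stats)
```

Both `hw-tests` runs here end in success, so a pass/fail view would show nothing unusual. The retry total is what reveals that one of them needed two extra attempts.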
[Shameless plug: at SquaredUp, this is exactly what we do. We call it “Operationally intelligent dashboards” - click here to find out more!]
One recurring theme was the difficulty of establishing a clear source of truth.
For Vipul's team, GitHub often remains the ultimate ground truth for failures, even as they pipe data into Sentry and Grafana for visualization. The bigger goal is to standardize error reporting on OpenTelemetry so that the observability platforms themselves can become the authoritative source.
That’s still evolving, but it’s where the industry is heading.
AI came up as a natural extension. Observability generates massive, noisy datasets, and humans can only process so much. AI can help by:
Imagine an AI assistant that not only reports a failed workflow but also highlights:
That’s the kind of context that turns observability into proactive engineering.
Vipul shared two lessons that stood out:
As he put it: “Being perfect is the enemy of good. The immediate value of getting basic observability up and running for the whole team is far more important than chasing 100%.”
Vipul's experience highlights something we hear often: once you have plenty of telemetry, the real challenge becomes making it consumable. Teams need a way to share the right view with the right people — from engineers watching CI/CD pipelines to product leaders looking for trends.
That's where dashboards shine. With SquaredUp, you can simply bypass the issue of "trust in the data", since we bring data straight from GitHub, Azure DevOps, Jenkins, GitLab and more into clean, lightweight dashboards that surface what matters most. No digging, no noise — just the key signals, rolled up and easy to share across teams.
Whether your pipelines are as complex as Balena's or just business-critical in different ways, observability truly becomes actionable with operationally intelligent dashboards.