Updog by Datadog
Updog by Datadog is a public web page that provides a single dashboard for monitoring the near real-time health of major SaaS APIs and AWS services. It covers widely used platforms such as OpenAI, GitHub, Slack, Stripe, ServiceNow, Zendesk, and Zoom, as well as AWS services such as Amazon S3, AWS Lambda, and Amazon DynamoDB. Updog turns anonymized telemetry data from thousands of environments into near real-time status updates, highlighting performance issues and outages as they emerge. Engineers can quickly check whether a problem is part of a broader incident or confined to their own systems, without waiting on vendor-maintained status pages.

Updog also offers historical views with up to 90 days of degradation history, which makes it easier to spot recurring reliability issues, such as API disruptions that consistently affect customer checkouts. Teams can use these insights to make informed architectural decisions and improve fault tolerance.

Observability has traditionally been bound by the walls of individual systems, with teams focused on what they could measure within their own environments. Datadog is redefining that boundary by collecting and correlating telemetry data across the entire breadth of its products and customer base. With one of the world's largest and most diverse streams of telemetry data, it can apply AI models that identify patterns and risks that no single organization can see on its own. This represents a shift from simply helping customers manage their environments to creating shared intelligence.

Updog is an expression of this shift. It combines three things: aggregated, anonymized APM telemetry data from thousands of organizations; a Bayesian model that infers abnormal error rates across independent customer environments; and correlation across customers and regions to confirm whether degradations are systemic. Together, these let Datadog detect issues faster than vendor-controlled status pages. For example, Updog recently surfaced an Amazon DynamoDB degradation 32 minutes before AWS updated its own status page. The result is a reliable, AI-driven signal that reflects the real-world experience of users around the globe.

This iteration of Updog is just the first step. Over time, its scope will expand beyond availability to include real-time updates for systemic risks, including GPU availability monitoring, spot interruption monitoring, and cyber attack and vector monitoring. Built on anonymized observability data and AI at internet scale, Updog is a comprehensive public resource for real-time service transparency.
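
The detection approach described above (a Bayesian estimate of abnormal error rates per environment, followed by correlation across customers and regions) can be illustrated with a small sketch. Datadog has not published the details of its model, so this is only one plausible shape under stated assumptions: a Beta-Binomial model per environment, a Monte Carlo estimate of the posterior, and a simple agreement rule across environments and regions. The names `EnvWindow`, `prob_error_rate_above`, and `is_systemic`, and all thresholds, are hypothetical and chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvWindow:
    """Anonymized request counts for one environment over one time window (hypothetical shape)."""
    region: str
    errors: int
    requests: int


def prob_error_rate_above(errors, requests, abnormal_rate,
                          prior_a=1.0, prior_b=1.0, draws=4000, rng=None):
    """Estimate the posterior P(error rate > abnormal_rate).

    Assumes a Beta(prior_a, prior_b) prior with a Binomial likelihood, so the
    posterior is Beta(prior_a + errors, prior_b + successes). The tail
    probability is approximated by sampling from that posterior.
    """
    if rng is None:
        rng = random.Random(0)
    a = prior_a + errors
    b = prior_b + (requests - errors)
    hits = sum(rng.betavariate(a, b) > abnormal_rate for _ in range(draws))
    return hits / draws


def is_systemic(windows, abnormal_rate, min_prob=0.95,
                min_env_fraction=0.5, min_regions=2, seed=0):
    """Flag a degradation as systemic only when independent environments agree.

    An environment counts as abnormal when its posterior probability of
    exceeding abnormal_rate is at least min_prob. The degradation is treated
    as systemic when enough environments are abnormal and they span at least
    min_regions distinct regions (an assumed rule, not Datadog's).
    """
    if not windows:
        return False
    rng = random.Random(seed)
    abnormal = [
        w for w in windows
        if prob_error_rate_above(w.errors, w.requests, abnormal_rate, rng=rng) >= min_prob
    ]
    env_fraction = len(abnormal) / len(windows)
    regions = {w.region for w in abnormal}
    return env_fraction >= min_env_fraction and len(regions) >= min_regions


if __name__ == "__main__":
    windows = [
        EnvWindow("us-east-1", errors=200, requests=1000),
        EnvWindow("eu-west-1", errors=180, requests=1000),
        EnvWindow("us-east-1", errors=4, requests=1000),
    ]
    print(is_systemic(windows, abnormal_rate=0.05))
```

Requiring agreement across several environments and regions is what separates a vendor-side degradation from a problem confined to one customer's systems: a single noisy environment can look abnormal on its own, but it does not satisfy the cross-region rule.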