Designing a Scalable Multi-Platform Cost Intelligence Experience

The Team

Project Goals

Provide a unified platform for tenant admins to monitor and manage cloud platform costs across Snowflake, Databricks, and others

Design customizable dashboards that offer visibility into key cost drivers, usage patterns, and cost optimization opportunities

Enable chargeback reporting by tag, org/unit, and platform for both single-platform and multi-platform users

Surface platform and object-level insights, helping users track trends, understand performance shifts, and improve cost efficiency

Improve data visualization clarity and storytelling across all modules for actionable decision-making

Problem Statement

  • Admins working across multiple platforms (or even a single platform) lacked a centralized, intuitive, and dynamic interface to track cloud-related costs. They were hindered by:

    • Siloed cost visibility (Snowflake vs. Databricks)

    • Inability to see savings or costliest areas at a glance

    • No platform-agnostic cost breakdown for chargeback reporting

    • Static dashboards that didn’t reflect real-time fluctuations

    • Lack of object-level performance insights tied to cost shifts

    This resulted in manual data exports, reactive rather than proactive cost management, and missed optimization opportunities.

Target Audience

The primary users of the platform were tenant admins, LOB (line-of-business) analysts, and FinOps stakeholders responsible for managing, optimizing, and reporting on cloud platform costs across Snowflake, Databricks, and other internal systems.

These users often operated across multi-platform environments, overseeing:

  • Budget forecasting

  • Chargeback and cost attribution

  • Cost efficiency tracking

  • System performance diagnostics

Secondary audiences included DevOps engineers and platform leads, who needed visibility into workload costs, object-level performance metrics, and opportunities for scaling or remediation.

User Research

To guide design decisions, we conducted a hybrid research approach combining qualitative and quantitative insights:

  • 15+ user interviews with platform admins and analysts from Snowflake-only, Databricks-only, and multi-platform orgs

  • Contextual inquiry sessions to observe workflows for chargeback creation, trend analysis, and diagnostics

  • Review of support tickets, usage logs, and dashboard analytics to identify friction points

  • Survey data to benchmark confidence in current tools and key metrics users wanted to track

Insights from this research directly shaped the hierarchy of content on the homepage, the need for tag-based breakdowns, and the real-time alerting structure in the Health Hub.

Market Analysis

The cloud cost management and FinOps landscape is rapidly evolving. As organizations scale multi-cloud adoption, managing costs across platforms becomes increasingly complex.

Common pain points in the market include:

  • Lack of centralized visibility across Snowflake, Databricks, and internal compute systems

  • Inefficient cost allocation practices due to weak tagging or lack of reporting tools

  • Difficulty connecting cost anomalies with root causes like failed jobs or scaling events

  • Over-reliance on static exports (CSV) or third-party tools for reporting and optimization

Our product aimed to directly address these gaps with:

  • Unified platform dashboards

  • Built-in optimization suggestions

  • Real-time usage and health indicators

  • Platform-agnostic tagging and filtering

Competitive Analysis

To position our solution effectively, we analyzed key features of similar tools:

| Platform | Strengths | Gaps Identified |
| --- | --- | --- |
| AWS Cost Explorer | Strong native integration, clean UI | Lacks support for Snowflake/Databricks; limited drilldowns |
| Snowflake Console | Familiar to Snowflake users | No multi-platform view; weak visualizations |
| Datadog Dashboards | Real-time monitoring, extensibility | High complexity; steep learning curve |
| CloudZero / Apptio | Advanced FinOps tooling, tagging reports | Expensive and overengineered for mid-tier users |

Our Advantage:

  • Unified cross-platform view (Databricks + Snowflake)

  • Tag-based chargeback analytics

  • Customizable homepage snapshots (costliest jobs, top savings)

  • Embedded Health Hub for proactive optimization

Features

Data Graph Selection (Platform & Metric Picker)

  • Multi-select component allowed users to choose:

    • One or multiple platforms (e.g., Snowflake, Databricks, Azure)

    • Specific metrics (e.g., Total Cost, AI/ML Jobs, Retry Rate)

  • Visualizations updated dynamically based on user selections

  • Enabled detailed comparison between business units or platforms
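
The picker's behavior can be sketched as a small filtering step: given the user's platform and metric selections, keep only matching records and group them into chart series. The `CostRecord` fields and function name below are illustrative assumptions, not the product's actual schema.

```python
from dataclasses import dataclass

# Hypothetical cost record; field names are illustrative.
@dataclass
class CostRecord:
    platform: str   # e.g. "Snowflake", "Databricks", "Azure"
    metric: str     # e.g. "Total Cost", "AI/ML Jobs", "Retry Rate"
    value: float

def select_series(records, platforms, metrics):
    """Keep records matching the user's multi-select choices, grouped by
    (platform, metric) so each group can drive one chart series."""
    series = {}
    for r in records:
        if r.platform in platforms and r.metric in metrics:
            series.setdefault((r.platform, r.metric), []).append(r.value)
    return series

records = [
    CostRecord("Snowflake", "Total Cost", 120.0),
    CostRecord("Databricks", "Total Cost", 95.0),
    CostRecord("Snowflake", "Retry Rate", 0.02),
]
picked = select_series(records, {"Snowflake"}, {"Total Cost"})
```

Because grouping happens after filtering, re-running the function on every selection change is what lets the visualizations update dynamically.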

Side-Scroll Navigation (Horizontal Expandability)

  • Designed a horizontal nav layout to accommodate numerous views:

    • Cost Trends

    • Cost Breakdown

    • Business Units

    • Health Hub

    • Object Detail Pages

  • Sticky sub-navigation ensured quick access on scroll-heavy pages

Tag-Based Filtering & Chargeback Attribution

  • Users could filter costs by:

    • Tags (e.g., AI/ML, Infrastructure, Security)

    • Org/Unit

    • Account or Workspace

  • Enabled precise chargeback calculation and exportable reports

  • Custom tag groups could be saved and reused across sessions
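
Tag-based chargeback reduces to aggregating cost records per tag. A minimal sketch, assuming hypothetical record tuples of (tags, org/unit, platform, cost); the attribution policy shown (a multi-tagged record contributes its full cost to each tag) is one common choice, and splitting cost evenly across tags is another.

```python
from collections import defaultdict

# Hypothetical records: (tags, org_unit, platform, cost).
records = [
    ({"AI/ML"}, "Research", "Databricks", 400.0),
    ({"Infrastructure"}, "Platform", "Snowflake", 250.0),
    ({"AI/ML", "Security"}, "Research", "Snowflake", 100.0),
]

def chargeback_by_tag(records):
    """Sum cost per tag across all platforms for a chargeback report."""
    totals = defaultdict(float)
    for tags, _org, _platform, cost in records:
        for tag in tags:
            totals[tag] += cost
    return dict(totals)

report = chargeback_by_tag(records)
```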

Health Hub Alert Cards

  • Dashboard module displaying:

    • Failed objects

    • Retry spikes

    • High-cost anomalies

  • Cards included status, affected object, time of failure, and quick links to detail views

  • Designed to prevent reactive cost management by flagging early signals
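
The flagging logic behind these cards can be sketched as simple threshold checks; the thresholds and object fields below are illustrative defaults, not the product's tuned values.

```python
def health_alerts(objects, retry_threshold=0.10, cost_multiplier=2.0):
    """Flag objects whose retry rate spikes or whose latest cost far
    exceeds their trailing average cost."""
    alerts = []
    for obj in objects:
        if obj["retry_rate"] > retry_threshold:
            alerts.append((obj["name"], "retry-spike"))
        if obj["latest_cost"] > cost_multiplier * obj["avg_cost"]:
            alerts.append((obj["name"], "cost-anomaly"))
    return alerts

objects = [
    {"name": "etl_job", "retry_rate": 0.25, "latest_cost": 50.0, "avg_cost": 45.0},
    {"name": "ml_train", "retry_rate": 0.01, "latest_cost": 900.0, "avg_cost": 300.0},
]
flags = health_alerts(objects)
```

Each returned tuple maps directly onto one alert card: the object name, the alert type, and (in a fuller version) the timestamp and detail-view link.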

Object Detail Pages with Summary + Trendlines

  • Detailed views for Workspaces, Jobs, and AI/ML objects

  • Showed:

    • Average cost

    • Retry rate

    • Duration and scale activity over time

  • Integrated Change History Panel to correlate setting updates with cost or performance shifts
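
The summary figures on a detail page are rollups over raw run records. A minimal sketch under assumed field names (`cost`, `retried`, `duration_s`):

```python
def object_summary(runs):
    """Roll raw run records up into the summary figures shown on an
    object detail page: average cost, retry rate, average duration."""
    n = len(runs)
    return {
        "avg_cost": sum(r["cost"] for r in runs) / n,
        "retry_rate": sum(1 for r in runs if r["retried"]) / n,
        "avg_duration_s": sum(r["duration_s"] for r in runs) / n,
    }

runs = [
    {"cost": 10.0, "retried": False, "duration_s": 120},
    {"cost": 14.0, "retried": True, "duration_s": 180},
]
summary = object_summary(runs)
```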

Optimization Snapshot & Insights

  • Snapshot cards highlighted:

    • Idle cluster costs

    • Queries over budget

    • Opportunities to scale down or right-size resources

  • Users could export these insights as part of weekly or monthly reports
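
An idle-cluster finding, for example, is just idle hours times hourly rate, sorted so the snapshot cards lead with the biggest win. The field names and flat-rate pricing model below are assumptions for illustration.

```python
def optimization_snapshot(clusters):
    """Estimate waste from idle clusters and rank findings by savings
    opportunity, highest first."""
    findings = []
    for c in clusters:
        idle_cost = c["idle_hours"] * c["hourly_rate"]
        if idle_cost > 0:
            findings.append({"cluster": c["name"], "idle_cost": round(idle_cost, 2)})
    return sorted(findings, key=lambda f: f["idle_cost"], reverse=True)

clusters = [
    {"name": "dev-small", "idle_hours": 40, "hourly_rate": 0.5},
    {"name": "prod-xl", "idle_hours": 6, "hourly_rate": 8.0},
]
snapshot = optimization_snapshot(clusters)
```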

Quantitative Research

To ground design decisions in measurable signals, we crafted focused survey questions that could be analyzed quantitatively. Below are six quantitative research questions with hypothetical responses from a survey of potential users, each accompanied by an observation on user needs and preferences.

How frequently do you monitor your platform cost analytics?

Answer Options:

  • Daily

  • Weekly

  • Monthly

  • Only during audits

  • Rarely

Hypothetical Responses:

  • Daily: 18%

  • Weekly: 46%

  • Monthly: 25%

  • During audits: 8%

  • Rarely: 3%

📝 Observation:
Most users access dashboards weekly, suggesting that default views should prioritize weekly time increments and make it easy to scan changes over that period. Daily or real-time views should remain secondary for advanced users.

Which type of cost breakdown do you use most often?

Answer Options (multi-select):

  • By Platform

  • By Tags (e.g., AI/ML, DevOps)

  • By Workspace

  • By Business Unit / Org

  • By Account

  • I don’t use breakdowns

Hypothetical Responses:

  • Platform: 58%

  • Tags: 44%

  • Workspace: 36%

  • Business Unit: 51%

  • Account: 22%

  • No breakdown: 6%

📝 Observation:
Platform and Business Unit filters are essential, but strong interest in tag-level filtering validates our chargeback-focused design. Workspace-level breakdowns are useful for developers and should be included in object-level detail views.

How do you typically investigate a cost spike or anomaly?

Answer Options:

  • Use trend graphs

  • Filter by platform/org/unit

  • Drill into object detail pages

  • Export CSV and analyze externally

  • Ask DevOps for help

  • Ignore unless flagged

Hypothetical Responses:

  • Trend graphs: 42%

  • Filter platform/org: 34%

  • Object detail: 15%

  • Export CSV: 7%

  • Ask DevOps: 1%

  • Ignore: 1%

📝 Observation:
Users rely heavily on graphs and filters. This validated our decision to place visuals first, followed by contextual filters and drill-downs. It also confirmed that Change History and Object Detail Pages should only be a click away.

What is your preferred view for analyzing data?

Answer Options:

  • Graph view

  • Table view

  • Toggle between both

  • Neither – I prefer exporting

  • Doesn’t matter

Hypothetical Responses:

  • Graph view: 28%

  • Table view: 24%

  • Toggle both: 43%

  • Export: 4%

  • Doesn’t matter: 1%

📝 Observation:
Users want control over their data view, with the majority preferring a toggle between graph and table. This justified building a seamless toggle feature that retains filters and state between views.

How helpful are current platform filters for your analysis needs?

Answer Options:

  • Very helpful

  • Somewhat helpful

  • Neutral

  • Not enough filtering

  • Too many filters – it’s overwhelming

Hypothetical Responses:

  • Very helpful: 20%

  • Somewhat helpful: 38%

  • Neutral: 14%

  • Not enough: 22%

  • Too many: 6%

📝 Observation:
While filters are appreciated, 22% say filtering is still insufficient. This led us to prioritize multi-layer filtering (e.g., time + platform + tag + org) and add global + local filter controls to improve flexibility.
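
Multi-layer filtering composes naturally when each layer narrows the previous result. A minimal sketch, with illustrative record fields and a hypothetical `apply_filters` helper:

```python
def apply_filters(records, **criteria):
    """Apply each filter layer in turn, so time + platform + tag + org
    can be combined freely; each keyword narrows the set further."""
    out = records
    for field, allowed in criteria.items():
        out = [r for r in out if r.get(field) in allowed]
    return out

records = [
    {"platform": "Snowflake", "org": "Research", "tag": "AI/ML"},
    {"platform": "Databricks", "org": "Research", "tag": "AI/ML"},
    {"platform": "Snowflake", "org": "Finance", "tag": "Security"},
]
hits = apply_filters(records, platform={"Snowflake"}, org={"Research"})
```

The same function serves both global filters (applied to every module) and local ones (applied to a single graph), which is the flexibility the survey responses asked for.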

What features would help you take action faster on cost insights?

Answer Options (multi-select):

  • Real-time anomaly alerts

  • Optimization suggestions

  • Health indicators (failures, retries)

  • Predictive trends

  • Slack/email notifications

  • None – I prefer manual analysis

Hypothetical Responses:

  • Real-time alerts: 48%

  • Optimization suggestions: 62%

  • Health indicators: 44%

  • Predictive trends: 38%

  • Notifications: 27%

  • None: 3%

📝 Observation:
Users desire smarter, actionable dashboards, not just visualizations. This insight directly led to the creation of the Health Hub, as well as optimization cards and system alerts tied to usage anomalies.

Pain Points / Impacts / Solutions

Fragmented Platform Visibility

  • Impact:
    Users managing multiple platforms (Snowflake, Databricks) had to switch between tools or export data into spreadsheets to compare costs. This slowed decision-making and introduced errors in analysis.

  • Solution:
    We built a unified multi-platform dashboard with toggles for platform selection, consistent visual patterns, and shared filters. Users could now analyze costs, trends, and breakdowns across all platforms in one place.

Inability to Track Cost Spikes or Optimization Opportunities

  • Impact:
    Admins couldn’t proactively detect cost anomalies or surface savings insights. Spikes often went unnoticed until after budgets were exceeded.

  • Solution:
    We introduced a Health Hub and Optimization Snapshot module that flagged failed jobs, retry spikes, and savings from Databolt. Visual alert cards allowed users to act before problems escalated.

Static Cost Reporting with No Real-Time or Trend Visibility

  • Impact:
    Users relied on outdated exports or weekly syncs to track spend. Static views didn’t reflect dynamic scaling or real-time workload costs.

  • Solution:
    Designed a dynamic Cost Trends module with time controls (daily/weekly/monthly), real-time graphs, and platform/org-based breakdowns. This enabled teams to spot fluctuations early and react with context.

Lack of Tag-Based Filtering and Chargeback Clarity

  • Impact:
    Financial operations teams couldn’t easily segment costs by team, function, or initiative. Manual chargeback reports were time-consuming and inconsistent.

  • Solution:
    Built a Cost Breakdown view with tag filters (e.g., AI/ML, DevOps), business unit selectors, and exportable reports. This empowered users to attribute costs confidently and streamline internal billing.

Too Many or Too Few Filters—Hard to Customize Views

  • Impact:
    Some users felt overwhelmed by filters, while others felt constrained. It was difficult to balance simplicity and flexibility.

  • Solution:
    Introduced global + per-graph time filters, and configurable dashboards where users could add/remove modules. This allowed power users to dive deep while casual users got simplified overviews.

Graphs Didn’t Suit All Analysis Needs

  • Impact:
    Some users preferred tables for exact values, while others relied on visuals for pattern recognition. Having only one view limited usability.

  • Solution:
    Enabled Graph vs. Table toggle across all major views, with synchronized filters and export options. This dual-mode flexibility improved adoption across user types.

Alerts and Their Role in Real-Time Monitoring Dashboards

Context & Purpose

To empower platform admins with immediate, actionable insights, we implemented a real-time alert module embedded directly within the cost dashboard. These alerts proactively surface anomalies and inefficiencies that require urgent attention—helping teams minimize waste and maintain cost control across platforms like Snowflake and Databricks.

Key Features in the Alert Module

Each alert card follows a consistent layout, ensuring clarity and scannability:

  • Categorization Tabs:
    Users can toggle between Cost, Performance, Security, and Compliance alerts—providing focused monitoring.

  • Alert Type & Severity Icon:
    Visual icons (e.g., red triangle, yellow warning) immediately communicate the nature and urgency of the issue.

  • Source Context:
    Alerts clearly display the data source (e.g., SNOWFLAKE_WAREHOUSE_HISTORY, DATABRICKS_WORKSPACE_HISTORY) for traceability.

  • Cost Impact Metrics:
    Large font dollar values show the exact financial impact, paired with % change (red downward arrow) to highlight the deviation severity.

  • Embedded Graph Visualization:
    A mini time-series graph shows when the spike or anomaly occurred—providing instant temporal context without leaving the card.

  • Timestamped Trigger Info:
    Each alert includes a “Triggered X min ago” label for real-time situational awareness.

  • Call to Action:
    A View source link allows the user to jump directly into the raw data or detail page to investigate further.
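
The card layout above maps onto a simple data model. The class name, fields, and rendering helper below are illustrative assumptions about how such a card might be structured, not the product's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AlertCard:
    category: str        # "Cost" | "Performance" | "Security" | "Compliance"
    severity: str        # drives the icon, e.g. "critical" -> red triangle
    source: str          # e.g. "SNOWFLAKE_WAREHOUSE_HISTORY"
    cost_impact: float   # large-font dollar value on the card
    pct_change: float    # deviation from baseline, e.g. 0.35 = 35%
    triggered_at: datetime

    def triggered_label(self, now=None):
        """Render the 'Triggered X min ago' line on the card."""
        now = now or datetime.now(timezone.utc)
        minutes = int((now - self.triggered_at).total_seconds() // 60)
        return f"Triggered {minutes} min ago"

now = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)
card = AlertCard(
    category="Cost",
    severity="critical",
    source="SNOWFLAKE_WAREHOUSE_HISTORY",
    cost_impact=1240.0,
    pct_change=0.35,
    triggered_at=datetime(2024, 1, 1, 12, 18, tzinfo=timezone.utc),
)
label = card.triggered_label(now=now)
```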

Object Detail: Side Panel

In the context of complex cost monitoring and optimization workflows, surface-level dashboards are not enough. Users need to investigate root causes, track historical changes, and act quickly on anomalies—all without losing context. This is where our Side Panel architecture becomes critical.

Cost Trends

Dynamic visualizations of cost behavior over time, including:

  • Platform-to-platform comparisons

  • Cost by workspace, org/unit, account, and type

  • Ability to switch time increments (daily, weekly, monthly)
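
Switching time increments amounts to re-bucketing a day-keyed cost series. A minimal sketch with a hypothetical `rollup` helper; weekly buckets are keyed by the Monday starting each week, monthly buckets by the first of the month.

```python
from datetime import date, timedelta

def rollup(daily_costs, increment):
    """Re-bucket a {date: cost} series into daily/weekly/monthly totals."""
    buckets = {}
    for day, cost in daily_costs.items():
        if increment == "daily":
            key = day
        elif increment == "weekly":
            key = day - timedelta(days=day.weekday())  # back up to Monday
        elif increment == "monthly":
            key = day.replace(day=1)
        else:
            raise ValueError(f"unknown increment: {increment}")
        buckets[key] = buckets.get(key, 0.0) + cost
    return buckets

daily = {date(2024, 3, 4): 10.0, date(2024, 3, 5): 12.0, date(2024, 3, 11): 9.0}
weekly = rollup(daily, "weekly")
```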

Cost Breakdown

Designed for chargeback and cost attribution, this view:

  • Breaks down cost by tags across all platforms

  • Enables cost allocation reporting across cross-functional units

  • Supports filtering by platform, org, and business unit

Health Hub

Proactive cost governance feature that:

  • Summarizes cost spikes, failed jobs, high retry counts, and system health

  • Includes visual alerts on objects needing attention

  • Encourages pre-emptive action before major cost increases

Key Takeaways

Design for Modularity and Flexibility Pays Off

Creating configurable dashboards, graph/table toggles, and per-graph filters empowered different types of users—from executive stakeholders to platform engineers—to tailor the interface to their workflow needs without overwhelming others.

Unifying Multi-Platform Data Unlocks Real Business Value

By consolidating Snowflake, Databricks, and internal platform data into one centralized experience, we helped eliminate tool-switching, reduced data silos, and accelerated decision-making for tenant admins and FinOps teams.

Smarter Visualizations Beat Static Reports

Dynamic trend graphs, real-time breakdowns, and contextual tooltips improved comprehension and trust—turning passive dashboards into interactive tools for action and strategy.

User Feedback Is a Design Compass

Quantitative surveys and qualitative interviews directly informed everything from filter controls to layout hierarchy. Listening to users guided prioritization and helped reduce time-to-adoption post-launch.

Proactive Insights Are Better Than Reactive Reporting

The introduction of the Health Hub, change history panels, and optimization snapshots allowed users to prevent problems rather than diagnose them later—shifting the experience from reactive to predictive.

Good Design is Invisible but Transformative

Consistency in patterns, clarity in visual hierarchy, and precision in UI details (like tooltips and time increments) quietly improved usability and scaled the experience to complex data scenarios—without ever overwhelming users.