As the Databricks AI World Tour arrived in Amsterdam, SGI hosted an exclusive roundtable to help Databricks users and customers tackle one of the biggest challenges in AI today: GenAI Quality and Observability.
We were joined by Angelo Sgambati, a Databricks Champion, Certified Instructor and Resident Solution Architect with Databricks, as well as a Principal Consultant at RevoData. He shared practical strategies for building reliable, high-performing GenAI systems.
Why GenAI Quality Matters
GenAI models can deliver incredible results, but they can also return incorrect information, leading to unhappy or misinformed users. Why does this happen?
- Unpredictable inputs and non-deterministic outputs make quality assessment complex.
- Model selection, tooling, and system architecture directly impact cost, latency, and reliability.
- Agentic systems often perform better but introduce complexity.
Questions Every Brickster Should Ask
When developing LLM-based solutions, it’s important to ask yourself:
- Is the system reliable and behaving as expected?
- Are users satisfied with the outcomes?
- Is the solution cost-effective?
- Are there biases or ethical concerns?
Testing GenAI Like Software
Prompt engineering and vibe checks are not enough. Because its outputs are non-deterministic, GenAI needs continuous evaluation on top of the testing practices used for traditional software:
- Run structured evaluations (unit tests, QA, E2E testing)
- Implement production telemetry
- Monitor and refine metrics to align with human judgment
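As a rough illustration of the first two points, the sketch below runs a small structured evaluation and records latency for each call. Everything in it is an assumption made for the example: `fake_model` is a hypothetical stand-in for a real LLM call, and the substring check is a deliberately simple pass criterion, not a method presented at the roundtable.

```python
import time
from dataclasses import dataclass


@dataclass
class EvalCase:
    question: str
    must_contain: str  # substring a correct answer is expected to include


def fake_model(question: str) -> str:
    """Hypothetical stand-in for a real LLM call (assumption for this sketch)."""
    canned = {
        "What is the capital of the Netherlands?": "The capital is Amsterdam.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned.get(question, "I am not sure.")


def run_eval(model, cases):
    """Call the model on every case and keep one telemetry record per call."""
    records = []
    for case in cases:
        start = time.perf_counter()
        answer = model(case.question)
        latency_s = time.perf_counter() - start
        records.append({
            "question": case.question,
            "answer": answer,
            "passed": case.must_contain.lower() in answer.lower(),
            "latency_s": latency_s,
        })
    return records


def pass_rate(records):
    """Share of cases whose answer met the pass criterion."""
    return sum(r["passed"] for r in records) / len(records)


CASES = [
    EvalCase("What is the capital of the Netherlands?", "Amsterdam"),
    EvalCase("What is 2 + 2?", "4"),
    # The expected behaviour here is to admit uncertainty rather than guess.
    EvalCase("Who won the 2031 World Cup?", "not sure"),
]

records = run_eval(fake_model, CASES)
print(f"pass rate: {pass_rate(records):.0%}")
```

In a real system the same records could be written to whatever logging or tracing store is already in place, so that offline evaluations and production telemetry share one schema.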
LLM-as-a-Judge & Human-in-the-Loop
Can LLMs evaluate themselves? Yes, using LLM-as-a-Judge, where a judge model automatically applies predefined rules to assess responses.
But automated metrics alone aren’t enough. Judge models can be biased and can lack context, so human oversight is essential to ensure accuracy and trust.
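One way to picture how the two fit together is to compare judge verdicts with a handful of human labels and send the disagreements for review. The sketch below is illustrative only: `RUBRIC_PROMPT`, `stub_judge`, and the sample data are hypothetical, and `stub_judge` is a keyword rule standing in for a real judge-model call.

```python
RUBRIC_PROMPT = (
    "You are grading an answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with PASS if the answer is helpful and addresses the question, else FAIL."
)


def stub_judge(prompt: str) -> str:
    """Hypothetical stand-in for a judge LLM; a real one would be an API call."""
    answer_line = next(
        line for line in prompt.splitlines() if line.startswith("Answer: ")
    )
    return "FAIL" if "don't know" in answer_line.lower() else "PASS"


def judge(question: str, answer: str, call_judge=stub_judge) -> bool:
    """Fill in the rubric, ask the judge, and map its verdict to a boolean."""
    prompt = RUBRIC_PROMPT.format(question=question, answer=answer)
    return call_judge(prompt).strip().upper() == "PASS"


def agreement(judge_labels, human_labels) -> float:
    """Fraction of samples where the judge and the human reviewer agree."""
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(human_labels)


# Hypothetical samples with a human reviewer's verdict attached.
samples = [
    {"question": "How do I reset my password?",
     "answer": "Use the 'Forgot password' link.", "human_pass": True},
    {"question": "What is your refund policy?",
     "answer": "I don't know.", "human_pass": False},
    {"question": "Where is my order?",
     "answer": "Orders ship when they ship.", "human_pass": False},
]

judge_labels = [judge(s["question"], s["answer"]) for s in samples]
human_labels = [s["human_pass"] for s in samples]

# Disagreements are the cases worth routing back to a human reviewer.
disagreements = [s for s, j in zip(samples, judge_labels) if j != s["human_pass"]]

print(f"judge/human agreement: {agreement(judge_labels, human_labels):.0%}")
print(f"cases to review: {len(disagreements)}")
```

A low agreement rate suggests the rubric or the judge needs refining before its scores are trusted.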
Driving Development Through Evaluation
Start small:
- Use 100-example evaluations to track progress
- Monitor workflow changes and refine metrics that fail to reflect real-world quality
- Apply error analysis to identify and prioritise fixes
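A minimal sketch of that error-analysis step, assuming each evaluated example has already been labelled with a pass/fail flag and an error category. The records and category names below are hypothetical, and ten rows stand in for a set of around 100.

```python
from collections import Counter

# Hypothetical results from one evaluation run (assumed labels, not real data).
results = [
    {"id": 1, "passed": True, "error_type": None},
    {"id": 2, "passed": False, "error_type": "hallucination"},
    {"id": 3, "passed": False, "error_type": "retrieval_miss"},
    {"id": 4, "passed": True, "error_type": None},
    {"id": 5, "passed": False, "error_type": "hallucination"},
    {"id": 6, "passed": False, "error_type": "formatting"},
    {"id": 7, "passed": True, "error_type": None},
    {"id": 8, "passed": False, "error_type": "retrieval_miss"},
    {"id": 9, "passed": False, "error_type": "hallucination"},
    {"id": 10, "passed": True, "error_type": None},
]


def summarise(results):
    """Return the pass rate and failure categories ordered by frequency."""
    rate = sum(r["passed"] for r in results) / len(results)
    failures = Counter(r["error_type"] for r in results if not r["passed"])
    return rate, failures.most_common()


rate, ranked_failures = summarise(results)
print(f"pass rate: {rate:.0%}")
for error_type, count in ranked_failures:
    print(f"{error_type}: {count}")
```

Re-running the same summary after each workflow change shows whether the most frequent failure category is actually shrinking.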
Your Next Step
Building GenAI solutions that users trust requires quality, observability, and the right talent.
If you’re looking to hire Databricks specialists in the Netherlands or want to join our next event, connect with Jordan Wright or Ross Paterson.
