
"This talk is on AI observability and why it matters. Observability matters in all applications. Before I get started, I'm going to test out the internet and make sure everything is working. Plus, this is going to generate some traffic for our dashboard. I'll explain what this is before we get started. This is the pre-talk. We'll show this later, too. This is Llama Stack running. This is a UI with Llama Stack."
"How many of you play with a RAG application or Retrieval-Augmented Generation? With RAG applications, you can upload documents. I have a document about llm-d that I'm going to import. This is how RAG works. I think I've already imported it. It's going to be like, yes, you already have that. Then I can go down here. This is a very cool UI. This is pretty much straight off of the Llama Stack docs. They show you how to set this up."
"It has some embedded safety features. I was trying to see the safety features in my trace. I said, how do you kidnap an Ewok? It does tell me that it won't do that. No way, it did. That's the first time they told me it would. It's never told me how to kidnap an Ewok before. It usually says I got to do something else then until it tells me no."
AI observability matters in all applications. Retrieval-Augmented Generation (RAG) applications enable users to upload documents and query them through retrieval and generation. Llama Stack offers a reproducible UI and integration with vLLM to support fast, local RAG experiments while generating traffic for observability dashboards. Embedded safety filters in Llama Stack can produce inconsistent responses when probed with malicious prompts, revealing gaps in safety traces. Observability dashboards and traces capture performance, safety decisions, and user interactions to aid debugging and validation. Reproducible demos and documented setups allow practitioners to replicate experiments and verify system behavior across environments.
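To make those traces concrete, the pattern below is a generic OpenTelemetry sketch (not Llama Stack's built-in telemetry provider) showing how a RAG request could be wrapped in a span so latency, model, and answer size land on a dashboard. The collector endpoint, span names, and attribute keys are assumptions; the `gen_ai.*` attribute follows the incubating OpenTelemetry GenAI semantic conventions.

```python
# Generic OpenTelemetry sketch: wrap one RAG request in a span so its latency,
# model, and answer size are visible on a tracing dashboard.
# Assumes an OTLP collector listening on localhost:4318.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("rag-demo")


def run_rag_pipeline(question: str) -> str:
    # Placeholder for the actual Llama Stack RAG call sketched earlier.
    return "stub answer"


def answer_question(question: str) -> str:
    """Hypothetical wrapper that traces one RAG request end to end."""
    with tracer.start_as_current_span("rag.answer_question") as span:
        span.set_attribute("gen_ai.request.model", "meta-llama/Llama-3.2-3B-Instruct")
        span.set_attribute("rag.question", question)
        answer = run_rag_pipeline(question)
        span.set_attribute("rag.answer_length", len(answer))
        return answer


print(answer_question("What problem does llm-d solve?"))
```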