How Dropbox Built a Scalable Context Engine for Enterprise Knowledge Search

"Dropbox engineers have detailed how they built the context engine behind Dropbox Dash, demonstrating a shift towards index-based retrieval, knowledge-graph-derived context, and continuous evaluation to support enterprise AI knowledge retrieval at scale. The design points to a broader pattern emerging across enterprise assistants, in which teams deliberately constrain live tool usage and rely more heavily on pre-processed, permission-aware context to reduce latency, improve quality, and ease token pressure."
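
To make "pre-processed, permission-aware context" concrete, here is a minimal Python sketch. It is not Dropbox's implementation. It assumes every indexed chunk already carries a relevance score, the set of principals allowed to read its source document, and a token count measured at indexing time. `Chunk`, `build_context`, and the principal names are hypothetical. At query time the sketch drops chunks the caller cannot read before anything reaches the model, then greedily packs the highest-scoring survivors into a fixed token budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical shape of a pre-processed chunk; every field is filled at indexing time.
    doc_id: str
    text: str
    score: float
    allowed: frozenset  # principals (users or groups) permitted to read the source document
    tokens: int


def build_context(chunks, principals, token_budget):
    """Return the best chunks the caller may read without exceeding token_budget."""
    # Permission check first, so unreadable content never enters the prompt.
    visible = [c for c in chunks if c.allowed & principals]
    visible.sort(key=lambda c: c.score, reverse=True)
    selected, used = [], 0
    for c in visible:
        if used + c.tokens > token_budget:
            continue  # too large for the remaining budget; a smaller chunk may still fit
        selected.append(c)
        used += c.tokens
    return selected


chunks = [
    Chunk("roadmap", "Q3 roadmap ...", 0.92, frozenset({"eng"}), 400),
    Chunk("comp-bands", "Compensation bands ...", 0.95, frozenset({"hr"}), 300),
    Chunk("onboarding", "Onboarding guide ...", 0.80, frozenset({"all"}), 500),
    Chunk("faq", "Team FAQ ...", 0.60, frozenset({"all"}), 100),
]
context = build_context(chunks, {"alice", "eng", "all"}, token_budget=600)
print([c.doc_id for c in context])  # ['roadmap', 'faq']
```

The highest-scoring chunk, `comp-bands`, is excluded because the caller is not in `hr`, and `onboarding` is skipped because it would overflow the budget. Greedy packing is a simplification chosen for brevity.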

"As part of a recent engineering talk, Dropbox VP of Engineering Josh Clemm described Dash as a response to enterprise work being distributed across dozens of SaaS applications, each with its own distinct APIs, permission structures, and rate limits. Clemm said that although the latest language models can reason, they lack direct access to an enterprise's data for context, so additional infrastructure is needed to retrieve potentially sensitive information safely."
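
One way to picture the infrastructure this implies is a set of per-connector normalizers that map each application's payload, including its own permission model, onto one shared document schema before indexing. The sketch below is hypothetical throughout: the `wiki` and `chat` payload shapes, the `Document` fields, and the `user:`/`everyone` principal convention are assumptions for illustration, not the APIs of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical common schema that every connector is normalized into.
    source: str
    doc_id: str
    title: str
    body: str
    allowed: frozenset  # principals in one shared naming convention


def from_wiki(page: dict) -> Document:
    # Assumed wiki payload: an explicit viewer list plus an optional "public" flag.
    allowed = {f"user:{u}" for u in page.get("viewers", [])}
    if page.get("public"):
        allowed.add("everyone")
    return Document("wiki", f"wiki:{page['id']}", page["title"], page["content"], frozenset(allowed))


def from_chat(msg: dict) -> Document:
    # Assumed chat payload: readers are whoever belongs to the channel.
    allowed = frozenset(f"user:{u}" for u in msg["channel_members"])
    doc_id = f"chat:{msg['channel']}:{msg['ts']}"
    return Document("chat", doc_id, f"#{msg['channel']}", msg["text"], allowed)


NORMALIZERS = {"wiki": from_wiki, "chat": from_chat}


def normalize(source: str, payload: dict) -> Document:
    return NORMALIZERS[source](payload)


docs = [
    normalize("wiki", {"id": 7, "title": "Onboarding", "content": "Welcome ...",
                       "viewers": ["alice"], "public": True}),
    normalize("chat", {"channel": "launch", "ts": "1700000000.1",
                       "text": "Ship it Friday", "channel_members": ["alice", "bob"]}),
]
for d in docs:
    print(d.doc_id, sorted(d.allowed))
```

Normalizing permissions up front means later stages can check access with a single set intersection, whichever application a document came from.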

"The architecture at the center of Dash relies on pre-processing content rather than on retrieval at inference time. Data from the connected knowledge applications is normalized, enriched, and indexed ahead of any query, using a mix of lexical search and dense vectors. This allows the application to return results without creating a spiderweb of API calls at query time. This method does incur higher…"
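
The split between indexing work and query-time work can be sketched as a small hybrid index in Python. This is an illustration under stated assumptions, not Dash's code: BM25 stands in for the lexical side, the three-dimensional vectors stand in for embeddings from some model, and the two rankings are merged with reciprocal rank fusion (RRF), a common fusion method that the source does not attribute to Dropbox. `HybridIndex`, its parameters, and the sample data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class HybridIndex:
    def __init__(self, docs, vectors, k1=1.2, b=0.75):
        # Indexing time: term statistics and dense vectors are computed once, before any query.
        self.ids = list(docs)
        self.tf = {d: Counter(tokenize(docs[d])) for d in self.ids}
        self.length = {d: sum(self.tf[d].values()) for d in self.ids}
        self.avgdl = sum(self.length.values()) / len(self.ids)
        self.df = Counter(t for d in self.ids for t in self.tf[d])  # documents containing each term
        self.vectors = vectors
        self.k1, self.b = k1, b

    def bm25(self, query, d):
        n, score = len(self.ids), 0.0
        for t in tokenize(query):
            f = self.tf[d][t]
            if f == 0:
                continue
            idf = math.log((n - self.df[t] + 0.5) / (self.df[t] + 0.5) + 1)
            norm = f + self.k1 * (1 - self.b + self.b * self.length[d] / self.avgdl)
            score += idf * f * (self.k1 + 1) / norm
        return score

    @staticmethod
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    def search(self, query, query_vec, top_k=3, rrf_k=60):
        # Query time: only local lookups against the prebuilt index, no calls to source apps.
        lex = {d: self.bm25(query, d) for d in self.ids}
        lexical = sorted((d for d in self.ids if lex[d] > 0), key=lex.get, reverse=True)
        dense = sorted(self.ids, key=lambda d: self.cosine(query_vec, self.vectors[d]), reverse=True)
        fused = Counter()
        for ranking in (lexical, dense):
            for rank, d in enumerate(ranking, start=1):
                fused[d] += 1 / (rrf_k + rank)  # reciprocal rank fusion
        return [d for d, _ in fused.most_common(top_k)]


docs = {
    "launch-plan": "dash search launch plan",
    "release-notes": "release notes for the new assistant",
    "pto": "holiday pto policy",
}
vectors = {  # stand-ins for embeddings produced at indexing time
    "launch-plan": [0.6, 0.4, 0.0],
    "release-notes": [0.9, 0.1, 0.0],
    "pto": [0.0, 0.1, 0.9],
}
index = HybridIndex(docs, vectors)
print(index.search("search launch", [0.9, 0.2, 0.0], top_k=2))  # ['launch-plan', 'release-notes']
```

`release-notes` shares no words with the query, so only the dense side surfaces it, while `launch-plan` comes first because both signals rank it highly.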

Dropbox built the context engine for Dash by shifting to index-based retrieval, knowledge-graph-derived context, and continuous evaluation to support enterprise AI knowledge retrieval at scale. Rather than retrieving from source applications at inference time, the architecture pre-processes content: data from connected knowledge applications is normalized, enriched, and indexed ahead of queries, using a mix of lexical search and dense vectors, so that a query does not fan out into numerous runtime API calls. Knowledge graphs model relationships across people, documents, and meetings, and knowledge bundles derived from them are used for retrieval. Preprocessing adds complexity and storage costs, but it enables offline ranking experiments, better relevance signals, predictable query-time performance, and safer, permission-aware access to enterprise data.
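
The source does not spell out what a knowledge bundle contains, so the following Python sketch makes an explicit assumption: a bundle is the set of people, documents, and meetings within a fixed number of hops of an entity, grouped by type and computed offline so it can be fetched as a unit at query time. `KnowledgeGraph`, `bundle`, and the sample entities are hypothetical.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    def __init__(self):
        self.kind = {}                 # node id -> "person" | "document" | "meeting"
        self.edges = defaultdict(set)  # undirected adjacency

    def add_node(self, node_id, kind):
        self.kind[node_id] = kind

    def link(self, a, b):
        # A relationship such as "attended", "authored", or "discussed in".
        self.edges[a].add(b)
        self.edges[b].add(a)

    def bundle(self, root, max_hops=2):
        """Breadth-first search: entities within max_hops of root, grouped by kind, nearest first."""
        hops = {root: 0}
        queue = deque([root])
        while queue:
            node = queue.popleft()
            if hops[node] == max_hops:
                continue
            for nxt in sorted(self.edges[node]):
                if nxt not in hops:
                    hops[nxt] = hops[node] + 1
                    queue.append(nxt)
        grouped = defaultdict(list)
        for node, _ in sorted(hops.items(), key=lambda kv: (kv[1], kv[0])):
            if node != root:
                grouped[self.kind[node]].append(node)
        return dict(grouped)


kg = KnowledgeGraph()
for node, kind in [("alice", "person"), ("bob", "person"), ("kickoff", "meeting"),
                   ("design-doc", "document"), ("budget", "document")]:
    kg.add_node(node, kind)
kg.link("alice", "kickoff")
kg.link("bob", "kickoff")
kg.link("kickoff", "design-doc")
kg.link("bob", "budget")

# Offline step: precompute a bundle per entity so query time is a single lookup.
bundles = {node: kg.bundle(node) for node in kg.kind}
print(bundles["alice"])  # {'meeting': ['kickoff'], 'person': ['bob'], 'document': ['design-doc']}
```

Precomputing bundles trades storage for query-time speed, the same trade-off the summary describes for preprocessing in general.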

#enterprise-ai #index-based-retrieval #knowledge-graphs #preprocessing

Read at InfoQ
