WordSmith, an AI assistant tailored for in-house legal teams, has integrated LangSmith across its product lifecycle, according to the LangChain Blog. The integration spans prototyping, debugging, and evaluation, and is credited with significantly improving the performance and reliability of WordSmith’s LLM-powered features.
Prototyping & Development: Wrangling Complexity
WordSmith initially implemented a configurable Retrieval-Augmented Generation (RAG) pipeline for Slack, which has since evolved to support complex multi-stage inferences across various data sources. The AI assistant now processes Slack messages, Zendesk tickets, pull requests, and legal documents, optimizing for cost and latency using LLMs from OpenAI, Anthropic, Google, and Mistral.
LangSmith’s hierarchical tracing feature has been instrumental in this evolution. It shows exactly what the LLM receives and produces at each step, allowing engineers to iterate quickly and confidently. This has proven more efficient than relying solely on CloudWatch logs for debugging.
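To make the idea of hierarchical tracing concrete, here is a minimal sketch in plain Python. It does not use LangSmith's API and is not WordSmith's implementation; the `Tracer`, `Span`, and `render` names are hypothetical, and the "generate" step is a hard-coded stand-in for an LLM call. It only illustrates how nested steps can record their inputs and outputs in a tree that an engineer can inspect.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class Span:
    """One node in the trace tree: a named step with its inputs and outputs."""
    name: str
    inputs: Dict[str, Any]
    outputs: Dict[str, Any] = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)
    duration_s: float = 0.0


class Tracer:
    """Keeps a stack of open spans so nested steps attach to their parent."""

    def __init__(self) -> None:
        self.roots: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str, **inputs: Any) -> Iterator[Span]:
        node = Span(name=name, inputs=inputs)
        # Attach to the innermost open span, or register as a new root.
        if self._stack:
            self._stack[-1].children.append(node)
        else:
            self.roots.append(node)
        self._stack.append(node)
        start = time.perf_counter()
        try:
            yield node
        finally:
            node.duration_s = time.perf_counter() - start
            self._stack.pop()


def render(span: Span, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines, one per step."""
    lines = [f"{'  ' * depth}{span.name} in={span.inputs} out={span.outputs}"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


tracer = Tracer()
with tracer.span("answer_question", question="What is the notice period?") as root:
    with tracer.span("retrieve", query="notice period") as retrieve:
        retrieve.outputs["docs"] = ["clause-12"]  # stand-in for a retriever
    with tracer.span("generate", docs=["clause-12"]) as generate:
        generate.outputs["answer"] = "30 days"  # stand-in for an LLM call
    root.outputs["answer"] = generate.outputs["answer"]

print("\n".join(render(tracer.roots[0])))
```

Each multi-stage request produces one tree, so a wrong answer can be traced back to the step whose output first went wrong rather than reconstructed from flat log lines.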
Performance Measurement: Establishing Baselines
WordSmith employs LangSmith to create static evaluation sets for various tasks, including RAG, agentic workloads, attribute extractions, and XML-based changeset targeting. These evaluation sets offer several key benefits:
- They clarify the requirements for each feature by setting explicit expectations for the LLM's output.
- They enable rapid iteration and confident deployment of new models, such as when comparing Claude 3.5 to GPT-4.
- They optimize cost and latency while maintaining accuracy, reducing costs on specific tasks by up to 10x.
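The benefits listed above can be sketched as a small harness: a fixed set of examples, several candidate models scored against it, and a rule that picks the cheapest candidate that still meets an accuracy bar. Everything here is an assumption made for illustration. The three "models" are keyword functions standing in for real LLM calls, the cost figures are invented units rather than real pricing, and none of the names come from LangSmith or WordSmith.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Example:
    """One fixed input with the output the feature is expected to produce."""
    text: str
    expected: str


@dataclass(frozen=True)
class Candidate:
    name: str
    predict: Callable[[str], str]
    cost_per_call: float  # invented units for illustration, not real pricing


def evaluate(candidate: Candidate, dataset: List[Example]) -> Dict[str, float]:
    """Score one candidate on exact-match accuracy and total cost."""
    correct = sum(candidate.predict(ex.text) == ex.expected for ex in dataset)
    return {
        "accuracy": correct / len(dataset),
        "total_cost": candidate.cost_per_call * len(dataset),
    }


def cheapest_meeting_bar(
    candidates: List[Candidate], dataset: List[Example], min_accuracy: float
) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose accuracy meets the bar, if any."""
    scored = [(c, evaluate(c, dataset)) for c in candidates]
    passing = [(c, s) for c, s in scored if s["accuracy"] >= min_accuracy]
    if not passing:
        return None
    return min(passing, key=lambda pair: pair[1]["total_cost"])[0]


# Stand-ins for LLM calls: each maps a document snippet to a document type.
def large_model(text: str) -> str:
    lowered = text.lower()
    if "confidential" in lowered:
        return "NDA"
    if "license" in lowered:
        return "License"
    return "Other"


def small_model(text: str) -> str:
    lowered = text.lower()
    if "confidential" in lowered:
        return "NDA"
    return "License" if "licens" in lowered else "Other"


def tiny_model(text: str) -> str:
    return "Other"


DATASET = [
    Example("Each party shall keep Confidential Information secret.", "NDA"),
    Example("Licensor grants a non-exclusive license.", "License"),
    Example("Lunch menu for Friday.", "Other"),
    Example("This confidential disclosure is mutual.", "NDA"),
]

CANDIDATES = [
    Candidate("large", large_model, cost_per_call=10.0),
    Candidate("small", small_model, cost_per_call=1.0),
    Candidate("tiny", tiny_model, cost_per_call=0.1),
]

choice = cheapest_meeting_bar(CANDIDATES, DATASET, min_accuracy=1.0)
print(choice.name if choice else "no candidate meets the bar")
```

Because the dataset is static, the same comparison can be rerun whenever a new model is released, and a cheaper model is adopted only if it clears the same bar as the one it replaces.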
Operational Monitoring: Rapid Debugging
LangSmith’s visibility features also make it a core part of WordSmith’s online monitoring suite. Production errors can be linked directly to LangSmith traces, reducing debugging time from minutes to seconds. LangSmith’s indexed queries make it easy to isolate production errors related to inference issues, streamlining the debugging process.
WordSmith uses Statsig for feature flagging and experiment exposure, mapping each exposure to a corresponding LangSmith tag. Because every trace carries the tag of its experiment group, runs can be filtered and compared across groups directly.
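The exposure-to-tag mapping can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the Statsig or LangSmith API: the `exp:<experiment>:<group>` tag format, the `RunRecord` type, the experiment name, and the latency figures are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


def exposure_tag(experiment: str, group: str) -> str:
    """Build a tag for one experiment exposure (the format is an assumption)."""
    return f"exp:{experiment}:{group}"


@dataclass
class RunRecord:
    """A logged inference run, reduced to the fields this sketch needs."""
    run_id: str
    latency_ms: float
    tags: List[str] = field(default_factory=list)


def mean_latency_by_group(runs: List[RunRecord], experiment: str) -> Dict[str, float]:
    """Group runs by their tag for `experiment` and average latency per group."""
    prefix = f"exp:{experiment}:"
    buckets: Dict[str, List[float]] = defaultdict(list)
    for run in runs:
        for tag in run.tags:
            if tag.startswith(prefix):
                buckets[tag[len(prefix):]].append(run.latency_ms)
    return {group: mean(values) for group, values in buckets.items()}


# Made-up runs: each is tagged with the group its request was exposed to.
runs = [
    RunRecord("r1", 800.0, [exposure_tag("reranker_v2", "control")]),
    RunRecord("r2", 1000.0, [exposure_tag("reranker_v2", "control")]),
    RunRecord("r3", 500.0, [exposure_tag("reranker_v2", "treatment")]),
    RunRecord("r4", 700.0, [exposure_tag("reranker_v2", "treatment")]),
    RunRecord("r5", 300.0, []),  # not part of any experiment
]

print(mean_latency_by_group(runs, "reranker_v2"))
```

Deriving the tag from the exposure in one place keeps the flagging system and the trace store consistent, so the comparison step only needs to filter on a tag prefix.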
Future Plans: Customer-Specific Optimization
Looking ahead, WordSmith plans to integrate LangSmith further into its product lifecycle to tackle complex optimization challenges. The company aims to optimize hyperparameters for each customer and use case, creating online datasets that adjust automatically to each customer's query patterns and data.
This forward-thinking approach could lead to a highly personalized and efficient RAG experience for each customer, setting a new standard in legal AI operations.