Generative AI Examples v1.3 Release Notes

Released by @ftian1 on 14 May 05:10 · 49 commits to main since this release

OPEA Release Notes v1.3

We are excited to announce the release of OPEA version 1.3, which includes significant contributions from the open-source community. This release incorporates over 520 pull requests.

More information about how to get started with OPEA v1.3 can be found on the Getting Started page. All project source code is maintained in the opea-project organization. To pull Docker images, please access the Docker Hub. For instructions on deploying Helm Charts, please refer to the guide.

What's New in OPEA v1.3

This release introduces exciting capabilities, optimizations, and user-centric enhancements:

Advanced Agent Capabilities

  • Multi-Turn Conversation: Enhanced the OPEA agent framework for dynamic, context-aware dialogues. (GenAIComps#1248)
  • Finance Agent Example: A financial agent example for automating financial data aggregation and leveraging LLMs to generate insights, forecasts, and strategic recommendations. (GenAIExamples#1539)

Performance and Scalability

  • vLLM Enhancement: Integrated vLLM as the default LLM serving backend for key GenAI examples across Intel® Xeon® processors, Intel® Gaudi® accelerators, and AMD® GPUs. (GenAIExamples#1436) See the sketch after this list.
  • KubeAI Operator for OPEA (Alpha release): Simplified OPEA inference operations in cloud environments and enabled optimal out-of-the-box performance for specific models and hardware using profiles. (GenAIInfra#945)
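
Because vLLM serves an OpenAI-compatible API, a client call against an example's LLM endpoint typically looks like the minimal sketch below; the endpoint URL, port, and model name are illustrative assumptions rather than values fixed by the OPEA examples.

```python
# Minimal sketch: querying a vLLM OpenAI-compatible endpoint.
# The base_url, port, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed vLLM serving endpoint
    api_key="EMPTY",  # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # example model from these notes
    messages=[{"role": "user", "content": "Summarize OPEA in one sentence."}],
)
print(response.choices[0].message.content)
```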

Ecosystem Integrations

  • Haystack Integration: Enabled OPEA as a backend of Haystack. (Haystack-OPEA#1)
  • Cloud Readiness: Expanded automated Terraform deployment for ChatQnA to include support for Azure, and enabled CodeGen deployments on AWS and GCP. (GenAIExamples#1731)

New GenAI Capabilities

  • OPEA Store: Delivered a unified data store access API and a robust integration layer that streamlines adding new data stores. ArangoDB was integrated. (GenAIComps#1493)
  • CodeGen using RAG and Agent: Leveraged RAG and code agent to provide an additional layer of intelligence and adaptability for CodeGen example. (GenAIExamples#1757)
  • Enhanced Multimodality: Added support for additional audio file types (.mp3) and supported spoken audio captions with image ingestion. (GenAIExamples#1549)
  • Struct to Graph: Supported transforming structured data to graphs using Neo4j graph database. (GenAIComps#1502)
  • Text to Graph: Supported creating graphs from text by extracting graph triplets; see the sketch after this list. (GenAIComps#1357, GenAIComps#1472)
  • Text to Cypher: Supported generating and executing Cypher queries from natural language for graph database retrieval. (GenAIComps#1319)
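
To make the graph capabilities above concrete, here is a minimal sketch of loading extracted (subject, relation, object) triplets into Neo4j; the triplets, connection URI, credentials, and Entity label are illustrative assumptions, not the schema used by the OPEA graph microservices.

```python
# Minimal sketch: writing extracted graph triplets into Neo4j.
from neo4j import GraphDatabase

# Triplets as a text2graph-style extractor might emit them (assumed data).
triplets = [
    ("OPEA", "INTEGRATES", "Neo4j"),
    ("Text2Graph", "EXTRACTS", "Triplets"),
]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    for subj, rel, obj in triplets:
        # Relationship types cannot be query parameters in Cypher,
        # so the (already validated) type is interpolated directly.
        # MERGE is idempotent: nodes and edges are created only if absent.
        session.run(
            f"MERGE (a:Entity {{name: $subj}}) "
            f"MERGE (b:Entity {{name: $obj}}) "
            f"MERGE (a)-[:{rel}]->(b)",
            subj=subj,
            obj=obj,
        )
driver.close()
```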

Enhanced Evaluation

  • Enhanced Long-Context Model Evaluation: Supported evaluating long-context models on Intel® Gaudi® with vLLM. (HELMET#20)
  • TAG-Bench for SQL Agents: Integrated TAG-Bench to evaluate complex SQL query generation. (GenAIEval#230)
  • DocSum Support: GenAIEval now supports evaluating the performance of DocSum. (GenAIEval#252)
  • Toxicity Detection Evaluation: Introduced a workflow to evaluate LLMs' capability to detect toxic language. (GenAIEval#241)
  • Model Card: Added a model card generator for generating reports containing model performance and fairness metrics. (GenAIEval#236)

Observability

  • OpenTelemetry Tracing: Leveraged OpenTelemetry to enable tracing for ChatQnA and AgentQnA along with TGI and TEI. (GenAIExamples#1542) See the sketch after this list.
  • Application dashboards: Added Helm-installed application E2E performance dashboards. (GenAIInfra#800)
  • E2E (end-to-end) metric improvements: E2E metrics are now summed for applications that use multiple megaservice instances, with tests and fixes for the E2E metrics. (GenAIComps#1301, GenAIComps#1343)
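
For reference, a minimal OpenTelemetry tracing setup in Python looks like the sketch below; the OTLP endpoint, service name, and span name are assumptions for illustration rather than the exact configuration used by the OPEA examples.

```python
# Minimal sketch: emitting a trace span over OTLP/HTTP.
# Endpoint and service name are illustrative assumptions.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "chatqna-demo"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("rag-request"):
    # ... call embedding, retrieval, and LLM services here ...
    pass
```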

Better User Experience

  • GenAIStudio: Supported drag-and-drop creation of agentic applications. (GenAIStudio#50)
  • Documentation Refinement: Refined READMEs for key examples to help readers easily locate documentation tailored to deployment, customization, and hardware. (GenAIExamples#1741)
  • Optimized Dockerfiles: Simplified application Dockerfiles for faster image builds. (GenAIExamples#1585)

Exploration

  • SQFT: Supported low-precision sparse parameter-efficient fine-tuning on LLMs. (GenAIResearch#1)

Newly Supported Models

OPEA introduced support for the following models in this release.

Model TGI-Gaudi vLLM-CPU vLLM-Gaudi vLLM-ROCm OVMS Optimum-Habana PredictionGuard
deepseek-ai/DeepSeek-R1-Distill-Llama-8B - -
deepseek-ai/DeepSeek-R1-Distill-Llama-70B - -
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B - -
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B - -
deepseek-ai/DeepSeek-R1-Distill-Qwen-14B - -
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B - -
deepseek-ai/DeepSeek-V3 - - -
Hermes-3-Llama-3.1-8B - - - - -
ibm-granite/granite-3.2-8b-instruct - - - - -
Phi-4-mini x x x x -
Phi-4-multimodal-instruct x x x x -
mistralai/Mistral-Small-24B-Instruct-2501 - - -
mistralai/Mistral-Large-Instruct-2411 x - - -

(✓: supported; -: not validated; x: unsupported)

Newly Supported Hardware

Other Notable Changes

The following lists detail other notable changes:

GenAIExamples
  • Functionalities

    • [AgentQnA] Added web search tool support and simplified the run instructions. (#1656) (e8f2313)
    • [ChatQnA] Added support for the latest DeepSeek models on Gaudi (#1491) (9adf7a6)
    • [EdgeCraftRAG] A sleek new UI based on Vue and Ant Design for enhanced user experience, supporting concurrent multi-requests on vLLM, JSON pipeline configuration, and API-based prompt modification. (#1665) (5a50ae0)
    • [EdgeCraftRAG] Supported multi-card deployment of Intel® Arc™ GPUs for vLLM inference (#1729) (1a0c5f0)
    • [FaqGen] Merged FaqGen into ChatQnA for a unified chatbot experience. (#1654) (6d24c1c)
  • Benchmark

    • [ChatQnA] Provided unified scalable deployment and benchmarking support for examples (#1315) (ed16308)
  • Deployment

  • Bug Fixes

    • [AgentQnA] Fixed errors when running AgentQnA on Xeon with OpenAI and updated the README (#1664) (fecc227)
    • [AudioQnA] Fixed the LLM model field to align inputs (#1611) (2dfcfa0)
  • Documentation

    • Updated README.md for OPEA OTLP tracing (#1406) (4c41a5d)
    • Updated README.md for Agent UI (#1495) (88a8235)
    • Refactored AudioQnA README (#1508) (9f36e84)
    • Added a new section on changing the LLM model (e.g., DeepSeek) based on the validated model table of the LLM microservice (#1501) (970b869)
    • Updated README.md of AIPC quick start (#1578) (852bc70)
    • Added short descriptions to the images OPEA publishes on Docker Hub (#1637) (68747a9)
  • CI/CD/UT

    • Added UT for rerank finetuning on Gaudi (#1472) (5f4b182)
    • Enabled Gaudi 3, ROCm, and Arc in the manual release test. (#1615) (63b789a)
    • Enabled base image build in CI/CD (#1669) (2204fe8)
    • Ran ChatQnA CI with the latest base image and grouped logs in GHA outputs. (#1736) (c48cd65)
GenAIComps
  • Functionalities

    • [agent] Enabled custom prompt for react_llama and react_langgraph (#1391) (558a2f6)
    • [dataprep] Added Multimodal support for Milvus for dataprep component (#1380) (006bd91)
    • [dataprep] New ArangoDB integration (#1558)
    • [dataprep] Added the ability to customize Dataprep-specific input parameters by subclassing the DataprepRequest pydantic model, avoiding the need to introduce parameters unique to a few Dataprep integrations across all Dataprep providers (#1525); see the sketch after this list
    • [retrieval] New ArangoDB integration (#1558)
    • [cores/mega] Added remote endpoint support (#1399) (1871dec)
    • [docsum] Enlarged DocSum prompt buffer (#1471) (772ef6e)
    • [embeddings] Refined CLIP embedding microservice by leveraging the third-party CLIP (#1298) (7727235)
    • [finetuning] Added xtune to finetuning for Intel ARC GPU (#1432) (80ef317)
    • [guardrails] Added native support for toxicity detection guardrail microservice (#1258) (625aec9)
    • [llm/text-generation] Added support for string message in Bedrock textgen (#1291) (364ccad)
    • [ipex] Added native LLM microservice using IPEX (#1337) (d51a136)
    • [lvm] Integrated vLLM to lvm as a backend (#1362) (831c5a3)
    • [lvm] Integrated UI-TARS vLLM in lvm component (#1458) (4a15795)
    • [nebula] Docker deployment support for the Nebula graph database (#1396) (342c1ed)
    • [OVMS] Text generation, Embeddings and Reranking microservices based on OVMS component (#1318) (78b94fc)
    • [retriever/milvus] Added Multimodal support for Milvus for retriever component (#1381) (40d431a)
    • [text2image & image2image] Enriched input parameters of text2image and image2image. (#1339) (42f323f)
    • Refined synchronized I/O in asynchronous functions (#1300) (b08571f)
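
To illustrate the DataprepRequest customization noted above, here is a minimal, self-contained sketch; the base model's fields and the subclass are assumptions for illustration, not the actual GenAIComps definitions.

```python
# Minimal sketch: extending a Dataprep request model via subclassing.
# DataprepRequest's real fields live in GenAIComps; the ones below are assumed.
from typing import Optional
from pydantic import BaseModel

class DataprepRequest(BaseModel):
    """Stand-in for the shared Dataprep request model."""
    chunk_size: int = 512
    chunk_overlap: int = 64

class Neo4jDataprepRequest(DataprepRequest):
    """Hypothetical integration-specific subclass adding unique parameters."""
    ingest_relations: bool = True
    graph_name: Optional[str] = None

# Integration-specific parameters stay local to the subclass instead of
# being added to every Dataprep provider's shared request model.
req = Neo4jDataprepRequest(chunk_size=256, graph_name="docs")
print(req.model_dump())
```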
  • Bug Fixes

    • Fixed DocSum error caused by HuggingFaceEndpoint (#1246) (30e3dea)
    • Fixed TEI embedding and TEI reranking bug (#1256) (fa01f46)
    • Fixed web-retrievers hub client and TEI endpoint issue (#1270) (ecb7f7b)
    • Fixed Dataprep data ingestion issue. (#1271) (b777db7)
    • Fixed metric ID issue when initializing multiple Orchestrator instances (#1280) (f8e6216)
    • Fixed Neo4j dataprep ingest error handling and skip_ingestion argument passing (#1288) (4a90692)
    • Fixed the Milvus retriever issue (#1286) (47f68a4)
    • Fixed Qdrant retriever RAG issue. (#1289) (c3c8497)
    • Fixed agent message format. (#1297) (022d052)
    • Fixed Milvus dataprep file ingestion failure (#1299) (a033c05)
    • Fixed Docker image security issues (#1321) (589587a)
    • Added megaservice / orchestrator metric testing and fixes (#1348) (1064b2b)
    • Fixed finetuning Python regex syntax error (#1446) (380f95c)
    • Upgraded Optimum Habana version to fix security check issue (#1571) (83350aa)
    • Made LlamaGuard compatible with both TGI and vLLM (#1581) (4024302)
  • Documentation

    • GraphRAG README/compose fixes post refactor (#1221) (b38d9f3)
    • Updated docs for LLamaGuard & WildGuard Microservice (#1259) (0df374b)
    • Fixed README errors in the dataprep component for all VectorDBs (#1377) (492f028)
    • Refined the README for llms/doc-summarization (#1437) (559ebb2)
  • CI/CD/UT

    • Refined dataprep test scripts (#1305) (a4f6af1)
GenAIEval
  • Auto Tuner

    • RAG Pilot - A RAG pipeline tuning tool allowing fine-grained control over key aspects of parsing, chunking, postprocessing, and generation, enabling better retrieval and response generation. (#243) (97da8f2)
  • Monitoring

    • Integrated the memory bandwidth exporter to support collecting and reporting memory bandwidth, CPU, and memory metrics. (#218) (df5fd3e)
    • Added a benchmark Docker image to support gathering metrics across microservices and fixed a missing package for benchmarking with the Dockerfile (#249) (dc3409f)
  • Metrics

    • Collected vLLM latency metrics for E2E tests (#244) (1b6a91d)
  • Bug Fixes

    • Fixed relative path issue for possion. (#234) (3b9981a)
    • Added the missing file to the release package (#233) (28ed0db)
    • Fixed the TTFT and TPOT errors when the benchmark target is chatqna_qlist_pubmed (#238) (da04a9f)
    • Fixed the performance benchmark with PubMed (#239) (5c8ab6e)
  • Documentation

    • Added recommendations to the platform optimization documentation (ea086a6)
GenAIInfra
  • HelmChart

    • [TDX] Added Intel TDX support to Helm charts (#799) (040860e)
    • Added a Helm starter chart for developing new charts (#776) (6154b6c)
    • Improved HPA enabling usability (#770) (3016f5f)
    • Added a Helm chart for Ollama (#774) (7d66afb)
    • Helm: Added Qdrant support (#796) (99ccf0c)
    • ChatQnA: Added Qdrant DB support (#813) (5576cfd)
    • Added Helm-installed Grafana dashboards for application metrics (#800) (f46e8c1)
    • Added Bedrock support to LLM TextGen (#811) (da37b9f)
    • CodeGen: Added a RAG pipeline and changed the default UI (#985) (46b1b6b)
    • dataprep/retriever: Supported air-gapped offline environments (#980) (b9b10e9)
  • CSP

    • Added automated provisioning of CosmosDB and App Insights for OPEA applications (#657) (d29bd2d)
  • Bug Fixes

    • Fixed the helm chart release dependency update (#842) (f121edd)
  • CI/CD/UT

GenAIStudio
  • Updated the Studio frontend table UI and the Studio backend according to the dataprep refactor (#32) (1168507)
  • Added GenAI Studio UI improvements (#48) (ad64f7c)
  • Enabled LLM traces for sandbox (#51) (df6b73e)
  • Migrated to internal Kubernetes MySQL and enabled deployment package generation for AgentQnA (#52) (0cddbe0)

Deprecations

Deprecated Examples

The following GenAI examples are deprecated and have been removed as of OPEA v1.3:

| Example | Migration Solution | Reasons for Deprecation |
|---------|--------------------|--------------------------|
| FaqGen | Use the example ChatQnA instead. | Provide users with a unified chatbot experience and reduce redundancy. |

Deprecated Docker Images

The following Docker images are deprecated and are not updated or tagged for the OPEA v1.3 release:

| Deprecated Docker Image | Migration Solution | Reasons for Deprecation |
|-------------------------|--------------------|--------------------------|
| opea/agent-ui | Use opea/agent-openwebui instead. | Open WebUI based UI for better user experience. |
| opea/chathistory-mongo-server | Use opea/chathistory-mongo instead. | Follow the OPEA naming rules. |
| opea/faqgen | Use opea/chatqna or opea/chatqna-without-rerank instead. | FaqGen is deprecated. |
| opea/faqgen-ui | Use opea/chatqna-ui instead. | FaqGen is deprecated. |
| opea/faqgen-react-ui | Use opea/chatqna-ui instead. | FaqGen is deprecated. |
| opea/feedbackmanagement | Use opea/feedbackmanagement-mongo instead. | Follow the OPEA naming rules. |
| opea/promptregistry-mongo-server | Use opea/promptregistry-mongo instead. | Follow the OPEA naming rules. |

The following Docker images are deprecated and will no longer be updated or tagged as of the OPEA v1.4 release:

| Deprecated Docker Image | Migration Solution | Reasons for Deprecation |
|-------------------------|--------------------|--------------------------|
| opea/chathistory-mongo | Use opea/chathistory instead. The Docker image will be released with the latest tag before the v1.4 release. | OPEA introduced OPEA Store to decouple the chat history component from MongoDB. |
| opea/feedbackmanagement-mongo | Use opea/feedbackmanagement instead. The Docker image will be released with the latest tag before the v1.4 release. | OPEA introduced OPEA Store to decouple the feedback management component from MongoDB. |
| opea/promptregistry-mongo | Use opea/promptregistry instead. The Docker image will be released with the latest tag before the v1.4 release. | OPEA introduced OPEA Store to decouple the prompt registry component from MongoDB. |

All OPEA docker images

Deprecated GenAIExample Variables

| Example | Type | Variable | Migration Solution |
|---------|------|----------|--------------------|
| ChatQnA | environment variable | your_hf_api_token | Removed from the Intel AIPC deployment. Use the environment variable HUGGINGFACEHUB_API_TOKEN instead. This change aligns with the standardized naming conventions for environment variables. |
| ChatQnA | environment variable | OLLAMA_HOST | Removed from the Intel AIPC deployment. Instead, customize LLM_SERVER_HOST_IP in ChatQnA/docker_compose/intel/cpu/aipc/compose.yaml. |
| DocIndexRetriever | environment variable | TGI_LLM_ENDPOINT | Removed because it was unused. |
| DocIndexRetriever | environment variable | MEGA_SERVICE_HOST_IP | Removed because it was unused. |
| DocIndexRetriever | environment variable | LLM_SERVICE_HOST_IP | Removed because it was unused. |
| GraphRAG | environment variable | MAX_OUTPUT_TOKENS | Split into two new environment variables: MAX_INPUT_TOKENS (default: 4096) and MAX_TOTAL_TOKENS (default: 8192) to control the maximum token limits. |

Deprecated GenAIComps Parameters

| Component | Parameter | Migration Solution |
|-----------|-----------|--------------------|
| agent | with_store of agent_config in the Assistants APIs | Its functionality is now fully covered by the new memory_type parameter. In v1.3, use "with_memory": true and "memory_type": "persistent" as its replacement. The with_memory parameter in agent_config is now enabled by default (true) to support multi-turn conversations; see the sketch below. Please refer to the guide for more details. |
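
As a hedged illustration of this migration, the before and after agent_config payloads might look as follows; the surrounding Assistants API request structure is an assumption for illustration.

```python
# Minimal sketch of the agent_config migration described above.
# Only the changed keys are shown; the enclosing request structure is assumed.

# Deprecated (pre-v1.3) configuration:
agent_config_old = {"with_store": True}

# v1.3 replacement: multi-turn memory with persistent storage.
agent_config_new = {
    "with_memory": True,         # enabled by default in v1.3
    "memory_type": "persistent",
}
```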

Updated Dependencies

| Dependency | Hardware | Scope | Version | Version in OPEA v1.2 | Comments |
|------------|----------|-------|---------|----------------------|----------|
| gradio | - | all examples | 5.11.0 | 5.5.0 | |
| huggingface/text-generation-inference | AMD GPU | all examples | 2.4.1-rocm | 2.3.1-rocm | |
| huggingface/text-embeddings-inference | all | all examples | cpu-1.6 | cpu-1.5 | |
| langchain, langchain_community | - | llms/doc-summarization, llms/faq-generation | 0.3.14 | 0.3.15 | Avoid bugs in FaqGen and DocSum. |
| optimum-habana | Gaudi | lvms/llama-vision | 1.17.0 | - | |
| pytorch | Gaudi | all components | 2.5.1 | 2.4.0 | |
| transformers | - | lvms/llama-vision | 4.48.0 | 4.45.1 | |
| vllm | Xeon | all supported examples except EdgeCraftRAG | v0.8.3 | - | |
| vllm | Gaudi | all supported examples except EdgeCraftRAG | v0.6.6.post1+Gaudi-1.20.0 | v0.6.4.post2+Gaudi-1.19.0 | |
| vllm | AMD GPU | all supported examples | rocm6.3.1_instinct_vllm0.8.3_20250410 | - | |

Changes to Default Behavior

  • [agent] The default model changed from meta-llama/Meta-Llama-3-8B-Instruct to meta-llama/Llama-3.3-70B-Instruct.

Validated Hardware

  • Intel® Arc™ Graphics GPU (A770)
  • Intel® Gaudi® AI Accelerators (2nd, 3rd)
  • Intel® Xeon® Scalable processors (4th, 5th, 6th)
  • AMD® Instinct™ MI300X Accelerators (CDNA3)

Validated Software

  • AMD® ROCm™ Software v6.3.3
  • Docker 28.0.4
  • Docker Compose v2.34.0
  • Intel® Gaudi® software and drivers v1.20
  • Kubernetes v1.29.15
  • TEI v1.6
  • TGI v2.4.0 (Xeon), v2.3.1 (Gaudi), v2.4.1 (ROCm)
  • Torch v2.5.1
  • Ubuntu 22.04
  • vLLM v0.8.3 (Xeon/ROCm), v0.6.6 (Gaudi)

Known Issues

Full Changelogs

Contributors

This release would not have been possible without the contributions of the following organizations and individuals.

Contributing Organizations

  • Amazon: Ollama deployment, Bedrock integration, OVMS integration and bug fixes.
  • AMD: vLLM enablement on AMD GPUs for key examples, enablement of AMD GPUs on more examples, and AMD OPEA blogs.
  • ArangoDB: OPEA Store and ArangoDB integration.
  • Intel: Development and improvements to GenAI examples, components, infrastructure, and evaluation.
  • Infosys: Azure support and documentation updates.
  • National Chiao Tung University: Documentation updates.
  • Prediction Guard: Maintenance of Prediction Guard components.

Individual Contributors

For a comprehensive list of individual contributors, please refer to the Full Changelogs section.