Generative AI Examples v1.3 Release Notes

Released by @ftian1 on 14 May 05:10 · 49 commits to main since this release

OPEA Release Notes v1.3

We are excited to announce the release of OPEA version 1.3, which includes significant contributions from the open-source community. This release incorporates over 520 pull requests.

More information about how to get started with OPEA v1.3 can be found on the Getting Started page. All project source code is maintained in the opea-project organization. To pull Docker images, please access the Docker Hub. For instructions on deploying Helm Charts, please refer to the guide.

What's New in OPEA v1.3

This release introduces exciting capabilities, optimizations, and user-centric enhancements:

Advanced Agent Capabilities

  • Multi-Turn Conversation: Enhanced the OPEA agent framework for dynamic, context-aware dialogues. (GenAIComps#1248)
  • Finance Agent Example: A financial agent example for automating financial data aggregation and leveraging LLMs to generate insights, forecasts, and strategic recommendations. (GenAIExamples#1539)

Performance and Scalability

  • vLLM Enhancement: Integrated vLLM as the default LLM serving backend for key GenAI examples across Intel® Xeon® processors, Intel® Gaudi® accelerators, and AMD® GPUs. (GenAIExamples#1436) See the sketch after this list.
  • KubeAI Operator for OPEA (Alpha release): Simplified OPEA inference operations in cloud environments and enabled optimal out-of-the-box performance for specific models and hardware using profiles. (GenAIInfra#945)
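
Because vLLM serves an OpenAI-compatible API, a client call against an example's LLM endpoint typically looks like the minimal sketch below; the endpoint URL, port, and model name are illustrative assumptions rather than values fixed by the OPEA examples.

```python
# Minimal sketch: querying a vLLM OpenAI-compatible endpoint.
# The base_url, port, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed vLLM serving endpoint
    api_key="EMPTY",  # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # example model from these notes
    messages=[{"role": "user", "content": "Summarize OPEA in one sentence."}],
)
print(response.choices[0].message.content)
```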

Ecosystem Integrations

  • Haystack Integration: Enabled OPEA as a backend of Haystack. (Haystack-OPEA#1)
  • Cloud Readiness: Expanded automated Terraform deployment for ChatQnA to include support for Azure, and enabled CodeGen deployments on AWS and GCP. (GenAIExamples#1731)

New GenAI Capabilities

  • OPEA Store: Delivered a unified data store access API and a robust integration layer that streamlines adding new data stores. ArangoDB was integrated. (GenAIComps#1493)
  • CodeGen using RAG and Agent: Leveraged RAG and code agent to provide an additional layer of intelligence and adaptability for CodeGen example. (GenAIExamples#1757)
  • Enhanced Multimodality: Added support for additional audio file types (.mp3) and supported spoken audio captions with image ingestion. (GenAIExamples#1549)
  • Struct to Graph: Supported transforming structured data to graphs using Neo4j graph database. (GenAIComps#1502)
  • Text to Graph: Supported creating graphs from text by extracting graph triplets; see the sketch after this list. (GenAIComps#1357, GenAIComps#1472)
  • Text to Cypher: Supported generating and executing Cypher queries from natural language for graph database retrieval. (GenAIComps#1319)
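
To make the graph capabilities above concrete, here is a minimal sketch of loading extracted (subject, relation, object) triplets into Neo4j; the triplets, connection URI, credentials, and Entity label are illustrative assumptions, not the schema used by the OPEA graph microservices.

```python
# Minimal sketch: writing extracted graph triplets into Neo4j.
from neo4j import GraphDatabase

# Triplets as a text2graph-style extractor might emit them (assumed data).
triplets = [
    ("OPEA", "INTEGRATES", "Neo4j"),
    ("Text2Graph", "EXTRACTS", "Triplets"),
]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    for subj, rel, obj in triplets:
        # Relationship types cannot be query parameters in Cypher,
        # so the (already validated) type is interpolated directly.
        # MERGE is idempotent: nodes and edges are created only if absent.
        session.run(
            f"MERGE (a:Entity {{name: $subj}}) "
            f"MERGE (b:Entity {{name: $obj}}) "
            f"MERGE (a)-[:{rel}]->(b)",
            subj=subj,
            obj=obj,
        )
driver.close()
```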

Enhanced Evaluation

  • Enhanced Long-Context Model Evaluation: Supported evaluating long-context models on Intel® Gaudi® with vLLM. (HELMET#20)
  • TAG-Bench for SQL Agents: Integrated TAG-Bench to evaluate complex SQL query generation. (GenAIEval#230)
  • DocSum Support: GenAIEval now supports evaluating the performance of DocSum. (GenAIEval#252)
  • Toxicity Detection Evaluation: Introduced a workflow to evaluate LLMs' capability to detect toxic language. (GenAIEval#241)
  • Model Card: Added a model card generator for generating reports containing model performance and fairness metrics. (GenAIEval#236)

Observability

  • OpenTelemetry Tracing: Leveraged OpenTelemetry to enable tracing for ChatQnA and AgentQnA along with TGI and TEI. (GenAIExamples#1542) See the sketch after this list.
  • Application dashboards: Added Helm-installed application E2E performance dashboards. (GenAIInfra#800)
  • E2E (end-to-end) metric improvements: E2E metrics are now summed for applications that use multiple megaservice instances, with tests and fixes for the E2E metrics. (GenAIComps#1301, GenAIComps#1343)
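
For reference, a minimal OpenTelemetry tracing setup in Python looks like the sketch below; the OTLP endpoint, service name, and span name are assumptions for illustration rather than the exact configuration used by the OPEA examples.

```python
# Minimal sketch: emitting a trace span over OTLP/HTTP.
# Endpoint and service name are illustrative assumptions.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "chatqna-demo"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("rag-request"):
    # ... call embedding, retrieval, and LLM services here ...
    pass
```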

Better User Experience

  • GenAIStudio: Supported drag-and-drop creation of agentic applications. (GenAIStudio#50)
  • Documentation Refinement: Refined READMEs for key examples to help readers easily locate documentation tailored to deployment, customization, and hardware. (GenAIExamples#1741)
  • Optimized Dockerfiles: Simplified application Dockerfiles for faster image builds. (GenAIExamples#1585)

Exploration

  • SQFT: Supported low-precision sparse parameter-efficient fine-tuning on LLMs. (GenAIResearch#1)

Newly Supported Models

OPEA introduced support for the following models in this release.

Model TGI-Gaudi vLLM-CPU vLLM-Gaudi vLLM-ROCm OVMS Optimum-Habana PredictionGuard
deepseek-ai/DeepSeek-R1-Distill-Llama-8B - -
deepseek-ai/DeepSeek-R1-Distill-Llama-70B - -
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B - -
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B - -
deepseek-ai/DeepSeek-R1-Distill-Qwen-14B - -
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B - -
deepseek-ai/DeepSeek-V3 - - -
Hermes-3-Llama-3.1-8B - - - - -
ibm-granite/granite-3.2-8b-instruct - - - - -
Phi-4-mini x x x x -
Phi-4-multimodal-instruct x x x x -
mistralai/Mistral-Small-24B-Instruct-2501 - - -
mistralai/Mistral-Large-Instruct-2411 x - - -

(✓: supported; -: not validated; x: unsupported)

Newly Supported Hardware

Other Notable Changes

The following lists detail other notable changes:

GenAIExamples
  • Functionalities

    • [AgentQnA] Added web search tool support and simplified the run instructions. (#1656) (e8f2313)
    • [ChatQnA] Added support for the latest DeepSeek models on Gaudi (#1491) (9adf7a6)
    • [EdgeCraftRAG] A sleek new UI based on Vue and Ant Design for enhanced user experience, supporting concurrent multi-requests on vLLM, JSON pipeline configuration, and API-based prompt modification. (#1665) (5a50ae0)
    • [EdgeCraftRAG] Supported multi-card deployment of Intel® Arc™ GPUs for vLLM inference (#1729) (1a0c5f0)
    • [FaqGen] Merged FaqGen into ChatQnA for a unified chatbot experience. (#1654) (6d24c1c)
  • Benchmark

    • [ChatQnA] Provided unified scalable deployment and benchmarking support for examples (#1315) (ed16308)
  • Deployment

  • Bug Fixes

    • [AgentQnA] Fixed errors when running AgentQnA on Xeon with OpenAI and updated the README (#1664) (fecc227)
    • [AudioQnA] Fixed the LLM model field to align inputs (#1611) (2dfcfa0)
  • Documentation

    • Updated README.md for OPEA OTLP tracing (#1406) (4c41a5d)
    • Updated README.md for Agent UI (#1495) (88a8235)
    • Refactored AudioQnA README (#1508) (9f36e84)
    • Added a new section on changing the LLM model (e.g., DeepSeek) based on the validated model table of the LLM microservice (#1501) (970b869)
    • Updated README.md of AIPC quick start (#1578) (852bc70)
    • Added short descriptions to the images OPEA publishes on Docker Hub (#1637) (68747a9)
  • CI/CD/UT

    • Added UT for rerank finetuning on Gaudi (#1472) (5f4b182)
    • Enabled Gaudi 3, ROCm, and Arc in the manual release test. (#1615) (63b789a)
    • Enabled base image build in CI/CD (#1669) (2204fe8)
    • Ran ChatQnA CI with the latest base image and grouped logs in GHA outputs. (#1736) (c48cd65)
GenAIComps
  • Functionalities

    • [agent] Enabled custom prompt for react_llama and react_langgraph (#1391) (558a2f6)
    • [dataprep] Added Multimodal support for Milvus for dataprep component (#1380) (006bd91)
    • [dataprep] New ArangoDB integration (#1558)
    • [dataprep] Added the ability to customize Dataprep-specific input parameters by subclassing the DataprepRequest pydantic model, avoiding the need to introduce parameters unique to a few Dataprep integrations across all Dataprep providers (#1525); see the sketch after this list
    • [retrieval] New ArangoDB integration (#1558)
    • [cores/mega] Added remote endpoint support (#1399) (1871dec)
    • [docsum] Enlarged DocSum prompt buffer (#1471) (772ef6e)
    • [embeddings] Refined CLIP embedding microservice by leveraging the third-party CLIP (#1298) (7727235)
    • [finetuning] Added xtune to finetuning for Intel ARC GPU (#1432) (80ef317)
    • [guardrails] Added native support for toxicity detection guardrail microservice (#1258) (625aec9)
    • [llm/text-generation] Added support for string message in Bedrock textgen (#1291) (364ccad)
    • [ipex] Added native LLM microservice using IPEX (#1337) (d51a136)
    • [lvm] Integrated vLLM to lvm as a backend (#1362) (831c5a3)
    • [lvm] Integrated UI-TARS vLLM in lvm component (#1458) (4a15795)
    • [nebula] Docker deployment support for the Nebula graph database (#1396) (342c1ed)
    • [OVMS] Text generation, Embeddings and Reranking microservices based on OVMS component (#1318) (78b94fc)
    • [retriever/milvus] Added Multimodal support for Milvus for retriever component (#1381) (40d431a)
    • [text2image & image2image] Enriched input parameters of text2image and image2image. (#1339) (42f323f)
    • Refined synchronized I/O in asynchronous functions (#1300) (b08571f)
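
To illustrate the DataprepRequest customization noted above, here is a minimal, self-contained sketch; the base model's fields and the subclass are assumptions for illustration, not the actual GenAIComps definitions.

```python
# Minimal sketch: extending a Dataprep request model via subclassing.
# DataprepRequest's real fields live in GenAIComps; the ones below are assumed.
from typing import Optional
from pydantic import BaseModel

class DataprepRequest(BaseModel):
    """Stand-in for the shared Dataprep request model."""
    chunk_size: int = 512
    chunk_overlap: int = 64

class Neo4jDataprepRequest(DataprepRequest):
    """Hypothetical integration-specific subclass adding unique parameters."""
    ingest_relations: bool = True
    graph_name: Optional[str] = None

# Integration-specific parameters stay local to the subclass instead of
# being added to every Dataprep provider's shared request model.
req = Neo4jDataprepRequest(chunk_size=256, graph_name="docs")
print(req.model_dump())
```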
  • Bug Fixes

    • Fixed DocSum error caused by HuggingFaceEndpoint (#1246) (30e3dea)
    • Fixed TEI embedding and TEI reranking bug (#1256) (fa01f46)
    • Fixed web-retrievers hub client and TEI endpoint issue (#1270) (ecb7f7b)
    • Fixed Dataprep data ingestion issue. (#1271) (b777db7)
    • Fixed metric ID issue when initializing multiple Orchestrator instances (#1280) (f8e6216)
    • Fixed Neo4j dataprep ingest error handling and skip_ingestion argument passing (#1288) (4a90692)
    • Fixed the Milvus retriever issue (#1286) (47f68a4)
    • Fixed Qdrant retriever RAG issue. (#1289) (c3c8497)
    • Fixed agent message format. (#1297) (022d052)
    • Fixed Milvus dataprep file ingestion failure (#1299) (a033c05)
    • Fixed Docker image security issues (#1321) (589587a)
    • Added megaservice / orchestrator metric testing and fixes (#1348) (1064b2b)
    • Fixed finetuning Python regex syntax error (#1446) (380f95c)
    • Upgraded Optimum Habana version to fix security check issue (#1571) (83350aa)
    • Made LlamaGuard compatible with both TGI and vLLM (#1581) (4024302)
  • Documentation

    • GraphRAG README/compose fixes post refactor (#1221) (b38d9f3)
    • Updated docs for LLamaGuard & WildGuard Microservice (#1259) (0df374b)
    • Fixed README errors in the dataprep component for all VectorDBs (#1377) (492f028)
    • Refined the README for llms/doc-summarization (#1437) (559ebb2)
  • CI/CD/UT

    • Refined dataprep test scripts (#1305) (a4f6af1)
GenAIEval
  • Auto Tuner

    • RAG Pilot - A RAG pipeline tuning tool allowing fine-grained control over key aspects of parsing, chunking, postprocessing, and generation, enabling better retrieval and response generation. (#243) (97da8f2)
  • Monitoring

    • Integrated the memory bandwidth exporter to support collecting and reporting memory bandwidth, CPU, and memory metrics. (#218) (df5fd3e)
    • Added a benchmark Docker image to support gathering metrics across microservices and fixed a missing package for benchmarking with the Dockerfile (#249) (dc3409f)
  • Metrics

    • Collected vLLM latency metrics for E2E tests (#244) (1b6a91d)
  • Bug Fixes

    • Fixed relative path issue for possion. (#234) (3b9981a)
    • Added the missing file to the release package (#233) (28ed0db)
    • Fixed the TTFT and TPOT errors when the benchmark target is chatqna_qlist_pubmed (#238) (da04a9f)
    • Fixed the performance benchmark with PubMed (#239) (5c8ab6e)
  • Documentation

    • Added recommendations to the platform optimization documentation (ea086a6)
GenAIInfra
  • HelmChart

    • [TDX] Added Intel TDX support to Helm charts (#799) (040860e)
    • Added a Helm starter chart for developing new charts (#776) (6154b6c)
    • Improved HPA enabling usability (#770) (3016f5f)
    • Added a Helm chart for Ollama (#774) (7d66afb)
    • Helm: Added Qdrant support (#796) (99ccf0c)
    • ChatQnA: Added Qdrant DB support (#813) (5576cfd)
    • Added Helm-installed Grafana dashboards for application metrics (#800) (f46e8c1)
    • Added Bedrock support to LLM TextGen (#811) (da37b9f)
    • CodeGen: Added a RAG pipeline and changed the default UI (#985) (46b1b6b)
    • dataprep/retriever: Supported air-gapped offline environments (#980) (b9b10e9)
  • CSP

    • Added automated provisioning of CosmosDB and App Insights for OPEA applications (#657) (d29bd2d)
  • Bug Fixes

    • Fixed the helm chart release dependency update (#842) (f121edd)
  • CI/CD/UT

GenAIStudio
  • Updated the Studio frontend table UI and the Studio backend according to the dataprep refactor (#32) (1168507)
  • Added GenAI Studio UI improvements (#48) (ad64f7c)
  • Enabled LLM traces for sandbox (#51) (df6b73e)
  • Migrated to internal Kubernetes MySQL and enabled deployment package generation for AgentQnA (#52) (0cddbe0)

Deprecations

Deprecated Examples

The following GenAI examples are deprecated and have been removed as of OPEA v1.3:

| Example | Migration Solution | Reasons for Deprecation |
|---------|--------------------|--------------------------|
| FaqGen | Use the example ChatQnA instead. | Provide users with a unified chatbot experience and reduce redundancy. |

Deprecated Docker Images

The following Docker images are deprecated and are not updated or tagged for the OPEA v1.3 release:

| Deprecated Docker Image | Migration Solution | Reasons for Deprecation |
|-------------------------|--------------------|--------------------------|
| opea/agent-ui | Use opea/agent-openwebui instead. | Open WebUI based UI for better user experience. |
| opea/chathistory-mongo-server | Use opea/chathistory-mongo instead. | Follow the OPEA naming rules. |
| opea/faqgen | Use opea/chatqna or opea/chatqna-without-rerank instead. | FaqGen is deprecated. |
| opea/faqgen-ui | Use opea/chatqna-ui instead. | FaqGen is deprecated. |
| opea/faqgen-react-ui | Use opea/chatqna-ui instead. | FaqGen is deprecated. |
| opea/feedbackmanagement | Use opea/feedbackmanagement-mongo instead. | Follow the OPEA naming rules. |
| opea/promptregistry-mongo-server | Use opea/promptregistry-mongo instead. | Follow the OPEA naming rules. |

The following Docker images are deprecated and will no longer be updated or tagged as of the OPEA v1.4 release:

| Deprecated Docker Image | Migration Solution | Reasons for Deprecation |
|-------------------------|--------------------|--------------------------|
| opea/chathistory-mongo | Use opea/chathistory instead. The Docker image will be released with the latest tag before the v1.4 release. | OPEA introduced OPEA Store to decouple the chat history component from MongoDB. |
| opea/feedbackmanagement-mongo | Use opea/feedbackmanagement instead. The Docker image will be released with the latest tag before the v1.4 release. | OPEA introduced OPEA Store to decouple the feedback management component from MongoDB. |
| opea/promptregistry-mongo | Use opea/promptregistry instead. The Docker image will be released with the latest tag before the v1.4 release. | OPEA introduced OPEA Store to decouple the prompt registry component from MongoDB. |

All OPEA docker images

Deprecated GenAIExample Variables

| Example | Type | Variable | Migration Solution |
|---------|------|----------|--------------------|
| ChatQnA | environment variable | your_hf_api_token | Removed from the Intel AIPC deployment. Use the environment variable HUGGINGFACEHUB_API_TOKEN instead. This change aligns with the standardized naming conventions for environment variables. |
| ChatQnA | environment variable | OLLAMA_HOST | Removed from the Intel AIPC deployment. Instead, customize LLM_SERVER_HOST_IP in ChatQnA/docker_compose/intel/cpu/aipc/compose.yaml. |
| DocIndexRetriever | environment variable | TGI_LLM_ENDPOINT | Removed because it was unused. |
| DocIndexRetriever | environment variable | MEGA_SERVICE_HOST_IP | Removed because it was unused. |
| DocIndexRetriever | environment variable | LLM_SERVICE_HOST_IP | Removed because it was unused. |
| GraphRAG | environment variable | MAX_OUTPUT_TOKENS | Split into two new environment variables: MAX_INPUT_TOKENS (default: 4096) and MAX_TOTAL_TOKENS (default: 8192) to control the maximum token limits. |

Deprecated GenAIComps Parameters

| Component | Parameter | Migration Solution |
|-----------|-----------|--------------------|
| agent | with_store of agent_config in the Assistants APIs | Its functionality is now fully covered by the new memory_type parameter. In v1.3, use "with_memory": true and "memory_type": "persistent" as its replacement. The with_memory parameter in agent_config is now enabled by default (true) to support multi-turn conversations; see the sketch below. Please refer to the guide for more details. |
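
As a hedged illustration of this migration, the before and after agent_config payloads might look as follows; the surrounding Assistants API request structure is an assumption for illustration.

```python
# Minimal sketch of the agent_config migration described above.
# Only the changed keys are shown; the enclosing request structure is assumed.

# Deprecated (pre-v1.3) configuration:
agent_config_old = {"with_store": True}

# v1.3 replacement: multi-turn memory with persistent storage.
agent_config_new = {
    "with_memory": True,         # enabled by default in v1.3
    "memory_type": "persistent",
}
```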

Updated Dependencies

| Dependency | Hardware | Scope | Version | Version in OPEA v1.2 | Comments |
|------------|----------|-------|---------|----------------------|----------|
| gradio | - | all examples | 5.11.0 | 5.5.0 | |
| huggingface/text-generation-inference | AMD GPU | all examples | 2.4.1-rocm | 2.3.1-rocm | |
| huggingface/text-embeddings-inference | all | all examples | cpu-1.6 | cpu-1.5 | |
| langchain, langchain_community | - | llms/doc-summarization, llms/faq-generation | 0.3.14 | 0.3.15 | Avoid bugs in FaqGen and DocSum. |
| optimum-habana | Gaudi | lvms/llama-vision | 1.17.0 | - | |
| pytorch | Gaudi | all components | 2.5.1 | 2.4.0 | |
| transformers | - | lvms/llama-vision | 4.48.0 | 4.45.1 | |
| vllm | Xeon | all supported examples except EdgeCraftRAG | v0.8.3 | - | |
| vllm | Gaudi | all supported examples except EdgeCraftRAG | v0.6.6.post1+Gaudi-1.20.0 | v0.6.4.post2+Gaudi-1.19.0 | |
| vllm | AMD GPU | all supported examples | rocm6.3.1_instinct_vllm0.8.3_20250410 | - | |

Changes to Default Behavior

  • [agent] The default model changed from meta-llama/Meta-Llama-3-8B-Instruct to meta-llama/Llama-3.3-70B-Instruct.

Validated Hardware

  • Intel® Arc™ Graphics GPU (A770)
  • Intel® Gaudi® AI Accelerators (2nd, 3rd)
  • Intel® Xeon® Scalable processors (4th, 5th, 6th)
  • AMD® Instinct™ MI300X Accelerators (CDNA3)

Validated Software

  • AMD® ROCm™ Software v6.3.3
  • Docker 28.0.4
  • Docker Compose v2.34.0
  • Intel® Gaudi® software and drivers v1.20
  • Kubernetes v1.29.15
  • TEI v1.6
  • TGI v2.4.0 (Xeon), v2.3.1 (Gaudi), v2.4.1 (ROCm)
  • Torch v2.5.1
  • Ubuntu 22.04
  • vLLM v0.8.3 (Xeon/ROCm), v0.6.6 (Gaudi)

Known Issues

Full Changelogs

Contributors

This release would not have been possible without the contributions of the following organizations and individuals.

Contributing Organizations

  • Amazon: Ollama deployment, Bedrock integration, OVMS integration and bug fixes.
  • AMD: vLLM enablement on AMD GPUs for key examples, enablement of AMD GPUs on more examples, and AMD OPEA blogs.
  • ArangoDB: OPEA Store and ArangoDB integration.
  • Intel: Development and improvements to GenAI examples, components, infrastructure, and evaluation.
  • Infosys: Azure support and documentation updates.
  • National Chiao Tung University: Documentation updates.
  • Prediction Guard: Maintenance of Prediction Guard components.

Individual Contributors

For a comprehensive list of individual contributors, please refer to the Full Changelogs section.