ICICLE AI Embed Service

FastAPI service that turns text into embedding vectors using Qwen3-Embedding-0.6B (GGUF-quantized) via llama-cpp-python, designed for the ICICLE AI Tapis tenant. The model runs locally with no external API calls, so a deployment needs only a single .gguf file and a Tapis token.
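As a rough illustration of the local embedding step, the sketch below loads a GGUF model in embedding mode with llama-cpp-python and extracts one vector per input text. It is a minimal sketch under stated assumptions, not this service's actual code: the helper names `load_model` and `embed_texts` are hypothetical, and it assumes the model's pooling settings yield one vector per input rather than one per token.

```python
from typing import Any, List, Sequence


def load_model(model_path: str) -> Any:
    """Load a GGUF model in embedding mode (hypothetical helper).

    Requires the llama-cpp-python package; the import is done lazily so the
    rest of this module can be used without it installed.
    """
    from llama_cpp import Llama

    return Llama(model_path=model_path, embedding=True, verbose=False)


def embed_texts(llm: Any, texts: Sequence[str]) -> List[List[float]]:
    """Return one embedding per input text, in input order (hypothetical helper)."""
    response = llm.create_embedding(list(texts))
    # The response is OpenAI-style: {"data": [{"index": i, "embedding": [...]}, ...]}.
    # Sort by index so the output order matches the input order.
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

With a model file on disk, usage would look like calling `load_model` with the path to the .gguf file and passing the result to `embed_texts` together with a list of strings. The file path is deployment-specific and is not specified here.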

API reference

This component exposes an HTTP API; see its API documentation on this site for the available endpoints.

Pairs with the ICICLE AI Vector Service: this service produces vectors, that service stores and searches them.
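To make the division of labor concrete, the sketch below shows what searching over stored vectors can look like, using cosine similarity in plain Python. It is only an illustration: how the ICICLE AI Vector Service actually stores and ranks vectors is not described here, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank stored vectors by similarity to the query and keep the best k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In this picture, the embed service supplies both the stored vectors and the query vector, and the search step only compares them.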

References

Qwen3-Embedding model card
Qwen3-Embedding GGUF repo
llama-cpp-python
FastAPI Documentation
Tapis Project
Diataxis Framework

Acknowledgements

National Science Foundation (NSF) funded AI institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE) (OAC 2112606)

Issue Reporting

Please report issues via GitHub Issues. Include steps to reproduce, expected behavior, and any relevant logs.
