ICICLE AI Embed Service
A FastAPI service that turns text into embedding vectors using Qwen3-Embedding-0.6B (GGUF-quantized) via llama-cpp-python, designed for the ICICLE AI Tapis tenant. The service runs the model locally and makes no external API calls, so a deployment needs only a single .gguf file and a Tapis token.
This component exposes an HTTP API; see its API documentation on this site.
It pairs with the ICICLE AI Vector Service: this service produces vectors, and the Vector Service stores and searches them.
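As a rough illustration of the local embedding step described above, the sketch below loads a GGUF file with llama-cpp-python and returns one normalized vector per input text. It is not the service's actual code: the function names `embed_texts`, `l2_normalize`, and `cosine_similarity` are hypothetical, and it assumes the GGUF file carries pooling metadata so that each input yields a single vector rather than per-token vectors. The cosine helper only hints at the kind of comparison a vector store might perform on the resulting vectors.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def embed_texts(texts: List[str], model_path: str) -> List[List[float]]:
    """Embed texts locally with a GGUF model (hypothetical helper).

    Assumption: the model file defines a pooling type, so each entry in
    the result holds one sequence-level vector.
    """
    # Imported lazily so the pure-Python helpers above work even where
    # llama-cpp-python is not installed.
    from llama_cpp import Llama

    # embedding=True puts the model in embedding mode; loading the model
    # on every call is wasteful and is done here only for brevity.
    llm = Llama(model_path=model_path, embedding=True, verbose=False)
    result = llm.create_embedding(texts)  # OpenAI-style response dict
    return [l2_normalize(item["embedding"]) for item in result["data"]]
```

Calling `embed_texts(["some text"], model_path)` with the path to a local .gguf file would return a list with one vector. A long-running service would more plausibly load the model once at startup and reuse it across requests.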
References
- Qwen3-Embedding model card
- Qwen3-Embedding GGUF repo
- llama-cpp-python
- FastAPI Documentation
- Tapis Project
- Diataxis Framework
Acknowledgements
National Science Foundation (NSF) funded AI institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE) (OAC 2112606)
Issue Reporting
Please report issues via GitHub Issues. Include steps to reproduce, expected behavior, and any relevant logs.