
ICICLE AI Embed Service

A FastAPI service that turns text into embedding vectors using Qwen3-Embedding-0.6B (GGUF-quantized) via llama-cpp-python, designed for the ICICLE AI Tapis tenant. Because the model runs locally with no external API calls, a deployment needs only a single .gguf file and a Tapis token.

GitHub Repo | License: GPL v3

API reference

This component exposes an HTTP API; see its API documentation on this site for the available endpoints.
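As a rough illustration of calling an HTTP embedding service with a Tapis token, the sketch below builds (but does not send) a request using only the Python standard library. The base URL, the `/embed` path, and the `{"texts": [...]}` payload shape are hypothetical placeholders, not taken from this service's API documentation; the `X-Tapis-Token` header follows the usual Tapis v3 convention and is assumed to apply here. Check the API documentation for the real endpoint and schema.

```python
import json
import urllib.request


def build_embed_request(base_url, token, texts):
    """Build a POST request for a hypothetical embedding endpoint.

    The "/embed" path and the {"texts": [...]} body are assumptions made
    for illustration only; the real route and schema are defined in the
    service's API documentation.
    """
    payload = json.dumps({"texts": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embed",  # hypothetical path
        data=payload,
        headers={
            "Content-Type": "application/json",
            "X-Tapis-Token": token,  # assumed: standard Tapis v3 auth header
        },
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
req = build_embed_request("https://embed.example.org", "TOKEN", ["hello world"])
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a deployed service.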

Pairs with the ICICLE AI Vector Service: this service produces vectors, that service stores and searches them.
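To make the division of labor concrete, here is a minimal sketch of what "searching" vectors can mean once embeddings exist. It ranks a few toy vectors by cosine similarity to a query vector. The similarity metric actually used by the ICICLE AI Vector Service is not stated here, so cosine similarity is an assumption, and the two-dimensional vectors and document names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, stored):
    """Return stored keys ordered from most to least similar to the query."""
    return sorted(
        stored,
        key=lambda name: cosine_similarity(query, stored[name]),
        reverse=True,
    )


# Toy stand-ins for embeddings; real vectors have far more dimensions.
stored_vectors = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [1.0, 1.0],
}
query_vector = [1.0, 0.1]

ranking = rank_by_similarity(query_vector, stored_vectors)
print(ranking)
```

In this toy setup the query points almost along the first axis, so `doc_a` ranks first and `doc_b` last.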

References

Acknowledgements

National Science Foundation (NSF)-funded AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE) (OAC 2112606)

Issue Reporting

Please report issues via GitHub Issues. Include steps to reproduce, expected behavior, and any relevant logs.