genos.api.embedding_extractor

Classes

EmbeddingExtractorAPI(session, base_url[, ...])

Wrapper class for the Embedding Extraction API endpoints.

class EmbeddingExtractorAPI(session: Session, base_url: str, timeout: int = 30, config=None)

Bases: BaseAPI

Wrapper class for the Embedding Extraction API endpoints.

This class handles requests to the embedding extraction endpoints, including single sequence and batch processing.

extract(sequence: str, model_name: str = 'Genos-1.2B', pooling_method: str = 'mean') → dict | List[dict]

Extracts a numerical embedding representation for a given nucleotide sequence.

Parameters:
  • sequence (str) – DNA sequence string.

  • model_name (str, optional) – Model name to use. Default is “Genos-1.2B”. Options: “Genos-1.2B”, “Genos-10B”.

  • pooling_method (str, optional) – Pooling method. Default is “mean”. Options: “mean”, “max”, “last”, “none”.
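The documented options can be checked on the client side before a request is sent. The following is a minimal sketch only: check_extract_args is a hypothetical helper that is not part of the genos package, and the allowed values are copied from the parameter list above. The library performs its own validation, which may differ.

```python
# Allowed values copied from the parameter list above.
SUPPORTED_MODELS = ("Genos-1.2B", "Genos-10B")
SUPPORTED_POOLING = ("mean", "max", "last", "none")


def check_extract_args(sequence, model_name="Genos-1.2B", pooling_method="mean"):
    """Hypothetical pre-flight check; not part of the genos package."""
    # Reject anything that is not a non-empty string.
    if not isinstance(sequence, str) or not sequence:
        raise ValueError("sequence must be a non-empty string")
    # Reject model names and pooling methods outside the documented options.
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model_name: {model_name!r}")
    if pooling_method not in SUPPORTED_POOLING:
        raise ValueError(f"unsupported pooling_method: {pooling_method!r}")
    # Return the arguments as keyword arguments ready to pass to extract().
    return {
        "sequence": sequence,
        "model_name": model_name,
        "pooling_method": pooling_method,
    }


kwargs = check_extract_args("ATCGATCGATCG", pooling_method="max")
print(kwargs["pooling_method"])
```

The returned dictionary could then be passed on as embedding_api.extract(**kwargs), assuming embedding_api is an EmbeddingExtractorAPI instance.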

Returns:

A dictionary with the following keys:

  • “token_count”: number of tokens

  • “embedding_shape”: shape of embedding array

  • “embedding_dim”: dimension of embedding

  • “embedding”: embedding array (list)

Return type:

dict

Raises:
  • ValueError – If sequence is not a valid string or list.

  • ValidationError – If parameters are invalid.

  • APIRequestError – If the API request fails.

Examples

>>> # Single sequence (embedding_api is an EmbeddingExtractorAPI instance)
>>> result = embedding_api.extract("ATCGATCGATCG")
>>> print(result['embedding_dim'])
4096
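Once two results are available, their embeddings can be compared. The sketch below computes cosine similarity between two dictionaries shaped like the return value of extract. It makes some assumptions: the stand-in dictionaries and their values are invented for illustration (a real embedding is much larger), cosine_similarity is a hypothetical helper rather than part of the library, and "embedding" is assumed to be a flat list of floats, which may not hold when pooling_method is “none”.

```python
import math


def cosine_similarity(result_a, result_b):
    """Hypothetical helper: cosine similarity of two extract()-style results."""
    a, b = result_a["embedding"], result_b["embedding"]
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in results with made-up values; in practice these would come from
# embedding_api.extract(...). The "embedding_shape" format is an assumption.
result_a = {"token_count": 3, "embedding_shape": [2], "embedding_dim": 2,
            "embedding": [1.0, 0.0]}
result_b = {"token_count": 3, "embedding_shape": [2], "embedding_dim": 2,
            "embedding": [0.0, 1.0]}

print(cosine_similarity(result_a, result_a))
print(cosine_similarity(result_a, result_b))
```

Pure Python keeps the sketch free of dependencies; for embeddings with thousands of dimensions, an array library would be the more usual choice.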