Introduction: A general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.
Added on: Jan 21, 2025
Cleora.ai

What is Cleora.ai

Cleora is a powerful tool designed for embedding entities in n-dimensional spherical spaces using fast, stable, and iterative random projections. It is particularly effective for heterogeneous relational data, including graphs, hypergraphs, and categorical arrays. The latest version, Cleora 2.0.0, introduces significant performance improvements, reduced memory usage, and new features like Python-native support and integration with NumPy.

How to Use Cleora.ai

  1. Install the Python package using `pip install pycleora`.
  2. Prepare your data in a format suitable for Cleora, such as grouping entities by context.
  3. Use the provided Python API to create embeddings, perform Markov random walks, and normalize the results.
  4. Utilize the embeddings for tasks like similarity comparison or further machine learning applications.
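The core of steps 3 and 4 — iterative Markov propagation over co-occurrence data, with normalization after each step — can be sketched in plain NumPy. This is a conceptual illustration of the mechanism, not the pycleora API; the toy hyperedges, dimensionality, and iteration count are all illustrative:

```python
import numpy as np

# Toy hyperedges: each group holds entities that co-occur in one context.
hyperedges = [
    ["apples", "bananas", "milk"],
    ["milk", "bread"],
    ["bread", "bananas"],
]

# Index the entities and build a row-stochastic (Markov) transition matrix:
# two entities are connected whenever they share a hyperedge.
entities = sorted({e for edge in hyperedges for e in edge})
idx = {e: i for i, e in enumerate(entities)}
n = len(entities)

adj = np.zeros((n, n))
for edge in hyperedges:
    for a in edge:
        for b in edge:
            if a != b:
                adj[idx[a], idx[b]] += 1.0
markov = adj / adj.sum(axis=1, keepdims=True)

# Random-projection initialization, then propagate-and-normalize steps.
rng = np.random.default_rng(0)
dim = 8
emb = rng.normal(size=(n, dim))
for _ in range(3):
    emb = markov @ emb
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Rows now lie on the unit sphere; co-occurring entities point in
# similar directions and can be compared by dot product.
print(np.allclose(np.linalg.norm(emb, axis=1), 1.0))  # True
```

Normalizing after every propagation step is what keeps the iteration stable and the final vectors on the unit sphere.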

Use Cases of Cleora.ai

Cleora is ideal for embedding entities that interact or co-occur in various contexts, such as products in shopping baskets, locations visited by users, or co-authors of academic papers. It is particularly useful for large-scale datasets where efficiency and scalability are critical.

Features of Cleora.ai

  • Performance Optimizations

    Cleora 2.0.0 is approximately 10x faster and uses significantly less memory compared to previous versions.

  • Python Native

    The new version is available as a Python package, making it easier to integrate into existing workflows.

  • Custom Embeddings Initialization

    Supports initializing embeddings with external data, such as text or image vectors, for enhanced flexibility.

  • Adjustable Vector Projection

    Allows for customizable normalization and projection of vectors after each propagation step.
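The last two features can be sketched together: start from externally supplied vectors (here, random stand-ins for precomputed text or image embeddings) and apply a custom projection after each propagation step. The transition matrix and the particular projection below are illustrative choices, not the library's defaults:

```python
import numpy as np

rng = np.random.default_rng(1)
n_entities, dim = 5, 4

# A toy row-stochastic transition matrix standing in for the hypergraph.
markov = rng.random((n_entities, n_entities))
markov /= markov.sum(axis=1, keepdims=True)

# Custom initialization: imagine these are precomputed text/image vectors.
external_vectors = rng.normal(size=(n_entities, dim))

def project(x):
    # Custom projection: zero-mean each vector, then map it onto the unit sphere.
    x = x - x.mean(axis=1, keepdims=True)
    return x / np.linalg.norm(x, axis=1, keepdims=True)

emb = project(external_vectors)
for _ in range(4):
    emb = project(markov @ emb)
```

Swapping in a different `project` function changes how vectors are constrained between propagation steps without touching the propagation itself.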

FAQs from Cleora.ai

1. What should I embed?

Any entities that interact or co-occur in a given context, such as products in a shopping basket or locations visited by users.
2. How should I construct the input?

Group entities by context and feed them as whitespace-separated lines. Use the `complex::reflexive` modifier for optimal results.
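Grouping entities by context can be done with a few lines of plain Python; the transaction log below is a made-up example, and only the whitespace-separated output format matters:

```python
# Toy transaction log: one (customer, product) pair per interaction.
transactions = [
    ("c1", "apples"), ("c1", "milk"),
    ("c2", "milk"), ("c2", "bread"), ("c2", "eggs"),
    ("c3", "bread"),
]

# Collect each customer's basket.
baskets = {}
for customer, product in transactions:
    baskets.setdefault(customer, []).append(product)

# One whitespace-separated line per context (here: per customer basket).
lines = [" ".join(products) for products in baskets.values()]
print(lines)  # ['apples milk', 'milk bread eggs', 'bread']
```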
3. Can I embed users and products simultaneously?

No; embedding them jointly is methodologically unsound. Create product embeddings first, then derive each user's embedding from the embeddings of the products they interacted with.
4. What embedding dimensionality should I use?

Higher dimensions (1024 to 4096) generally yield better results; use the largest your memory and compute budget allow.
5. How many iterations of Markov propagation should I use?

Typically, 3 to 7 iterations are sufficient: fewer iterations emphasize direct co-occurrence, while more iterations capture broader contextual similarity.
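Whatever the iteration count, the resulting vectors are unit-norm, so they can be compared by cosine similarity, which for unit vectors reduces to a dot product (the values below are illustrative):

```python
import numpy as np

# Two unit-norm embedding vectors (illustrative values).
a = np.array([0.6, 0.8])
b = np.array([0.8, 0.6])

# For unit vectors, cosine similarity is just the dot product.
cos_sim = float(a @ b)
print(round(cos_sim, 2))  # 0.96
```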