Introduction: Search, quantify, and edit data for LLMs with blazing-fast dataset computations.
Added on: Jan 20, 2025
Lilac

What is Lilac

Lilac Garden is a tool for data exploration and quality control, built particularly for data used with large language models (LLMs). Its features include clustering, semantic and keyword search, editing and comparing fields, and detecting PII, duplicates, and custom signals. Because it processes datasets at high speed, Lilac Garden accelerates data transformations and helps users understand and refine their data.

How to Use Lilac

  1. Install Lilac as a Python package.
  2. Use the user interface to explore and edit your datasets.
  3. Run clustering, semantic search, and other dataset computations to refine your data.

Features of Lilac

  • Clustering

    Cluster and title 1 million data points in 20 minutes.

  • Semantic & Keyword Search

    Search by meaning (semantic search) or by exact terms (keyword search) to find relevant data points.

  • Edit & Compare Fields

    Easily edit and compare different fields within your dataset.

  • PII, Duplicates, Language Detection

    Detect personally identifiable information (PII) and duplicate entries, and identify the language of each text in your dataset.

  • Fuzzy-Concept Search with Refinement

    Search for fuzzy concepts rather than exact terms, then refine the results to surface the most relevant data.

  • Blazing Fast Dataset Computations

    Embed your dataset at half a billion tokens per minute.
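To make the difference between the two search modes concrete, here is a minimal, self-contained Python sketch. It is not Lilac's implementation or API. Keyword search is a plain case-insensitive match on whole words. For the "semantic" side, a real system would use a learned embedding model; as a stand-in, this sketch represents each text as a bag-of-words count vector and ranks documents by cosine similarity to the query. The names `keyword_search`, `embed`, and `semantic_search` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(docs: list[str], term: str) -> list[int]:
    """Return indices of documents that contain `term` as a whole token."""
    term = term.lower()
    return [i for i, doc in enumerate(docs) if term in tokenize(doc)]


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(docs: list[str], query: str, k: int = 2) -> list[int]:
    """Return indices of the k documents most similar to the query (ties by index)."""
    q = embed(query)
    scores = [(cosine(q, embed(doc)), i) for i, doc in enumerate(docs)]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scores[:k]]


docs = [
    "The cat sat on the mat.",
    "Dogs and cats are popular pets.",
    "Stock prices fell sharply today.",
]
print(keyword_search(docs, "cat"))                 # [0]
print(semantic_search(docs, "cat on a mat", k=1))  # [0]
```

Note that the keyword query "cat" does not match "cats"; bridging gaps like that (and synonyms) is what a learned embedding model adds over word counts.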

FAQs from Lilac

What is Lilac Garden?

Lilac Garden is a tool for data exploration and quality control, designed to help users search, quantify, and edit data for large language models.

How fast can Lilac Garden process data?

Lilac Garden can cluster and title 1 million data points in 20 minutes and embed datasets at half a billion tokens per minute.
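Taken at face value, these two figures translate into per-second rates and wall-clock estimates with simple arithmetic. The sketch below assumes the quoted rates hold constant for the whole job; the 10-billion-token corpus in the last line is a hypothetical size chosen only for illustration.

```python
# Throughput figures quoted for Lilac Garden, taken at face value.
CLUSTER_POINTS = 1_000_000               # data points clustered and titled...
CLUSTER_MINUTES = 20                     # ...in this many minutes
EMBED_TOKENS_PER_MINUTE = 500_000_000    # "half a billion tokens per minute"


def clustering_rate_per_second() -> float:
    """Average data points clustered per second implied by the quoted figure."""
    return CLUSTER_POINTS / (CLUSTER_MINUTES * 60)


def embedding_minutes(total_tokens: int) -> float:
    """Estimated minutes to embed `total_tokens`, assuming a constant quoted rate."""
    return total_tokens / EMBED_TOKENS_PER_MINUTE


print(f"{clustering_rate_per_second():.0f} points/s")    # 833 points/s
print(f"{EMBED_TOKENS_PER_MINUTE / 60:,.0f} tokens/s")   # 8,333,333 tokens/s
print(f"{embedding_minutes(10_000_000_000):.0f} min")    # 20 min (hypothetical 10B-token corpus)
```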

What features does Lilac Garden offer?

Lilac Garden offers clustering, semantic and keyword search, editing and comparing fields, PII detection, duplicate detection, language detection, and fuzzy-concept search with refinement.
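As a rough illustration of two of these checks, the Python sketch below flags exact duplicates by hashing each text after normalizing case and whitespace, and finds email addresses and US-style phone numbers with regular expressions. It is a simplified, hypothetical stand-in rather than Lilac's detector: the names `find_duplicates` and `find_pii`, the normalization rule, and both regex patterns are assumptions, and simple pattern matching like this misses many kinds of PII as well as near-duplicates.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical, deliberately simple patterns; real PII detectors cover far more cases.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_duplicates(rows: list[str]) -> list[list[int]]:
    """Group row indices whose normalized text is identical (exact-duplicate detection)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        # A fixed-size hash keeps the grouping keys small even for long texts.
        digest = hashlib.sha256(normalize(row).encode("utf-8")).hexdigest()
        groups[digest].append(i)
    # Only groups with more than one member are duplicates.
    return [idx for idx in groups.values() if len(idx) > 1]


def find_pii(text: str) -> dict[str, list[str]]:
    """Return email addresses and phone numbers matched by the simple patterns above."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


rows = [
    "Contact me at jane.doe@example.com",
    "contact  me at JANE.DOE@example.com",
    "Call 555-123-4567 for details.",
]
print(find_duplicates(rows))  # [[0, 1]]
print(find_pii(rows[2]))      # {'emails': [], 'phones': ['555-123-4567']}
```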