Document IDs

_id field

Documents in a collection are always identified by an ID that is unique within the collection. This identifier is stored in the reserved field _id.

Default document IDs

When you create a collection, you can specify the default ID type for documents in the collection. The Data API supports Object IDs, version 4 UUIDs, version 6 UUIDs, and version 7 UUIDs. If you don’t specify a default ID type, the Data API uses the string form of a version 4 UUID.

If you don’t explicitly set the _id field when you insert a document into the collection, the Data API automatically generates the _id value based on the collection’s default ID type.
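
The rule described here (keep an explicit _id, otherwise generate one) can be illustrated locally. The following is a minimal sketch only: the Data API generates default IDs on the server, and the helper name with_default_id is hypothetical. It assumes the collection uses the default ID type, the string form of a version 4 UUID.

```python
import uuid


def with_default_id(document: dict) -> dict:
    """Return a copy of the document, adding a UUID v4 string _id if none is set.

    Hypothetical local stand-in for the server-side default; not part of any client.
    """
    if "_id" in document:
        # An explicitly provided _id is kept as-is.
        return dict(document)
    # Mimic the default ID type: the string form of a version 4 UUID.
    return {"_id": str(uuid.uuid4()), **document}


explicit = with_default_id({"_id": "my-custom-id", "title": "A"})
generated = with_default_id({"title": "B"})

print(explicit["_id"])   # the explicit value is unchanged
print(generated["_id"])  # a random UUID string, different on every run
```

In a real application you would not generate the ID yourself. You would omit _id and read the generated value from the insert result.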

For more information about setting the default ID type, see Create a collection.
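
As a rough sketch of what setting the default ID type might look like at the request level, the following builds a createCollection JSON payload without sending it. The option name defaultId and the type names objectId, uuid, uuidv6, and uuidv7 are assumptions here, and the helper build_create_collection_payload is hypothetical. Confirm the exact names in Create a collection before relying on them.

```python
import json

# Assumed type names for the default ID option; verify against the
# "Create a collection" reference before use.
ASSUMED_DEFAULT_ID_TYPES = {"objectId", "uuid", "uuidv6", "uuidv7"}


def build_create_collection_payload(name, default_id_type=None):
    """Build (but do not send) a createCollection request body."""
    command = {"name": name}
    if default_id_type is not None:
        if default_id_type not in ASSUMED_DEFAULT_ID_TYPES:
            raise ValueError(f"Unsupported default ID type: {default_id_type}")
        # Assumed option shape: options.defaultId.type
        command["options"] = {"defaultId": {"type": default_id_type}}
    return {"createCollection": command}


payload = build_create_collection_payload("COLLECTION_NAME", "uuidv7")
print(json.dumps(payload, indent=2))
```

Omitting default_id_type leaves the options out entirely, which corresponds to accepting the default ID type.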

Specifying document IDs

DataStax recommends using the automatically generated document ID instead of specifying the ID. This ensures uniqueness across the database and reduces the complexity of your code. However, you can use the reserved _id field to specify a document ID when you insert a document.

If you try to insert a document with an _id that already exists in the collection, the insert fails and the Data API returns an error.

Deduplicating documents

If you want to prevent duplicate documents, you can generate the document’s _id as a hash of one or more fields. Because the Data API enforces uniqueness on the _id field, attempting to insert a document with an existing _id results in an error. Your application can catch that error and skip inserting the duplicate.

The following example shows how to generate the _id as a stable hash of a single field named content. If a document’s identity depends on multiple fields, you can instead hash a canonicalized representation of those fields.

The example is provided in Python, TypeScript, and Java, in that order.

Python:

import hashlib

from astrapy import DataAPIClient
from astrapy.exceptions.data_api_exceptions import DataAPIResponseException

# Get an existing collection
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
collection = database.get_collection("COLLECTION_NAME")

# Example document
document = {
    "title": "Example article",
    "content": "This is the main text of the document. _id is generated from this field so that this field is never duplicated across documents.",
    "source": "https://example.com",
}

# Derive a deterministic _id based on the "content" field
document["_id"] = hashlib.sha256(document["content"].encode("utf-8")).hexdigest()

try:
    result = collection.insert_one(document)
    print("Inserted new document with _id:", result.inserted_id)
except DataAPIResponseException as exception:
    # Check for DOCUMENT_ALREADY_EXISTS from the Data API error code
    is_duplicate = any(
        descriptor.error_code == "DOCUMENT_ALREADY_EXISTS"
        for descriptor in exception.error_descriptors
    )

    if is_duplicate:
        print("Document already exists with this _id; skipping insert.")
    else:
        # Re-raise for any other Data API error
        raise

TypeScript:

import crypto from "crypto";
import { DataAPIClient, DataAPIResponseError } from "@datastax/astra-db-ts";

// Get an existing collection
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
  token: "APPLICATION_TOKEN",
});
const collection = database.collection("COLLECTION_NAME");

(async function () {
  // Example document
  const document = {
    title: "Example article",
    content:
      "This is the main text of the document. _id is generated from this field so that this field is never duplicated across documents.",
    source: "https://example.com",
  };

  // Derive a deterministic _id based on the "content" field
  const id = crypto
    .createHash("sha256")
    .update(document.content, "utf8")
    .digest("hex");

  const documentWithId = { ...document, _id: id };

  try {
    const result = await collection.insertOne(documentWithId);
    console.log("Inserted new document with _id:", result.insertedId);
  } catch (error) {
    if (error instanceof DataAPIResponseError) {
      const errors = error.rawResponse?.errors ?? [];
      // Check for DOCUMENT_ALREADY_EXISTS from the Data API error code
      const isDuplicate = errors.some(
        (e) => e.errorCode === "DOCUMENT_ALREADY_EXISTS",
      );

      if (isDuplicate) {
        console.log("Document already exists with this _id; skipping insert.");
        return;
      }
    }

    // Re-throw for any other error
    throw error;
  }
})();

Java:

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.results.CollectionInsertOneResult;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.exceptions.DataAPIResponseException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class Example {

  public static void main(String[] args) throws NoSuchAlgorithmException {
    // Get an existing collection
    Collection<Document> collection =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

    // Example document fields
    String content =
        "This is the main text of the document. _id is generated from this field so that this field is never duplicated across documents.";
    String title = "Example article";
    String source = "https://example.com";

    // Derive a deterministic _id based on the "content" field
    String id =
        HexFormat.of()
            .formatHex(
                MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8)));

    Document document =
        new Document()
            .id(id)
            .append("title", title)
            .append("content", content)
            .append("source", source);

    try {
      CollectionInsertOneResult result = collection.insertOne(document);
      System.out.println("Inserted new document with _id: " + result.getInsertedId());
    } catch (DataAPIResponseException error) {
      // Check for DOCUMENT_ALREADY_EXISTS from the Data API error code
      String errorCode = error.getErrorCode();
      if ("DOCUMENT_ALREADY_EXISTS".equals(errorCode)) {
        System.out.println("Document already exists with this _id; skipping insert.");
      } else {
        // Re-throw for any other Data API error
        throw error;
      }
    }
  }
}
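
For the multi-field case mentioned earlier, one possible approach is to serialize the identifying fields to a canonical JSON string and hash that string. This is a minimal sketch, not a prescribed method: the helper name deterministic_id and the choice of title and source as identity fields are illustrative assumptions. Sorting keys and fixing the separators makes the serialized form independent of field order.

```python
import hashlib
import json


def deterministic_id(document: dict, identity_fields: list) -> str:
    """Hash a canonical JSON form of the chosen fields into a hex string."""
    # Missing fields are recorded as None so the hashed object has a stable shape.
    identity = {field: document.get(field) for field in identity_fields}
    # sort_keys and fixed separators give the same string regardless of field order.
    canonical = json.dumps(
        identity, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"title": "Example article", "source": "https://example.com", "views": 1}
second = {"source": "https://example.com", "views": 99, "title": "Example article"}

id_first = deterministic_id(first, ["title", "source"])
id_second = deterministic_id(second, ["source", "title"])

# Same identifying fields produce the same _id, even though "views" differs.
print(id_first == id_second)
```

The resulting string can be assigned to _id before the insert, in the same way as the single-field hash in the examples above. Fields excluded from the hash, such as views here, do not affect whether two documents are treated as duplicates.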

Other document identifiers

Regardless of the collection’s default ID type, you can store document identifiers of any type in fields other than the reserved _id field. The Data API does not enforce uniqueness for identifiers outside of the _id field.

© Copyright IBM Corporation 2025