Document IDs

`_id` field

Documents in a collection are always identified by an ID that is unique within the collection. This identifier is stored in the reserved field _id.

When you create a collection, you can specify the default ID type for documents in the collection. The Data API supports Object IDs, version 4 UUIDs, version 6 UUIDs, and version 7 UUIDs. If you don’t specify the default ID type, the default type is a string form of a version 4 UUID.
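
To make "string form of a version 4 UUID" concrete, the following minimal sketch uses only Python's standard `uuid` module and does not call the Data API. The helper name `uuid_version` is hypothetical and exists only for illustration. It reports which UUID version a given ID string encodes, which can be handy when you inspect `_id` values returned from a collection.

```python
import uuid
from typing import Optional


def uuid_version(id_string: str) -> Optional[int]:
    """Return the UUID version encoded in id_string.

    Returns None if the string is not a well-formed RFC 4122 UUID.
    """
    try:
        return uuid.UUID(id_string).version
    except ValueError:
        # Not parseable as a UUID (for example, a custom string ID)
        return None


# A client-side version 4 UUID in its 36-character string form
generated = str(uuid.uuid4())
print(generated, "-> version", uuid_version(generated))
```

This only illustrates the ID format. When you omit `_id`, the Data API generates the value itself.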

If you don’t explicitly set the _id field when you insert a document into the collection, the Data API automatically generates an _id value based on the default ID type for the collection.

For more information about setting the default ID type, see Create a collection.
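
As a rough illustration of how a default ID type might be requested, the sketch below builds the JSON body of a `createCollection` command without sending it. The `options.defaultId.type` shape and the type names (`objectId`, `uuid`, `uuidv6`, `uuidv7`) are assumptions about the HTTP form of the command, and `build_create_collection_command` is a hypothetical helper. Confirm the exact payload in Create a collection before relying on it.

```python
import json
from typing import Optional

# Assumed type names; verify them against the "Create a collection" reference.
SUPPORTED_DEFAULT_ID_TYPES = {"objectId", "uuid", "uuidv6", "uuidv7"}


def build_create_collection_command(
    name: str, default_id_type: Optional[str] = None
) -> dict:
    """Build (but do not send) a createCollection command body."""
    command = {"createCollection": {"name": name}}
    if default_id_type is not None:
        if default_id_type not in SUPPORTED_DEFAULT_ID_TYPES:
            raise ValueError(f"Unsupported default ID type: {default_id_type}")
        # Assumed location of the default ID setting in the command body
        command["createCollection"]["options"] = {
            "defaultId": {"type": default_id_type}
        }
    return command


print(json.dumps(build_create_collection_command("COLLECTION_NAME", "uuidv7"), indent=2))
```

When `default_id_type` is omitted, the helper adds no `options` block, so the collection would fall back to the default described above.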

Specifying document IDs

DataStax recommends using the automatically generated document ID instead of specifying the ID. This ensures uniqueness across the database and reduces the complexity of your code. However, you can use the reserved _id field to specify a document ID when you insert a document.

For examples, see Insert documents and specify the IDs.

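If you try to insert a document with an _id value that already exists in the collection, the Data API rejects the insert and returns an error. The client examples later on this page detect this case by checking for the DOCUMENT_ALREADY_EXISTS error code.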

Deduplicating documents

If you want to prevent duplicate documents, you can generate the document’s _id as a hash of one or more fields. Because the Data API enforces uniqueness on the _id field, attempting to insert a document with an existing _id results in an error. Your application can catch that error and skip inserting the duplicate.

The following example shows how to generate the _id as a stable hash of a single field named content. If a document’s identity depends on multiple fields, you can instead hash a canonicalized representation of those fields.

The example is shown in Python, TypeScript, and Java, in that order.

import hashlib

from astrapy import DataAPIClient
from astrapy.exceptions.data_api_exceptions import (
    DataAPIResponseException,
)

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
    "API_ENDPOINT", token="APPLICATION_TOKEN"
)
collection = database.get_collection("COLLECTION_NAME")

# Example document
document = {
    "title": "Example article",
    "content": "This is the main text of the document. _id is generated from this field so that this field is never duplicated across documents.",
    "source": "https://example.com",
}

# Derive a deterministic _id based on the "content" field
document["_id"] = hashlib.sha256(
    document["content"].encode("utf-8")
).hexdigest()

try:
    result = collection.insert_one(document)
    print("Inserted new document with _id:", result.inserted_id)
except DataAPIResponseException as exception:
    # Check for DOCUMENT_ALREADY_EXISTS from the Data API error code
    is_duplicate = any(
        descriptor.error_code == "DOCUMENT_ALREADY_EXISTS"
        for descriptor in exception.error_descriptors
    )

    if is_duplicate:
        print("Document already exists with this _id; skipping insert.")
    else:
        # Re-raise for any other Data API error
        raise

import crypto from "crypto";
import { DataAPIClient, DataAPIResponseError } from "@datastax/astra-db-ts";

// Get an existing collection
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
  token: "APPLICATION_TOKEN",
});
const collection = database.collection("COLLECTION_NAME");

(async function () {
  // Example document
  const document = {
    title: "Example article",
    content:
      "This is the main text of the document. _id is generated from this field so that this field is never duplicated across documents.",
    source: "https://example.com",
  };

  // Derive a deterministic _id based on the "content" field
  const id = crypto
    .createHash("sha256")
    .update(document.content, "utf8")
    .digest("hex");

  const documentWithId = { ...document, _id: id };

  try {
    const result = await collection.insertOne(documentWithId);
    console.log("Inserted new document with _id:", result.insertedId);
  } catch (error) {
    if (error instanceof DataAPIResponseError) {
      const errors = error.rawResponse?.errors ?? [];
      // Check for DOCUMENT_ALREADY_EXISTS from the Data API error code
      const isDuplicate = errors.some(
        (e) => e.errorCode === "DOCUMENT_ALREADY_EXISTS",
      );

      if (isDuplicate) {
        console.log("Document already exists with this _id; skipping insert.");
        return;
      }
    }

    // Re-throw for any other error
    throw error;
  }
})();

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.results.CollectionInsertOneResult;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.exceptions.DataAPIResponseException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class Example {

  public static void main(String[] args) throws NoSuchAlgorithmException {
    // Get an existing collection
    Collection<Document> collection =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

    // Example document fields
    String content =
        "This is the main text of the document. _id is generated from this field so that this field is never duplicated across documents.";
    String title = "Example article";
    String source = "https://example.com";

    // Derive a deterministic _id based on the "content" field
    String id =
        HexFormat.of()
            .formatHex(
                MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8)));

    Document document =
        new Document()
            .id(id)
            .append("title", title)
            .append("content", content)
            .append("source", source);

    try {
      CollectionInsertOneResult result = collection.insertOne(document);
      System.out.println("Inserted new document with _id: " + result.getInsertedId());
    } catch (DataAPIResponseException error) {
      // Check for DOCUMENT_ALREADY_EXISTS from the Data API error code
      String errorCode = error.getErrorCode();
      if ("DOCUMENT_ALREADY_EXISTS".equals(errorCode)) {
        System.out.println("Document already exists with this _id; skipping insert.");
      } else {
        // Re-throw for any other Data API error
        throw error;
      }
    }
  }
}
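
For the multi-field case mentioned earlier in this section, one possible way to canonicalize the fields is to serialize them as JSON with sorted keys and fixed separators before hashing. The sketch below uses only the Python standard library. The helper name `derive_document_id` and the choice of `title` and `source` as identity fields are assumptions for illustration, not part of the Data API.

```python
import hashlib
import json


def derive_document_id(document: dict, identity_fields: list) -> str:
    """Return a SHA-256 hex digest computed over the chosen identity fields."""
    # Keep only the fields that define identity.
    # Note: a missing field and an explicit None hash to the same value here.
    identity = {field: document.get(field) for field in identity_fields}
    # Canonicalize: sorted keys and fixed separators, so key order and
    # whitespace in the original document never change the hash.
    canonical = json.dumps(
        identity, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


document = {
    "title": "Example article",
    "content": "This is the main text of the document.",
    "source": "https://example.com",
}

# Identity is assumed to depend on "title" and "source" only
document["_id"] = derive_document_id(document, ["title", "source"])
print(document["_id"])
```

The resulting `_id` can be inserted and checked for duplicates in the same way as the single-field examples. Fields left out of `identity_fields`, such as `content` here, can change without producing a new ID.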
