Data validation with ORMs#

LaminDB implements data validation at the ORM level by fully integrating the SQLModel ORM with Pydantic type checking.

Let’s take a look at data validation behavior in LaminDB.

import lamindb as ln
import lamindb.schema as lns
import pytest
from pydantic import ValidationError
✅ Loaded instance: testuser1/testdb

Missing required field#

Let’s create a User instance without the required email and handle fields.

with pytest.raises(ValidationError) as excinfo:
    user_missing = lns.User(id="123")
print(excinfo.exconly())
pydantic.error_wrappers.ValidationError: 2 validation errors for User
email
  field required (type=value_error.missing)
handle
  field required (type=value_error.missing)

Field type error#

Let’s create a Transform instance with the wrong type for the optional field name.

from datetime import datetime

with pytest.raises(ValidationError) as excinfo:
    invalid_transform = ln.Transform(name=datetime.now())
print(excinfo.exconly())
pydantic.error_wrappers.ValidationError: 1 validation error for Transform
name
  str type expected (type=type_error.str)

Invalid categorical#

Let’s pass an invalid categorical to the type field in Usage, which only accepts the values ‘ingest’, ‘insert’, ‘select’, ‘update’, ‘delete’, ‘load’, and ‘link’.

from lnschema_core._core import SQLModel
from lnschema_core._types import Usage as UsageType
from sqlmodel import Field
class Usage(SQLModel, table=True):  # type: ignore
    id: str = Field(default=None, primary_key=True)
    type: UsageType = Field(nullable=False, index=True)
assert Usage(type="ingest")
with pytest.raises(ValidationError) as excinfo:
    invalid_usage = Usage(type="invalid categorical")
print(excinfo.exconly())
pydantic.error_wrappers.ValidationError: 1 validation error for Usage
type
  value is not a valid enumeration member; permitted: 'ingest', 'insert', 'select', 'update', 'delete', 'load', 'link' (type=type_error.enum; enum_values=[<Usage.ingest: 'ingest'>, <Usage.insert: 'insert'>, <Usage.select: 'select'>, <Usage.update: 'update'>, <Usage.delete: 'delete'>, <Usage.load: 'load'>, <Usage.link: 'link'>])
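
The permitted values listed in the error message are the members of an enumeration. As a minimal sketch, assuming the type is a string-valued Enum (the UsageType class below is a hypothetical stand-in, not the actual definition in lnschema_core._types), the standard library alone already rejects values outside the enumeration:

```python
from enum import Enum


class UsageType(str, Enum):
    """Hypothetical stand-in for the usage type enumeration."""

    ingest = "ingest"
    insert = "insert"
    select = "select"
    update = "update"
    delete = "delete"
    load = "load"
    link = "link"


# Looking up a member by value succeeds for permitted values
valid_type = UsageType("ingest")

# A value outside the enumeration raises a ValueError
try:
    UsageType("invalid categorical")
    lookup_failed = False
except ValueError:
    lookup_failed = True
```

Typing a field with such an enumeration is what lets the validation layer report the full list of permitted members when an unknown value is passed.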

Special cases#

Data validation with the LaminDB ORM mirrors standard Pydantic behavior, including argument casting (see example #1 below) and the handling of extra fields (see example #2 below). Both behaviors can be changed through Pydantic’s configuration.

The only difference in behavior between LaminDB and Pydantic is that LaminDB enforces strict type checking for Relationship fields (see example #3 below).

Argument casting#

LaminDB mirrors Pydantic’s default behavior of casting input variables to conform to field types (see details in Pydantic’s documentation).

Let’s take a look at the default behavior by creating Transform instances with int and bool inputs for the name field, which is string-typed in the schema.

# Name (int) is cast to str
transform_name_int_to_str = ln.Transform(name=1)
type(transform_name_int_to_str.name)
str
# Name (bool) is cast to str
transform_name_bool_to_str = ln.Transform(name=True)
type(transform_name_bool_to_str.name)
str
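
If casting is not desired for a particular field, Pydantic offers strict types such as StrictStr, which reject values that are not already strings. The sketch below uses a plain Pydantic model with a hypothetical name (StrictTransform). It is not part of the LaminDB schema and only illustrates the opt-out:

```python
from pydantic import BaseModel, StrictStr, ValidationError


class StrictTransform(BaseModel):
    """Hypothetical model, not part of LaminDB."""

    # StrictStr accepts only str instances and performs no casting
    name: StrictStr


# A genuine string is accepted unchanged
strict_ok = StrictTransform(name="Test")

# An int is rejected instead of being cast to str
try:
    StrictTransform(name=1)
    int_rejected = False
except ValidationError:
    int_rejected = True
```

Whether a strict type can be used on a LaminDB schema field is not covered here; the example only shows the Pydantic mechanism.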

Extra fields#

LaminDB also mirrors Pydantic’s default behavior of accepting extra fields not defined in the schema without raising an error.

# No error is raised for the extra field
transform = ln.Transform(name="Test", extra_field="This field is not defined in the schema")
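
For comparison, the following sketch shows how the extra option in Pydantic’s model configuration switches a model from this default to rejecting unknown fields. Both models are hypothetical, plain Pydantic classes rather than LaminDB schema classes:

```python
from pydantic import BaseModel, ValidationError


class LenientItem(BaseModel):
    """Hypothetical model using Pydantic's default handling of extra fields."""

    name: str


class StrictItem(BaseModel, extra="forbid"):
    """Hypothetical model configured to reject fields not declared on it."""

    name: str


# Default: the unknown field raises no error and is not stored on the instance
lenient = LenientItem(name="Test", extra_field="not in the schema")

# extra="forbid": the unknown field triggers a validation error
try:
    StrictItem(name="Test", extra_field="not in the schema")
    extra_rejected = False
except ValidationError:
    extra_rejected = True
```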

Strict type checking for relationships#

Unlike Pydantic, LaminDB enforces strict type checking for Relationship fields.

Below is a simple example of Pydantic’s lenient type checking for Relationship fields. Rather than enforcing the Car type in the Wheel.car field, it only enforces type-checking on the attributes of the input object.

from sqlmodel import SQLModel, Field, Relationship
from typing import Optional, List


class Car(SQLModel, table=False):
    id: str = Field(primary_key=True, default=None)
    name: str

    wheels: List["Wheel"] = Relationship()


class Wheel(SQLModel, table=False):
    id: str = Field(primary_key=True, default=None)
    name: str

    car: Optional["Car"] = Relationship()


class Bird(SQLModel, table=False):
    id: str = Field(primary_key=True, default=None)
    name: str
# Pydantic does not raise a validation error for wrong type in the car field
wheel = Wheel(name="Test Wheel", car=Bird(name="Test"))

LaminDB, on the other hand, enforces strict type checking for Relationships.

with pytest.raises(TypeError) as excinfo:
    run = lns.Run(name="Test Run", transform=Bird(name="This is not a Transform"))
print(excinfo.exconly())
TypeError: transform needs to be of type Transform
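
The kind of check that produces this error can be sketched with plain Python classes. The code below is an illustrative assumption, not LaminDB’s actual implementation: the classes and the check_relationship_type helper are hypothetical, and only the error message is taken from the output above.

```python
class Transform:
    """Hypothetical stand-in for a Transform record."""

    def __init__(self, name):
        self.name = name


class Bird:
    """Unrelated class that happens to share the name attribute."""

    def __init__(self, name):
        self.name = name


def check_relationship_type(field_name, value, expected_type):
    """Raise TypeError unless value is None or an instance of expected_type."""
    if value is not None and not isinstance(value, expected_type):
        raise TypeError(f"{field_name} needs to be of type {expected_type.__name__}")
    return value


class Run:
    """Hypothetical stand-in for a Run record with a strict transform field."""

    def __init__(self, name, transform=None):
        self.name = name
        # Validate the related object by its class, not by its attributes
        self.transform = check_relationship_type("transform", transform, Transform)


# An instance of the expected class is accepted
run = Run(name="Test Run", transform=Transform(name="Test Transform"))

# An object of another class is rejected even though its attributes match
try:
    Run(name="Test Run", transform=Bird(name="This is not a Transform"))
    error_message = None
except TypeError as error:
    error_message = str(error)
```

Checking the class of the related object, rather than only its attributes, is what distinguishes this behavior from the lenient Car and Wheel example.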