# stac-pydantic

Pydantic models for STAC Catalogs, Collections, Items, and the STAC API spec. Initially developed by arturo-ai.

The main purpose of this library is to provide reusable request/response models for tools such as fastapi. For more comprehensive schema validation and robust extension support, use pystac.

## Installation

```shell
python -m pip install stac-pydantic

# or, with validation extras:
python -m pip install stac-pydantic["validation"]
```

For local development:

```shell
python -m pip install -e '.[dev,lint]'
```

| stac-pydantic | STAC Version | STAC API Version | Pydantic Version |
|---------------|--------------|------------------|------------------|
| 1.2.x         | 1.0.0-beta.1 | <1               | ^1.6             |
| 1.3.x         | 1.0.0-beta.2 | <1               | ^1.6             |
| 2.0.x         | 1.0.0        | <1*              | ^1.6             |
| 3.0.x         | 1.0.0        | 1.0.0            | ^2.4             |
| 3.1.x         | 1.0.0        | 1.0.0            | ^2.4             |

\* various beta releases, specs not fully implemented

## Development

Install the pre-commit hooks:

```shell
pre-commit install
```

### Testing

Ensure you have all Python versions installed that the tests will be run against. If using pyenv, run:

```shell
pyenv install 3.8.18
pyenv install 3.9.18
pyenv install 3.10.13
pyenv install 3.11.5
pyenv local 3.8.18 3.9.18 3.10.13 3.11.5
```

Run the entire test suite:

```shell
tox
```

Run a single test case using the standard pytest convention:

```shell
python -m pytest -v tests/test_models.py::test_item_extensions
```

## Usage

### Loading Models

Load data into models with standard pydantic:

```python
from stac_pydantic import Catalog

stac_catalog = {
    "type": "Catalog",
    "stac_version": "1.0.0",
    "id": "sample",
    "description": "This is a very basic sample catalog.",
    "links": [
        {
            "href": "item.json",
            "rel": "item"
        }
    ]
}

catalog = Catalog(**stac_catalog)
assert catalog.id == "sample"
assert catalog.links[0].href == "item.json"
```

### Extensions

STAC defines many extensions which let the user customize the data in their catalog. `stac_pydantic.extensions.validate_extensions` gets the JSON schemas from the URLs provided in the `stac_extensions` property (caching the last fetched ones), and will validate a dict, Item, Collection, or Catalog against those fetched schemas:

```python
from stac_pydantic import Item
from stac_pydantic.extensions import validate_extensions

stac_item = {
    "id": "12345",
    "type": "Feature",
    "stac_extensions": [
        "https://stac-extensions.github.io/eo/v1.0.0/schema.json"
    ],
    "geometry": {"type": "Point", "coordinates": [0, 0]},
    "bbox": [0.0, 0.0, 0.0, 0.0],
    "properties": {
        "datetime": "2020-03-09T14:53:23.262208+00:00",
        "eo:cloud_cover": 25,
    },
    "links": [],
    "assets": {},
}

model = Item(**stac_item)
validate_extensions(model, reraise_exception=True)
assert getattr(model.properties, "eo:cloud_cover") == 25
```

The complete list of current STAC Extensions can be found here.

#### Vendor Extensions

The same procedure described above works for any STAC Extension schema, as long as it can be loaded from a public URL.

### STAC API

The STAC API specs extend the core STAC specification for implementing dynamic catalogs. STAC objects used in an API context should always import models from the `api` subpackage. This package extends the Catalog, Collection, and Item models with additional fields and validation rules, introduces Collections and ItemCollections models and pagination/search links, and implements models for defining ItemSearch queries.
```python
from stac_pydantic.api import Item, ItemCollection

stac_item = Item(**{
    "id": "12345",
    "type": "Feature",
    "stac_extensions": [],
    "geometry": {"type": "Point", "coordinates": [0, 0]},
    "bbox": [0.0, 0.0, 0.0, 0.0],
    "properties": {
        "datetime": "2020-03-09T14:53:23.262208+00:00",
    },
    "collection": "CS3",
    "links": [
        {
            "rel": "self",
            "href": "http://stac.example.com/catalog/collections/CS3-20160503_132130_04/items/CS3-20160503_132130_04.json"
        },
        {
            "rel": "collection",
            "href": "http://stac.example.com/catalog/CS3-20160503_132130_04/catalog.json"
        },
        {
            "rel": "root",
            "href": "http://stac.example.com/catalog"
        }
    ],
    "assets": {},
})

stac_item_collection = ItemCollection(**{
    "type": "FeatureCollection",
    "features": [stac_item],
    "links": [
        {
            "rel": "self",
            "href": "http://stac.example.com/catalog/search?collection=CS3",
            "type": "application/geo+json"
        },
        {
            "rel": "root",
            "href": "http://stac.example.com/catalog",
            "type": "application/json"
        }
    ],
})
```

### Exporting Models

Most STAC extensions are namespaced with a colon (e.g. `eo:gsd`) to keep them distinct from other extensions. Because Python doesn't support colons in variable names, we use Pydantic aliasing to add the namespace upon model export. This requires exporting the model with the `by_alias=True` parameter. The export methods (`model_dump()` and `model_dump_json()`) for models in this library have `by_alias` and `exclude_unset` set to `True` by default:

```python
item_dict = item.model_dump()
assert item_dict['properties']['landsat:row'] == item.properties.row == 250
```

### CLI

```text
Usage: stac-pydantic [OPTIONS] COMMAND [ARGS]...

  stac-pydantic cli group

Options:
  --help  Show this message and exit.

Commands:
  validate-item  Validate STAC Item
```
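As a quick sanity check of the aliasing behaviour, here is a minimal sketch reusing the `stac_item` dict from the Extensions example above (nothing here beyond standard pydantic semantics):

```python
from stac_pydantic import Item

item = Item(**stac_item)  # stac_item dict from the Extensions example above

# model_dump() defaults to by_alias=True and exclude_unset=True, so the
# namespaced extension field keeps its original "eo:cloud_cover" key
dumped = item.model_dump()
assert dumped["properties"]["eo:cloud_cover"] == 25

# the aliased dump re-validates cleanly
Item(**dumped)
```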
Latest version: 3.2.0 Released: 2025-03-20
# pydantic-bind

## Table of Contents

- Overview
- Getting Started
- Why Not Protobufs?
- No Copy
- Supported Types
- Inheritance
- Msgpack
- Namespaces
- Generated Code
- Other Languages

## Overview

Python is the language of choice for finance, data science, etc. Python calling C++ (and increasingly, Rust) is a common pattern, leveraging packages such as pybind11. A common problem is how best to represent data to be shared between Python and C++ code. One would like idiomatic representations in each language, and this may be necessary to fully utilise certain Python packages. E.g., FastAPI is a popular way to create REST services, using Open API definitions derived from pydantic classes. Therefore, a data model authored using pydantic classes, or native Python dataclasses, from which sensible C++ structs and appropriate marshalling can automatically be generated, is desirable.

This package provides such tools: a cmake rule allows you to generate C++ structs (with msgpack serialisation) and corresponding pybind11 bindings. Python functions allow you to navigate between the C++ pybind11 objects and the native Python objects. There is also an option for all Python operations to be directed to an owned pybind11 object (see No Copy).

Note that the typical Python developer experience is now somewhat changed, in that it's necessary to build/install the project. I personally use JetBrains CLion, in place of PyCharm, for such projects.

For an example of the kind of behaviour-less object model this package is intended to help with, please see (the rather nascent) fin-data-model.

## Getting Started

pydantic_bind adds a custom cmake rule: `pydantic_bind_add_package()`. This rule will do the following:

- scan for sub-packages
- scan each sub-package for all .py files
- add custom steps for generating .cpp/.h files from any of the following, encountered in the .py files:
  - dataclasses
  - classes derived from pydantic's BaseModel
  - enums

C++ directory and namespace structure will match the Python package structure (see Namespaces).

You can create an instance of the pybind11 class from your original using `get_pybind_value()`, e.g.,

my_class.py:

```python
from dataclasses import dataclass

@dataclass
class MyClass:
    my_int: int
    my_string: str | None
```

CMakeLists.txt:

```cmake
cmake_minimum_required(VERSION 3.9)
project(my_project)

set(CMAKE_CXX_STANDARD 20)

find_package(Python3 REQUIRED COMPONENTS Interpreter Development)
find_package(pydantic_bind REQUIRED HINTS "${Python3_SITELIB}")

pydantic_bind_add_package(my_package)
```

my_util.py:

```python
from pydantic_bind import get_pybind_value
from my_package.my_class import MyClass

orig = MyClass(my_int=123, my_string="hello")
generated = get_pybind_value(orig)

print(f"my_int: {orig.my_int}, {generated.my_int}")
```

## Why Not Protobufs?

I personally find protobufs to be a PITA to use: they have poor to no variant support, the generated code is ugly and idiosyncratic, and they're large and painful to copy around. AVRO is more friendly but generates Python classes dynamically, which confuses IDEs like PyCharm. I do think a good solution is something like pydantic_avro, where one can define the classes using pydantic, generate the AVRO schema and then the generated C++ etc. I might well try and converge this project with that approach. I was inspired to some degree by this blog.

## No Copy

One annoyance of multi-language representations of data objects is that you often end up copying data around where you'd prefer to share a single copy. This is the raison d'etre for Protobufs and its ilk.
In this project I've created implementations of BaseModel and dataclass which allow Python to use the underlying C++ data representation, rather than holding its own copy.

Deriving from this BaseModel will give you functionality equivalent to pydantic's BaseModel. The annotations are re-written using computed_field, with property getters and setters operating on the generated pybind class, which is instantiated behind the scenes in `__init__`. Note that this will make some operations (especially those that access `__dict__`) less efficient. I've also plumbed the computed fields into the JSON schema, so these objects can be used with FastAPI.

dataclass works similarly, adding properties to the dataclass, so that the existing get and set functionality works seamlessly in accessing the generated pybind11 class (also set via a shimmed `__init__`).

Using regular dataclass or BaseModel as members of classes defined with the pydantic_bind versions is very inefficient and not recommended.

## Supported Types

The following Python -> C++ mappings are supported (there are likely others I should consider):

- bool --> bool
- float --> double
- int --> int
- str --> std::string
- datetime.date --> std::chrono::system_clock::time_point
- datetime.datetime --> std::chrono::system_clock::time_point
- datetime.time --> std::chrono::system_clock::time_point
- datetime.timedelta --> std::chrono::duration
- pydantic.BaseModel --> struct
- pydantic_bind.BaseModel --> struct
- dataclass --> struct
- pydantic_bind.dataclass --> struct
- Enum --> enum class

## Inheritance

I have tested single inheritance (see Generated Code). Multiple inheritance may work ... or it may not. I'd generally advise against using it for data classes.

## Msgpack

A rather rudimentary msgpack implementation is added to the generated C++ structs, using a slightly modified version of cpppack. It wasn't clear to me whether this package is maintained or accepting submissions, so I copied and slightly modified msgpack.h (also, I couldn't work out how to add it to my project with my rather rudimentary cmake skillz!) Changes include:

- fixing includes
- support for std::optional
- support for std::variant
- support for enums

A likely future enhancement will be to use cereal and add a msgpack adaptor.

The no-copy Python objects add `to_msg_pack()` and `from_msg_pack()` (the latter being a class method) to access this functionality.

## Namespaces

Directory structure and namespaces in the generated C++ match the Python package and module names. cmake requires unique target names, and pybind11 requires that the filename (minus the OS-specific qualifiers) matches the module name.

## Generated Code

Code is generated into a directory structure underneath /generated. Headers are installed to /include. Compiled pybind11 modules are installed into /__pybind__. For C++ usage, you need only the headers; the compiled code is for pybind/python usage only.

For the example below, common_object_model/common_object_model/v1/common/__pybind__/foo.cpython-311-darwin.so will be installed (obviously with corresponding qualifiers for Linux/Windows). `get_pybind_value()` searches this directory.

Imports/includes should work seamlessly (the Python import scheme will be copied). I have tested this but not completely rigorously.
common_object_model/common_object_model/v1/common/foo.py:

```python
from dataclasses import dataclass
import datetime as dt
from enum import Enum, auto
from typing import Union

from pydantic_bind import BaseModel


class Weekday(Enum):
    MONDAY = auto()
    TUESDAY = auto()
    WEDNESDAY = auto()
    THURSDAY = auto()
    FRIDAY = auto()
    SATURDAY = auto()
    SUNDAY = auto()


@dataclass
class DCFoo:
    my_int: int
    my_string: str | None


class Foo(BaseModel):
    my_bool: bool = True
    my_day: Weekday = Weekday.SUNDAY


class Bar(Foo):
    my_int: int = 123
    my_string: str
    my_optional_string: str | None = None


class Baz(BaseModel):
    my_variant: Union[str, float] = 123.
    my_date: dt.date
    my_foo: Foo
    my_dc_foo: DCFoo
```

will generate the following files:

common_object_model/generated/common_object_model/v1/common/foo.h:

```cpp
#ifndef COMMON_OBJECT_MODEL_FOO_H
#define COMMON_OBJECT_MODEL_FOO_H

// note: the original include targets were lost in extraction; these are
// the headers the types below require (exact paths may differ)
#include <chrono>
#include <optional>
#include <string>
#include <variant>
#include "msgpack.h"

namespace common_object_model::v1::common
{
    enum Weekday
    {
        MONDAY = 1,
        TUESDAY = 2,
        WEDNESDAY = 3,
        THURSDAY = 4,
        FRIDAY = 5,
        SATURDAY = 6,
        SUNDAY = 7
    };

    struct DCFoo
    {
        DCFoo() :
            my_string(), my_int()
        {
        }

        DCFoo(std::optional<std::string> my_string, int my_int) :
            my_string(my_string), my_int(my_int)
        {
        }

        std::optional<std::string> my_string;
        int my_int;

        MSGPACK_DEFINE(my_string, my_int);
    };

    struct Foo
    {
        Foo(bool my_bool=true, Weekday my_day=SUNDAY) :
            my_bool(my_bool), my_day(my_day)
        {
        }

        bool my_bool;
        Weekday my_day;

        MSGPACK_DEFINE(my_bool, my_day);
    };

    struct Bar : public Foo
    {
        Bar() :
            Foo(), my_string(), my_int(123), my_optional_string(std::nullopt)
        {
        }

        Bar(std::string my_string, bool my_bool=true, Weekday my_day=SUNDAY, int my_int=123,
            std::optional<std::string> my_optional_string=std::nullopt) :
            Foo(my_bool, my_day), my_string(std::move(my_string)), my_int(my_int),
            my_optional_string(my_optional_string)
        {
        }

        std::string my_string;
        int my_int;
        std::optional<std::string> my_optional_string;

        MSGPACK_DEFINE(my_string, my_bool, my_day, my_int, my_optional_string);
    };

    struct Baz
    {
        Baz() :
            my_dc_foo(), my_foo(), my_date(), my_variant(123.0)
        {
        }

        Baz(DCFoo my_dc_foo, Foo my_foo, std::chrono::system_clock::time_point my_date,
            std::variant<std::string, double> my_variant=123.0) :
            my_dc_foo(std::move(my_dc_foo)), my_foo(std::move(my_foo)), my_date(my_date), my_variant(my_variant)
        {
        }

        DCFoo my_dc_foo;
        Foo my_foo;
        std::chrono::system_clock::time_point my_date;
        std::variant<std::string, double> my_variant;

        MSGPACK_DEFINE(my_dc_foo, my_foo, my_date, my_variant);
    };
} // common_object_model

#endif // COMMON_OBJECT_MODEL_FOO_H
```

common_object_model/generated/common_object_model/v1/common/foo.cpp:

```cpp
// note: the original include targets were lost in extraction; these are
// the usual pybind11 headers for these bindings
#include <pybind11/pybind11.h>
#include <pybind11/chrono.h>
#include <pybind11/stl.h>
#include "foo.h"

namespace py = pybind11;

using namespace common_object_model::v1::common;

PYBIND11_MODULE(common_object_model_v1_common_foo, m)
{
    py::enum_<Weekday>(m, "Weekday")
        .value("MONDAY", Weekday::MONDAY)
        .value("TUESDAY", Weekday::TUESDAY)
        .value("WEDNESDAY", Weekday::WEDNESDAY)
        .value("THURSDAY", Weekday::THURSDAY)
        .value("FRIDAY", Weekday::FRIDAY)
        .value("SATURDAY", Weekday::SATURDAY)
        .value("SUNDAY", Weekday::SUNDAY);

    py::class_<DCFoo>(m, "DCFoo")
        .def(py::init<>())
        .def(py::init<std::optional<std::string>, int>(), py::arg("my_string"), py::arg("my_int"))
        .def("to_msg_pack", &DCFoo::to_msg_pack)
        .def_static("from_msg_pack", &DCFoo::from_msg_pack)
        .def_readwrite("my_string", &DCFoo::my_string)
        .def_readwrite("my_int", &DCFoo::my_int);

    py::class_<Foo>(m, "Foo")
        .def(py::init<bool, Weekday>(), py::arg("my_bool")=true, py::arg("my_day")=SUNDAY)
        .def("to_msg_pack", &Foo::to_msg_pack)
        .def_static("from_msg_pack", &Foo::from_msg_pack)
        .def_readwrite("my_bool", &Foo::my_bool)
        .def_readwrite("my_day", &Foo::my_day);

    py::class_<Bar, Foo>(m, "Bar")
        .def(py::init<>())
        .def(py::init<std::string, bool, Weekday, int, std::optional<std::string>>(),
             py::arg("my_string"), py::arg("my_bool")=true, py::arg("my_day")=SUNDAY, py::arg("my_int")=123,
             py::arg("my_optional_string")=std::nullopt)
        .def("to_msg_pack", &Bar::to_msg_pack)
        .def_static("from_msg_pack", &Bar::from_msg_pack)
        .def_readwrite("my_string", &Bar::my_string)
        .def_readwrite("my_int", &Bar::my_int)
        .def_readwrite("my_optional_string", &Bar::my_optional_string);

    py::class_<Baz>(m, "Baz")
        .def(py::init<>())
        .def(py::init<DCFoo, Foo, std::chrono::system_clock::time_point, std::variant<std::string, double>>(),
             py::arg("my_dc_foo"), py::arg("my_foo"), py::arg("my_date"), py::arg("my_variant")=123.0)
        .def("to_msg_pack", &Baz::to_msg_pack)
        .def_static("from_msg_pack", &Baz::from_msg_pack)
        .def_readwrite("my_dc_foo", &Baz::my_dc_foo)
        .def_readwrite("my_foo", &Baz::my_foo)
        .def_readwrite("my_date", &Baz::my_date)
        .def_readwrite("my_variant", &Baz::my_variant);
}
```
py::arg("my_string"), py::arg("my_bool")=true, py::arg("my_day")=SUNDAY, py::arg("my_int")=123, py::arg("my_optional_string")=std::nullopt) .def("to_msg_pack", &Bazr:to_msg_pack) .def_static("from_msg_pack", &Bar::from_msg_pack) .def_readwrite("my_string", &Bar::my_string) .def_readwrite("my_int", &Bar::my_int) .def_readwrite("my_optional_string", &Bar::my_optional_string); py::class_(m, "Baz") .def(py::init<>()) .def(py::init>(), py::arg("my_dc_foo"), py::arg("my_foo"), py::arg("my_date"), py::arg("my_variant")=123.0) .def("to_msg_pack", &Baz::to_msg_pack) .def_static("from_msg_pack", &Baz::from_msg_pack) .def_readwrite("my_dc_foo", &Baz::my_dc_foo) .def_readwrite("my_foo", &Baz::my_foo) .def_readwrite("my_date", &Baz::my_date) .def_readwrite("my_variant", &Baz::my_variant); } Other languages When time allows, I will look at adding support for Rust. There is limited value in generating Java or C# classes; calling those VM-based lanagues in-process from python has never worked well, in my experience.
Latest version: 1.0.5 Released: 2023-10-27
# pydantic-zarr

Pydantic models for Zarr.

⚠️ Disclaimer ⚠️ This project is under a lot of flux -- I want to add zarr version 3 support to this project, but the reference Python implementation doesn't support version 3 yet. Also, the key ideas in this repo may change in the process of being formalized over in this specification (currently just a draft). As the ecosystem evolves I will be breaking things (and versioning the project accordingly), so be advised!

## Installation

```shell
pip install -U pydantic-zarr
```

## Help

See the documentation for detailed information about this project.

## Example

```python
import zarr
from pydantic_zarr import GroupSpec

group = zarr.group(path='foo')
array = zarr.create(store=group.store, path='foo/bar', shape=10, dtype='uint8')
array.attrs.put({'metadata': 'hello'})

# this is a pydantic model
spec = GroupSpec.from_zarr(group)
print(spec.model_dump())
"""
{
    'zarr_version': 2,
    'attributes': {},
    'members': {
        'bar': {
            'zarr_version': 2,
            'attributes': {'metadata': 'hello'},
            'shape': (10,),
            'chunks': (10,),
            'dtype': '|u1',
            'fill_value': 0,
            'order': 'C',
            'filters': None,
            'dimension_separator': '.',
            'compressor': {
                'id': 'blosc',
                'cname': 'lz4',
                'clevel': 5,
                'shuffle': 1,
                'blocksize': 0,
            },
        }
    },
}
"""
```
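A small follow-on sketch, reusing `spec` from the example above. This relies only on standard pydantic validate/dump round-tripping, not on any pydantic-zarr-specific API:

```python
from pydantic_zarr import GroupSpec

# the dumped dict re-validates into an equivalent spec, so specs can be
# persisted as JSON and reconstructed later
spec2 = GroupSpec(**spec.model_dump())
assert spec2 == spec
```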
Latest version: 0.7.0 Released: 2024-03-20
# asdf-pydantic

Create ASDF tags with pydantic models.

```python
import asdf
from asdf_pydantic import AsdfPydanticModel

class Rectangle(AsdfPydanticModel):
    _tag = "asdf://asdf-pydantic/examples/tags/rectangle-1.0.0"

    width: float
    height: float

# After creating extension and install ...
af = asdf.AsdfFile()
af["rect"] = Rectangle(width=1, height=1)
```

```yaml
#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 2.14.3}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 2.14.3}
  - !core/extension_metadata-1.0.0 {extension_class: mypackage.shapes.ShapesExtension,
    extension_uri: 'asdf://asdf-pydantic/shapes/extensions/shapes-1.0.0'}
rect: !<asdf://asdf-pydantic/examples/tags/rectangle-1.0.0> {height: 1.0, width: 1.0}
...
```

## Features

- [x] Create ASDF tags from your pydantic models with batteries (converters) included.
- [x] Validates data models as you create them, not only when reading and writing ASDF files.
- [x] Preserves Python types when deserializing ASDF files.
- [x] All the cool things that come with pydantic (e.g., JSON encoder, JSON schema, pydantic types).
- [ ] Comes with ASDF schemas (TBD).

## Installation

```console
pip install asdf-pydantic
```

## Usage

Define your data model with AsdfPydanticModel. For pydantic fans, this has all the features of pydantic's BaseModel.

```python
# mypackage/shapes.py
from asdf_pydantic import AsdfPydanticModel

class Rectangle(AsdfPydanticModel):
    _tag = "asdf://asdf-pydantic/examples/tags/rectangle-1.0.0"

    width: float
    height: float
```

Then create an extension with the converter included with asdf-pydantic:

```python
# mypackage/extensions.py
from asdf.extension import Extension
from asdf_pydantic.converter import AsdfPydanticConverter
from mypackage.shapes import Rectangle

AsdfPydanticConverter.add_models(Rectangle)

class ShapesExtension(Extension):
    extension_uri = "asdf://asdf-pydantic/examples/extensions/shapes-1.0.0"
    converters = [AsdfPydanticConverter()]
    tags = [*AsdfPydanticConverter().tags]
```

Install the extension either by entry point specification or add it to asdf.get_config():

```python
import asdf
from mypackage.extensions import ShapesExtension

asdf.get_config().add_extension(ShapesExtension())

af = asdf.AsdfFile()
af["rect"] = Rectangle(width=1, height=1)

with open("shapes.asdf", "wb") as fp:
    af.write_to(fp)
```
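Reading the file back (continuing the example above): with the extension installed, asdf should resolve the tag and hand back a `Rectangle` rather than a plain mapping, per the type-preservation feature. A hedged sketch:

```python
import asdf
from mypackage.shapes import Rectangle

with asdf.open("shapes.asdf") as af:
    rect = af["rect"]
    assert isinstance(rect, Rectangle)  # Python type preserved on read
    assert rect.width == 1.0
```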
Latest version: 1.1.0 Released: 2024-11-18
# OCSF-Pydantic

## Description

Pydantic validated models for OCSF schema events & objects.

## Installation

```shell
pip install ocsf-pydantic
```

## Usage

All event & object models are available via the `ocsf` package:

```python
from ocsf.events.application.api import Api

apievent: dict = ...  # an OCSF-normalized event dict
Api(**apievent)
```

## License

This project is licensed under the MIT License.
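A hedged sketch of what a validation failure looks like (`activity_id` is a standard OCSF attribute, used here purely for illustration):

```python
from pydantic import ValidationError
from ocsf.events.application.api import Api

try:
    Api(**{"activity_id": "not-an-int"})  # malformed / incomplete event
except ValidationError as exc:
    print(exc)  # pydantic reports which OCSF fields failed validation
```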
Latest version: 0.0.6 Released: 2024-11-11
# pydantic-mini

pydantic-mini is a lightweight Python library that extends the functionality of Python's native dataclass by providing built-in validation, serialization, and support for custom validators. It is designed to be simple, minimalistic, and based entirely on Python's standard library, making it a good fit for small projects, data validation, and lightweight object-relational mapping (ORM) without relying on third-party dependencies.

## Features

- Type and Value Validation: Enforces type validation for fields using field annotations. Includes built-in validators for common field types.
- Custom Validators: Easily define your own custom validation functions for specific fields.
- Serialization Support: Instances can be serialized to JSON, dictionaries, and CSV formats.
- Lightweight and Fast: Built entirely on Python's standard library; no external dependencies are required.
- Supports Multiple Input Formats: Accepts data in various formats, including JSON, dictionaries, CSV, etc.
- Simple ORM Capabilities: Use the library to build lightweight ORMs (Object-Relational Mappers) for basic data management.

## Installation

You can install pydantic-mini from PyPI:

```bash
pip install pydantic-mini
```

Alternatively, you can clone this repository and use the code directly in your project.

## Usage

### 1. Define a Dataclass with Validation

```python
from pydantic_mini import BaseModel

class Person(BaseModel):
    name: str
    age: int
```

### 2. Adding Validators for Individual Fields

You can define your own validators.

NOTE: All validators can be used to transform field values by returning the transformed value from the validator function or method.
NOTE: Validators must raise ValidationError if a validation condition fails.
NOTE: pydantic_mini uses type annotations to enforce type constraints.

```python
import typing
from dataclasses import InitVar
from pydantic_mini import BaseModel, MiniAnnotated, Attrib
from pydantic_mini.exceptions import ValidationError


# Custom validator that rejects the name "kofi"
def kofi_not_accepted(instance, value: str):
    if value == "kofi":
        # validators must raise ValidationError when validation fails
        raise ValidationError("The name 'kofi' is not accepted.")
    # If you want to apply a transformation and save the result into the model,
    # return the transformed result you want to save. For instance, to store
    # names in upper case, return the uppercased version.
    return value.upper()


class Employee(BaseModel):
    name: MiniAnnotated[str, Attrib(max_length=20, validators=[kofi_not_accepted])]
    age: MiniAnnotated[int, Attrib(default=40, gt=20)]
    email: MiniAnnotated[str, Attrib(pattern=r"^\S+@\S+\.\S+$")]
    school: str

    # You can define a validator for a field by adding a method named
    # "validate_<field_name>", e.g. to validate school names:
    def validate_school(self, value, field):
        if len(value) > 20:
            raise ValidationError("School names cannot be greater than 20")

    # You can apply a general rule or transformation to all fields by
    # implementing the method "validate". It takes the arguments value and field.
    def validate(self, value, field):
        if len(value) > 10:
            raise ValidationError("Too long")


# Models can also implement model_init (with InitVar fields, similar to
# dataclass __post_init__). DatabaseType is assumed to be defined elsewhere.
class C(BaseModel):
    i: int
    j: typing.Optional[int]
    database: InitVar[typing.Optional[DatabaseType]] = None

    def model_init(self, database):
        if self.j is None and database is not None:
            self.j = database.lookup('j')
```
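A tiny sketch of transformation-by-return, using a hypothetical `Student` model (assumes `Attrib` validators behave as described above):

```python
from pydantic_mini import BaseModel, MiniAnnotated, Attrib

def uppercase(instance, value: str):
    # returning a value from a validator stores the transformed result
    return value.upper()

class Student(BaseModel):
    name: MiniAnnotated[str, Attrib(validators=[uppercase])]

s = Student(name="ama")
assert s.name == "AMA"
```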
### 3. Creating Instances from Different Formats

From JSON:

```python
from pydantic_mini import BaseModel

class PersonModel(BaseModel):
    name: str
    age: int

data = '{"name": "John", "age": 30}'
person = PersonModel.loads(data, _format="json")
print(person)
```

From a dictionary:

```python
data = {"name": "Alice", "age": 25}
person = PersonModel.loads(data, _format="dict")
print(person)
```

From CSV:

```python
csv_data = "name,age\nJohn,30\nAlice,25"
people = PersonModel.loads(csv_data, _format="csv")
for person in people:
    print(person)
```

### 4. Serialization

pydantic-mini supports serializing instances to JSON, dictionaries, or CSV formats.

```python
# Serialize to JSON
json_data = person.dump(_format="json")
print(json_data)

# Serialize to a dictionary
person_dict = person.dump(_format="dict")
print(person_dict)
```

### 5. Simple ORM Use Case

You can use this library to create simple ORMs for in-memory databases.

```python
# Example: a simple in-memory "database" of Person instances
people_db = []

# Add a new person to the database
new_person = Person(name="John", age=30)
people_db.append(new_person)

# Query the database (e.g., filter by age)
adults = [p for p in people_db if p.age >= 18]
print(adults)
```

## Supported Formats

- JSON: Convert data to and from JSON format easily.
- Dict: Instantiate and serialize data as dictionaries.
- CSV: Read from and write to CSV format directly.

## Contributing

Contributions are welcome! If you'd like to help improve the library, please fork the repository and submit a pull request.

## License

pydantic-mini is open-source and available under the GPL License.
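A short round-trip sketch with the `loads`/`dump` API shown above (the exact output shape is assumed):

```python
person = PersonModel.loads({"name": "John", "age": 30}, _format="dict")
assert person.dump(_format="dict") == {"name": "John", "age": 30}
```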
Latest version: 1.0.4 Released: 2025-03-17
# pydantic-marc

pydantic-marc is a library for validating data against the MARC21 Format for Bibliographic Data.

## Installation

Use pip:

```shell
$ pip install pydantic-marc
```

## Features

pydantic-marc uses pydantic, the popular data validation library, to define the valid components of a MARC record. The package expects users will employ pymarc to read MARC records from binary files.

### Basic usage

Validating a MARC record:

```python
from pymarc import MARCReader
from rich import print

from pydantic_marc import MarcRecord

with open("temp/valid.mrc", "rb") as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record)
        model = MarcRecord.model_validate(record, from_attributes=True)
        print(model.model_dump())
```

```json
{
    "leader": "00536nam a22001985i 4500",
    "fields": [
        {"001": "123456789"},
        {"008": "201201s2020 nyua 000 1 eng d"},
        {"035": {"ind1": " ", "ind2": " ", "subfields": [{"a": "(OCoLC)1234567890"}]}},
        {"049": {"ind1": " ", "ind2": " ", "subfields": [{"a": "NYPP"}]}},
        {
            "245": {
                "ind1": "0",
                "ind2": "0",
                "subfields": [
                    {"a": "Fake :"},
                    {"b": "Marc Record"}
                ]
            }
        },
        {
            "264": {
                "ind1": " ",
                "ind2": "1",
                "subfields": [
                    {"a": "New York :"},
                    {"b": "NY,"},
                    {"c": "[2020]"}
                ]
            }
        },
        {
            "300": {
                "ind1": " ",
                "ind2": " ",
                "subfields": [
                    {"a": "100 pages :"},
                    {"b": "color illustrations;"},
                    {"c": "30 cm"}
                ]
            }
        },
        {"336": {"ind1": " ", "ind2": " ", "subfields": [{"a": "text"}, {"b": "txt"}, {"2": "rdacontent"}]}},
        {"337": {"ind1": " ", "ind2": " ", "subfields": [{"a": "unmediated"}, {"b": "n"}, {"2": "rdamedia"}]}},
        {"338": {"ind1": " ", "ind2": " ", "subfields": [{"a": "volume"}, {"b": "nc"}, {"2": "rdacarrier"}]}}
    ]
}
```

If the record is invalid, the errors can be returned as JSON, a dictionary, or in a human-readable format.

JSON Error Message:

```python
from pydantic import ValidationError
from pymarc import MARCReader

from pydantic_marc import MarcRecord

with open("temp/invalid.mrc", "rb") as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record)
        try:
            MarcRecord.model_validate(record)
        except ValidationError as e:
            # errors as a dictionary
            print(e.errors())
            # errors as json
            print(e.json())
```

```python
[
    {
        "type": "non_repeatable_field",
        "loc": ("fields", "001"),
        "msg": "001: Has been marked as a non-repeating field.",
        "input": "001",
        "ctx": {"input": "001"}
    },
    {
        "type": "missing_required_field",
        "loc": ("fields", "245"),
        "msg": "One 245 field must be present in a MARC21 record.",
        "input": "245",
        "ctx": {"input": "245"}
    },
    {
        "type": "multiple_1xx_fields",
        "loc": ("fields", "100", "110"),
        "msg": "1XX: Only one 1XX tag is allowed. Record contains: ['100', '110']",
        "input": ["100", "110"],
        "ctx": {"input": ["100", "110"]}
    },
    {
        "type": "control_field_length_invalid",
        "loc": ("fields", "006"),
        "msg": "006: Length appears to be invalid. Reported length is: 6. Expected length is: 18",
        "input": "p |",
        "ctx": {"tag": "006", "valid": 18, "input": "p |", "length": 6}
    },
    {
        "type": "invalid_indicator",
        "loc": ("fields", "035", "ind1"),
        "msg": "035 ind1: Invalid data (0). Indicator should be ['', ' '].",
        "input": "0",
        "ctx": {"loc": ("035", "ind1"), "input": "0", "valid": ["", " "], "tag": "035", "ind": "ind1"}
    },
    {
        "type": "non_repeatable_subfield",
        "loc": ("fields", "600", "a"),
        "msg": "600 $a: Subfield cannot repeat.",
        "input": [PydanticSubfield(code="a", value="Foo"), PydanticSubfield(code="a", value="Foo,")],
        "ctx": {
            "loc": ("600", "a"),
            "input": [PydanticSubfield(code="a", value="Foo"), PydanticSubfield(code="a", value="Foo,")],
            "tag": "600",
            "code": "a"
        }
    }
]
```
Human-readable Error Message:

```python
from pydantic import ValidationError
from pymarc import MARCReader

from pydantic_marc import MarcRecord

with open("temp/invalid.mrc", "rb") as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record)
        try:
            MarcRecord.model_validate(record)
        except ValidationError as e:
            # errors in a human-readable format
            print(e)
```

```text
6 validation errors for MarcRecord
fields.001
  001: Has been marked as a non-repeating field. [type=non_repeatable_field, input_value='001', input_type=str]
fields.245
  One 245 field must be present in a MARC21 record. [type=missing_required_field, input_value='245', input_type=str]
fields.100.110
  1XX: Only one 1XX tag is allowed. Record contains: ['100', '110'] [type=multiple_1xx_fields, input_value=['100', '110'], input_type=list]
fields.006
  006: Length appears to be invalid. Reported length is: 6. Expected length is: 18 [type=control_field_length_invalid, input_value='p |', input_type=str]
fields.035.ind1
  035 ind1: Invalid data (0). Indicator should be ['', ' ']. [type=invalid_indicator, input_value='0', input_type=str]
fields.600.a
  600 $a: Subfield cannot repeat. [type=non_repeatable_subfield, input_value=[PydanticSubfield(code='a...code='a', value='Foo,')], input_type=list]
```
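`model_validate` also accepts plain dicts shaped like the `model_dump()` output, which is handy in tests. A hedged sketch (minimal fields only; a real record may need more to satisfy every rule):

```python
from pydantic import ValidationError
from pydantic_marc import MarcRecord

fields = [
    {"001": "123456789"},
    {"245": {"ind1": "0", "ind2": "0", "subfields": [{"a": "Fake :"}, {"b": "Marc Record"}]}},
]
try:
    record = MarcRecord.model_validate({"leader": "00536nam a22001985i 4500", "fields": fields})
    print(record.model_dump())
except ValidationError as e:
    print(e)  # e.g. missing required control fields like 008
```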
Latest version: 0.1.0 Released: 2025-03-09
# pydantic-pkgr

📦 apt brew pip npm

Simple Pydantic interfaces for package managers + installed binaries.

It's an ORM for your package managers, providing nice Python types for packages + installers.

This is a Python library for installing & managing packages locally with a variety of package managers. It's designed for when pip dependencies aren't enough, and your app has to check for & install dependencies at runtime.

```shell
pip install pydantic-pkgr
```

- ✨ Built with pydantic v2 for strong static typing guarantees and JSON import/export compatibility
- 📦 Provides consistent cross-platform interfaces for dependency resolution & installation at runtime
- 🌈 Integrates with django >= 4.0, django-ninja, and OpenAPI + django-jsonform out-of-the-box
- 🦄 Uses pyinfra / ansible for the actual install operations whenever possible (with internal fallbacks)

Built by ArchiveBox to install & auto-update our extractor dependencies at runtime (chrome, wget, curl, etc.) on macOS/Linux/Docker.

> [!WARNING]
> This is BETA software; the API is mostly stable, but there may be minor changes later on.

Source Code: https://github.com/ArchiveBox/pydantic-pkgr/
Documentation: https://github.com/ArchiveBox/pydantic-pkgr/blob/main/README.md

```python
from pydantic_pkgr import *

apt, brew, pip, npm, env = AptProvider(), BrewProvider(), PipProvider(), NpmProvider(), EnvProvider()

dependencies = [
    Binary(name='curl', binproviders=[env, apt, brew]),
    Binary(name='wget', binproviders=[env, apt, brew]),
    Binary(name='yt-dlp', binproviders=[env, pip, apt, brew]),
    Binary(name='playwright', binproviders=[env, pip, npm]),
    Binary(name='puppeteer', binproviders=[env, npm]),
]

for binary in dependencies:
    binary = binary.load_or_install()
    print(binary.abspath, binary.version, binary.binprovider, binary.is_valid, binary.sha256)
    # Path('/usr/bin/curl') SemVer('7.81.0') AptProvider() True abc134...

    binary.exec(cmd=['--version'])
    # curl 7.81.0 (x86_64-apple-darwin23.0) libcurl/7.81.0 ...
```

```python
from typing import List
from pydantic import InstanceOf
from pydantic_pkgr import Binary, BinProvider, BrewProvider, EnvProvider

# you can also define binaries as classes, making them usable for type checking
class CurlBinary(Binary):
    name: str = 'curl'
    binproviders: List[InstanceOf[BinProvider]] = [BrewProvider(), EnvProvider()]

curl = CurlBinary().install()
assert isinstance(curl, CurlBinary)   # CurlBinary is a unique type you can use in annotations now
print(curl.abspath, curl.version, curl.binprovider, curl.is_valid)
# Path('/opt/homebrew/bin/curl') SemVer('8.4.0') BrewProvider() True

curl.exec(cmd=['--version'])
# curl 8.4.0 (x86_64-apple-darwin23.0) libcurl/8.4.0 ...
```

```python
from pydantic_pkgr import Binary, AptProvider

# We also provide direct package manager (aka BinProvider) APIs
apt = AptProvider()
apt.install('wget')
print(apt.PATH, apt.get_abspaths('wget'), apt.get_version('wget'))

# even if packages are installed by tools we don't control (e.g. pyinfra/ansible/puppet/etc.)
from pyinfra.operations import apt
apt.packages(name="Install ffmpeg", packages=['ffmpeg'], _sudo=True)

# our Binary API provides a nice type-checkable, validated, serializable handle
ffmpeg = Binary(name='ffmpeg').load()
print(ffmpeg)                      # name=ffmpeg abspath=/usr/bin/ffmpeg version=3.3.0 is_valid=True ...
print(ffmpeg.loaded_abspaths)      # show all the ffmpeg binaries found in $PATH (in case there's more than one available)
print(ffmpeg.model_dump_json())    # ... everything can also be dumped/loaded as json
print(ffmpeg.model_json_schema())  # ... all types provide OpenAPI-ready JSON schemas
```
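A short round-trip sketch of those JSON claims (hedged: this assumes standard pydantic v2 `model_dump_json`/`model_validate_json` semantics on `Binary`):

```python
from pydantic_pkgr import Binary

ffmpeg = Binary(name='ffmpeg').load()

# dump to JSON and restore; a standard pydantic v2 round-trip
as_json = ffmpeg.model_dump_json()
restored = Binary.model_validate_json(as_json)
assert restored.name == ffmpeg.name
```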
## Supported Package Managers

So far it supports installing/finding installed/~~updating/removing~~ packages on Linux/macOS with:

- apt (Ubuntu/Debian/etc.)
- brew (macOS/Linux)
- pip (Linux/macOS/Windows)
- npm (Linux/macOS/Windows)
- env (looks for an existing version of the binary in the user's $PATH at runtime)
- vendor (you can bundle vendored copies of packages you depend on within your source)

Planned: docker, cargo, nix, apk, go get, gem, pkg, and more using ansible/pyinfra...

## Usage

```bash
pip install pydantic-pkgr
```

### BinProvider

Implementations: EnvProvider, AptProvider, BrewProvider, PipProvider, NpmProvider

This type represents a "provider of binaries", e.g. a package manager like apt/pip/npm, or env (which finds binaries in your $PATH). BinProviders implement the following interface:

- `.INSTALLER_BIN` -> `/opt/homebrew/bin/brew` (the provider's package manager location)
- `.PATH` -> `PATHStr('/opt/homebrew/bin:/usr/local/bin:...')` (where the provider stores its binaries)
- `get_packages(bin_name: str)` -> `InstallArgs(['curl', 'libcurl4', ...])` (find the package dependencies for a binary)
- `install(bin_name: str)` (install a binary, using the binprovider to install any needed packages)
- `load(bin_name: str)` (find an existing installed binary)
- `load_or_install(bin_name: str)` -> `Binary` (find existing / install if needed)
- `get_version(bin_name: str)` -> `SemVer('1.0.0')` (get the currently installed version)
- `get_abspath(bin_name: str)` -> `Path('/absolute/path/to/bin')` (get the installed binary's abspath)
- `get_abspaths(bin_name: str)` -> `[Path('/opt/homebrew/bin/curl'), Path('/other/paths/to/curl'), ...]` (get all matching binaries found)
- `get_sha256(bin_name: str)` -> `str` (get the sha256 hash hexdigest of the binary)

A fallback sketch tying this interface together follows the example below.

```python
import inspect
from pydantic_pkgr import EnvProvider, PipProvider, AptProvider, BrewProvider

# Example: Finding an existing install of bash using the system $PATH environment
env = EnvProvider()
bash = env.load(bin_name='bash')            # Binary('bash', provider=env)
print(bash.abspath)                         # Path('/opt/homebrew/bin/bash')
print(bash.version)                         # SemVer('5.2.26')
bash.exec(['-c', 'echo hi'])                # hi

# Example: Installing curl using the apt package manager
apt = AptProvider()
curl = apt.install(bin_name='curl')         # Binary('curl', provider=apt)
print(curl.abspath)                         # Path('/usr/bin/curl')
print(curl.version)                         # SemVer('8.4.0')
print(curl.sha256)                          # 9fd780521c97365f94c90724d80a889097ae1eeb2ffce67b87869cb7e79688ec
curl.exec(['--version'])                    # curl 7.81.0 (x86_64-pc-linux-gnu) libcurl/7.81.0 ...

# Example: Finding/Installing django with pip (w/ customized binpath resolution behavior)
pip = PipProvider(
    abspath_handler={'*': lambda self, bin_name, **context: inspect.getfile(bin_name)},  # use python inspect to get path instead of os.which
)
django_bin = pip.load_or_install('django')  # Binary('django', provider=pip)
print(django_bin.abspath)                   # Path('/usr/lib/python3.10/site-packages/django/__init__.py')
print(django_bin.version)                   # SemVer('5.0.2')
```
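The promised fallback sketch (hedged: whether `load()` returns `None` or raises when the binary is absent is assumed here):

```python
from pydantic_pkgr import AptProvider, EnvProvider

env, apt = EnvProvider(), AptProvider()

# prefer an existing wget on $PATH, fall back to installing it with apt
wget = env.load(bin_name='wget') or apt.install(bin_name='wget')
print(wget.abspath, wget.version)
```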
### Binary

This type represents a single binary dependency aka a package (e.g. wget, curl, ffmpeg, etc.). It can define one or more BinProviders that it supports, along with overrides to customize the behavior for each.

Binary objects implement the following interface:

- `load()`, `install()`, `load_or_install()` -> `Binary`
- `binprovider: InstanceOf[BinProvider]`
- `abspath: Path`
- `abspaths: List[Path]`
- `version: SemVer`
- `sha256: str`

```python
import json
import platform
from pathlib import Path
from typing import Dict, List

from pydantic_pkgr import (
    AptProvider, BrewProvider, EnvProvider, PipProvider,
    Binary, BinProvider, BinProviderName, BinName, ProviderLookupDict, SemVer,
)

class CustomBrewProvider(BrewProvider):
    name: str = 'custom_brew'

    def get_macos_packages(self, bin_name: str, **context) -> List[str]:
        extra_packages_lookup_table = json.loads(Path('macos_packages.json').read_text())
        return extra_packages_lookup_table.get(platform.machine(), [bin_name])


# Example: Create a re-usable class defining a binary and its providers
class YtdlpBinary(Binary):
    name: BinName = 'ytdlp'
    description: str = 'YT-DLP (Replacement for YouTube-DL) Media Downloader'
    binproviders_supported: List[BinProvider] = [EnvProvider(), PipProvider(), AptProvider(), CustomBrewProvider()]

    # customize installed package names for specific package managers
    provider_overrides: Dict[BinProviderName, ProviderLookupDict] = {
        'pip': {'packages': ['yt-dlp[default,curl-cffi]']},  # can use literal values (packages -> List[str], version -> SemVer, abspath -> Path, install -> str log)
        'apt': {'packages': lambda: ['yt-dlp', 'ffmpeg']},   # also accepts any pure Callable that returns a list of packages
        'brew': {'packages': 'self.get_macos_packages'},     # also accepts a string reference to a function on self (where self is the BinProvider)
    }

ytdlp = YtdlpBinary().load_or_install()
print(ytdlp.binprovider)  # BrewProvider(...)
print(ytdlp.abspath)      # Path('/opt/homebrew/bin/yt-dlp')
print(ytdlp.abspaths)     # [Path('/opt/homebrew/bin/yt-dlp'), Path('/usr/local/bin/yt-dlp')]
print(ytdlp.version)      # SemVer('2024.4.9')
print(ytdlp.sha256)       # 46c3518cfa788090c42e379971485f56d007a6ce366dafb0556134ca724d6a36
print(ytdlp.is_valid)     # True
```

```python
import platform
import shutil
from typing import Dict, List

from pydantic_pkgr import AptProvider, EnvProvider, Binary, BinProvider, BinProviderName, BinName, ProviderLookupDict

# Example: Create a binary that uses Podman if available, or Docker otherwise
class DockerBinary(Binary):
    name: BinName = 'docker'
    binproviders_supported: List[BinProvider] = [EnvProvider(), AptProvider()]

    provider_overrides: Dict[BinProviderName, ProviderLookupDict] = {
        'env': {
            # example: prefer podman if installed (falling back to docker)
            'abspath': lambda: shutil.which('podman') or shutil.which('docker') or shutil.which('docker-ce'),
        },
        'apt': {
            # example: vary installed package name based on your CPU architecture
            'packages': {
                'amd64': ['docker'],
                'armv7l': ['docker-ce'],
                'arm64': ['docker-ce'],
            }.get(platform.machine(), 'docker'),
        },
    }

docker = DockerBinary().load_or_install()
print(docker.binprovider)  # EnvProvider()
print(docker.abspath)      # Path('/usr/local/bin/podman')
print(docker.abspaths)     # [Path('/usr/local/bin/podman'), Path('/opt/homebrew/bin/podman')]
print(docker.version)      # SemVer('6.0.2')
print(docker.is_valid)     # True

# You can also pass **kwargs to override properties at runtime,
# e.g. if you want to force the abspath to be at a specific path:
custom_docker = DockerBinary(abspath='~/custom/bin/podman').load()
print(custom_docker.name)         # 'docker'
print(custom_docker.binprovider)  # EnvProvider()
print(custom_docker.abspath)      # Path('/Users/example/custom/bin/podman')
print(custom_docker.version)      # SemVer('5.0.2')
print(custom_docker.is_valid)     # True
```
### SemVer

```python
from pydantic_pkgr import SemVer

# Example: Use the SemVer type directly for parsing & verifying version strings
SemVer.parse('Google Chrome 124.0.6367.208+beta_234.234.234.123')  # SemVer(124, 0, 6367)
SemVer.parse('2024.04.05')                                         # SemVer(2024, 4, 5)
SemVer.parse('1.9+beta')                                           # SemVer(1, 9, 0)
str(SemVer(1, 9, 0))                                               # '1.9.0'
```

These types are all meant to be used library-style to make writing your own apps easier, e.g. you can use them to build things like `playwright install --with-deps`.

## Django Usage

The pydantic ecosystem helps us get auto-generated, type-checked Django fields & forms that support BinProvider and Binary.

> [!TIP]
> For the full Django experience, we recommend installing these 3 excellent packages:
> - django-admin-data-views
> - django-pydantic-field
> - django-jsonform
>
> `pip install pydantic-pkgr django-admin-data-views django-pydantic-field django-jsonform`

### Django Model Usage: Store BinProvider and Binary entries in your model fields

```bash
pip install django-pydantic-field
```

For more info see the django-pydantic-field docs...

Example Django models.py showing how to store Binary and BinProvider instances in DB fields:

```python
from typing import List

from django.db import models
from pydantic import InstanceOf
from pydantic_pkgr import BinProvider, Binary, SemVer
from django_pydantic_field import SchemaField

class InstalledBinary(models.Model):
    name = models.CharField(max_length=63)
    binary: Binary = SchemaField()
    binproviders: List[InstanceOf[BinProvider]] = SchemaField(default=[])
    version: SemVer = SchemaField(default=(0, 0, 1))
```

And here's how to save a Binary using the example model:

```python
from pydantic_pkgr import Binary, EnvProvider

env = EnvProvider()

# find existing curl Binary in $PATH
curl = Binary(name='curl').load()

# save it to the DB using our new model
obj = InstalledBinary(
    name='curl',
    binary=curl,         # store Binary/BinProvider/SemVer values directly in fields
    binproviders=[env],  # no need for manual JSON serialization / schema checking
    version=SemVer('6.5.0'),
)
obj.save()
```

When fetching it back from the DB, the Binary field is auto-deserialized / immediately usable:

```python
obj = InstalledBinary.objects.get(name='curl')

# everything is transparently serialized to/from the DB,
# and is ready to go immediately after querying:
assert obj.binary.abspath == curl.abspath
print(obj.binary.abspath)       # Path('/usr/local/bin/curl')
obj.binary.exec(['--version'])  # curl 7.81.0 (x86_64-apple-darwin23.0) libcurl/7.81.0 ...
```

For a full example see our provided django_example_project/...

### Django Admin Usage: Display Binary objects nicely in the Admin UI

```bash
pip install pydantic-pkgr django-admin-data-views
```

For more info see the django-admin-data-views docs...

Then add this to your settings.py:

```python
INSTALLED_APPS = [
    # ...
    'admin_data_views',
    'pydantic_pkgr',
    # ...
]

# point these to a function that gets the list of all binaries / a single binary
PYDANTIC_PKGR_GET_ALL_BINARIES = 'pydantic_pkgr.views.get_all_binaries'
PYDANTIC_PKGR_GET_BINARY = 'pydantic_pkgr.views.get_binary'

ADMIN_DATA_VIEWS = {
    "NAME": "Environment",
    "URLS": [
        {
            "route": "binaries/",
            "view": "pydantic_pkgr.views.binaries_list_view",
            "name": "binaries",
            "items": {
                "route": "/",
                "view": "pydantic_pkgr.views.binary_detail_view",
                "name": "binary",
            },
        },
        # Coming soon: binprovider_list_view + binprovider_detail_view ...
    ],
}
```

For a full example see our provided [django_example_project/](https://github.com/ArchiveBox/pydantic-pkgr/tree/main/django_example_project)...

Note: If you override the default site admin, you must register the views manually in admin.py:

```python
from django.contrib import admin
from django.contrib.auth import get_user_model

from pydantic_pkgr.admin import register_admin_views

class YourSiteAdmin(admin.AdminSite):
    """Your customized version of admin.AdminSite"""
    ...

custom_admin = YourSiteAdmin()
custom_admin.register(get_user_model())
...
register_admin_views(custom_admin)
```
### ~~Django Admin Usage: JSONFormWidget for editing BinProvider and Binary data~~

> [!IMPORTANT]
> This feature is coming soon but is blocked on a few issues being fixed first:
> - https://github.com/surenkov/django-pydantic-field/issues/64
> - https://github.com/surenkov/django-pydantic-field/issues/65
> - https://github.com/surenkov/django-pydantic-field/issues/66

~~Install django-jsonform to get auto-generated forms for editing BinProvider, Binary, etc. data~~

```bash
pip install django-pydantic-field django-jsonform
```

For more info see the django-jsonform docs...

admin.py:

```python
from django.contrib import admin
from django_jsonform.widgets import JSONFormWidget
from django_pydantic_field.v2.fields import PydanticSchemaField

class MyModelAdmin(admin.ModelAdmin):
    formfield_overrides = {PydanticSchemaField: {"widget": JSONFormWidget}}

admin.site.register(MyModel, MyModelAdmin)
```

For a full example see our provided django_example_project/...

## Examples

### Advanced: Implement your own package manager behavior by subclassing BinProvider

```python
import shutil
import sys
from pathlib import Path
from subprocess import run, PIPE

from pydantic_pkgr import BinProvider, BinProviderName, BinName, InstallArgs, SemVer

class CargoProvider(BinProvider):
    name: BinProviderName = 'cargo'

    def on_setup_paths(self):
        if '~/.cargo/bin' not in sys.path:
            sys.path.append('~/.cargo/bin')

    def on_install(self, bin_name: BinName, **context):
        packages = self.on_get_packages(bin_name)
        installer_process = run(['cargo', 'install', *packages], capture_output=True, text=True)
        assert installer_process.returncode == 0

    def on_get_packages(self, bin_name: BinName, **context) -> InstallArgs:
        # optionally remap bin_names to the strings passed to the installer,
        # e.g. 'yt-dlp' -> ['yt-dlp', 'ffmpeg', 'libcffi', 'libaac']
        return [bin_name]

    def on_get_abspath(self, bin_name: BinName, **context) -> Path | None:
        self.on_setup_paths()
        return Path(shutil.which(bin_name))

    def on_get_version(self, bin_name: BinName, **context) -> SemVer | None:
        self.on_setup_paths()
        return SemVer(run([bin_name, '--version'], stdout=PIPE).stdout.decode())

cargo = CargoProvider()
rg = cargo.install(bin_name='ripgrep')
print(rg.binprovider)  # CargoProvider()
print(rg.version)      # SemVer(14, 1, 0)
```

## TODO

- [x] Implement initial basic support for apt, brew, and pip
- [x] Provide editability and actions via Django Admin UI using django-pydantic-field and django-jsonform
- [ ] Implement update and remove actions on BinProviders
- [ ] Add preinstall and postinstall hooks for things like adding apt sources and running cleanup scripts
- [ ] Implement more package managers (cargo, gem, go get, ppm, nix, docker, etc.)
- [ ] Add Binary.min_version that affects .is_valid based on whether it meets the minimum SemVer threshold

## Other Packages We Like

- https://github.com/MrThearMan/django-signal-webhooks
- https://github.com/MrThearMan/django-admin-data-views
- https://github.com/lazybird/django-solo
- https://github.com/joshourisman/django-pydantic-settings
- https://github.com/surenkov/django-pydantic-field
- https://github.com/jordaneremieff/djantic
Latest version: 0.5.4 Released: 2024-10-21
# pydantic-init

Through-Pydantic: Initialize Python objects from JSON, YAML, and more.
Latest version: 0.1.0 Released: 2024-05-15