Publish Service
Used by the CR8TOR service to transfer data from the data source into storage usable by the dynamic compute resources. This service does not expose any data directly to the CR8TOR service.
The Publish Service is a FastAPI application. Its key activities are:
- Retrieving data from the source database (specifically Databricks Unity Catalog) and storing it in the staging container.
- Publishing the data to the target production container.
The microservice has the following endpoints, each serving a specific function:
- `POST data-publish/package` - Packages the data from the source database into a staging container. Returns a payload with the details of the data files created in the staging container.
Example Request:

```json
{
  "project_name": "Pr004",
  "project_start_time": "20250205_010101",
  "destination_type": "LSC",
  "destination_format": "duckdb",
  "source": {
    "name": "MyDatabricksConnection",
    "type": "DatabricksSQL",
    "host_url": "https://my-databricks-workspace.azuredatabricks.net",
    "http_path": "/sql/1.0/warehouses/bd1395d4652aa599",
    "port": 443,
    "catalog": "catalog_name"
  },
  "credentials": {
    "provider": "AzureKeyVault",
    "spn_clientid": "databricksspnclientid",
    "spn_secret": "databricksspnsecret"
  },
  "metadata": {
    "schema_name": "example_schema_name",
    "tables": [
      {
        "name": "person",
        "columns": [
          { "name": "person_key" },
          { "name": "person_id" },
          { "name": "age" }
        ]
      },
      {
        "name": "address",
        "columns": [
          { "name": "address_key" },
          { "name": "address" }
        ]
      }
    ]
  }
}
```
Example Response:

```json
{
  "status": "success",
  "payload": {
    "data_retrieved": [
      { "file_path": "data/outputs/database.duckdb" }
    ]
  }
}
```
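A client can assemble the request body programmatically before posting it to the endpoint. The sketch below is illustrative only: `build_package_request` is a hypothetical helper (not part of the service), and the connection details default to the placeholder values from the example above.

```python
def build_package_request(project_name, start_time, schema_name, tables,
                          catalog="catalog_name",
                          host_url="https://my-databricks-workspace.azuredatabricks.net",
                          http_path="/sql/1.0/warehouses/bd1395d4652aa599"):
    """Assemble the JSON body for POST data-publish/package.

    `tables` maps table name -> list of column names.
    Credential key names mirror the documented example.
    """
    return {
        "project_name": project_name,
        "project_start_time": start_time,
        "destination_type": "LSC",
        "destination_format": "duckdb",
        "source": {
            "name": "MyDatabricksConnection",
            "type": "DatabricksSQL",
            "host_url": host_url,
            "http_path": http_path,
            "port": 443,
            "catalog": catalog,
        },
        "credentials": {
            "provider": "AzureKeyVault",
            "spn_clientid": "databricksspnclientid",
            "spn_secret": "databricksspnsecret",
        },
        "metadata": {
            "schema_name": schema_name,
            "tables": [
                {"name": name, "columns": [{"name": c} for c in cols]}
                for name, cols in tables.items()
            ],
        },
    }

body = build_package_request(
    "Pr004", "20250205_010101", "example_schema_name",
    {"person": ["person_key", "person_id", "age"],
     "address": ["address_key", "address"]},
)
```

The resulting `body` dict matches the example request above and can be sent with any HTTP client, subject to the API-key authentication described under Configuration.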
- `POST data-publish/publish` - Publishes the packaged data from the staging container to the production container.
Example Request:

```json
{
  "project_name": "Pr004",
  "project_start_time": "20250205_010101",
  "destination_type": "LSC"
}
```
Example Response:

```json
{
  "status": "success",
  "payload": {
    "data_published": [
      {
        "file_path": "data/outputs/database.duckdb",
        "hash_value": "6ed6e817fb78953648324b0b9e44711bb55aa790e22e2353e8af6eae1f182bfdf10f88fc0e1a33c389cc3b73346dc513fde3fda594e3725ad1a3b568a55ff41c",
        "total_bytes": 1585152
      }
    ]
  }
}
```
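The 128-character `hash_value` has the length of a SHA-512 hex digest. Assuming that algorithm (an assumption; the service documentation does not name it), a consumer could verify a published file like this:

```python
import hashlib
import os
import tempfile

def file_sha512(path, chunk_size=1 << 20):
    """Stream a file in chunks and return its SHA-512 hex digest."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on a throwaway file rather than the real duckdb output.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example bytes")
    path = tmp.name
digest = file_sha512(path)
os.unlink(path)
```

Comparing `digest` (and the file size against `total_bytes`) to the response payload gives an end-to-end integrity check after download.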
Configuration
Configuration common to all services
See the main guide for the microservices, located here.
Environment Variables
Environment variables required:
- `TARGET_STORAGE_ACCOUNT_LSC_SDE_MNT_PATH`, default = `./outputs/lsc-sde` - Path to the target storage account where datasets for LSC should be stored.
- `TARGET_STORAGE_ACCOUNT_NW_SDE_MNT_PATH`, default = `./outputs/nw-sde` - Path to the target storage account where datasets for NW should be stored.
- `SECRETS_MNT_PATH`, default = `./secrets` - Path to the folder where secrets are mounted.
- `DLTHUB_PIPELINE_WORKING_DIR`, default = `/home/appuser/dlt/pipelines` - DltHub pipeline working directory where dltHub state files, logs and extracted data are temporarily stored. See https://dlthub.com/docs/general-usage/pipeline#pipeline-working-directory
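A deployment might surface these settings as shown in the sketch below; the variable names and defaults come from the list above, but the code itself is illustrative and not the service's actual configuration module.

```python
import os

# Defaults mirror the documented values; override via environment in deployment.
LSC_MNT_PATH = os.environ.get("TARGET_STORAGE_ACCOUNT_LSC_SDE_MNT_PATH",
                              "./outputs/lsc-sde")
NW_MNT_PATH = os.environ.get("TARGET_STORAGE_ACCOUNT_NW_SDE_MNT_PATH",
                             "./outputs/nw-sde")
SECRETS_MNT_PATH = os.environ.get("SECRETS_MNT_PATH", "./secrets")
DLT_WORKING_DIR = os.environ.get("DLTHUB_PIPELINE_WORKING_DIR",
                                 "/home/appuser/dlt/pipelines")
```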
Authentication is based on a static API key and requires a secret `publishserviceapikey` stored in the secret vault, e.g. Azure Key Vault. When working locally, the secret file should be stored under the SECRETS_MNT_PATH folder, e.g. `secrets/publishserviceapikey`.