Data Manipulation

Overview

This section explains the methods for managing operational and scientific data within a workflow. It covers data transfer to and from the workcell and the LINQ cloud, as well as data transformation processes mid-workflow.

Data store

The LINQ Data Store is the local storage system on the workcell, managing data flow during a workflow run. Each workflow has its own unique data store, which is available to the scheduler and tasks during execution.

How It Works

A new data store is generated on the workcell for each workflow, specific to that run. Previous data stores are retained on disk. The scheduler can access any data stored in the data store, using it to manage tasks and execution logic. Tasks within the workflow can push or pull data from the data store via various drivers like HTTP or S3.

The data store for each workflow run is stored by default at:

    /usr/local/automata/hub/data/maestro/database/maestro.db

It can be downloaded and viewed with any SQL database viewer.
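
The .db extension suggests the file is a SQLite database. Assuming that is the case, here is a minimal sketch of inspecting a downloaded copy with Python's built-in sqlite3 module (the tables returned depend on the Maestro schema):

import sqlite3

# Open a downloaded copy of the data store (assumes a standard SQLite file)
connection = sqlite3.connect("maestro.db")
cursor = connection.cursor()

# List the tables in the database; the actual schema is defined by Maestro
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
for (table_name,) in cursor.fetchall():
    print(table_name)

connection.close()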

Interacting with the Data Store

You can interact with the data store during a run using:

  • CodeTask: Execute in-line Lua functions or pre-defined custom Python functions on data in the data store.

  • HTTP Driver: Push or pull data from external endpoints.

  • S3 Driver: Upload or download data to or from cloud storage.

  • Evaluator Driver: Perform evaluations on the data.

  • SFTP Data Connector: Export data to an external SFTP server.

CodeTask

Code tasks in LINQ workflows allow custom data manipulation using either pre-defined Python functions or custom Lua scripts. These tasks are particularly useful for transforming data, integrating with external systems like LIMS, or performing custom calculations on data stored in the data store.

There are two types of function that can be executed with a code task:

  • Pre-defined custom Python functions

  • In-line Lua functions

class linq.task.CodeTask(*, id: str = '', description: str = '', dependencies: list[Union[ForwardRef('ActionTask'), ForwardRef('CodeTask'), ForwardRef('Conditional')]] = ..., function: linq.task.LuaFunction | linq.task.PythonReferenceFunction, inputs: linq.task.Inputs = ...)
function: LuaFunction | PythonReferenceFunction

The function to be executed.

inputs: Inputs

The inputs to the function. The key is the name of the input. Can be a TaskOutputReference to the output of another task, a ReferenceData to some stored data, or a literal value.

out(output_name: str | None = None) → TaskOutputReference

Reference to the output of this task. If the task has only one output, output_name can be left as None.
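
As a brief sketch (read_barcode and previous_task stand in for tasks defined elsewhere in a workflow; the names are illustrative only):

# Reference a named output when a task produces several outputs
barcode = read_barcode.out("barcode")

# Omit output_name when a task produces a single output
value = previous_task.out()

# Task output references are passed to other tasks via Inputs
next_step = CodeTask(
    description="Consume outputs from earlier tasks",
    inputs=Inputs(barcode=barcode, value=value),
    function=LuaFunction("function pick(input_table) return {result = input_table.barcode} end"),
)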

Python functions

Custom Python functions must be pre-defined and deployed by Automata; once deployed, they can be reused. To use a Python function in a workflow, define a CodeTask() with a PythonReferenceFunction.

python_ref_calculation = CodeTask(
    id="python_ref_calculation",
    description="Calculate something using the output from a previous task",
    inputs=Inputs(value=transform_data_from_liconic.out()),
    function=PythonReferenceFunction("example_eval"),  # Function defined in the customer repository
)
class linq.task.PythonReferenceFunction

Note

Talk to your Automata Customer Success Manager to develop and deploy a PythonReferenceFunction.
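
The signature and deployment process for these functions are agreed with Automata. Purely as an illustration (this shape is an assumption, not the documented interface), a deployed function such as the example_eval referenced above might accept the task inputs and return a dictionary of named outputs:

# Hypothetical sketch only: the real interface is defined during deployment with Automata
def example_eval(inputs: dict) -> dict:
    """Evaluate the value produced by a previous task."""
    value = inputs["value"]        # corresponds to Inputs(value=...) in the workflow
    return {"result": value * 2}   # outputs become available to later tasks via .out()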

Lua Functions

Lua functions are defined in-line in the workflow as strings and are ideal for quick operations that manipulate data from the data store. Define it using CodeTask() with a LuaFunction.

transform_data = CodeTask(
    description="Transform data using a simple Lua function",
    inputs=Inputs(some_input=read_barcode_from_hotel.out("barcode")),
    function=LuaFunction("function process_data(input_table) return {result = input_table.some_input + 1} end"),
)
class linq.task.LuaFunction

In addition to task outputs like read_barcode_from_hotel.out("barcode"), you can also use Reference Data to pass static or predefined values into Lua functions.

Here, the Lua function compares a static value (x=1) with data retrieved from the data store via ReferenceData (y), demonstrating how to combine static and task-based inputs. This could be useful for checking whether a value is above or below a certain threshold.

calculate_something = CodeTask(
    description="Calculate something using static data and the output from the previous task",
    inputs=Inputs(
        x=1,  # Static data
        y=ReferenceData(key="python_ref_calculation_result")  # Reference to a previous task’s output
    ),
    function=LuaFunction("function check_equal(input_table) return {result = input_table.x == input_table.y} end")
)

Data Connector

A data connector is a function that integrates with external data storage systems, such as SFTP, HTTP, S3, LIMS, and ELN. As additional connectors are developed, you will be able to connect to more external storage systems.

HTTP Driver

The HTTP driver enables the LINQ scheduler to communicate with external systems, such as LIMS or other cloud platforms, using HTTP API requests (GET or POST). This allows workflows to fetch and store data dynamically during a run, which can then be accessed by other tasks through the data store.

To enable HTTP requests in a workflow, you first need to include the HTTP driver as a configured instrument in your workcell:

http = Instrument(
    id="http",
    name="http",
    type="http",
    driver="automata_http_v1",
    config={
        "auth_type": 1,
        "auth_secret_key": "/secrets/API_AUTH"
    },
    bench="na"
)

The HTTP driver supports basic authentication. Credentials can be securely stored in AWS Secrets Manager using auth_secret_key.

Once the driver is defined in your workcell, you can make GET or POST requests by configuring them in an ActionTask in your workflow.

http_post_barcode = ActionTask(
    id="http_task",
    description="send barcode in post request",
    action="post", #use "get" for GET requests
    instrument_type="http",
    static_arguments={
        "base_request": {
            "url_path": "/url_path_for_http",
            "additional_headers": []
        }
    },
    shared_arguments={
        "body": "value",
        "response_store_key": "datastorekey"
    }
)
  • GET requests: Use parameters to fetch data from an external source. The response is stored in the data store under response_store_key (see the sketch after this list).

  • POST requests: Use the body field to send data to an external system. The response can also be stored in the data store.
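
Building on the POST example above, a GET request follows the same pattern. This sketch assumes the base_request and response_store_key arguments shown earlier apply equally to GET requests; run linq driver list to confirm the full argument set, including how query parameters are supplied:

http_get_data = ActionTask(
    id="http_get_data",
    description="Fetch data from an external endpoint",
    action="get",
    instrument_type="http",
    static_arguments={
        "base_request": {
            "url_path": "/url_path_for_http",
            "additional_headers": []
        }
    },
    shared_arguments={
        "response_store_key": "datastorekey"  # The response is saved in the data store under this key
    }
)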

S3 Driver

The S3 driver allows the LINQ scheduler to upload data from the LINQ Data Store to Amazon S3 buckets. Automata’s platform team sets up an S3 bucket for each project or organization. Once configured, you can use the S3 driver to transfer data between the Data Store and S3.

Define the S3 instrument in your workcell:

s3 = Instrument(
    id="s3",
    name="s3",
    type="s3",
    driver="automata_s3_v1",
    config={
        "aws_bucket": "XX-workcell-data-XXX-dev-ew2-atmta006",  # Request value from Automata
        "prefix": "s3_demo",  # Optional folder prefix
    }
)

To upload or download data to or from the S3 bucket, specify the action as upload_file or download_file in an ActionTask in your workflow.

s3_upload = ActionTask(
    id="s3_upload",
    action="upload_file",  # Change to "download_file" to pull data from S3
    description="Upload data to S3",
    instrument_type=s3,
    labware_sources=[LabwareSource(labware=plate_1)],
    static_arguments={
        "data_key": "datastore_key",  # Key in the Data Store
        "s3_key": "s3_demo/data_file.json"  # Path in S3 for upload or download
    }
)
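
For the reverse direction, here is a sketch of a download task, assuming data_key and s3_key carry the same meanings as in the upload example (data_key naming the Data Store entry, s3_key the object path in the bucket):

s3_download = ActionTask(
    id="s3_download",
    action="download_file",
    description="Download data from S3 into the Data Store",
    instrument_type=s3,
    labware_sources=[LabwareSource(labware=plate_1)],
    static_arguments={
        "data_key": "datastore_key",        # Key the downloaded data is stored under in the Data Store
        "s3_key": "s3_demo/data_file.json"  # Path in S3 to download from
    }
)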

Note

Run linq driver list to see all the arguments and commands available for a driver.

SFTP Data Connector

The SFTP data connector allows the LINQ scheduler to securely export data to an SFTP server using the SFTP driver. Once configured, the SFTP connector enables the transfer of files between the LINQ Data Store and the external SFTP server.

Define an SFTP Instrument in Your Workcell

To use the SFTP data connector in a workflow, you must define an SFTP instrument within your workcell. This instrument specifies the connection details required to interact with the SFTP server.

sftp = Instrument(
    id="sftp",
    name="sftp",
    type="sftp",
    bench="",
    driver="automata_sftp_v1",
    config={
        "host": "",  # e.g. localhost
        "port": "",  # Port the SFTP server runs on, usually 22
        "username": "",  # SFTP username
        "password_secret_key": "",  # Path in AWS Secrets Manager to the SFTP password
        "host_key": "",  # Public RSA host key
        "host_key_type": ""  # e.g. "ssh-rsa"
    }
)

Export data to SFTP

To export data to an SFTP server, use the SftpDataConnector task in your workflow. This task specifies the data to be exported and the destination file path on the SFTP server.

export_data = SftpDataConnector(
    id="export_data",
    description="Export data to SFTP server",
    action=SftpDataConnectorAction.EXPORT,
    input_task=read_barcode_from_hotel.out("barcode"),
    file_path=generate_dynamic_file_name.out("result"),
)

input_task: Refers to another task in the workflow that generates the data to be exported.

file_path: Specifies the location and name of the file to be exported. This can be either:

  • A static file path including file name and extension, for example data/file.txt

  • A dynamic file path, which could be generated by another task (e.g. a CodeTask generating a timestamped file name, see the following example).

Example: Dynamic File Path Generation

You can use a CodeTask to generate a dynamic file path for the SFTP export:

generate_dynamic_file_name = CodeTask(
    id="generate_dynamic_file_name",
    description="Generate a dynamic filename for data connector",
    inputs=Inputs(),
    function=LuaFunction(
        "function file_name() return {result = 'data/file_' .. os.date('%Y%m%d_%H%M%S') .. '.txt'} end"
    ),
)

In this example, the CodeTask returns a file name containing the timestamp at which it was run, e.g. data/file_20250401_120000.txt.

Note

A static file name supplied in file_path may not be the final file name. If your workflow is batched, the file name will be appended with the batch iteration number to ensure no data is overwritten during a workflow run. For example, a file_path of data/file.txt will create a file called data/file_B1.txt in the first batch of a batched workflow.

Webhooks as data sources

For more information about webhooks, see Workflow Notifications.