Data Manipulation
Overview
This section explains the methods for managing operational and scientific data within a workflow. It covers data transfer to and from the workcell and the LINQ cloud, as well as data transformation processes mid-workflow.
Data store
The LINQ Data Store is the local storage system on the workcell, managing data flow during a workflow run. Each workflow has its own unique data store, which is available to the scheduler and tasks during execution.
How It Works
A new data store is generated on the workcell for each workflow, specific to that run. Previous data stores are retained on disk. The scheduler can access any data stored in the data store, using it to manage tasks and execution logic. Tasks within the workflow can push or pull data from the data store via various drivers like HTTP or S3.
The data store for each workflow run is stored by default at:
/usr/local/automata/hub/data/maestro/database/maestro.db
It can be downloaded and viewed with any SQL database viewer.
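Because the data store is a single database file (the `.db` file at the path above appears to be SQLite), you can also inspect a downloaded copy programmatically. A minimal sketch using only the Python standard library; the table names shown are illustrative, not the actual maestro schema:

```python
import sqlite3

def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]

# Example: inspect a downloaded copy of a workflow's data store
# tables = list_tables("maestro.db")
```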
Interacting with the Data Store
You can interact with the data store during a run using:
- The CodeTask class to execute Lua or custom Python scripts.
- HTTP Driver: Push or pull data from external endpoints.
- S3 Driver: Upload or download data to cloud storage.
- Evaluator Driver: Perform evaluations on the data.
CodeTask
Code tasks in LINQ workflows allow custom data manipulation using either pre-defined Python functions or custom Lua scripts. These tasks are particularly useful for transforming data, integrating with external systems like LIMS, or performing custom calculations on data stored in the data store.
There are two actions that can be executed with a code task:
- Pre-defined custom Python functions
- In-line Lua functions
- class linq.task.CodeTask(*, id: str = '', description: str = '', dependencies: list[Union[ForwardRef('ActionTask'), ForwardRef('CodeTask'), ForwardRef('Conditional')]] = ..., function: linq.task.LuaFunction | linq.task.PythonReferenceFunction, inputs: linq.task.Inputs = ...)
- function: LuaFunction | PythonReferenceFunction
The function to be executed.
- inputs: Inputs
The inputs to the function. The key is the name of the input. Can be a TaskOutputReference to the output of another task, a ReferenceData to some stored data, or a literal value.
- out(output_name: str | None = None) TaskOutputReference
Reference to the output of this task. If the task has only one output, output_name can be left as None.
Python functions
Custom Python functions must be pre-defined and deployed by Automata. Afterwards, they can be reused. To use a Python function in a workflow, define it using CodeTask() with a PythonReferenceFunction.
python_ref_calculation = CodeTask(
    id="python_ref_calculation",
    description="Calculate something using the output from a previous task",
    inputs=Inputs(value=transform_data_from_liconic.out()),
    function=PythonReferenceFunction("example_eval"),  # Function defined in the customer repository
)
- class linq.task.PythonReferenceFunction
Note
Talk to your Automata Customer Success Manager to develop and deploy a PythonReferenceFunction.
Lua Functions
Lua functions are defined in-line in the workflow as strings and are ideal for quick operations that manipulate data from the data store. Define one using CodeTask() with a LuaFunction.
transform_data = CodeTask(
    description="Transform data using a simple Lua function",
    inputs=Inputs(some_input=read_barcode_from_hotel.out("barcode")),
    function=LuaFunction("function process_data(input_table) return {result = input_table.some_input + 1} end"),
)
- class linq.task.LuaFunction
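The in-line Lua function receives its Inputs as a single table argument and must return a table of named outputs. Purely as a point of reference (this is not how code tasks execute), the same transformation written in Python would be:

```python
def process_data(input_table: dict) -> dict:
    # Mirrors the in-line Lua function above: add 1 to the input value
    # and return it under the output name "result".
    return {"result": input_table["some_input"] + 1}

print(process_data({"some_input": 41}))  # {'result': 42}
```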
In addition to task outputs like read_barcode_from_hotel.out("barcode"), you can also use Reference Data to pass static or predefined values into Lua functions.
Here, the Lua function compares static data (x=1) with the output from a previous task (y=ReferenceData), demonstrating how to combine static and task-based inputs. This can be useful for checking whether a value is above or below a certain threshold.
calculate_something = CodeTask(
    description="Calculate something using static data and the output from the previous task",
    inputs=Inputs(
        x=1,  # Static data
        y=ReferenceData(key="python_ref_calculation_result"),  # Reference to a previous task's output
    ),
    function=LuaFunction("function check_equal(input_table) return {result = input_table.x == input_table.y} end"),
)
External data I/O
Data can be pushed into the data store from external sources, and data and files in the data store can be pushed out to external systems.
HTTP Driver
The HTTP driver enables the LINQ scheduler to communicate with external systems, such as LIMS or other cloud platforms, using HTTP API requests (GET or POST). This allows workflows to fetch and store data dynamically during a run, which can then be accessed by other tasks through the data store.
To enable HTTP requests in a workflow, you first need to include the HTTP driver as a configured instrument in your workcell:
http = Instrument(
    id="http",
    name="http",
    type="http",
    driver="automata_http_v1",
    config={
        "auth_type": 1,
        "auth_secret_key": "/secrets/API_AUTH",
    },
    bench="na",
)
The HTTP driver supports basic authentication. Credentials can be securely stored in AWS Secrets Manager using auth_secret_key.
Once the driver is defined in your workcell, you can make GET or POST requests by configuring them in an ActionTask in your workflow.
http_post_barcode = ActionTask(
    id="http_task",
    description="send barcode in post request",
    action="post",  # Use "get" for GET requests
    instrument_type="http",
    static_arguments={
        "base_request": {
            "url_path": "/url_path_for_http",
            "additional_headers": [],
        }
    },
    shared_arguments={
        "body": "value",
        "response_store_key": "datastorekey",
    },
)
- GET requests: Use parameters to fetch data from an external source. The response is stored in the data store under response_store_key.
- POST requests: Use the body field to send data to an external system. The response can also be stored in the data store.
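A GET request follows the same pattern as the POST example above. A minimal sketch, with placeholder id, url_path, and response_store_key values; confirm the exact argument set for GET requests with `linq driver list`:

```python
http_get_data = ActionTask(
    id="http_get_data",  # Hypothetical task id
    description="Fetch data from an external endpoint",
    action="get",
    instrument_type="http",
    static_arguments={
        "base_request": {
            "url_path": "/url_path_for_http",  # Placeholder endpoint path
            "additional_headers": [],
        }
    },
    shared_arguments={
        # The response is stored in the data store under this key
        "response_store_key": "datastorekey",
    },
)
```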
Webhooks as data sources
For more information about webhooks, see Workflow Notifications.
S3 Driver
The S3 driver allows the LINQ scheduler to upload data from the LINQ Data Store to Amazon S3 buckets. Automata’s platform team sets up an S3 bucket for each project or organization. Once configured, you can use the S3 driver to transfer data between the Data Store and S3.
Define the S3 instrument in your workcell:
s3 = Instrument(
    id="s3",
    name="s3",
    type="s3",
    driver="automata_s3_v1",
    config={
        "aws_bucket": "XX-workcell-data-XXX-dev-ew2-atmta006",  # Request value from Automata
        "prefix": "s3_demo",  # An optional folder prefix
    },
)
To upload or download data to or from the S3 bucket, specify the action as upload_file or download_file in an ActionTask in your workflow.
s3_upload = ActionTask(
    id="s3_upload",
    action="upload_file",  # Change to "download_file" to pull data from S3
    description="Upload data to S3",
    instrument_type=s3,
    labware_sources=[LabwareSource(labware=plate_1)],
    static_arguments={
        "data_key": "datastore_key",  # Key in the Data Store
        "s3_key": "s3_demo/data_file.json",  # Path in S3 for upload or download
    },
)
Note
Use linq driver list to see all the arguments and commands available to a driver.