cain-villarreal
Process/data API
Process API - intro
• The workflow engine runs applications
  – Executable code in different languages
  – API methods
  – Web services
• Applications require setup to run
  – Where are they
  – Where will they run (farm, local machine, specific machine)
  – Data IO
  – Version etc.
Process API - Intro
• We do this 2 ways
• As a single object process
  – We have defined a data object to hold things
  – We can use the same idea for the processAPI
  – Set up the object and “doIt”
• As setup calls and application call
  – Define setups for a process
  – Use a single call to run the process
ProcessAPI
• The following are the fields within the WFE process object (ignoring WFE-specific ones)
  – Name & human-readable name: not important
  – Type
  – File: where the application is; could be a URL
  – Data: see later
  – Runtime/fail time: does the API monitor these?
  – Parameters
Process Object fields
• Type
  – i.e. is this an exec, a URL, and so on
• Process
  – The actual mapped process name. A site-specific mapping will define the actual meaning of the process name
• Location
  – Where the application is to run (client/server/farm), or other things like a URL
  – Is it useful to have this in the WFE XML file, or as a separate process API XML setup? I would think the latter.
Process-API
• Data
  – The WFE data object defines input and output at run time; the only mutability is class (static)
  – We have to pass data to a process, so it might be sensible to put it in the process object
  – See the data API definition for the object
  – Some object containers are data in and some are data out; they need to have the same structure though
Process-API
• Runtime and failtime
  – These are WFE exception manager properties
  – It might not be a good idea to reproduce the exception outside the WFE, as the WFE needs to handle any failure. Process failure must not be hidden from the WFE
Process API
• Parameters
  – Probably a Python dictionary is best here
  – Needs to be exposed to the WFE since different parts of the workflow may need different parameters (consider MAXIT)
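A possible shape for such a dictionary, with per-task overrides visible to the engine. The keys, values and task IDs are made-up placeholders (they are not real MAXIT options), and parameters_for is a hypothetical helper.

```python
# All keys and values are made-up placeholders, not real MAXIT options.
default_parameters = {"mode": "check", "verbose": False}

# Per-task overrides keyed by workflow task ID (IDs are illustrative).
task_overrides = {
    "T2": {"mode": "convert"},
    "T5": {"verbose": True},
}


def parameters_for(task_id):
    """Return the defaults merged with any overrides for one task."""
    merged = dict(default_parameters)
    merged.update(task_overrides.get(task_id, {}))
    return merged
```

For example, parameters_for("T2") gives the defaults with "mode" replaced, while a task without overrides gets the defaults unchanged.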
Process API
• The problem I have is defining which data object is which. The data object needs a definition so the program knows what the data is (see process API).
  – Using a Python class object

ProcOb = ApiProcess()
ProcOb.set('name', 'myAlignProg')
ProcOb.set('parameters', '-P 33 -x ddd')
ProcOb.set('type', 'exec')
ProcOb.add('input', data.ob['D1'])
ProcOb.add('input', data.ob['D2'])
ProcOb.add('output1', data.ob['D3'])

These will of course be defined in the workflow engine variables. Note the adding of multiple data objects.
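A minimal sketch of what a class with this set/add interface might look like. The internals are an assumption (the slides only show the calls), and the string stand-ins replace the engine's real data object references.

```python
class ApiProcess:
    """Hypothetical container: scalar fields via set(), data lists via add()."""

    def __init__(self):
        self.fields = {}
        self.data = {}

    def set(self, key, value):
        self.fields[key] = value

    def add(self, role, data_object):
        # Repeated add() calls with the same role accumulate in order.
        self.data.setdefault(role, []).append(data_object)


# Stand-ins for the engine's data object references (assumed shape).
data_objects = {"D1": "ref-D1", "D2": "ref-D2", "D3": "ref-D3"}

proc = ApiProcess()
proc.set("name", "myAlignProg")
proc.set("type", "exec")
proc.add("input", data_objects["D1"])
proc.add("input", data_objects["D2"])
proc.add("output1", data_objects["D3"])
```

Keeping inputs as an ordered list is one way to let the program tell "which object is which" by position; named roles would be another.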
Process API
• Program exec
  – Executable
  – Process: use a mapped name for the application (site specific)
  – Location: local/server/farm (mapped names)
  – How do we know which objects are which?
ProcOb = ApiProcess()
ProcOb.set('type', 'exec')
ProcOb.set('process', 'maxit')
ProcOb.set('location', 'server')
ProcOb.add('input', data.ob['D1'])
ProcOb.add('input', data.ob['D2'])
ProcOb.add('output1', data.ob['D3'])
processAPI.run(ProcOb)
Process API
• DataAPI copy
  – Copy data
  – Parameters = new version
  – Data objects: see later
ProcOb = ApiProcess()
ProcOb.set('name', 'copy')
ProcOb.set('parameters', 'newVersion')
ProcOb.set('process', 'method')
ProcOb.set('location', 'dataAPI')
ProcOb.add('input', data.ob['D1'])
ProcOb.add('output', data.ob['D3'])
processAPI.run(ProcOb)
Automated questions in XML
<wf:task taskID="TD3" name="SequenceOK" nextTask="J1" breakpoint="false">
  <wf:description>Check whether the sequence align was OK</wf:description>
  <wf:decision type="AUTO">
    <wf:dataObjectsLocation>
      <wf:location dataID="D6" type="input"/>
    </wf:dataObjectsLocation>
    <wf:nextTasks>
      <wf:nextTask taskID="TW4">
        <wf:function dataID="D6" gte="20" less="200000000"/>
      </wf:nextTask>
      <wf:nextTask taskID="TM5">
        <wf:function dataID="D6" gte="2" less="20"/>
      </wf:nextTask>
      <wf:nextTask taskID="T9">
        <wf:function dataID="D6" gte="0" less="2"/>
      </wf:nextTask>
    </wf:nextTasks>
  </wf:decision>
</wf:task>
Decision data object
Decision option
More complex functions will require python methods specific to the question
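The range-based decision in the XML can be sketched as below. It assumes that gte/less mean the half-open range gte <= value < less (inferred from the attribute names only), and choose_next_task is a hypothetical helper name.

```python
# Each option mirrors one <wf:nextTask>: (taskID, gte, less).
OPTIONS = [
    ("TW4", 20, 200000000),
    ("TM5", 2, 20),
    ("T9", 0, 2),
]


def choose_next_task(value, options=OPTIONS):
    """Return the first task whose range gte <= value < less matches."""
    for task_id, gte, less in options:
        if gte <= value < less:
            return task_id
    # No range matched; the engine would have to treat this as a failure.
    return None
```

With these ranges a D6 value of 25 routes to TW4, 5 routes to TM5 and 1 routes to T9; a value outside all ranges returns None.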
Detail description to technology
• A data object is pre-declared in the XML
  – Data place holder
  – Defines API object detail
• A task object can reference data objects
  – As input, output or both
• A process task:
  – API method
  – Exec program
<wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
  <wf:description>General object to copy</wf:description>
  <wf:location namespace="__old_object" where="DM"/>
</wf:dataObject>

<wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
  <wf:description>Run API task to copy data object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIcopy" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>
Creating data objects in WFE

# the data object IDs
self.object.set("deposition-dataset-ID", depID)
self.object.set("workflow-class-ID", classID)
self.object.set("workflow-instance-ID", instID)

# declared type and access mode; "read-write" and "read-only" are
# literal access values, not XML attribute names
self.type = data.getAttribute("type")
self.object.set("return-type", self.type)
if data.getAttribute("mutable") == "true":
    self.object.set("access", "read-write")
else:
    self.object.set("access", "read-only")

# internal workflow cross reference
self.name = data.getAttribute("dataID")
self.nameHumanReadable = data.getAttribute("name")

# child elements carry the description and the data location
for detail in data.childNodes:
    if detail.nodeName == "wf:description":
        self.description = detail.firstChild.data
    elif detail.nodeName == "wf:location":
        self.nameSpace = detail.getAttribute("namespace")
        self.object.set("data-object-name", self.nameSpace)
        self.where = detail.getAttribute("where")
        self.object.set("data-object-location", self.where)
Each data XML statement is stored as a reference object
This object is a place holder which can be passed to processes
It contains information on where to access the data
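A self-contained sketch of building such a place holder from one wf:dataObject declaration with xml.dom.minidom. The DataReference class, the wrapper element and the xmlns:wf URI are assumptions added so the fragment parses on its own; the engine's real class is not shown in the slides.

```python
from xml.dom import minidom

# The xmlns:wf URI and the wrapper element are assumptions for this sketch.
XML = """<wf:dataObjects xmlns:wf="urn:example:wf">
  <wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
    <wf:description>General object to copy</wf:description>
    <wf:location namespace="__old_object" where="DM"/>
  </wf:dataObject>
</wf:dataObjects>"""


class DataReference:
    """Hypothetical place holder: says where the data is, holds no payload."""

    def __init__(self, node):
        self.data_id = node.getAttribute("dataID")
        self.name = node.getAttribute("name")
        self.type = node.getAttribute("type")
        mutable = node.getAttribute("mutable") == "true"
        self.access = "read-write" if mutable else "read-only"
        self.description = ""
        self.namespace = None
        self.where = None
        for child in node.childNodes:
            if child.nodeName == "wf:description" and child.firstChild:
                self.description = child.firstChild.data
            elif child.nodeName == "wf:location":
                self.namespace = child.getAttribute("namespace")
                self.where = child.getAttribute("where")


document = minidom.parseString(XML)
references = {}
for node in document.getElementsByTagName("wf:dataObject"):
    reference = DataReference(node)
    references[reference.data_id] = reference
```

The references dictionary is keyed by dataID, so a task's wf:location entries can be resolved to place holders by a plain lookup.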
The engine data object
  – May be a real or virtual payload of data
  – Where, what and type
  – Payload is passed between tasks
  – The WF is a data processing pipeline
• A real value can be examined to affect the WF
• The path is dependent on data values (auto/manual decisions are based on these values)
• The data version is WF instance data
  – Can be domain data (via dataAPI)
  – Can be WF data (via statusAPI); scope is defined by the object the data is stored in
Engine process manager

import time

def run(self):
    self.status = 1
    # send the request (input) data objects; assumes a dict of objects
    for key, value in self.inputObjects.items():
        istat = myApi.do(value)

    if self.task.uniqueType == "test":
        # test method - just counts for 5 seconds
        for i in range(5):
            time.sleep(1.0)
    elif self.task.uniqueType == "method":
        # this is an API process
        if self.task.uniqueWhere == "API":
            # this is an API method call
            self.processAPI.runMethod(self.task.uniqueName)
    elif self.task.uniqueType == "exec":
        # this is an exec program found "where"
        self.processAPI.runExec(self.task.uniqueName, self.task.uniqueWhere)

    # get the response (output) data objects
    for key, value in self.outputObjects.items():
        istat = myApi.do(value)

    self.statusAPI.setStatus("finished")
This is a thread – running inside exception manager
Send the request data objects
Get the response data objects
What sort of process is it ?
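A runnable sketch of the same dispatch as a thread. RecordingProcessAPI and Task are stubs invented for this example, and the try/except stands in for the exception manager; the point is that a failure stays visible to the engine rather than being swallowed.

```python
import threading


class RecordingProcessAPI:
    """Stub standing in for the process API; it only records calls."""

    def __init__(self):
        self.calls = []

    def runMethod(self, name):
        self.calls.append(("method", name))

    def runExec(self, name, where):
        self.calls.append(("exec", name, where))


class Task:
    def __init__(self, uniqueType, uniqueName, uniqueWhere):
        self.uniqueType = uniqueType
        self.uniqueName = uniqueName
        self.uniqueWhere = uniqueWhere


class ProcessManager(threading.Thread):
    def __init__(self, task, processAPI):
        super().__init__()
        self.task = task
        self.processAPI = processAPI
        self.status = "waiting"
        self.error = None

    def run(self):
        self.status = "running"
        try:
            if self.task.uniqueType == "method":
                self.processAPI.runMethod(self.task.uniqueName)
            elif self.task.uniqueType == "exec":
                self.processAPI.runExec(self.task.uniqueName, self.task.uniqueWhere)
            else:
                raise ValueError("unknown task type: " + self.task.uniqueType)
            self.status = "finished"
        except Exception as error:
            # Keep the failure visible to the engine instead of hiding it.
            self.error = error
            self.status = "failed"


api = RecordingProcessAPI()
workers = [
    ProcessManager(Task("method", "APIcopy", "API"), api),
    ProcessManager(Task("exec", "maxit", "server"), api),
    ProcessManager(Task("bogus", "x", "nowhere"), api),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
statuses = [worker.status for worker in workers]
```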
Workflow granularity
• It does not really matter
• A process can be as complex as you like
  – Depends on go-back granularity
  – Depends on “how much you would lose if it crashed”
• Data is the problem!
  – The workflow is a flow of data, so hiding data from the engine will collapse a workflow to nothing.
  – The pathway choice is all about data: the less visible the data, the less choice in the workflow.
  – If a process decides what to do with data, the consequence is:
    • Lose go-back ability
    • Lose track of the data and what is going on
    • Lose plug and play on the process
    • Lose exception management
Engine design examples
• Engine:
  – Read XML, store objects and tasks
  – Run tasks, follow path
  – Start/restart (maybe at go-back point)
  – Exit
• Process task:
  – Send data object requests
  – Run process
  – Get response data objects
• Interface task:
  – Send data objects to interface
  – Wait for interface
  – Send actionable events
  – Get return action from interface
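The engine flow above can be sketched as a loop over a task table. The table layout, the handler names and the log are all assumptions for illustration; real tasks would exchange data objects rather than append to a list.

```python
# Hypothetical task table: each entry names a handler kind and the next task.
TASKS = {
    "T1": {"kind": "process", "next": "T2"},
    "T2": {"kind": "interface", "next": "T3"},
    "T3": {"kind": "process", "next": None},
}


def run_process_task(task_id, log):
    # Send data object requests, run the process, get the response objects.
    log.append((task_id, "process"))


def run_interface_task(task_id, log):
    # Send data objects to the interface, wait, then read back the action.
    log.append((task_id, "interface"))


HANDLERS = {"process": run_process_task, "interface": run_interface_task}


def run_workflow(tasks, start, log):
    """Follow the path from `start` (first task or a go-back point) to exit."""
    current = start
    while current is not None:
        task = tasks[current]
        HANDLERS[task["kind"]](current, log)
        current = task["next"]
    return log


full_run = run_workflow(TASKS, "T1", [])
restart_run = run_workflow(TASKS, "T2", [])
```

Restarting at a go-back point is just a different start argument, which only works if every task's data is visible to the engine.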
John’s requirements 1
• 1) Identify and copy and archive object
  – Object declaration
<wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
  <wf:description>General object to copy</wf:description>
  <wf:location namespace="__old_object" where="DM"/>
</wf:dataObject>
<wf:dataObject dataID="D2" name="dataCopy" type="Object" dependence="D1" mutable="true">
  <wf:description>General object - new copy of data</wf:description>
  <wf:location namespace="__new_object" where="DM"/>
</wf:dataObject>
  – Task declaration
<wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
  <wf:description>Run API task to copy data object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIcopy" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>
Name reference
The actual data
The process – a method within the API
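One way the name in wf:detail could be mapped to a method within the API is a small registry. Everything here is a hypothetical sketch: the store dictionary stands in for the data store ("DM"), and api_method, run_method and the "ok" status are invented for the example.

```python
# Hypothetical in-memory stand-in for the data store ("DM" in the XML).
store = {"__old_object": {"payload": "example payload", "version": 1}}

API_METHODS = {}


def api_method(name):
    """Register a function under the name used in <wf:detail name=...>."""
    def register(function):
        API_METHODS[name] = function
        return function
    return register


@api_method("APIcopy")
def api_copy(inputs, outputs):
    # Copy the single input object into the single output namespace.
    store[outputs[0]] = dict(store[inputs[0]])
    return "ok"


def run_method(name, inputs, outputs):
    if name not in API_METHODS:
        raise KeyError("no API method registered as " + name)
    return API_METHODS[name](inputs, outputs)


status = run_method("APIcopy", ["__old_object"], ["__new_object"])
```

The engine only passes namespaces in and out, so the method stays plug and play while the data remains declared in the workflow.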
John’s requirement 2: make new data version
• Declare data
  – Input D1
  – Output D2
• Declare task
  – Method in API
<wf:dataObjects>
  <wf:dataObject dataID="D1" name="dataToAddNewVersion" type="Object" mutable="true">
    <wf:description>General object to copy</wf:description>
    <wf:location namespace="__object" where="DM"/>
  </wf:dataObject>
  <wf:dataObject dataID="D2" name="dataNewVersion" type="Object" dependence="D1" mutable="true">
    <wf:description>New version of data</wf:description>
    <wf:location namespace="__object" where="DM"/>
  </wf:dataObject>
</wf:dataObjects>
<wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
  <wf:description>Run API task to create a new version of an object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APInewVersion" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>
John’s requirement 3: get version list and show
• Data: 3 objects
  – D1: object target
  – D2: version list
  – D3: which one to use
• Some tasks
  – Get list from API
  – Interface to choose (not shown)
<wf:dataObject dataID="D1" name="dataObjectTarget" type="Object" mutable="false">
  <wf:description>target object to query on</wf:description>
  <wf:location namespace="__object_name" where="DM"/>
</wf:dataObject>
<wf:dataObject dataID="D2" name="VersionList" type="List" mutable="false">
  <wf:description>Return version list</wf:description>
  <wf:location namespace="versionList" where="local"/>
</wf:dataObject>
<wf:dataObject dataID="D3" name="useVersion" type="Integer" mutable="true">
  <wf:description>Version to use</wf:description>
  <wf:location namespace="version" where="WF"/>
</wf:dataObject>
<wf:task taskID="T2" name="requestVersionList" nextTask="T3" breakpoint="false">
  <wf:description>Run API to get the version list of an object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIversionList" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>
John’s requirement 4/5: data selector
• A data object may need additional qualifiers to say what it is.
  – Selector value
  – “selection”
• It is likely that the qualifier will:
  – Need to be a WF class (static) variable
  – Need to be a WF inst (dynamic) variable
<wf:dataObject dataID="D2" name="dataToGetwithQualifier" type="String" mutable="true">
  <wf:description>general object with qualifier</wf:description>
  <wf:location namespace="__object" qualifier="_entity.id=1" where="DM"/>
</wf:dataObject>

<wf:dataObject dataID="D2" name="dataToGetwithQualifier" type="String" mutable="true">
  <wf:description>general object with qualifier</wf:description>
  <wf:location namespace="__object" qualifier="set_entity.type='protein' where entity.id=1" where="DM"/>
</wf:dataObject>
John’s requirement 6: length/size of object
<wf:dataObject dataID="D1" name="dataTarget" type="Object" mutable="false">
  <wf:description>General object to copy</wf:description>
  <wf:location namespace="__object" where="DM"/>
</wf:dataObject>
<wf:dataObject dataID="D2" name="dataLength" type="integer" dependence="D1" mutable="true">
  <wf:description>Length of data object</wf:description>
  <wf:location namespace="dataLength" where="WF"/>
</wf:dataObject>
<wf:process runTime="00:00:04" failTime="00:00:10">
  <wf:detail name="APIObjectSize" type="method" where="API"/>
  <wf:dataObjectsLocation>
    <wf:location dataID="D1" type="input"/>
    <wf:location dataID="D2" type="output"/>
  </wf:dataObjectsLocation>
</wf:process>
Define object and place holder for size value
Run task to input data to function, and return length
John’s requirement 7: format conversion
<wf:dataObjects>
  <wf:dataObject dataID="D1" name="dataObjectPDB" type="Object" mutable="false">
    <wf:description>General object to convert format</wf:description>
    <wf:location namespace="__object" where="DM"/>
  </wf:dataObject>
  <wf:dataObject dataID="D2" name="dataObjectMMCIF" type="Object" dependence="D1" mutable="true">
    <wf:description>New data in different format</wf:description>
    <wf:location namespace="__object" where="DF"/>
  </wf:dataObject>
  <wf:dataObject dataID="D3" name="status" type="string" dependence="D1" mutable="true">
    <wf:description>A status code return</wf:description>
    <wf:location namespace="__object" where="DF"/>
  </wf:dataObject>
</wf:dataObjects>
<wf:task taskID="T2" name="formatChange" nextTask="T9" breakpoint="false">
  <wf:description>Run API task to change the format of data</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIformatChangePDBtoPDBx" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
      <wf:location dataID="D3" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>
Input and output formats
Place holder for status – this might be so intrinsic to all tasks that it should probably be pre-declared and always present
And the API function to do this