NuCalm uses a mix of resource-intentful and procedural APIs, because not all lifecycle operations on applications can be expressed intentfully.
Note: The challenges and use cases of intentful APIs are described in another document.
The following top level entities will exist in Aplos:
Introducing “Locks”
The IntentSpec entity will require additional attributes:
These attributes can be sent in the opaque_data field of the intent_spec by the caller using the create_or_update_intent_spec method. Any component that wishes to perform a transaction (nested or otherwise) that requires a lock on any entity kind will be able to take a lock on the entity in the IntentSpec.
The Intent Gateway will respond to any PUT or POST request on the entity with an appropriate response stating that the entity is locked (423 Locked, RFC 4918 WebDAV: the resource being accessed is locked). On completion of the transaction, the component should update the final spec and unlock the entity in the IntentSpec by clearing all the lock fields. Any GET requests will be handled by Aplos/Aplos Engine with the current spec and status.
The component need not lock all the nested entities when the transaction is initiated. For example, when the app spec is created in the intent_spec entity kind it will be locked, while the nested VM entity will be locked as part of the VM's create_action. The nested entities, too, have to be locked. Once the application create succeeds, the application entity and its associated entities are unlocked together along with the final spec; batch APIs can be used to complete the unlocking operation.
In case of failure, the component shall update the IntentSpec with the spec of the entity as it exists at the point of failure. Entity, spec, and status may or may not match in such cases. The user will then trigger the necessary actions to bring the application to a steady state.
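A minimal sketch of the locking behaviour described above, assuming the IntentSpec carries lock attributes (`locked_by` and `lock_acquired_at` are placeholder names here, not the actual IntentSpec schema):

```python
HTTP_LOCKED = 423  # RFC 4918 (WebDAV): the resource being accessed is locked


def handle_write_request(method, intent_spec):
    """Reject PUT/POST on a locked entity; GET requests are always served."""
    if method in ("PUT", "POST") and intent_spec.get("locked_by"):
        return HTTP_LOCKED, "Entity is Locked"
    return 200, "OK"


def unlock(intent_spec):
    """On transaction completion, clear all lock fields (placeholder names)."""
    for field in ("locked_by", "lock_acquired_at"):
        intent_spec.pop(field, None)
```

With this model, a PUT against `{"locked_by": "app_create_txn"}` returns 423 until the component completes its transaction and calls `unlock`.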
The engine comprises three components: a manager (Jove), a set of workers (Hercules), and a callback listener (Iris). There can be one manager and multiple workers per node. Across nodes, only one manager is active (the leader) at any given point in time. All worker processes in a PCVM connect to the local manager when they are brought up, and that manager maintains the “connected worker pool”. The styx/HTTP endpoint contacts the elected manager for any run-action processing. A request to the engine can be of two types: run an action, or abort an already running action. The engine converts the action, using contextual information available from the application, into workflows processable by the orchestration engine, and starts the run of the corresponding workflows.
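The run/abort dispatch can be sketched roughly as follows; `Manager` stands in for Jove and the worker objects for Hercules, with the real RPC, leader election, and workflow conversion omitted:

```python
class Manager:
    """Illustrative model of the elected manager routing requests to its
    connected worker pool. Not the actual Jove implementation."""

    def __init__(self, workers):
        self.worker_pool = list(workers)   # the "connected worker pool"
        self.running = {}                  # action_uuid -> worker handling it

    def handle(self, request):
        if request["type"] == "run":
            # Pick a worker from the pool (real scheduling policy omitted).
            worker = self.worker_pool[hash(request["action_uuid"]) % len(self.worker_pool)]
            self.running[request["action_uuid"]] = worker
            return worker.run(request)
        elif request["type"] == "abort":
            worker = self.running.pop(request["action_uuid"], None)
            return worker.abort(request) if worker else "unknown action"
        raise ValueError("request type must be 'run' or 'abort'")
```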
Here are the processing steps of request:
AppBlueprint will be derived from the Resource class and all API calls will be procedural in nature. Users will be able to create, edit, and delete app_blueprints. A blueprint can be in various states while being created or edited: Draft, Compiled, Active, InActive, Deleted, Error. Once a blueprint is successfully compiled and verified, it can be launched to create an App.
app_blueprints → app.spec Generation
When the blueprint is launched using POST /app_blueprints/{uuid}/launch, the following things happen:
app.spec parsing and runbook generation
Jove will generate the app spec and runbooks for its entities. It does so by carrying out the following:
Runbook → Epsilon workflow generation
The following steps be carried out by the workflow generator:
Note: The assumption here is that epsilon in PCVM would be able to reach the VMs in all deployments. This needs more discussion with the infra team.
LCA Execution
LCA Execution involves Epsilon and Iris:
Syntax
Ex: @@{my_vm.vm_ip}@@ fetches the “vm_ip” property from the “my_vm” object, which may be any one of the supported object types.
Supported Objects
Property access across applications is not supported for now.
E.g.: @@{Deployment.vm_ip}@@ fetches “vm_ip” from the deployment on which the current script is being executed, without having to know the name of the deployment directly.
Similar to the previous implementation, prefix can be changed.
@@{calm_array_index}@@ | Index of the object within the array
@@{calm_blueprint_name}@@ | Name of the blueprint that the app was created from |
@@{calm_blueprint_uuid}@@ | UUID of the blueprint that the app was created from |
@@{calm_application_name}@@ | Name of the application |
@@{calm_application_uuid}@@ | UUID of the application |
@@{calm_project_name}@@ | Name of the project that the application belongs to |
@@{calm_is_within("time1", "time2")}@@ | Returns '1' if the current time is within the supplied time1–time2 range
A translation layer in calm will expand these macros to either their actual current value or to epsilon macros to fetch their values at run time.
The decision of whether to evaluate a macro at run time or before running will be made based on dependencies, task outargs, and consolidated runbook outargs (in the case of call-runbook tasks).
For example:
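As a hedged illustration (this is not the actual translation-layer code): a macro whose value is already known before the run, such as the blueprint name, can be expanded at build time, while anything that depends on task outargs is left intact for epsilon to expand at run time.

```python
import re

def expand(script, known_values):
    """Expand macros whose value is already known at build time; leave the
    rest untouched so epsilon can expand them at run time."""
    def repl(match):
        name = match.group(1)
        # Unknown names fall through unchanged (deferred to run time).
        return known_values.get(name, match.group(0))
    return re.sub(r"@@\{([^}]+)\}@@", repl, script)
```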
How it works
Pass the script to a Lexer which will have two states:
INITIAL
The only token here is ‘@@{’, which switches the lexer to the ACTIVE state; the rest of the buffer is ignored.
ACTIVE
Continue to grab everything from the buffer until ‘}@@’, which switches the lexer back to the INITIAL state.
The part of the buffer grabbed in the ACTIVE state can be evaluated directly without any special grammar for now; this can be changed later when we need more complex grammars.
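The two-state lexer described above can be sketched as:

```python
def extract_macros(script):
    """Two-state lexer: INITIAL skips until '@@{', ACTIVE grabs until '}@@'."""
    macros, pos, state = [], 0, "INITIAL"
    while pos < len(script):
        if state == "INITIAL":
            start = script.find("@@{", pos)
            if start == -1:
                break                       # no more triggers; rest is ignored
            pos, state = start + 3, "ACTIVE"
        else:  # ACTIVE
            end = script.find("}@@", pos)
            if end == -1:
                break                       # unterminated macro; ignore it
            macros.append(script[pos:end])  # the buffer grabbed in ACTIVE state
            pos, state = end + 3, "INITIAL"
    return macros
```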
Every calm object maps to a corresponding object in Epsilon.
Epsilon entities
Epsilon machines
Properties on Epsilon objects
Entity properties in epsilon are a consolidated list of values for each property on individual elements. Since the machine is shared between multiple elements, namespacing is used to maintain uniqueness of property names.
The namespace for a Calm object is ‘<Calm_Object_Type>_<Calm_Object_Name>_’; for a deployment d1, properties on the epsilon machine would look like ‘Deployment_d1_<prop_name>’.
No namespacing is done for Entity properties, but the namespace is used to filter props belonging to that Group entity when consolidating props from machines.
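A sketch of the namespacing scheme, assuming machine props are held as a simple name-to-value map (an illustrative model, not the epsilon lib API):

```python
def namespaced(obj_type, obj_name, prop_name):
    """Build the namespaced machine-property name for a Calm object."""
    return "%s_%s_%s" % (obj_type, obj_name, prop_name)


def props_for(obj_type, obj_name, machine_props):
    """Filter out the props belonging to one Calm object from a shared
    machine, stripping the namespace prefix back off."""
    prefix = "%s_%s_" % (obj_type, obj_name)
    return {k[len(prefix):]: v for k, v in machine_props.items()
            if k.startswith(prefix)}
```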
Epsilon lib changes for partial updates (PATCH support)
Earlier, the machine and entity APIs only supported a full update. This meant that if two epsilon workflows running in parallel tried to update the same machine at the same time, the last update would overwrite the previous one.
This behaviour has been changed for property updates (Entity props and machine props) by allowing partial updates. These changes are present in epsilon lib, zaffi, and pyepsilon.
Example
Now, if Task 1 and Task 2 were to execute in parallel, the resulting props on the machine would be:
[{"name": "a", "value": "1"}, {"name": "b", "value": "2"}]
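The merge semantics behind this can be sketched as follows; `update_props` here is an illustrative stand-in, not the exact epsilon lib signature:

```python
def update_props(existing, patch):
    """Merge a list of {'name', 'value'} props into the existing list.
    Only the named props are touched, so parallel updates to different
    props both survive instead of the last writer winning."""
    merged = {p["name"]: p["value"] for p in existing}
    merged.update({p["name"]: p["value"] for p in patch})
    return [{"name": n, "value": v} for n, v in sorted(merged.items())]
```

Applying Task 1's patch for `a` and Task 2's patch for `b` in either order yields both props on the machine.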
The update_props API and namespacing make it possible to share an Epsilon machine/entity between multiple Calm objects.
Macro Translation
To take care of property inheritance, namespacing, and dependencies we need to translate all macros to Epsilon readable macros.
How and where it happens
When building the workflow for any action (action.build_wf()), macro translation involves the following:
Macro context
The Macro context contains the relationship between calm objects. This is how it looks:
Substrates sit above deployments because all objects inherit the substrate's properties.
Parser
The parser has two modes: the first is a lexer switch that consumes everything in the buffer until a trigger (‘@@{’) is encountered, and the second is a parser that parses everything from ‘@@{’ up to ‘}@@’ to generate our AST. Everything other than properties, built-in macros, and dependencies is ignored; Epsilon's macro parser will take care of the rest.
Translation of properties
Epsilon does not understand Calm's object model and property inheritance. Therefore, for a task running on a certain service that needs to access a property set by a task that ran on the substrate, for example @@{ip_address}@@, the macro needs to be translated to @@{Substrate_<substrate_name>_ip_address}@@ for epsilon to correctly expand it to the relevant value coming from the substrate.
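A hypothetical sketch of this translation; the owner-chain representation below is illustrative, not the actual macro-context structure:

```python
def translate_property(prop, owner_chain):
    """Rewrite a bare property macro with the namespace of the object that
    actually owns it, walking the macro context from nearest to farthest
    ancestor. owner_chain: [(obj_type, obj_name, set_of_prop_names)]."""
    for obj_type, obj_name, props in owner_chain:
        if prop in props:
            return "@@{%s_%s_%s}@@" % (obj_type, obj_name, prop)
    return "@@{%s}@@" % prop  # unknown owner; leave for epsilon as-is
```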
Translation of dependencies
Although entities inherit properties, a dependency from one Calm object to another does not generate any parent-child relationship on the corresponding epsilon entities, which is why macros need to be translated to something epsilon can evaluate, i.e. AZ_LIST(Entity(uuid=<uuid>).get(Property("<prop_name>"))).
Dependencies in nuCalm only hold for system actions. They can be expressed in three ways:
Dependency Types
Inherent dependencies:
These are inherent to the model and need no specification. For example, the substrate has to be created before packages and services, and services have to be stopped before a package is uninstalled. In terms of dependencies, this translates into services depending on their packages, with both depending on the underlying substrate. These dependencies are inherent to the system and used by system actions.
Explicit dependencies:
- They are expressed by depends_on list in the config section of different calm entities.
- Used only by system actions.
For example, theoretically the depends_on list of S2 could be [S1, SUB10, DEP4], where S1 and S2 are part of the same deployment, SUB10 is part of another deployment, and DEP4 is yet another deployment. In practice, however, the depends_on list of a service will contain only other services, because the service is the logical unit we want to expose.
In the application context, S2 can be created/started only after service S1, substrate SUB10, and DEP4 are created/started, and should be deleted/stopped before the other three entities are deleted/stopped.
In terms of system actions on the application, this means the create runbooks of S1, SUB10, and DEP4 will run before the create runbook of S2. A dependency edge from S2 to S1 translates into an orchestration edge from the create/start CallRunbookTask of S1 to the CallRunbookTask of S2.
When a system action is run in the context of a deployment, only the entities in the depends_on list which are a part of this deployment are used.
When an action is run in the context of the service S2 alone, these dependencies do not hold, and none of them will be enforced.
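The derivation of orchestration edges from explicit depends_on lists, restricted to the entities in scope for the current context, can be sketched as (illustrative only):

```python
def orchestration_edges(depends_on, in_scope=None):
    """depends_on: {entity: [entities it depends on]}. An edge (A, B)
    means A's CallRunbookTask runs before B's. If in_scope is given
    (e.g. one deployment's entities), out-of-scope deps are dropped."""
    edges = []
    for entity, deps in depends_on.items():
        for dep in deps:
            if in_scope is None or (dep in in_scope and entity in in_scope):
                edges.append((dep, entity))  # dep's runbook runs first
    return sorted(edges)
```

Using the S2 example above: all three dependencies yield edges at the application level, but only the S1 edge survives when running in the context of S1/S2's deployment.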
Implicit dependencies calculated by the usage of macros in tasks:
These dependencies are created within the context of a system action. Tasks can use variables and certain attributes from other entities. Mere usage of a variable in a task does not translate to a dependency; only when a macro used in one task is set by another task in the same action does it translate into a dependency, and into an orchestration edge (the reverse of the dependency edge) in the action. The getter and setter of a variable have to be part of the same action.
These dependencies, too, are possible only in system actions. A dependency between tasks that are not in the same callrunbooktask translates into an orchestration edge between the callrunbooktasks they are part of. Edges between tasks across callrunbooktasks are not possible in system actions.
So when a dependency is created between tasks T1 (getter) and T2 (setter) that are part of the same callrunbooktask, the orchestration edge is created between T1 and T2 (T2 → T1).
When a dependency is created between two tasks T1 and T2 that are part of different callrunbooktasks CT1 and CT2, we have to traverse up the chain until we reach the first callrunbooktask and create an orchestration edge CT2 → CT1.
The implicit dependencies between callrunbooktasks of a create action will be promoted to the depends_on list so these can be used by other system actions.
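The traversal described above can be sketched as (an illustrative model, not the actual implementation):

```python
def edge_for(getter, setter, parent, callrunbooktasks):
    """Promote a task-level dependency to an orchestration edge.
    parent: child -> parent task map; callrunbooktasks: set of CRT names.
    Returns (runs_first, runs_second)."""
    def enclosing_crt(task):
        # Walk up the parent chain until the first callrunbooktask.
        while task is not None and task not in callrunbooktasks:
            task = parent.get(task)
        return task

    g_crt, s_crt = enclosing_crt(getter), enclosing_crt(setter)
    if g_crt == s_crt:
        return (setter, getter)   # same callrunbooktask: T2 -> T1
    return (s_crt, g_crt)         # across callrunbooktasks: CT2 -> CT1
```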
Dependencies as presented to the user (status sections)
Dependencies in system actions are presented to the user as a dependency list. This list is used to show dependency edges in the UI; it plays no role in generating the orchestration edges.
When actions are generated, the use of setter and getter tasks translates into orchestration edges between the parent callrunbooktasks, and the dependency_list is generated for use in the topology view.
Dependencies and System generated Actions
We generate actions from different spec params which can change independently, and the generated action itself can be edited independently, so we have to impose some restrictions on this flexibility. A change to an action's edges marks the action as user-edited. A change to the dependency list rebuilds the edges and marks the action as system-generated.
To summarize: if the dependency list changes after the user has edited any of the generated actions, the user-edited actions are thrown away and the system rebuilds all actions.
Dependencies and User Actions
No orchestration or dependency edges will be created based on any type of dependency - inherent, explicit or implicit.
In a user action, the use of setter and getter tasks does not translate to orchestration edges between those tasks; the user has to draw an edge explicitly.
API Format
Generic Secret String
Any “<secret string>” type in a spec should follow the structure below:
{
    "<secret_key_name>": {  # e.g. password in credentials, secret_access_key in accounts
        "attrs": {
            "is_secret_modified": "<boolean>",
            "secret_reference": {  # reference section to handle secrets exposed as top-level entities
                "uuid": "<uuid>"
            }
        },
        "value": "string"  # used only for POST/PUT; during PUT, the modified bit in attrs should be set to True. A GET call does not fill in the value and resets the modified bit to False.
    }
}
Later, if /secrets (or equivalent) are exposed as a top-level entity in Aplos, uuid in secret_reference can be used to update the secrets independently.
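The GET/PUT semantics of the modified bit described above can be sketched as (illustrative only; the real Aplos handlers are not shown here):

```python
def on_get(secret):
    """A GET never returns the stored value and resets the modified bit."""
    out = {"attrs": dict(secret["attrs"]), "value": ""}
    out["attrs"]["is_secret_modified"] = False
    return out


def on_put(stored_value, payload):
    """A PUT overwrites the stored secret only when the modified bit is
    set; otherwise the (empty) value field in the payload is ignored."""
    if payload["attrs"].get("is_secret_modified"):
        return payload["value"]
    return stored_value
```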
Secret Variable
Secret variables are supported in nuCalm. Any secret variable should follow the below structure:
{
"uuid": "<uuid>",
"description": "string",
"label": "string",
"name": "string",
"type": "secret",
"attrs": {
"is_secret_modified": "<boolean>",
"secret_reference": {
"uuid": "<uuid>",
},
},
"val_kind": "string",
"var_type": "local",
"value": "",
}
The attrs field in variables is used to store secret-related attributes. All secret variables should be available as macros, like any other variables.
Credential
A credential is used in nuCalm to store sensitive information such as passwords and keys. These details are used to connect to a user or third-party system. From the AppSpec doc, a credential is defined as follows:
name: <string>
uuid: <uuid>
type: <enum> # passwd, key
username:
secret: <secret string>
where secret follows the generic secret type mentioned above. So, an expanded view would look like:
name: <string>
uuid: <uuid>
type: <enum> # passwd, key
username:
secret:
value: <string>
attrs:
is_secret_modified: <boolean>
secret_reference:
uuid: <uuid>
The username and secret fields in a credential are available as macros; a user should be able to use @@{<credential_name>.username}@@ and @@{<credential_name>.secret}@@.
Storing Secrets
All secrets should ideally be stored using a vault-like service. As such a service does not exist yet, as a starting point all secrets should be encrypted (AES) and stored in a separate collection in the DB. Some points to consider while storing secrets:
Packaging Mechanism - RPM Packages
The following RPM packages will be built as part of the nucalm continuous integration process:
These packages would be stored in a private yum repository.
Delivery Mechanism - Docker Container Images
Docker images will be built using Dockerfiles that pull the respective RPM packages built in the previous step. These images derive from a base container image provided by Nutanix infrastructure, and are pushed to a private Docker registry for consumption.
Dependencies
Services required by nucalm:
Services dependent on nucalm:
nucalm-engine and epsilon will register with service discovery when they are run. Similarly, the dependent services will be discovered using the platform's service discovery mechanism (the current assumption is that it will use ZooKeeper).