Node in the Network

Node's entire lifetime

This section describes the flow a node will follow during its entire lifetime.
When they join the network, they won't yet be a member of any section.
First, they will have to bootstrap with their proxy node, receive a RelocateInfo and attempt to join the section that this RelocateInfo points to.
Once they have joined a section, they will be able to operate as a full section member.

OWN_SECTION refers to this node's own section.
It is an Option<Prefix>.
While a node is being relocated, the value is None. Once they get accepted into a section, it becomes Some(prefix), holding that section's prefix.
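
As a rough Rust sketch, the whole lifetime can be pictured as a loop around this Option. All types and routine names below are hypothetical stand-ins for the boxes in the diagram, not the real routing API:

```rust
// Hypothetical stand-ins for the real routing types.
struct Prefix;
struct RelocateInfo;

struct Node {
    /// OWN_SECTION: None while relocating, Some(prefix) once accepted.
    own_section: Option<Prefix>,
}

impl Node {
    fn run(&mut self) {
        loop {
            let relocate_info = match self.own_section.take() {
                // OWN_SECTION.is_none(): bootstrap via a proxy to obtain a
                // RelocateInfo pointing at the section we should join.
                None => self.bootstrap_and_relocate(),
                // OWN_SECTION.is_some(): act as a full member until our
                // section relocates us away.
                Some(_) => self.start_top_member(),
            };
            // Rebootstrap with a new relocated identity, then attempt to
            // join the target section; on failure this is None and we
            // bootstrap again on the next iteration.
            let info = self.rebootstrap_with_new_identity(relocate_info);
            self.own_section = self.joining_relocate_candidate(info);
        }
    }

    fn bootstrap_and_relocate(&mut self) -> RelocateInfo { unimplemented!() }
    fn start_top_member(&mut self) -> RelocateInfo { unimplemented!() }
    fn rebootstrap_with_new_identity(&mut self, _i: RelocateInfo) -> RelocateInfo { unimplemented!() }
    fn joining_relocate_candidate(&mut self, _i: RelocateInfo) -> Option<Prefix> { unimplemented!() }
}
```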

This function gets called when a node has just joined the Network.
At this stage, we are connected to a proxy node and they indicate to us which section we should join.

Once a node has joined a section (indicated by OWN_SECTION.is_some()), they will be able to perform as a member of that section until they are relocated away from it.
See StartTopMember graph for details.

First, create a new identity with a public key pair.
The node connects to a proxy with this new identity, which it will use to join the new section as a full node.

Once a node knows where to be relocated, they will follow this flow to become a full member of the section.
This covers resource proof from the point of view of the node being resource-proofed.
The output of this function is an Option. If we fail resource proof, it is None, which means we will have to bootstrap again. If it is Some, it contains the RelocateInfo we need to join this section.
See JoiningRelocateCandidate graph for details.

graph TB Start --> LoopStart style Start fill:#f9f,stroke:#333,stroke-width:4px LoopStart --> HasSection HasSection(("Check?")) HasSection --"OWN_SECTION.is_none()"--> BootstrapAndRelocate BootstrapAndRelocate["BootstrapAndRelocate:
Get RelocateInfo"] BootstrapAndRelocate --> ReBootstrapWithNewIdentity HasSection --"OWN_SECTION.is_some()"--> StartTopMember StartTopMember["StartTopMember
"] style StartTopMember fill:#f9f,stroke:#333,stroke-width:4px StartTopMember --> ReBootstrapWithNewIdentity ReBootstrapWithNewIdentity["Rebootstrap
with new relocated identity
output: RelocateInfo"] style ReBootstrapWithNewIdentity fill:#f9f,stroke:#333,stroke-width:4px ReBootstrapWithNewIdentity --> JoiningRelocateCandidate JoiningRelocateCandidate["JoiningRelocateCandidate(RelocateInfo)
output: JoiningApproved"] style JoiningRelocateCandidate fill:#f9f,stroke:#333,stroke-width:4px SetNodeApproaval["OWN_SECTION=JoiningApproved"] JoiningRelocateCandidate --> SetNodeApproaval SetNodeApproaval --> LoopEnd LoopEnd --> LoopStart

Becoming a full member of a section

This is from the point of view of a node trying to join a section as a full member.
This node is going to try to be accepted as a candidate until it receives a NodeConnected RPC, which completes this stage successfully.
This node is going to perform the resource proof until it receives a NodeApproval RPC, which completes this stage successfully.
If the node is not accepted, it will time out and then try another section.
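
Before the diagram, here is a hedged sketch of this loop, assuming hypothetical Event and Rpc enums that mirror the arrows below; the real message and timer types differ:

```rust
enum Rpc {
    NodeApproval,
    NodeConnected,
    ConnectionInfoRequest,
    ResourceProof,
    ResourceProofReceipt,
    Other, // ExpectCandidate, RefuseCandidate, RelocateResponse...
}

enum Event {
    Rpc(Rpc),
    ResourceProofForElderReady,
    TimeoutResendInfo,
    TimeoutRefused,
}

fn joining_relocate_candidate(mut next_event: impl FnMut() -> Event) -> bool {
    let mut connected = false;
    // send_rpc(CandidateInfo) to the target NaeManager;
    // schedule(TimeoutResendInfo); schedule(TimeoutRefused);
    loop {
        match next_event() {
            // Success: we are approved as a member of the section.
            Event::Rpc(Rpc::NodeApproval) => return true,
            // Failure: not accepted in time; the caller tries another section.
            Event::TimeoutRefused => return false,
            Event::Rpc(Rpc::NodeConnected) => {
                connected = true; // kill_scheduled(TimeoutRefused)
            }
            Event::TimeoutResendInfo if !connected => {
                // send_rpc(CandidateInfo); schedule(TimeoutResendInfo)
            }
            Event::TimeoutResendInfo => {} // already connected: nothing to do
            Event::Rpc(Rpc::ConnectionInfoRequest) => {
                // send_rpc(ConnectionInfoResponse)
            }
            Event::Rpc(Rpc::ResourceProof) => {
                // start_compute_resource_proof(source elder);
                // schedule(TimeoutRefused)
            }
            Event::ResourceProofForElderReady => {
                // send_rpc(ResourceProofResponse) with the first part
            }
            Event::Rpc(Rpc::ResourceProofReceipt) => {
                // send_rpc(ResourceProofResponse) with the next part
            }
            Event::Rpc(Rpc::Other) => {
                // vote_for(the parsec rpc) and cache it for later
            }
        }
    }
}
```
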
graph TB JoiningRelocateCandidate --> InitialSendConnectionInfoRequest JoiningRelocateCandidate["JoiningRelocateCandidate
(Take destination section nodes)"] style JoiningRelocateCandidate fill:#f9f,stroke:#333,stroke-width:4px EndRoutine["End of JoiningRelocateCandidate
"] style EndRoutine fill:#f9f,stroke:#333,stroke-width:4px InitialSendConnectionInfoRequest["send_rpc(CandidateInfo) to target NaeManger

schedule(TimeoutResendInfo)
schedule(TimeoutRefused)"] InitialSendConnectionInfoRequest-->LoopStart LoopStart --> WaitFor WaitFor(("Wait for 6:")) LocalEvent((Local
Event)) WaitFor --> LocalEvent LocalEvent -- ResourceProofForElderReady --> SendFirstResourceProofPartForElder SendFirstResourceProofPartForElder["send_rpc(
ResourceProofResponse)
with first part for elder"] SendFirstResourceProofPartForElder --> LoopEnd LocalEvent--"TimeoutResendInfo triggered
CONNECTED==false"--> ResendCandidateInfo ResendCandidateInfo["send_rpc(CandidateInfo)

schedule(TimeoutResendInfo)"] ResendCandidateInfo --> LoopEnd LocalEvent--"TimeoutRefused
triggered"--> EndRoutine Rpc((RPC)) WaitFor --> Rpc Rpc -- NodeApproval --> EndRoutine Rpc -- NodeConnected --> NodeConnected NodeConnected["CONNECTED=true

kill_scheduled(TimeoutRefused)"] NodeConnected-->LoopEnd Rpc -- ConnectionInfoRequest --> OnConnectionInfoRequest OnConnectionInfoRequest["send_rpc(
ConnectionInfoResponse)"] OnConnectionInfoRequest-->LoopEnd Rpc -- ResourceProofReceipt --> SendNextResourceProofPartForElder SendNextResourceProofPartForElder["send_rpc(
ResourceProofResponse)
part for elder"] SendNextResourceProofPartForElder --> LoopEnd Rpc -- ResourceProof --> StartComputeResourceProofForElder StartComputeResourceProofForElder["start_compute_resource_proof(source elder)

schedule(TimeoutRefused)"] StartComputeResourceProofForElder --> LoopEnd Rpc -- "ExpectCandidate
RefuseCandidate
RelocateResponse
..." --> VoteParsecRPC VoteParsecRPC["vote_for(the parsec rpc)
(cache for later)"] VoteParsecRPC --> LoopEnd LoopEnd --> LoopStart

Node as a member of a section

Once a node has joined a section, they need to be ready to take on multiple roles simultaneously:

All of these flows happen simultaneously, but they share a common event loop. At any time, either all flows, or all but one of them, must be in a "wait" state.
If our section decides to relocate us, we will have to stop functioning as a member of our section and go back to the previous flow, where we will "Rebootstrap" so we can become a member of a different section.
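
One way to picture this, as a sketch with a hypothetical Flow trait standing in for the sub-routines: a single event loop offers each event to every flow, so every flow except the one currently handling an event remains in its "wait" state.

```rust
struct Event;

trait Flow {
    /// Returns true if this flow consumed the event.
    fn try_handle(&mut self, event: &Event) -> bool;
}

fn top_member_loop(flows: &mut [Box<dyn Flow>], mut next_event: impl FnMut() -> Event) {
    loop {
        let event = next_event();
        // Offer the event to each concurrent flow in turn; all other
        // flows simply stay waiting.
        for flow in flows.iter_mut() {
            if flow.try_handle(&event) {
                break;
            }
        }
    }
}
```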

graph TB style StartTopMember fill:#f9f,stroke:#333,stroke-width:4px style EndRoutine fill:#f9f,stroke:#333,stroke-width:4px StartTopMember --> InitializeNodeInternalState InitializeNodeInternalState["initialise_node_internal_state()

(Parsed, Routing table...)"] InitializeNodeInternalState --> ConcurentStartElder ConcurentStartElder{"Concurrent
start elder"} ConcurentStartElder --> ConcurentStartSrc ConcurentStartSrc{"Concurrent
start src"} ConcurentStartSrc --> StartTopLevelSrc style StartTopLevelSrc fill:#f9f,stroke:#333,stroke-width:4px ConcurentStartSrc --> StartRelocateSrc style StartRelocateSrc fill:#f9f,stroke:#333,stroke-width:4px ConcurentStartElder --> ConcurentStartDst ConcurentStartDst{"Concurrent
start dst"} ConcurentStartDst --> StartTopLevelDst style StartTopLevelDst fill:#f9f,stroke:#333,stroke-width:4px ConcurentStartDst --> StartRelocatedNodeConnection StartRelocatedNodeConnection[Start RelocatedNodeConnection] style StartRelocatedNodeConnection fill:#f9f,stroke:#333,stroke-width:4px ConcurentStartDst --> StartResourceProof StartResourceProof[Start ResourceProof] style StartResourceProof fill:#f9f,stroke:#333,stroke-width:4px ConcurentStartElder --> StartCheckAndProcessElderMergeChange style StartCheckAndProcessElderMergeChange fill:#f9f,stroke:#333,stroke-width:4px ConcurentStartElder --> CheckOnlineOffline style CheckOnlineOffline fill:#f9f,stroke:#333,stroke-width:4px ConcurentStartElder --> WaitFor WaitFor(("Wait for 0:")) Rpc((RPC)) WaitFor --> Rpc Rpc -- "RelocatedInfo RPC"--> EndRoutine EndRoutine["EndRoutine: Kill all sub routines"]

Destination section

As a member of a section, our section will sometimes receive a node that is being relocated. These diagrams are from the point of view of one of the nodes in the section, doing its part to handle the node that is trying to relocate to this section.

Deciding when to accept an incoming relocation

This flow represents what we do when a section contacts us to relocate one of their nodes to our section.
The process starts as we receive an ExpectCandidate RPC from this node.
We vote for it in PARSEC to be sure all members of the section process it in the same order.
Once it reaches consensus, we are ready to process that candidate by letting them connect (see RelocatedNodeConnection) and then perform the resource_proof (see ResourceProof).
There are some subtleties, such as the fact that we only want to process one candidate at a time, but this is the general idea.

We receive this RPC from a section that wants to relocate a node to our section.
The node is not communicating with us yet; it will only do so once we have sent a RelocateResponse RPC to the originating section.
On receiving it, we vote for ParsecExpectCandidate to process it in the same order as other members of our section.
It kickstarts the entire chain of events in this diagram.
Note that we could also see consensus on ParsecExpectCandidate before we ourselves voted for it in PARSEC, as long as enough members of our section did.

If we know of a section that has a shorter prefix than ours, we prefer for them to receive this incoming node rather than ourselves, as it helps keep the Network's section tree balanced.
shorter_prefix_section is a function that returns None if our prefix is the shortest of any section we know, and Some if there is a better candidate.
If it is Some, we will relay the RPC to them instead of accepting the relocation into our own section.
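
A minimal sketch of shorter_prefix_section, assuming a hypothetical Prefix type whose bit length is exposed as len():

```rust
#[derive(Clone)]
struct Prefix {
    bits: Vec<bool>,
}

impl Prefix {
    fn len(&self) -> usize {
        self.bits.len()
    }
}

/// Returns a known section whose prefix is strictly shorter than ours
/// (the relay target for ExpectCandidate), or None if our own prefix
/// is already among the shortest we know of.
fn shorter_prefix_section(ours: &Prefix, known_sections: &[Prefix]) -> Option<Prefix> {
    known_sections
        .iter()
        .filter(|p| p.len() < ours.len())
        .min_by_key(|p| p.len())
        .cloned()
}
```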

We want to accept at most one incoming relocation at a time into our section.
The waiting_proofing function returns the list of nodes that we have yet to resource proof (states from WaitingCandidateInfo until they reach OnlineState).
When waiting_proofing() is not empty and we reach consensus on ParsecExpectCandidate, we send a RefuseCandidate RPC to the would-be incoming node so they can try another section or try again later.
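
Putting the three branches together, here is a sketch of the decision taken on ParsecExpectCandidate consensus (all types are hypothetical; it mirrors the branches in the diagram below):

```rust
struct Prefix;
struct CandidateId;

enum Decision {
    /// Keep the section tree balanced: relay ExpectCandidate onwards.
    RelayToShorterSection(Prefix),
    /// Already proofing someone: send RefuseCandidate to the node.
    Refuse,
    /// Accept: add_node(WaitingCandidateInfo) and send RelocateResponse.
    Accept,
}

fn on_parsec_expect_candidate(
    shorter_prefix_section: Option<Prefix>,
    waiting_proofing: &[CandidateId],
) -> Decision {
    match shorter_prefix_section {
        Some(section) => Decision::RelayToShorterSection(section),
        None if !waiting_proofing.is_empty() => Decision::Refuse,
        None => Decision::Accept,
    }
}
```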

graph TB Start["StartTopLevelDst:
No exit - Need Killed"] style Start fill:#f9f,stroke:#333,stroke-width:4px Start --> LoopStart LoopEnd --> LoopStart LoopStart --> WaitFor WaitFor((Wait for 1:)) WaitFor --RPC--> RPC WaitFor --Parsec
consensus--> ParsecConsensus RPC((RPC)) RPC --ExpectCandidate--> VoteParsecExpectCandidate ParsecConsensus((Parsec
consensus)) ParsecConsensus --ParsecExpectCandidate--> Balanced VoteParsecExpectCandidate["vote_for(
ParsecExpectCandidate)
to handle the RPC consistently"] VoteParsecExpectCandidate --> LoopEnd Balanced(("Check?
(shared state)")) Balanced -- "shorter_prefix_section(
).is_some()" --> Resend Balanced -- "shorter_prefix_section().is_none()
!waiting_proofing().is_empty()" --> SendRefuse Balanced -- "shorter_prefix_section().is_none()
waiting_proofing().is_empty()" --> SendRelocateResponse Resend["send_rpc(
ExpectCandidate)
to shorter_prefix_section()"] Resend --> LoopEnd SendRefuse["send_rpc(
RefuseCandidate)
to source section"] SendRefuse --> LoopEnd SendRelocateResponse["add_node(NodeState=WaitingCandidateInfo)

send_rpc(RelocateResponse)
to source section"] SendRelocateResponse --> LoopEnd

Relocated node connection

Manage nodes with NodeState=WaitingConnectionInfo and the connection info request/response RPCs.
The candidate sends CandidateInfo to the NaeManager of the target interval, so it will reach the managing section even after a split/merge.
This target interval address could be the middle address of the interval, but the full interval should not overlap two sections after a split.
When we complete, we either stop responding to CandidateInfo RPCs on failure, or send a NodeConnected RPC on success.
(We could also omit this RPC; the node would continue sending CandidateInfo until ResourceProof or a timeout.)
When the node reaches the WaitingProofing state, the section becomes responsible for managing communication with this node as it would for any of its adults.

Returns true if the given CandidateInfo is valid, matches one of our nodes in the WaitingCandidateInfo state,
and the message_src of the current RPC matches the CandidateInfo's new_public_id.

Returns true if the message_src of the current RPC is one of our nodes and that node is in the WaitingConnectionInfo state.
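
A rough sketch of these two checks, with hypothetical node and state types; validating the CandidateInfo itself (signatures etc.) is elided, and the peer list is assumed to be keyed by the candidate's old id:

```rust
#[derive(PartialEq)]
enum NodeState {
    WaitingCandidateInfo,
    WaitingConnectionInfo,
}

#[derive(PartialEq)]
struct PublicId(u64);

struct CandidateInfo {
    old_public_id: PublicId,
    new_public_id: PublicId,
}

struct Peer {
    id: PublicId,
    state: NodeState,
}

fn is_valid_waited_info(peers: &[Peer], info: &CandidateInfo, message_src: &PublicId) -> bool {
    *message_src == info.new_public_id
        && peers
            .iter()
            .any(|p| p.id == info.old_public_id && p.state == NodeState::WaitingCandidateInfo)
}

fn is_valid_waited_connection(peers: &[Peer], message_src: &PublicId) -> bool {
    peers
        .iter()
        .any(|p| p.id == *message_src && p.state == NodeState::WaitingConnectionInfo)
}
```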

Provides an external entry point to reset the currently processed nodes: do not reject a node because it took longer than expected.
This will be called, for example, after a merge/split, as the new nodes will become voters.

graph TB RelocatedNodeConnection["Start RelocatedNodeConnection"] style RelocatedNodeConnection fill:#f9f,stroke:#333,stroke-width:4px RelocatedNodeConnection --> StartCheckRelocatedNodeConnectionTimeout StartCheckRelocatedNodeConnectionTimeout["schedule(
CheckRelocatedNodeConnectionTimeout)"] StartCheckRelocatedNodeConnectionTimeout --> LoopStart WaitFor(("Wait for 2:")) LoopStart-->WaitFor WaitFor -- Consensus--> ParsecConsensus ParsecConsensus((Consensus)) ParsecConsensus -- "ParsecCheckRelocatedNodeConnection" --> CleanCandidates CleanCandidates["for candidate in both
-waiting_node_connecting()
-CANDIDATES:

purge_node_info(candidate)"] CleanCandidates --> ReStartCheckRelocatedNodeConnectionTimeout ReStartCheckRelocatedNodeConnectionTimeout["CANDIDATES=waiting_node_connecting()

schedule(
CheckRelocatedNodeConnectionTimeout)"] ReStartCheckRelocatedNodeConnectionTimeout --> LoopEnd ParsecConsensus -- "ParsecCandidateInfo
and
is_valid_waited_info(
CandidateInfo,
message_src)" --> SetCandidateInfo SetCandidateInfo["update_node(
CandidateInfo,
NodeState=WaitingConnectionInfo)"] SetCandidateInfo --> LoopEnd ParsecConsensus -- "ParsecCandidateConnected
and
is_valid_waited_connection(
message_src)" --> SetConnected SetConnected["set_node_state(
candidate,
WaitingProofing)

send_rpc(
NodeConnected)

(All communications now
managed by elder like
for other adults)"] SetConnected --> LoopEnd RPC((RPC)) WaitFor --"RPC"--> RPC RPC -- "CandidateInfo
and
none of the other path" --> DiscardRPC DiscardRPC[Discard RPC] DiscardRPC --> LoopEnd RPC -- "CandidateInfo
and
is_valid_waited_info(
CandidateInfo)" --> VoteForCandidateInfo VoteForCandidateInfo["vote_for(
ParsecCandidateInfo)"] VoteForCandidateInfo --> LoopEnd RPC -- "CandidateInfo
and
is_valid_waited_connection(
message_src)" --> SendConnectionInfoRequest SendConnectionInfoRequest["send_rpc(
ConnectionInfoRequest)"] SendConnectionInfoRequest --> LoopEnd RPC -- "ConnectionInfoResponse
and
is_valid_waited_connection(
message_src)" --> ConnectCandidate ConnectCandidate["connect_to_candidate(
ConnectionInfoResponse info)

vote_for(
ParsecCandidateConnected)"] ConnectCandidate --> LoopEnd WaitFor --Event--> Event Event((Event)) VoteParsecCheckRelocatedNodeConnection["vote_for(
ParsecCheckRelocatedNodeConnection)"] Event -- CheckRelocatedNodeConnectionTimeout
expire --> VoteParsecCheckRelocatedNodeConnection VoteParsecCheckRelocatedNodeConnection --> LoopEnd LoopEnd --> LoopStart
graph TB Reset["RelocatedNodeConnection_Reset"] style Reset fill:#19f,stroke:#333,stroke-width:4px EndReset["End RelocatedNodeConnection_Reset"] style EndReset fill:#19f,stroke:#333,stroke-width:4px Reset --> ClearCandidates ClearCandidates["CANDIDATES.clear()

(Give time for new elder to catch up)"] ClearCandidates --> EndReset

Resource proof from a destination's point of view

In the previous diagram, we ensured an incoming candidate is connected and in our peer_list with state WaitingProofing, and elders are ready to communicate with it.
This is maintained across merge/split as for any adult node, so we are ready to resource proof any node in the WaitingProofing state.

This leads us here: to the resource proof.
We only process one resource proof at a time (only schedule a new proofing when completing the current one).
When we periodically decide to resource proof a node, we check if any node is ready for it (WaitingProofing state) and pick the best candidate (there may be multiple after a merge).
As an elder, I will send the candidate a ResourceProof RPC. This gives them the "problem to solve". As they solve it, they will send me ResourceProofResponses. These will be parts of the proof. On receiving valid parts, I must send a ResourceProofReceipt. Once they finally send me the last valid part, they passed their resource proof and I vote for ParsecOnline (essentially accepting them as a member of my section).
At any time during this process, they may time out (the whole process is taking longer than expected), in which case I will decide to reject them and vote for ParsecPurgeCandidate.
This process ends once I reach consensus on either accepting the candidate (ParsecOnline) or refusing them (ParsecPurgeCandidate).
It is possible that both reach quorum consensus, in which case the second one will be discarded in the flow above. This won't cause issues, as consistency is the only property that matters here: if we accept someone who then goes Offline, we will be able to detect it later with the standard Offline detection mechanism. But it is more likely that they simply took close to the time limit to complete their proof.

Option<(Candidate, Nonce)>: the candidate we are currently resource proofing.
To handle re-doing resource proof after merge/split so we can fairly measure a node, we use the Nonce to differentiate the tries.

Once we've voted a node online, we don't care to handle further ResourceProofResponses from them.
This local variable helps us with this.
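
As a sketch, these two local variables could be grouped like this (types hypothetical):

```rust
struct CandidateId(u64);
type Nonce = u64;

struct ResourceProofFlow {
    /// CANDIDATE: who we are currently resource proofing, if anyone; the
    /// Nonce distinguishes retries of the proof after a merge/split.
    candidate: Option<(CandidateId, Nonce)>,
    /// VOTED_ONLINE: set once we voted ParsecOnline for CANDIDATE, so
    /// further ResourceProofResponses from them can be discarded.
    voted_online: bool,
}
```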

These RPCs may continue to be sent by a node we have not yet accepted, even after consensus was reached to add it.
It's OK to discard the RPCs in this case, as they are no longer relevant.

The same node could be accepted by some nodes who would vote ParsecOnline, but also time out for some other nodes who would vote for ParsecPurgeCandidate.
If that is the case, we only want to process the first of these two events and discard the other one.

Provides an external entry point to cancel the currently processed nodes: restart the resource proofing with all involved voters.
This will be called, for example, after a merge/split, as the new nodes will become voters.

graph TB ResourceProof["Start ResourceProof"] style ResourceProof fill:#f9f,stroke:#333,stroke-width:4px ResourceProof --> StartCheckResourceProofTimeout StartCheckResourceProofTimeout["schedule(
CheckResourceProofTimeout)"] StartCheckResourceProofTimeout --> LoopStart WaitFor(("Wait for 2:")) LoopStart-->WaitFor WaitFor -- Consensus--> ParsecConsensus ParsecConsensus((Consensus)) DiscardParsec["Discard
Parsec
event"] ParsecConsensus --ParsecOnline
ParsecPurgeCandidate
not for CANDIDATE--> DiscardParsec DiscardParsec --> LoopEnd ParsecConsensus -- ParsecPurgeCandidate
for CANDIDATE --> RemoveNode ParsecConsensus -- ParsecOnline
for CANDIDATE --> MakeOnline MakeOnline["set_node_state(
Candidate,
OnlineState)

send_rpc(
NodeApproval)"] RemoveNode["purge_node_info(
candidate node)"] RemoveNode --> ReStartCheckResourceProofTimeout MakeOnline --> ReStartCheckResourceProofTimeout ParsecConsensus -- "ParsecCheckResourceProof" --> SetCandidate SetCandidate["CANDIDATE=resource_proof_candidate()

(Best node with state=WaitingProofing)"] SetCandidate -->CheckRequestRP CheckRequestRP((Check)) CheckRequestRP --"CANDIDATE.is_none()" --> ReStartCheckResourceProofTimeout ReStartCheckResourceProofTimeout["CANDIDATE=None
VOTED_ONLINE=no

schedule(
CheckResourceProofTimeout)"] ReStartCheckResourceProofTimeout --> LoopEnd CheckRequestRP --"CANDIDATE.is_some()"-->RequestRP RequestRP["send_rpc(
ResourceProof)
to CANDIDATE

schedule(TimeoutAccept)"] RequestRP --> LoopEnd RPC((RPC)) WaitFor --RPC--> RPC RPC -- ResourceProofResponse
from CANDIDATE

VOTED_ONLINE==no --> ProofResponse ProofResponse((Check)) SendProofReceipt["send_rpc(
ResourceProofReceipt)
for proof"] ProofResponse -- Valid Part --> SendProofReceit VoteParsecOnline["vote_for(
ParsecOnline)

VOTED_ONLINE=yes"] ProofResponse -- Valid End --> VoteParsecOnline DiscardRPC[Discard RPC] RPC -- ResourceProofResponse --> DiscardRPC DiscardRPC --> LoopEnd WaitFor --Event--> Event Event((Event)) VoteParsecPurgeCandidate["vote_for(
ParsecPurgeCandidate)"] Event -- TimeoutAccept
expire --> VoteParsecPurgeCandidate VoteParsecCheckResourceProofTimeout["vote_for(ParsecCheckResourceProof)"] Event -- CheckResourceProofTimeout
expire --> VoteParsecCheckResourceProofTimeout VoteParsecCheckResourceProofTimeout --> LoopEnd SendProofReceipt-->LoopEnd VoteParsecOnline --> LoopEnd VoteParsecPurgeCandidate --> LoopEnd LoopEnd --> LoopStart
graph TB Cancel["ResourceProof_Cancel"] style Cancel fill:#19f,stroke:#333,stroke-width:4px EndCancel["End ResourceProof_Cancel"] style EndCancel fill:#19f,stroke:#333,stroke-width:4px Cancel --> CancelCheckResourceProofTimeout CancelCheckResourceProofTimeout["CANDIDATE=None
VOTED_ONLINE=no

schedule(
CheckResourceProofTimeout)"] CancelCheckResourceProofTimeout --> EndCancel

Source section

As members of a section, each node must keep track of how many "work units" other nodes have performed.
Once a node has accumulated enough work units to gain age, the section must work together to relocate that node to a new section, where they will become one age unit older.
These diagrams detail how this happens.

Deciding that a member of our section should be relocated away

In these diagrams, we are modeling the simple version of node ageing that we decided to implement for Fleming:
Work units are incremented for all nodes in the section every time a timeout reaches consensus. Because a quorum of elders must have voted for this timeout, malicious nodes can't arbitrarily speed up the ageing of their nodes.
Once a node has accumulated enough work units to be relocated, if no other node is currently relocating, we set its state to RelocatingState. This node will then actually be relocated in StartRelocateSrc (see StartRelocateSrc).

In the context of Fleming, nodes (especially adults) aren't doing meaningful work such as handling data.
As a proxy, we use a time-based metric to estimate how much work nodes have done (i.e. how long they remained Online and responsive).
A local timeout wouldn't do here, as it would allow malicious nodes to artificially age the nodes in their sections faster. However, by reaching quorum on the fact that a timeout happened, we ensure that at least one honest node has voted for it.
All nodes start the WorkUnitTimeout. On expiry, they vote for a WorkUnitIncrement in PARSEC and restart the timer.

This function increments the number of work units for all members of my peer_list (remember that n_work_units is a member of the PeerState struct).
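A minimal sketch, assuming a peer list of hypothetical PeerState values:

```rust
struct PeerState {
    n_work_units: u64,
}

fn increment_nodes_work_units(peer_list: &mut [PeerState]) {
    // One consensused WorkUnitIncrement means one work unit for everyone.
    for peer in peer_list.iter_mut() {
        peer.n_work_units += 1;
    }
}
```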

The collection of nodes currently relocating: those with node state RelocatingState or WaitRelocateResponseState.
It will most often be empty or have one element, unless a merge occurs, in which case it can be bigger.

This function will return the best candidate for relocation, if any.
First, it will only consider members of our peer_list that have the state OnlineState.
We use the condition with nodes_relocating() to control how many OnlineState nodes we choose to relocate at a time (i.e. 1).
For instance, it can return the node with the largest number of work units among those whose number of work units is greater than 2^age.
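
A sketch of that example rule (the peer representation is hypothetical; the nodes_relocating() guard lives in the calling flow):

```rust
#[derive(PartialEq)]
enum NodeState {
    OnlineState,
}

struct Peer {
    state: NodeState,
    age: u32,
    n_work_units: u64,
}

fn get_node_to_relocate(peer_list: &[Peer]) -> Option<&Peer> {
    peer_list
        .iter()
        .filter(|p| p.state == NodeState::OnlineState)
        // Example threshold from the text: work units must exceed 2^age.
        .filter(|p| p.n_work_units > (1u64 << p.age))
        .max_by_key(|p| p.n_work_units)
}
```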

This function mutates our peer_list to set the state (for example, setting RelocatingState for the node).
inputs:
- node
- state
side-effect:
- mutates peer_list

graph TB Start["StartTopLevelSrc:
No exit - Need Killed"] style Start fill:#f9f,stroke:#333,stroke-width:4px Start --> StartWorkUnitTimeOut StartWorkUnitTimeOut["schedule(WorkUnitTimeOut)"] StartWorkUnitTimeOut --> LoopStart LoopEnd --> LoopStart LoopStart --> WaitFor WaitFor((Wait for 3:)) WaitFor --Event--> Event WaitFor --Parsec
consensus--> ParsecConsensus Event((Event)) Event -- WorkUnitTimeOut
Trigger --> VoteParsecRelocationTrigger VoteParsecRelocationTrigger["vote_for(WorkUnitIncrement)
schedule(WorkUnitTimeOut)"] VoteParsecRelocationTrigger --> LoopEnd ParsecConsensus((Parsec
consensus)) ParsecConsensus -- WorkUnitIncrement consensused --> IncrementWorkUnit IncrementWorkUnit["increment_nodes_work_units()"] IncrementWorkUnit-->AlreadyRelocating AlreadyRelocating(("Check?")) AlreadyRelocating --"nodes_relocating().is_empty()"--> SetRelocatingNodeState AlreadyRelocating --"Otherwise"--> LoopEnd SetRelocatingNodeState["set_node_state(get_node_to_relocate(), RelocatingState)"] SetRelocatingNodeState --> LoopEnd

Relocating a member of our section away from it

At this stage, we have decided that one of our section members should be relocated.
The process is quite simple: we send an ExpectCandidate RPC to the destination section. That section will eventually either pass the node on to a section with a shorter prefix, send us a RefuseCandidate RPC, or send us a RelocateResponse RPC. If the destination picked another section with a shorter prefix, that section will send us one of these RPCs (or suggest yet another section with a shorter prefix, but this recursion will end at some point).
All in all, we have the certainty that eventually, exactly one of these two RPCs will make it to our section.
When it does, it will be voted in PARSEC, regardless of the order of operations, so it will eventually make it through PARSEC.

If a node in RelocatingState is a non-elder adult node, we can try to relocate it. If they are an elder node, we will first wait for the Adult/Elder promotion/demotion flow to kick in and demote them to an adult. This will happen because we changed their state from OnlineState to RelocatingState, which means that they will get demoted.

On sending the ExpectCandidate RPC, we set the node state to WaitRelocateResponseState so that, after a split or merge, we know we are waiting for a response.
If they refused our candidate, our job ends there: we reset the state to RelocatingState, and the node will simply be a candidate for relocation again later, so we will try to relocate it then.
If they accepted the node and sent us a RelocateResponse RPC, so that the ParsecRelocateResponse event reached consensus, we will send the node being relocated the RelocatedInfo it will need.
At this point, we will purge their information, since this node isn't a member of our section anymore.
Also of note: while we prevent OnlineState nodes from entering relocation when a node is already relocating, we may still be actively relocating several nodes away from our section.
However, we will only relocate one at a time per CheckRelocateTimeOut event.

Returns the best node to relocate and the target address to send it to.
There may be multiple nodes relocating, for example because of a merge. Take the best one (oldest), and choose a target address.
The target address is one managed by one of our neighbours. This could be random, or the current old_public_id with a single bit of the prefix flipped.
This would help ensure that source and destination remain neighbours, even if the source splits.
Using a target address instead of a section allows us to ensure we deliver the message even if the destination splits or merges.
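
A sketch under these assumptions, using an XorName-like 256-bit address and the flipped-prefix-bit option described above (all types hypothetical):

```rust
#[derive(Clone, Copy)]
struct Address([u8; 32]);

#[derive(PartialEq)]
enum NodeState {
    RelocatingState,
}

struct Peer {
    name: Address,
    age: u32,
    state: NodeState,
}

fn get_best_relocating_node_and_target(
    peers: &[Peer],
    our_prefix_len: usize,
) -> Option<(&Peer, Address)> {
    // Several nodes may be relocating (e.g. after a merge): take the oldest.
    let node = peers
        .iter()
        .filter(|p| p.state == NodeState::RelocatingState)
        .max_by_key(|p| p.age)?;
    // Flip the last bit of our own prefix inside the node's current name:
    // the resulting address is managed by a neighbour, and stays a valid
    // destination even if that neighbour splits or merges.
    let bit = our_prefix_len.saturating_sub(1);
    let mut target = node.name;
    target.0[bit / 8] ^= 0x80u8 >> (bit % 8);
    Some((node, target))
}
```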

Exactly one of these RPCs will be sent to us from the destination section as a response to our section's ExpectCandidate RPC (see TryRelocating for more context).
When this happens, we will immediately vote for it in PARSEC as we need to act in the same order as anyone else in our section.

graph TB Start["StartRelocateSrc:
No exit - Need Killed"] style Start fill:#f9f,stroke:#333,stroke-width:4px Start --> StartCheckRelocateTimeOut StartCheckRelocateTimeOut["schedule(CheckRelocateTimeOut)"] StartCheckRelocateTimeOut --> LoopStart LoopEnd --> LoopStart LoopStart --> WaitFor WaitFor((Wait for 3:)) WaitFor --Event--> Event WaitFor --RPC--> RPC WaitFor --Parsec
consensus--> ParsecConsensus Event((Event)) Event -- CheckRelocateTimeOut
Trigger --> VoteParsecCheckRelocate VoteParsecCheckRelocate["vote_for(ParsecCheckRelocate)
schedule(CheckRelocateTimeOut)"] VoteParsecCheckRelocate --> LoopEnd RPC((RPC)) RPC --RefuseCandidate--> VoteParsecRefuseCandidate RPC --RelocateResponse--> VoteParsecRelocateResponse VoteParsecRefuseCandidate["vote_for(
ParsecRefuseCandidate)"] VoteParsecRefuseCandidate --> LoopEnd VoteParsecRelocateResponse["vote_for(
ParsecRelocateResponse)"] VoteParsecRelocateResponse --> LoopEnd ParsecConsensus((Parsec
consensus)) ParsecConsensus -- ParsecCheckRelocate
consensused --> CheckNeedRelocate CheckNeedRelocate((Check?)) CheckNeedRelocate--"Otherwise" -->LoopEnd CheckNeedRelocate--"get_best_relocating_node_and_target().is_some()" --> SendExpectCandidate SendExpectCandidate["(node, target)=
get_best_relocating_node_and_target()

send_rpc(
ExpectCandidate(node))
to target NaeManager

set_node_state(
node,
WaitRelocateResponseState)"] SendExpectCandidate --> LoopEnd ParsecConsensus --"ParsecRefuseCandidate
for our node"--> RefusedCandidate RefusedCandidate["set_node_state(
node,
RelocatingState)"] RefusedCandidate --> LoopEnd ParsecConsensus --"ParsecRelocateResponse
for our node"--> SendPovableRelocateInfo SendPovableRelocateInfo["send_rpc(RelocatedInfo)
to node

Includes RelocateResponse
content and consensus
Node may be already gone"] SendPovableRelocateInfo-->PurgeNodeInfos PurgeNodeInfos["purge_node_info(
node)"] PurgeNodeInfos--> LoopEnd

Elder-only

Process for Adult/Elder promotion and demotion including merge

This flow updates the elder status of our section nodes if needed.
Because it is interlinked, it also handles merging sections: when merging, no elder change can happen.
However, other flows continue, so relocating to and from the section is uninterrupted.
We have to be careful that the section follows up on relocations once merged, so we may want to avoid active relocation while merging.

As for incrementing work units, we want to update the eldership status of all nodes in a section on a synchronised, regular basis.
For this reason, it makes sense to have a timer going through PARSEC.
Note that this timer only has to be fast enough that it remains highly unlikely that 1/3 of the elders in any section would go offline within one timer's duration.

A section sends a Merge RPC to their neighbour section when they are ready to merge at the given SectionInfo digest.
In this flow, we handle both situations: a merge needed by our own section (merge_needed()) and a merge requested by our neighbour (has_merge_infos()).

We vote for this Parsec event on receiving a Merge RPC from our neighbour section.
It contains the information about them that we need for merging. When this event reaches consensus in PARSEC, we store that information by calling store_merge_infos.

This function is used to store the merge information from a neighbour section locally.
Once it has been stored, has_merge_infos will return true and we will be ready to enter the ProcessMerge flow.

This function indicates that we received sufficient information from our neighbour section needing a merge, and reached consensus on it.
We are ready to start the merging process with them.

This function indicates that we need merging (as opposed to a merge triggered by our neighbour's needs).
The details for the trigger are still in slight flux, but here are some possibilities:

If any of our elders is not Online, they must be demoted to a plain old adult.
If this happens, the oldest adult must be promoted to the elder state as a replacement.
Alternatively, if any of our Online adult nodes is older than any of our elders, the youngest elder must be demoted and this adult must be promoted.
Note that elder changes are only processed when the section is not in the middle of handling a merge.
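
A sketch of check_elder_change following the ordering rule in the diagram below (State=Online first, then age, then name); NUM_ELDERS and the peer representation are hypothetical placeholders:

```rust
const NUM_ELDERS: usize = 10;

#[derive(Clone, PartialEq)]
struct Peer {
    name: u64,
    age: u32,
    online: bool,
    is_elder: bool,
}

/// Returns (to_promote, to_demote); both are empty when nothing changes.
fn check_elder_change(mut peers: Vec<Peer>) -> (Vec<Peer>, Vec<Peer>) {
    // Rank every node: State=Online, then age, then name (descending).
    peers.sort_by(|a, b| (b.online, b.age, b.name).cmp(&(a.online, a.age, a.name)));
    let desired: Vec<Peer> = peers.iter().take(NUM_ELDERS).cloned().collect();
    let to_promote: Vec<Peer> = desired.iter().filter(|p| !p.is_elder).cloned().collect();
    let to_demote: Vec<Peer> = peers
        .iter()
        .filter(|&p| p.is_elder && !desired.contains(p))
        .cloned()
        .collect();
    (to_promote, to_demote)
}
```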

graph TB CheckAndProcessElderChange["StartCheckAndProcessElderMergeChange:
No exit - Need Killed"] style CheckAndProcessElderChange fill:#f9f,stroke:#333,stroke-width:4px CheckAndProcessElderChange --> StartCheckElderTimeout StartCheckElderTimeout["schedule(
CheckElderTimeout)"] StartCheckElderTimeout --> LoopStart WaitFor(("Wait for 5:")) LoopStart --> WaitFor WaitFor -- Event --> Event Event((Event)) Event-- CheckElder
Timeout--> VoteCheckElderTimeout VoteCheckElderTimeout["vote_for(
ParsecCheckElderTimeout)"] VoteCheckElderTimeout--> LoopEnd RPC((RPC)) WaitFor -- RPC --> RPC RPC --Merge--> VoteParsecNeighbourMerge VoteParsecNeighbourMerge["vote_for(
ParsecNeighbourMerge)"] VoteParsecNeighbourMerge --> LoopEnd Consensus((Consensus)) WaitFor-- Parsec
consensus --> Consensus Consensus -- "ParsecNeighbourMerge" --> SetNeighbourMerge SetNeighbourMerge["store_merge_infos(ParsecNeighbourMerge info)"] SetNeighbourMerge-->LoopEnd Consensus--"ParsecCheckElderTimeout"-->CheckMergeNeeded CheckMergeNeeded(("Check")) CheckMergeNeeded--"!merge_needed()
and
!has_merge_infos()"-->CheckElderChange CheckElderChange(("Check")) CheckElderChange -- "No
changes" --> RestartTimeout RestartTimeout["schedule(
CheckElderTimeout)"] RestartTimeout-->LoopEnd CheckElderChange --"check_elder_change()

Has elder changes: elder first ordered by:
State=Online then age then name."--> Concurrent0 Concurrent0{"Concurrent
paths"} Concurrent0 --> ProcessElderChange Concurrent0 --> LoopEnd ProcessElderChange["ProcessElderChange(changes)"] style ProcessElderChange fill:#f9f,stroke:#333,stroke-width:4px ProcessElderChange -->CancelResourceProof ResetRelocatedNodeConnection["ResourceProof_Cancel"] style CancelResourceProof fill:#19f,stroke:#333,stroke-width:4px CancelResourceProof --> ResetRelocatedNodeConnection ResetRelocatedNodeConnection["RelocatedNodeConnection_Reset"] style ResetRelocatedNodeConnection fill:#19f,stroke:#333,stroke-width:4px ResetRelocatedNodeConnection --> RestartTimeout CheckMergeNeeded --"merge_needed()
or
has_merge_infos()"-->Concurrent1 Concurrent1{"Concurrent
paths"} Concurrent1 --> ProcessMerge Concurrent1 --> LoopEnd ProcessMerge["ProcessMerge"] style ProcessMerge fill:#f9f,stroke:#333,stroke-width:4px ProcessMerge --> CancelResourceProof LoopEnd --> LoopStart

Process Adult/Elder promotion and demotion needed from last check

Vote for Add for new elders, Remove for nodes that are no longer elders, and NewSectionInfo.
This handles any change; it does not care whether one or all elders are changed, as this is decided by the calling function.

At any time, there must be exactly NUM_ELDERS (say 10) elders per section.
To maintain this invariant, we must handle multiple eldership changes atomically.
We accomplish this by voting for all the membership changes needed at once and waiting for all these votes to reach consensus before reflecting the status change in our chain.
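
A sketch of that atomic wait, with a hypothetical Vote type:

```rust
use std::collections::HashSet;

#[derive(Hash, PartialEq, Eq)]
enum Vote {
    Add(u64),
    Remove(u64),
    NewSectionInfo(u64),
}

struct ProcessElderChange {
    waited_votes: HashSet<Vote>,
}

impl ProcessElderChange {
    /// Called for each consensused Parsec vote we were waiting on;
    /// returns true once the whole batch has reached consensus, at
    /// which point update_elder_status can reflect it in the chain.
    fn on_consensus(&mut self, vote: &Vote) -> bool {
        self.waited_votes.remove(vote);
        self.waited_votes.is_empty()
    }
}
```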

A list of PublicId.
The content of the NewSectionInfo parsec event that reached consensus.

This function updates the eldership status of each node in the chain based on the new section info: the nodes with their public id in new_section_info are the exact set of current elders.
Input:
- new_section_info
Side-effect:
- mutates the eldership status of the nodes in our chain

graph TB ProcessElderChange["ProcessElderChange
(Take elder changes)"] style ProcessElderChange fill:#f9f,stroke:#333,stroke-width:4px EndRoutine["End of ProcessElderChange
(shared state)"] style EndRoutine fill:#f9f,stroke:#333,stroke-width:4px ProcessElderChange --> MarkAndVoteSwapNewElder MarkAndVoteSwapNewElder["vote_for(Add) for new elders
vote_for(Remove) for now adults nodes
vote_for(NewSectionInfo)

WAITED_VOTES.insert(all votes)"] MarkAndVoteSwapNewElder --> LoopStart WaitFor(("Wait for 5:")) LoopStart --> WaitFor Consensus((Consensus)) WaitFor-- Parsec
consensus --> Consensus Consensus -- "WAITED_VOTES.contains(vote)" --> OneVoteConsensused OneVoteConsensused["WAITED_VOTES.remove(vote)"] OneVoteConsensused --> WaitComplete WaitComplete(("Check?")) WaitComplete--"WAITED_VOTES
.is_empty()
(Wait complete)"-->MarkNewElderAdults MarkNewElderAdults["update_elder_status(new_section_info)"] MarkNewElderAdults--> EndRoutine WaitComplete--"!WAITED_VOTES
.is_empty()
(Wait not complete)"--> LoopEnd LoopEnd --> LoopStart

Handling merges

Vote for ParsecOurMerge, and take over handling any ParsecNeighbourMerge.
Complete when one merge has completed, and a NewSectionInfo is consensused.
If multi-stage merges are required, they will require calling this function again.
While in this sanctuary, our SectionInfo shall not be disturbed by elder changes. This stops us from changing our SectionInfo after indicating to our neighbour the last SectionInfo before the merge.

This Parsec event indicates to our section members that we are ready to start merging.
We vote for it on entering ProcessMerge.
Reaching consensus on it will lead us to send a Merge RPC to our neighbour section.
We vote for ParsecOurMerge irrespective of whether we are the section that triggered the merge. This allows all sections involved in the merge to receive a Merge RPC, which is how ParsecNeighbourMerge gets voted for.

This PARSEC event indicates that our neighbour section is ready to merge with us.
It is voted for in the StartCheckAndProcessElderMergeChange flow, on receipt of a Merge RPC.
It contains their SectionInfo.

Indicates that we reached consensus on the ParsecOurMerge event.

Stores the neighbour's merge info; the neighbour may not be our sibling in the case of a multi-stage merge.

Returns whether we have stored the merge info for our sibling.

Removes the stored sibling's merge info and returns the NewSectionInfo.

Once we are ready to merge, have received our neighbour's SectionInfo through their Merge RPC, and subsequently reached consensus on the ParsecNeighbourMerge we voted for, we have all the information needed to decide on the membership of our post-merge section.
This is the NewSectionInfo.

With the NewSectionInfo in hand, completing the merge process consists of joining the newly formed section and leaving the old one behind.
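
A sketch of forming the NewSectionInfo from the two siblings' infos (SectionInfo here is a deliberate simplification): the merged prefix is one bit shorter, and the membership is the union.

```rust
#[derive(Clone)]
struct SectionInfo {
    prefix_bits: Vec<bool>,
    members: Vec<u64>,
}

fn merge_sibling_info_to_new_section(ours: &SectionInfo, sibling: &SectionInfo) -> SectionInfo {
    let mut prefix_bits = ours.prefix_bits.clone();
    prefix_bits.pop(); // the parent prefix of the two merging siblings
    let mut members = ours.members.clone();
    members.extend_from_slice(&sibling.members);
    SectionInfo { prefix_bits, members }
}
```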

graph TB ProcessMerge["ProcessMerge
"] EndRoutine["End of ProcessMerge
(shared state)"] style ProcessMerge fill:#f9f,stroke:#333,stroke-width:4px style EndRoutine fill:#f9f,stroke:#333,stroke-width:4px ProcessMerge --> VoteOurMerge VoteOurMerge["vote_for(
ParsecOurMerge)"] VoteOurMerge --> LoopStart WaitFor(("Wait for 5:")) LoopStart --> WaitFor Consensus((Consensus)) WaitFor-- Parsec
consensus --> Consensus Consensus -- "NewSectionInfo" --> CompleteMerge CompleteMerge["complete_merge()
(Start parsec with new genesis...)"] CompleteMerge --> MarkNewElderAdults MarkNewElderAdults["update_elder_status(new_section_info)"] MarkNewElderAdults--> EndRoutine Consensus -- "ParsecNeighbourMerge" --> SetNeighbourMerge SetNeighbourMerge["store_merge_infos(ParsecNeighbourMerge info)"] SetNeighbourMerge --> CheckMerge CheckMerge((Check)) CheckMerge -- "OUR_MERGE
and
has_sibling_merge_info()" --> VotForNewSectionInfo VotForNewSectionInfo["OUR_MERGE=false
merge_sibling_info_to_new_section()
vote_for(
NewSectionInfo)"] VotForNewSectionInfo--> LoopEnd CheckMerge -- "!OUR_MERGE
or
!has_sibling_merge_info()" --> LoopEnd Consensus--"ParsecOurMerge"-->SendMergeRpc SendMergeRpc["OUR_MERGE=true
send_rpc(Merge)"] SendMergeRpc -->CheckMerge LoopEnd --> LoopStart

Handling members of our section going Online or Offline

graph TB CheckOnlineOffline["CheckOnlineOffline:
No exit - Need Killed"] style CheckOnlineOffline fill:#f9f,stroke:#333,stroke-width:4px CheckOnlineOffline --> LoopStart WaitFor(("Wait for 5:")) LoopStart --> WaitFor LocalEvent((Local
Event)) WaitFor --event--> LocalEvent LocalEvent -- Node detected offline --> VoteNodeOffline VoteNodeOffline["vote_for(
ParsecOffline)"] VoteNodeOffline --> LoopEnd LocalEvent -- Node detected back online --> VoteNodeBackOnline VoteNodeBackOnline["vote_for(
ParsecBackOnline)"] VoteNodeBackOnline --> LoopEnd Consensus((Consensus)) WaitFor-- Parsec
consensus --> Consensus Consensus--"ParsecOffline"-->SetOfflineState SetOfflineState["set_node_state(
node,
OfflineState)"] SetOfflineState -->LoopEnd Consensus -- "ParsecBackOnline" --> SetRelocating SetRelocating["set_node_state(
node,
RelocatingState)"] SetRelocating --> LoopEnd LoopEnd --> LoopStart

Node relocation overview

Successfully relocate a node from source to destination section

sequenceDiagram participant Src as Source Section participant Node as Relocating Node participant Dst as Destination Section Src->>+Dst: Routing RPC: ExpectCandidate Dst-->>-Src: Routing RPC: RelocateResponse Src->>Node: Direct node-to-node RPC: RelocatedInfo loop NodeConnection Node->>Dst: Proxied Routing RPC to group: CandidateInfo Dst->>+Node: Proxied Routing RPC: ConnectionInfoRequest Node-->>-Dst: Proxied Routing RPC: ConnectionInfoResponse end Dst->>Node: Unproxied Group RPC: NodeConnected Dst->>Node: Direct node-to-node RPC: ResourceProof loop ResProof Node->>+Dst: Direct node-to-node RPC: ResourceProofResponse Dst-->>-Node: Direct node-to-node RPC: ResourceProofReceipt end Dst->>Node: Unproxied Group RPC: NodeApproval