Arvo is built on a foundation where all communication occurs through specialised events called ArvoEvents, which are essentially CloudEvents with a few extension fields to facilitate routing and distributed telemetry. These events flow through a system of services, each acting as an event handler that processes incoming events and potentially generates new ones in response. At its heart, Arvo treats every service as a function with a consistent signature: ArvoEvent => Promise<ArvoEvent[]>.
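As a rough illustration (the event shape below is a simplified stand-in, not the actual arvo-core type), a service in this model is conceptually just an async function over events:
// Sketch only: a trimmed-down event with just the fields needed to show the signature.
type SketchEvent = {
  type: string;      // e.g. 'com.increment.number'
  source: string;    // the emitting handler
  to: string;        // routing destination used by the broker
  subject: string;   // correlates events belonging to one execution
  data: unknown;     // contract-validated payload
};
// Every service: event in, zero or more events out.
const incrementHandler = async (event: SketchEvent): Promise<SketchEvent[]> => {
  const { value } = event.data as { value: number };
  return [{
    type: 'evt.increment.number.success',
    source: 'com.increment.number',
    to: event.source,        // reply to whoever sent the request
    subject: event.subject,  // preserve execution correlation
    data: { value: value + 1 },
  }];
};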
The architecture connects all event handlers through a central event broker, creating a service and event mesh where each service maintains its own bounded context and state management. This flat structure enables the broker to route events based on the to field in the event body, ensuring efficient communication across the system.
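To make the flat routing concrete, here is an illustrative (not arvo-core) sketch of a broker that dispatches purely on the to field:
// Toy in-memory broker: a flat lookup from destination to handler.
type RoutedEvent = { to: string; type: string; data: unknown };
type Handler = (event: RoutedEvent) => Promise<RoutedEvent[]>;
const handlers = new Map<string, Handler>();
async function deliver(event: RoutedEvent): Promise<void> {
  const handler = handlers.get(event.to);  // no hierarchy, just the `to` field
  if (!handler) return;                    // nothing registered for this destination
  for (const next of await handler(event)) {
    await deliver(next);                   // emitted events are routed the same way
  }
}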
Arvo implements a sophisticated error handling mechanism where service-level issues are communicated through specialised "system error" events. These events are strictly scoped to handler operations, while infrastructure or environment-related problems are managed through execution errors for appropriate handling.
Following Martin Fowler's tolerant reader pattern, event handlers in Arvo are designed to receive all events but only process those relevant to their bounded context. When handlers encounter events outside their scope, they emit system error events rather than failing. This approach promotes resilience and clear separation of concerns across the system.
Arvo's event handlers follow the signature ArvoEvent => Promise<ArvoEvent[]>, enabling powerful service coordination through event chaining. Complex business processes can be modelled as sequences of events flowing through multiple services, while each service maintains its independence and bounded context.
To manage these event chains effectively, Arvo introduces the orchestrator, a specialised event handler responsible for coordinating and defining event chains. It follows a simple yet powerful execution pattern when it receives an event.
While Arvo remains implementation-agnostic about orchestration, it provides a robust implementation, ArvoOrchestrator, through the arvo-xstate TypeScript package. This package leverages xstate's state machine capabilities to create an ArvoMachine, allowing developers to model complex workflows while maintaining loose coupling and independent service evolution.
The state-machine-driven approach ensures event chains remain predictable and manageable as system complexity grows. This makes the ArvoOrchestrator particularly effective for implementing business processes that require coordinated actions across multiple services while preserving the event-driven nature of the system.
For more detailed information about orchestration capabilities, refer to the dedicated ArvoOrchestrator documentation.
In event-driven architectures, reliable communication between services is crucial yet challenging. Arvo tackles this challenge through a thoughtful approach to coupling and contracts.
As Kent Beck observes, most software issues arise from coupling, and the cost of software maintenance is primarily driven by the cost of change. Changes typically fall into two categories:
Both types of changes can cause issues due to unmanaged coupling. Following Larry Constantine and Ed Yourdon's definition from Structured Design, coupling occurs when a change in one component necessitates changes in other components that depend on it.
Arvo takes a nuanced view of coupling: rather than treating it as inherently negative, it believes coupling should be managed strategically. Even in event-driven architectures, coupling naturally emerges when:
To manage coupling effectively, Arvo introduces a contract-first approach inspired by Meyer's Design by Contract. Instead of services coupling directly to each other, they couple to contracts - simple data structures without business logic. This approach offers several advantages:
In Arvo, every ArvoEvent contains three essential fields for reliable communication and can be validated by contracts:
- type: The event type (e.g., com.increment.number)
- data: The payload object
- dataschema: Reference to the schema version (e.g., #/service/number/increment/1.0.0)
Services must declare their contracts upfront and validate all communication through them. Following Martin Fowler's Tolerant Reader pattern and Postel's Law, services should:
- Be liberal in what they accept, tolerating and ignoring fields outside their contract
- Be conservative in what they emit, producing only events that strictly conform to their declared contracts
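For illustration (the values below are made up), an event validated against such a contract might carry:
// Hypothetical event showing the three contract-relevant fields.
const event = {
  type: 'com.increment.number',                    // which operation is requested
  data: { value: 41 },                             // payload validated by the contract
  dataschema: '#/service/number/increment/1.0.0',  // contract URI plus version
  // ...plus the usual CloudEvent and Arvo extension fields (id, source, subject, to, time, ...)
};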
This structured approach to service communication provides several key benefits:
By treating contracts as first-class citizens in the system, Arvo provides a robust foundation for reliable service communication while managing the inevitable coupling that exists in distributed systems.
ArvoContract
The TypeScript implementation of ArvoContract in arvo-core builds upon established software engineering principles and modern technologies. Its design is influenced by seminal works in software engineering and incorporates practical lessons from existing contract and specification technologies.
In Arvo's TypeScript implementation, contracts are elevated to first-class citizens in service development. This makes contract-first design more approachable and easier to manage. Unlike traditional approaches that separate specifications from implementation, Arvo implements contracts directly in TypeScript using Zod. Among other benefits, this integration lets every field in the schema be documented inline through Zod's .describe method.
This approach solves several common challenges in contract-driven development; for example, a contract can still be serialised into a portable representation through its toJSON method when one is needed outside TypeScript.
Arvo recommends distributing contracts as separate packages, either published independently or as part of a monorepo. This approach ensures contracts serve as a single source of truth, fostering genuine collaboration between services rather than becoming producer-centric documentation like traditional OpenAPI specifications.
Consider a service that interfaces with OpenAI's ChatGPT model. The contract definition would look like this:
import { createArvoContract, InferVersionedArvoContract } from 'arvo-core'
import z from 'zod'
export const openaiCompletions = createArvoContract({
uri: "#/services/openai/completions",
type: "com.openai.completions",
versions: {
'1.0.0': {
accepts: z.object({
model: z.enum(['gpt-4', 'gpt-4o']),
messages: z.object({
role: z.enum(['user', 'assistant']),
content: z.string()
}).array(),
system_command: z.string()
}),
emits: {
'evt.openai.completions.success': z.object({
response: z.string()
}),
'evt.openai.completions.error': z.object({
error: z.string()
})
}
}
}
})
The contract system provides comprehensive type inference capabilities. You can select specific versions using the .version() method, which offers IDE-assisted version selection. The system allows for inference of both accept and emit data types, providing a robust type-safe development experience.
The createArvoContract is a factory function which creates the ArvoContract.
ArvoContract provides sophisticated type inference capabilities that "just work" and make working with contracts intuitive and type-safe. The system offers multiple ways to work with contract types, each suited to different use cases.
When working with contracts, developers can select specific versions using the .version() method:
const version1 = openaiCompletions.version('1.0.0');
This method leverages TypeScript's type inference to provide IDE-assisted version selection, making it impossible to select non-existent versions. Once a version is selected, you can access its schema and infer types directly:
const version1Accept = version1.accepts.schema;
type DirectAcceptType = z.infer<typeof version1Accept>;
This approach yields fully typed interfaces representing the contract's accept types:
type DirectAcceptType = {
model: 'gpt-4' | 'gpt-4o'
messages: { role: 'user' | 'assistant', content: string }[]
system_command: string
}
For more comprehensive type information, ArvoContract provides the InferVersionedArvoContract utility type:
type ContractType = InferVersionedArvoContract<typeof version1>;
type AcceptType = ContractType['accepts']['data'];
This utility provides access to all contract-related types, including emitted event types and system error types:
// Access emit types for specific events
type EmitType = ContractType['emits']['evt.openai.completions.success']['data']
// Access system error types
// The default system error type can be resolved via
type SystemErrorType = ContractType['systemError']['type'];
// = sys.${contract.type}.error = sys.com.openai.completions.error
type SystemErrorData = ContractType['systemError']['data'];
// = ArvoErrorType
dataschema Construction
The dataschema field in ArvoEvent plays a crucial role in version management and event validation within the Arvo system. This field is constructed by combining two essential pieces of information: the contract's URI and its version number. Let's explore how this works and why it matters.
The dataschema field follows a simple yet powerful construction pattern:
${ArvoContract.uri}/${version}
For example, if we have a contract with:
- URI: "#/services/openai/completions"
- Version: "1.0.0"
The resulting dataschema would be:
#/services/openai/completions/1.0.0
This construction pattern serves several important purposes in the Arvo ecosystem:
- Unique Identification: By combining the contract URI with the version, each schema gets a unique identifier. This prevents any ambiguity about which version of a contract an event is using.
- Version Validation: Event handlers can easily extract the version information from the dataschema field to ensure they're capable of processing that specific version of the event.
- Contract Evolution: As contracts evolve, new versions can be added while maintaining backward compatibility. The dataschema field makes it clear which version of the contract an event adheres to.
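As a small sketch (this helper is illustrative; arvo-core ships its own utilities for working with dataschema), a handler could split the field back into its URI and version like so:
// Illustrative helper: split a dataschema string into contract URI and version.
const parseDataschema = (dataschema: string) => {
  const separator = dataschema.lastIndexOf('/');
  return {
    uri: dataschema.slice(0, separator),       // '#/services/openai/completions'
    version: dataschema.slice(separator + 1),  // '1.0.0'
  };
};
parseDataschema('#/services/openai/completions/1.0.0');
// => { uri: '#/services/openai/completions', version: '1.0.0' }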
Consider a payment processing system:
const paymentContract = createArvoContract({
uri: "#/services/payments/transaction",
versions: {
'1.0.0': {
accepts: z.object({
amount: z.number(),
currency: z.string(),
}),
emits: {},
},
'2.0.0': {
accepts: z.object({
amount: z.number(),
currency: z.string(),
metadata: z.record(z.string(), z.string())
/**
* Adding a new non-optional field makes the contract
* more restrictive and hence is a breaking change
* that warrants a new contract version.
*/
}),
emits: {}
}
}
});
When events are created using this contract:
- Events validated against version 1.0.0 carry dataschema: "#/services/payments/transaction/1.0.0"
- Events validated against version 2.0.0 carry dataschema: "#/services/payments/transaction/2.0.0"
This clear identification helps services understand exactly which version of the contract they're dealing with, enabling proper handling and validation of the event data structure.
While createArvoContract provides complete flexibility for contract creation, many services follow simpler patterns. The arvo-core package includes specialized factory functions to streamline contract creation for common use cases.
createSimpleArvoContract
For services that follow a basic request-response pattern, createSimpleArvoContract offers a more concise way to define contracts. This factory automatically derives consistent event types (see the sketch below):
- Accept type: com.{type}
- Success emit type: evt.{type}.success
- System error type: sys.com.{type}.error
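A minimal sketch of what this looks like, using a hypothetical greeting service (the structure mirrors the Anthropic and OpenAI contracts shown later in this section):
import { createSimpleArvoContract } from 'arvo-core';
import { z } from 'zod';
// Hypothetical example: a simple request-response greeting service.
export const greetingContract = createSimpleArvoContract({
  uri: '#/services/greeting',
  type: 'greeting.create', // accept type becomes 'com.greeting.create'
  versions: {
    '1.0.0': {
      accepts: z.object({ name: z.string() }),
      emits: z.object({ greeting: z.string() }),
      // success emit type = 'evt.greeting.create.success'
    },
  },
});
// system error type = 'sys.com.greeting.create.error'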
One powerful aspect of Arvo's TypeScript-first approach is the ability to reuse schema definitions. This is particularly valuable when multiple services share similar interfaces within the same bounded context.
Schema reuse should be approached carefully to avoid creating unintended coupling between services. Consider reusing schemas when:
Let's look at a practical example where schema reuse makes sense - Generative AI services that share common parameters and response structures.
First, we define our base schemas in commons/schema.base.genai.ts:
import { z } from 'zod';
/**
* Base input schema for GenAI operations.
* Defines common parameters used across different AI service providers.
*/
export const accept = z.object({
max_tokens: z.number()
.min(10, 'The minimum number of completion tokens must be 10')
.max(4096, 'The maximum number of completion tokens must be 4096')
.describe('The maximum output tokens.')
.default(4096),
system_command: z.string()
.describe('The persona the language model should assume')
.default('You are a helpful assistant'),
temperature: z.number()
.min(0)
.max(1)
.describe('Controls output randomness')
.default(0.5),
json_response: z.boolean()
.describe('Request JSON formatted response')
.default(false),
messages: z.array(
z.object({
role: z.enum(['user', 'assistant']),
content: z.string()
})
).min(1)
});
/**
* Base output schema for GenAI operations.
* Defines common response structure and metadata.
*/
export const emit = z.object({
json_valid: z.boolean().nullable()
.describe('Indicates if output is valid JSON when requested'),
message: z.object({
role: z.literal('assistant'),
content: z.string()
}),
usage: z.object({
tokens: z.object({
prompt: z.number(),
completion: z.number(),
total: z.number()
}),
time_ms: z.object({
to_first_token: z.number(),
average_token: z.number(),
total: z.number()
})
}),
stop_reason: z.enum(['stop', 'length', 'content_filter'])
});
Then, we can create specific service contracts that extend these base schemas:
import { createSimpleArvoContract, type InferVersionedArvoContract } from 'arvo-core';
import { z } from 'zod';
import * as BaseGenAISchema from '../commons/schema.base.genai';
// Defining Anthropic service contract
export const anthropic = z.enum([
'claude-3-5-sonnet-20240620',
'claude-3-sonnet-20240229',
'claude-3-opus-20240229',
'claude-3-haiku-20240307',
]);
export const anthropicCompletions = createSimpleArvoContract({
uri: '#/services/anthropic/completions',
type: 'anthropic.completions', // = 'com.anthropic.completions'
versions: {
'1.0.0': {
accepts: BaseGenAISchema.accept.merge(
z.object({
model: anthropic.default('claude-3-haiku-20240307'),
}),
),
emits: BaseGenAISchema.emit,
// type = 'evt.anthropic.completions.success'
},
},
});
// system error type = 'sys.com.anthropic.completions.error'
// Defining OpenAI service contract
export const openai = z.enum(['gpt-4o', 'gpt-4-turbo', 'gpt-4o-mini']);
export const openaiCompletions = createSimpleArvoContract({
uri: '#/services/openai/completions',
type: 'openai.completions', // = 'com.openai.completions'
versions: {
'1.0.0': {
accepts: BaseGenAISchema.accept.merge(
z.object({
model: openai.default('gpt-4o-mini'),
}),
),
emits: BaseGenAISchema.emit,
// type = 'evt.openai.completions.success'
},
},
});
// system error type = 'sys.com.openai.completions.error'
This approach provides several benefits:
The resulting contracts maintain individual service boundaries while sharing common structures where appropriate. Service-specific additions (like model selection) are cleanly merged with the base schema.
An orchestrator in Arvo is a special type of event handler that coordinates complex workflows across multiple services. Rather than performing tasks directly, it emits command events to other services and manages their responses. For example, in a document processing system, an orchestrator might coordinate between a text extraction service, a translation service, and a storage service.
What makes orchestrators unique is their ability to maintain state and make decisions based on the responses they receive. They follow Arvo's event handler pattern (ArvoEvent => Promise<ArvoEvent[]>), but they're designed specifically for coordination rather than direct task execution.
Each orchestrator execution is uniquely identified by an event subject. When orchestrators need to coordinate, they use the parentSubject$$ field to establish execution context relationships. The root orchestrator sets parentSubject$$ to null, while child orchestrators receive the parent's subject as their parentSubject$$.
Note: An orchestrator defines a workflow, while an orchestration execution is a specific instance of that workflow. Each execution is identified by a unique subject string that's included in all related ArvoEvents. The orchestrator uses this subject to track execution state in storage (memory or database). When processing events, the orchestrator optimistically locks the execution state to handle parallel events or multiple service responses safely.
When an orchestrator completes its execution, it determines its completion event's subject based on the parentSubject$$ received during initialization. For root executions (parentSubject$$ = null), it uses its own subject. For child executions, it uses the parentSubject$$ value, effectively returning control to the parent orchestrator.
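A minimal sketch of that decision (illustrative pseudocode, not the ArvoOrchestrator internals):
// Illustrative: resolving which subject a completion event should carry.
const resolveCompletionSubject = (
  executionSubject: string,        // this orchestration's own subject
  parentSubject$$: string | null,  // received in the init event
): string => parentSubject$$ ?? executionSubject;
// Root execution (parentSubject$$ === null): completes on its own subject.
// Child execution: completes on the parent's subject, handing control back.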
The parentSubject$$ field is strictly for orchestrator coordination and should never be used in communication with regular services. This separation maintains clean boundaries between orchestration logic and service implementation.
The createArvoOrchestratorContract factory simplifies creating contracts for orchestration services. It automatically handles common orchestration patterns and establishes consistent event naming:
import { createArvoOrchestratorContract, ArvoErrorSchema } from 'arvo-core';
import { z } from 'zod';
import * as LLMs from '../commons/genai.llms';
import * as BaseGenAISchema from '../commons/schema.base.genai';
export const createOrchestratorCompletionSchema = <T extends z.AnyZodObject>(schema: T) => {
return z.object({
status: z.enum(['success', 'error']),
errors: ArvoErrorSchema.array().nullable(),
result: schema.nullable(),
});
}
// Define which AI models are supported
export const llmModelSchema = z.union([
// Anthropic models
z.object({
provider: z.literal('anthropic'),
model: LLMs.anthropic.default('claude-3-haiku-20240307'),
}),
// OpenAI models
z.object({
provider: z.literal('openai'),
model: LLMs.openai.default('gpt-4o-mini'),
}),
]);
// Create the orchestrator contract
export const llmOrchestrator = createArvoOrchestratorContract({
uri: '#/orchestrators/llm',
name: 'llm',
versions: {
'1.0.0': {
// Initial event schema
init: BaseGenAISchema.accept.merge(
z.object({
model: llmModelSchema,
})
),
// Completion event schema
complete: createOrchestratorCompletionSchema(
BaseGenAISchema.emit.merge(
z.object({
model: llmModelSchema,
})
)
),
},
},
});
The factory automatically generates consistent event types:
- Init event: arvo.orc.{name} (e.g., arvo.orc.llm)
- Completion event: arvo.orc.{name}.done (e.g., arvo.orc.llm.done)
- System error event: sys.arvo.orc.{name}.error (e.g., sys.arvo.orc.llm.error)
The orchestrator contract provides full TypeScript type inference. You can access the contract's types for both initialization and completion events:
const version1 = llmOrchestrator.version('1.0.0');
type ContractType = InferVersionedArvoContract<typeof version1>;
// Get the initialization event type and data structure
type InitType = ContractType['accepts']['type']; // "arvo.orc.llm"
type InitDataType = ContractType['accepts']['data']; // Full data structure
// Get the completion event type and data structure
type CompleteType = ContractType['emits'][ContractType['metadata']['completeEventType']]['type'];
type CompleteDataType = ContractType['emits'][ContractType['metadata']['completeEventType']]['data'];
// The metadata field carries additional data inserted by the factory
This type safety ensures that orchestration events are properly structured throughout your system, making it easier to maintain and evolve complex workflows over time.
Event creation is a core feature of the Arvo system that ensures type safety and validation through the ArvoContract. Let's understand how event creation works and explore the tools provided by arvo-core to make this process robust and developer-friendly.
The event creation system in Arvo is built around three key elements:
- ArvoEvent - The base event type that represents all events in the system
- ArvoContract - A validation layer that ensures events conform to specified schemas
- ArvoEventFactory - The primary tool for creating events
The ArvoEventFactory provides three main methods for event creation:
- .accepts() - Creates events that a service can receive
- .emits() - Creates events that a service can send
- .systemError() - Creates the system error event
Let's explore each usage pattern in detail.
When creating events that your service will receive, use the accepts() method. Here's a complete example:
import { createArvoEventFactory } from 'arvo-core';
// Initialize the factory with a specific contract version
const factory = createArvoEventFactory(anthropicCompletions.version('1.0.0'));
// Create an event the service can accept
const receivableEvent = factory.accepts({
source: 'com.test.test',
subject: 'test-subject',
data: {
messages: [
{
role: 'user' as const,
content: 'Hello World',
},
],
},
});
// dataschema = '#/services/anthropic/completions/1.0.0'
The factory automatically handles several important aspects, such as validating the event data against the contract schema and populating the dataschema field.
For events that your service will emit, use the emits() method. This requires additional configuration since services can emit multiple event types:
import { createArvoEventFactory } from 'arvo-core';
// Initialize the factory with a specific contract version
const factory = createArvoEventFactory(anthropicCompletions.version('1.0.0'));
// Create an event the service can emit
const emittableEvent = factory.emits({
source: 'com.test.test',
subject: 'test-subject',
type: 'evt.anthropic.completions.success', // Specify the event type
data: {
json_valid: null,
message: {
role: 'assistant' as const,
content: 'Hello World',
},
stop_reason: 'stop',
usage: {
time_ms: {
to_first_token: 0,
average_token: 0,
total: 0
},
tokens: {
prompt: 0,
completion: 0,
total: 0
}
}
},
});
// dataschema = '#/services/anthropic/completions/1.0.0'
The ArvoEventFactory provides a specialized .systemError() method for handling system-level errors in a standardized way. This method creates error events that follow Arvo's system-wide error handling conventions.
import { createArvoEventFactory } from 'arvo-core';
// Initialize the factory with a specific contract version
const factory = createArvoEventFactory(anthropicCompletions.version('1.0.0'));
const errorEvent = factory.systemError({
source: 'com.test.test',
subject: 'test-subject',
error: new Error("Some Error")
});
// dataschema = '#/services/anthropic/completions/0.0.0'
Key features of system error events:
- The version in the dataschema is always 0.0.0, regardless of the contract version
- They are built from an error field instead of data, to properly capture error information
ArvoOrchestratorEventFactory
While regular events can be created using the standard ArvoEventFactory, orchestrator events have unique characteristics that warrant their own specialized factory. The ArvoOrchestratorEventFactory is designed specifically for creating events that coordinate complex workflows across multiple services.
.init()
The .init() method creates events that start new orchestration workflows. These events are particularly special because they can establish parent-child relationships between different orchestration processes. Here's how it works:
import { createArvoOrchestratorEventFactory } from 'arvo-core';
// Create a factory for a specific version of our LLM orchestrator
const factory = createArvoOrchestratorEventFactory(llmOrchestrator.version('1.0.0'))
// Create an initialization event
const initEvent = factory.init({
source: 'com.test.test',
data: {
// null parentSubject$$ indicates this is starting a new workflow
parentSubject$$: null,
model: {
provider: 'openai'
},
messages: [
{
role: 'user',
content: "Hello world"
}
],
}
})
The initialization event sets up the initial state and parameters for the workflow. The parentSubject$$ field is particularly important:
- When set to null, it starts a new independent workflow
- When set to an existing orchestration's subject, the new execution runs as a child of that orchestration and returns control to it on completion
This enables building complex, nested orchestration patterns while maintaining clear relationships between workflows.
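For example, a parent orchestration kicking off a nested LLM workflow would pass its own subject along. This sketch reuses the factory from above and assumes a parentEvent variable holding the event the parent is currently handling:
// Sketch: starting a child orchestration from inside a parent workflow.
const childInit = factory.init({
  source: 'com.parent.orchestrator',
  data: {
    parentSubject$$: parentEvent.subject, // tie this execution to the parent
    model: { provider: 'openai' },
    messages: [{ role: 'user', content: 'Summarise the report' }],
  },
});
// When the child completes, its completion event is addressed to the parent's
// subject, so the parent orchestration resumes where it left off.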
.complete()
The .complete() method creates events that signal the completion of orchestrated workflows. These events carry the final results or any error information:
import { createArvoOrchestratorEventFactory } from 'arvo-core';
const factory = createArvoOrchestratorEventFactory(llmOrchestrator.version('1.0.0'))
// Create a completion event with successful results
const completionEvent = factory.complete({
subject: 'test-subject',
source: 'com.test.test',
data: {
status: 'success',
errors: null, // No errors in this case
result: {
model: {
provider: 'openai',
model: 'gpt-4-turbo',
},
json_valid: null,
message: {
role: 'assistant',
content: 'Hello World',
},
stop_reason: 'stop',
usage: {
time_ms: {
to_first_token: 0,
average_token: 0,
total: 0,
},
tokens: {
prompt: 0,
completion: 0,
total: 0,
},
},
},
},
});
Completion events are structured to provide comprehensive information about the workflow's outcome:
- status: whether the workflow ended in success or error
- errors: any error details collected during execution, or null when none occurred
- result: the workflow's final output, or null when it failed
This structure ensures that orchestrators can make informed decisions about workflow progression.
.systemError()
The .systemError() method creates standardized error events for orchestration-specific failures:
import { createArvoOrchestratorEventFactory } from 'arvo-core';
const factory = createArvoOrchestratorEventFactory(llmOrchestrator.version('1.0.0'));
// Create an error event for orchestration failures
const errorEvent = factory.systemError({
source: 'com.test.test',
subject: 'test-subject',
error: new Error("Orchestration timeout exceeded")
})
System error events in orchestration contexts are particularly important because they can affect multiple services and workflows. The factory ensures these errors are properly structured and can be handled appropriately by the orchestration system.
These three event types work together to create a complete orchestration lifecycle:
- An init event starts a workflow execution
- A complete event reports the workflow's outcome
- A systemError event indicates a workflow failure
This structured approach to event creation helps maintain clear and predictable orchestration patterns, even in complex distributed systems.
Contract evolution in Arvo follows a simple but powerful principle: services remain independent by evolving behind stable contracts, while the contracts themselves evolve through semantic versioning. While ArvoContract provides the technical mechanism for versioning, successful contract evolution requires thoughtful strategy beyond just code.
Drawing from Meyer's Design by Contract and Fowler's Event Sourcing work, Arvo adopts a clear rule for contract versioning: create a new version for any breaking change that makes the contract more restrictive or changes its restrictions, while non-breaking changes that maintain or reduce restrictions can be handled within the existing version. This approach ensures predictable system evolution while maintaining backward compatibility where possible.
Let's explore how different components of ArvoContract evolve:
- Changes to the uri introduce breaking changes, but since ArvoEventFactory handles URI updates automatically, developers rarely need to manage this directly.
- Changing the type creates a breaking change requiring service updates.
Note: When creating a new ArvoContract, begin by modeling your contracts after your existing working code, API endpoints, or the general idea of your service's inputs and outputs. Once you have this working foundation, you can gradually refine the contract to be more precise. Remember - a good contract evolves from practical use rather than perfect upfront design. You are only required to follow the evolution patterns below once your first version goes to production.
accepts data schema
The evolution of accepts schemas must be managed carefully to maintain system reliability. Here's a comprehensive breakdown of different schema changes and their implications (a short Zod sketch follows the table):
Change Type | Breaking Change? | Least effort version update | Explanation |
---|---|---|---|
Adding Required Field | Yes | New Version | Makes contract more restrictive by requiring additional data |
Adding Optional Field | No | Same Version | Maintains existing contract restrictions while allowing additional data |
Adding Union Type | No | Same Version | Increases permissiveness while preserving existing functionality |
Removing Union Type | Yes | New Version | Makes contract more restrictive by limiting allowed types |
Changing Field Type | Yes | New Version | Changes nature of contract restrictions |
Removing Any Field | Limited Breaking | Same Version | Following the tolerant reader pattern, removing fields from 'accepts' schema is not a breaking change. Clients can continue sending removed fields, while handlers simply ignore them. TypeScript's type safety ensures handlers don't accidentally reference removed fields during compilation. |
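As a rough Zod sketch of the rules above (field names borrowed from the payment example earlier; this is illustrative, not a complete contract):
import { z } from 'zod';
// Same version: an optional addition keeps every previously valid event valid.
const acceptsV1 = z.object({
  amount: z.number(),
  currency: z.string(),
  note: z.string().optional(), // optional field: existing callers unaffected
});
// New version required: a new required field rejects previously valid events.
const acceptsV2 = z.object({
  amount: z.number(),
  currency: z.string(),
  metadata: z.record(z.string(), z.string()), // required addition -> breaking change
});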
emits data schema
The emits field in Arvo contracts defines the event schemas a service can produce. Each contract version can specify multiple emit event types, and their evolution follows specific patterns to maintain system reliability (a short sketch follows the table):
Change Type | Example | Impact | Least effort version update | Rationale |
---|---|---|---|---|
Adding New Event Type | Adding evt.payment.refunded | Non-Breaking | Same Version | The new event type doesn't affect existing event handlers since they'll ignore events they don't recognize. Consumers and producers can add support gradually. |
Adding New Event Type (High Throughput) | Adding evt.payment.refunded | Breaking (Tradeoff) | New Version | If a service might be used by an orchestrator, multiple event types in the same version will cause locking in that orchestrator, regardless of service statelessness. Only put high-throughput events (e.g. 100K events/sec) in the same version if you're absolutely certain no orchestrator will ever use this service. Otherwise, create a new version to avoid orchestrator bottlenecks. While this duplicates code, it enables independent optimization. |
Removing Event Type | Removing evt.payment.failed | Breaking | New Version | Existing consumers rely on this event for their business logic. Removal breaks their functionality and requires a rewrite. |
Adding Required Field | Adding transaction_id: string | Limited Breaking | Special Case | Only the producer needs changes to provide the new field. Consumers following the tolerant reader pattern continue working. |
Adding Optional Field | Adding metadata?: object | Non-Breaking | Same Version | Optional fields don't break existing consumers since they can ignore unknown fields. Allows gradual adoption of new capabilities. |
Expanding Union Type | status: 'success' or 'fail' → adds 'pending' | Non-Breaking | Same Version | Existing code handles known values and ignores new ones by design. Maintains backward compatibility. |
Changing Field Type | amount: number → amount: string | Breaking | New Version | Fundamental change in data representation. Existing parsing, validation, and business logic would fail. |
Removing Union Value | status: 'success', 'fail', 'pending' → removes 'pending' | Breaking | New Version | Consumers using 'pending' in business logic or state machines would break. Cannot safely remove without coordination. |
Removing Field | Removing timestamp | Breaking | New Version | Any consumer logic depending on this field would fail. Cannot assume the field is unused without breaking the consumer contract. |
Creating robust contracts requires careful consideration of field requirements. Two common anti-patterns to avoid:
ArvoContract takes a distinct approach to versioning that differs from traditional semantic versioning systems. Each version in ArvoContract exists as a completely isolated, standalone definition. When version 1.0.0 exists and a version 1.1.0 is created, these are treated as entirely separate contracts with no implicit relationship or compatibility between them.
This isolation has important implications for how we should approach contract evolution:
When to Stay in Current Version: If you're adding capabilities that don't break existing consumers - like optional fields, new event types, or expanded enum values - these can be added to the current version. The key test is: will existing consumers continue working without any changes? If yes, enhance the current version.
When to Create New Version: Any breaking change requires a new version. This includes adding required fields, changing field types, removing fields, or altering how existing fields work. The new version is treated as completely separate from previous versions.
This approach provides absolute clarity about contract behavior. Each version is self-contained, making it impossible to have subtle compatibility issues. While this might seem overly simple, it actually handles the complexity of distributed systems better than elaborate versioning schemes.
Key Implementation Strategy:
This versioning philosophy acknowledges that in distributed systems, clear boundaries and explicit contracts are more valuable than complex compatibility promises.
Arvo represents a thoughtful evolution in event-driven system design, addressing the fundamental challenges of building reliable and maintainable distributed systems. At its core, Arvo's contract-first approach transforms how services communicate by making contracts first-class citizens in the development process. This isn't just a technical choice – it's a strategic decision that brings clarity to service boundaries, ensures type safety, and provides a clear path for system evolution.
What makes Arvo particularly powerful is its pragmatic approach to managing change. Rather than treating coupling as inherently negative, it acknowledges that some coupling is inevitable and provides tools to manage it effectively. The combination of TypeScript-based contracts, semantic versioning, and clear evolution patterns creates a framework where services can evolve independently while maintaining system-wide reliability. The framework's sophisticated event creation and validation mechanisms, paired with its orchestration capabilities, enable developers to build complex distributed systems that remain maintainable as they grow.
Perhaps most importantly, Arvo's design reflects a deep understanding of real-world challenges in distributed systems. By drawing inspiration from established patterns like Design by Contract and Event Sourcing, while incorporating modern tools like TypeScript and Zod, it provides a practical solution that balances theoretical correctness with developer experience. The result is a framework that not only helps build reliable distributed systems today but also provides a clear path for evolving them tomorrow.
Historically, service contracts have played a crucial role in distributed systems development. As systems grew more complex and distributed, the need for formal service contracts became increasingly apparent, leading to several significant developments in contract definition and management.
WSDL emerged in the early 2000s as part of the SOAP/XML web services ecosystem, representing one of the first comprehensive approaches to formal service contracts. At its core, WSDL provided a standardized way to describe network services as collections of endpoints operating on messages. The XML-based WSDL documents served as formal contracts between service providers and consumers, defining everything from data types to operation signatures.
A typical WSDL contract included:
The WSDL approach introduced several important concepts that influenced future contract systems:
However, WSDL faced significant challenges in practice. The XML-based specifications were verbose and difficult to maintain. The code generation approach, while ensuring type safety, created a rigid development workflow that struggled to adapt to changing requirements. Development teams found themselves managing complex build processes to regenerate code whenever contracts changed, leading to deployment coordination challenges across services.
The separation between contract specification (WSDL) and implementation code also created friction in the development process. Developers had to context-switch between XML specifications and implementation code, making it harder to understand and evolve services. The tooling-dependent workflow often led to version mismatches and integration problems.
Protocol Buffers, developed at Google, emerged as a transformative approach to service contracts and data serialization. Unlike its predecessors, Protocol Buffers introduced a simple Interface Definition Language (IDL) that struck a balance between human readability and machine efficiency. The system's core innovation was its approach to contract definition through .proto
files, which serve as the single source of truth for data structures and service interfaces. These definitions could then be compiled into type-safe code for multiple programming languages, ensuring consistency across different parts of a distributed system.
What truly set Protocol Buffers apart was its approach to versioning and compatibility. By assigning unique numbers to each field in a message definition, Protocol Buffers created a robust system for schema evolution. These field numbers, once assigned, become a permanent part of the contract, ensuring that older clients can still read newer messages and newer clients can process older messages. This backward and forward compatibility was achieved without requiring complex transformation layers or version negotiation protocols. The system also introduced clear rules about what changes were safe to make: fields could be added if they were optional, required fields couldn't be removed, and field numbers could never be reused.
The binary serialization format of Protocol Buffers provided significant performance advantages over text-based formats like XML or JSON. Messages were smaller, faster to serialize and deserialize, and type-safe by design. This efficiency made Protocol Buffers particularly well-suited for high-performance systems and internal service communication. The combination of clear contract definitions, efficient serialization, and strong compatibility guarantees made Protocol Buffers a standard tool in distributed system development, influencing how future contract systems would approach these challenges.
Consider this typical proto definition:
syntax = "proto3";
package payment;
message PaymentRequest {
string payment_id = 1;
double amount = 2;
string currency = 3;
PaymentType type = 4;
map metadata = 5;
}
enum PaymentType {
CREDIT_CARD = 0;
BANK_TRANSFER = 1;
CRYPTO = 2;
}
service PaymentService {
rpc ProcessPayment (PaymentRequest) returns (PaymentResponse);
}
While this looks straightforward, the development workflow becomes complex. Teams need to:
- Maintain .proto files as a separate source of truth
- Run protoc to generate code for every target language
What truly set Protocol Buffers apart was its sophisticated approach to versioning and compatibility through field numbers. However, this created its own challenges:
// Version 1
message User {
  string name = 1;
  string email = 2;
}

// Version 2 - email removed, but field number 2 can never be reused
message User {
  string name = 1;
  reserved 2; // email's old number is reserved forever
  string phone = 3; // Must use a new number
  Address address = 4; // New field
}

// This creates "holes" in field numbers over time
message User {
  string name = 1;
  // 2 reserved forever
  string phone = 3;
  Address address = 4;
  string preferred_name = 8; // Added later
  // 5, 6, 7 used in abandoned features
}
The binary serialisation format provided significant performance advantages but introduced practical challenges of its own.
While Protocol Buffers offer robust service contracts and efficient serialization, several practical considerations affect their adoption in modern microservices architectures.
These practical challenges often lead teams to carefully evaluate whether Protocol Buffers' benefits outweigh their complexity, particularly in smaller microservices architectures or when interoperability with external systems is a priority. However, for large-scale distributed systems with strong consistency requirements, the investment in Protocol Buffers can pay off through improved type safety and performance.
OpenAPI Specification has emerged as a dominant standard for describing RESTful APIs, evolving from its origins as Swagger. It provides a language-agnostic approach to API documentation and contract definition using YAML or JSON format.
openapi: 3.0.0
info:
  title: Payment API
  version: 1.0.0
paths:
  /payments:
    post:
      summary: Process a payment
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                payment_id:
                  type: string
                amount:
                  type: number
                  minimum: 0
                currency:
                  type: string
                  enum: [USD, EUR, GBP]
      responses:
        '200':
          description: Payment processed successfully
The specification offers several advantages:
Modern frameworks have significantly improved OpenAPI's developer experience:
# FastAPI example of automatic spec generation
from fastapi import FastAPI
from pydantic import BaseModel

# Request model (defined here so the example is self-contained)
class Payment(BaseModel):
    payment_id: str
    amount: float
    currency: str

app = FastAPI()

@app.post("/payments")
async def create_payment(payment: Payment):
    # OpenAPI spec automatically generated
    # Validation included
    return {"transaction_id": "123"}
This integration provides:
Versioning remains one of OpenAPI's most significant challenges, stemming from its lack of built-in version management mechanisms. Unlike Protocol Buffers with its explicit field numbering system, OpenAPI leaves versioning strategies up to implementation teams, leading to inconsistent approaches across the industry. Teams often struggle between URL-based versioning (/v1/resource) and header-based versioning, each bringing its own complications. This absence of standardised versioning makes it particularly challenging to maintain multiple API versions simultaneously or to implement automated compatibility checking. Without clear guidelines for breaking versus non-breaking changes, teams often find themselves maintaining complex documentation to track version differences and struggling to ensure backward compatibility.
The producer-centric nature of modern OpenAPI implementations presents another significant limitation, particularly with contemporary frameworks and tools. While frameworks like FastAPI and HonoJS have made OpenAPI more developer-friendly, they've inadvertently shifted the specification away from its contract-first origins. The contract now typically emerges as a byproduct of implementation code, with data models (like Pydantic or Zod) serving double duty as both application models and contract definitions. This coupling of application data with contracts creates a one-sided dynamic where producers unilaterally define and modify contracts without meaningful input from consumers. The result is a less collaborative contract environment where consumers have limited guarantees about contract stability and often discover changes only after they're implemented.
Technical limitations further constrain OpenAPI's effectiveness as a complete contract solution. Being exclusively focused on REST/HTTP APIs, it lacks the versatility of protocol-agnostic solutions like Protocol Buffers. The specification's runtime validation approach, while flexible, doesn't provide the same level of safety as compile-time checks. Teams must also grapple with the perpetual challenge of keeping specifications and implementation code synchronised, or in the case of modern tooling, accept that OpenAPI becomes more of a documentation tool than a true contracting mechanism. This documentation-first approach, while valuable for API understanding, falls short of providing the robust contract enforcement and evolution capabilities needed in complex distributed systems.
Despite its limitations, OpenAPI is a powerful tool for API documentation and contract definition, especially when combined with modern frameworks. Its success in the REST API space demonstrates the value of standardised API descriptions, even with the challenges of maintaining true contract-first development practices.
AsyncAPI emerged as a response to the growing adoption of event-driven architectures and the limitations of OpenAPI in describing asynchronous APIs. While OpenAPI excelled at documenting RESTful endpoints, it couldn't adequately capture the complexities of message-based systems, WebSocket connections, or pub/sub patterns. AsyncAPI filled this gap by providing a specification designed specifically for asynchronous APIs while maintaining familiar concepts from OpenAPI.
asyncapi: 2.5.0
info:
  title: Payment Events API
  version: 1.0.0
channels:
  payment/processed:
    publish:
      message:
        payload:
          type: object
          properties:
            payment_id:
              type: string
            status:
              type: string
              enum: [SUCCESS, FAILED]
            timestamp:
              type: string
              format: date-time
    subscribe:
      message:
        payload:
          type: object
          properties:
            payment_id:
              type: string
            amount:
              type: number
            currency:
              type: string
AsyncAPI brings several unique advantages to asynchronous service contracts. At the same time, it faces a number of challenges:
Message Flow and Correlation: AsyncAPI provides basic support for message correlation through the Correlation ID Object (as defined in the specification), which allows for message tracing and correlation using runtime expressions. However, the specification focuses primarily on describing individual message structures and channels rather than complex message flows. While correlation IDs provide a way to link related messages, the specification itself does not provide explicit constructs for documenting complex event chains or choreography patterns. This is arguably a fair limitation of AsyncAPI, and there has been ongoing discussion on this front in a GitHub issue.
Protocol-Specific Features: The AsyncAPI specification aims to be protocol-agnostic, but this creates challenges in representing protocol-specific features. As documented in the AsyncAPI bindings section, while the specification provides binding objects for different protocols, these don't fully capture all protocol-specific behaviours. Again, from a contracts standpoint, this does not seem to be a particularly problematic issue.
Tooling Ecosystem: The AsyncAPI tooling ecosystem, while growing, is still maturing. The official AsyncAPI Generator and AsyncAPI Studio provide basic functionality, but as noted in the AsyncAPI community discussions, there are still gaps in several areas.
Contract Implementation Model: The AsyncAPI specification's current trajectory seems to mirror OpenAPI's evolution in the REST space, focusing primarily on documentation capabilities. While this approach excels at API documentation and discovery, it can lead to a producer-centric implementation model where contracts become descriptive rather than prescriptive. This challenges the original contract-first design philosophy, potentially impacting the specification's effectiveness as a strict contract enforcement mechanism in event-driven architectures.
Despite these challenges, AsyncAPI represents a crucial evolution in service contract specifications, particularly as systems increasingly adopt event-driven architectures. Its ability to document asynchronous interactions in a standardised way provides valuable insights into system behaviour and helps teams maintain consistency across distributed systems. As the specification and its ecosystem continue to mature, it's likely to become an essential tool in the modern API landscape, complementing existing specifications like OpenAPI and Protocol Buffers.