Temporal is a company/project to watch. The founders know the problem space with as much depth as might be possible. They were responsible for AWS Simple Workflow, then built Cadence Workflow at Uber to solve Uber's workflow/orchestration issues, and later formed Temporal, forking Cadence to build a company around it.
This is super interesting but I have what feels like a basic/dumb question...
Let's say I have some workflow which executes over a period of days or weeks, and I want to deploy a new version of it. How does that work? What happens to workflow instances that are currently executing?
Your question is actually pretty important, as upgrading long-running applications is indeed a non-trivial problem.
The standard approach of versioning the whole workflow is OK for introducing new features but doesn't really support bug fixing. For example, a workflow is expected to run for three months, and a bug is found near the end of its definition. The bug is fixed and deployed as a new version, but all workflows started before the fix are still on the previous version and will keep failing for up to three months. So there is a need to patch a workflow without changing its version.
The approach Temporal takes is that every part of the code is versioned independently. This allows deploying changes at any time, even for libraries shared by multiple workflows, and doesn't require running multiple worker versions.
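In the Go SDK, this per-branch versioning is done with `workflow.GetVersion`, which records which path a given execution took so that replay stays deterministic. A minimal sketch of a patched fragment inside a workflow function; the refund activities are hypothetical names for illustration:

```go
// GetVersion persists the branch taken in the workflow history:
// executions recorded before the fix keep replaying the old code
// path, while newly started executions take the fixed one.
v := workflow.GetVersion(ctx, "fix-refund-rounding", workflow.DefaultVersion, 1)
var refund float64
if v == workflow.DefaultVersion {
	// Original (buggy) path, kept so in-flight workflows stay replayable.
	err = workflow.ExecuteActivity(ctx, CalculateRefundV1, order).Get(ctx, &refund)
} else {
	// Fixed path for all workflows started after the deploy.
	err = workflow.ExecuteActivity(ctx, CalculateRefundV2, order).Get(ctx, &refund)
}
```

Once no running workflow can still take the old branch, the `if` and the V1 activity can be deleted.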
I get the idea but it seems clear that it will result in unmanageable spaghetti code in very little time. This model forces the developer to entangle versioning with logic. Since a big part of Temporal's value proposition is dis-entangling logic from state management and other orchestration, I expect you can see why this is a problem.
Put another way, it's a great big leak in the abstraction.
I see your point. But so far we haven't heard many complaints about this feature from users. The main problem is forgetting to version code that changed, but the system has protections against such cases.
It doesn't end up as spaghetti code, as old branches are proactively removed once they are no longer needed. Temporal provides APIs to count the number of workflows still using each version.
I don't think it is a leaky abstraction, as deciding how an old version of the state becomes the new one is genuinely part of the business logic. I don't think there is a generic solution that can implicitly migrate old state to new state for an arbitrary long-running computation.
A workflow is not just externally stored data with a well-defined schema. It is the computation's whole state, including threads blocked on API calls and local variables on the stack. So it is not possible to define translations between different versions the way Edit Lenses do.
That seems like quite a good way to handle things, because it lets you cancel, restart, or migrate an in-flight instance in any way you want, including dynamically based on its state. It also lets you handle any number of versions.
The main difference is that Temporal uses general-purpose programming languages to implement workflow code. This gives developers unlimited flexibility and doesn't require learning a new specialized language just to add resiliency to their programs. Some other benefits of using a programming language instead of an XML/JSON-based one:
* Strongly typed (for languages like Java or Go)
* Uses the standard error handling of the language of choice. For example, the Java SDK throws exceptions, and the Go one returns errors.
* Allows using standard development tools like IDEs, debuggers, linters, unit-testing frameworks, etc.
* Allows using the same language for both activities and workflows.
* Allows programs of practically unlimited complexity through standard programming-language techniques like OO, functions, etc.
* Easily supports handling of asynchronous events
* Supports updating definitions of already running workflows
* Allows reuse of standard libraries. For example, if a workflow needs to keep state in a priority queue, it can use an existing implementation.
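As a concrete instance of that last point, a Go workflow could keep its pending work in a standard `container/heap` priority queue rather than reinventing one. This sketch is plain Go with no Temporal-specific code, just to show that ordinary library data structures can hold workflow state:

```go
package main

import (
	"container/heap"
	"fmt"
)

// task is an item of workflow state; taskQueue is a min-heap of tasks
// ordered by priority. Workflow code can hold state like this directly,
// since Temporal persists it via deterministic replay rather than
// requiring an externally defined schema.
type task struct {
	name     string
	priority int
}

type taskQueue []task

func (q taskQueue) Len() int            { return len(q) }
func (q taskQueue) Less(i, j int) bool  { return q[i].priority < q[j].priority }
func (q taskQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *taskQueue) Push(x interface{}) { *q = append(*q, x.(task)) }
func (q *taskQueue) Pop() interface{} {
	old := *q
	n := len(old)
	t := old[n-1]
	*q = old[:n-1]
	return t
}

func main() {
	q := &taskQueue{}
	heap.Init(q)
	heap.Push(q, task{"send-email", 3})
	heap.Push(q, task{"charge-card", 1})
	heap.Push(q, task{"update-crm", 2})
	// Drain in priority order (lowest number first).
	for q.Len() > 0 {
		fmt.Println(heap.Pop(q).(task).name)
	}
}
```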
My personal opinion is that BPMN tries to cater to two distinct groups of people: non-technical domain experts and software developers. It is essentially a compromise that cannot serve either of them well. Non-technical domain experts still cannot implement production-ready workflows, and developers have to use a limited, complicated UI/XML-based technology that is inferior to the programming languages and environments they are used to.
The Temporal approach gives developers the best tool for their job and lets them decide how to interact with non-technical users. In some domains, it is possible to create DSL for non-technical users to use. Temporal is an excellent technology for implementing such domain-specific languages. BPMN's problem is that it is not domain specific and tries to serve as a general-purpose language without being one.
I am super excited about Temporal. I think one of the biggest underacknowledged problems in business software is the way execution details pervade business logic. As soon as your critical logic doesn't fit a simple request-response cycle, it becomes fragmented across queues, scheduled jobs, ETLs, microservices, functions, etc. The actual processes that matter become Rube Goldberg machines that are hard to understand, maintain, observe, debug, and analyze. Not to mention hard to depict to the actual end user.
When I read the article "Why the Serverless Revolution Has Stalled" on here a couple weeks ago, my reaction was that the reason is that serverless doesn't solve the business logic issue. Serverless removes operational details of servers, but often exacerbates process fragmentation. And the need to have "serverful" operational expertise is replaced by the need for serverless expertise. At least right now, this new expertise is far from trivial.
I haven't yet used Temporal, but I've spent a lot of time evaluating it, and its predecessor, Cadence. The idea is to model long-running business logic essentially as procedures, in ordinary code. These are called Workflows, and must be free of external effects. External effects are carried out by Activities, and the scheduling and tracking of results are handled by the Temporal runtime. The upshot is that you get to write workflows as though they have no time constraints. It's like async programming, but liberated from the confines of an OS process or machine.
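A sketch of that Workflow/Activity split using the Go SDK; the function names and the `invoiceClient` are illustrative, not from any real codebase:

```go
import (
	"context"
	"time"

	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// Activity: performs the external effect. This is ordinary, effectful
// code that may fail and be retried by the runtime.
func SendInvoice(ctx context.Context, orderID string) error {
	return invoiceClient.Send(ctx, orderID) // hypothetical API client
}

// Workflow: pure orchestration with no external effects of its own.
// The activity call below may take minutes or days; Temporal persists
// the workflow state and applies the retry policy across failures,
// worker crashes, and redeploys.
func OrderWorkflow(ctx workflow.Context, orderID string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 10 * time.Minute,
		RetryPolicy:         &temporal.RetryPolicy{MaximumAttempts: 5},
	})
	return workflow.ExecuteActivity(ctx, SendInvoice, orderID).Get(ctx, nil)
}
```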
If it turns out to be a useful home for business logic (I understand that it has at Uber), I think the next frontier is integrating it within UI frameworks. I'm imagining the next Rails being something like Next.js + Temporal. I still have a bunch of questions in my mind, like how to decide which data lives in Temporal vs. an OLTP database. Someone with more experience using Temporal could probably answer this better.
One of the reasons I'm particularly interested in this topic is because my company, Better.com, uses our own homebuilt workflow engine to model the days-long, multi-user business project of mortgage origination. In our case, our workflow engine is actually built as part of a full-stack framework that goes well beyond Temporal in scope, but it's not built as a generic platform, and we look at Temporal for inspiration on where things might be going.
The visual interface for workflow definition is more hassle than benefit. The majority of such applications' complexity is not in the sequencing of operations but in state management, expressions, data manipulation, parameter passing, and retry policies. None of this is visible in the diagram. So engineers have to use the UI/DSL to design the flow, but then implement most of the complex logic in code that is not strongly typed, since parameter passing usually goes through a map.
Temporal represents all the business logic in one place in the programming language of your choice. All parameter passing is strongly typed.
I see people advocating visual representation for workflows. But if it is so great for programming in general, why don't they advocate the same for, say, systems programming? The Linux kernel in JSON/XML, anyone?
I've heard of Camunda from reading this excellent article, but I haven't looked too deeply into it yet.
BPMN vs. code is something we have considered. Our workflow engine works more like state machines as code, which is kind of the worst of both worlds. The business processes get fragmented into logic for each state and mixed with imperative effects, so it's neither easy to see the process nor maintain the code.
The nice thing about BPMN is that it's probably a bit more learnable by the subject matter experts who ultimately determine what the correct business process should be. However, having seen some pretty monstrous process flowcharts, my guess is that it probably doesn't scale past a certain level of process complexity. I suspect that workflow-as-code is ultimately the more scalable option.
Look at how Temporal represents workflows as code. It doesn't use any intermediate representation like a DAG or a state machine. It executes your code directly as a synchronous program, with blocking operations taking as long as necessary to execute. For example, the following code would be absolutely valid as a Temporal workflow:
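A minimal example of such a workflow using the Go SDK (the `ChargeCustomer` activity is an illustrative name): a subscription workflow that blocks for a month at a time.

```go
import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// SubscriptionWorkflow charges a customer monthly for a year. Blocking
// for 30 days at a time is valid: while asleep, the workflow consumes
// no process or machine, and it survives worker restarts and redeploys.
func SubscriptionWorkflow(ctx workflow.Context, customerID string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})
	for i := 0; i < 12; i++ {
		if err := workflow.ExecuteActivity(ctx, ChargeCustomer, customerID).Get(ctx, nil); err != nil {
			return err
		}
		if err := workflow.Sleep(ctx, 30*24*time.Hour); err != nil {
			return err
		}
	}
	return nil
}
```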
Camunda is pretty nice and I have proposed it for projects, but it was shot down almost every time because "BPM visual modelling is old-world" (the homepage kind of says that as well: "modernize your legacy BPM systems"). We did a prototype for a big insurer in the UK and that went well, but covid+brexit killed the project altogether. It is nice to work with, and tossing it aside just because of BPM modelling (and the charts that make people who encountered them in the early 2000s cringe) is shortsighted, considering you can do everything in code. It is just that when people read the homepage, judging from comments partners/clients made, it looks like going back 15+ years to fat Java applet clients doing BPM and it always turning into a mess in the end.
> Serverless removes operational details of servers, but often exacerbates process fragmentation. And the need to have "serverful" operational expertise is replaced by the need for serverless expertise.
I totally agree. As a consultant working with multiple companies, I see a slowly increasing number of serverless applications. The code always looks bloated, with no chance of ever being migrated off the original cloud platform due to all the proprietary APIs in use. There is also no way to run any of them locally without extensive configuration. Serverless has a future, but this is not it.
It's interesting that you mention workflows and serverless together. In this regard, have you looked at things like AWS Step Functions and Azure Logic Apps? You get the best of both worlds - state machines with excellent workflow management tools along with the flexibility and cost effectiveness of AWS Lambda in the background.
Talking of "Serverless Revolution", I think we are going to see more of such abstractions as things evolve. Abstractions and tools built upon serverless functions that are going to cater more closely to problems being solved rather than worrying about managing new-found complexity of the functions themselves.
Before I'd ever heard of the concept of a "workflow engine", I looked into step functions, thinking that was exactly the solution I was looking for. But when I started studying them, it was apparent to me that they suffer from the exact same problem of fragmentation of logical processes. In my ideal world, the breaks between steps in a process look much more like `await`s in a process that is modeled by a single function, and not like a hard split between steps.
The short answer is that Temporal is a fork of Cadence by the original founders of the project. The Temporal fork also has a VC-backed company behind it. Temporal is still an open-source project under the MIT License. We believe the success of the open-source project is essential for the success of the company, so we are putting significant resources into its development.
We spent almost a year working on various improvements before declaring the first production release. The most important technical difference is that Temporal uses gRPC/Protobuf while Cadence uses TChannel/Thrift. One implication is that TChannel did not support any security, whereas Temporal supports mTLS. We are currently working on a comprehensive security story.
Awesome! What's the plan for Cadence? Do you plan to maintain both projects moving forward, or will you start shifting the community to Temporal and deprecate Cadence?
I ask because we internally deployed our first Cadence cluster just a few weeks ago -- it'd be good to know what to expect and what the founders of the project suggest doing now that Temporal is officially released to production.
So the short answer is that we love Cadence, but it is no longer our project. The Uber team has full control, and we are basically no longer part of Uber. Keep in mind that Cadence is a project for Uber first and foremost. But Cadence and the team are super great.
The long answer is that we see Cadence as a great part of our lineage. We felt that our vision for open-source Temporal was very different from what Uber sees for the project.
Now that we have a production grade release, we are slowly phasing out Cadence support. We will still provide free support to any Cadence users who need help migrating to Temporal.
I can’t speak to your situation without knowing more. I will say if you would ever like a hosted version of this tech, Temporal is the way to go. I am happy to answer any specific questions too!
It’s a very valid question but it’s just a bit timely since Mitchell tweeted a few days ago about how most of HashiCorp cloud is built on Temporal tech.
Disclaimer: head of product at Temporal. Temporal is not container orchestration and is not an infrastructure management tool. In most cases users run Temporal on top of Kubernetes.
Temporal provides a distributed experience that is decoupled from the reliability of any specific piece of hardware. We provide a programming model for writing distributed applications without needing to code around every point of failure. We are still working out what to call the tech exactly; it's not something that is widely known by any name today. It's sort of like virtualized distributed computing.
So instead of, say, Kubernetes, I could use Nomad to orchestrate my to-do-list MVC app, and run Temporal on that same Nomad host (and others, including non-Nomad hosts) for my distributed data pipelines, cleanup jobs, email marketing, etc., right? What I mean to say is: is that scenario a valid/proper use of Temporal as you envision it, especially in the context of the tools HashiCorp provides?
Temporal doesn't have an opinion on how you manage your infrastructure. Most users consume our docker images but there is zero reason you can't compile binaries and run on bare metal.
That being said, the Temporal backend consists of a few stateless and horizontally scalable services (matching service, frontend service, etc.). Because these roles experience load differently, it often makes sense to scale them separately. Due to this design, users often find it convenient to use an orchestration solution such as Kubernetes, ECS, etc. To directly answer your question: HashiCorp themselves run our technology using Nomad.
The only thing we are strongly opinionated about is that you run the underlying database in a production-grade manner. Throwing a MySQL container into a helm chart isn't going to cut it for serious usage.
I agree about the buzzwords.
We have deployed a core service developed with Temporal recently and it seems to deliver on its promises.
I am looking forward to using it more.
Once it 'clicks', you begin to see how the workflow model matches many business processes that cannot be handled well within a single request or event.