Sharing data between (macro) microservices with replication or fetch-on-demand?

I’m working on a complex system that follows the principles of microservices, but with one exception – my services are more ‘generalized’ and big (I refer to them as macroservices further down the post).

For example, instead of having very precise services:

  • users
  • news
  • articles
  • posts
  • threads
  • matches
  • items

I have the following generalized services instead:

  • portal (users, news, articles)
  • forum (users (from portal), posts, threads)
  • game (users (from portal), matches, items)

I understand that it affects scalability, flexibility and coupling – but this design choice has its reasons, and it’s not the problem I’m facing either.

The problem

As you can see above, all of the macroservices use the users resource, which originates in the portal service.

Generally speaking, once you register on the portal, you can use the same account to play the game or post on the forums.

Question:

Should I implement cross-service communication, so that game and forum fetch users from portal when needed (which is quite often, as every post, thread and match has one or more users)? This means they can’t function once portal dies, but the data will always be up to date.

OR

Should I implement data duplication/replication, so that whenever a UserCreated or UserUpdated event fires, game and forum store a duplicate of the user in their own databases? This means they can still function if portal dies, but there’s a bit of coupling due to synchronization.
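
If you go the replication route, the consuming side can stay quite small. Below is a minimal sketch of what the forum (or game) side might look like, assuming NATS as the event broker and an in-memory map standing in for the service’s own user table; the subject names and payload shape are invented for illustration:

```typescript
import { connect, JSONCodec } from "nats";

// The slice of a user that forum/game actually need to keep locally.
interface UserReplica {
  id: string;
  displayName: string;
  version: number;
}

const localUsers = new Map<string, UserReplica>(); // stand-in for the service's own users table

async function replicateUsers() {
  const nc = await connect({ servers: "nats://localhost:4222" }); // hypothetical broker address
  const jc = JSONCodec<UserReplica>();

  // Listen to the lifecycle events published by the portal service (UserCreated, UserUpdated).
  const sub = nc.subscribe("portal.user.*"); // hypothetical subject naming
  for await (const msg of sub) {
    const user = jc.decode(msg.data);
    const existing = localUsers.get(user.id);
    // Ignore stale or out-of-order events.
    if (!existing || user.version > existing.version) {
      localUsers.set(user.id, user);
    }
  }
}

replicateUsers().catch(console.error);
```

With something like this in place, forum and game keep serving reads from their own copy even while portal is down; the price is that the copy can lag briefly behind portal.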

microservices – How can we design the view count for a video website like YouTube, which needs to be highly optimized?

As per my understanding:

We can have a scalable backend microservice MS1 that exposes an API. The client calls this API whenever a user plays a video. MS1 uses a sharded cache C1 and a message broker MB1. The cache C1 holds the view count per video as <VideoId, VideoCount>; for every new request, MS1 increments the count in C1 and adds the request <VideoId, UserId> to the message broker MB1.
On the other side of MB1, a service MS2 persists the request to database DB1. If C1 does not have the data for a video, it fetches it from MS2.
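
A minimal sketch of the MS1 write path described above, assuming Redis (via ioredis) standing in for the sharded cache C1 and NATS standing in for the message broker MB1; the key and subject names are invented:

```typescript
import Redis from "ioredis";
import { connect, NatsConnection, StringCodec } from "nats";

const cache = new Redis();        // stands in for the sharded cache C1
const sc = StringCodec();
let broker: NatsConnection;       // stands in for the message broker MB1

async function init(): Promise<void> {
  broker = await connect({ servers: "nats://localhost:4222" });
}

// Called by MS1's API whenever a user plays a video.
async function recordView(videoId: string, userId: string): Promise<number> {
  // 1. Increment the <VideoId, VideoCount> counter in the cache.
  const count = await cache.incr(`video:views:${videoId}`);
  // 2. Hand <VideoId, UserId> to the broker so MS2 can persist it to DB1 asynchronously.
  broker.publish("video.viewed", sc.encode(JSON.stringify({ videoId, userId })));
  return count;
}
```

Note that in this sketch the clients only ever talk to MS1’s API; the connections to C1 come from a small pool held by the MS1 instances rather than one per client.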

Recently, I was in an interview where the interviewer asked me to design this view count so that it is scalable. He was mainly concerned about the millions of connections created to cache C1 in the case of millions of requests.
I was under the impression that since cache C1 is scalable, this is not an issue.

I have designed something similar before in a project, including like and dislike counts, so I tried to explain it to him the same way, but he wasn’t convinced.
I tried to find a standard approach or algorithm to optimize it further, but was unable to find anything on Google, so here I am. Kindly help me: have I done anything wrong?

microservices – Inner join between separate databases

Assume there is a large corporation with multiple SW systems that all use the same CRM system via a REST API. All systems have their own databases, as does the CRM system. System A only holds references to CRM customers in the form of customer IDs, but no other customer data. System A’s database holds 10,000 customers; the CRM holds 1,000,000 customers.

There’s a requirement for system A to allow searching customers by name (or partial name) and by a role held in system A. If the CRM and system A were in the same database this would be trivial: a simple inner join. But since both systems have separate databases, this is not an option.

Option 1 would be a REST service that accepts the 10,000 customer IDs as input parameters (as well as the searched name), does the inner join against its own database, and returns the result set. In our case this is not possible, as the REST service accepts at most 100 customer IDs and returns at most 300 customers.

Option 2 would be that the CRM system keeps track of which customers are in system A and can then limit name searches to those customers.

Option 3 would be that system A replicates some of the CRM data (the names) in its own database, but that comes with its own problems: how do you propagate changes in the CRM to system A?

Any other options? I’m not too familiar with how MDM systems would handle this. It seems to me that with the current microservices craze similar problems are all the more prevalent.
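
For option 3, one common shape for the propagation problem is to have system A consume customer-change notifications from the CRM (or poll a changes feed) and keep only the name column locally. A minimal sketch, with the broker, subject and payload entirely invented for illustration:

```typescript
import { connect, JSONCodec } from "nats";

// The only CRM data system A replicates locally: customer id -> name.
const localCustomerNames = new Map<string, string>();

interface CustomerChanged {
  customerId: string;
  name: string;
}

async function followCrmChanges() {
  const nc = await connect({ servers: "nats://localhost:4222" }); // hypothetical broker
  const jc = JSONCodec<CustomerChanged>();

  const sub = nc.subscribe("crm.customer.changed"); // hypothetical subject
  for await (const msg of sub) {
    const change = jc.decode(msg.data);
    // Only keep names for the customers system A actually references.
    if (localCustomerNames.has(change.customerId)) {
      localCustomerNames.set(change.customerId, change.name);
    }
  }
}

followCrmChanges().catch(console.error);
```

With the names held locally, the “search by name and role” query becomes an ordinary join inside system A’s own database.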

deployment – How to deploy Python microservices app updates?

Device management is quite hard to get right, so we’ve seen the rise of many SaaS offerings, from big providers and small startups alike, which automate the deployment of edge services onto fleets of IoT devices. Some of them rely on containerization (and Docker is pushing towards top-level support for ARM architectures); others act in a “serverless” fashion, meaning they let you load a script in some language and then copy it across your fleet.

Basically, what you can do is:

  1. Evaluate these tools (e.g. Azure IoT Edge)
  2. Work with some configuration management tool (e.g. Saltstack)
  3. Roll your own solution

The first option is the safest choice, since you have to do nothing but some benchmarking and then integrate it into your pipeline. But as with all cloud services, these tools come with their costs and their constraints.

As for the second option: yes, I’m not crazy. We know configuration management tools (Ansible, Terraform, etc.) because we use them to provision hundreds of cloud VMs, and there is not that much difference between a cloud VM and a Linux device reachable over SSH and a VPN. You just have to make sure you are using a tool that is scalable enough and resilient enough to work over unreliable channels. Saltstack does this job quite well: it uses ZeroMQ as an event bus and has small-footprint agents. You can define your desired state through versioned files and change them according to your requirements, or take control of specific devices for specific maintenance tasks. Pay attention to managing all the Ops aspects (security, scalability, availability); they are the major burden that this option puts on your project.

The third option makes sense if you have a very simple use case and are not eager to pay cloud bills or to run a large configuration-management deployment for high availability and so on. If you can communicate with your devices bidirectionally, you could write a small platform service that sends an event to the edge whenever a config update is available. The edge then sends back tracking events so you can decide whether to retry an unavailable device, roll back the deployment, or apply a deployment strategy such as canary. But this is only worth it in the simplest scenarios, because building a full-fledged management solution takes a huge effort and distracts your team from the truly valuable activities.
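
A minimal sketch of that roll-your-own shape, assuming MQTT as the transport; the topics, payloads and device names are invented, and the edge agent would more likely be written in Python, so this only illustrates the message flow:

```typescript
import mqtt from "mqtt";

// Platform side: announce that a new config version is available for a device group.
const platform = mqtt.connect("mqtt://broker.example.local"); // hypothetical broker
platform.on("connect", () => {
  platform.publish(
    "fleet/sensors/config-update", // hypothetical topic
    JSON.stringify({ version: "2024-06-01", url: "https://example.local/bundles/2024-06-01.tar.gz" })
  );
});

// Edge side: apply the update and report back, so the platform can retry, roll back,
// or continue a canary rollout based on the tracking events it receives.
const edge = mqtt.connect("mqtt://broker.example.local");
edge.on("connect", () => edge.subscribe("fleet/sensors/config-update"));
edge.on("message", (_topic, payload) => {
  const update = JSON.parse(payload.toString());
  const ok = applyUpdate(update); // local install step, stubbed below
  edge.publish("fleet/sensors/tracking", JSON.stringify({ device: "device-42", version: update.version, ok }));
});

function applyUpdate(update: { version: string; url: string }): boolean {
  // Download and install the bundle here; stubbed out in this sketch.
  console.log(`applying config ${update.version} from ${update.url}`);
  return true;
}
```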

microservices – Event sourcing: Update local data before posting an event?

We are introducing events to our system (I would hesitate to call it ‘event sourcing’, but we have started down this road). To do this we are still maintaining the same public HTTP ‘CRUD’ APIs, but on each mutation request (POST/PATCH/PUT) the server(s) will now create an event and post it to an ‘event store’ service.
We are wondering whether a service receiving a request should update its local data first and then send the event, or send the event and, having subscribed to the event store service, only update its local data when it receives the event back.

Here is what currently happens:
A POST request comes in to API 1 and it updates its local data and then creates an event.

API 2 is listening to these events as it needs to update its data accordingly. It receives the event and it updates its data. Great! It works quite nicely (We are also enjoying having this log of all ‘mutations’ in one place – great for debugging).

If we make API 1 subscribe to the event, rather than directly update its data, I can think of the following advantages:

  • If for some reason the event never makes it to the event queue service, then the data between the two APIs is still in sync (API 1 did not prematurely update its local data)
  • The client request can be potentially quicker, since it is not waiting on a DB update before responding
  • We could put the part of API 1 that subscribes to events and updates local data in its own service, separating read and write (i.e. moving towards CQRS).

A few problems:

  • API 1 cannot send a response to the client saying that the POST request is all good, since it won’t know until it receives the event and updates its data. The best we can do is say that it was accepted.
  • With that in mind, we now need a way of informing the client of failures – the implication seems to be we need some kind of notification service with SSE or websockets in order to push info to the client.
  • The overall latency to write data goes up (it’s doing a full round trip)
  • Often, the write API needs to access the database anyway to verify a user is allowed to do whatever they are asking (e.g. do they own the resource they want to mutate?). This can negate the advantage I said of requests being quicker; it also poses another question – should this access/permissions check be done in the initial request or when the subscriber receives the event?
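
To make the “subscribe to your own events” variant concrete, here is a minimal sketch of what API 1 could look like, assuming Express for the HTTP layer and NATS standing in for the event store; all names and the event shape are invented for illustration:

```typescript
import express from "express";
import { randomUUID } from "node:crypto";
import { connect, JSONCodec } from "nats";

interface ArticleCreated {
  type: "ArticleCreated";
  id: string;
  title: string;
}

const articles = new Map<string, { title: string }>(); // API 1's local data
const jc = JSONCodec<ArticleCreated>();

async function main() {
  const nc = await connect({ servers: "nats://localhost:4222" });

  // Write side: accept the request, emit the event, reply 202 Accepted.
  const app = express();
  app.use(express.json());
  app.post("/articles", (req, res) => {
    const event: ArticleCreated = { type: "ArticleCreated", id: randomUUID(), title: req.body.title };
    nc.publish("events.articles", jc.encode(event));
    // We cannot claim "created" yet; local data is only updated when the event comes back.
    res.status(202).json({ accepted: true, id: event.id });
  });
  app.listen(3000);

  // Apply side: API 1 subscribes to its own events, exactly like API 2 would.
  const sub = nc.subscribe("events.articles");
  for await (const msg of sub) {
    const event = jc.decode(msg.data);
    articles.set(event.id, { title: event.title });
  }
}

main().catch(console.error);
```

The 202 is the crux: the handler can only confirm that the event was accepted, not that it was applied, which is exactly the first problem listed above.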

Thanks for any ideas!

(I should also note that while my examples only have 2 services in play, in reality we have a number of simple CRUD APIs, each managing their own data and sitting behind an API gateway. Until introducing events, we were simply calling APIs directly from one another, which had become difficult to manage/debug/maintain.)

In picture form, here is what we currently have, with APIs updating their data and then posting an event: [diagram omitted]

And here is what we are considering, with APIs only updating data in response to events: [diagram omitted]

architecture – Monolith to microservices – Staging / UAT environments

While things are hybrid

If your monolith is difficult to deploy, requires a lot of resources, or has licensing costs even for development environments, then you’ll need to limit the number of instances where the monolith lives. Your options are to create a simulator that does the things the monolith would do so you can test the microservices, or to manipulate data stores directly from your tests.

However, if your monolith is pretty easy to deploy, you can automate its deployment as well. It would be a “bigger” service, so to speak.

Ideal Environment

You’ll find that one of the key factors of success with microservices is automating deployment as much as possible. Whether you use Chef, Salt, Terraform, CloudFormation, or containers with an orchestration layer like Kubernetes is a choice you’ll have to make.

There are many ways of solving this problem, but the heart of it is that you need to make your deployment and configuration as automated as possible. Some of the ways to make that easier include:

  • Externalizing configuration: the deployment system pushes the configuration to the services (a small sketch follows this list)
  • Service discovery: either use a dedicated service discovery component, or leverage your infrastructure (e.g. DNS entries, or one of the many ways that Kubernetes makes it easier to find a service)
  • Protect secrets: secrets like usernames and passwords, or client IDs and client secrets for OAuth 2 authentication, shouldn’t be passed in the clear. You can leverage your externalized configuration if you have a means of encrypting and decrypting on the fly.
  • Continuous integration/continuous delivery: every commit to the right branch or tag should build and deploy the software to the right environment.
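
To illustrate the first bullet, the simplest form of externalized configuration is to have the deployment system inject everything as environment variables and have the service read them once at startup; a minimal sketch with invented variable names:

```typescript
// config.ts: read everything the service needs from the environment at startup, so the
// deployment system (Chef, Kubernetes, etc.) fully controls the per-environment values.
function required(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable ${name}`);
  }
  return value;
}

export const config = {
  port: Number(process.env.PORT ?? 8080),
  databaseUrl: required("DATABASE_URL"),               // hypothetical variable names
  userServiceUrl: required("USER_SERVICE_URL"),
  oauthClientId: required("OAUTH_CLIENT_ID"),          // secrets still arrive via the environment,
  oauthClientSecret: required("OAUTH_CLIENT_SECRET"),  // ideally injected from a secret store
};
```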

By having deployment as part of the whole process, your question answers itself. Once you’ve gone through the hassle of automating the deployment and configuring the different environments appropriately, why wouldn’t you just deploy your service when changes are made in all environments? That makes automated and ad hoc testing easier to do.

Team Responsibilities

When you talk to large software shops like Amazon (the storefront), Netflix, Airbnb, etc., there is a common mantra: one team per service. That team is responsible for everything related to the service, including deployment, testing, recovery testing, monitoring, etc. Ideally each team is a two-pizza team (roughly 5-8 people, depending on how hungry they are).

For smaller development teams like mine, that’s just not something that most companies have the luxury of doing. For example, I have two teams working for me, integrating and building out two applications into one. To handle coordination efforts, we do incorporate planning meetings with our normal scrum cadence.

  • Every week we have a Scrum of Scrums, where each team lead works with the architect (me in this case), and business folks so we can resolve issues related to technology, schedule, or business priorities.
  • At release planning we identify the areas we need to coordinate more tightly.

Our releases are typically 4 sprints’ worth of work, followed by deployment to production. However, our customer has a lot more bureaucracy around releasing software than if we were a commercial group, so your experience will likely be different. That said, I can say from experience: the more often you deploy, the more critical automating that deployment becomes.

How to ensure data consistency between 2 microservices when both have write permissions

I’m currently building my first ever microservice using NodeJS, but I stumbled upon a problem.

TL;DR
~ How to ensure data consistency between 2 microservices when both have write permissions

Setup

I currently have 2 independent microservices: session & recovery

  • Session: Handles authentication (sign up, sign in, sign out, current user)
  • Recovery: Handles password resets (request a reset, reset the password)

Both services contain an identical copy of the user table. Every record has a version field to ensure data consistency.

I’m using a message broker (NATS) in between the 2 services.

Example

A brief example of how the application flow works.

Let’s say we want to do a password reset:
Recovery Service:

  1. Request password reset
  2. Generate token
  3. Specify new password (legitimized by token)
  4. Increase the version of the user
  5. Send UserUpdatedEvent

Session service (a code sketch of this handler follows the list):

  1. Receives event with data (user id, user version, new password)
  2. Search for a user with specified id and version - 1
  3. Found? Update the password
  4. Increase version number
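
A minimal sketch of that session-service handler, assuming NATS (which you already use) and an in-memory map standing in for the user table; the subject and field names are invented:

```typescript
import { connect, JSONCodec } from "nats";

interface UserUpdatedEvent {
  userId: string;
  version: number;     // version as already incremented by the recovery service
  newPassword: string; // ideally already hashed before it is put on the event
}

interface UserRecord {
  id: string;
  version: number;
  password: string;
}

const users = new Map<string, UserRecord>(); // stand-in for the session service's user table

async function listenForUserUpdates() {
  const nc = await connect({ servers: "nats://localhost:4222" });
  const jc = JSONCodec<UserUpdatedEvent>();

  const sub = nc.subscribe("user.updated"); // hypothetical subject
  for await (const msg of sub) {
    const event = jc.decode(msg.data);
    const user = users.get(event.userId);

    // Only apply the event if we are exactly one version behind; anything else is
    // out of order (or conflicting) and has to be retried or dead-lettered.
    if (user && user.version === event.version - 1) {
      user.password = event.newPassword;
      user.version = event.version;
    }
  }
}

listenForUserUpdates().catch(console.error);
```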

Important:

  • The recovery service doesn’t hold all the user data. In fact, it’s only used to check if the user exists.
  • The password is not stored inside the recovery service; it’s sent to the session service via the UserUpdatedEvent.

These are the 2 tables inside the recovery service:

  1. Recovery table: contains the user reference, a generated token, and the status (pending, expired, completed)
  2. User table: contains the user id and the user’s version

This works fine, but what if you update the same user simultaneously in both services?

The Problem

What if 2 microservices that hold a copy of the same table both modify a different property at the same time?

Example:

  1. Recovery service handles password modification for a user
  2. Session service handles all other modifications for a user (first name, last name,…)

They will both, at the same time, modify their own tables, therefore increase the version count, and then emit an event to the other service. When the UserUpdatedEvent is received by the other service, it won’t be recognized as a ‘legit’ event, since there is now a version mismatch between session & recovery.

  • Does this mean I should have kept this functionality in one service?
  • Perhaps only one service should be able to modify a table?
  • Maybe I shouldn’t keep a version inside the recovery table at all?

Hopefully a more experienced person can help me out!

domain driven design – How far to go when decoupling Microservices by use of Integration Events (Messages)?

I am reading the architecture guide from the .NET Core project. They state:

The integration events can be defined at the application level of each microservice, so they are decoupled from other microservices, in a way comparable to how ViewModels are defined in the server and client. What is not recommended is sharing a common integration events library across multiple microservices; doing that would be coupling those microservices with a single event definition data library. You do not want to do that for the same reasons that you do not want to share a common domain model across multiple microservices: microservices must be completely autonomous.

There are only a few kinds of libraries you should share across microservices. One is libraries that are final application blocks, like the Event Bus client API, as in eShopOnContainers. Another is libraries that constitute tools that could also be shared as NuGet components, like JSON serializers.

There is a reference implementation, the eShopOnContainers repo on GitHub. Digging around a little bit, I found that they duplicated the messages in both services. Example: the OrderPaymentSucceededIntegrationEvent appears in the publishing payment service as well as in the subscribing order service.

I have mixed feelings about this approach. Sure, it is decoupled in the sense that there is no compile-time dependency. But any change to the message might break the application at runtime, since the compiler cannot check that the message sent is compatible with the message received. Would it be illegal to publish a kind of “Contracts” assembly providing all the messages published by a microservice, to be bound at compile time by the subscriber? I’d rather think of such messages as “common knowledge”, somewhat like the base class library is common knowledge for all .NET Core programs.
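
For comparison, here is roughly what such a published contracts package boils down to, sketched in TypeScript rather than .NET; the event name comes from the question, everything else is invented:

```typescript
// In this sketch the "Contracts" package is just a versioned module both services depend on
// (the npm analogue of a NuGet contracts assembly). It contains message shapes and nothing else.
export interface OrderPaymentSucceededIntegrationEvent {
  orderId: string;
  paidAt: string; // ISO 8601 timestamp
}

// Subscribing order service (it would normally import the type from the published package).
function onPaymentSucceeded(event: OrderPaymentSucceededIntegrationEvent): void {
  // The compiler now verifies that publisher and handler agree on the message shape.
  console.log(`order ${event.orderId} paid at ${event.paidAt}`);
}

onPaymentSucceeded({ orderId: "42", paidAt: new Date().toISOString() });
```

The trade-off is the one the guide warns about: every subscriber now takes a dependency on that package and its release cycle.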

architecture – Sharing user related data in microservices

I’m having a problem choosing the right approach to sharing some user related data throughout my microservices-based application.

Imagine the following scenario:

The Users microservice handles creation of users, but also management of the hierarchy of those users. It has the information regarding which users have the Manager role and which the User role, and also which Users are subordinates of which Managers.

There’s also a Books microservice which allows for creation and management of Book and related entities. Users can create and manage their own Book; however, their managers should be able to update their Book, too. The authorization of the Update endpoint of BooksController should check whether the User trying to do the update is the owner or, if he’s not, whether he has the Manager role and the owner of the Book in question is his direct subordinate. This information is only available in the Users microservice.

I’m considering following solutions to this problem:

  1. A request/response pattern implementation to get the subordinates of the Manager in question – Feels like a very bad option for this, as it’s essentially creating a tight coupling between the services and also creating a single point of failure for both services.

  2. Sharing the Users database with Books microservice (read-only) – another case of tight coupling, however, with no dependency of the Users service to be alive to retrieve the information.

  3. Merging the microservices together – Maybe the line was drawn in a wrong place and those two microservices should become one. However, looking at the Users service, it feels like this sort of scenario can reappear for other microservices that will be introduced to the application. Solving it this way sets a precedent for just merging it all back together into a monolithic application.

  4. Adding the data regarding the user hierarchy to the access token as a custom claim and using that data to authorize in the Books service – I think it could work (a sketch of this follows the list). My worry is that I’d be misusing custom claims for passing data that isn’t really ‘part’ of the user.

  5. Other?
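
A minimal sketch of option 4, using the jsonwebtoken package; the claim names, secret handling and payloads are invented for illustration:

```typescript
import jwt from "jsonwebtoken";

const secret = "shared-signing-secret"; // in practice an asymmetric key pair is preferable

// Users service: when issuing the access token, embed the manager's direct subordinates
// as a custom claim.
function issueToken(userId: string, roles: string[], subordinateIds: string[]): string {
  return jwt.sign({ sub: userId, roles, subordinates: subordinateIds }, secret, { expiresIn: "15m" });
}

// Books service: authorize an update without calling the Users service.
function canUpdateBook(token: string, bookOwnerId: string): boolean {
  const claims = jwt.verify(token, secret) as {
    sub: string;
    roles: string[];
    subordinates: string[];
  };
  if (claims.sub === bookOwnerId) return true; // the owner may always update
  return claims.roles.includes("Manager") && claims.subordinates.includes(bookOwnerId);
}
```

Two caveats worth keeping in mind with this option: the token grows with the number of subordinates, and the hierarchy encoded in it is only as fresh as the token itself.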

java – Structuring my Monolith into Microservices

I am looking at how to split my monolith into several microservices. My main goals are better maintainability, a stable production environment and faster development.

I would like to keep using Spring Boot (as I already do).

Let me describe what my software does and what I have so far:

The program is a data hub for car repair companies. In it we store spare parts with current prices, stock levels and other important data. The most important thing is to update this data (stock and price) every hour.

We are connected to 40 spare-parts manufacturers who deliver the data to us as CSV, API, TXT, etc. Every manufacturer has its own data structure.

Every user has their own “master” database schema (tenant) in our software. The user can enter various settings; the most important are:

Price calculation: every manufacturer can have a different price calculation, which is entered by the user.

Other settings, for example:

  • update stock? true/false
  • update descriptions? true/false, and so on

These settings are related to the product data.

Right now we do it like this: every one of these 40 manufacturers has its own Spring Boot service, and the “pulls” are scheduled every hour.

There is also a master service that communicates with the database and delivers data to the manufacturer services via REST.

On a scheduled event, every service starts working:

  1. Get the manufacturer settings from the master service for every tenant who uses this specific manufacturer
  2. Call the external API to retrieve the product data
  3. Put the raw data into a “general format” that every manufacturer service and also the master service knows
  4. Send the product data to the master

The master then does the following: if the product already exists, it updates the stock, recalculates the price and saves it. If a stock or price changed, a database trigger writes those changes to a “log table”. New products are created and also written to the log table.

After the import is done, the master informs other services such as the webshop service. The webshop service reads the “log table” and adds new products to the shop or updates the stock/prices recorded there. Every log record is marked as “done” after it has been processed.
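
Purely to illustrate what the “general format” in step 3 might pin down, here is a sketch of the canonical product record as a TypeScript type (your services are Spring Boot, so this would really be a shared Java DTO; all field names are invented):

```typescript
// The canonical record every manufacturer service produces and the master consumes.
interface GeneralProductRecord {
  tenantId: string;       // which customer schema this row belongs to
  manufacturerId: string; // which of the 40 manufacturers produced it
  articleNumber: string;  // the manufacturer's own part number
  description?: string;   // applied only if the tenant has "update descriptions" enabled
  stock: number;          // applied only if the tenant has "update stock" enabled
  listPrice: number;      // raw price before the tenant's price calculation is applied
  fetchedAt: string;      // ISO 8601 timestamp of the hourly pull
}

const example: GeneralProductRecord = {
  tenantId: "tenant-001",
  manufacturerId: "manufacturer-07",
  articleNumber: "BRK-12345",
  stock: 14,
  listPrice: 39.9,
  fetchedAt: new Date().toISOString(),
};
console.log(example);
```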

The problem I have is that I cannot update the master service while imports are running, and there are workloads running almost all the time. That’s why I want to decouple my services into independent services.

Does anyone have a good idea how to break down my services?