# Midaz production best practices

> Set up Midaz for production with multi-AZ deployment, autoscaling, managed services, and the Kubernetes resilience patterns Lerian recommends.

Midaz is built for scale, security, and operational clarity. This guide helps you set it up right from the start, so you can minimize downtime, protect your data, and confidently handle high-volume workloads in production.

## Best-fit setup

***

To get the most out of Midaz in production, start with:

* Deploying across multiple availability zones.
* Using at least 3 worker nodes with autoscaling.
* Separating workloads (app vs. database).
* Leveraging managed services like RDS, ElastiCache, and MongoDB Atlas.
* Applying Kubernetes best practices for resilience, security, and observability.
* Automating backups and alerting from day one.

## Infrastructure planning

***

### Cluster architecture

To ensure resilience and performance:

* **Deploy across multiple availability zones**.
* **Use at least 3 worker nodes** for high availability.
* **Enable node autoscaling** to absorb workload spikes.
* **Separate application and database workloads** when possible.

### Resource sizing

* **Match node sizes to expected workloads**.
* **Prioritize critical services** with sufficient resources.
* **Apply resource quotas** to avoid contention.
* **Continuously monitor and tune** based on usage.

### Storage

* **Use SSD-backed storage** for all database components.
* **Define proper storage classes** per cloud provider.
* **Provision volumes with headroom** for growth.
* **For critical data**, use replicated or highly durable storage options.
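
As a rough illustration of provisioning with headroom, the sketch below projects storage usage with compound monthly growth and then adds a safety margin. The function name and all input figures are hypothetical; replace them with your own measurements.

```python
import math


def provisioned_size_gib(current_gib: float, monthly_growth: float,
                         months: int, headroom: float) -> int:
    """Project usage with compound growth, then add a safety margin."""
    projected = current_gib * (1 + monthly_growth) ** months
    return math.ceil(projected * (1 + headroom))


# Hypothetical inputs: 200 GiB used today, 5% monthly growth,
# a 12-month planning horizon, and 30% headroom on top.
size = provisioned_size_gib(200, 0.05, 12, 0.30)
print(f"Provision at least {size} GiB")
```

Revisit the inputs whenever monitoring shows that growth has diverged from the assumed rate.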

## Database architecture and high availability

***

Midaz uses **CQRS (Command Query Responsibility Segregation)** to cleanly separate reads from writes. This helps you scale efficiently and build fault-tolerant services.

### PostgreSQL

* Use a **dedicated primary for writes**, and **replicas for reads**.
* Enable **synchronous replication** for critical data.
* Configure **automatic failover** (e.g., Patroni, AWS RDS).
* Monitor **replication lag** and consistency.
* Prefer **managed services** like AWS RDS or GCP Cloud SQL for resilience and automation.
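
The read/write split and the replication-lag check above can be sketched as a small router. This is a minimal illustration, not Midaz's implementation: the `Replica` and `ReadWriteRouter` names, the connection strings, and the 5-second lag threshold are all assumptions, and the lag values are expected to come from your own monitoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    dsn: str            # connection string (hypothetical values below)
    lag_seconds: float  # last observed replication lag, fed by monitoring


class ReadWriteRouter:
    """Send writes to the primary and reads to replicas that are not lagging."""

    def __init__(self, primary_dsn: str, replicas: List[Replica],
                 max_lag_seconds: float = 5.0):
        self.primary_dsn = primary_dsn
        self.replicas = list(replicas)
        self.max_lag_seconds = max_lag_seconds
        self._next = 0  # round-robin cursor

    def for_write(self) -> str:
        return self.primary_dsn

    def for_read(self) -> str:
        healthy = [r for r in self.replicas
                   if r.lag_seconds <= self.max_lag_seconds]
        if not healthy:
            # Every replica is too far behind: fall back to the primary.
            return self.primary_dsn
        choice = healthy[self._next % len(healthy)]
        self._next += 1
        return choice.dsn


router = ReadWriteRouter(
    "postgres://primary.internal/midaz",
    [Replica("postgres://replica-a.internal/midaz", 0.2),
     Replica("postgres://replica-b.internal/midaz", 30.0)],
)
print(router.for_write())  # primary
print(router.for_read())   # replica-a, because replica-b exceeds the threshold
```

Falling back to the primary trades extra load for fresher reads; some teams prefer to fail the read instead, so treat the fallback as a design choice rather than a rule.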

### Redis / Valkey

* Deploy in **cluster mode** across multiple zones.
* Enable **automatic failover** (e.g., Redis Sentinel for non-clustered deployments, or the failover built into cluster mode).
* Use **managed services** like AWS ElastiCache or GCP Memorystore for simplicity and uptime.

### MongoDB

* Use **replica sets** with members across zones.
* Monitor **role transitions** and **replication lag**.
* Schedule **regular backups**.
* **Avoid reading from secondaries** unless you intentionally accept stale data. Secondaries do not accept writes.
* Use **managed services** like MongoDB Atlas or AWS DocumentDB for observability, scaling, and resilience.

## Messaging infrastructure

***

**RabbitMQ** is essential for **decoupling services** and enabling **eventual consistency** in Midaz’s CQRS architecture:

* Command services **publish events** after processing writes.
* RabbitMQ **routes events** to interested consumers through exchanges and queues.
* Consumers **update read models**, **trigger workflows**, or **integrate with external systems** based on the received events.
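
The publish, route, and consume flow above can be sketched with an in-memory stand-in for the broker. This is not a RabbitMQ client and not Midaz code: `InMemoryExchange`, the routing key, and the event fields are hypothetical. The consumer deduplicates by event ID on the assumption that the broker may deliver a message more than once.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class InMemoryExchange:
    """Toy stand-in for a broker exchange that routes by exact routing key."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def bind(self, routing_key: str, handler: Callable[[dict], None]) -> None:
        self._bindings[routing_key].append(handler)

    def publish(self, routing_key: str, event: dict) -> None:
        for handler in self._bindings[routing_key]:
            handler(event)


# Read model maintained by a consumer (amounts in minor units).
balances: Dict[str, int] = {}
processed_event_ids: Set[str] = set()


def apply_transaction_created(event: dict) -> None:
    """Update the read model once per event, even if it is redelivered."""
    if event["event_id"] in processed_event_ids:
        return
    processed_event_ids.add(event["event_id"])
    account = event["account"]
    balances[account] = balances.get(account, 0) + event["amount"]


exchange = InMemoryExchange()
exchange.bind("transaction.created", apply_transaction_created)

# The command side publishes after processing a write.
event = {"event_id": "evt-1", "account": "acc-42", "amount": 1500}
exchange.publish("transaction.created", event)
exchange.publish("transaction.created", event)  # simulated redelivery
print(balances)
```

Because the consumer ignores the redelivered event, the read model converges to the same state no matter how many times the message arrives.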

We recommend using a **managed RabbitMQ service** (such as Amazon MQ or CloudAMQP) in production to streamline operations and improve reliability.

## High availability strategies

***

### Service redundancy

* Deploy multiple replicas for every service.
* Use anti-affinity rules to spread services across zones.
* Apply Pod Disruption Budgets to reduce downtime during updates.
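
To make the last two points concrete, the sketch below builds a Pod Disruption Budget and a zone-level anti-affinity rule as plain dictionaries and prints one as JSON, a format `kubectl apply -f` also accepts. The `app` label key and the `ledger` service name are hypothetical; in practice these settings usually live in your Helm values or YAML manifests.

```python
import json


def pod_disruption_budget(app: str, min_available: int) -> dict:
    """Keep at least `min_available` pods running during voluntary disruptions."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb"},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app}},
        },
    }


def zone_anti_affinity(app: str) -> dict:
    """Prefer scheduling replicas of the same app in different zones."""
    return {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [{
                "weight": 100,
                "podAffinityTerm": {
                    "labelSelector": {"matchLabels": {"app": app}},
                    "topologyKey": "topology.kubernetes.io/zone",
                },
            }]
        }
    }


# "ledger" is a placeholder service name, not a confirmed Midaz component.
print(json.dumps(pod_disruption_budget("ledger", 2), indent=2))
```

The anti-affinity rule here is a soft preference, so pods can still be scheduled when a zone is unavailable. Use the `required...` variant only if you would rather leave pods pending than co-locate them.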

### Load balancing

* Use **ingress controllers** with health checks.
* Avoid **session affinity** unless required.
* Enable **connection draining** for smooth rollouts.

## Security considerations

***

### Network security

* Apply **Kubernetes network policies** to control traffic.
* Assign **minimal permissions** to each service account.
* Secure **external access with TLS**.
* Restrict **admin interfaces** with IP allowlists.

### Secret management

* Use **Kubernetes Secrets** for credentials and tokens.
* **Rotate secrets regularly**.
* **Never hardcode secrets** in containers or config files.
* Use **external secret managers** for a stronger security posture.
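
On the application side, avoiding hardcoded secrets usually means reading them from the environment, where a Kubernetes Secret or an external secret manager injects them, and failing fast when one is missing. A minimal sketch, with a hypothetical helper and variable name:

```python
import os
from typing import Mapping


def require_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return a secret from the environment or fail fast if it is absent."""
    value = env.get(name, "")
    if not value:
        # Report the variable name only; never log the secret value itself.
        raise RuntimeError(f"missing required secret: {name}")
    return value


# Example call at startup (DB_PASSWORD is a hypothetical variable name):
# db_password = require_secret("DB_PASSWORD")
```

Failing at startup surfaces a misconfigured deployment immediately, instead of at the first database call.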

## Monitoring and observability

***

### Metrics

* Monitor **key application and infrastructure KPIs**.
* Set **actionable alert thresholds**.
* Use **dashboards** for real-time visibility.

### Logging

* **Centralize logs** across services.
* Use **structured formats** for better filtering.
* Apply **log retention and rotation policies**.
* Define **log-based alerts** for critical events.
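
As an example of a structured format, the sketch below emits one JSON object per log line using only the standard library. The field names and the optional `trace_id` attribute are assumptions; align them with whatever schema your log pipeline expects.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object for easy filtering."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional correlation field, passed through `extra=`.
        trace_id = getattr(record, "trace_id", None)
        if trace_id is not None:
            payload["trace_id"] = trace_id
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


log = build_logger("transaction-service")  # hypothetical service name
log.info("transaction created", extra={"trace_id": "trace-123"})
```

Including a trace identifier in every line is what makes it possible to correlate logs with traces later.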

### Tracing

* Enable **distributed tracing** across services.
* **Sample traces** to balance visibility against performance and storage overhead.
* Correlate **traces with logs and metrics** for complete visibility.
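
One common way to sample is to decide deterministically from the trace ID, so every service that sees the same trace makes the same decision. Tracing SDKs typically ship their own ratio-based samplers; the function below is only a sketch of the idea, and its name and the 10% rate are assumptions.

```python
import hashlib


def should_sample(trace_id: str, rate: float) -> bool:
    """Keep roughly `rate` of all traces, consistently per trace ID."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(rate * 2**64)


# The same trace ID always yields the same decision.
print(should_sample("trace-123", 0.10))
```

Because the decision depends only on the trace ID, a sampled trace stays complete across services instead of losing spans at random.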

### Alerting

* Create **clear, reliable alerts**.
* Tune thresholds to **reduce noise**.
* **Route alerts** through the right channels.
* Maintain **runbooks** for recurring issues.

## Backup strategy

***

* Automate **regular backups** for critical systems.
* Store **backups in multiple locations or regions**.
* Test **restoration procedures** regularly.
* Keep **backup documentation** up to date and accessible.

## Idempotency

***

In production, always protect critical operations against duplicate processing:

* **Include idempotency keys on all transaction creation requests** using the `X-Idempotency` header.
* **Use explicit, deterministic keys** tied to your business process IDs (e.g., order IDs, payment references) rather than relying on auto-generated keys.
* **Validate the `X-Idempotency-Replayed` response header** to distinguish new transactions from cached replays.
* **Set appropriate TTL values** that match your retry window: 60–120 seconds for synchronous flows and 300–600 seconds for asynchronous ones.
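
The practices above can be sketched as a few client-side helpers. Only the `X-Idempotency` and `X-Idempotency-Replayed` header names come from this guide. The helper names, the key format, and the assumption that a replay is signalled with the value `true` are illustrative, and how the TTL is sent to the API is not shown here.

```python
from typing import Mapping


def idempotency_key(operation: str, business_id: str) -> str:
    """Build a deterministic key from a business process ID."""
    if not operation or not business_id:
        raise ValueError("operation and business_id are required")
    return f"{operation}:{business_id}"


def request_headers(operation: str, business_id: str) -> dict:
    return {
        "Content-Type": "application/json",
        "X-Idempotency": idempotency_key(operation, business_id),
    }


def is_replay(response_headers: Mapping[str, str]) -> bool:
    """Assumes the replay header carries the literal value 'true'."""
    normalized = {k.lower(): v for k, v in response_headers.items()}
    return normalized.get("x-idempotency-replayed", "").strip().lower() == "true"


def ttl_seconds(flow: str) -> int:
    """Pick the upper bound of the retry windows recommended above."""
    windows = {"sync": 120, "async": 600}
    return windows[flow]


headers = request_headers("payment", "order-1001")  # hypothetical order ID
print(headers["X-Idempotency"], ttl_seconds("sync"))
```

Retrying with the same order ID reuses the same key, so a retry after a timeout is recognized as a replay rather than creating a second transaction.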

<Note>
  All Lerian products support idempotency through their own header conventions. For implementation details, retry strategies, and a comparison across products, see [Retries and idempotency](/en/reference/retries-idempotency).
</Note>

## Final notes

***

Midaz is production-ready by design. When you align your infrastructure to its architecture, you gain:

* Clean **read/write separation** with CQRS.
* Plug-and-play compatibility with **managed cloud services**.
* Built-in support for **observability, failover, and secure operations**.

Review your setup regularly, apply these best practices proactively, and you'll have a robust foundation ready to grow with your business.

## What’s next?

***

Want help scaling, migrating, or hardening your production environment?

* Read the [Midaz deployment guide](/en/midaz/deployment).
* [Contact our team](https://lerian.studio/contact) for tailored support.
