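Question
What is Saga?
Answer
1. Core concepts
Saga is a pattern for long-lived transactions, first proposed by Hector Garcia-Molina and Kenneth Salem in 1987. A Saga splits a distributed transaction into a sequence of local transactions, and each local transaction has a matching compensating transaction that semantically undoes its effect.
Core ideas:
- Split one long transaction into several short transactions
- Each sub-transaction commits locally
- If a sub-transaction fails, compensating transactions undo the sub-transactions that have already committed
Comparison with TCC:
- TCC: Try reserves resources → Confirm/Cancel
- Saga: execute directly → run compensation on failure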
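To make the contrast with TCC concrete, here is a minimal in-memory sketch rather than a production design. The class names (`TccInventory`, `SagaInventory`) and the single-counter stock model are assumptions made for illustration only. The point it tries to show: TCC holds a reservation between Try and Confirm/Cancel, while a Saga step commits at once and relies on a later compensation.

```java
class TccVsSagaDemo {
    public static void main(String[] args) {
        TccInventory tcc = new TccInventory(10);
        tcc.tryReserve(3);
        System.out.println("TCC after Try: available=" + tcc.available() + ", reserved=" + tcc.reserved());
        tcc.cancel(3);
        System.out.println("TCC after Cancel: available=" + tcc.available());

        SagaInventory saga = new SagaInventory(10);
        saga.deduct(3);
        System.out.println("Saga after deduct: available=" + saga.available());
        saga.restore(3);
        System.out.println("Saga after compensation: available=" + saga.available());
    }
}

// Hypothetical TCC participant: stock is reserved first and consumed later.
class TccInventory {
    private int available;
    private int reserved;

    TccInventory(int available) { this.available = available; }

    // Try: move stock from "available" to "reserved" without consuming it.
    boolean tryReserve(int qty) {
        if (qty > available) return false;
        available -= qty;
        reserved += qty;
        return true;
    }

    // Confirm: consume the reservation.
    void confirm(int qty) { reserved -= qty; }

    // Cancel: release the reservation back to available stock.
    void cancel(int qty) { reserved -= qty; available += qty; }

    int available() { return available; }
    int reserved() { return reserved; }
}

// Hypothetical Saga participant: the forward step commits immediately.
class SagaInventory {
    private int available;

    SagaInventory(int available) { this.available = available; }

    // Forward step: the deduction is final and visible to other readers right away.
    boolean deduct(int qty) {
        if (qty > available) return false;
        available -= qty;
        return true;
    }

    // Compensation: a separate operation that adds the stock back.
    void restore(int qty) { available += qty; }

    int available() { return available; }
}
```

Between `deduct` and `restore` other readers already see the lower stock, which is the weak isolation that the comparison table in section 5 refers to.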
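2. Saga execution flow
Success scenario
Order service     Inventory service   Account service     Shipping service
     |                   |                   |                   |
     |--T1: create order--> ✓
     |                   |
     |                   |--T2: deduct stock--> ✓
     |                   |                   |
     |                   |                   |--T3: debit balance--> ✓
     |                   |                   |                   |
     |                   |                   |                   |--T4: create shipment--> ✓
     |                   |                   |                   |
     └───────────────────┴───────────────────┴───────────────────┘
                   All transactions committed successfully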
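Failure scenario (compensation)
Order service     Inventory service   Account service     Shipping service
     |                   |                   |                   |
     |--T1: create order--> ✓
     |                   |
     |                   |--T2: deduct stock--> ✓
     |                   |                   |
     |                   |                   |--T3: debit balance--> ✗ (fails)
     |                   |                   |
     |                   |<--C2: restore stock--|
     |                   |                   |
     |<--C1: cancel order--|                 |
     |                   |                   |
     └───────────────────┴───────────────────┘
     Compensating transactions run and undo the completed steps; T4 is never started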
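Forward flow (normal execution):
T1 → T2 → T3 → T4
Compensation flow (backward recovery): compensations run in reverse order, and only for the steps that have already committed. For example, if T3 fails:
C2 → C1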
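The two flows can be traced with a small, self-contained sketch. `MiniSaga` and its `run` method are hypothetical names introduced here; the steps are simulated with strings instead of real service calls, so this only illustrates the ordering rule, not a real executor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SagaFlowDemo {
    public static void main(String[] args) {
        // Fail at T3 to reproduce the failure scenario from the diagram.
        System.out.println(MiniSaga.run(4, 3)); // prints [T1, T2, T3(failed), C2, C1]
    }
}

// Hypothetical helper: runs steps T1..Tn and records what happened.
class MiniSaga {
    /**
     * Runs steps 1..stepCount in order. failAt is the 1-based index of the
     * step that throws, or 0 if no step fails. Returns the trace of operations.
     */
    static List<String> run(int stepCount, int failAt) {
        List<String> trace = new ArrayList<>();
        Deque<Integer> completed = new ArrayDeque<>(); // stack: last committed step on top
        try {
            for (int i = 1; i <= stepCount; i++) {
                if (i == failAt) {
                    throw new IllegalStateException("T" + i + " failed");
                }
                trace.add("T" + i);
                completed.push(i);
            }
        } catch (IllegalStateException e) {
            trace.add("T" + failAt + "(failed)");
            // Backward recovery: compensate committed steps, newest first.
            while (!completed.isEmpty()) {
                trace.add("C" + completed.pop());
            }
        }
        return trace;
    }
}
```

The failed step itself is not compensated here because it is assumed to have rolled back its own local transaction.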
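3. Two ways to implement Saga
Approach 1: Choreography-based Saga
How it works: each service listens for events and decides its next action on its own; there is no central coordinator.
Architecture:
Order service --publishes [OrderCreated]--> Event bus
      ↑                                        |
      |                                        ↓
      |                                Inventory service
      |                               (listens for event)
      |                                        |
      |                                        ↓
      |                            publishes [StockDeducted]
      |                                        |
      |                                        ↓
      |                                 Account service
      |                               (listens for event)
      |                                        |
      |                                        ↓
      +------publishes [PaymentFailed]------ (insufficient balance)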
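Example implementation:
// Order service
// Note: @EventListener handles in-process Spring events. Across separate services
// the listeners would be message-broker consumers; the flow stays the same.
@Slf4j
@Service
public class OrderService {
    @Autowired
    private OrderMapper orderMapper;
    @Autowired
    private EventPublisher eventPublisher;
    /**
     * Create the order (T1).
     */
    @Transactional
    public void createOrder(OrderDTO orderDTO) {
        // 1. Local transaction: create the order
        //    (copying the remaining fields from orderDTO is omitted)
        Order order = new Order();
        order.setStatus(OrderStatus.PENDING);
        orderMapper.insert(order);
        // 2. Publish the order-created event.
        //    Caveat: the database commit and the publish are not atomic. If one succeeds
        //    and the other fails, the order and the event diverge, so production code
        //    usually publishes after commit or through an outbox table.
        OrderCreatedEvent event = new OrderCreatedEvent(order);
        eventPublisher.publish(event);
    }
    /**
     * Listen for the payment-failed event and run the compensation (C1).
     */
    @EventListener
    @Transactional
    public void onPaymentFailed(PaymentFailedEvent event) {
        // Compensation: cancel the order
        Order order = orderMapper.selectById(event.getOrderId());
        order.setStatus(OrderStatus.CANCELED);
        orderMapper.updateById(order);
        log.info("Order canceled due to payment failure: {}", event.getOrderId());
    }
}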
// Inventory service
@Slf4j
@Service
public class InventoryService {
    @Autowired
    private InventoryMapper inventoryMapper;
    @Autowired
    private EventPublisher eventPublisher;
    /**
     * Listen for the order-created event (T2).
     */
    @EventListener
    @Transactional
    public void onOrderCreated(OrderCreatedEvent event) {
        try {
            // Local transaction: deduct stock
            inventoryMapper.deductStock(
                event.getProductId(),
                event.getQuantity()
            );
            // Publish the stock-deducted event
            StockDeductedEvent successEvent = new StockDeductedEvent(event);
            eventPublisher.publish(successEvent);
        } catch (InsufficientStockException e) {
            // Not enough stock: publish a failure event.
            // The order service must also handle this event and cancel the order
            // (that listener is not shown in this example).
            StockDeductFailedEvent failEvent = new StockDeductFailedEvent(event);
            eventPublisher.publish(failEvent);
        }
    }
    /**
     * Listen for the payment-failed event and run the compensation (C2).
     */
    @EventListener
    @Transactional
    public void onPaymentFailed(PaymentFailedEvent event) {
        // Compensation: restore stock
        inventoryMapper.restoreStock(
            event.getProductId(),
            event.getQuantity()
        );
        log.info("Stock restored for order: {}", event.getOrderId());
    }
}
// Account service
@Service
public class AccountService {
    @Autowired
    private AccountMapper accountMapper;
    @Autowired
    private EventPublisher eventPublisher;
    /**
     * Listen for the stock-deducted event (T3).
     */
    @EventListener
    @Transactional
    public void onStockDeducted(StockDeductedEvent event) {
        try {
            // Local transaction: debit the balance
            accountMapper.deductBalance(
                event.getAccountId(),
                event.getAmount()
            );
            // Publish the payment-succeeded event
            PaymentSuccessEvent successEvent = new PaymentSuccessEvent(event);
            eventPublisher.publish(successEvent);
        } catch (InsufficientBalanceException e) {
            // Insufficient balance: publish a failure event, which triggers compensation
            PaymentFailedEvent failEvent = new PaymentFailedEvent(event);
            eventPublisher.publish(failEvent);
        }
    }
}
Event definitions:
// Constructors that copy fields from an Order or from an upstream event
// (used in the services above) are omitted here.
// Order-created event
@Data
public class OrderCreatedEvent {
    private String orderId;
    private String productId;
    private Integer quantity;
    private BigDecimal amount;
    private String accountId;
    private LocalDateTime timestamp;
}
// Payment-failed event
@Data
public class PaymentFailedEvent {
    private String orderId;
    private String productId;
    private Integer quantity;
    private String reason;
}
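Advantages:
- ✅ Services are highly decoupled
- ✅ No central coordinator that could become a single point of failure
- ✅ New services are easy to add
Disadvantages:
- ❌ The flow is scattered across services, which makes it hard to understand and maintain
- ❌ Event chains get complex and are hard to debug
- ❌ Risk of cyclic dependencies between services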
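The event chain can be tried out without Spring or a broker. The sketch below is an illustration under stated assumptions: `SimpleEventBus` is a hypothetical synchronous in-process bus, events carry only an order id, and each "service" is reduced to a lambda that records the action it would take.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class ChoreographyDemo {
    public static void main(String[] args) {
        // Enough balance: prints [createOrder, deductStock, deductBalance]
        System.out.println(ChoreographySketch.placeOrder(100, 50));
        // Insufficient balance: prints [createOrder, deductStock, restoreStock, cancelOrder]
        System.out.println(ChoreographySketch.placeOrder(10, 50));
    }
}

// Hypothetical synchronous bus; a real system would use a message broker.
class SimpleEventBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();

    void subscribe(String eventType, Consumer<String> handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        for (Consumer<String> handler : handlers.getOrDefault(eventType, List.of())) {
            handler.accept(orderId);
        }
    }
}

class ChoreographySketch {
    /** Wires three simulated services to the bus and returns the actions taken, in order. */
    static List<String> placeOrder(int balance, int price) {
        SimpleEventBus bus = new SimpleEventBus();
        List<String> actions = new ArrayList<>();

        // Inventory service: reacts to OrderCreated, compensates on PaymentFailed.
        bus.subscribe("OrderCreated", id -> {
            actions.add("deductStock");
            bus.publish("StockDeducted", id);
        });
        bus.subscribe("PaymentFailed", id -> actions.add("restoreStock"));

        // Account service: reacts to StockDeducted.
        bus.subscribe("StockDeducted", id -> {
            if (balance >= price) {
                actions.add("deductBalance");
                bus.publish("PaymentSucceeded", id);
            } else {
                bus.publish("PaymentFailed", id);
            }
        });

        // Order service: compensates on PaymentFailed.
        bus.subscribe("PaymentFailed", id -> actions.add("cancelOrder"));

        // Order service: T1, then the event that starts the chain.
        actions.add("createOrder");
        bus.publish("OrderCreated", "order-1");
        return actions;
    }
}
```

No component in this sketch knows the whole flow; the order of steps only emerges from the subscriptions, which is both the decoupling benefit and the debugging cost listed above.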
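Approach 2: Orchestration-based Saga
How it works: a central coordinator (orchestrator) explicitly drives the transaction flow.
Architecture:
                      Saga orchestrator
                             |
            +----------------+----------------+
            |                |                |
            ↓                ↓                ↓
      Order service   Inventory service  Account service
            |                |                |
      create order     deduct stock     debit balance
            ↓                ↓                ↓
     success/failure  success/failure  success/failure
            ↓                ↓                ↓
      cancel order     restore stock    refund balance
     (compensation)   (compensation)   (compensation)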
Example implementation:
/**
 * Saga orchestrator
 */
@Slf4j
@Service
public class OrderSagaOrchestrator {
    @Autowired
    private OrderService orderService;
    @Autowired
    private InventoryService inventoryService;
    @Autowired
    private AccountService accountService;
    @Autowired
    private ShippingService shippingService;
    @Autowired
    private SagaLogMapper sagaLogMapper;
    /**
     * Run the order Saga.
     */
    public void executeOrderSaga(OrderDTO orderDTO) {
        String sagaId = UUID.randomUUID().toString();
        List<SagaStep> completedSteps = new ArrayList<>();
        // Record the Saga before the first step so that later status updates have a row
        // to update (insertSaga is a hypothetical mapper method).
        sagaLogMapper.insertSaga(sagaId, "ORDER", SagaStatus.STARTED);
        try {
            // Step 1: create the order
            SagaStep step1 = new SagaStep("createOrder",
                () -> orderService.createOrder(orderDTO),
                () -> orderService.cancelOrder(orderDTO.getOrderId())
            );
            executeStep(sagaId, step1);
            completedSteps.add(step1);
            // Step 2: deduct stock
            SagaStep step2 = new SagaStep("deductStock",
                () -> inventoryService.deductStock(
                    orderDTO.getProductId(),
                    orderDTO.getQuantity()
                ),
                () -> inventoryService.restoreStock(
                    orderDTO.getProductId(),
                    orderDTO.getQuantity()
                )
            );
            executeStep(sagaId, step2);
            completedSteps.add(step2);
            // Step 3: debit the balance
            SagaStep step3 = new SagaStep("deductBalance",
                () -> accountService.deductBalance(
                    orderDTO.getAccountId(),
                    orderDTO.getAmount()
                ),
                () -> accountService.refundBalance(
                    orderDTO.getAccountId(),
                    orderDTO.getAmount()
                )
            );
            executeStep(sagaId, step3);
            completedSteps.add(step3);
            // Step 4: create the shipment
            SagaStep step4 = new SagaStep("createShipping",
                () -> shippingService.createShipping(orderDTO),
                () -> shippingService.cancelShipping(orderDTO.getOrderId())
            );
            executeStep(sagaId, step4);
            completedSteps.add(step4);
            // All steps succeeded: mark the Saga as completed
            sagaLogMapper.updateStatus(sagaId, SagaStatus.COMPLETED);
        } catch (Exception e) {
            log.error("Saga execution failed, sagaId: {}", sagaId, e);
            // Compensate the completed steps in reverse order
            compensate(sagaId, completedSteps);
            throw new BusinessException("Order creation failed", e);
        }
    }
    /**
     * Run a single step.
     */
    private void executeStep(String sagaId, SagaStep step) throws Exception {
        try {
            // Record that the step has started
            sagaLogMapper.insertStep(sagaId, step.getName(), StepStatus.STARTED);
            // Run the forward action
            step.getAction().execute();
            // Record that the step succeeded
            sagaLogMapper.updateStepStatus(sagaId, step.getName(), StepStatus.COMPLETED);
        } catch (Exception e) {
            // Record that the step failed, then let the caller start compensation
            sagaLogMapper.updateStepStatus(sagaId, step.getName(), StepStatus.FAILED);
            throw e;
        }
    }
    /**
     * Run the compensations in reverse order.
     */
    private void compensate(String sagaId, List<SagaStep> completedSteps) {
        boolean allCompensated = true;
        // Walk the completed steps from last to first
        for (int i = completedSteps.size() - 1; i >= 0; i--) {
            SagaStep step = completedSteps.get(i);
            try {
                log.info("Compensating step: {}", step.getName());
                // Run the compensation
                step.getCompensation().execute();
                // Record that the compensation succeeded
                sagaLogMapper.updateStepStatus(
                    sagaId,
                    step.getName(),
                    StepStatus.COMPENSATED
                );
            } catch (Exception e) {
                // Compensation failed: record it for manual intervention and keep going
                allCompensated = false;
                log.error("Compensation failed for step: {}", step.getName(), e);
                sagaLogMapper.updateStepStatus(
                    sagaId,
                    step.getName(),
                    StepStatus.COMPENSATION_FAILED
                );
            }
        }
        // Only report COMPENSATED when every compensation actually succeeded
        sagaLogMapper.updateStatus(sagaId,
            allCompensated ? SagaStatus.COMPENSATED : SagaStatus.FAILED);
    }
}
/**
 * Definition of one Saga step
 */
@Data
@AllArgsConstructor
class SagaStep {
    private String name;              // step name
    private SagaAction action;        // forward action
    private SagaAction compensation;  // compensating action
}
@FunctionalInterface
interface SagaAction {
    void execute() throws Exception;
}
Saga log tables:
-- Saga main table
CREATE TABLE saga_log (
    saga_id VARCHAR(64) PRIMARY KEY,
    saga_type VARCHAR(64) NOT NULL,
    status VARCHAR(20) NOT NULL, -- STARTED, COMPLETED, COMPENSATED, FAILED
    create_time DATETIME NOT NULL,
    update_time DATETIME NOT NULL
);
-- Saga step table
CREATE TABLE saga_step_log (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    saga_id VARCHAR(64) NOT NULL,
    step_name VARCHAR(64) NOT NULL,
    status VARCHAR(20) NOT NULL, -- STARTED, COMPLETED, FAILED, COMPENSATED, COMPENSATION_FAILED
    create_time DATETIME NOT NULL,
    update_time DATETIME NOT NULL,
    INDEX idx_saga_id (saga_id)
);
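Advantages:
- ✅ The flow is explicit and easy to understand and maintain
- ✅ Centralized control makes monitoring and debugging easier
- ✅ Complex business logic is easier to implement
Disadvantages:
- ❌ The orchestrator can become a single point of failure
- ❌ The orchestrator logic can grow complex
- ❌ Services are somewhat more tightly coupled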
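One way to soften the single-point concern is to let a restarted orchestrator resume from the step log. The following is a rough sketch, assuming the step statuses listed for `saga_step_log` above; `StepRecord` and `SagaRecovery` are hypothetical names, and treating a `STARTED` step as "needs compensation" is an assumption that only holds when compensations are idempotent and tolerate a step that never took effect.

```java
import java.util.ArrayList;
import java.util.List;

class SagaRecoveryDemo {
    public static void main(String[] args) {
        // State left behind by an orchestrator that crashed during step 3.
        List<StepRecord> stepLog = List.of(
            new StepRecord("createOrder", "COMPLETED"),
            new StepRecord("deductStock", "COMPLETED"),
            new StepRecord("deductBalance", "STARTED"));
        // prints [deductBalance, deductStock, createOrder]
        System.out.println(SagaRecovery.stepsToCompensate(stepLog));
    }
}

// Hypothetical in-memory view of one saga_step_log row.
class StepRecord {
    final String stepName;
    final String status;

    StepRecord(String stepName, String status) {
        this.stepName = stepName;
        this.status = status;
    }
}

class SagaRecovery {
    /**
     * Given the step rows of an unfinished Saga in execution order, returns the
     * names of the steps whose compensation should run, newest first.
     */
    static List<String> stepsToCompensate(List<StepRecord> stepsInOrder) {
        List<String> result = new ArrayList<>();
        for (int i = stepsInOrder.size() - 1; i >= 0; i--) {
            StepRecord step = stepsInOrder.get(i);
            switch (step.status) {
                case "COMPLETED":           // committed, must be undone
                case "COMPENSATION_FAILED": // earlier attempt failed, try again
                case "STARTED":             // outcome unknown (assumption: compensate anyway)
                    result.add(step.stepName);
                    break;
                default:
                    // FAILED or COMPENSATED: nothing left to undo
                    break;
            }
        }
        return result;
    }
}
```

A scheduled job could load the rows of every Saga still marked `STARTED` in `saga_log` and feed them to a function like this; that job is not shown here.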
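4. Compensation design principles
Principle 1: Compensation must be idempotent
/**
 * Restore stock (compensation).
 */
@Transactional
public void restoreStock(String productId, int quantity, String sagaId) {
    // Idempotency check: has this compensation already run?
    // (The variable is named "record" so it does not shadow the logger "log".)
    CompensationLog record = logMapper.selectBySagaId(sagaId, "restoreStock");
    if (record != null && record.getStatus() == CompensationStatus.COMPLETED) {
        log.info("Stock already restored for sagaId: {}", sagaId);
        return;
    }
    // Run the compensation
    inventoryMapper.restoreStock(productId, quantity);
    // Record the compensation in the same local transaction.
    // Two concurrent duplicates can both pass the check above; a unique key on
    // (saga_id, action) makes the second insert fail and roll its transaction back.
    logMapper.insert(sagaId, "restoreStock", CompensationStatus.COMPLETED);
}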
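The check-then-insert pattern can be reduced to a single atomic "claim the key" operation. Below is a minimal in-memory sketch of that idea; `IdempotentInventory` is a hypothetical class, and the `ConcurrentHashMap`-backed set merely stands in for a compensation log table with a unique key on (sagaId, action). In a database the key insert and the stock update would share one local transaction, which this sketch does not model.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentCompensationDemo {
    public static void main(String[] args) {
        IdempotentInventory inventory = new IdempotentInventory();
        inventory.setStock("p1", 7);
        inventory.restoreStock("p1", 3, "saga-1");
        inventory.restoreStock("p1", 3, "saga-1"); // duplicate delivery, ignored
        System.out.println(inventory.getStock("p1")); // prints 10
    }
}

class IdempotentInventory {
    private final Map<String, Integer> stock = new ConcurrentHashMap<>();
    // Stands in for a compensation log with a unique key on (sagaId, action).
    private final Set<String> applied = ConcurrentHashMap.newKeySet();

    void setStock(String productId, int quantity) {
        stock.put(productId, quantity);
    }

    int getStock(String productId) {
        return stock.getOrDefault(productId, 0);
    }

    /** Returns true if the compensation was applied, false if it was a duplicate. */
    boolean restoreStock(String productId, int quantity, String sagaId) {
        // Set.add is atomic here: only the first caller for this key gets true.
        if (!applied.add(sagaId + ":restoreStock")) {
            return false;
        }
        stock.merge(productId, quantity, Integer::sum);
        return true;
    }
}
```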
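Principle 2: Compensation should be retryable
/**
 * Refund (compensation, with retry support).
 * @Retryable comes from Spring Retry and requires @EnableRetry on a configuration class.
 */
@Transactional
@Retryable(maxAttempts = 3, backoff = @Backoff(delay = 1000))
public void refundBalance(String accountId, BigDecimal amount, String sagaId) {
    // Idempotency check, so that a retry never refunds twice
    if (isAlreadyRefunded(sagaId)) {
        return;
    }
    // Perform the refund
    accountMapper.addBalance(accountId, amount);
    // Record the refund
    refundMapper.insert(sagaId, accountId, amount);
}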
Principle 3: Plan for compensation failure
private void compensate(String sagaId, List<SagaStep> completedSteps) {
    for (int i = completedSteps.size() - 1; i >= 0; i--) {
        SagaStep step = completedSteps.get(i);
        // Retry at most 3 times
        int maxRetry = 3;
        boolean compensated = false;
        for (int retry = 0; retry < maxRetry; retry++) {
            try {
                step.getCompensation().execute();
                compensated = true;
                break;
            } catch (Exception e) {
                log.error("Compensation failed, retry {}/{}", retry + 1, maxRetry, e);
                try {
                    Thread.sleep(1000L * (retry + 1)); // linearly increasing delay
                } catch (InterruptedException ie) {
                    // Preserve the interrupt status and stop retrying this step
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        if (!compensated) {
            // Compensation failed: record it in the failure table and raise an alert
            alertService.sendCompensationFailedAlert(sagaId, step.getName());
            compensationFailureMapper.insert(sagaId, step.getName());
        }
    }
}
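Principle 4: Not every operation can be compensated
Operations that cannot be compensated:
// Example 1: sending an SMS or email (cannot be recalled)
public void sendNotification(Order order) {
    smsService.send(order.getUserPhone(), "Your order has been created");
    // No compensation possible: once the SMS is sent it cannot be taken back
}
// Example 2: third-party payment (must go through a refund flow)
public void pay(Order order) {
    alipayService.pay(order);
    // The compensation is not a simple rollback; it has to initiate a refund
}
// How to handle this: put the operations that cannot be compensated last
public void executeOrderSaga(Order order) {
    // Compensable operations
    createOrder(order);
    deductStock(order);
    deductBalance(order);
    // Operations that cannot be compensated go last
    sendNotification(order);
}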
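The ordering rule can also be checked mechanically when a Saga is defined. The sketch below is only an illustration: `PlannedStep` and `StepOrder` are hypothetical names, and the rule is simplified to "no compensable step may come after a non-compensable one".

```java
import java.util.List;

class StepOrderDemo {
    public static void main(String[] args) {
        List<PlannedStep> plan = List.of(
            new PlannedStep("createOrder", true),
            new PlannedStep("sendNotification", false),
            new PlannedStep("deductStock", true));
        // deductStock runs after the SMS has gone out, so its failure could not be fully undone.
        System.out.println(StepOrder.firstMisplacedStep(plan)); // prints deductStock
    }
}

// Hypothetical description of a step: its name and whether it has a compensation.
class PlannedStep {
    final String name;
    final boolean compensable;

    PlannedStep(String name, boolean compensable) {
        this.name = name;
        this.compensable = compensable;
    }
}

class StepOrder {
    /**
     * Returns the name of the first compensable step that follows a
     * non-compensable one, or null if the order is valid.
     */
    static String firstMisplacedStep(List<PlannedStep> steps) {
        boolean seenNonCompensable = false;
        for (PlannedStep step : steps) {
            if (!step.compensable) {
                seenNonCompensable = true;
            } else if (seenNonCompensable) {
                return step.name;
            }
        }
        return null;
    }
}
```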
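5. Saga vs TCC
| Dimension | Saga | TCC |
|---|---|---|
| Resource locking | None | Resources are reserved in the Try phase |
| Isolation | Weak (not guaranteed) | Stronger (resources already reserved) |
| Implementation complexity | Medium | High |
| Performance | High (no resource locking) | Medium |
| Compensation design | Compensate after the fact | Reserve resources up front |
| Typical use | Long flows that span many services | Short flows with strict consistency requirements |
| Typical example | Travel booking (flight + hotel) | E-commerce order placement |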
6. A practical example
Example: travel booking Saga
Business flow:
- Book the flight
- Book the hotel
- Book the attraction tickets
- Charge the account
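@Slf4j
@Service
public class TravelOrderSaga {
    // Injected services (flightService, hotelService, ticketService, accountService,
    // travelOrderService) are omitted. SagaAction is the functional interface defined
    // in the orchestration example; Deque and ArrayDeque come from java.util.
    public void bookTravel(TravelOrderDTO dto) {
        String sagaId = UUID.randomUUID().toString();
        // Compensations for the steps completed so far, most recent on top
        Deque<SagaAction> compensations = new ArrayDeque<>();
        try {
            // Step 1: book the flight
            Flight flight = flightService.bookFlight(dto.getFlightInfo());
            compensations.push(() -> flightService.cancelFlight(dto.getFlightInfo()));
            // Step 2: book the hotel
            Hotel hotel = hotelService.bookHotel(dto.getHotelInfo());
            compensations.push(() -> hotelService.cancelHotel(dto.getHotelInfo()));
            // Step 3: book the attraction tickets
            Ticket ticket = ticketService.bookTicket(dto.getTicketInfo());
            compensations.push(() -> ticketService.cancelTicket(dto.getTicketInfo()));
            // Step 4: charge the account
            accountService.deduct(dto.getAccountId(), dto.getTotalAmount());
            compensations.push(() -> accountService.refund(dto.getAccountId(), dto.getTotalAmount()));
            // All steps succeeded
            travelOrderService.confirm(dto.getOrderId());
        } catch (Exception e) {
            // Compensate in reverse order, and only the steps that actually completed.
            // Refunding an account that was never charged would create money.
            while (!compensations.isEmpty()) {
                try {
                    compensations.pop().execute();
                } catch (Exception ex) {
                    // Do not swallow the failure: log it so it can be retried or handled manually
                    log.error("Compensation failed, sagaId: {}", sagaId, ex);
                }
            }
            throw new BusinessException("Travel booking failed", e);
        }
    }
}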
7. Saga framework: Apache Camel Saga
// Note: the Saga EIP needs a saga service (for example InMemorySagaService or the LRA
// saga service) registered in the Camel context. The Camel documentation usually puts
// each step in its own route; the chained form below is kept from the original as a sketch.
@Component
public class OrderSagaRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        // Define the Saga
        from("direct:createOrder")
            .saga()
            .compensation("direct:cancelOrder")
            .to("bean:orderService?method=createOrder")
            .saga()
            .compensation("direct:restoreStock")
            .to("bean:inventoryService?method=deductStock")
            .saga()
            .compensation("direct:refundBalance")
            .to("bean:accountService?method=deductBalance");
        // Compensation routes
        from("direct:cancelOrder")
            .to("bean:orderService?method=cancelOrder");
        from("direct:restoreStock")
            .to("bean:inventoryService?method=restoreStock");
        from("direct:refundBalance")
            .to("bean:accountService?method=refundBalance");
    }
}
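8. Summary
Key points about Saga:
- Split a long transaction into several short local transactions
- Undo completed operations through compensating transactions
- Two implementation styles: choreography (event-driven) and orchestration (central coordinator)
Key points about compensation design:
- Compensation must be idempotent
- Compensation should be retryable
- Plan for compensation failure
- Put operations that cannot be compensated last
Where Saga fits:
- Long business flows (such as travel booking)
- Business flows that span multiple services
- Scenarios that can accept eventual rather than strict consistency
Interview checklist:
- Describe the principle and flow of Saga clearly
- Understand the difference between choreography and orchestration
- Know how to design compensating operations
- Be able to compare Saga with TCC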