Nezha's authenticated agents can forge service-monitor results for other users' services
Description
Nezha's service-monitor worker accepts forged TaskResult messages from authenticated agents, allowing cross-tenant corruption of monitoring data.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Nezha's service-monitor worker accepts forged TaskResult messages from authenticated agents, allowing cross-tenant corruption of monitoring data.
Vulnerability
Nezha's service-monitor result worker in service/singleton/servicesentinel.go:475-483 accepts TaskResult messages from an authenticated agent based solely on whether the reported service ID exists. It does not verify that the reporter server (derived from the gRPC stream) is authorized for that service, belongs to the service owner, or was actually assigned the monitoring task. This affects versions prior to the fix commit [1].
Exploitation
An attacker with a valid agent secret and one registered agent can submit forged TaskResult messages. The agent authenticates via client_secret and client_uuid in service/rpc/auth.go:23-60, and the stream is bound to clientID in service/rpc/nezha.go:40-48. The attacker controls TaskResult.id, type, delay, data, and successful fields (defined in proto/nezha.proto:60-65). By setting id to a victim's service ID, the result is dispatched to ServiceSentinelShared.Dispatch with Reporter: clientID without further authorization checks [2].
Impact
Successful exploitation allows cross-tenant corruption of service-monitor history and current state. The attacker can influence victim-owned service notifications with attacker-controlled result text, potentially causing false alerts or masking real issues. No code execution or data exfiltration is achieved, but the integrity of monitoring data is compromised.
Mitigation
The vulnerability is fixed in commit 02129f16fb1572ef57c7e8dd7d03f84d39b8b586 [1]. Users should update to a version containing this fix. No workaround is documented. The CVE is not listed on CISA's Known Exploited Vulnerabilities (KEV) catalog.
AI Insight generated on Jun 1, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected products
1Patches
102129f16fb15fix(rpc): authorize agent task results
4 files changed · +1030 −7
service/rpc/request_task_security_test.go+335 −0 added@@ -0,0 +1,335 @@ +package rpc + +import ( + "context" + "errors" + "testing" + "time" + + "google.golang.org/grpc/metadata" + "gorm.io/driver/sqlite" + "gorm.io/gorm" + + "github.com/nezhahq/nezha/model" + pb "github.com/nezhahq/nezha/proto" + "github.com/nezhahq/nezha/service/singleton" +) + +type requestTaskSecurityStream struct { + ctx context.Context + results []*pb.TaskResult + onSend func(*pb.Task) + sendErr error +} + +func (s *requestTaskSecurityStream) Send(task *pb.Task) error { + if s.onSend != nil { + s.onSend(task) + } + return s.sendErr +} + +func (s *requestTaskSecurityStream) Recv() (*pb.TaskResult, error) { + if len(s.results) == 0 { + return nil, context.Canceled + } + result := s.results[0] + s.results = s.results[1:] + return result, nil +} + +func (s *requestTaskSecurityStream) SetHeader(metadata.MD) error { return nil } +func (s *requestTaskSecurityStream) SendHeader(metadata.MD) error { return nil } +func (s *requestTaskSecurityStream) SetTrailer(metadata.MD) {} +func (s *requestTaskSecurityStream) Context() context.Context { return s.ctx } +func (s *requestTaskSecurityStream) SendMsg(any) error { return nil } +func (s *requestTaskSecurityStream) RecvMsg(any) error { return context.Canceled } + +func TestRequestTaskSkipsCronResultOwnedByAnotherUser(t *testing.T) { + reporter := requestTaskSecurityServer(7, 200, "11111111-1111-1111-1111-111111111111") + victimCron := requestTaskSecurityCron(42, 100, model.CronCoverAll, nil) + setupRequestTaskSecurityFixture(t, []*model.Server{reporter}, []*model.Cron{victimCron}, map[uint64]model.UserInfo{ + 100: {Role: model.RoleMember}, + 200: {Role: model.RoleMember}, + }, map[string]uint64{"reporter-secret": 200}) + + runRequestTaskSecurityResult(t, "reporter-secret", reporter.UUID, cronTaskResult(victimCron.ID, true)) + + if cronLastResult(t, victimCron.ID) { + t.Fatal("foreign cron result must not update victim cron status") + } +} + +func TestRequestTaskSkipsCronResultOutsideReporterCover(t *testing.T) { + reporter := requestTaskSecurityServer(7, 200, "22222222-2222-2222-2222-222222222222") + coveredServerID := uint64(8) + cronTask := requestTaskSecurityCron(42, 200, model.CronCoverIgnoreAll, []uint64{coveredServerID}) + setupRequestTaskSecurityFixture(t, []*model.Server{reporter}, []*model.Cron{cronTask}, map[uint64]model.UserInfo{ + 200: {Role: model.RoleMember}, + }, map[string]uint64{"reporter-secret": 200}) + + runRequestTaskSecurityResult(t, "reporter-secret", reporter.UUID, cronTaskResult(cronTask.ID, true)) + + if cronLastResult(t, cronTask.ID) { + t.Fatal("cron result from a server outside cron cover must not update cron status") + } +} + +func TestRequestTaskSkipsCronCoverAllExcludedReporter(t *testing.T) { + reporter := requestTaskSecurityServer(7, 200, "88888888-8888-8888-8888-888888888888") + cronTask := requestTaskSecurityCron(42, 200, model.CronCoverAll, []uint64{reporter.ID}) + setupRequestTaskSecurityFixture(t, []*model.Server{reporter}, []*model.Cron{cronTask}, map[uint64]model.UserInfo{ + 200: {Role: model.RoleMember}, + }, map[string]uint64{"reporter-secret": 200}) + + runRequestTaskSecurityResult(t, "reporter-secret", reporter.UUID, cronTaskResult(cronTask.ID, true)) + + if cronLastResult(t, cronTask.ID) { + t.Fatal("cron result from a server excluded by CronCoverAll must not update cron status") + } +} + +func TestRequestTaskAllowsCronCoverAllReporter(t *testing.T) { + reporter := requestTaskSecurityServer(7, 200, "99999999-9999-9999-9999-999999999999") + cronTask := requestTaskSecurityCron(42, 200, model.CronCoverAll, []uint64{8}) + setupRequestTaskSecurityFixture(t, []*model.Server{reporter}, []*model.Cron{cronTask}, map[uint64]model.UserInfo{ + 200: {Role: model.RoleMember}, + }, map[string]uint64{"reporter-secret": 200}) + + runRequestTaskSecurityResult(t, "reporter-secret", reporter.UUID, cronTaskResult(cronTask.ID, true)) + + if !cronLastResult(t, cronTask.ID) { + t.Fatal("CronCoverAll reporter not in the exclusion list must update cron status") + } +} + +func TestRequestTaskAllowsCronResultForCoveredOwnerServer(t *testing.T) { + reporter := requestTaskSecurityServer(7, 200, "33333333-3333-3333-3333-333333333333") + cronTask := requestTaskSecurityCron(42, 200, model.CronCoverIgnoreAll, []uint64{reporter.ID}) + setupRequestTaskSecurityFixture(t, []*model.Server{reporter}, []*model.Cron{cronTask}, map[uint64]model.UserInfo{ + 200: {Role: model.RoleMember}, + }, map[string]uint64{"reporter-secret": 200}) + + runRequestTaskSecurityResult(t, "reporter-secret", reporter.UUID, cronTaskResult(cronTask.ID, true)) + + if !cronLastResult(t, cronTask.ID) { + t.Fatal("covered owner cron result must update cron status") + } +} + +func TestRequestTaskAllowsCronResultForCoveredAdminOwnedCron(t *testing.T) { + reporter := requestTaskSecurityServer(7, 200, "44444444-4444-4444-4444-444444444444") + cronTask := requestTaskSecurityCron(42, 1, model.CronCoverIgnoreAll, []uint64{reporter.ID}) + setupRequestTaskSecurityFixture(t, []*model.Server{reporter}, []*model.Cron{cronTask}, map[uint64]model.UserInfo{ + 1: {Role: model.RoleAdmin}, + 200: {Role: model.RoleMember}, + }, map[string]uint64{"reporter-secret": 200}) + + runRequestTaskSecurityResult(t, "reporter-secret", reporter.UUID, cronTaskResult(cronTask.ID, true)) + + if !cronLastResult(t, cronTask.ID) { + t.Fatal("covered admin-owned cron result must update cron status") + } +} + +func TestRequestTaskSkipsAlertTriggerCronResultFromUntriggeredReporter(t *testing.T) { + reporter := requestTaskSecurityServer(7, 200, "55555555-5555-5555-5555-555555555555") + triggerServer := requestTaskSecurityServer(8, 200, "66666666-6666-6666-6666-666666666666") + cronTask := requestTaskSecurityCron(42, 200, model.CronCoverAlertTrigger, nil) + setupRequestTaskSecurityFixture(t, []*model.Server{reporter, triggerServer}, []*model.Cron{cronTask}, map[uint64]model.UserInfo{ + 200: {Role: model.RoleMember}, + }, map[string]uint64{"reporter-secret": 200, "trigger-secret": 200}) + connectRequestTaskSecurityTaskStream(t, triggerServer.ID) + singleton.CronTrigger(cronTask, triggerServer.ID)() + + runRequestTaskSecurityResult(t, "reporter-secret", reporter.UUID, cronTaskResult(cronTask.ID, true)) + + if cronLastResult(t, cronTask.ID) { + t.Fatal("alert-trigger cron result from a non-triggered server must not update cron status") + } +} + +func TestRequestTaskAllowsAlertTriggerCronResultForTriggeredReporter(t *testing.T) { + reporter := requestTaskSecurityServer(7, 200, "77777777-7777-7777-7777-777777777777") + cronTask := requestTaskSecurityCron(42, 200, model.CronCoverAlertTrigger, nil) + setupRequestTaskSecurityFixture(t, []*model.Server{reporter}, []*model.Cron{cronTask}, map[uint64]model.UserInfo{ + 200: {Role: model.RoleMember}, + }, map[string]uint64{"reporter-secret": 200}) + connectRequestTaskSecurityTaskStream(t, reporter.ID) + singleton.CronTrigger(cronTask, reporter.ID)() + + runRequestTaskSecurityResult(t, "reporter-secret", reporter.UUID, cronTaskResult(cronTask.ID, true)) + + if !cronLastResult(t, cronTask.ID) { + t.Fatal("alert-trigger cron result from the triggered server must update cron status") + } +} + +func TestRequestTaskAllowsAlertTriggerCronResultReportedDuringSend(t *testing.T) { + reporter := requestTaskSecurityServer(7, 200, "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa") + cronTask := requestTaskSecurityCron(42, 200, model.CronCoverAlertTrigger, nil) + setupRequestTaskSecurityFixture(t, []*model.Server{reporter}, []*model.Cron{cronTask}, map[uint64]model.UserInfo{ + 200: {Role: model.RoleMember}, + }, map[string]uint64{"reporter-secret": 200}) + connectRequestTaskSecurityTaskStreamWithSendHook(t, reporter.ID, nil, func(task *pb.Task) { + if task.GetId() != cronTask.ID { + t.Fatalf("expected alert-trigger task %d, got %d", cronTask.ID, task.GetId()) + } + runRequestTaskSecurityResult(t, "reporter-secret", reporter.UUID, cronTaskResult(cronTask.ID, true)) + }) + + singleton.CronTrigger(cronTask, reporter.ID)() + + if !cronLastResult(t, cronTask.ID) { + t.Fatal("alert-trigger cron result reported during Send must update cron status") + } +} + +func TestRequestTaskSkipsAlertTriggerCronResultAfterSendFailure(t *testing.T) { + reporter := requestTaskSecurityServer(7, 200, "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb") + cronTask := requestTaskSecurityCron(42, 200, model.CronCoverAlertTrigger, nil) + setupRequestTaskSecurityFixture(t, []*model.Server{reporter}, []*model.Cron{cronTask}, map[uint64]model.UserInfo{ + 200: {Role: model.RoleMember}, + }, map[string]uint64{"reporter-secret": 200}) + connectRequestTaskSecurityTaskStreamWithSendHook(t, reporter.ID, errors.New("send failed"), nil) + singleton.CronTrigger(cronTask, reporter.ID)() + + runRequestTaskSecurityResult(t, "reporter-secret", reporter.UUID, cronTaskResult(cronTask.ID, true)) + + if cronLastResult(t, cronTask.ID) { + t.Fatal("alert-trigger cron result after failed dispatch must not update cron status") + } +} + +func setupRequestTaskSecurityFixture(t *testing.T, servers []*model.Server, crons []*model.Cron, users map[uint64]model.UserInfo, agentSecrets map[string]uint64) { + t.Helper() + + originalDB := singleton.DB + originalConf := singleton.Conf + originalLoc := singleton.Loc + originalServerShared := singleton.ServerShared + originalCronShared := singleton.CronShared + originalUserInfoMap := singleton.UserInfoMap + originalAgentSecretToUserID := singleton.AgentSecretToUserId + + db, err := gorm.Open(sqlite.Open(":memory:"), &gorm.Config{}) + if err != nil { + t.Fatal(err) + } + sqlDB, err := db.DB() + if err != nil { + t.Fatal(err) + } + sqlDB.SetMaxOpenConns(1) + + singleton.DB = db + singleton.Conf = &singleton.ConfigClass{Config: &model.Config{}} + singleton.Loc = time.UTC + if err := singleton.DB.AutoMigrate(model.Server{}, model.Cron{}); err != nil { + t.Fatal(err) + } + for _, server := range servers { + if err := singleton.DB.Create(server).Error; err != nil { + t.Fatal(err) + } + } + for _, cronTask := range crons { + if err := singleton.DB.Create(cronTask).Error; err != nil { + t.Fatal(err) + } + } + + singleton.UserLock.Lock() + singleton.UserInfoMap = users + singleton.AgentSecretToUserId = agentSecrets + singleton.UserLock.Unlock() + singleton.ServerShared = singleton.NewServerClass() + singleton.CronShared = singleton.NewCronClass() + + t.Cleanup(func() { + if singleton.CronShared != nil && singleton.CronShared.Cron != nil { + singleton.CronShared.Stop() + } + sqlDB.Close() + singleton.DB = originalDB + singleton.Conf = originalConf + singleton.Loc = originalLoc + singleton.ServerShared = originalServerShared + singleton.CronShared = originalCronShared + singleton.UserLock.Lock() + singleton.UserInfoMap = originalUserInfoMap + singleton.AgentSecretToUserId = originalAgentSecretToUserID + singleton.UserLock.Unlock() + }) +} + +func requestTaskSecurityServer(id, userID uint64, uuid string) *model.Server { + return &model.Server{ + Common: model.Common{ID: id, UserID: userID}, + UUID: uuid, + Name: "request-task-security-server", + } +} + +func requestTaskSecurityCron(id, userID uint64, cover uint8, servers []uint64) *model.Cron { + return &model.Cron{ + Common: model.Common{ID: id, UserID: userID}, + Name: "request-task-security-cron", + Command: "id", + Scheduler: "@every 1h", + Cover: cover, + Servers: servers, + } +} + +func cronTaskResult(cronID uint64, successful bool) *pb.TaskResult { + return &pb.TaskResult{ + Id: cronID, + Type: model.TaskTypeCommand, + Delay: 1, + Data: "cron result", + Successful: successful, + } +} + +func connectRequestTaskSecurityTaskStream(t *testing.T, serverID uint64) { + t.Helper() + + connectRequestTaskSecurityTaskStreamWithSendHook(t, serverID, nil, nil) +} + +func connectRequestTaskSecurityTaskStreamWithSendHook(t *testing.T, serverID uint64, sendErr error, onSend func(*pb.Task)) { + t.Helper() + + server, ok := singleton.ServerShared.Get(serverID) + if !ok { + t.Fatalf("server %d not found", serverID) + } + server.TaskStream = &requestTaskSecurityStream{ctx: context.Background(), sendErr: sendErr, onSend: onSend} +} + +func runRequestTaskSecurityResult(t *testing.T, secret string, uuid string, result *pb.TaskResult) { + t.Helper() + + stream := &requestTaskSecurityStream{ + ctx: metadata.NewIncomingContext(context.Background(), metadata.Pairs( + "client_secret", secret, + "client_uuid", uuid, + )), + results: []*pb.TaskResult{result}, + } + err := NewNezhaHandler().RequestTask(stream) + if !errors.Is(err, context.Canceled) { + t.Fatalf("expected RequestTask to finish after test result, got %v", err) + } +} + +func cronLastResult(t *testing.T, cronID uint64) bool { + t.Helper() + + var cronTask model.Cron + if err := singleton.DB.First(&cronTask, cronID).Error; err != nil { + t.Fatal(err) + } + return cronTask.LastResult +}
service/singleton/crontask.go+125 −3 modified@@ -5,6 +5,8 @@ import ( "fmt" "slices" "strings" + "sync" + "time" "github.com/jinzhu/copier" @@ -15,9 +17,13 @@ import ( pb "github.com/nezhahq/nezha/proto" ) +const alertTriggerCronResultAuthorizationTTL = 24 * time.Hour + type CronClass struct { class[uint64, *model.Cron] *cron.Cron + pendingAlertTriggerTasksMu sync.Mutex + pendingAlertTriggerTasks map[uint64]map[uint64][]time.Time } func NewCronClass() *CronClass { @@ -64,7 +70,8 @@ func NewCronClass() *CronClass { list: list, sortedList: sortedList, }, - Cron: cronx, + Cron: cronx, + pendingAlertTriggerTasks: make(map[uint64]map[uint64][]time.Time), } } @@ -78,6 +85,7 @@ func (c *CronClass) Update(cr *model.Cron) { delete(c.list, cr.ID) c.list[cr.ID] = cr c.listMu.Unlock() + c.deleteAlertTriggerCronResultAuthorizations([]uint64{cr.ID}) c.sortList() } @@ -92,6 +100,7 @@ func (c *CronClass) Delete(idList []uint64) { delete(c.list, id) } c.listMu.Unlock() + c.deleteAlertTriggerCronResultAuthorizations(idList) c.sortList() } @@ -130,6 +139,113 @@ func cronCanBeTriggeredByOwner(cr *model.Cron, triggerOwner uint64) bool { return cr.UserID == triggerOwner || userIsAdmin(triggerOwner) } +func CanReportCronResult(cr *model.Cron, reporter *model.Server) bool { + if cr == nil || reporter == nil || !cronCanSendToServer(cr, reporter) { + return false + } + if cr.Cover == model.CronCoverAll { + return !slices.Contains(cr.Servers, reporter.ID) + } + if cr.Cover == model.CronCoverIgnoreAll { + return slices.Contains(cr.Servers, reporter.ID) + } + if cr.Cover == model.CronCoverAlertTrigger { + return CronShared != nil && CronShared.consumeAlertTriggerCronResult(cr.ID, reporter.ID) + } + return false +} + +func (c *CronClass) reserveAlertTriggerCronResult(cronID uint64, serverID uint64) { + c.pendingAlertTriggerTasksMu.Lock() + defer c.pendingAlertTriggerTasksMu.Unlock() + + now := time.Now() + c.pruneExpiredAlertTriggerCronResultsLocked(now) + if c.pendingAlertTriggerTasks == nil { + c.pendingAlertTriggerTasks = make(map[uint64]map[uint64][]time.Time) + } + if c.pendingAlertTriggerTasks[cronID] == nil { + c.pendingAlertTriggerTasks[cronID] = make(map[uint64][]time.Time) + } + c.pendingAlertTriggerTasks[cronID][serverID] = append(c.pendingAlertTriggerTasks[cronID][serverID], now.Add(alertTriggerCronResultAuthorizationTTL)) +} + +func (c *CronClass) revokeAlertTriggerCronResult(cronID uint64, serverID uint64) { + c.pendingAlertTriggerTasksMu.Lock() + defer c.pendingAlertTriggerTasksMu.Unlock() + + serverTasks := c.pendingAlertTriggerTasks[cronID] + expiresAtList := serverTasks[serverID] + if len(expiresAtList) == 0 { + return + } + expiresAtList = expiresAtList[:len(expiresAtList)-1] + if len(expiresAtList) == 0 { + delete(serverTasks, serverID) + } else { + serverTasks[serverID] = expiresAtList + } + if len(serverTasks) == 0 { + delete(c.pendingAlertTriggerTasks, cronID) + } +} + +func (c *CronClass) consumeAlertTriggerCronResult(cronID uint64, serverID uint64) bool { + c.pendingAlertTriggerTasksMu.Lock() + defer c.pendingAlertTriggerTasksMu.Unlock() + + c.pruneExpiredAlertTriggerCronResultsLocked(time.Now()) + return c.consumeAlertTriggerCronResultLocked(cronID, serverID) +} + +func (c *CronClass) consumeAlertTriggerCronResultLocked(cronID uint64, serverID uint64) bool { + serverTasks := c.pendingAlertTriggerTasks[cronID] + expiresAtList := serverTasks[serverID] + if len(expiresAtList) == 0 { + return false + } + expiresAtList = expiresAtList[1:] + if len(expiresAtList) == 0 { + delete(serverTasks, serverID) + } else { + serverTasks[serverID] = expiresAtList + } + if len(serverTasks) == 0 { + delete(c.pendingAlertTriggerTasks, cronID) + } + return true +} + +func (c *CronClass) pruneExpiredAlertTriggerCronResultsLocked(now time.Time) { + for cronID, serverTasks := range c.pendingAlertTriggerTasks { + for serverID, expiresAtList := range serverTasks { + validExpiresAtList := expiresAtList[:0] + for _, expiresAt := range expiresAtList { + if expiresAt.After(now) { + validExpiresAtList = append(validExpiresAtList, expiresAt) + } + } + if len(validExpiresAtList) == 0 { + delete(serverTasks, serverID) + } else { + serverTasks[serverID] = validExpiresAtList + } + } + if len(serverTasks) == 0 { + delete(c.pendingAlertTriggerTasks, cronID) + } + } +} + +func (c *CronClass) deleteAlertTriggerCronResultAuthorizations(cronIDs []uint64) { + c.pendingAlertTriggerTasksMu.Lock() + defer c.pendingAlertTriggerTasksMu.Unlock() + + for _, cronID := range cronIDs { + delete(c.pendingAlertTriggerTasks, cronID) + } +} + func ManualTrigger(cr *model.Cron) { CronTrigger(cr)() } @@ -149,11 +265,17 @@ func CronTrigger(cr *model.Cron, triggerServer ...uint64) func() { return } if s.TaskStream != nil { - s.TaskStream.Send(&pb.Task{ + cronShared := CronShared + if cronShared != nil { + cronShared.reserveAlertTriggerCronResult(cr.ID, s.ID) + } + if err := s.TaskStream.Send(&pb.Task{ Id: cr.ID, Data: cr.Command, Type: model.TaskTypeCommand, - }) + }); err != nil && cronShared != nil { + cronShared.revokeAlertTriggerCronResult(cr.ID, s.ID) + } } else { // 保存当前服务器状态信息 curServer := model.Server{}
service/singleton/security_regression_test.go+545 −0 modified@@ -8,6 +8,11 @@ import ( "time" "github.com/gin-gonic/gin" + "github.com/patrickmn/go-cache" + "github.com/robfig/cron/v3" + "gorm.io/driver/sqlite" + "gorm.io/gorm" + "github.com/nezhahq/nezha/model" pb "github.com/nezhahq/nezha/proto" "google.golang.org/grpc/metadata" @@ -335,6 +340,241 @@ func TestSendTriggerTasksMixedCronIDsOnlyFiresAllowed(t *testing.T) { assertNoTask(t, stream) } +func TestAlertTriggerCronResultAuthorizationConsumesOneDispatch(t *testing.T) { + cronClass := &CronClass{} + cronClass.reserveAlertTriggerCronResult(42, 7) + cronClass.reserveAlertTriggerCronResult(42, 7) + + if !cronClass.consumeAlertTriggerCronResult(42, 7) { + t.Fatal("expected first alert-trigger authorization to be consumed") + } + if !cronClass.consumeAlertTriggerCronResult(42, 7) { + t.Fatal("expected second alert-trigger authorization to be consumed") + } + if cronClass.consumeAlertTriggerCronResult(42, 7) { + t.Fatal("expected alert-trigger authorization to be consumed only once per dispatch") + } +} + +func TestAlertTriggerCronResultAuthorizationExpires(t *testing.T) { + cronClass := &CronClass{ + pendingAlertTriggerTasks: map[uint64]map[uint64][]time.Time{ + 42: {7: {time.Now().Add(-time.Second)}}, + }, + } + + if cronClass.consumeAlertTriggerCronResult(42, 7) { + t.Fatal("expired alert-trigger authorization must not be accepted") + } + if len(cronClass.pendingAlertTriggerTasks) != 0 { + t.Fatal("expired alert-trigger authorization must be pruned") + } +} + +func TestAlertTriggerCronResultAuthorizationRevokeRemovesLatestDispatch(t *testing.T) { + existingAuthorizationExpiresAt := time.Now().Add(time.Hour) + cronClass := &CronClass{ + pendingAlertTriggerTasks: map[uint64]map[uint64][]time.Time{ + 42: {7: {existingAuthorizationExpiresAt}}, + }, + } + cronClass.reserveAlertTriggerCronResult(42, 7) + + cronClass.revokeAlertTriggerCronResult(42, 7) + + authorizations := cronClass.pendingAlertTriggerTasks[42][7] + if len(authorizations) != 1 { + t.Fatalf("expected one previous alert-trigger authorization to remain, got %d", len(authorizations)) + } + if !authorizations[0].Equal(existingAuthorizationExpiresAt) { + t.Fatal("send failure rollback must remove the newest reserved authorization") + } +} + +func TestCronClassUpdatePrunesAlertTriggerCronResultAuthorization(t *testing.T) { + cronClass := &CronClass{ + Cron: cron.New(cron.WithSeconds()), + class: class[uint64, *model.Cron]{ + list: map[uint64]*model.Cron{42: {Common: model.Common{ID: 42}}}, + }, + pendingAlertTriggerTasks: map[uint64]map[uint64][]time.Time{ + 42: {7: {time.Now().Add(time.Hour)}}, + }, + } + + cronClass.Update(&model.Cron{Common: model.Common{ID: 42}}) + + if len(cronClass.pendingAlertTriggerTasks) != 0 { + t.Fatal("cron update must prune old alert-trigger result authorizations") + } +} + +func TestCronClassDeletePrunesAlertTriggerCronResultAuthorization(t *testing.T) { + cronClass := &CronClass{ + Cron: cron.New(cron.WithSeconds()), + class: class[uint64, *model.Cron]{ + list: map[uint64]*model.Cron{42: {Common: model.Common{ID: 42}}}, + }, + pendingAlertTriggerTasks: map[uint64]map[uint64][]time.Time{ + 42: {7: {time.Now().Add(time.Hour)}}, + }, + } + + cronClass.Delete([]uint64{42}) + + if len(cronClass.pendingAlertTriggerTasks) != 0 { + t.Fatal("cron delete must prune alert-trigger result authorizations") + } +} + +// CanReportCronResult is the cron-side dual of canReportServiceResult: it gates +// agent-reported TaskTypeCommand results to only the cron/server pairs the +// dashboard actually fanned the task out to. Without these inbound checks any +// authenticated agent could fabricate a TaskResult for an arbitrary cron ID and +// poison LastResult / fire success/failure notifications belonging to another +// tenant. The tests below pin each Cover branch end-to-end against the dispatch +// logic in CronTrigger so the two sides stay symmetric. + +func TestCanReportCronResultRejectsNilCronOrReporter(t *testing.T) { + cr := &model.Cron{Common: model.Common{ID: 7, UserID: 100}, Cover: model.CronCoverAll} + reporter := &model.Server{Common: model.Common{ID: 1, UserID: 100}} + + if CanReportCronResult(nil, reporter) { + t.Fatal("nil cron must be rejected — would dereference inside cover branches") + } + if CanReportCronResult(cr, nil) { + t.Fatal("nil reporter must be rejected") + } +} + +func TestCanReportCronResultRejectsForeignReporter(t *testing.T) { + replaceUserInfoMapForSecurityTest(t, map[uint64]model.UserInfo{ + 100: {Role: model.RoleMember}, + 200: {Role: model.RoleMember}, + }) + + cr := &model.Cron{ + Common: model.Common{ID: 7, UserID: 100}, + Cover: model.CronCoverAll, + } + foreign := &model.Server{Common: model.Common{ID: 1, UserID: 200}} + + if CanReportCronResult(cr, foreign) { + t.Fatal("foreign-user reporter must be rejected: CronTrigger never dispatched to it") + } +} + +func TestCanReportCronResultCronCoverAllRejectsReporterInDenyList(t *testing.T) { + cr := &model.Cron{ + Common: model.Common{ID: 7, UserID: 100}, + Cover: model.CronCoverAll, + Servers: []uint64{1}, + } + reporter := &model.Server{Common: model.Common{ID: 1, UserID: 100}} + + if CanReportCronResult(cr, reporter) { + t.Fatal("CronCoverAll treats Servers as deny-list; reporter in the list must be rejected") + } +} + +func TestCanReportCronResultCronCoverAllAcceptsReporterNotInDenyList(t *testing.T) { + cr := &model.Cron{ + Common: model.Common{ID: 7, UserID: 100}, + Cover: model.CronCoverAll, + Servers: []uint64{99}, + } + reporter := &model.Server{Common: model.Common{ID: 1, UserID: 100}} + + if !CanReportCronResult(cr, reporter) { + t.Fatal("CronCoverAll with reporter NOT in Servers must accept — CronTrigger dispatches to it") + } +} + +func TestCanReportCronResultCronCoverIgnoreAllAcceptsReporterInAllowList(t *testing.T) { + cr := &model.Cron{ + Common: model.Common{ID: 7, UserID: 100}, + Cover: model.CronCoverIgnoreAll, + Servers: []uint64{1}, + } + reporter := &model.Server{Common: model.Common{ID: 1, UserID: 100}} + + if !CanReportCronResult(cr, reporter) { + t.Fatal("CronCoverIgnoreAll treats Servers as allow-list; reporter in the list must be accepted") + } +} + +func TestCanReportCronResultCronCoverIgnoreAllRejectsReporterOutsideAllowList(t *testing.T) { + cr := &model.Cron{ + Common: model.Common{ID: 7, UserID: 100}, + Cover: model.CronCoverIgnoreAll, + Servers: []uint64{99}, + } + reporter := &model.Server{Common: model.Common{ID: 1, UserID: 100}} + + if CanReportCronResult(cr, reporter) { + t.Fatal("CronCoverIgnoreAll with reporter NOT in Servers must reject — CronTrigger never dispatched to it") + } +} + +// failingTaskStream simulates a TaskStream whose Send always errors. CronTrigger +// uses this signal to revoke a reserved alert-trigger authorization, so the +// agent can't later attach to the cron via CanReportCronResult based on a +// dispatch that never actually reached the wire. +type failingTaskStream struct { + capturedTaskStream + sendErr error +} + +func newFailingTaskStream(err error) *failingTaskStream { + return &failingTaskStream{ + capturedTaskStream: capturedTaskStream{tasks: make(chan *pb.Task, 4)}, + sendErr: err, + } +} + +func (s *failingTaskStream) Send(task *pb.Task) error { + s.tasks <- task + return s.sendErr +} + +func TestCronTriggerRevokesAlertTriggerAuthorizationOnSendFailure(t *testing.T) { + failing := newFailingTaskStream(context.Canceled) + replaceServerSharedForSecurityTest(t, + &model.Server{Common: model.Common{ID: 7, UserID: 100}, Name: "broken-server", TaskStream: failing}, + ) + replaceUserInfoMapForSecurityTest(t, map[uint64]model.UserInfo{ + 100: {Role: model.RoleMember}, + }) + + originalCronShared := CronShared + t.Cleanup(func() { CronShared = originalCronShared }) + CronShared = &CronClass{ + class: class[uint64, *model.Cron]{list: map[uint64]*model.Cron{}}, + pendingAlertTriggerTasks: map[uint64]map[uint64][]time.Time{}, + } + + cr := &model.Cron{ + Common: model.Common{ID: 42, UserID: 100}, + Cover: model.CronCoverAlertTrigger, + } + + CronTrigger(cr, 7)() + + // drain the dispatched task — Send error is what we care about, not the payload + select { + case <-failing.tasks: + case <-time.After(time.Second): + t.Fatal("expected CronTrigger to call Send before reacting to the error") + } + + if CronShared.consumeAlertTriggerCronResult(42, 7) { + t.Fatal("Send failure must revoke the reserved alert-trigger authorization; otherwise a foreign agent could later report a result for a dispatch that never reached the wire") + } + if len(CronShared.pendingAlertTriggerTasks) != 0 { + t.Fatalf("expected pendingAlertTriggerTasks to be empty after revoke, got %d entries", len(CronShared.pendingAlertTriggerTasks)) + } +} + func TestClassCheckPermission(t *testing.T) { replaceUserInfoMapForSecurityTest(t, map[uint64]model.UserInfo{ 1: {Role: model.RoleAdmin}, @@ -377,3 +617,308 @@ func TestClassCheckPermission(t *testing.T) { t.Fatal("expected admin to access any resource") } } + +func TestServiceMonitorResultSkipsReporterOutsideServiceCover(t *testing.T) { + ss := newServiceMonitorSecurityHarness(t, + &model.Server{Common: model.Common{ID: 1, UserID: 100}, Name: "covered-server"}, + &model.Server{Common: model.Common{ID: 2, UserID: 100}, Name: "uncovered-server"}, + ) + addServiceMonitorSecurityService(t, ss, &model.Service{ + Common: model.Common{ID: 10, UserID: 100}, + Name: "selected-only-service", + Type: model.TaskTypeTCPPing, + Target: "example.invalid:443", + Duration: 3600, + Cover: model.ServiceCoverIgnoreAll, + SkipServers: map[uint64]bool{1: true}, + }) + + ss.Dispatch(serviceMonitorResult(2, 10, model.TaskTypeTCPPing, true)) + ss.Dispatch(serviceMonitorResult(1, 10, model.TaskTypeTCPPing, true)) + + waitForServiceHistory(t, 10, 1) + assertNoServiceHistory(t, 10, 2) +} + +func TestServiceMonitorResultSkipsCoveredReporterOwnedByAnotherUser(t *testing.T) { + ss := newServiceMonitorSecurityHarness(t, + &model.Server{Common: model.Common{ID: 1, UserID: 100}, Name: "owner-server"}, + &model.Server{Common: model.Common{ID: 2, UserID: 200}, Name: "foreign-server"}, + ) + addServiceMonitorSecurityService(t, ss, &model.Service{ + Common: model.Common{ID: 10, UserID: 100}, + Name: "owner-only-service", + Type: model.TaskTypeTCPPing, + Target: "example.invalid:443", + Duration: 3600, + Cover: model.ServiceCoverIgnoreAll, + SkipServers: map[uint64]bool{1: true, 2: true}, + }) + + ss.Dispatch(serviceMonitorResult(2, 10, model.TaskTypeTCPPing, true)) + ss.Dispatch(serviceMonitorResult(1, 10, model.TaskTypeTCPPing, true)) + + waitForServiceHistory(t, 10, 1) + assertNoServiceHistory(t, 10, 2) +} + +func TestServiceMonitorResultSkipsMismatchedTaskType(t *testing.T) { + ss := newServiceMonitorSecurityHarness(t, + &model.Server{Common: model.Common{ID: 1, UserID: 100}, Name: "owner-server"}, + ) + addServiceMonitorSecurityService(t, ss, &model.Service{ + Common: model.Common{ID: 10, UserID: 100}, + Name: "http-service", + Type: model.TaskTypeHTTPGet, + Target: "https://example.invalid", + Duration: 3600, + Cover: model.ServiceCoverIgnoreAll, + SkipServers: map[uint64]bool{1: true}, + }) + + ss.Dispatch(serviceMonitorResult(1, 10, model.TaskTypeTCPPing, false)) + ss.Dispatch(serviceMonitorResult(1, 10, model.TaskTypeHTTPGet, true)) + + waitForTodayStats(t, ss, 10, 1, 0) +} + +func TestServiceMonitorResultSkipsUnknownReporter(t *testing.T) { + ss := newServiceMonitorSecurityHarness(t, + &model.Server{Common: model.Common{ID: 1, UserID: 100}, Name: "owner-server"}, + ) + addServiceMonitorSecurityService(t, ss, &model.Service{ + Common: model.Common{ID: 10, UserID: 100}, + Name: "known-reporter-service", + Type: model.TaskTypeTCPPing, + Target: "example.invalid:443", + Duration: 3600, + Cover: model.ServiceCoverIgnoreAll, + SkipServers: map[uint64]bool{1: true}, + }) + + ss.Dispatch(serviceMonitorResult(999, 10, model.TaskTypeTCPPing, true)) + ss.Dispatch(serviceMonitorResult(1, 10, model.TaskTypeTCPPing, true)) + + waitForServiceHistory(t, 10, 1) + assertNoServiceHistory(t, 10, 999) +} + +func TestServiceMonitorResultAllowsCoveredReporterOwnedByServiceOwner(t *testing.T) { + ss := newServiceMonitorSecurityHarness(t, + &model.Server{Common: model.Common{ID: 1, UserID: 100}, Name: "owner-server"}, + ) + addServiceMonitorSecurityService(t, ss, &model.Service{ + Common: model.Common{ID: 10, UserID: 100}, + Name: "owner-service", + Type: model.TaskTypeTCPPing, + Target: "example.invalid:443", + Duration: 3600, + Cover: model.ServiceCoverIgnoreAll, + SkipServers: map[uint64]bool{1: true}, + }) + + ss.Dispatch(serviceMonitorResult(1, 10, model.TaskTypeTCPPing, true)) + + waitForServiceHistory(t, 10, 1) +} + +func TestServiceMonitorResultAllowsCoveredReporterForAdminOwnedService(t *testing.T) { + replaceUserInfoMapForSecurityTest(t, map[uint64]model.UserInfo{ + 1: {Role: model.RoleAdmin}, + 200: {Role: model.RoleMember}, + }) + ss := newServiceMonitorSecurityHarness(t, + &model.Server{Common: model.Common{ID: 2, UserID: 200}, Name: "member-server"}, + ) + addServiceMonitorSecurityService(t, ss, &model.Service{ + Common: model.Common{ID: 10, UserID: 1}, + Name: "admin-service", + Type: model.TaskTypeTCPPing, + Target: "example.invalid:443", + Duration: 3600, + Cover: model.ServiceCoverIgnoreAll, + SkipServers: map[uint64]bool{2: true}, + }) + + ss.Dispatch(serviceMonitorResult(2, 10, model.TaskTypeTCPPing, true)) + + waitForServiceHistory(t, 10, 2) +} + +func newServiceMonitorSecurityHarness(t *testing.T, servers ...*model.Server) *ServiceSentinel { + t.Helper() + + originalDB := DB + originalConf := Conf + originalCache := Cache + originalCronShared := CronShared + originalServerShared := ServerShared + originalServiceSentinelShared := ServiceSentinelShared + originalNotificationShared := NotificationShared + originalTSDBShared := TSDBShared + originalLoc := Loc + var sqlDBClose func() error + + t.Cleanup(func() { + DB = originalDB + Conf = originalConf + Cache = originalCache + CronShared = originalCronShared + ServerShared = originalServerShared + ServiceSentinelShared = originalServiceSentinelShared + NotificationShared = originalNotificationShared + TSDBShared = originalTSDBShared + Loc = originalLoc + if sqlDBClose != nil { + _ = sqlDBClose() + } + }) + + db, err := gorm.Open(sqlite.Open("file:"+t.Name()+"?mode=memory&cache=shared"), &gorm.Config{}) + if err != nil { + t.Fatal(err) + } + sqlDB, err := db.DB() + if err != nil { + t.Fatal(err) + } + sqlDB.SetMaxOpenConns(1) + sqlDBClose = sqlDB.Close + DB = db + if err := DB.AutoMigrate( + model.Server{}, + model.Service{}, + model.ServiceHistory{}, + model.Notification{}, + model.NotificationGroup{}, + model.NotificationGroupNotification{}, + ); err != nil { + t.Fatal(err) + } + + Conf = &ConfigClass{Config: &model.Config{AvgPingCount: 1}} + Cache = cache.New(time.Minute, time.Minute) + CronShared = &CronClass{ + Cron: cron.New(cron.WithSeconds()), + class: class[uint64, *model.Cron]{list: map[uint64]*model.Cron{}}, + } + NotificationShared = &NotificationClass{ + class: class[uint64, *model.Notification]{list: map[uint64]*model.Notification{}}, + groupToIDList: map[uint64]map[uint64]*model.Notification{}, + idToGroupList: map[uint64]map[uint64]struct{}{}, + groupList: map[uint64]string{}, + } + TSDBShared = nil + Loc = time.UTC + + serverClass := &ServerClass{ + class: class[uint64, *model.Server]{ + list: make(map[uint64]*model.Server), + }, + uuidToID: make(map[string]uint64), + } + for _, server := range servers { + serverClass.list[server.ID] = server + } + ServerShared = serverClass + + bus := make(chan *model.Service, 1) + ss, err := NewServiceSentinel(bus) + if err != nil { + t.Fatal(err) + } + ServiceSentinelShared = ss + return ss +} + +func addServiceMonitorSecurityService(t *testing.T, ss *ServiceSentinel, service *model.Service) { + t.Helper() + + if err := DB.Create(service).Error; err != nil { + t.Fatal(err) + } + if err := ss.Update(service); err != nil { + t.Fatal(err) + } +} + +func serviceMonitorResult(reporter, serviceID uint64, taskType uint8, successful bool) ReportData { + return ReportData{ + Reporter: reporter, + Data: &pb.TaskResult{ + Id: serviceID, + Type: uint64(taskType), + Delay: 12, + Data: "service monitor result", + Successful: successful, + }, + } +} + +func waitForServiceHistory(t *testing.T, serviceID, serverID uint64) { + t.Helper() + + deadline := time.After(time.Second) + for { + var count int64 + if err := DB.Model(&model.ServiceHistory{}). + Where("service_id = ? AND server_id = ?", serviceID, serverID). + Count(&count).Error; err != nil { + t.Fatal(err) + } + if count > 0 { + return + } + + select { + case <-deadline: + t.Fatalf("expected service history for service %d from server %d", serviceID, serverID) + default: + time.Sleep(10 * time.Millisecond) + } + } +} + +func assertNoServiceHistory(t *testing.T, serviceID, serverID uint64) { + t.Helper() + + var count int64 + if err := DB.Model(&model.ServiceHistory{}). + Where("service_id = ? AND server_id = ?", serviceID, serverID). + Count(&count).Error; err != nil { + t.Fatal(err) + } + if count != 0 { + t.Fatalf("expected no service history for service %d from server %d, got %d", serviceID, serverID, count) + } +} + +func waitForTodayStats(t *testing.T, ss *ServiceSentinel, serviceID uint64, wantUp, wantDown uint64) { + t.Helper() + + deadline := time.After(time.Second) + for { + ss.serviceResponseDataStoreLock.RLock() + stats := ss.serviceStatusToday[serviceID] + var up, down uint64 + if stats != nil { + up = stats.Up + down = stats.Down + } + ss.serviceResponseDataStoreLock.RUnlock() + + if up == wantUp && down == wantDown { + return + } + if down > wantDown { + t.Fatalf("expected service %d down count %d, got %d", serviceID, wantDown, down) + } + + select { + case <-deadline: + t.Fatalf("expected service %d stats up=%d down=%d", serviceID, wantUp, wantDown) + default: + time.Sleep(10 * time.Millisecond) + } + } +}
service/singleton/servicesentinel.go+25 −4 modified@@ -472,16 +472,37 @@ func (ss *ServiceSentinel) CheckPermission(c *gin.Context, idList iter.Seq[uint6 return true } +func canReportServiceResult(service *model.Service, reporter *model.Server, taskType uint64) bool { + if service == nil || reporter == nil || uint64(service.Type) != taskType { + return false + } + switch service.Cover { + case model.ServiceCoverAll: + if service.SkipServers[reporter.ID] { + return false + } + case model.ServiceCoverIgnoreAll: + if !service.SkipServers[reporter.ID] { + return false + } + default: + return false + } + + return service.UserID == reporter.UserID || userIsAdmin(service.UserID) +} + // worker 服务监控的实际工作流程 func (ss *ServiceSentinel) worker() { // 从服务状态汇报管道获取汇报的服务数据 for r := range ss.serviceReportChannel { - css, _ := ss.Get(r.Data.GetId()) - if css == nil || css.ID == 0 { + cs, _ := ss.Get(r.Data.GetId()) + reporter, _ := ServerShared.Get(r.Reporter) + // 入站结果必须匹配出站任务派发边界,避免 agent 伪造其他服务 ID 写入监控状态。 + if !canReportServiceResult(cs, reporter, r.Data.GetType()) { log.Printf("NEZHA>> Incorrect service monitor report %+v", r) continue } - css = nil mh := r.Data if mh.Type == model.TaskTypeTCPPing || mh.Type == model.TaskTypeICMPPing { @@ -607,7 +628,7 @@ func (ss *ServiceSentinel) worker() { ss.serviceCurrentStatusData[mh.GetId()].result = ss.serviceCurrentStatusData[mh.GetId()].result[:0] } - cs, _ := ss.Get(mh.GetId()) + cs, _ = ss.Get(mh.GetId()) m := ServerShared.GetList() // 延迟报警 if mh.Delay > 0 {
Vulnerability mechanics
Root cause
"Missing authorization checks in the service-monitor result worker allow an authenticated agent to submit forged monitoring results for any existing service ID without verifying coverage, ownership, or task assignment."
Attack vector
An attacker with a valid agent secret and one registered agent can forge `TaskResult` messages for any existing service ID. The gRPC stream authenticates the agent and binds the reporter server ID from the stream metadata, but the service-monitor worker at `service/singleton/servicesentinel.go:475-483` accepts the result solely based on whether the service ID exists [ref_id=1][ref_id=2]. The attacker controls the `TaskResult.id`, `type`, `delay`, `data`, and `successful` fields via `proto/nezha.proto:60-65`. No dashboard administrator privileges are required.
Affected code
The vulnerability is in the service-monitor result worker at `service/singleton/servicesentinel.go:475-483`, which checks only that the reported service ID exists but does not verify that the reporter server is covered by that service, belongs to the service owner, or was actually assigned the monitoring task. The inbound gRPC path at `service/rpc/nezha.go:85-90` dispatches `TaskResult` messages to this worker using the authenticated server ID as reporter without any authorization checks.
What the fix does
The patch introduces `CanReportServiceResult` and `CanReportCronResult` functions that gate inbound results on the same coverage and ownership checks already enforced during outbound dispatch. For service-monitor results, the worker now verifies that the reporter server is covered by the service's `Cover`/`SkipServers` configuration and that the reporter belongs to the service owner (or the service owner is an admin). For cron results, a new `pendingAlertTriggerTasks` map tracks which server/cron pairs the dashboard actually dispatched, and results are accepted only when a matching authorization token is consumed. The patch also adds comprehensive regression tests covering all cover modes and ownership boundaries.
Preconditions
- authAttacker must have a valid Nezha account with an agent secret
- configAttacker must have at least one registered agent/server
- networkAttacker must be able to establish a gRPC RequestTask stream to the dashboard
- inputThe attacker-controlled TaskResult must reference an existing service ID
Generated on Jun 1, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
3News mentions
0No linked articles in our index yet.