MISSING_AGGREGATION 错误类
非聚合表达式 <expression>
基于不参与 GROUP BY 子句的列。
将列或表达式添加到 GROUP BY、聚合表达式,或者在不关心返回组中哪些值时使用 <expressionAnyValue>
。
参数
- expression:
SELECT
列表中的非聚合、非分组表达式。 - expressionAnyValue:包装在 any_value() 聚合函数中的
expression
。
说明
在包含 GROUP BY 子句的查询的上下文中,SELECT 列表中的局部列引用必须是:
- 用作聚合函数的参数,或
- 与
GROUP BY
子句中的表达式匹配的表达式的一部分。
换言之:列引用必须是分组键的一部分,或者必须是聚合的一部分。
Azure Databricks 会尽力而为匹配表达式:例如,它将 SELECT c1 + 5 FROM T GROUP BY 5 + c1
识别为匹配的表达式。
但 SELECT c1 FROM T GROUP BY c1 + 5
不是匹配项。
缓解措施
错误的缓解方法取决于具体原因:
是否缺少分组列?
将
expression
或将expression
的相关子表达式添加到GROUP BY
子句。GROUP BY
表达式的列引用部分是否不同于epression
?匹配
SELECT
列表中的表达式或简化GROUP BY
表达式。是否缺少聚合?
使用聚合函数包装列引用。 如果只需要组中的代表性值,则可以使用 any_value(epression)。
示例
-- Sample data
> CREATE OR REPLACE TEMPORARY VIEW tasks(name, firstname, task, cost) AS
VALUES ('Smith' , 'Sam' , 'UNPIVOT', 10),
('Smith' , 'Sam' , 'LATERAL', 5),
('Shuster', 'Sally' , 'DELETE' , 7),
('Shuster', 'Sally' , 'GRANT' , 8);
-- `name` and `firstname` are part of teh group by coumns, but incomplete
> SELECT name, firstname, sum(cost) FROM tasks GROUP BY firstname || ' ' || name;
[MISSING_AGGREGATION] The expression "name" is neither present in the group by, nor is it an aggregate function.
-- Match the GROUP BY expression
> SELECT firstname || ' ' || name, sum(cost) FROM tasks GROUP BY firstname || ' ' || name;
Sam Smith 15
Sally Shuster 15
-- Break up the GROUP BY expression
> SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name;
Sam Smith 15
Sally Shuster 15
-- Missing grouping column
> SELECT name, firstname, sum(cost) FROM tasks GROUP BY name;
[MISSING_AGGREGATION] The expression "firstname" is neither present in the group by, nor is it an aggregate function.
-- Add the grouping column
> SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name;
Sam Smith 15
Sally Shuster 15
-- Missing aggregate
> SELECT firstname, name, sum(cost), task FROM tasks GROUP BY firstname, name;
[MISSING_AGGREGATION] The expression "task" is neither present in the group by, nor is it an aggregate function.
-- Add an aggregate
> SELECT firstname, name, sum(cost), array_agg(task) FROM tasks GROUP BY firstname, name;
Sam Smith 15 ["UNPIVOT","LATERAL"]
Sally Shuster 15 ["DELETE","GRANT"]
-- Return any task
> SELECT firstname, name, sum(cost), any_value(task) FROM tasks GROUP BY firstname, name;
Sam Smith 15 LATERAL
Sally Shuster 15 DELETE