
本文介绍一种通用、可扩展的递归方法,将具有深层嵌套结构(如按地域层级展开)的字典列表扁平化为单一层级的字典列表,保留关键字段(person、city、address、facebooklink),并自动提取每层的业务数据。
在处理地理层级、组织架构或树状分类等嵌套 jsON 数据时,常遇到类似如下结构:顶层是国家,其下以键名(如 “united states”)存储子列表,每个子项又包含同构字段及下一级嵌套键(如 “ohio” → “clevland” → “Street A”)。目标不是简单展开数组,而是逐层提取有效业务对象,忽略作为容器的动态键名,仅保留含 person、city、address、facebooklink 等语义字段的字典。
以下是一个健壮、可读性强的递归实现:
def flatten_objects(data): """ 递归扁平化嵌套字典列表。 假设每个有效节点都包含 person/city/address/facebooklink 字段; 动态键(如 "united states", "ohio")对应子列表,需递归处理。 """ result = [] # 支持输入为单个 dict 或 list of dict if isinstance(data, dict): data = [data] for item in data: # 提取当前层级的业务字段(非嵌套值) base_fields = {} nested_lists = {} for key, value in item.items(): # 若 value 是 list 且所有元素均为 dict,则视为嵌套子结构 if isinstance(value, list) and value and all(isinstance(v, dict) for v in value): nested_lists[key] = value else: base_fields[key] = value # 当前层级有有效字段 → 保存 if base_fields: result.append(base_fields) # 递归处理每个嵌套列表 for sublist in nested_lists.values(): result.extend(flatten_objects(sublist)) return result
✅ 使用示例:
nested_data = [ { "person": "abc", "city": "united states", "facebooklink": "link", "address": "united states", "united states": [ { "person": "cdf", "city": "ohio", "facebooklink": "link", "address": "united states/ohio", "ohio": [ { "person": "efg", "city": "clevland", "facebooklink": "link", "address": "united states/ohio/clevland", "clevland": [ { "person": "jkl", "city": "Street A", "facebooklink": "link", "address": "united states/ohio/clevland/Street A", "Street A": [ { "person": "jkl", "city": "House 1", "facebooklink": "link", "address": "united states/ohio/clevland/Street A/House 1" } ] } ] }, { "person": "ghi", "city": "columbus", "facebooklink": "link", "address": "united states/ohio/columbus" } ] }, { "person": "abc", "city": "washington", "facebooklink": "link", "address": "united states/washington" } ] } ] flattened = flatten_objects(nested_data) for obj in flattened: print(obj)
⚠️ 注意事项:
立即学习“Python免费学习笔记(深入)”;
- 该函数不依赖外部库(如 flatten_json),避免因键名动态性导致的路径解析失败;
- 判断嵌套的标准是:value 为非空 list,且所有元素均为 dict —— 这能准确区分数据容器与普通字段(如 “facebooklink”: “link”);
- 若原始数据中存在同名字段(如某层 “address” 是字符串,另一层是对象),需提前清洗,本函数默认按字符串/基础类型处理;
- 时间复杂度为 O(N),其中 N 是所有嵌套字典节点总数;空间复杂度为 O(D),D 为最大嵌套深度(递归栈开销)。
? 进阶建议:如需保留层级路径信息(例如增加 “level”: 2, “parent”: “ohio” 字段),可在递归调用时传入上下文参数;若需支持异构结构(混合 list/dict/str),可进一步增强类型判断逻辑。但对本文所示的典型地域树结构,上述实现已简洁、高效且易于维护。