标签传播

修改密码

提交

Change Email

Submit

修改昵称

当前昵称：

提交

基础信息

用户邮箱：

用户昵称：
手机号：
公司名称：
公司邮箱:

修改密码

申请证书

当前未申请证书.

申请证书

Certificate	Issued at	Valid until	Serial No.	File

Serial No.	Valid until	File

Not having one? Apply now! >>>

ProductName	CreateTime	ID	Price	File

ProductName	CreateTime	ID	Price	File

No Invoice

创建嬴图账号

我已阅读并同意隐私政策。

请勾选表示您已阅读并同意

已有嬴图账号？去登录！

忘记密码

重置密码

返回登录

标签传播

HDC

概述

标签传播算法（LPA, Label Propagation Algorithm）是一种基于标签传播的社区识别算法。每个节点初始化标签后，在算法的每次迭代中，每个节点都会将其标签更新为在其邻居中最普遍的标签。这个迭代过程会使连接密集的节点拥有同一标签，有相同标签的节点继而形成一个社区。

LPA不要求预先设定要划分的社区数量，也不将任何用于衡量社区划分优劣的标准作为目标，相反，它仅利用网络结构来引导其进程，这使得LPA能够有效地在大型和复杂的网络中运行。

算法的相关资料如下：

U.N. Raghavan, R. Albert, S. Kumara, Near linear time algorithm to detect community structures in large-scale networks (2007)

基本概念

在最简单的场景中，每次迭代传播时，每个节点都会将其标签更新为其最多邻居拥有的标签。

如下例所示，蓝色节点的标签将从d更新为c。

考虑节点和边权重时，标签权重则等于相应节点和边权重的乘积之和，每个节点将其标签更新为权重最大的那一个。

以下的示例中包含节点和边的权重，蓝色节点的标签将从d更新为a。

多标签传播

在多标签传播中，每个节点在传播过程中会保留多个标签。在这种情况下，节点的每个标签会关联一个与其权重成正比的标签概率，而每个节点的标签概率总和保持为1。

在下面的示例中，每个节点保留2个标签，其概率写在标签旁，蓝色节点的标签将从d, c更新为a, c，标签概率为P_a = 6.3/(6.3+1.85) = 0.77和P_c = 1.85/(6.3+1.85) = 0.23。

特殊说明

LPA忽略边的方向，按照无向边进行计算。
具有自环边的节点能将自身当前的标签传播给自身，并且每条自环边计算两次。
LPA在更新节点标签时遵循同步更新原则，即所有节点会根据其邻居的标签同时更新其标签。但在某些情况下可能会发生标签振荡，这多见于二分图中。为了解决这个问题，LPA包含一个中断机制，可以检测并防止标签过度振荡。
受节点顺序、同权重标签的随机选取以及并行计算等因素的影响，标签传播算法的社区划分结果可能会不同。

示例图

在一个空图中运行以下语句定义图结构并插入数据：

ALTER GRAPH CURRENT_GRAPH ADD NODE {
  user ({interest string, level int32})
};
ALTER GRAPH CURRENT_GRAPH ADD EDGE {
  connect ()-[{strength int32}]->()
};
INSERT (A:user {_id:"A",interest:"flute",level:2}),
       (B:user {_id:"B",interest:"football",level:4}),
       (C:user {_id:"C",interest:"piano",level:4}),
       (D:user {_id:"D",interest:"violin",level:2}),
       (E:user {_id:"E",interest:"piano",level:4}),
       (F:user {_id:"F",interest:"movie",level:1}),
       (G:user {_id:"G",interest:"piano",level:4}),
       (H:user {_id:"H",interest:"tennis",level:2}),
       (I:user {_id:"I",interest:"violin",level:3}),
       (J:user {_id:"J",interest:"badminton",level:5}),
       (K:user {_id:"K",interest:"swimming",level:4}),
       (L:user {_id:"L",interest:"cello",level:1}),
       (M:user {_id:"M",interest:"saxophone",level:2}),
       (N:user {_id:"N",interest:"novel",level:3}),
       (O:user {_id:"O",interest:"swimming",level:3}),
       (A)-[:connect {strength:3}]->(B),
       (A)-[:connect {strength:5}]->(C),
       (A)-[:connect {strength:8}]->(F),
       (A)-[:connect {strength:6}]->(K),
       (B)-[:connect {strength:2}]->(C),
       (C)-[:connect {strength:9}]->(D),
       (D)-[:connect {strength:5}]->(A),
       (D)-[:connect {strength:6}]->(E),
       (E)-[:connect {strength:5}]->(A),
       (F)-[:connect {strength:9}]->(G),
       (F)-[:connect {strength:4}]->(J),
       (G)-[:connect {strength:10}]->(H),
       (H)-[:connect {strength:3}]->(F),
       (I)-[:connect {strength:2}]->(F),
       (I)-[:connect {strength:4}]->(H),
       (J)-[:connect {strength:1}]->(I),
       (K)-[:connect {strength:1}]->(F),
       (K)-[:connect {strength:10}]->(N),
       (L)-[:connect {strength:1}]->(M),
       (L)-[:connect {strength:4}]->(N),
       (M)-[:connect {strength:10}]->(K),
       (M)-[:connect {strength:8}]->(N),
       (N)-[:connect {strength:4}]->(M),
       (O)-[:connect {strength:1}]->(N);

create().node_schema("user").edge_schema("connect");
create().node_property(@user,"interest",string).node_property(@user,"level",int32).edge_property(@connect,"strength",int32);
insert().into(@user).nodes([{_id:"A",interest:"flute",level:2}, {_id:"B",interest:"football",level:4}, {_id:"C",interest:"piano",level:4}, {_id:"D",interest:"violin",level:2}, {_id:"E",interest:"piano",level:4}, {_id:"F",interest:"movie",level:1}, {_id:"G",interest:"piano",level:4}, {_id:"H",interest:"tennis",level:2}, {_id:"I",interest:"violin",level:3}, {_id:"J",interest:"badminton",level:5}, {_id:"K",interest:"swimming",level:4}, {_id:"L",interest:"cello",level:1}, {_id:"M",interest:"saxophone",level:2}, {_id:"N",interest:"novel",level:3}, {_id:"O",interest:"swimming",level:3}]);
insert().into(@connect).edges([{_from:"A",_to:"B",strength:3}, {_from:"A",_to:"C",strength:5}, {_from:"A",_to:"F",strength:8}, {_from:"A",_to:"K",strength:6}, {_from:"B",_to:"C",strength:2}, {_from:"C",_to:"D",strength:9}, {_from:"D",_to:"A",strength:5}, {_from:"D",_to:"E",strength:6}, {_from:"E",_to:"A",strength:5}, {_from:"F",_to:"G",strength:9}, {_from:"F",_to:"J",strength:4}, {_from:"G",_to:"H",strength:10}, {_from:"H",_to:"F",strength:3}, {_from:"I",_to:"H",strength:4}, {_from:"I",_to:"F",strength:2}, {_from:"J",_to:"I",strength:1}, {_from:"K",_to:"F",strength:1}, {_from:"K",_to:"N",strength:10}, {_from:"L",_to:"M",strength:1}, {_from:"L",_to:"N",strength:4}, {_from:"M",_to:"N",strength:8}, {_from:"M",_to:"K",strength:10}, {_from:"N",_to:"M",strength:4}, {_from:"O",_to:"N",strength:1}]);

在HDC图上运行算法

创建HDC图

将当前图集全部加载到HDC服务器hdc-server-1上，并命名为 my_hdc_graph：

CREATE HDC GRAPH my_hdc_graph ON "hdc-server-1" OPTIONS {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static"
}

hdc.graph.create("my_hdc_graph", {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static"
}).to("hdc-server-1")

参数

算法名：lpa

参数名	类型	规范	默认值	可选	描述
`node_label_property`	"`<@schema.?><property>`"	/	/	是	用于初始化节点标签的数值类型或字符串类型点属性；不包含指定属性的点将被忽略。若未设置，则由系统自动生成标签
`node_weight_property`	"`<@schema.?><property>`"	/	/	是	作为点权重的数值类型点属性
`edge_weight_property`	"`<@schema.?><property>`"	/	/	是	作为边权重的数值类型边属性
`loop_num`	Integer	≥1	`5`	是	传播迭代次数
`k`	Integer	≥1	`1`	是	指定计算结束时每个节点保留的最大标签数量，所有标签按概率降序排序
`return_id_uuid`	String	`uuid`, `id`, `both`	`uuid`	是	在结果中使用`_uuid`、`_id`或同时使用两者来表示点

文件回写

CALL algo.lpa.write("my_hdc_graph", {
  return_id_uuid: "id",
  k: 2,
  loop_num: 5,
  edge_weight_property: 'strength'
}, {
  file: {
    filename: "lpa"
  }
})

algo(lpa).params({
  projection: "my_hdc_graph",
  return_id_uuid: "id",
  k: 2,
  loop_num: 5,
  edge_weight_property: 'strength'
}).write({
  file: {
    filename: "lpa"
  }
})

数据库回写

将结果中的label_<N>与对应的probability_<N>值写入指定点属性。对应属性类型分别为string和float。

CALL algo.lpa.write("my_hdc_graph", {
  node_label_property: 'interest',
  k: 2,
  loop_num: 10
}, {
  db: {
    property: "lab"
  }
})

algo(lpa).params({
  projection: "my_hdc_graph",
  node_label_property: 'interest',
  k: 2,
  loop_num: 10
}).write({
  db: {
    property: "lab"
  }
})

各点的标签和标签概率被写入新属性lab_1，probability_1，lab_2和probability_2中。

统计回写

CALL algo.lpa.write("my_hdc_graph", {
  node_label_property: 'interest',
  k: 2,
  loop_num: 10
}, {
  stats: {}
})

algo(lpa).params({
  projection: "my_hdc_graph",
  node_label_property: 'interest',
  k: 2,
  loop_num: 10
}).write({
  stats:{}
})

结果：

label_count
6

完整返回

CALL algo.lpa.run("my_hdc_graph", {
  return_id_uuid: "id",
  node_label_property: "@user.interest",
  k: 2
}) YIELD r
RETURN r

exec{
  algo(lpa).params({
    return_id_uuid: "id",
    node_label_property: "@user.interest",
    k: 2
  }) as r
  return r
} on my_hdc_graph

结果：

_id	label_1	probability_1	label_2	probability_2
I	badminton	0.517124	movie	0.482876
G	movie	0.563411	badminton	0.436589
J	movie	0.605133	badminton	0.394867
D	piano	0.701716	flute	0.298284
N	swimming	0.675096	saxophone	0.324904
F	badminton	0.564691	movie	0.435309
H	movie	0.535167	badminton	0.464833
B	piano	0.646695	flute	0.353305
L	novel	0.510868	swimming	0.489132
A	piano	0.736380	flute	0.263620
O	novel	0.765123	swimming	0.234877
E	piano	0.594943	flute	0.405057
K	novel	0.510868	swimming	0.489132
M	novel	0.515860	swimming	0.484140
C	piano	0.640369	flute	0.359631

流式返回

CALL algo.lpa.stream("my_hdc_graph", {
  return_id_uuid: "id",
  node_label_property: "@user.interest",
  node_weight_property: "@user.level",
  edge_weight_property: "strength",
  loop_num: 10
}) YIELD r
RETURN r.label_1 AS label, count(r) GROUP BY label

exec{
  algo(lpa).params({
    return_id_uuid: "id",
    node_label_property: "@user.interest",
    node_weight_property: "@user.level",
    edge_weight_property: "strength",
    loop_num: 10
  }).stream() as r
  group by r.label_1 as label
  return table(label, count(r)) 
} on my_hdc_graph

结果：

label	count(r)
violin	3
tennis	2
swimming	3
novel	2
piano	5

统计返回

CALL algo.lpa.stats("my_hdc_graph", {
  node_label_property: "interest",
  edge_weight_property: "strength",
  k: 1,
  loop_num: 5
}) YIELD s
RETURN s

exec{
  algo(lpa).params({
    node_label_property: "interest",
    edge_weight_property: "strength",
    k: 1,
    loop_num: 5
  }).stats() as s
  return s
} on my_hdc_graph

结果：

label_count
5

在分布式投影上运行算法

创建分布式投影

将图集全部投影到shard服务器上并命名为myProj：

CREATE PROJECTION myProj OPTIONS {
  nodes: {"*": ["*"]}, 
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true
}

create().projection("myProj", {
  nodes: {"*": ["*"]}, 
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true
})

参数

算法名：lpa

参数名	类型	规范	默认值	可选	描述
`node_label_property`	"`<@schema.?><property>`"	/	/	是	用于初始化节点标签的数值类型或字符串类型点属性；不包含指定属性的点将被忽略。若未设置，则由系统自动生成标签
`node_weight_property`	"`<@schema.?><property>`"	/	/	是	作为点权重的数值类型点属性
`edge_weight_property`	"`<@schema.?><property>`"	/	/	是	作为边权重的数值类型边属性
`loop_num`	Integer	≥1	`5`	是	传播迭代次数

文件回写

CALL algo.lpa.write("myProj", {
  loop_num: 5,
  edge_weight_property: 'strength'
}, {
  file: {
    filename: "lpa"
  }
})

algo(lpa).params({
  projection: "myProj",
  loop_num: 5,
  edge_weight_property: 'strength'
}).write({
  file: {
    filename: "lpa"
  }
})

数据库回写

将结果中的label_<N>与对应的probability_<N>值写入指定点属性。对应属性类型分别为string和float。

CALL algo.lpa.write("myProj", {
  node_label_property: 'interest',
  loop_num: 10
}, {
  db: {
    property: "lab"
  }
})

algo(lpa).params({
  projection: "myProj",
  edge_schema_property: 'score'
}).write({
  db:{ 
    property: 'lpa'
  }
})

ID
产品
状态
核数
Shard 服务最大数量
Shard 服务最大总核数
HDC 服务最大数量
HDC 服务最大总核数
申请天数
审批日期
过期日期
MAC地址
申请理由
审核信息