Graph Neural Networks: GraphSAGE in Practice with TensorFlow 1.x, a Prediction Service for New Nodes
Summary: GraphSAGE, tensorflow, tensorflow_model_server, tensorboard, saved_model_cli
Goal of the GraphSAGE walkthrough
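GraphSAGE samples a node's neighbors, so each computation no longer needs the adjacency matrix of the whole graph. A trained model can therefore be applied directly to new nodes. The practical goal here is: for a new center node that never appeared in the training data, predict a task target for that node, such as its class, from the node's own features and its neighbors' features.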
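To make the idea concrete, here is a minimal NumPy sketch of how an unseen node can be embedded from its own features and those of its sampled neighbors. It assumes a GCN-style mean aggregator; the function name `mean_aggregate`, the random `weight` matrix, and the toy shapes are illustrative assumptions and are not taken from the model used in this article.

```python
import numpy as np

def mean_aggregate(center_feature, neighbour_features, weight):
    """One GCN-style aggregation step (illustrative assumption):
    average the center node with its sampled neighbours, then apply
    a linear map followed by ReLU."""
    stacked = np.vstack([center_feature[None, :], neighbour_features])
    pooled = stacked.mean(axis=0)
    return np.maximum(pooled @ weight, 0.0)

rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 3))      # stands in for trained parameters
new_node = rng.normal(size=4)         # features of a node unseen in training
neighbours = rng.normal(size=(5, 4))  # features of 5 sampled neighbours

embedding = mean_aggregate(new_node, neighbours, weight)
```

Because the weights are shared across nodes and only a fixed-size neighbor sample is needed, nothing in this computation depends on the node having been present during training.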
GraphSAGE data pipeline analysis
Data preparation and preprocessing
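The data is the cora dataset, split into three parts that are stored independently: training, validation, and test. The validation set drives early stopping: training stops once the validation loss no longer reaches a new low, a checkpoint is saved every time the validation loss decreases, only one checkpoint is kept, and the latest checkpoint is the one used in the end. The test set is used to evaluate the model. Note that although the node indices of the three sets do not overlap, sampling for a training node can reach neighbors that belong to the validation set. In that case only the neighbor's feature vector is used; its y value stays unknown. The implementation is as follows:
import sys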
import os
import pickle
import shutil
import random
import time
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
DATA_PATH = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
ROOT_PATH = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants
from model import GraphSageGCN
from config import get_string
from preprocessing import sample, get_nodes_features
(train_nodes, train_y) = pickle.load(open(os.path.join(DATA_PATH, get_string("train_data_path")), "rb"))
(val_nodes, val_y) = pickle.load(open(os.path.join(DATA_PATH, get_string("val_data_path")), "rb"))
(test_nodes, test_y) = pickle.load(open(os.path.join(DATA_PATH, get_string("test_data_path")), "rb"))
neighbour_list = pickle.load(open(os.path.join(DATA_PATH, get_string("neighbour_data_path")), "rb"))
nodes_features = pickle.load(open(os.path.join(DATA_PATH, get_string("feature_data_path")), "rb"))
features_size = nodes_features.shape[1]
def get_batch(epoches, batch_size, nodes, labels, neighbours, features, layer1_supports=10, layer2_supports=5):
    for epoch in range(epoches):
        # Shuffle nodes and labels together at the start of every epoch
        tmp = list(zip(nodes, labels))
        random.shuffle(tmp)
        nodes, labels = zip(*tmp)
        for batch in range(0, len(nodes), batch_size):
            if batch + batch_size < len(nodes):
                batch_nodes = nodes[batch: (batch + batch_size)]
                batch_labels = labels[batch: (batch + batch_size)]
            else:
                # Last, possibly shorter, batch
                batch_nodes = nodes[batch: len(nodes)]
                batch_labels = labels[batch: len(nodes)]
            # Sample the 1-hop and 2-hop neighbors of the batch nodes
            layer_neighbours = sample(batch_nodes, neighbours, num_supports=[layer2_supports, layer1_supports])
            # Feature vectors of the center nodes and of their sampled neighbors
            input_x = get_nodes_features(list(batch_nodes), features)
            input_x_1 = get_nodes_features(sum(layer_neighbours[2], []), features)
            input_x_2 = get_nodes_features(sum(layer_neighbours[1], []), features)
            yield [epoch, input_x, input_x_1, input_x_2, batch_labels]
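The `preprocessing` module is not shown in this article, so the following is only a guess at what `sample` and `get_nodes_features` might look like, written to be consistent with how `get_batch` calls them: key 2 of the result holds the 1-hop neighbors, key 1 holds the 2-hop neighbors, and each is a list of lists that `sum(..., [])` flattens. Sampling with replacement, the dict-of-lists adjacency format, and dense NumPy features are all assumptions.

```python
import random
import numpy as np

def sample(nodes, neighbour_list, num_supports):
    """Hypothetical stand-in for preprocessing.sample.

    With two entries in num_supports, key 2 of the result holds the
    1-hop neighbours of `nodes` and key 1 holds the 2-hop neighbours,
    one inner list per parent node.
    """
    layer_neighbours = {}
    frontier = list(nodes)
    for k in range(len(num_supports), 0, -1):
        size = num_supports[k - 1]
        sampled = []
        for node in frontier:
            candidates = neighbour_list[node]
            # Sample with replacement so every node gets exactly `size` neighbours
            sampled.append([random.choice(candidates) for _ in range(size)])
        layer_neighbours[k] = sampled
        frontier = [n for group in sampled for n in group]
    return layer_neighbours

def get_nodes_features(nodes, features):
    """Hypothetical stand-in: look up one feature row per node index."""
    return features[np.asarray(nodes)]

# Toy graph: adjacency as a dict of neighbour lists, 3-dimensional features
toy_neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
toy_features = np.arange(12, dtype=np.float32).reshape(4, 3)

layers = sample([0, 3], toy_neighbours, num_supports=[5, 10])
one_hop = get_nodes_features(sum(layers[2], []), toy_features)
two_hop = get_nodes_features(sum(layers[1], []), toy_features)
```

With a batch of 2 nodes and supports of 10 and 5, this yields 20 one-hop rows and 100 two-hop rows, which is the fixed-size layout a batched aggregator needs. A node with no neighbors would make `random.choice` fail, so a real implementation has to handle that case separately.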
def train_main():
    model = GraphSageGCN(num_class=7, feature_size=1433,
                         num_supports_1=int(get_string("layer2_supports")),
                         num_supports_2=int(get_string("layer1_supports")),
                         decay_learning_rate=float(get_string("decay_learning_rate")),
                         learning_rate=float(get_string("learning_rate")),
                         weight_decay=float(get_string("weight_decay")))
    # Keep only the most recent checkpoint
    saver = tf.train.Saver(tf.global_variables(), max_to_keep=1)
    with tf.Session() as sess:
        init_op = tf.group(tf.global_variables_initializer())
        sess.run(init_op)
        # Remove old TensorBoard logs before writing the graph
        shutil.rmtree(os.path.join(ROOT_PATH, "./summary"), ignore_errors=True)
        writer = tf.summary.FileWriter(os.path.join(ROOT_PATH, "./summary"), sess.graph)
        batches = get_batch(int(get_string("epoches")), int(get_string("batch_size")), train_nodes, train_y,
                            neighbour_list, nodes_features, layer1_supports=int(get_string("layer1_supports")),
                            layer2_supports=int(get_string("layer2_supports")))
        # Validation data
        layer_neighbours = sample(val_nodes, neighbour_list,
                                  num_supports=[int(get_string("layer2_supports")), int(get_string("layer1_supports"))])
        val_input_x = get_nodes_features(val_nodes, nodes_features)
        val_input_x_1 = get_nodes_features(sum(layer_neighbours[2], []), nodes_features)
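The early-stopping and checkpointing rule from the data preparation section can be sketched without TensorFlow. The class name `EarlyStopper`, the `patience` parameter, and the loss values below are hypothetical; the article does not state how many evaluations without a new low are tolerated.

```python
class EarlyStopper:
    """Tracks validation loss; signals when to checkpoint and when to stop."""

    def __init__(self, patience):
        # Hypothetical parameter: evaluations allowed without a new loss low
        self.patience = patience
        self.best_loss = float("inf")
        self.stale = 0

    def update(self, val_loss):
        """Return (should_save, should_stop) for the latest validation loss."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.stale = 0
            return True, False
        self.stale += 1
        return False, self.stale >= self.patience

stopper = EarlyStopper(patience=3)
history = []
for loss in [0.9, 0.7, 0.8, 0.75, 0.72, 0.5]:
    save, stop = stopper.update(loss)
    history.append((save, stop))
    if stop:
        break
```

In a TensorFlow 1.x training loop, a `should_save` signal would typically trigger `saver.save(sess, ...)`. With `max_to_keep=1` on the `Saver`, only the newest checkpoint stays on disk.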