FATE DSL Configuration Files Explained in Detail

This post explains the FATE DSL files in detail.

DSL

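The DSL has two versions; FATE 1.7 and later require v2.

dsl.json defines the workflow, and conf.json supplies the parameters for each step of that workflow.

For how to write dsl.json, see: https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/dsl_conf/dsl_conf_v2_setting_guide.zh.md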
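The listings in this post are annotated with // comments, which Python's standard json module rejects. The following is a minimal sketch, not part of FATE, of stripping such comments before parsing; the function name strip_line_comments and the sample text are made up for illustration.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical annotated snippet in the style used in this post.
annotated = """
{
  "dsl_version": 2, // required for FATE >= 1.7
  "note": "see https://example.org/docs" // the URL inside the string is kept
}
"""

conf = json.loads(strip_line_comments(annotated))
print(conf)
```

The small state machine is there because a plain regex on // would also cut URLs that sit inside string values.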

Here is an example.

//dsl.json
{
    "components": { // top-level key: the components this job uses
        "reader_0": { // component name, user-defined
            // "module" selects the module. Its value must match a component defined under /data/projects/fate/fate/python/federatedml/components. That directory has some predefined components but seems incomplete, so I recommend the official docs: https://fate.readthedocs.io/en/latest/federatedml_component/
            "module": "Reader", // data is fetched from storage through the Reader component; it has only an output, and this module is required
            "output": {
                "data": [
                    "data" // not yet verified: is this name user-defined, or must it be "train"? Needs testing
                ]
            }
        },
        // Every component except the reader has input and output, and each of these can contain data and model.
        // data input: comes from an upstream component; four possible types:
        //   1. data: used by data_transform, the feature engineering modules, and the evaluation module.
        //   2. train_data: used by training components such as HeteroLR and HeteroSBT. If this field is set, the task is parsed as a fit (training) task.
        //   3. validate_data: optional when train_data is set; the data is then used as the validation set.
        //   4. test_data: the data to predict on. If this field is set, a model input must be configured as well.

        // model input: comes from an upstream component; two possible types:
        //   1. model: a model produced by a component of the same type (same "module" value); that component's model output is used as this component's input.
        //   2. isometric_model: a model taken from an upstream component.

        // data output: four possible types:
        //   1. data: the regular data output.
        //   2. train_data, validate_data, test_data: apparently used only for data splitting.

        // model output: one possible type:
        //   1. model

        "data_transform_0": {
            "module": "DataTransform",
            "input": {
                "data": {
                    "data": [
                        "reader_0.data"
                    ]
                }
            },
            "output": {
                "data": [
                    "data"
                ],
                "model": [
                    "model"
                ]
            }
        },
        "scale_0": {
            "module": "FeatureScale",
            "input": {
                "data": {
                    "data": [
                        "data_transform_0.data"
                    ]
                }
            },
            "output": {
                "data": [
                    "data"
                ],
                "model": [
                    "model"
                ]
            }
        },
        "homo_lr_0": {
            "module": "HomoLR",
            "input": {
                "data": {
                    "train_data": [
                        "scale_0.data"
                    ]
                }
            },
            "output": {
                "data": [
                    "data"
                ],
                "model": [
                    "model"
                ]
            }
        },
        "evaluation_0": {
            "module": "Evaluation",
            "input": {
                "data": {
                    "data": [
                        "homo_lr_0.data"
                    ]
                }
            },
            "output": {
                "data": [
                    "data"
                ]
            }
        }
    }
}
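Each input entry in the dsl has the form component.output and must point at an output declared by an upstream component. The sketch below is my own illustration, not FATE's parser: it checks the data inputs of a dsl loaded as a Python dict and derives one valid run order. The helper name data_dependencies is made up, model inputs are deliberately ignored, and the dsl is a shortened version of the example above.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def data_dependencies(dsl: dict) -> dict[str, set[str]]:
    """Map each component to the upstream components its data inputs reference."""
    components = dsl["components"]
    deps: dict[str, set[str]] = {}
    for name, spec in components.items():
        deps[name] = set()
        # input.data maps a type (data, train_data, ...) to a list of references.
        for refs in spec.get("input", {}).get("data", {}).values():
            for ref in refs:
                upstream, _, output_name = ref.partition(".")
                if upstream not in components:
                    raise ValueError(f"{name}: unknown component in '{ref}'")
                declared = components[upstream].get("output", {}).get("data", [])
                if output_name not in declared:
                    raise ValueError(f"{name}: '{ref}' is not a declared data output")
                deps[name].add(upstream)
    return deps


# Shortened version of the dsl.json example (scale_0 and evaluation_0 omitted).
dsl = {
    "components": {
        "reader_0": {
            "module": "Reader",
            "output": {"data": ["data"]},
        },
        "data_transform_0": {
            "module": "DataTransform",
            "input": {"data": {"data": ["reader_0.data"]}},
            "output": {"data": ["data"], "model": ["model"]},
        },
        "homo_lr_0": {
            "module": "HomoLR",
            "input": {"data": {"train_data": ["data_transform_0.data"]}},
            "output": {"data": ["data"], "model": ["model"]},
        },
    }
}

# Upstream components come before the components that consume their output.
order = list(TopologicalSorter(data_dependencies(dsl)).static_order())
print(order)
```

A typo in a reference such as "reader0.data" raises a ValueError here, before the job is ever submitted.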



CONF


//conf.json
{
    // for FATE >= 1.7, dsl_version must be set to 2
    "dsl_version": 2,

    "initiator": { // role and party_id of the party that initiates the job
        "role": "guest",
        "party_id": 10000
    },
    // all participating parties
    "role": {
        "guest": [
            10000
        ],
        "host": [
            10000
        ],
        "arbiter": [
            10000
        ]
    },

    "component_parameters": {
        // common: parameters applied to all parties; role: parameters applied to specific parties
        "common": {
            // for the detailed parameters of each component, see: https://fate.readthedocs.io/en/latest/federatedml_component/
            "data_transform_0": {
                "with_label": true,
                "output_format": "dense"
            },
            "homo_lr_0": {
                "penalty": "L2",
                "tol": 1e-05,
                "alpha": 0.01,
                "optimizer": "sgd",
                "batch_size": -1,
                "learning_rate": 0.15,
                "init_param": {
                    "init_method": "zeros"
                },
                "max_iter": 30,
                "early_stop": "diff",
                "encrypt_param": {
                    "method": null
                },
                "cv_param": {
                    "n_splits": 4,
                    "shuffle": true,
                    "random_seed": 33,
                    "need_cv": false
                },
                "decay": 1,
                "decay_sqrt": true
            },
            "evaluation_0": {
                "eval_type": "binary"
            }
        },

        "role": {
            "host": {
                "0": { // role.host.0: parameters applied to the host party at index 0
                    "reader_0": {
                        "table": {
                            "name": "homo_default_credit_host",
                            "namespace": "homo_default_credit_host"
                        }
                    },
                    "evaluation_0": {
                        "need_run": false
                    }
                }
            },
            "guest": {
                "0": {
                    "reader_0": {
                        "table": {
                            "name": "homo_default_credit_guest",
                            "namespace": "homo_default_credit_guest"
                        }
                    }
                }
            }
        }
    }
}
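To see what a given party ends up with, the common block and that party's role block can be combined. The sketch below only illustrates the idea and is not FATE's actual merge logic: it assumes role-specific values override common ones, merges one level deep per component, and handles only plain index keys such as "0". The function name effective_parameters is made up, and the conf is a shortened version of the example above.

```python
import copy


def effective_parameters(conf: dict, role: str, index: int) -> dict:
    """Combine common parameters with those of one party (assumed semantics)."""
    params = conf["component_parameters"]
    merged = copy.deepcopy(params.get("common", {}))
    specific = params.get("role", {}).get(role, {}).get(str(index), {})
    for component, values in specific.items():
        # Assumption: a role-specific key replaces the common key of the same name.
        merged.setdefault(component, {}).update(copy.deepcopy(values))
    return merged


# Shortened version of the conf.json example.
conf = {
    "component_parameters": {
        "common": {
            "evaluation_0": {"eval_type": "binary"},
        },
        "role": {
            "host": {
                "0": {
                    "reader_0": {"table": {"name": "homo_default_credit_host",
                                           "namespace": "homo_default_credit_host"}},
                    "evaluation_0": {"need_run": False},
                }
            },
            "guest": {
                "0": {
                    "reader_0": {"table": {"name": "homo_default_credit_guest",
                                           "namespace": "homo_default_credit_guest"}},
                }
            },
        },
    }
}

host_params = effective_parameters(conf, "host", 0)
guest_params = effective_parameters(conf, "guest", 0)
print(host_params["evaluation_0"])
print(guest_params["evaluation_0"])
```

Under these assumptions the host sees evaluation_0 with both eval_type and need_run set, while the guest sees only eval_type, which matches the intent of the example: evaluation runs on the guest side only.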

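Notes:

  1. The conf above does not use the provider setting, which lets a job load components from multiple providers and in multiple versions.
  2. The conf above does not cover system runtime parameters; see: https://federatedai.github.io/FATE-Flow/latest/zh/fate_flow_job_scheduling/#43
  3. In dsl v2, the predict dsl is not generated automatically after training; users must deploy the required components through the flow client: https://github.com/FederatedAI/FATE-Flow/blob/main/doc/cli/model.md#deploy
  4. Examples of a train dsl and a predict dsl (train first, then predict):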
// train dsl
"components": {
    "reader_0": {
        "module": "Reader",
        "output": {
            "data": [
                "data"
            ]
        }
    },
    "data_transform_0": {
        "module": "DataTransform",
        "input": {
            "data": {
                "data": [
                    "reader_0.data"
                ]
            }
        },
        "output": {
            "data": [
                "data"
            ],
            "model": [
                "model"
            ]
        }
    },
    "intersection_0": {
        "module": "Intersection",
        "input": {
            "data": {
                "data": [
                    "data_transform_0.data"
                ]
            }
        },
        "output": {
            "data": [
                "data"
            ]
        }
    },
    "hetero_nn_0": {
        "module": "HeteroNN",
        "input": {
            "data": {
                "train_data": [
                    "intersection_0.data"
                ]
            }
        },
        "output": {
            "data": [
                "data"
            ],
            "model": [
                "model"
            ]
        }
    }
}

// predict dsl
"components": {
    "reader_0": {
        "module": "Reader",
        "output": {
            "data": [
                "data"
            ]
        }
    },
    "data_transform_0": {
        "module": "DataTransform",
        "input": {
            "data": {
                "data": [
                    "reader_0.data"
                ]
            }
        },
        "output": {
            "data": [
                "data"
            ],
            "model": [
                "model"
            ]
        }
    },
    "intersection_0": {
        "module": "Intersection",
        "input": {
            "data": {
                "data": [
                    "data_transform_0.data"
                ]
            }
        },
        "output": {
            "data": [
                "data"
            ]
        }
    },
    "hetero_nn_0": {
        "module": "HeteroNN",
        "input": {
            "data": {
                "train_data": [
                    "intersection_0.data"
                ]
            }
        },
        "output": {
            "data": [
                "data"
            ],
            "model": [
                "model"
            ]
        }
    },
    "evaluation_0": {
        "module": "Evaluation",
        "input": {
            "data": {
                "data": [
                    "hetero_nn_0.data"
                ]
            }
        },
        "output": {
            "data": [
                "data"
            ]
        }
    }
}
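The two fragments above differ in only one component: the second one adds evaluation_0, fed by hetero_nn_0.data. As a rough sketch of that difference only (it does not replace deploying through the flow client), the helper below appends an Evaluation component to a components dict. The name with_evaluation is made up, and the train dict is cut down to a single component for brevity.

```python
import copy


def with_evaluation(components: dict, source: str, name: str = "evaluation_0") -> dict:
    """Return a copy of `components` with an Evaluation step reading `source`.data."""
    if source not in components:
        raise KeyError(f"unknown component: {source}")
    extended = copy.deepcopy(components)
    extended[name] = {
        "module": "Evaluation",
        "input": {"data": {"data": [f"{source}.data"]}},
        "output": {"data": ["data"]},
    }
    return extended


# Cut-down stand-in for the first fragment; only the last component is kept.
train_components = {
    "hetero_nn_0": {
        "module": "HeteroNN",
        "input": {"data": {"train_data": ["intersection_0.data"]}},
        "output": {"data": ["data"], "model": ["model"]},
    }
}

predict_components = with_evaluation(train_components, "hetero_nn_0")
print(list(predict_components))
```

The input dict is copied rather than modified, so the train definition stays usable as it was.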

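Basic workflow

  1. After a job is submitted, its dsl and conf are stored under: /data/projects/fate/fateflow/jobs
  2. The dsl and conf are parsed to generate the job configuration. The common configuration is distributed to every party, and each party's own configuration is generated and stored under: /data/projects/fate/fateflow/jobs/[job_id]/[role]/[party_id]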
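The directory pattern in step 2 can be written out as a small helper, for example to locate a party's generated configuration when debugging. This is only a sketch of the path layout stated above; the function name party_config_dir and the job id are made up.

```python
from pathlib import PurePosixPath

# Job root taken from step 1 above.
JOBS_ROOT = PurePosixPath("/data/projects/fate/fateflow/jobs")


def party_config_dir(job_id: str, role: str, party_id: int) -> PurePosixPath:
    """Build [jobs root]/[job_id]/[role]/[party_id]."""
    return JOBS_ROOT / job_id / role / str(party_id)


# "example_job_id" is a placeholder, not a real FATE job id.
print(party_config_dir("example_job_id", "guest", 10000))
```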

Author: Met Guo
Link: https://guoyujian.github.io/2022/11/04/FATE-DSL%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6%E8%AF%A6%E7%BB%86%E8%A7%A3%E9%87%8A/
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit Gmet's Blog when reposting.