Federated Machine Learning
FederatedML implements many common machine learning algorithms for federated learning. All modules are developed in a decoupled, modular fashion to improve scalability. Specifically, we provide:
- Federated Statistics: PSI, Union, Pearson Correlation, etc.
- Federated Information Retrieval: PIR (SIR) based on OT
- Federated Feature Engineering: Feature Sampling, Feature Binning, Feature Selection, etc.
- Federated Machine Learning Algorithms: LR, GBDT, DNN, TransferLearning, and UnsupervisedLearning, supporting both heterogeneous and homogeneous settings.
- Model Evaluation: Binary | Multiclass | Regression | Clustering Evaluation, Local vs Federated Comparison.
- Secure Protocol: Provides multiple security protocols for secure multi-party computing and interaction between participants.
Algorithm List
Algorithm | Module Name | Description | Data Input | Data Output | Model Input | Model Output |
---|---|---|---|---|---|---|
DataIO | DataIO | This component transforms user-uploaded data into Instance objects (deprecated in FATE v1.7; use DataTransform instead). | Table, values are raw data. | Transformed Table, values are data instance defined here | DataIO Model | |
DataTransform | DataTransform | This component transforms user-uploaded data into Instance object. | Table, values are raw data. | Transformed Table, values are data instance defined here | DataTransform Model | |
Intersect | Intersection | Compute the intersection of multiple parties' data sets without leaking information about the difference set. Mainly used in hetero scenarios. | Table. | Table with only common instance keys. | Intersect Model | |
Federated Sampling | FederatedSample | Sample data so that its distribution becomes balanced in each party. This module supports standalone and federated versions. | Table | Table of sampled data; both random and stratified sampling methods are supported. | ||
Feature Scale | FeatureScale | Module for feature scaling and standardization. | Table, values are instances. | Transformed Table. | Transform factors like min/max, mean/std. | |
Hetero Feature Binning | HeteroFeatureBinning | Bins input data, calculates each column's IV and WOE, and transforms data according to the binning information. | Table, values are instances. | Transformed Table. | iv/woe, split points, event count, non-event count etc. of each column. | |
Homo Feature Binning | HomoFeatureBinning | Calculate quantile binning through multiple parties | Table | Transformed Table | Split points of each column | |
OneHot Encoder | OneHotEncoder | Transfer a column into one-hot format. | Table, values are instances. | Transformed Table with new header. | Feature-name mapping between original header and new header. | |
Hetero Feature Selection | HeteroFeatureSelection | Provides 5 types of filters. Each filter selects columns according to user config. | Table | Transformed Table with new header and filtered data instance. | If iv filters used, hetero_binning model is needed. | Whether each column is filtered. |
Union | Union | Combine multiple data tables into one. | Tables. | Table with combined values from input Tables. | ||
Hetero-LR | HeteroLR | Build hetero logistic regression model through multiple parties. | Table, values are instances | Table, values are instances. | Logistic Regression Model, consists of model-meta and model-param. | |
Local Baseline | LocalBaseline | Wrapper that runs sklearn(scikit-learn) Logistic Regression model with local data. | Table, values are instances. | Table, values are instances. | ||
Hetero-LinR | HeteroLinR | Build hetero linear regression model through multiple parties. | Table, values are instances. | Table, values are instances. | Linear Regression Model, consists of model-meta and model-param. | |
Hetero-Poisson | HeteroPoisson | Build hetero poisson regression model through multiple parties. | Table, values are instances. | Table, values are instances. | Poisson Regression Model, consists of model-meta and model-param. | |
Homo-LR | HomoLR | Build homo logistic regression model through multiple parties. | Table, values are instances. | Table, values are instances. | Logistic Regression Model, consists of model-meta and model-param. | |
Homo-NN | HomoNN | Build homo neural network model through multiple parties. | Table, values are instances. | Table, values are instances. | Neural Network Model, consists of model-meta and model-param. | |
Hetero Secure Boosting | HeteroSecureBoost | Build hetero secure boosting model through multiple parties | Table, values are instances. | Table, values are instances. | SecureBoost Model, consists of model-meta and model-param. | |
Hetero Fast Secure Boosting | HeteroFastSecureBoost | Build hetero secure boosting model through multiple parties in layered/mix manners. | Table, values are instances. | Table, values are instances. | FastSecureBoost Model, consists of model-meta and model-param. | |
Evaluation | Evaluation | Output the model evaluation metrics for user. | Table(s), values are instances. | |||
Hetero Pearson | HeteroPearson | Calculate hetero correlation of features from different parties. | Table, values are instances. | |||
Hetero-NN | HeteroNN | Build hetero neural network model. | Table, values are instances. | Table, values are instances. | Hetero Neural Network Model, consists of model-meta and model-param. | |
Homo Secure Boosting | HomoSecureBoost | Build homo secure boosting model through multiple parties | Table, values are instances. | Table, values are instances. | SecureBoost Model, consists of model-meta and model-param. | |
Homo OneHot Encoder | HomoOneHotEncoder | Build homo onehot encoder model through multiple parties. | Table, values are instances. | Transformed Table with new header. | Feature-name mapping between original header and new header. | |
Hetero Data Split | HeteroDataSplit | Split one data table into 3 tables by given ratio or count | Table, values are instances. | 3 Tables, values are instance. | ||
Homo Data Split | HomoDataSplit | Split one data table into 3 tables by given ratio or count | Table, values are instances. | 3 Tables, values are instance. | ||
Column Expand | ColumnExpand | Add arbitrary number of columns with user-provided values. | Table, values are raw data. | Transformed Table with added column(s) and new header. | Column Expand Model | |
Secure Information Retrieval | SecureInformationRetrieval | Securely retrieves information from host through oblivious transfer | Table, values are instance | Table, values are instance | ||
Hetero Federated Transfer Learning | FTL | Build hetero FTL model between 2 parties | Table, values are instance | Hetero FTL Model | ||
Hetero KMeans | HeteroKMeans | Build Hetero KMeans model through multiple parties | Table, values are instance | Table, values are instance; Arbiter outputs 2 Tables | Hetero KMeans Model | |
PSI | PSI | Compute PSI value of features between two tables | Table, values are instance | PSI Results | ||
Data Statistics | DataStatistics | Computes statistics on the data, including mean, maximum, minimum, median, etc. | Table, values are instance | Table | Statistic Result | |
Scorecard | Scorecard | Scale predict score to credit score by given scaling parameters | Table, values are predict score | Table, values are score results | ||
Sample Weight | SampleWeight | Assign weight to instances according to user-specified parameters | Table, values are instance | Table, values are weighted instance | SampleWeight Model | |
Feldman Verifiable Sum | FeldmanVerifiableSum | This component sums multiple private values without exposing the data | Table, values to sum | Table, values are sum results | ||
Feature Imputation | FeatureImputation | This component imputes missing features using arbitrary methods/values | Table, values are Instances | Table, values with missing features filled | FeatureImputation Model | |
Label Transform | LabelTransform | Replaces label values of input data instances and predict results | Table, values are Instances or prediction results | Table, values with transformed label values | LabelTransform Model | |
Hetero SSHE Logistic Regression | HeteroSSHELR | Build hetero logistic regression model without arbiter | Table, values are Instances | Table, values are Instances | SSHE LR Model | |
Hetero SSHE Linear Regression | HeteroSSHELinR | Build hetero linear regression model without arbiter | Table, values are Instances | Table, values are Instances | SSHE LinR Model |
Secure Protocol
- Encrypt
- Hash
- Diffie-Hellman Key Exchange
- SecretShare MPC Protocol (SPDZ)
- Oblivious Transfer
- Feldman Verifiable Secret Sharing
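As a rough illustration of the idea behind the Diffie-Hellman entry in the list above, the toy sketch below shows two parties deriving the same shared value from exchanged public values. It is not FATE's implementation: the modulus `P`, the base `G`, and the function names are assumptions chosen for readability, and the parameters are far too small for real use.

```python
import secrets

# Toy group parameters for illustration only (assumed, not taken from FATE).
# P is the Mersenne prime 2**127 - 1; real deployments use standardized
# groups of 2048 bits or more.
P = 2**127 - 1
G = 3


def keypair():
    """Return (private, public) where public = G**private mod P."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private, peer_public):
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, own_private, P)


guest_priv, guest_pub = keypair()
host_priv, host_pub = keypair()

# Only the public values cross the network; both sides reach the same key.
guest_key = shared_secret(guest_priv, host_pub)
host_key = shared_secret(host_priv, guest_pub)
print(guest_key == host_key)
```

Both sides compute `G**(a*b) mod P`, which is why the two results agree even though neither private exponent is ever sent.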
Params
param [special]
__all__ [special]
Modules
base_param
BaseParam
Source code in federatedml/param/base_param.py
class BaseParam(metaclass=_StaticDefaultMeta):
def __init__(self):
pass
def set_name(self, name: str):
self._name = name
return self
def check(self):
raise NotImplementedError("Parameter Object should be checked.")
@classmethod
def _get_or_init_deprecated_params_set(cls):
if not hasattr(cls, _DEPRECATED_PARAMS):
setattr(cls, _DEPRECATED_PARAMS, set())
return getattr(cls, _DEPRECATED_PARAMS)
def _get_or_init_feeded_deprecated_params_set(self, conf=None):
if not hasattr(self, _FEEDED_DEPRECATED_PARAMS):
if conf is None:
setattr(self, _FEEDED_DEPRECATED_PARAMS, set())
else:
setattr(
self,
_FEEDED_DEPRECATED_PARAMS,
set(conf[_FEEDED_DEPRECATED_PARAMS]),
)
return getattr(self, _FEEDED_DEPRECATED_PARAMS)
def _get_or_init_user_feeded_params_set(self, conf=None):
if not hasattr(self, _USER_FEEDED_PARAMS):
if conf is None:
setattr(self, _USER_FEEDED_PARAMS, set())
else:
setattr(self, _USER_FEEDED_PARAMS, set(conf[_USER_FEEDED_PARAMS]))
return getattr(self, _USER_FEEDED_PARAMS)
def get_user_feeded(self):
return self._get_or_init_user_feeded_params_set()
def get_feeded_deprecated_params(self):
return self._get_or_init_feeded_deprecated_params_set()
@property
def _deprecated_params_set(self):
return {name: True for name in self.get_feeded_deprecated_params()}
def as_dict(self):
def _recursive_convert_obj_to_dict(obj):
ret_dict = {}
for attr_name in list(obj.__dict__):
# get attr
attr = getattr(obj, attr_name)
if attr and type(attr).__name__ not in dir(builtins):
ret_dict[attr_name] = _recursive_convert_obj_to_dict(attr)
else:
ret_dict[attr_name] = attr
return ret_dict
return _recursive_convert_obj_to_dict(self)
def update(self, conf, allow_redundant=False):
update_from_raw_conf = conf.get(_IS_RAW_CONF, True)
if update_from_raw_conf:
deprecated_params_set = self._get_or_init_deprecated_params_set()
feeded_deprecated_params_set = (
self._get_or_init_feeded_deprecated_params_set()
)
user_feeded_params_set = self._get_or_init_user_feeded_params_set()
setattr(self, _IS_RAW_CONF, False)
else:
feeded_deprecated_params_set = (
self._get_or_init_feeded_deprecated_params_set(conf)
)
user_feeded_params_set = self._get_or_init_user_feeded_params_set(conf)
def _recursive_update_param(param, config, depth, prefix):
if depth > consts.PARAM_MAXDEPTH:
raise ValueError("Param define nesting too deep!!!, can not parse it")
inst_variables = param.__dict__
redundant_attrs = []
for config_key, config_value in config.items():
# redundant attr
if config_key not in inst_variables:
if not update_from_raw_conf and config_key.startswith("_"):
setattr(param, config_key, config_value)
else:
redundant_attrs.append(config_key)
continue
full_config_key = f"{prefix}{config_key}"
if update_from_raw_conf:
# add user feeded params
user_feeded_params_set.add(full_config_key)
# update user feeded deprecated param set
if full_config_key in deprecated_params_set:
feeded_deprecated_params_set.add(full_config_key)
# supported attr
attr = getattr(param, config_key)
if type(attr).__name__ in dir(builtins) or attr is None:
setattr(param, config_key, config_value)
else:
# recursive set obj attr
sub_params = _recursive_update_param(
attr, config_value, depth + 1, prefix=f"{prefix}{config_key}."
)
setattr(param, config_key, sub_params)
if not allow_redundant and redundant_attrs:
raise ValueError(
f"cpn `{getattr(self, '_name', type(self))}` has redundant parameters: `{[redundant_attrs]}`"
)
return param
return _recursive_update_param(param=self, config=conf, depth=0, prefix="")
def extract_not_builtin(self):
def _get_not_builtin_types(obj):
ret_dict = {}
for variable in obj.__dict__:
attr = getattr(obj, variable)
if attr and type(attr).__name__ not in dir(builtins):
ret_dict[variable] = _get_not_builtin_types(attr)
return ret_dict
return _get_not_builtin_types(self)
def validate(self):
self.builtin_types = dir(builtins)
self.func = {
"ge": self._greater_equal_than,
"le": self._less_equal_than,
"in": self._in,
"not_in": self._not_in,
"range": self._range,
}
home_dir = os.path.abspath(os.path.dirname(os.path.realpath(__file__)))
param_validation_path_prefix = home_dir + "/param_validation/"
param_name = type(self).__name__
param_validation_path = "/".join(
[param_validation_path_prefix, param_name + ".json"]
)
validation_json = None
try:
with open(param_validation_path, "r") as fin:
validation_json = json.loads(fin.read())
except BaseException:
return
self._validate_param(self, validation_json)
def _validate_param(self, param_obj, validation_json):
default_section = type(param_obj).__name__
var_list = param_obj.__dict__
for variable in var_list:
attr = getattr(param_obj, variable)
if type(attr).__name__ in self.builtin_types or attr is None:
if variable not in validation_json:
continue
validation_dict = validation_json[default_section][variable]
value = getattr(param_obj, variable)
value_legal = False
for op_type in validation_dict:
if self.func[op_type](value, validation_dict[op_type]):
value_legal = True
break
if not value_legal:
raise ValueError(
"Please check runtime conf, {} = {} does not match user-parameter restriction".format(
variable, value
)
)
elif variable in validation_json:
self._validate_param(attr, validation_json)
@staticmethod
def check_string(param, descr):
if type(param).__name__ not in ["str"]:
raise ValueError(
descr + " {} not supported, should be string type".format(param)
)
@staticmethod
def check_positive_integer(param, descr):
if type(param).__name__ not in ["int", "long"] or param <= 0:
raise ValueError(
descr + " {} not supported, should be positive integer".format(param)
)
@staticmethod
def check_positive_number(param, descr):
if type(param).__name__ not in ["float", "int", "long"] or param <= 0:
raise ValueError(
descr + " {} not supported, should be positive numeric".format(param)
)
@staticmethod
def check_nonnegative_number(param, descr):
if type(param).__name__ not in ["float", "int", "long"] or param < 0:
raise ValueError(
descr
+ " {} not supported, should be non-negative numeric".format(param)
)
@staticmethod
def check_decimal_float(param, descr):
if type(param).__name__ not in ["float", "int"] or param < 0 or param > 1:
raise ValueError(
descr
+ " {} not supported, should be a float number in range [0, 1]".format(
param
)
)
@staticmethod
def check_boolean(param, descr):
if type(param).__name__ != "bool":
raise ValueError(
descr + " {} not supported, should be bool type".format(param)
)
@staticmethod
def check_open_unit_interval(param, descr):
if type(param).__name__ not in ["float"] or param <= 0 or param >= 1:
raise ValueError(
descr + " should be a numeric number between 0 and 1 exclusively"
)
@staticmethod
def check_valid_value(param, descr, valid_values):
if param not in valid_values:
raise ValueError(
descr
+ " {} is not supported, it should be in {}".format(param, valid_values)
)
@staticmethod
def check_defined_type(param, descr, types):
if type(param).__name__ not in types:
raise ValueError(
descr + " {} not supported, should be one of {}".format(param, types)
)
@staticmethod
def check_and_change_lower(param, valid_list, descr=""):
if type(param).__name__ != "str":
raise ValueError(
descr
+ " {} not supported, should be one of {}".format(param, valid_list)
)
lower_param = param.lower()
if lower_param in valid_list:
return lower_param
else:
raise ValueError(
descr
+ " {} not supported, should be one of {}".format(param, valid_list)
)
@staticmethod
def _greater_equal_than(value, limit):
return value >= limit - consts.FLOAT_ZERO
@staticmethod
def _less_equal_than(value, limit):
return value <= limit + consts.FLOAT_ZERO
@staticmethod
def _range(value, ranges):
in_range = False
for left_limit, right_limit in ranges:
if (
left_limit - consts.FLOAT_ZERO
<= value
<= right_limit + consts.FLOAT_ZERO
):
in_range = True
break
return in_range
@staticmethod
def _in(value, right_value_list):
return value in right_value_list
@staticmethod
def _not_in(value, wrong_value_list):
return value not in wrong_value_list
def _warn_deprecated_param(self, param_name, descr):
if self._deprecated_params_set.get(param_name):
LOGGER.warning(
f"{descr} {param_name} is deprecated and ignored in this version."
)
def _warn_to_deprecate_param(self, param_name, descr, new_param):
if self._deprecated_params_set.get(param_name):
LOGGER.warning(
f"{descr} {param_name} will be deprecated in future release; "
f"please use {new_param} instead."
)
return True
return False
__init__(self) [special]
Source code in federatedml/param/base_param.py
def __init__(self):
    pass
set_name(self, name)
Source code in federatedml/param/base_param.py
def set_name(self, name: str):
    self._name = name
    return self
check(self)
Source code in federatedml/param/base_param.py
def check(self):
    raise NotImplementedError("Parameter Object should be checked.")
get_user_feeded(self)
Source code in federatedml/param/base_param.py
def get_user_feeded(self):
    return self._get_or_init_user_feeded_params_set()
get_feeded_deprecated_params(self)
Source code in federatedml/param/base_param.py
def get_feeded_deprecated_params(self):
    return self._get_or_init_feeded_deprecated_params_set()
as_dict(self)
Source code in federatedml/param/base_param.py
def as_dict(self):
    def _recursive_convert_obj_to_dict(obj):
        ret_dict = {}
        for attr_name in list(obj.__dict__):
            # get attr
            attr = getattr(obj, attr_name)
            if attr and type(attr).__name__ not in dir(builtins):
                ret_dict[attr_name] = _recursive_convert_obj_to_dict(attr)
            else:
                ret_dict[attr_name] = attr
        return ret_dict
    return _recursive_convert_obj_to_dict(self)
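To make the recursion in `as_dict` easier to follow, here is a minimal standalone sketch of the same idea: attributes whose type name is not a Python builtin are treated as nested parameter objects and converted recursively. The `Node` class and the `to_dict` function are hypothetical stand-ins for illustration, not part of federatedml.

```python
import builtins


class Node:
    """Hypothetical stand-in for a parameter object; not a FATE class."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def to_dict(obj):
    """Convert an object and its nested non-builtin attributes to a dict."""
    result = {}
    for name in list(obj.__dict__):
        value = getattr(obj, name)
        # Truthy values whose type name is not a builtin are nested objects.
        if value and type(value).__name__ not in dir(builtins):
            result[name] = to_dict(value)
        else:
            result[name] = value
    return result


tree = Node(max_depth=3, use_missing=False)
boost = Node(learning_rate=0.3, tree_param=tree, objective=None)
print(to_dict(boost))
```

Falsy values such as `None`, `0`, or `False` are copied as-is, because the truthiness test short-circuits before the type check.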
update(self, conf, allow_redundant=False)
Source code in federatedml/param/base_param.py
def update(self, conf, allow_redundant=False):
    update_from_raw_conf = conf.get(_IS_RAW_CONF, True)
    if update_from_raw_conf:
        deprecated_params_set = self._get_or_init_deprecated_params_set()
        feeded_deprecated_params_set = (
            self._get_or_init_feeded_deprecated_params_set()
        )
        user_feeded_params_set = self._get_or_init_user_feeded_params_set()
        setattr(self, _IS_RAW_CONF, False)
    else:
        feeded_deprecated_params_set = (
            self._get_or_init_feeded_deprecated_params_set(conf)
        )
        user_feeded_params_set = self._get_or_init_user_feeded_params_set(conf)
    def _recursive_update_param(param, config, depth, prefix):
        if depth > consts.PARAM_MAXDEPTH:
            raise ValueError("Param define nesting too deep!!!, can not parse it")
        inst_variables = param.__dict__
        redundant_attrs = []
        for config_key, config_value in config.items():
            # redundant attr
            if config_key not in inst_variables:
                if not update_from_raw_conf and config_key.startswith("_"):
                    setattr(param, config_key, config_value)
                else:
                    redundant_attrs.append(config_key)
                continue
            full_config_key = f"{prefix}{config_key}"
            if update_from_raw_conf:
                # add user feeded params
                user_feeded_params_set.add(full_config_key)
                # update user feeded deprecated param set
                if full_config_key in deprecated_params_set:
                    feeded_deprecated_params_set.add(full_config_key)
            # supported attr
            attr = getattr(param, config_key)
            if type(attr).__name__ in dir(builtins) or attr is None:
                setattr(param, config_key, config_value)
            else:
                # recursive set obj attr
                sub_params = _recursive_update_param(
                    attr, config_value, depth + 1, prefix=f"{prefix}{config_key}."
                )
                setattr(param, config_key, sub_params)
        if not allow_redundant and redundant_attrs:
            raise ValueError(
                f"cpn `{getattr(self, '_name', type(self))}` has redundant parameters: `{[redundant_attrs]}`"
            )
        return param
    return _recursive_update_param(param=self, config=conf, depth=0, prefix="")
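The core of `update` is a recursive merge of a nested config dict into a tree of parameter objects, with unknown keys rejected unless `allow_redundant` is set. The sketch below isolates that behavior and leaves out the bookkeeping of user-fed and deprecated parameters. `TreeParam`, `BoostParam`, `MAX_DEPTH`, and `update_param` are hypothetical names introduced for this example; `MAX_DEPTH` merely stands in for `consts.PARAM_MAXDEPTH`, whose real value is not shown here.

```python
import builtins

MAX_DEPTH = 5  # assumed stand-in for consts.PARAM_MAXDEPTH


class TreeParam:
    def __init__(self):
        self.max_depth = 3
        self.use_missing = False


class BoostParam:
    def __init__(self):
        self.learning_rate = 0.3
        self.tree_param = TreeParam()


def update_param(param, config, depth=0, allow_redundant=False):
    """Recursively copy values from a nested dict onto a parameter object."""
    if depth > MAX_DEPTH:
        raise ValueError("parameter nesting too deep")
    redundant = []
    for key, value in config.items():
        if key not in param.__dict__:
            redundant.append(key)  # key has no matching attribute
            continue
        current = getattr(param, key)
        if type(current).__name__ in dir(builtins) or current is None:
            setattr(param, key, value)  # plain value: overwrite
        else:
            # nested parameter object: descend with the sub-config
            update_param(current, value, depth + 1, allow_redundant)
    if redundant and not allow_redundant:
        raise ValueError(f"redundant parameters: {redundant}")
    return param


p = update_param(BoostParam(), {"learning_rate": 0.1, "tree_param": {"max_depth": 5}})
print(p.learning_rate, p.tree_param.max_depth, p.tree_param.use_missing)
```

Attributes that are absent from the config keep their defaults, which is why `use_missing` is still `False` after the call.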
extract_not_builtin(self)
¶Source code in federatedml/param/base_param.py
def extract_not_builtin(self):
def _get_not_builtin_types(obj):
ret_dict = {}
for variable in obj.__dict__:
attr = getattr(obj, variable)
if attr and type(attr).__name__ not in dir(builtins):
ret_dict[variable] = _get_not_builtin_types(attr)
return ret_dict
return _get_not_builtin_types(self)
validate(self)
Source code in federatedml/param/base_param.py
def validate(self):
    self.builtin_types = dir(builtins)
    self.func = {
        "ge": self._greater_equal_than,
        "le": self._less_equal_than,
        "in": self._in,
        "not_in": self._not_in,
        "range": self._range,
    }
    home_dir = os.path.abspath(os.path.dirname(os.path.realpath(__file__)))
    param_validation_path_prefix = home_dir + "/param_validation/"
    param_name = type(self).__name__
    param_validation_path = "/".join(
        [param_validation_path_prefix, param_name + ".json"]
    )
    validation_json = None
    try:
        with open(param_validation_path, "r") as fin:
            validation_json = json.loads(fin.read())
    except BaseException:
        return
    self._validate_param(self, validation_json)
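`validate` maps operator names (`ge`, `le`, `in`, `not_in`, `range`) to small predicate functions and applies them to rules loaded from a JSON file. The sketch below shows that dispatch pattern in isolation. The JSON layout, the `DemoParam` section name, the `FLOAT_ZERO` value, and the `is_legal` helper are all assumptions made for illustration; the actual validation files shipped with FATE may look different.

```python
import json

FLOAT_ZERO = 1e-8  # assumed tolerance, standing in for consts.FLOAT_ZERO

# Operator name -> predicate(value, rule_argument)
OPS = {
    "ge": lambda v, limit: v >= limit - FLOAT_ZERO,
    "le": lambda v, limit: v <= limit + FLOAT_ZERO,
    "in": lambda v, allowed: v in allowed,
    "not_in": lambda v, banned: v not in banned,
    "range": lambda v, ranges: any(
        lo - FLOAT_ZERO <= v <= hi + FLOAT_ZERO for lo, hi in ranges
    ),
}

# Hypothetical rule file content, keyed by parameter class name.
RULES = json.loads(
    '{"DemoParam": {"max_depth": {"ge": 1}, "subsample": {"range": [[0.1, 1.0]]}}}'
)


def is_legal(section, name, value):
    """A value is legal if it has no rules or satisfies at least one of them."""
    rules = RULES[section].get(name)
    if rules is None:
        return True
    return any(OPS[op](value, arg) for op, arg in rules.items())


print(is_legal("DemoParam", "max_depth", 3))
print(is_legal("DemoParam", "subsample", 1.5))
```

As in the listing above, a value passes as soon as any one operator accepts it, so multiple operators on the same parameter act as alternatives rather than as a conjunction.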
check_string(param, descr) [staticmethod]
Source code in federatedml/param/base_param.py
@staticmethod
def check_string(param, descr):
    if type(param).__name__ not in ["str"]:
        raise ValueError(
            descr + " {} not supported, should be string type".format(param)
        )
check_positive_integer(param, descr) [staticmethod]
Source code in federatedml/param/base_param.py
@staticmethod
def check_positive_integer(param, descr):
    if type(param).__name__ not in ["int", "long"] or param <= 0:
        raise ValueError(
            descr + " {} not supported, should be positive integer".format(param)
        )
check_positive_number(param, descr) [staticmethod]
Source code in federatedml/param/base_param.py
@staticmethod
def check_positive_number(param, descr):
    if type(param).__name__ not in ["float", "int", "long"] or param <= 0:
        raise ValueError(
            descr + " {} not supported, should be positive numeric".format(param)
        )
check_nonnegative_number(param, descr) [staticmethod]
Source code in federatedml/param/base_param.py
@staticmethod
def check_nonnegative_number(param, descr):
    if type(param).__name__ not in ["float", "int", "long"] or param < 0:
        raise ValueError(
            descr
            + " {} not supported, should be non-negative numeric".format(param)
        )
check_decimal_float(param, descr) [staticmethod]
Source code in federatedml/param/base_param.py
@staticmethod
def check_decimal_float(param, descr):
    if type(param).__name__ not in ["float", "int"] or param < 0 or param > 1:
        raise ValueError(
            descr
            + " {} not supported, should be a float number in range [0, 1]".format(
                param
            )
        )
check_boolean(param, descr) [staticmethod]
Source code in federatedml/param/base_param.py
@staticmethod
def check_boolean(param, descr):
    if type(param).__name__ != "bool":
        raise ValueError(
            descr + " {} not supported, should be bool type".format(param)
        )
check_open_unit_interval(param, descr) [staticmethod]
Source code in federatedml/param/base_param.py
@staticmethod
def check_open_unit_interval(param, descr):
    if type(param).__name__ not in ["float"] or param <= 0 or param >= 1:
        raise ValueError(
            descr + " should be a numeric number between 0 and 1 exclusively"
        )
check_valid_value(param, descr, valid_values) [staticmethod]
Source code in federatedml/param/base_param.py
@staticmethod
def check_valid_value(param, descr, valid_values):
    if param not in valid_values:
        raise ValueError(
            descr
            + " {} is not supported, it should be in {}".format(param, valid_values)
        )
check_defined_type(param, descr, types) [staticmethod]
Source code in federatedml/param/base_param.py
@staticmethod
def check_defined_type(param, descr, types):
    if type(param).__name__ not in types:
        raise ValueError(
            descr + " {} not supported, should be one of {}".format(param, types)
        )
check_and_change_lower(param, valid_list, descr='') [staticmethod]
Source code in federatedml/param/base_param.py
@staticmethod
def check_and_change_lower(param, valid_list, descr=""):
    if type(param).__name__ != "str":
        raise ValueError(
            descr
            + " {} not supported, should be one of {}".format(param, valid_list)
        )
    lower_param = param.lower()
    if lower_param in valid_list:
        return lower_param
    else:
        raise ValueError(
            descr
            + " {} not supported, should be one of {}".format(param, valid_list)
        )
deprecated_param(*names)
Source code in federatedml/param/base_param.py
def deprecated_param(*names):
    def _decorator(cls: "BaseParam"):
        deprecated = cls._get_or_init_deprecated_params_set()
        for name in names:
            deprecated.add(name)
        return cls
    return _decorator
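`deprecated_param` is a class decorator that records parameter names in a class-level set, which can later be compared with what the user actually supplied. The following standalone sketch shows the pattern without depending on `BaseParam`. The attribute name `_deprecated_params`, the `DemoParam` class, and the `used_deprecated` helper are illustrative assumptions, not FATE identifiers.

```python
_DEPRECATED = "_deprecated_params"  # assumed attribute name for this sketch


def deprecated_param(*names):
    """Class decorator that registers deprecated parameter names on the class."""

    def decorator(cls):
        if not hasattr(cls, _DEPRECATED):
            setattr(cls, _DEPRECATED, set())
        getattr(cls, _DEPRECATED).update(names)
        return cls

    return decorator


@deprecated_param("n_iter_no_change", "tol")
class DemoParam:
    def __init__(self, max_depth=3):
        self.max_depth = max_depth


def used_deprecated(cls, user_conf):
    """Return the deprecated names that appear in a user-supplied config."""
    registered = getattr(cls, _DEPRECATED, set())
    return sorted(key for key in user_conf if key in registered)


print(used_deprecated(DemoParam, {"tol": 0.1, "max_depth": 4}))
```

Registering the names at class-definition time keeps the deprecation list next to the class it describes, so a warning can be issued only when a user actually sets one of those names.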
boosting_param
hetero_deprecated_param_list
homo_deprecated_param_list
Classes
ObjectiveParam (BaseParam)
Define objective parameters used in federated ML.
Parameters
objective : {None, 'cross_entropy', 'lse', 'lae', 'log_cosh', 'tweedie', 'fair', 'huber'}
None in the host's config; should be a str in the guest's config. When task_type is classification, only 'cross_entropy' is supported; the other 6 types are supported in regression tasks.
params : None or list
Should be a non-empty list when objective is 'tweedie', 'fair', or 'huber'. The first element should be a float greater than 0.0 when objective is 'fair' or 'huber', and a float in [1.0, 2.0) when objective is 'tweedie'.
Source code in federatedml/param/boosting_param.py
class ObjectiveParam(BaseParam):
    """
    Define objective parameters used in federated ML.
    Parameters
    ----------
    objective : {None, 'cross_entropy', 'lse', 'lae', 'log_cosh', 'tweedie', 'fair', 'huber'}
        None in host's config, should be str in guest's config.
        when task_type is classification, only support 'cross_entropy',
        other 6 types support in regression task
    params : None or list
        should be non empty list when objective is 'tweedie','fair','huber',
        first element of list should be a float-number larger than 0.0 when objective is 'fair', 'huber',
        first element of list should be a float-number in [1.0, 2.0) when objective is 'tweedie'
    """
    def __init__(self, objective='cross_entropy', params=None):
        self.objective = objective
        self.params = params
    def check(self, task_type=None):
        if self.objective is None:
            return True
        descr = "objective param's"
        LOGGER.debug('check objective {}'.format(self.objective))
        if task_type not in [consts.CLASSIFICATION, consts.REGRESSION]:
            self.objective = self.check_and_change_lower(self.objective,
                                                         ["cross_entropy", "lse", "lae", "huber", "fair",
                                                          "log_cosh", "tweedie"],
                                                         descr)
        if task_type == consts.CLASSIFICATION:
            if self.objective != "cross_entropy":
                raise ValueError("objective param's objective {} not supported".format(self.objective))
        elif task_type == consts.REGRESSION:
            self.objective = self.check_and_change_lower(self.objective,
                                                         ["lse", "lae", "huber", "fair", "log_cosh", "tweedie"],
                                                         descr)
            params = self.params
            if self.objective in ["huber", "fair", "tweedie"]:
                if type(params).__name__ != 'list' or len(params) < 1:
                    raise ValueError(
                        "objective param's params {} not supported, should be non-empty list".format(params))
                if type(params[0]).__name__ not in ["float", "int", "long"]:
                    raise ValueError("objective param's params[0] {} not supported".format(self.params[0]))
                if self.objective == 'tweedie':
                    if params[0] < 1 or params[0] >= 2:
                        raise ValueError("in tweedie regression, objective params[0] should be in [1, 2)")
                # membership test; `== 'fair' or 'huber'` would always be truthy
                if self.objective in ('fair', 'huber'):
                    if params[0] <= 0.0:
                        raise ValueError("in {} regression, objective params[0] should be greater than 0.0".format(
                            self.objective))
        return True
__init__(self, objective='cross_entropy', params=None) [special]
Source code in federatedml/param/boosting_param.py
def __init__(self, objective='cross_entropy', params=None):
    self.objective = objective
    self.params = params
check(self, task_type=None)
Source code in federatedml/param/boosting_param.py
def check(self, task_type=None):
    if self.objective is None:
        return True
    descr = "objective param's"
    LOGGER.debug('check objective {}'.format(self.objective))
    if task_type not in [consts.CLASSIFICATION, consts.REGRESSION]:
        self.objective = self.check_and_change_lower(self.objective,
                                                     ["cross_entropy", "lse", "lae", "huber", "fair",
                                                      "log_cosh", "tweedie"],
                                                     descr)
    if task_type == consts.CLASSIFICATION:
        if self.objective != "cross_entropy":
            raise ValueError("objective param's objective {} not supported".format(self.objective))
    elif task_type == consts.REGRESSION:
        self.objective = self.check_and_change_lower(self.objective,
                                                     ["lse", "lae", "huber", "fair", "log_cosh", "tweedie"],
                                                     descr)
        params = self.params
        if self.objective in ["huber", "fair", "tweedie"]:
            if type(params).__name__ != 'list' or len(params) < 1:
                raise ValueError(
                    "objective param's params {} not supported, should be non-empty list".format(params))
            if type(params[0]).__name__ not in ["float", "int", "long"]:
                raise ValueError("objective param's params[0] {} not supported".format(self.params[0]))
            if self.objective == 'tweedie':
                if params[0] < 1 or params[0] >= 2:
                    raise ValueError("in tweedie regression, objective params[0] should be in [1, 2)")
            # membership test; `== 'fair' or 'huber'` would always be truthy
            if self.objective in ('fair', 'huber'):
                if params[0] <= 0.0:
                    raise ValueError("in {} regression, objective params[0] should be greater than 0.0".format(
                        self.objective))
    return True
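The rules that `check` enforces can be summarized in a small standalone function, which may be handy for validating a config before submitting a job. This is a sketch under assumptions: the function name `check_objective` is hypothetical, and the strings `"classification"` and `"regression"` stand in for `consts.CLASSIFICATION` and `consts.REGRESSION`, whose actual values are not shown in this section.

```python
REGRESSION_OBJECTIVES = ["lse", "lae", "huber", "fair", "log_cosh", "tweedie"]


def check_objective(objective, params=None, task_type="classification"):
    """Return the lower-cased objective or raise ValueError if it is invalid."""
    if not isinstance(objective, str):
        raise ValueError("objective should be a string")
    objective = objective.lower()

    if task_type == "classification":
        if objective != "cross_entropy":
            raise ValueError(f"objective {objective} not supported for classification")
    elif objective not in REGRESSION_OBJECTIVES:
        raise ValueError(f"objective {objective} not supported for regression")

    if objective in ("huber", "fair", "tweedie"):
        if not isinstance(params, list) or not params:
            raise ValueError("params should be a non-empty list")
        first = params[0]
        if isinstance(first, bool) or not isinstance(first, (int, float)):
            raise ValueError("params[0] should be numeric")
        if objective == "tweedie" and not 1 <= first < 2:
            raise ValueError("tweedie requires params[0] in [1, 2)")
        # Membership test: `objective == 'fair' or 'huber'` would always be truthy.
        if objective in ("fair", "huber") and first <= 0:
            raise ValueError("fair and huber require params[0] > 0")
    return objective


print(check_objective("Cross_Entropy"))
print(check_objective("tweedie", [1.5], task_type="regression"))
```

The membership test on the last check is the idiomatic way to compare against several string values in Python.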
DecisionTreeParam (BaseParam)
Define decision tree parameters used in federated ML.
Parameters
criterion_method : {"xgboost"}, default: "xgboost"
The criterion function to use.
criterion_params : list or dict
Should be non-empty, with float elements. If a list is given, the first element is the l2 regularization value and the second is the l1 regularization value. If a dict is given, it must contain the keys 'l1' and 'l2'. The l1 and l2 regularization values are non-negative floats. Default: [0.1, 0] or {'l1': 0, 'l2': 0.1}
max_depth : positive integer
The max depth of a decision tree. Default: 3
min_sample_split : int
Minimum number of samples required to split a node. Default: 2
min_impurity_split : float
Minimum gain a single split must reach. Default: 1e-3
min_child_weight : float
Sum of hessian needed in child nodes. Default: 0
min_leaf_node : int
When a node has no more than min_leaf_node samples, it becomes a leaf. Default: 1
max_split_nodes : positive integer
At most max_split_nodes nodes are processed in parallel per batch when finding splits, to limit memory use. Default: 65536
feature_importance_type : {'split', 'gain'}
If 'split', feature importances are calculated by the number of times a feature is used to split; if 'gain', by feature split gain. Default: 'split'.
For security reasons, the training strategy of Hetero-SBT was adjusted in FATE-1.8, and this parameter is no longer used when running Hetero-SBT. In Hetero-SBT of FATE-1.8, the guest side computes split and gain importance of local features and receives anonymous feature importance results from hosts. Hosts compute split importance of local features.
use_missing : bool, accepts True or False only, default: False
Whether to use missing values in the training process.
zero_as_missing : bool
Whether to regard 0 as a missing value; only used if use_missing=True. Default: False
deterministic : bool
Ensure stability when computing histograms. Set this to True to get stable results when using the same data and the same parameters, though it may slow down computation.
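Because `criterion_params` accepts either a list (l2 first, then l1) or a dict with 'l1' and 'l2' keys, it can help to see both forms reduced to one representation. The helper below is a hypothetical sketch, not a federatedml function; in particular, requiring exactly two list elements is an assumption made for this example.

```python
def normalize_criterion_params(criterion_params):
    """Return (l2, l1) as floats from the list or dict form."""
    if isinstance(criterion_params, dict):
        if "l1" not in criterion_params or "l2" not in criterion_params:
            raise ValueError("dict form needs both 'l1' and 'l2' keys")
        l2, l1 = criterion_params["l2"], criterion_params["l1"]
    elif isinstance(criterion_params, list) and len(criterion_params) == 2:
        # List order is [l2, l1], as described in the parameter notes above.
        l2, l1 = criterion_params
    else:
        raise ValueError("criterion_params should be a 2-element list or a dict")

    for name, value in (("l2", l2), ("l1", l1)):
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value < 0:
            raise ValueError(f"{name} should be a non-negative number")
    return float(l2), float(l1)


print(normalize_criterion_params([0.1, 0]))
print(normalize_criterion_params({"l1": 0, "l2": 0.1}))
```

Note that the list form puts l2 before l1, which is easy to get backwards when converting from the dict form.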
Source code in federatedml/param/boosting_param.py
class DecisionTreeParam(BaseParam):
"""
Define decision tree parameters that used in federated ml.
Parameters
----------
criterion_method : {"xgboost"}, default: "xgboost"
the criterion function to use
criterion_params: list or dict
should be non empty and elements are float-numbers,
if a list is offered, the first one is l2 regularization value, and the second one is
l1 regularization value.
if a dict is offered, make sure it contains key 'l1', and 'l2'.
l1, l2 regularization values are non-negative floats.
default: [0.1, 0] or {'l1': 0, 'l2': 0.1}
max_depth: positive integer
the max depth of a decision tree, default: 3
min_sample_split: int
least quantity of nodes to split, default: 2
min_impurity_split: float
least gain of a single split need to reach, default: 1e-3
min_child_weight: float
sum of hessian needed in child nodes. default is 0
min_leaf_node: int
when samples no more than min_leaf_node, it becomes a leave, default: 1
max_split_nodes: positive integer
we will use no more than max_split_nodes to
parallel finding their splits in a batch, for memory consideration. default is 65536
feature_importance_type: {'split', 'gain'}
if is 'split', feature_importances calculate by feature split times,
if is 'gain', feature_importances calculate by feature split gain.
default: 'split'
Due to the safety concern, we adjust training strategy of Hetero-SBT in FATE-1.8,
When running Hetero-SBT, this parameter is now abandoned.
In Hetero-SBT of FATE-1.8, guest side will compute split, gain of local features,
and receive anonymous feature importance results from hosts. Hosts will compute split
importance of local features.
use_missing: bool, accepted True, False only, default: False
use missing value in training process or not.
zero_as_missing: bool
regard 0 as missing value or not,
will be use only if use_missing=True, default: False
deterministic: bool
ensure stability when computing histogram. Set this to true to ensure stable result when using
same data and same parameter. But it may slow down computation.
"""
    def __init__(self, criterion_method="xgboost", criterion_params=[0.1, 0], max_depth=3,
                 min_sample_split=2, min_impurity_split=1e-3, min_leaf_node=1,
                 max_split_nodes=consts.MAX_SPLIT_NODES, feature_importance_type='split',
                 n_iter_no_change=True, tol=0.001, min_child_weight=0,
                 use_missing=False, zero_as_missing=False, deterministic=False):
        super(DecisionTreeParam, self).__init__()
        self.criterion_method = criterion_method
        self.criterion_params = criterion_params
        self.max_depth = max_depth
        self.min_sample_split = min_sample_split
        self.min_impurity_split = min_impurity_split
        self.min_leaf_node = min_leaf_node
        self.min_child_weight = min_child_weight
        self.max_split_nodes = max_split_nodes
        self.feature_importance_type = feature_importance_type
        self.n_iter_no_change = n_iter_no_change
        self.tol = tol
        self.use_missing = use_missing
        self.zero_as_missing = zero_as_missing
        self.deterministic = deterministic
    def check(self):
        descr = "decision tree param"
        self.criterion_method = self.check_and_change_lower(self.criterion_method,
                                                             ["xgboost"],
                                                             descr)
        if len(self.criterion_params) == 0:
            raise ValueError("decision tree param's criterion_params should be non empty")
        if isinstance(self.criterion_params, list):
            assert len(self.criterion_params) == 2, 'length of criterion_param should be 2: l1, l2 regularization ' \
                                                    'values are needed'
            self.check_nonnegative_number(self.criterion_params[0], 'l2 reg value')
            self.check_nonnegative_number(self.criterion_params[1], 'l1 reg value')
        elif isinstance(self.criterion_params, dict):
            assert 'l1' in self.criterion_params and 'l2' in self.criterion_params, 'l1 and l2 keys are needed in ' \
                                                                                    'criterion_params dict'
            # a dict is converted to the list form [l2, l1]
            self.criterion_params = [self.criterion_params['l2'], self.criterion_params['l1']]
        else:
            raise ValueError('criterion_params should be a dict or a list containing l1, l2 reg values')
        if type(self.max_depth).__name__ not in ["int", "long"]:
            raise ValueError("decision tree param's max_depth {} not supported, should be integer".format(
                self.max_depth))
        if self.max_depth < 1:
            raise ValueError("decision tree param's max_depth should be positive integer, no less than 1")
        if type(self.min_sample_split).__name__ not in ["int", "long"]:
            raise ValueError("decision tree param's min_sample_split {} not supported, should be integer".format(
                self.min_sample_split))
        if type(self.min_impurity_split).__name__ not in ["int", "long", "float"]:
            raise ValueError("decision tree param's min_impurity_split {} not supported, should be numeric".format(
                self.min_impurity_split))
        if type(self.min_leaf_node).__name__ not in ["int", "long"]:
            raise ValueError("decision tree param's min_leaf_node {} not supported, should be integer".format(
                self.min_leaf_node))
        if type(self.max_split_nodes).__name__ not in ["int", "long"] or self.max_split_nodes < 1:
            # adjacent string literals are joined first, so .format() fills both placeholders
            raise ValueError("decision tree param's max_split_nodes {} not supported, "
                             "should be positive integer between 1 and {}".format(self.max_split_nodes,
                                                                                  consts.MAX_SPLIT_NODES))
        if type(self.n_iter_no_change).__name__ != "bool":
            raise ValueError("decision tree param's n_iter_no_change {} not supported, should be bool type".format(
                self.n_iter_no_change))
        if type(self.tol).__name__ not in ["float", "int", "long"]:
            raise ValueError("decision tree param's tol {} not supported, should be numeric".format(self.tol))
        self.feature_importance_type = self.check_and_change_lower(self.feature_importance_type,
                                                                    ["split", "gain"],
                                                                    descr)
        self.check_nonnegative_number(self.min_child_weight, 'min_child_weight')
        self.check_boolean(self.deterministic, 'deterministic')
        return True
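The way `check` handles criterion_params can be easy to misread, because the list form is ordered [l2, l1] while the dict form is keyed by name. The following is a minimal standalone sketch of that normalization; `normalize_criterion_params` is a hypothetical helper written for illustration and does not exist in FATE.

```python
def normalize_criterion_params(criterion_params):
    """Return [l2, l1] from a 2-element list or a dict with 'l1' and 'l2' keys.

    Illustrative stand-in for the normalization done in check(); not FATE code.
    """
    if isinstance(criterion_params, list):
        if len(criterion_params) != 2:
            raise ValueError("expected two values: [l2, l1]")
        l2, l1 = criterion_params
    elif isinstance(criterion_params, dict):
        if "l1" not in criterion_params or "l2" not in criterion_params:
            raise ValueError("expected keys 'l1' and 'l2'")
        l2, l1 = criterion_params["l2"], criterion_params["l1"]
    else:
        raise ValueError("criterion_params should be a list or a dict")
    for name, value in (("l2", l2), ("l1", l1)):
        # reject booleans, non-numbers and negative values
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value < 0:
            raise ValueError(f"{name} should be a non-negative number")
    return [l2, l1]


from_list = normalize_criterion_params([0.1, 0])
from_dict = normalize_criterion_params({"l1": 0.5, "l2": 0.1})
```

Both inputs end up in the same [l2, l1] layout, so `{"l1": 0.5, "l2": 0.1}` becomes `[0.1, 0.5]`.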
__init__(self, criterion_method='xgboost', criterion_params=[0.1, 0], max_depth=3, min_sample_split=2, min_impurity_split=0.001, min_leaf_node=1, max_split_nodes=65536, feature_importance_type='split', n_iter_no_change=True, tol=0.001, min_child_weight=0, use_missing=False, zero_as_missing=False, deterministic=False)
check(self)
BoostingParam (BaseParam)
¶Basic parameter for Boosting Algorithms
Parameters¶
task_type : {'classification', 'regression'}, default: 'classification'
task type
objective_param : ObjectiveParam Object, default: ObjectiveParam()
objective param
learning_rate : float, int or long
the learning rate of secure boost. default: 0.3
num_trees : int or float
the max number of boosting rounds. default: 5
subsample_feature_rate : float
a float in [0, 1], default: 1.0
n_iter_no_change : bool
when True, tree building stops once the residual error is less than tol. default: True
bin_num: positive integer greater than 1
number of bins used in quantile binning. default: 32
validation_freqs: None, positive integer, or Python container object
whether and when to run validation during training. If None, no validation is run during training; if a positive integer, validation runs every validation_freqs epochs; if a container, validation runs at the epochs contained in it, e.g. validation_freqs = [10, 15] validates at epochs 10 and 15. Default: None
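The three accepted forms of validation_freqs can be sketched as a small predicate. This is an illustration under stated assumptions, not FATE's code: `should_validate` is a hypothetical helper, and epochs are assumed to be counted from 1, which may differ from the counter FATE uses internally.

```python
from collections.abc import Container


def should_validate(epoch, validation_freqs):
    """Decide whether validation runs at a given epoch.

    Assumes `epoch` is counted from 1; hypothetical helper for illustration.
    """
    if validation_freqs is None:
        return False
    if isinstance(validation_freqs, int):
        if validation_freqs < 1:
            raise ValueError("validation_freqs should be larger than 0 when it's integer")
        return epoch % validation_freqs == 0
    if isinstance(validation_freqs, Container):
        return epoch in validation_freqs
    raise ValueError("validation_freqs should be None or positive integer or container")


num_trees = 5
every_two = [e for e in range(1, num_trees + 1) if should_validate(e, 2)]
listed = [e for e in range(1, 16) if should_validate(e, [10, 15])]
```

Under the 1-based assumption, `num_trees = 5` with `validation_freqs = 2` validates epochs 2 and 4 only, so the final epoch gets no validation score.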
Source code in federatedml/param/boosting_param.py
class BoostingParam(BaseParam):
    """
    Basic parameter for Boosting Algorithms
    Parameters
    ----------
    task_type : {'classification', 'regression'}, default: 'classification'
        task type
    objective_param : ObjectiveParam Object, default: ObjectiveParam()
        objective param
    learning_rate : float, int or long
        the learning rate of secure boost. default: 0.3
    num_trees : int or float
        the max number of boosting rounds. default: 5
    subsample_feature_rate : float
        a float in [0, 1], default: 1.0
    n_iter_no_change : bool
        when True, tree building stops once the residual error is less than tol. default: True
    bin_num: positive integer greater than 1
        number of bins used in quantile binning. default: 32
    validation_freqs: None, positive integer, or Python container object
        whether and when to run validation during training.
        if None, no validation is run during training;
        if a positive integer, validation runs every validation_freqs epochs;
        if a container, validation runs at the epochs contained in it,
        e.g. validation_freqs = [10, 15] validates at epochs 10 and 15.
        Default: None
    """
    def __init__(self, task_type=consts.CLASSIFICATION,
                 objective_param=ObjectiveParam(),
                 learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
                 tol=0.0001, bin_num=32,
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 validation_freqs=None, metrics=None, random_seed=100,
                 binning_error=consts.DEFAULT_RELATIVE_ERROR):
        super(BoostingParam, self).__init__()
        self.task_type = task_type
        self.objective_param = copy.deepcopy(objective_param)
        self.learning_rate = learning_rate
        self.num_trees = num_trees
        self.subsample_feature_rate = subsample_feature_rate
        self.n_iter_no_change = n_iter_no_change
        self.tol = tol
        self.bin_num = bin_num
        self.predict_param = copy.deepcopy(predict_param)
        self.cv_param = copy.deepcopy(cv_param)
        self.validation_freqs = validation_freqs
        self.metrics = metrics
        self.random_seed = random_seed
        self.binning_error = binning_error
    def check(self):
        descr = "boosting tree param's"
        if self.task_type not in [consts.CLASSIFICATION, consts.REGRESSION]:
            raise ValueError("boosting_core tree param's task_type {} not supported, should be {} or {}".format(
                self.task_type, consts.CLASSIFICATION, consts.REGRESSION))
        self.objective_param.check(self.task_type)
        if type(self.learning_rate).__name__ not in ["float", "int", "long"]:
            raise ValueError("boosting_core tree param's learning_rate {} not supported, should be numeric".format(
                self.learning_rate))
        if type(self.subsample_feature_rate).__name__ not in ["float", "int", "long"] or \
                self.subsample_feature_rate < 0 or self.subsample_feature_rate > 1:
            raise ValueError(
                "boosting_core tree param's subsample_feature_rate should be a numeric number between 0 and 1")
        if type(self.n_iter_no_change).__name__ != "bool":
            raise ValueError("boosting_core tree param's n_iter_no_change {} not supported, should be bool type".format(
                self.n_iter_no_change))
        if type(self.tol).__name__ not in ["float", "int", "long"]:
            raise ValueError("boosting_core tree param's tol {} not supported, should be numeric".format(self.tol))
        if type(self.bin_num).__name__ not in ["int", "long"] or self.bin_num < 2:
            raise ValueError(
                "boosting_core tree param's bin_num {} not supported, should be positive integer greater than 1".format(
                    self.bin_num))
        if self.validation_freqs is None:
            pass
        elif isinstance(self.validation_freqs, int):
            if self.validation_freqs < 1:
                raise ValueError("validation_freqs should be larger than 0 when it's integer")
        # the Container ABC lives in collections.abc (needs `import collections.abc`);
        # the bare collections.Container alias does not exist in Python 3.10+
        elif not isinstance(self.validation_freqs, collections.abc.Container):
            raise ValueError("validation_freqs should be None or positive integer or container")
        if self.metrics is not None and not isinstance(self.metrics, list):
            raise ValueError("metrics should be a list")
        if self.random_seed is not None:
            assert isinstance(self.random_seed, int) and self.random_seed >= 0, 'random seed must be an integer >= 0'
        self.check_decimal_float(self.binning_error, descr)
        return True
__init__(self, task_type='classification', objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, bin_num=32, predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, metrics=None, random_seed=100, binning_error=0.0001)
check(self)
HeteroBoostingParam (BoostingParam)
¶Parameters¶
encrypt_param : EncryptParam Object
encryption method used in secure boost, default: EncryptParam()
encrypted_mode_calculator_param: EncryptedModeCalculatorParam object
the calculation mode used in secureboost, default: EncryptedModeCalculatorParam()
Source code in federatedml/param/boosting_param.py
class HeteroBoostingParam(BoostingParam):
    """
    Parameters
    ----------
    encrypt_param : EncryptParam Object
        encryption method used in secure boost, default: EncryptParam()
    encrypted_mode_calculator_param: EncryptedModeCalculatorParam object
        the calculation mode used in secureboost,
        default: EncryptedModeCalculatorParam()
    """
    def __init__(self, task_type=consts.CLASSIFICATION,
                 objective_param=ObjectiveParam(),
                 learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
                 tol=0.0001, encrypt_param=EncryptParam(),
                 bin_num=32,
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False,
                 random_seed=100, binning_error=consts.DEFAULT_RELATIVE_ERROR):
        super(HeteroBoostingParam, self).__init__(task_type, objective_param, learning_rate, num_trees,
                                                  subsample_feature_rate, n_iter_no_change, tol, bin_num,
                                                  predict_param, cv_param, validation_freqs, metrics=metrics,
                                                  random_seed=random_seed,
                                                  binning_error=binning_error)
        self.encrypt_param = copy.deepcopy(encrypt_param)
        self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
        self.early_stopping_rounds = early_stopping_rounds
        self.use_first_metric_only = use_first_metric_only
    def check(self):
        super(HeteroBoostingParam, self).check()
        self.encrypted_mode_calculator_param.check()
        self.encrypt_param.check()
        if self.early_stopping_rounds is None:
            pass
        elif isinstance(self.early_stopping_rounds, int):
            if self.early_stopping_rounds < 1:
                raise ValueError("early stopping rounds should be larger than 0 when it's integer")
            if self.validation_freqs is None:
                raise ValueError("validation freqs must be set when early stopping is enabled")
        if not isinstance(self.use_first_metric_only, bool):
            raise ValueError("use_first_metric_only should be a boolean")
        return True
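The early-stopping behaviour that these checks guard can be sketched as follows. This is a simplified stand-in under stated assumptions, not FATE's callback code: the `should_stop` helper is hypothetical, only one metric is tracked, larger values are assumed to be better (as with AUC), and training is assumed to stop once the metric has not improved in the last early_stopping_rounds validation rounds.

```python
def should_stop(metric_history, early_stopping_rounds):
    """Return True when the best score is at least `early_stopping_rounds`
    validation rounds old.

    `metric_history` holds one score per validation round, higher is better.
    Hypothetical helper for illustration; not FATE's implementation.
    """
    if early_stopping_rounds is None or not metric_history:
        return False
    if early_stopping_rounds < 1:
        raise ValueError("early stopping rounds should be larger than 0 when it's integer")
    # index of the first occurrence of the best score; later ties do not count as improvement
    best_round = max(range(len(metric_history)), key=metric_history.__getitem__)
    rounds_since_best = len(metric_history) - 1 - best_round
    return rounds_since_best >= early_stopping_rounds


improving = should_stop([0.70, 0.72, 0.75], 2)
stalled = should_stop([0.70, 0.75, 0.74, 0.75, 0.73], 2)
```

Because the history only grows at validation rounds, this kind of check has nothing to work with unless validation_freqs is set, which is consistent with the error raised in `check`.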
__init__(self, task_type='classification', objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, encrypt_param=EncryptParam(), bin_num=32, encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False, random_seed=100, binning_error=0.0001)
check(self)
HeteroSecureBoostParam (HeteroBoostingParam)
¶Define boosting tree parameters that used in federated ml.
Parameters¶
task_type : {'classification', 'regression'}, default: 'classification'
task type
tree_param : DecisionTreeParam Object, default: DecisionTreeParam()
tree param
objective_param : ObjectiveParam Object, default: ObjectiveParam()
objective param
learning_rate : float, int or long
the learning rate of secure boost. default: 0.3
num_trees : int or float
the max number of trees to build. default: 5
subsample_feature_rate : float
a float in [0, 1], default: 1.0
random_seed: int
seed that controls all random functions
n_iter_no_change : bool
when True, tree building stops once the residual error is less than tol. default: True
encrypt_param : EncryptParam Object
encryption method used in secure boost, default: EncryptParam(). This parameter is only for hetero-secureboost
bin_num: positive integer greater than 1
number of bins used in quantile binning. default: 32
encrypted_mode_calculator_param: EncryptedModeCalculatorParam object
the calculation mode used in secureboost, default: EncryptedModeCalculatorParam(), only for hetero-secureboost
use_missing: bool
whether to use missing values in the training process. default: False
zero_as_missing: bool
whether to treat 0 as a missing value; only takes effect when use_missing=True, default: False
validation_freqs: None, positive integer, or Python container object
whether and when to run validation during training. If None, no validation is run during training; if a positive integer, validation runs every validation_freqs epochs; if a container, validation runs at the epochs contained in it, e.g. validation_freqs = [10, 15] validates at epochs 10 and 15. Default: None. Setting it to 1 is suggested. You can set it to a number larger than 1 to speed up training by skipping validation rounds. When it is larger than 1, a number that divides num_trees evenly is recommended; otherwise, you will miss the validation scores of the last training iteration.
early_stopping_rounds: integer larger than 0
training stops if one metric on one validation data set does not improve in the last early_stopping_rounds rounds. validation_freqs must be set, and early stopping is checked at every validation epoch.
metrics: list, default: []
specifies which metrics to use when evaluating during training. If empty, default metrics are used. For regression tasks, default metrics are ['root_mean_squared_error', 'mean_absolute_error']. For binary-classification tasks, default metrics are ['auc', 'ks']. For multi-classification tasks, default metrics are ['accuracy', 'precision', 'recall']
use_first_metric_only: bool
use only the first metric for early stopping
complete_secure: bool
if True, the first tree is built using only guest features
sparse_optimization:
this parameter is abandoned in FATE-1.7.1
run_goss: bool
activate Gradient-based One-Side Sampling, which selects large gradient and small gradient samples using top_rate and other_rate.
top_rate: float
the retain ratio of large gradient data, used when run_goss is True
other_rate: float
the retain ratio of small gradient data, used when run_goss is True
cipher_compress_error:
this parameter is now abandoned
cipher_compress: bool, default: True
use cipher compression to reduce computation cost and transfer cost
boosting_strategy: str
std: standard sbt setting
mix: alternate between guest and host features when building trees. For example, the first 'tree_num_per_party' trees use guest features, the next 'tree_num_per_party' trees use host features, and so on
layered: supports only 2 parties. In layered mode, the first 'host_depth' layers use host features, and the next 'guest_depth' layers use only guest features
work_mode: str
this parameter has the same function as boosting_strategy, but is deprecated
tree_num_per_party: int
the parties take turns building 'tree_num_per_party' trees each until the max tree number is reached. Valid when boosting_strategy is mix
guest_depth: int
guest builds the last guest_depth layers of a decision tree using guest features. Valid when boosting_strategy is layered
host_depth: int
host builds the first host_depth layers of a decision tree using host features. Valid when boosting_strategy is layered
multi_mode: str
decides which mode to use when running a multi-classification task:
single_output: standard gbdt multi-classification strategy
multi_output: every leaf gives a multi-dimensional prediction; this mode can save time by learning a model with fewer trees.
EINI_inference: bool
default is False. This option changes the inference algorithm used in predict tasks to a secure prediction method that hides the decision path to enhance security in the inference step. This method is inspired by the EINI inference algorithm.
EINI_random_mask: bool
default is False. Multiplies the predict result by a random float number to obscure the original predict result. This operation further enhances the security of the naive EINI algorithm.
EINI_complexity_check: bool
default is False. Checks the complexity of tree models when running EINI algorithms. Complex models hide their decision paths easily, while simple tree models do not; therefore, if a tree model is too simple, it is not allowed to run EINI predict algorithms.
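To make the 'mix' and 'layered' values of boosting_strategy more concrete, the sketch below computes which party's features would be used for each tree or tree layer. It follows the parameter descriptions only and makes assumptions beyond them: two parties, the guest building first in 'mix' mode, and 0-based tree and layer indices. `tree_owner` and `layer_owner` are hypothetical helpers, not FATE functions.

```python
def tree_owner(tree_idx, tree_num_per_party, parties=("guest", "host")):
    """Party whose features build tree `tree_idx` (0-based) in 'mix' mode.

    Assumes the guest goes first and parties rotate in blocks of
    `tree_num_per_party` trees. Hypothetical helper, not FATE code.
    """
    return parties[(tree_idx // tree_num_per_party) % len(parties)]


def layer_owner(layer_idx, host_depth, guest_depth):
    """Party whose features split layer `layer_idx` (0-based) in 'layered' mode."""
    if not 0 <= layer_idx < host_depth + guest_depth:
        raise ValueError("layer_idx is outside the tree")
    # the first host_depth layers belong to the host, the rest to the guest
    return "host" if layer_idx < host_depth else "guest"


mix_schedule = [tree_owner(i, tree_num_per_party=2) for i in range(6)]
layer_schedule = [layer_owner(d, host_depth=3, guest_depth=2) for d in range(5)]
```

With `tree_num_per_party=2`, six trees alternate in pairs between guest and host; with `host_depth=3` and `guest_depth=2`, a five-layer tree uses host features for the top three layers and guest features for the bottom two.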
Source code in federatedml/param/boosting_param.py
class HeteroSecureBoostParam(HeteroBoostingParam):
"""
Define boosting tree parameters that used in federated ml.
Parameters
----------
task_type : {'classification', 'regression'}, default: 'classification'
task type
tree_param : DecisionTreeParam Object, default: DecisionTreeParam()
tree param
objective_param : ObjectiveParam Object, default: ObjectiveParam()
objective param
learning_rate : float, int or long
the learning rate of secure boost. default: 0.3
num_trees : int or float
the max number of trees to build. default: 5
subsample_feature_rate : float
a float-number in [0, 1], default: 1.0
random_seed: int
seed that controls all random functions
n_iter_no_change : bool,
when True and residual error less than tol, tree building process will stop. default: True
encrypt_param : EncodeParam Object
encrypt method use in secure boost, default: EncryptParam(), this parameter
is only for hetero-secureboost
bin_num: positive integer greater than 1
bin number use in quantile. default: 32
encrypted_mode_calculator_param: EncryptedModeCalculatorParam object
the calculation mode use in secureboost, default: EncryptedModeCalculatorParam(), only for hetero-secureboost
use_missing: bool
use missing value in training process or not. default: False
zero_as_missing: bool
regard 0 as missing value or not, will be use only if use_missing=True, default: False
validation_freqs: None or positive integer or container object in python
Do validation in training process or Not.
if equals None, will not do validation in train process;
if equals positive integer, will validate data every validation_freqs epochs passes;
if container object in python, will validate data if epochs belong to this container.
e.g. validation_freqs = [10, 15], will validate data when epoch equals to 10 and 15.
Default: None
The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to
speed up training by skipping validation rounds. When it is larger than 1, a number which is
divisible by "num_trees" is recommended, otherwise, you will miss the validation scores
of last training iteration.
early_stopping_rounds: integer larger than 0
will stop training if one metric of one validation data
doesn’t improve in last early_stopping_round rounds,
need to set validation freqs and will check early_stopping every at every validation epoch,
metrics: list, default: []
Specify which metrics to be used when performing evaluation during training process.
If set as empty, default metrics will be used. For regression tasks, default metrics are
['root_mean_squared_error', 'mean_absolute_error'], For binary-classificatiin tasks, default metrics
are ['auc', 'ks']. For multi-classification tasks, default metrics are ['accuracy', 'precision', 'recall']
use_first_metric_only: bool
use only the first metric for early stopping
complete_secure: bool
if use complete_secure, when use complete secure, build first tree using only guest features
sparse_optimization:
this parameter is abandoned in FATE-1.7.1
run_goss: bool
activate Gradient-based One-Side Sampling, which selects large gradient and small
gradient samples using top_rate and other_rate.
top_rate: float, the retain ratio of large gradient data, used when run_goss is True
other_rate: float, the retain ratio of small gradient data, used when run_goss is True
cipher_compress_error: This param is now abandoned
cipher_compress: bool, default is True, use cipher compressing to reduce computation cost and transfer cost
boosting_strategy:str
std: standard sbt setting
mix: alternate using guest/host features to build trees. For example, the first 'tree_num_per_party' trees
use guest features,
the second k trees use host features, and so on
layered: only support 2 party, when running layered mode, first 'host_depth' layer will use host features,
and then next 'guest_depth' will only use guest features
work_mode: str
This parameter has the same function as boosting_strategy, but is deprecated
tree_num_per_party: int, every party will alternate build 'tree_num_per_party' trees until reach max tree num, this
param is valid when boosting_strategy is mix
guest_depth: int, guest will build last guest_depth of a decision tree using guest features, is valid when boosting_strategy
is layered
host_depth: int, host will build first host_depth of a decision tree using host features, is valid when work boosting_strategy
layered
multi_mode: str, decides which mode to use when running a multi-classification task:
single_output: standard gbdt multi-classification strategy
multi_output: every leaf gives a multi-dimensional prediction; using multi_output can save time
by learning a model with fewer trees.
EINI_inference: bool
default is False, this option changes the inference algorithm used in predict tasks to
a secure prediction method that hides the decision path to enhance security in the inference
step. This method is inspired by the EINI inference algorithm.
EINI_random_mask: bool
default is False
multiply the predict result by a random float number to obscure the original predict result. This operation further
enhances the security of the naive EINI algorithm.
EINI_complexity_check: bool
default is False
check the complexity of tree models when running EINI algorithms. Complex models hide their decision paths
easily, while simple tree models do not; therefore, if a tree model is too simple, it is not allowed
to run EINI predict algorithms.
"""
def __init__(self, tree_param: DecisionTreeParam = DecisionTreeParam(), task_type=consts.CLASSIFICATION,
objective_param=ObjectiveParam(),
learning_rate=0.3, num_trees=5, subsample_feature_rate=1.0, n_iter_no_change=True,
tol=0.0001, encrypt_param=EncryptParam(),
bin_num=32,
encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
predict_param=PredictParam(), cv_param=CrossValidationParam(),
validation_freqs=None, early_stopping_rounds=None, use_missing=False, zero_as_missing=False,
complete_secure=False, metrics=None, use_first_metric_only=False, random_seed=100,
binning_error=consts.DEFAULT_RELATIVE_ERROR,
sparse_optimization=False, run_goss=False, top_rate=0.2, other_rate=0.1,
cipher_compress_error=None, cipher_compress=True, new_ver=True, boosting_strategy=consts.STD_TREE,
work_mode=None, tree_num_per_party=1, guest_depth=2, host_depth=3, callback_param=CallbackParam(),
multi_mode=consts.SINGLE_OUTPUT, EINI_inference=False, EINI_random_mask=False,
EINI_complexity_check=False):
super(HeteroSecureBoostParam, self).__init__(task_type, objective_param, learning_rate, num_trees,
subsample_feature_rate, n_iter_no_change, tol, encrypt_param,
bin_num, encrypted_mode_calculator_param, predict_param, cv_param,
validation_freqs, early_stopping_rounds, metrics=metrics,
use_first_metric_only=use_first_metric_only,
random_seed=random_seed,
binning_error=binning_error)
self.tree_param = copy.deepcopy(tree_param)
self.zero_as_missing = zero_as_missing
self.use_missing = use_missing
self.complete_secure = complete_secure
self.sparse_optimization = sparse_optimization
self.run_goss = run_goss
self.top_rate = top_rate
self.other_rate = other_rate
self.cipher_compress_error = cipher_compress_error
self.cipher_compress = cipher_compress
self.new_ver = new_ver
self.EINI_inference = EINI_inference
self.EINI_random_mask = EINI_random_mask
self.EINI_complexity_check = EINI_complexity_check
self.boosting_strategy = boosting_strategy
self.work_mode = work_mode
self.tree_num_per_party = tree_num_per_party
self.guest_depth = guest_depth
self.host_depth = host_depth
self.callback_param = copy.deepcopy(callback_param)
self.multi_mode = multi_mode
def check(self):
super(HeteroSecureBoostParam, self).check()
self.tree_param.check()
if not isinstance(self.use_missing, bool):
raise ValueError('use missing should be bool type')
if not isinstance(self.zero_as_missing, bool):
raise ValueError('zero as missing should be bool type')
self.check_boolean(self.complete_secure, 'complete_secure')
self.check_boolean(self.run_goss, 'run goss')
self.check_decimal_float(self.top_rate, 'top rate')
self.check_decimal_float(self.other_rate, 'other rate')
self.check_positive_number(self.other_rate, 'other_rate')
self.check_positive_number(self.top_rate, 'top_rate')
self.check_boolean(self.new_ver, 'code version switcher')
self.check_boolean(self.cipher_compress, 'cipher compress')
self.check_boolean(self.EINI_inference, 'eini inference')
self.check_boolean(self.EINI_random_mask, 'eini random mask')
self.check_boolean(self.EINI_complexity_check, 'eini complexity check')
if self.EINI_inference and self.EINI_random_mask:
LOGGER.warning('To protect the inference decision path, notice that current setting will multiply'
' predict result by a random number, hence SecureBoost will return confused predict scores'
' that is not the same as the original predict scores')
if self.work_mode == consts.MIX_TREE and self.EINI_inference:
LOGGER.warning('Mix tree mode does not support EINI, use default predict setting')
if self.work_mode is not None:
self.boosting_strategy = self.work_mode
if self.multi_mode not in [consts.SINGLE_OUTPUT, consts.MULTI_OUTPUT]:
raise ValueError('unsupported multi-classification mode')
if self.multi_mode == consts.MULTI_OUTPUT:
if self.boosting_strategy != consts.STD_TREE:
raise ValueError('MO trees only works when boosting strategy is std tree')
if not self.cipher_compress:
raise ValueError('Mo trees only works when cipher compress is enabled')
if self.boosting_strategy not in [consts.STD_TREE, consts.LAYERED_TREE, consts.MIX_TREE]:
raise ValueError('unknown sbt boosting strategy{}'.format(self.boosting_strategy))
for p in ["early_stopping_rounds", "validation_freqs", "metrics",
"use_first_metric_only"]:
# if self._warn_to_deprecate_param(p, "", ""):
if self._deprecated_params_set.get(p):
if "callback_param" in self.get_user_feeded():
raise ValueError(f"{p} and callback param should not be set simultaneously,"
f"{self._deprecated_params_set}, {self.get_user_feeded()}")
else:
self.callback_param.callbacks = ["PerformanceEvaluate"]
break
descr = "boosting_param's"
if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
self.callback_param.validation_freqs = self.validation_freqs
if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
self.callback_param.early_stopping_rounds = self.early_stopping_rounds
if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
self.callback_param.metrics = self.metrics
if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
self.callback_param.use_first_metric_only = self.use_first_metric_only
if self.top_rate + self.other_rate >= 1:
raise ValueError('sum of top rate and other rate should be smaller than 1')
return True
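The `mix` and `layered` boosting strategies described in the docstring can be illustrated with a small sketch. The helpers `mix_tree_owner` and `layered_depth_owner` are hypothetical names, not part of FATE; the sketch assumes that parties take turns in a fixed order starting with the guest, and that trees and layers are counted from 0.

```python
from typing import List


def mix_tree_owner(tree_idx: int, tree_num_per_party: int, parties: List[str]) -> str:
    """Return the party whose features build tree `tree_idx` (0-based) in 'mix' mode.

    Hypothetical helper: each party builds `tree_num_per_party` trees in a row,
    then the next party takes over, cycling through `parties`.
    """
    if tree_num_per_party < 1:
        raise ValueError("tree_num_per_party should be a positive integer")
    turn = tree_idx // tree_num_per_party
    return parties[turn % len(parties)]


def layered_depth_owner(depth: int, host_depth: int, guest_depth: int) -> str:
    """Return which party's features split layer `depth` (0-based) in 'layered' mode.

    Hypothetical helper: the first `host_depth` layers use host features and the
    following `guest_depth` layers use guest features.
    """
    if not 0 <= depth < host_depth + guest_depth:
        raise ValueError("depth is outside the layered tree")
    return "host" if depth < host_depth else "guest"


if __name__ == "__main__":
    print([mix_tree_owner(i, 2, ["guest", "host"]) for i in range(6)])
    print([layered_depth_owner(d, host_depth=3, guest_depth=2) for d in range(5)])
```

With the default `host_depth=3` and `guest_depth=2`, this sketch assigns layers 0 to 2 to the host and layers 3 and 4 to the guest.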
__init__(self, tree_param=DecisionTreeParam(), task_type='classification', objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1.0, n_iter_no_change=True, tol=0.0001, encrypt_param=EncryptParam(), bin_num=32, encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, early_stopping_rounds=None, use_missing=False, zero_as_missing=False, complete_secure=False, metrics=None, use_first_metric_only=False, random_seed=100, binning_error=0.0001, sparse_optimization=False, run_goss=False, top_rate=0.2, other_rate=0.1, cipher_compress_error=None, cipher_compress=True, new_ver=True, boosting_strategy='std', work_mode=None, tree_num_per_party=1, guest_depth=2, host_depth=3, callback_param=CallbackParam(), multi_mode='single_output', EINI_inference=False, EINI_random_mask=False, EINI_complexity_check=False)
check(self)
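As a rough illustration of what `run_goss`, `top_rate` and `other_rate` control, the following sketch applies Gradient-based One-Side Sampling to a plain list of gradients. `goss_sample` is a hypothetical function, not FATE's implementation: it keeps the `top_rate` fraction of samples with the largest absolute gradient, draws a random `other_rate` fraction (relative to the full data set) from the rest, and reuses the constraint from `check()` that the two rates sum to less than 1. The reweighting of small-gradient samples used in the original GOSS formulation is omitted.

```python
import random
from typing import List, Sequence


def goss_sample(gradients: Sequence[float], top_rate: float = 0.2,
                other_rate: float = 0.1, seed: int = 100) -> List[int]:
    """Return the sorted indices kept by a simple GOSS pass (illustrative only)."""
    if not (0 < top_rate <= 1 and 0 < other_rate <= 1):
        raise ValueError("top_rate and other_rate should be in (0, 1]")
    if top_rate + other_rate >= 1:
        raise ValueError("sum of top rate and other rate should be smaller than 1")

    n = len(gradients)
    # Rank samples by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(n * top_rate)
    other_n = int(n * other_rate)

    top_idx = order[:top_n]          # always kept: large-gradient samples
    rest = order[top_n:]             # candidates: small-gradient samples
    rng = random.Random(seed)
    other_idx = rng.sample(rest, min(other_n, len(rest)))
    return sorted(top_idx + other_idx)


if __name__ == "__main__":
    grads = [0.1 * i for i in range(10)]
    print(goss_sample(grads))
```

For ten samples with the default rates, the two largest-gradient samples are always kept and one more is drawn at random from the remaining eight.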
HomoSecureBoostParam (BoostingParam)
¶Parameters¶
backend : {'distributed', 'memory'}
Decides which backend to use when computing histograms for homo-sbt.
Source code in federatedml/param/boosting_param.py
class HomoSecureBoostParam(BoostingParam):
"""
Parameters
----------
backend: {'distributed', 'memory'}
decides which backend to use when computing histograms for homo-sbt
"""
def __init__(self, tree_param: DecisionTreeParam = DecisionTreeParam(), task_type=consts.CLASSIFICATION,
objective_param=ObjectiveParam(),
learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
tol=0.0001, bin_num=32, predict_param=PredictParam(), cv_param=CrossValidationParam(),
validation_freqs=None, use_missing=False, zero_as_missing=False, random_seed=100,
binning_error=consts.DEFAULT_RELATIVE_ERROR, backend=consts.DISTRIBUTED_BACKEND,
callback_param=CallbackParam(), multi_mode=consts.SINGLE_OUTPUT):
super(HomoSecureBoostParam, self).__init__(task_type=task_type,
objective_param=objective_param,
learning_rate=learning_rate,
num_trees=num_trees,
subsample_feature_rate=subsample_feature_rate,
n_iter_no_change=n_iter_no_change,
tol=tol,
bin_num=bin_num,
predict_param=predict_param,
cv_param=cv_param,
validation_freqs=validation_freqs,
random_seed=random_seed,
binning_error=binning_error
)
self.use_missing = use_missing
self.zero_as_missing = zero_as_missing
self.tree_param = copy.deepcopy(tree_param)
self.backend = backend
self.callback_param = copy.deepcopy(callback_param)
self.multi_mode = multi_mode
def check(self):
super(HomoSecureBoostParam, self).check()
self.tree_param.check()
if not isinstance(self.use_missing, bool):
raise ValueError('use missing should be bool type')
if not isinstance(self.zero_as_missing, bool):
raise ValueError('zero as missing should be bool type')
if self.backend not in [consts.MEMORY_BACKEND, consts.DISTRIBUTED_BACKEND]:
raise ValueError('unsupported backend')
if self.multi_mode not in [consts.SINGLE_OUTPUT, consts.MULTI_OUTPUT]:
raise ValueError('unsupported multi-classification mode')
for p in ["validation_freqs", "metrics"]:
# if self._warn_to_deprecate_param(p, "", ""):
if self._deprecated_params_set.get(p):
if "callback_param" in self.get_user_feeded():
raise ValueError(f"{p} and callback param should not be set simultaneously,"
f"{self._deprecated_params_set}, {self.get_user_feeded()}")
else:
self.callback_param.callbacks = ["PerformanceEvaluate"]
break
descr = "boosting_param's"
if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
self.callback_param.validation_freqs = self.validation_freqs
if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
self.callback_param.metrics = self.metrics
if self.multi_mode not in [consts.SINGLE_OUTPUT, consts.MULTI_OUTPUT]:
raise ValueError('unsupported multi-classification mode')
if self.multi_mode == consts.MULTI_OUTPUT:
if self.task_type == consts.REGRESSION:
raise ValueError('regression tasks not support multi-output trees')
return True
__init__(self, tree_param=DecisionTreeParam(), task_type='classification', objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, bin_num=32, predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, use_missing=False, zero_as_missing=False, random_seed=100, binning_error=0.0001, backend='distributed', callback_param=CallbackParam(), multi_mode='single_output')
check(self)
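The way `check()` folds the deprecated `validation_freqs` and `metrics` settings into `callback_param` can be sketched with plain dictionaries. Everything here is an assumption made for illustration: `migrate_to_callback` is a hypothetical function, the dictionaries stand in for the param objects, and `user_fed` stands in for the set of names the user explicitly configured.

```python
from typing import Any, Dict, Iterable

# Deprecated top-level settings that now live under callback_param.
DEPRECATED_PARAMS = ("validation_freqs", "metrics")


def migrate_to_callback(boosting: Dict[str, Any], callback: Dict[str, Any],
                        user_fed: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `callback` with deprecated boosting settings folded in."""
    user_fed = set(user_fed)
    deprecated_set = [p for p in DEPRECATED_PARAMS if p in user_fed]

    # Mixing the old style with an explicit callback_param is rejected.
    if deprecated_set and "callback_param" in user_fed:
        raise ValueError(
            f"{deprecated_set[0]} and callback param should not be set simultaneously")

    merged = dict(callback)
    if deprecated_set:
        # The old settings imply that performance evaluation is wanted.
        merged["callbacks"] = ["PerformanceEvaluate"]
        for name in deprecated_set:
            merged[name] = boosting[name]
    return merged


if __name__ == "__main__":
    print(migrate_to_callback({"validation_freqs": 1, "metrics": ["auc"]},
                              {"callbacks": []},
                              ["validation_freqs"]))
```

Setting both a deprecated parameter and `callback_param` raises a `ValueError` in this sketch, mirroring the error path in the source above.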
callback_param
¶
Classes¶
CallbackParam (BaseParam)
¶Defines the callback methods used in federated ML.
Parameters¶
callbacks : list, default: []
Indicates which callback functions are desired during the training process. Accepted values: {'EarlyStopping', 'ModelCheckpoint', 'PerformanceEvaluate'}
validation_freqs : {None, int, list, tuple, set}
Validation frequency during training.
early_stopping_rounds : None or int
Training stops if one metric doesn't improve in the last early_stopping_rounds rounds.
metrics : None or list
Indicates which metrics are used when evaluation runs during training. If empty, the default metrics for the specific task type are used. For binary classification, the default metrics are ['auc', 'ks'].
use_first_metric_only : bool, default: False
Indicates whether to use only the first metric for the early stopping judgement.
save_freq : int, default: 1
The callbacks save the model every save_freq epochs.
Source code in federatedml/param/callback_param.py
class CallbackParam(BaseParam):
"""
Defines the callback methods used in federated ML.
Parameters
----------
callbacks : list, default: []
Indicates which callback functions are desired during the training process.
Accepted values: {'EarlyStopping', 'ModelCheckpoint', 'PerformanceEvaluate'}
validation_freqs: {None, int, list, tuple, set}
Validation frequency during training.
early_stopping_rounds: None or int
Training stops if one metric doesn't improve in the last early_stopping_rounds rounds.
metrics: None or list
Indicates which metrics are used when evaluation runs during training. If empty,
the default metrics for the specific task type are used. For binary classification, the default metrics are
['auc', 'ks'].
use_first_metric_only: bool, default: False
Indicates whether to use only the first metric for the early stopping judgement.
save_freq: int, default: 1
The callbacks save the model every save_freq epochs.
"""
def __init__(self, callbacks=None, validation_freqs=None, early_stopping_rounds=None,
metrics=None, use_first_metric_only=False, save_freq=1):
super(CallbackParam, self).__init__()
self.callbacks = callbacks or []
self.validation_freqs = validation_freqs
self.early_stopping_rounds = early_stopping_rounds
self.metrics = metrics or []
self.use_first_metric_only = use_first_metric_only
self.save_freq = save_freq
def check(self):
if self.early_stopping_rounds is None:
pass
elif isinstance(self.early_stopping_rounds, int):
if self.early_stopping_rounds < 1:
raise ValueError("early stopping rounds should be larger than 0 when it's integer")
if self.validation_freqs is None:
raise ValueError("validation freqs must be set when early stopping is enabled")
if self.validation_freqs is not None:
if type(self.validation_freqs).__name__ not in ["int", "list", "tuple", "set"]:
raise ValueError(
"validation strategy param's validate_freqs's type not supported ,"
" should be int or list or tuple or set"
)
if type(self.validation_freqs).__name__ == "int" and \
self.validation_freqs <= 0:
raise ValueError("validation strategy param's validate_freqs should greater than 0")
if self.metrics is not None and not isinstance(self.metrics, list):
raise ValueError("metrics should be a list")
if not isinstance(self.use_first_metric_only, bool):
raise ValueError("use_first_metric_only should be a boolean")
return True
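A minimal sketch of how the accepted `validation_freqs` types could be interpreted. It assumes that an int means "validate every n epochs" and that a list, tuple or set names the exact epochs to validate at; it also assumes epochs are counted from 1, which may differ from the actual implementation. `should_validate` is a hypothetical helper, not a FATE function.

```python
def should_validate(epoch: int, validation_freqs) -> bool:
    """Decide whether validation runs at `epoch` (1-based in this sketch)."""
    if validation_freqs is None:
        # No validation during training.
        return False

    kind = type(validation_freqs).__name__
    if kind not in ["int", "list", "tuple", "set"]:
        raise ValueError("validation_freqs should be int or list or tuple or set")

    if kind == "int":
        if validation_freqs <= 0:
            raise ValueError("validation_freqs should be greater than 0")
        # Validate every `validation_freqs` epochs.
        return epoch % validation_freqs == 0

    # Collection: validate only at the listed epochs.
    return epoch in validation_freqs


if __name__ == "__main__":
    print([e for e in range(1, 7) if should_validate(e, 2)])
    print([e for e in range(1, 7) if should_validate(e, [1, 5])])
```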
__init__(self, callbacks=None, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False, save_freq=1)
check(self)
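The effect of `early_stopping_rounds` can be sketched for a single metric. The function `stopping_round` is hypothetical and assumes that larger scores are better (as with 'auc') and that one score is recorded per validation round; FATE's own callback logic may differ in detail.

```python
from typing import List, Optional


def stopping_round(scores: List[float], early_stopping_rounds: int) -> Optional[int]:
    """Return the 0-based validation round at which training stops, or None."""
    if early_stopping_rounds < 1:
        raise ValueError("early stopping rounds should be larger than 0")

    best = float("-inf")
    rounds_without_improvement = 0
    for i, score in enumerate(scores):
        if score > best:
            # New best score: reset the patience counter.
            best = score
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= early_stopping_rounds:
                return i
    return None


if __name__ == "__main__":
    print(stopping_round([0.70, 0.72, 0.71, 0.72, 0.69], early_stopping_rounds=3))
```

In this sketch a score equal to the current best does not count as an improvement.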
column_expand_param
¶
Classes¶
ColumnExpandParam (BaseParam)
¶Define method used for expanding column
Parameters¶
append_header : None or str or List[str], default: None
Name(s) for appended feature(s). If None is given, the module outputs the original input value without any operation.
method : str, default: 'manual'
If method is 'manual', use user-specified fill_value to fill in new features.
fill_value : int or float or str or List[int] or List[float] or List[str], default: 1e-8
Used for filling expanded feature columns. If given a list, the length of the list must match that of append_header.
need_run : bool, default: True
Indicates whether this module needs to be run.
Source code in federatedml/param/column_expand_param.py
class ColumnExpandParam(BaseParam):
"""
Define method used for expanding column
Parameters
----------
append_header : None or str or List[str], default: None
Name(s) for appended feature(s). If None is given, module outputs the original input value without any operation.
method : str, default: 'manual'
If method is 'manual', use user-specified `fill_value` to fill in new features.
fill_value : int or float or str or List[int] or List[float] or List[str], default: 1e-8
Used for filling expanded feature columns. If given a list, length of the list must match that of `append_header`
need_run: bool, default: True
Indicate if this module needed to be run.
"""
def __init__(self, append_header=None, method="manual",
fill_value=consts.FLOAT_ZERO, need_run=True):
super(ColumnExpandParam, self).__init__()
self.append_header = [] if append_header is None else append_header
self.method = method
self.fill_value = fill_value
self.need_run = need_run
def check(self):
descr = "column_expand param's "
if not isinstance(self.method, str):
raise ValueError(f"{descr}method {self.method} not supported, should be str type")
else:
user_input = self.method.lower()
if user_input == "manual":
self.method = consts.MANUAL
else:
raise ValueError(f"{descr} method {user_input} not supported")
BaseParam.check_boolean(self.need_run, descr=descr)
if not isinstance(self.append_header, list):
raise ValueError(f"{descr} append_header must be None or list of str. "
f"Received {type(self.append_header)} instead.")
for feature_name in self.append_header:
BaseParam.check_string(feature_name, descr + "append_header values")
if isinstance(self.fill_value, list):
if len(self.append_header) != len(self.fill_value):
raise ValueError(
f"{descr} `fill value` is set to be list, "
f"and param `append_header` must also be list of the same length.")
else:
self.fill_value = [self.fill_value]
for value in self.fill_value:
if type(value).__name__ not in ["float", "int", "long", "str"]:
raise ValueError(
f"{descr} fill value(s) must be float, int, or str. Received type {type(value)} instead.")
LOGGER.debug("Finish column expand parameter check!")
return True
__init__(self, append_header=None, method='manual', fill_value=1e-08, need_run=True)
check(self)
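A small sketch of manual column expansion as the parameters describe it: each name in `append_header` becomes a new column filled with the matching `fill_value`. `expand_columns` is a hypothetical function working on plain lists, and broadcasting a scalar `fill_value` to every appended column is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple, Union

Value = Union[int, float, str]


def expand_columns(header: Sequence[str], rows: Sequence[Sequence[Value]],
                   append_header: Sequence[str],
                   fill_value: Union[Value, List[Value]] = 1e-8
                   ) -> Tuple[List[str], List[List[Value]]]:
    """Append constant-valued columns to a small in-memory table."""
    if not append_header:
        # Nothing to append: return the input unchanged (as copies).
        return list(header), [list(r) for r in rows]

    if isinstance(fill_value, list):
        if len(fill_value) != len(append_header):
            raise ValueError("fill_value list must match append_header in length")
        fills = list(fill_value)
    else:
        # Assumption: a scalar is reused for every appended column.
        fills = [fill_value] * len(append_header)

    new_header = list(header) + list(append_header)
    new_rows = [list(r) + fills for r in rows]
    return new_header, new_rows


if __name__ == "__main__":
    print(expand_columns(["x0"], [[1.0], [2.0]], ["x1", "x2"], [0, "na"]))
```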
cross_validation_param
¶
Classes¶
CrossValidationParam (BaseParam)
¶Define cross validation params
Parameters¶
n_splits : int, default: 5
Specify how many splits are used in KFold.
mode : str, default: 'Hetero'
Indicate the mode of the current task.
role : {'Guest', 'Host', 'Arbiter'}, default: 'Guest'
Indicate the role of the current party.
shuffle : bool, default: True
Define whether to shuffle before KFold.
random_seed : int, default: 1
Specify the random seed for numpy shuffle.
need_cv : bool, default: False
Indicate whether this module needs to be run.
output_fold_history : bool, default: True
Indicate whether to output a table of the ids used by each fold; otherwise the original input data is returned. Returned ids are formatted as: {original_id}#fold{fold_num}#{train/validate}
history_value_type : {'score', 'instance'}, default: 'score'
Indicate whether to include the original instance or the predict score in the output fold history; only effective when output_fold_history is set to True.
Source code in federatedml/param/cross_validation_param.py
class CrossValidationParam(BaseParam):
    """
    Define cross validation params

    Parameters
    ----------
    n_splits: int, default: 5
        Specify how many splits are used in KFold
    mode: str, default: 'Hetero'
        Indicate the mode of the current task
    role: {'Guest', 'Host', 'Arbiter'}, default: 'Guest'
        Indicate the role of the current party
    shuffle: bool, default: True
        Define whether to shuffle before KFold or not.
    random_seed: int, default: 1
        Specify the random seed for numpy shuffle
    need_cv: bool, default False
        Indicate whether this module needs to be run
    output_fold_history: bool, default True
        Indicate whether to output a table of ids used by each fold, else return original input data
        returned ids are formatted as: {original_id}#fold{fold_num}#{train/validate}
    history_value_type: {'score', 'instance'}, default score
        Indicate whether to include original instance or predict score in the output fold history,
        only effective when output_fold_history set to True
    """

    def __init__(self, n_splits=5, mode=consts.HETERO, role=consts.GUEST, shuffle=True, random_seed=1,
                 need_cv=False, output_fold_history=True, history_value_type="score"):
        super(CrossValidationParam, self).__init__()
        self.n_splits = n_splits
        self.mode = mode
        self.role = role
        self.shuffle = shuffle
        self.random_seed = random_seed
        # self.evaluate_param = copy.deepcopy(evaluate_param)
        self.need_cv = need_cv
        self.output_fold_history = output_fold_history
        self.history_value_type = history_value_type

    def check(self):
        model_param_descr = "cross validation param's "
        self.check_positive_integer(self.n_splits, model_param_descr)
        self.check_valid_value(self.mode, model_param_descr, valid_values=[consts.HOMO, consts.HETERO])
        self.check_valid_value(self.role, model_param_descr, valid_values=[consts.HOST, consts.GUEST, consts.ARBITER])
        self.check_boolean(self.shuffle, model_param_descr)
        self.check_boolean(self.output_fold_history, model_param_descr)
        self.history_value_type = self.check_and_change_lower(
            self.history_value_type, ["instance", "score"], model_param_descr)
        if self.random_seed is not None:
            self.check_positive_integer(self.random_seed, model_param_descr)
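The fold-history id format described in the docstring, {original_id}#fold{fold_num}#{train/validate}, can be illustrated with a small standalone sketch. It does not use FATE itself: the helper names `kfold_ids` and `fold_history_keys` are hypothetical, the shuffle uses Python's `random` module rather than numpy, and numbering folds from 0 is an assumption.

```python
import random


def kfold_ids(ids, n_splits=5, shuffle=True, random_seed=1):
    """Yield (fold_num, train_ids, validate_ids) for a plain K-fold split.

    Hypothetical helper for illustration; not the FATE implementation.
    """
    ids = list(ids)
    if shuffle:
        # A seeded generator keeps the split reproducible across runs.
        random.Random(random_seed).shuffle(ids)
    # Deal ids out round-robin so fold sizes differ by at most one.
    folds = [ids[i::n_splits] for i in range(n_splits)]
    for fold_num, validate in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != fold_num for x in fold]
        yield fold_num, train, validate


def fold_history_keys(ids, **kwargs):
    """Format every id as {original_id}#fold{fold_num}#{train/validate}."""
    keys = []
    for fold_num, train, validate in kfold_ids(ids, **kwargs):
        keys += [f"{i}#fold{fold_num}#train" for i in train]
        keys += [f"{i}#fold{fold_num}#validate" for i in validate]
    return keys


if __name__ == "__main__":
    for key in fold_history_keys(["a", "b", "c", "d"], n_splits=2):
        print(key)
```

Each original id appears once per fold, so a table of fold history has `n_splits` times as many rows as the input.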
data_split_param
Classes
DataSplitParam (BaseParam)
Define the parameters used in data split.
Parameters
random_state : None or int, default: None
    Specify the random state for shuffle.
test_size : float or int or None, default: 0.0
    Specify the test data set size. A float value specifies a fraction of the input data set; an int value specifies the exact number of data instances. The constructor default is None; 0.0 is applied only when test_size, train_size, and validate_size are all None.
train_size : float or int or None, default: 0.8
    Specify the train data set size. A float value specifies a fraction of the input data set; an int value specifies the exact number of data instances. The constructor default is None; 0.8 is applied only when all three sizes are None.
validate_size : float or int or None, default: 0.2
    Specify the validate data set size. A float value specifies a fraction of the input data set; an int value specifies the exact number of data instances. The constructor default is None; 0.2 is applied only when all three sizes are None.
stratified : bool, default: False
    Define whether sampling should be stratified according to label value.
shuffle : bool, default: True
    Define whether to shuffle before splitting or not.
split_points : None or list, default: None
    Specify the point(s) by which continuous label values are bucketed into bins for stratified split, e.g. [0.2] for two bins or [0.1, 1, 3] for 4 bins.
need_run : bool, default: True
    Specify whether to run data split.
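As a rough illustration of how split_points turns continuous labels into bins for a stratified split, the sketch below buckets a label with the standard-library `bisect` module. The function name `bucket_label` is hypothetical, and placing a label that equals a split point into the upper bin is an assumption, not something the parameter description states.

```python
from bisect import bisect_right


def bucket_label(value, split_points):
    """Return the index of the bin that `value` falls into.

    With k split points there are k + 1 bins, numbered 0..k.
    Hypothetical helper; boundary handling is an assumption.
    """
    return bisect_right(sorted(split_points), value)


if __name__ == "__main__":
    # One split point gives two bins.
    print([bucket_label(v, [0.2]) for v in (0.1, 0.5)])
    # Three split points give four bins.
    print([bucket_label(v, [0.1, 1, 3]) for v in (0.0, 0.5, 2, 5)])
```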
Source code in federatedml/param/data_split_param.py
class DataSplitParam(BaseParam):
    """
    Define data split param that used in data split.

    Parameters
    ----------
    random_state : None or int, default: None
        Specify the random state for shuffle.
    test_size : float or int or None, default: 0.0
        Specify test data set size.
        float value specifies fraction of input data set, int value specifies exact number of data instances
    train_size : float or int or None, default: 0.8
        Specify train data set size.
        float value specifies fraction of input data set, int value specifies exact number of data instances
    validate_size : float or int or None, default: 0.2
        Specify validate data set size.
        float value specifies fraction of input data set, int value specifies exact number of data instances
    stratified : bool, default: False
        Define whether sampling should be stratified, according to label value.
    shuffle : bool, default: True
        Define whether to shuffle before splitting or not.
    split_points : None or list, default : None
        Specify the point(s) by which continuous label values are bucketed into bins for stratified split.
        e.g. [0.2] for two bins or [0.1, 1, 3] for 4 bins
    need_run: bool, default: True
        Specify whether to run data split
    """

    def __init__(self, random_state=None, test_size=None, train_size=None, validate_size=None, stratified=False,
                 shuffle=True, split_points=None, need_run=True):
        super(DataSplitParam, self).__init__()
        self.random_state = random_state
        self.test_size = test_size
        self.train_size = train_size
        self.validate_size = validate_size
        self.stratified = stratified
        self.shuffle = shuffle
        self.split_points = split_points
        self.need_run = need_run

    def check(self):
        model_param_descr = "data split param's "
        if self.random_state is not None:
            if not isinstance(self.random_state, int):
                raise ValueError(f"{model_param_descr} random state should be int type")
            BaseParam.check_nonnegative_number(self.random_state, f"{model_param_descr} random_state ")
        if self.test_size is not None:
            BaseParam.check_nonnegative_number(self.test_size, f"{model_param_descr} test_size ")
            if isinstance(self.test_size, float):
                BaseParam.check_decimal_float(self.test_size, f"{model_param_descr} test_size ")
        if self.train_size is not None:
            BaseParam.check_nonnegative_number(self.train_size, f"{model_param_descr} train_size ")
            if isinstance(self.train_size, float):
                BaseParam.check_decimal_float(self.train_size, f"{model_param_descr} train_size ")
        if self.validate_size is not None:
            BaseParam.check_nonnegative_number(self.validate_size, f"{model_param_descr} validate_size ")
            if isinstance(self.validate_size, float):
                BaseParam.check_decimal_float(self.validate_size, f"{model_param_descr} validate_size ")
        # use default size values if none given
        if self.test_size is None and self.train_size is None and self.validate_size is None:
            self.test_size = 0.0
            self.train_size = 0.8
            self.validate_size = 0.2
        BaseParam.check_boolean(self.stratified, f"{model_param_descr} stratified ")
        BaseParam.check_boolean(self.shuffle, f"{model_param_descr} shuffle ")
        BaseParam.check_boolean(self.need_run, f"{model_param_descr} need run ")
        if self.split_points is not None:
            if not isinstance(self.split_points, list):
                raise ValueError(f"{model_param_descr} split_points should be list type")
        LOGGER.debug("Finish data_split parameter check!")
        return True
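The size handling in check() can be summarised with a small standalone sketch: floats are fractions of the input, ints are exact counts, and the 0.8 / 0.2 / 0.0 defaults apply only when no size is given. The function `resolve_split_sizes` is hypothetical, and treating a single missing size as 0 is a simplification made here, not the behaviour of the data split module.

```python
def resolve_split_sizes(n_instances, train_size=None, validate_size=None, test_size=None):
    """Turn float/int size settings into instance counts (train, validate, test).

    Hypothetical helper for illustration only.
    """
    # Defaults apply only when none of the three sizes is given.
    if train_size is None and validate_size is None and test_size is None:
        train_size, validate_size, test_size = 0.8, 0.2, 0.0

    def to_count(size):
        if size is None:
            # Simplifying assumption of this sketch: an unset size means 0.
            return 0
        if isinstance(size, float):
            if not 0.0 <= size <= 1.0:
                raise ValueError("a float size must be a fraction in [0, 1]")
            return int(round(size * n_instances))
        return size

    return to_count(train_size), to_count(validate_size), to_count(test_size)


if __name__ == "__main__":
    print(resolve_split_sizes(100))
    print(resolve_split_sizes(100, train_size=0.5, validate_size=30, test_size=20))
```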
data_transform_param
Classes
DataTransformParam (BaseParam)
Define the data transform parameters used in federated ml.
Parameters
input_format : {'dense', 'sparse', 'tag'}
    See the "DataTransform" section of federatedml/util/README.md. Dense input data should be set to "dense", svm-light input data to "sparse", and tag or tag:value input data to "tag".
delimitor : str
    The delimiter of the input data, default: ','
data_type : {'float64', 'float', 'int', 'int64', 'str', 'long'}
    The data type of the input data.
exclusive_data_type : dict
    Keys are column names and values are data types; used to specify a special data type for some features.
tag_with_value : bool
    Used when input_format is 'tag'. If True, each input column should be formatted as tag[delimitor]value; otherwise it is tag only.
tag_value_delimitor : str
    Used when input_format is 'tag' and tag_with_value is True; the delimiter inside each tag[delimitor]value column value.
missing_fill : bool
    Whether to fill missing values; only True/False are accepted, default: False
default_value : None or object or list
    The value used to replace missing values. If None, the default value defined in federatedml/feature/imputer.py is used; if a single object, missing values are filled with that object; if a list, its length should equal the feature dimension of the input data, and a missing value in a column is replaced by the element at the same position in the list.
missing_fill_method : None or str
    The method used to replace missing values; one of [None, 'min', 'max', 'mean', 'designated'].
missing_impute : None or list
    The values to be considered missing; list elements can be of any type. Generated automatically if None.
outlier_replace : bool
    Whether to replace outlier values; only True/False are accepted, default: False
outlier_replace_method : None or str
    The method used to replace outlier values; one of [None, 'min', 'max', 'mean', 'designated'].
outlier_impute : None or list
    The values to be regarded as outliers; list elements can be of any type.
outlier_replace_value : None or object or list
    The value used to replace outliers. If None, the default value defined in federatedml/feature/imputer.py is used; if a single object, outliers are replaced with that object; if a list, its length should equal the feature dimension of the input data, and an outlier in a column is replaced by the element at the same position in the list.
with_label : bool
    True if the input data contains a label, False otherwise. default: False
label_name : str
    Name of the column that holds the label; only used with the dense input format. default: 'y'
label_type : {'int', 'int64', 'float', 'float64', 'long', 'str'}
    Used when with_label is True.
output_format : {'dense', 'sparse'}
    The output format.
with_match_id : bool
    True if the dataset has match_id, default: False
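To make the 'tag' input format more concrete, here is a minimal sketch of how one row might be parsed depending on tag_with_value and tag_value_delimitor. It is a standalone illustration and not FATE's parser: the function `parse_tag_row` is hypothetical, and recording a bare tag as the value 1 is an assumption.

```python
def parse_tag_row(row, delimitor=",", tag_with_value=False, tag_value_delimitor=":"):
    """Parse one 'tag' formatted row into a {tag: value} dict.

    Hypothetical helper; the real component may behave differently.
    """
    features = {}
    for item in row.split(delimitor):
        if not item:
            continue  # skip empty fields such as a trailing delimiter
        if tag_with_value:
            # Each field looks like tag[tag_value_delimitor]value.
            tag, value = item.split(tag_value_delimitor, 1)
            features[tag] = float(value)
        else:
            # Tag only: presence is recorded as 1 (an assumption of this sketch).
            features[item] = 1
    return features


if __name__ == "__main__":
    print(parse_tag_row("a,b"))
    print(parse_tag_row("a:0.5,b:2", tag_with_value=True))
```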
Source code in federatedml/param/data_transform_param.py
class DataTransformParam(BaseParam):
    """
    Define data transform parameters that used in federated ml.

    Parameters
    ----------
    input_format : {'dense', 'sparse', 'tag'}
        please have a look at this tutorial at "DataTransform" section of federatedml/util/README.md.
        Formally,
            dense input format data should be set to "dense",
            svm-light input format data should be set to "sparse",
            tag or tag:value input format data should be set to "tag".
    delimitor : str
        the delimitor of data input, default: ','
    data_type : {'float64','float','int','int64','str','long'}
        the data type of data input
    exclusive_data_type : dict
        the key of dict is col_name, the value is data_type, used to specify a special data type
        for some features.
    tag_with_value: bool
        use if input_format is 'tag', if tag_with_value is True,
        input column data format should be tag[delimitor]value, otherwise is tag only
    tag_value_delimitor: str
        use if input_format is 'tag' and 'tag_with_value' is True,
        delimitor of tag[delimitor]value column value.
    missing_fill : bool
        need to fill missing value or not, accepted only True/False, default: False
    default_value : None or object or list
        the value to replace missing value.
        if None, it will use default value defined in federatedml/feature/imputer.py,
        if single object, will fill missing value with this object,
        if list, its length should be the same as the input data's feature dimension,
            meaning that if some column happens to have missing values, they are replaced
            by the element in the identical position of this list.
    missing_fill_method: None or str
        the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated']
    missing_impute: None or list
        element of list can be any type, or auto generated if value is None, define which values to be considered as missing
    outlier_replace: bool
        need to replace outlier value or not, accepted only True/False, default: False
    outlier_replace_method: None or str
        the method to replace outlier value, should be one of [None, 'min', 'max', 'mean', 'designated']
    outlier_impute: None or list
        element of list can be any type, which values should be regarded as outlier value
    outlier_replace_value: None or object or list
        the value to replace outlier.
        if None, it will use default value defined in federatedml/feature/imputer.py,
        if single object, will replace outlier with this object,
        if list, its length should be the same as the input data's feature dimension,
            meaning that if some column happens to have outliers, they are replaced
            by the element in the identical position of this list.
    with_label : bool
        True if input data consist of label, False otherwise. default: False
    label_name : str
        column_name of the column where label locates, only use in dense-inputformat. default: 'y'
    label_type : {'int','int64','float','float64','long','str'}
        use when with_label is True
    output_format : {'dense', 'sparse'}
        output format
    with_match_id: bool
        True if dataset has match_id, default: False
    """

    def __init__(self, input_format="dense", delimitor=',', data_type='float64',
                 exclusive_data_type=None,
                 tag_with_value=False, tag_value_delimitor=":",
                 missing_fill=False, default_value=0, missing_fill_method=None,
                 missing_impute=None, outlier_replace=False, outlier_replace_method=None,
                 outlier_impute=None, outlier_replace_value=0,
                 with_label=False, label_name='y',
                 label_type='int', output_format='dense', need_run=True,
                 with_match_id=False):
        self.input_format = input_format
        self.delimitor = delimitor
        self.data_type = data_type
        self.exclusive_data_type = exclusive_data_type
        self.tag_with_value = tag_with_value
        self.tag_value_delimitor = tag_value_delimitor
        self.missing_fill = missing_fill
        self.default_value = default_value
        self.missing_fill_method = missing_fill_method
        self.missing_impute = missing_impute
        self.outlier_replace = outlier_replace
        self.outlier_replace_method = outlier_replace_method
        self.outlier_impute = outlier_impute
        self.outlier_replace_value = outlier_replace_value
        self.with_label = with_label
        self.label_name = label_name
        self.label_type = label_type
        self.output_format = output_format
        self.need_run = need_run
        self.with_match_id = with_match_id

    def check(self):
        descr = "data_transform param's"
        self.input_format = self.check_and_change_lower(self.input_format,
                                                        ["dense", "sparse", "tag"],
                                                        descr)
        self.output_format = self.check_and_change_lower(self.output_format,
                                                         ["dense", "sparse"],
                                                         descr)
        self.data_type = self.check_and_change_lower(self.data_type,
                                                     ["int", "int64", "float", "float64", "str", "long"],
                                                     descr)
        if type(self.missing_fill).__name__ != 'bool':
            raise ValueError("data_transform param's missing_fill {} not supported".format(self.missing_fill))
        if self.missing_fill_method is not None:
            self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
                                                                   ['min', 'max', 'mean', 'designated'],
                                                                   descr)
        if self.outlier_replace_method is not None:
            self.outlier_replace_method = self.check_and_change_lower(self.outlier_replace_method,
                                                                      ['min', 'max', 'mean', 'designated'],
                                                                      descr)
        if type(self.with_label).__name__ != 'bool':
            raise ValueError("data_transform param's with_label {} not supported".format(self.with_label))
        if self.with_label:
            if not isinstance(self.label_name, str):
                raise ValueError("data transform param's label_name {} should be str".format(self.label_name))
            self.label_type = self.check_and_change_lower(self.label_type,
                                                          ["int", "int64", "float", "float64", "str", "long"],
                                                          descr)
        if self.exclusive_data_type is not None and not isinstance(self.exclusive_data_type, dict):
            raise ValueError("exclusive_data_type is should be None or a dict")
        if not isinstance(self.with_match_id, bool):
            raise ValueError("with_match_id should be boolean variable, but {} find".format(self.with_match_id))
        return True
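The difference between a single default_value and a per-column list can be shown with a short standalone sketch. The function `fill_missing` is hypothetical and does not call FATE; the concrete missing markers passed as missing_impute here are assumed for illustration, since the documentation only says they are generated automatically when the parameter is None.

```python
def fill_missing(row, missing_impute=("", "NA"), default_value=0):
    """Replace values listed in `missing_impute` within one row of features.

    Hypothetical helper: a list `default_value` supplies one replacement
    per column, a single object is reused for every column.
    """
    if isinstance(default_value, list):
        if len(default_value) != len(row):
            raise ValueError("default_value list must match the feature dimension")
        replacements = default_value
    else:
        replacements = [default_value] * len(row)
    # A missing value is replaced by the element at the same position.
    return [rep if value in missing_impute else value
            for value, rep in zip(row, replacements)]


if __name__ == "__main__":
    print(fill_missing([1.0, "", "NA"]))
    print(fill_missing([1.0, "", 3.0], default_value=[9, 8, 7]))
```

The same positional rule is described for outlier_replace_value, with outlier_impute taking the place of missing_impute.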
dataio_param
Classes
DataIOParam (BaseParam)
Define the dataio parameters used in federated ml.
Parameters
input_format : {'dense', 'sparse', 'tag'}
    See the "DataIO" section of federatedml/util/README.md. Dense input data should be set to "dense", svm-light input data to "sparse", and tag or tag:value input data to "tag".
delimitor : str
    The delimiter of the input data, default: ','
data_type : {'float64', 'float', 'int', 'int64', 'str', 'long'}
    The data type of the input data.
exclusive_data_type : dict
    Keys are column names and values are data types; used to specify a special data type for some features.
tag_with_value : bool
    Used when input_format is 'tag'. If True, each input column should be formatted as tag[delimitor]value; otherwise it is tag only.
tag_value_delimitor : str
    Used when input_format is 'tag' and tag_with_value is True; the delimiter inside each tag[delimitor]value column value.
missing_fill : bool
    Whether to fill missing values; only True/False are accepted, default: False
default_value : None or object or list
    The value used to replace missing values. If None, the default value defined in federatedml/feature/imputer.py is used; if a single object, missing values are filled with that object; if a list, its length should equal the feature dimension of the input data, and a missing value in a column is replaced by the element at the same position in the list.
missing_fill_method : {None, 'min', 'max', 'mean', 'designated'}
    The method used to replace missing values.
missing_impute : None or list
    The values to be considered missing; list elements can be of any type. Generated automatically if None.
outlier_replace : bool
    Whether to replace outlier values; only True/False are accepted, default: False
outlier_replace_method : {None, 'min', 'max', 'mean', 'designated'}
    The method used to replace outlier values.
outlier_impute : None or list
    The values to be regarded as outliers; list elements can be of any type, default: None
outlier_replace_value : None or object or list
    The value used to replace outliers. If None, the default value defined in federatedml/feature/imputer.py is used; if a single object, outliers are replaced with that object; if a list, its length should equal the feature dimension of the input data, and an outlier in a column is replaced by the element at the same position in the list.
with_label : bool
    True if the input data contains a label, False otherwise. default: False
label_name : str
    Name of the column that holds the label; only used with the dense input format. default: 'y'
label_type : {'int', 'int64', 'float', 'float64', 'long', 'str'}
    Used when with_label is True.
output_format : {'dense', 'sparse'}
    The output format.
Source code in federatedml/param/dataio_param.py
class DataIOParam(BaseParam):
    """
    Define dataio parameters that used in federated ml.

    Parameters
    ----------
    input_format : {'dense', 'sparse', 'tag'}
        please have a look at this tutorial at "DataIO" section of federatedml/util/README.md.
        Formally,
            dense input format data should be set to "dense",
            svm-light input format data should be set to "sparse",
            tag or tag:value input format data should be set to "tag".
    delimitor : str
        the delimitor of data input, default: ','
    data_type : {'float64', 'float', 'int', 'int64', 'str', 'long'}
        the data type of data input
    exclusive_data_type : dict
        the key of dict is col_name, the value is data_type, used to specify a special data type
        for some features.
    tag_with_value: bool
        use if input_format is 'tag', if tag_with_value is True,
        input column data format should be tag[delimitor]value, otherwise is tag only
    tag_value_delimitor: str
        use if input_format is 'tag' and 'tag_with_value' is True,
        delimitor of tag[delimitor]value column value.
    missing_fill : bool
        need to fill missing value or not, accepted only True/False, default: False
    default_value : None or object or list
        the value to replace missing value.
        if None, it will use default value defined in federatedml/feature/imputer.py,
        if single object, will fill missing value with this object,
        if list, its length should be the same as the input data's feature dimension,
            meaning that if some column happens to have missing values, they are replaced
            by the element in the identical position of this list.
    missing_fill_method : {None, 'min', 'max', 'mean', 'designated'}
        the method to replace missing value
    missing_impute: None or list
        element of list can be any type, or auto generated if value is None, define which values to be considered as missing
    outlier_replace: bool
        need to replace outlier value or not, accepted only True/False, default: False
    outlier_replace_method : {None, 'min', 'max', 'mean', 'designated'}
        the method to replace outlier value
    outlier_impute: None or list
        element of list can be any type, which values should be regarded as outlier value, default: None
    outlier_replace_value : None or object or list
        the value to replace outlier.
        if None, it will use default value defined in federatedml/feature/imputer.py,
        if single object, will replace outlier with this object,
        if list, its length should be the same as the input data's feature dimension,
            meaning that if some column happens to have outliers, they are replaced
            by the element in the identical position of this list.
    with_label : bool
        True if input data consist of label, False otherwise. default: False
    label_name : str
        column_name of the column where label locates, only use in dense-inputformat. default: 'y'
    label_type : {'int', 'int64', 'float', 'float64', 'long', 'str'}
        use when with_label is True.
    output_format : {'dense', 'sparse'}
        output format
    """

    def __init__(self, input_format="dense", delimitor=',', data_type='float64',
                 exclusive_data_type=None,
                 tag_with_value=False, tag_value_delimitor=":",
                 missing_fill=False, default_value=0, missing_fill_method=None,
                 missing_impute=None, outlier_replace=False, outlier_replace_method=None,
                 outlier_impute=None, outlier_replace_value=0,
                 with_label=False, label_name='y',
                 label_type='int', output_format='dense', need_run=True):
        self.input_format = input_format
        self.delimitor = delimitor
        self.data_type = data_type
        self.exclusive_data_type = exclusive_data_type
        self.tag_with_value = tag_with_value
        self.tag_value_delimitor = tag_value_delimitor
        self.missing_fill = missing_fill
        self.default_value = default_value
        self.missing_fill_method = missing_fill_method
        self.missing_impute = missing_impute
        self.outlier_replace = outlier_replace
        self.outlier_replace_method = outlier_replace_method
        self.outlier_impute = outlier_impute
        self.outlier_replace_value = outlier_replace_value
        self.with_label = with_label
        self.label_name = label_name
        self.label_type = label_type
        self.output_format = output_format
        self.need_run = need_run

    def check(self):
        descr = "dataio param's"
        self.input_format = self.check_and_change_lower(self.input_format,
                                                        ["dense", "sparse", "tag"],
                                                        descr)
        self.output_format = self.check_and_change_lower(self.output_format,
                                                         ["dense", "sparse"],
                                                         descr)
        self.data_type = self.check_and_change_lower(self.data_type,
                                                     ["int", "int64", "float", "float64", "str", "long"],
                                                     descr)
        if type(self.missing_fill).__name__ != 'bool':
            raise ValueError("dataio param's missing_fill {} not supported".format(self.missing_fill))
        if self.missing_fill_method is not None:
            self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
                                                                   ['min', 'max', 'mean', 'designated'],
                                                                   descr)
        if self.outlier_replace_method is not None:
            self.outlier_replace_method = self.check_and_change_lower(self.outlier_replace_method,
                                                                      ['min', 'max', 'mean', 'designated'],
                                                                      descr)
        if type(self.with_label).__name__ != 'bool':
            raise ValueError("dataio param's with_label {} not supported".format(self.with_label))
        if self.with_label:
            if not isinstance(self.label_name, str):
                raise ValueError("dataio param's label_name {} should be str".format(self.label_name))
            self.label_type = self.check_and_change_lower(self.label_type,
                                                          ["int", "int64", "float", "float64", "str", "long"],
                                                          descr)
        if self.exclusive_data_type is not None and not isinstance(self.exclusive_data_type, dict):
            raise ValueError("exclusive_data_type is should be None or a dict")
        return True
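The check() method above relies on a helper named check_and_change_lower, whose body is not shown in this section. Judging only from how it is called, it appears to validate a string against a list of lowercase options and return the lowercased value. The stand-in below is written for illustration under that assumption; it is not the BaseParam implementation, and its error messages are invented here.

```python
def check_and_change_lower(param, valid_values, descr=""):
    """Illustrative stand-in: validate case-insensitively, return lowercase.

    Not the library's code; behaviour is inferred from the call sites.
    """
    if not isinstance(param, str):
        raise ValueError(f"{descr} {param!r} should be a str")
    lowered = param.lower()
    if lowered not in valid_values:
        raise ValueError(f"{descr} {param!r} is not one of {valid_values}")
    return lowered


if __name__ == "__main__":
    # Mixed-case user input is normalised to the canonical lowercase form.
    print(check_and_change_lower("Dense", ["dense", "sparse", "tag"], "dataio param's"))
```

Under this reading, a job configuration could spell input_format as "Dense" or "DENSE" and still pass validation, while a value such as "csv" would be rejected.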
encrypt_param
Classes
EncryptParam (BaseParam)
Define the encryption method used in federated ml.
Parameters
method : {'Paillier'}
    If method is 'Paillier', Paillier encryption will be used for federated ml. To use the non-encrypted version in HomoLR, set this to None. For details of Paillier encryption, please check out the paper mentioned in the README file.
key_length : int, default: 1024
    Used to specify the length of the key in this encryption method.
Source code in federatedml/param/encrypt_param.py
class EncryptParam(BaseParam):
"""
Define encryption method that used in federated ml.
Parameters
----------
method : {'Paillier'}
If method is 'Paillier', Paillier encryption will be used for federated ml.
To use non-encryption version in HomoLR, set this to None.
For detail of Paillier encryption, please check out the paper mentioned in README file.
key_length : int, default: 1024
Used to specify the length of key in this encryption method.
"""
def __init__(self, method=consts.PAILLIER, key_length=1024):
super(EncryptParam, self).__init__()
self.method = method
self.key_length = key_length
def check(self):
if self.method is not None and type(self.method).__name__ != "str":
raise ValueError(
"encrypt_param's method {} not supported, should be str type".format(
self.method))
elif self.method is None:
pass
else:
user_input = self.method.lower()
if user_input == "paillier":
self.method = consts.PAILLIER
elif user_input == consts.ITERATIVEAFFINE.lower() or user_input == consts.RANDOM_ITERATIVEAFFINE.lower():
LOGGER.warning('Iterative Affine and Random Iterative Affine are not supported in version>=1.7.1 '
'due to safety concerns, encrypt method will be reset to Paillier')
self.method = consts.PAILLIER
else:
raise ValueError(
"encrypt_param's method {} not supported".format(user_input))
if type(self.key_length).__name__ != "int":
raise ValueError(
"encrypt_param's key_length {} not supported, should be int type".format(self.key_length))
elif self.key_length <= 0:
raise ValueError(
"encrypt_param's key_length must be greater or equal to 1")
LOGGER.debug("Finish encrypt parameter check!")
return True
__init__(self, method='Paillier', key_length=1024)
special
¶Source code in federatedml/param/encrypt_param.py
def __init__(self, method=consts.PAILLIER, key_length=1024):
super(EncryptParam, self).__init__()
self.method = method
self.key_length = key_length
check(self)
¶Source code in federatedml/param/encrypt_param.py
def check(self):
if self.method is not None and type(self.method).__name__ != "str":
raise ValueError(
"encrypt_param's method {} not supported, should be str type".format(
self.method))
elif self.method is None:
pass
else:
user_input = self.method.lower()
if user_input == "paillier":
self.method = consts.PAILLIER
elif user_input == consts.ITERATIVEAFFINE.lower() or user_input == consts.RANDOM_ITERATIVEAFFINE.lower():
LOGGER.warning('Iterative Affine and Random Iterative Affine are not supported in version>=1.7.1 '
'due to safety concerns, encrypt method will be reset to Paillier')
self.method = consts.PAILLIER
else:
raise ValueError(
"encrypt_param's method {} not supported".format(user_input))
if type(self.key_length).__name__ != "int":
raise ValueError(
"encrypt_param's key_length {} not supported, should be int type".format(self.key_length))
elif self.key_length <= 0:
raise ValueError(
"encrypt_param's key_length must be greater or equal to 1")
LOGGER.debug("Finish encrypt parameter check!")
return True
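The validation rules in check() can be tried out without a FATE installation. The following is a minimal, self-contained sketch that mirrors those branches. The helper names (normalize_encrypt_method, validate_key_length) and the constant values are stand-ins chosen for illustration and are not part of federatedml; in particular, the lowercase spellings of the deprecated methods are assumed.

```python
import logging

LOGGER = logging.getLogger(__name__)

# Assumed stand-ins for values defined in FATE's consts module.
PAILLIER = "Paillier"
DEPRECATED_METHODS = {"iterativeaffine", "randomiterativeaffine"}


def normalize_encrypt_method(method):
    """Mirror the method branch of check(): return the effective method name."""
    if method is None:
        return None  # non-encrypted variant, e.g. for HomoLR
    if type(method).__name__ != "str":
        raise ValueError("method {} not supported, should be str type".format(method))
    user_input = method.lower()
    if user_input == "paillier":
        return PAILLIER
    if user_input in DEPRECATED_METHODS:
        # Deprecated methods are not rejected; they fall back to Paillier.
        LOGGER.warning("%s is deprecated, falling back to Paillier", method)
        return PAILLIER
    raise ValueError("method {} not supported".format(user_input))


def validate_key_length(key_length):
    """Mirror the key_length branch of check(): accept positive ints only."""
    if type(key_length).__name__ != "int":
        raise ValueError("key_length {} should be int type".format(key_length))
    if key_length <= 0:
        raise ValueError("key_length must be greater than or equal to 1")
    return key_length
```

Because the type test compares the type name with "int", values such as 1024.0, "1024" and True are rejected in this sketch, while any capitalization of "paillier" is accepted.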
encrypted_mode_calculation_param
¶
Classes¶
EncryptedModeCalculatorParam (BaseParam)
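¶Define the encrypted_mode_calculator parameters.
Parameters¶
mode : {'strict', 'fast', 'balance', 'confusion_opt'} Encrypted mode, default: 'strict'. check() also accepts 'confusion_opt_balance', but every mode other than 'strict' is reset to 'strict' with a warning.
re_encrypted_rate : float or int A number in [0, 1], used when mode equals 'balance', default: 1.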
Source code in federatedml/param/encrypted_mode_calculation_param.py
class EncryptedModeCalculatorParam(BaseParam):
"""
Define the encrypted_mode_calulator parameters.
Parameters
----------
mode: {'strict', 'fast', 'balance', 'confusion_opt'}
encrypted mode, default: strict
re_encrypted_rate: float or int
numeric number in [0, 1], use when mode equals to 'balance', default: 1
"""
def __init__(self, mode="strict", re_encrypted_rate=1):
self.mode = mode
self.re_encrypted_rate = re_encrypted_rate
def check(self):
descr = "encrypted_mode_calculator param"
self.mode = self.check_and_change_lower(self.mode,
["strict", "fast", "balance", "confusion_opt", "confusion_opt_balance"],
descr)
if self.mode != "strict":
LOGGER.warning("encrypted_mode_calculator will be remove in later version, "
"but in current version user can still use it, but it only supports strict mode, "
"other mode will be reset to strict for compatibility")
self.mode = "strict"
return True
__init__(self, mode='strict', re_encrypted_rate=1)
special
¶Source code in federatedml/param/encrypted_mode_calculation_param.py
def __init__(self, mode="strict", re_encrypted_rate=1):
self.mode = mode
self.re_encrypted_rate = re_encrypted_rate
check(self)
¶Source code in federatedml/param/encrypted_mode_calculation_param.py
def check(self):
descr = "encrypted_mode_calculator param"
self.mode = self.check_and_change_lower(self.mode,
["strict", "fast", "balance", "confusion_opt", "confusion_opt_balance"],
descr)
if self.mode != "strict":
LOGGER.warning("encrypted_mode_calculator will be remove in later version, "
"but in current version user can still use it, but it only supports strict mode, "
"other mode will be reset to strict for compatibility")
self.mode = "strict"
return True
evaluation_param
¶
Classes¶
EvaluateParam (BaseParam)
¶Define the evaluation method of binary/multiple classification and regression
Parameters¶
eval_type : {'binary', 'regression', 'multi'} 'binary' is supported for HomoLR, HeteroLR and SecureBoosting; 'regression' is supported for SecureBoosting; 'multi' is not supported in this version.
unfold_multi_result : bool Unfold the multi-class result into several one-vs-rest binary classification results.
pos_label : int or float or str Specify the positive label; its type depends on the data's label. This parameter takes effect only for 'binary'.
need_run : bool, default True Indicate whether this module needs to be run.
Source code in federatedml/param/evaluation_param.py
class EvaluateParam(BaseParam):
"""
Define the evaluation method of binary/multiple classification and regression
Parameters
----------
eval_type : {'binary', 'regression', 'multi'}
support 'binary' for HomoLR, HeteroLR and Secureboosting,
support 'regression' for Secureboosting,
'multi' is not support these version
unfold_multi_result : bool
unfold multi result and get several one-vs-rest binary classification results
pos_label : int or float or str
specify positive label type, depend on the data's label. this parameter effective only for 'binary'
need_run: bool, default True
Indicate if this module needed to be run
"""
def __init__(self, eval_type="binary", pos_label=1, need_run=True, metrics=None,
run_clustering_arbiter_metric=False, unfold_multi_result=False):
super().__init__()
self.eval_type = eval_type
self.pos_label = pos_label
self.need_run = need_run
self.metrics = metrics
self.unfold_multi_result = unfold_multi_result
self.run_clustering_arbiter_metric = run_clustering_arbiter_metric
self.default_metrics = {
consts.BINARY: consts.ALL_BINARY_METRICS,
consts.MULTY: consts.ALL_MULTI_METRICS,
consts.REGRESSION: consts.ALL_REGRESSION_METRICS,
consts.CLUSTERING: consts.ALL_CLUSTER_METRICS
}
self.allowed_metrics = {
consts.BINARY: consts.ALL_BINARY_METRICS,
consts.MULTY: consts.ALL_MULTI_METRICS,
consts.REGRESSION: consts.ALL_REGRESSION_METRICS,
consts.CLUSTERING: consts.ALL_CLUSTER_METRICS
}
def _use_single_value_default_metrics(self):
self.default_metrics = {
consts.BINARY: consts.DEFAULT_BINARY_METRIC,
consts.MULTY: consts.DEFAULT_MULTI_METRIC,
consts.REGRESSION: consts.DEFAULT_REGRESSION_METRIC,
consts.CLUSTERING: consts.DEFAULT_CLUSTER_METRIC
}
def _check_valid_metric(self, metrics_list):
metric_list = consts.ALL_METRIC_NAME
alias_name: dict = consts.ALIAS
full_name_list = []
metrics_list = [str.lower(i) for i in metrics_list]
for metric in metrics_list:
if metric in metric_list:
if metric not in full_name_list:
full_name_list.append(metric)
continue
valid_flag = False
for alias, full_name in alias_name.items():
if metric in alias:
if full_name not in full_name_list:
full_name_list.append(full_name)
valid_flag = True
break
if not valid_flag:
raise ValueError('metric {} is not supported'.format(metric))
allowed_metrics = self.allowed_metrics[self.eval_type]
for m in full_name_list:
if m not in allowed_metrics:
raise ValueError('metric {} is not used for {} task'.format(m, self.eval_type))
if consts.RECALL in full_name_list and consts.PRECISION not in full_name_list:
full_name_list.append(consts.PRECISION)
if consts.RECALL not in full_name_list and consts.PRECISION in full_name_list:
full_name_list.append(consts.RECALL)
return full_name_list
def check(self):
descr = "evaluate param's "
self.eval_type = self.check_and_change_lower(self.eval_type,
[consts.BINARY, consts.MULTY, consts.REGRESSION,
consts.CLUSTERING],
descr)
if type(self.pos_label).__name__ not in ["str", "float", "int"]:
raise ValueError(
"evaluate param's pos_label {} not supported, should be str or float or int type".format(
self.pos_label))
if type(self.need_run).__name__ != "bool":
raise ValueError(
"evaluate param's need_run {} not supported, should be bool".format(
self.need_run))
if self.metrics is None or len(self.metrics) == 0:
self.metrics = self.default_metrics[self.eval_type]
LOGGER.warning('use default metric {} for eval type {}'.format(self.metrics, self.eval_type))
self.check_boolean(self.unfold_multi_result, 'multi_result_unfold')
self.metrics = self._check_valid_metric(self.metrics)
LOGGER.info("Finish evaluation parameter check!")
return True
def check_single_value_default_metric(self):
self._use_single_value_default_metrics()
# in validation strategy, psi f1-score and confusion-mat pr-quantile are not supported in cur version
if self.metrics is None or len(self.metrics) == 0:
self.metrics = self.default_metrics[self.eval_type]
LOGGER.warning('use default metric {} for eval type {}'.format(self.metrics, self.eval_type))
ban_metric = [consts.PSI, consts.F1_SCORE, consts.CONFUSION_MAT, consts.QUANTILE_PR]
for metric in self.metrics:
if metric in ban_metric:
self.metrics.remove(metric)
self.check()
__init__(self, eval_type='binary', pos_label=1, need_run=True, metrics=None, run_clustering_arbiter_metric=False, unfold_multi_result=False)
special
¶Source code in federatedml/param/evaluation_param.py
def __init__(self, eval_type="binary", pos_label=1, need_run=True, metrics=None,
run_clustering_arbiter_metric=False, unfold_multi_result=False):
super().__init__()
self.eval_type = eval_type
self.pos_label = pos_label
self.need_run = need_run
self.metrics = metrics
self.unfold_multi_result = unfold_multi_result
self.run_clustering_arbiter_metric = run_clustering_arbiter_metric
self.default_metrics = {
consts.BINARY: consts.ALL_BINARY_METRICS,
consts.MULTY: consts.ALL_MULTI_METRICS,
consts.REGRESSION: consts.ALL_REGRESSION_METRICS,
consts.CLUSTERING: consts.ALL_CLUSTER_METRICS
}
self.allowed_metrics = {
consts.BINARY: consts.ALL_BINARY_METRICS,
consts.MULTY: consts.ALL_MULTI_METRICS,
consts.REGRESSION: consts.ALL_REGRESSION_METRICS,
consts.CLUSTERING: consts.ALL_CLUSTER_METRICS
}
check(self)
¶Source code in federatedml/param/evaluation_param.py
def check(self):
descr = "evaluate param's "
self.eval_type = self.check_and_change_lower(self.eval_type,
[consts.BINARY, consts.MULTY, consts.REGRESSION,
consts.CLUSTERING],
descr)
if type(self.pos_label).__name__ not in ["str", "float", "int"]:
raise ValueError(
"evaluate param's pos_label {} not supported, should be str or float or int type".format(
self.pos_label))
if type(self.need_run).__name__ != "bool":
raise ValueError(
"evaluate param's need_run {} not supported, should be bool".format(
self.need_run))
if self.metrics is None or len(self.metrics) == 0:
self.metrics = self.default_metrics[self.eval_type]
LOGGER.warning('use default metric {} for eval type {}'.format(self.metrics, self.eval_type))
self.check_boolean(self.unfold_multi_result, 'multi_result_unfold')
self.metrics = self._check_valid_metric(self.metrics)
LOGGER.info("Finish evaluation parameter check!")
return True
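The metric handling in _check_valid_metric (lowercase the names, resolve aliases, drop duplicates, then make sure precision and recall always appear together) can be sketched as follows. The metric names and the alias table below are hypothetical placeholders; the real values come from consts.ALL_METRIC_NAME and consts.ALIAS, and the sketch assumes alias keys are tuples of alternative spellings. The per-eval_type allow-list check is omitted for brevity.

```python
# Hypothetical tables, standing in for consts.ALL_METRIC_NAME and consts.ALIAS.
ALL_METRIC_NAMES = ["auc", "ks", "precision", "recall", "accuracy"]
ALIAS = {("acc",): "accuracy", ("roc_auc",): "auc"}


def normalize_metrics(metrics):
    """Return canonical metric names in first-seen order, without duplicates."""
    resolved = []
    for metric in (m.lower() for m in metrics):
        if metric in ALL_METRIC_NAMES:
            full_name = metric
        else:
            # Look the name up among the alias groups.
            full_name = next(
                (name for aliases, name in ALIAS.items() if metric in aliases),
                None,
            )
            if full_name is None:
                raise ValueError("metric {} is not supported".format(metric))
        if full_name not in resolved:
            resolved.append(full_name)
    # Precision and recall are always reported as a pair.
    if "recall" in resolved and "precision" not in resolved:
        resolved.append("precision")
    if "precision" in resolved and "recall" not in resolved:
        resolved.append("recall")
    return resolved


print(normalize_metrics(["AUC", "acc", "recall", "auc"]))
# prints ['auc', 'accuracy', 'recall', 'precision']
```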
check_single_value_default_metric(self)
¶Source code in federatedml/param/evaluation_param.py
def check_single_value_default_metric(self):
self._use_single_value_default_metrics()
# in validation strategy, psi f1-score and confusion-mat pr-quantile are not supported in cur version
if self.metrics is None or len(self.metrics) == 0:
self.metrics = self.default_metrics[self.eval_type]
LOGGER.warning('use default metric {} for eval type {}'.format(self.metrics, self.eval_type))
ban_metric = [consts.PSI, consts.F1_SCORE, consts.CONFUSION_MAT, consts.QUANTILE_PR]
for metric in self.metrics:
if metric in ban_metric:
self.metrics.remove(metric)
self.check()
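One detail of the listing above is worth noting: the loop calls self.metrics.remove(metric) while iterating over self.metrics. In Python, removing an element during iteration shifts the remaining items, so the element directly after a removed one is skipped. The sketch below contrasts that pattern with building a filtered copy. The metric names are placeholders, not the actual values of consts.PSI and the other banned constants.

```python
# Placeholder names; the real banned metrics are defined in FATE's consts module.
BANNED = {"psi", "f1_score", "confusion_mat", "quantile_pr"}


def remove_while_iterating(metrics):
    """Same pattern as the listing: mutate the list that is being iterated."""
    metrics = list(metrics)  # work on a copy so the caller's list is untouched
    for metric in metrics:
        if metric in BANNED:
            metrics.remove(metric)
    return metrics


def filter_into_new_list(metrics):
    """Alternative: build a new list, so no element can be skipped."""
    return [metric for metric in metrics if metric not in BANNED]


sample = ["auc", "psi", "f1_score", "ks"]
print(remove_while_iterating(sample))  # prints ['auc', 'f1_score', 'ks']
print(filter_into_new_list(sample))    # prints ['auc', 'ks']
```

With two banned metrics next to each other, the in-place version keeps the second one. Whether this can happen in practice depends on the order of the configured metrics.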
feature_binning_param
¶
Classes¶
TransformParam (BaseParam)
¶Define how to transform the columns
Parameters¶
transform_cols : list of column index, default: -1 Specify which columns need to be transformed. If it is None, no columns are transformed. If it is -1, the same columns as cols in the binning module are used.
transform_names : list of string, default: None Specify by name which columns need to be transformed. Each element in the list is a column name in the header.
transform_type : {'bin_num', 'woe', None} Specify what the values of these columns are replaced with. 1. bin_num: replace the original feature value with the index of the bin it belongs to. 2. woe: valid for the guest party only; replace the original value with its WOE value. 3. None: nothing is replaced.
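To illustrate what the 'bin_num' transform type means, here is a small sketch that maps a raw value to a bin index given sorted split points. It assumes right-closed bins (a value equal to a split point falls into that split point's bin) and clamps values above the last split point into the last bin. Both conventions are assumptions made for this example, and to_bin_index is a hypothetical helper, not a FATE function.

```python
from bisect import bisect_left


def to_bin_index(value, split_points):
    """Return the index of the first split point that is >= value.

    split_points must be sorted in ascending order. Values larger than the
    last split point are clamped into the last bin (an assumption here).
    """
    index = bisect_left(split_points, value)
    return min(index, len(split_points) - 1)


split_points = [1.0, 2.5, 4.0]
print([to_bin_index(v, split_points) for v in (0.3, 2.5, 3.0, 9.9)])
# prints [0, 1, 2, 2]
```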
Source code in federatedml/param/feature_binning_param.py
class TransformParam(BaseParam):
"""
Define how to transfer the cols
Parameters
----------
transform_cols : list of column index, default: -1
Specify which columns need to be transform. If column index is None, None of columns will be transformed.
If it is -1, it will use same columns as cols in binning module.
transform_names: list of string, default: []
Specify which columns need to calculated. Each element in the list represent for a column name in header.
transform_type: {'bin_num', 'woe', None}
Specify which value these columns going to replace.
1. bin_num: Transfer original feature value to bin index in which this value belongs to.
2. woe: This is valid for guest party only. It will replace original value to its woe value
3. None: nothing will be replaced.
"""
def __init__(self, transform_cols=-1, transform_names=None, transform_type="bin_num"):
super(TransformParam, self).__init__()
self.transform_cols = transform_cols
self.transform_names = transform_names
self.transform_type = transform_type
def check(self):
descr = "Transform Param's "
if self.transform_cols is not None and self.transform_cols != -1:
self.check_defined_type(self.transform_cols, descr, ['list'])
self.check_defined_type(self.transform_names, descr, ['list', "NoneType"])
if self.transform_names is not None:
for name in self.transform_names:
if not isinstance(name, str):
raise ValueError("Elements in transform_names should be string type")
self.check_valid_value(self.transform_type, descr, ['bin_num', 'woe', None])
__init__(self, transform_cols=-1, transform_names=None, transform_type='bin_num')
special
¶Source code in federatedml/param/feature_binning_param.py
def __init__(self, transform_cols=-1, transform_names=None, transform_type="bin_num"):
super(TransformParam, self).__init__()
self.transform_cols = transform_cols
self.transform_names = transform_names
self.transform_type = transform_type
check(self)
¶Source code in federatedml/param/feature_binning_param.py
def check(self):
descr = "Transform Param's "
if self.transform_cols is not None and self.transform_cols != -1:
self.check_defined_type(self.transform_cols, descr, ['list'])
self.check_defined_type(self.transform_names, descr, ['list', "NoneType"])
if self.transform_names is not None:
for name in self.transform_names:
if not isinstance(name, str):
raise ValueError("Elements in transform_names should be string type")
self.check_valid_value(self.transform_type, descr, ['bin_num', 'woe', None])
OptimalBinningParam (BaseParam)
¶Indicate optimal binning params
Parameters¶
metric_method : str, default: "iv" The metric used by the algorithm. Supports iv, gini, ks and chi-square.
min_bin_pct : float, default: 0.05 The minimum percentage of records in each bucket.
max_bin_pct : float, default: 1.0 The maximum percentage of records in each bucket.
init_bin_nums : int, default: 1000 Number of bins at initialization.
mixture : bool, default: True Whether each bucket needs both event and non-event records.
init_bucket_method : str, default: quantile Method used for the initial bucketing. Accepts quantile and bucket.
Source code in federatedml/param/feature_binning_param.py
class OptimalBinningParam(BaseParam):
"""
Indicate optimal binning params
Parameters
----------
metric_method: str, default: "iv"
The algorithm metric method. Support iv, gini, ks, chi-square
min_bin_pct: float, default: 0.05
The minimum percentage of each bucket
max_bin_pct: float, default: 1.0
The maximum percentage of each bucket
init_bin_nums: int, default 100
Number of bins when initialize
mixture: bool, default: True
Whether each bucket need event and non-event records
init_bucket_method: str default: quantile
Init bucket methods. Accept quantile and bucket.
"""
def __init__(self, metric_method='iv', min_bin_pct=0.05, max_bin_pct=1.0,
init_bin_nums=1000, mixture=True, init_bucket_method='quantile'):
super().__init__()
self.init_bucket_method = init_bucket_method
self.metric_method = metric_method
self.max_bin = None
self.mixture = mixture
self.max_bin_pct = max_bin_pct
self.min_bin_pct = min_bin_pct
self.init_bin_nums = init_bin_nums
self.adjustment_factor = None
def check(self):
descr = "hetero binning's optimal binning param's"
self.check_string(self.metric_method, descr)
self.metric_method = self.metric_method.lower()
if self.metric_method in ['chi_square', 'chi-square']:
self.metric_method = 'chi_square'
self.check_valid_value(self.metric_method, descr, ['iv', 'gini', 'chi_square', 'ks'])
self.check_positive_integer(self.init_bin_nums, descr)
self.init_bucket_method = self.init_bucket_method.lower()
self.check_valid_value(self.init_bucket_method, descr, ['quantile', 'bucket'])
if self.max_bin_pct not in [1, 0]:
self.check_decimal_float(self.max_bin_pct, descr)
if self.min_bin_pct not in [1, 0]:
self.check_decimal_float(self.min_bin_pct, descr)
if self.min_bin_pct > self.max_bin_pct:
raise ValueError("Optimal binning's min_bin_pct should less or equal than max_bin_pct")
self.check_boolean(self.mixture, descr)
self.check_positive_integer(self.init_bin_nums, descr)
__init__(self, metric_method='iv', min_bin_pct=0.05, max_bin_pct=1.0, init_bin_nums=1000, mixture=True, init_bucket_method='quantile')
special
¶Source code in federatedml/param/feature_binning_param.py
def __init__(self, metric_method='iv', min_bin_pct=0.05, max_bin_pct=1.0,
init_bin_nums=1000, mixture=True, init_bucket_method='quantile'):
super().__init__()
self.init_bucket_method = init_bucket_method
self.metric_method = metric_method
self.max_bin = None
self.mixture = mixture
self.max_bin_pct = max_bin_pct
self.min_bin_pct = min_bin_pct
self.init_bin_nums = init_bin_nums
self.adjustment_factor = None
check(self)
¶Source code in federatedml/param/feature_binning_param.py
def check(self):
descr = "hetero binning's optimal binning param's"
self.check_string(self.metric_method, descr)
self.metric_method = self.metric_method.lower()
if self.metric_method in ['chi_square', 'chi-square']:
self.metric_method = 'chi_square'
self.check_valid_value(self.metric_method, descr, ['iv', 'gini', 'chi_square', 'ks'])
self.check_positive_integer(self.init_bin_nums, descr)
self.init_bucket_method = self.init_bucket_method.lower()
self.check_valid_value(self.init_bucket_method, descr, ['quantile', 'bucket'])
if self.max_bin_pct not in [1, 0]:
self.check_decimal_float(self.max_bin_pct, descr)
if self.min_bin_pct not in [1, 0]:
self.check_decimal_float(self.min_bin_pct, descr)
if self.min_bin_pct > self.max_bin_pct:
raise ValueError("Optimal binning's min_bin_pct should less or equal than max_bin_pct")
self.check_boolean(self.mixture, descr)
self.check_positive_integer(self.init_bin_nums, descr)
FeatureBinningParam (BaseParam)
¶Define the feature binning method
Parameters¶
method : str, 'quantile', 'bucket' or 'optimal', default: 'quantile' Binning method.
compress_thres : int, default: 10000 When the number of saved summaries exceeds this threshold, the compress function is called.
head_size : int, default: 10000 The buffer size for inserted observations. When the head list reaches this size, the QuantileSummaries object starts to generate a summary (or stats) and inserts it into its sampled list.
error : float, 0 <= error < 1, default: 0.0001 The error tolerance of binning. The final split point comes from the original data, and the rank of this value is close to the exact rank. More precisely, floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N) where p is the quantile as a float, and N is the total number of data points.
bin_num : int, bin_num > 0, default: 10 The maximum number of bins.
bin_indexes : list of int or int, default: -1 Specify which columns need to be binned. -1 represents all columns. To select specific columns, provide a list of header indexes instead of -1.
bin_names : list of string, default: None Specify by name which columns need to be binned. Each element in the list is a column name in the header.
adjustment_factor : float, default: 0.5 The adjustment factor used when calculating WOE. This is useful when a bin contains no event or no non-event records. Note that this parameter does NOT take effect when set on the host.
category_indexes : list of int or int, default: None Specify which columns are category features. -1 represents all columns; a list of int selects a set of such features. For category features, bin_obj takes the original values as split_points and treats them as already binned. If this is not what you expect, do NOT list the column in this parameter. The number of categories should not exceed the bin_num set above.
category_names : list of string, default: None Use column names to specify category features. Each element in the list is a column name in the header.
local_only : bool, default: False Whether to provide the binning method to the guest party only. If true, the host party does nothing. Warning: this parameter will be deprecated in a future version.
transform_param : TransformParam Define how to transform the binned data.
need_run : bool, default True Indicate whether this module needs to be run.
skip_static : bool, default False If true, binning does not calculate iv, woe, etc. In this case, optimal binning is not supported.
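The rank guarantee stated for the error parameter can be made concrete with a short calculation. The sketch below evaluates floor((p - 2 * error) * N) and ceil((p + 2 * error) * N) using exact fractions, so that binary floating-point rounding does not shift the result by one. rank_bounds is a helper written for this example only.

```python
import math
from fractions import Fraction


def rank_bounds(p, error, n):
    """Return (lowest, highest) rank allowed for the split point at quantile p."""
    p_exact = Fraction(str(p))          # e.g. "0.5" -> 1/2, avoids float rounding
    error_exact = Fraction(str(error))  # e.g. "0.0001" -> 1/10000
    lowest = math.floor((p_exact - 2 * error_exact) * n)
    highest = math.ceil((p_exact + 2 * error_exact) * n)
    return lowest, highest


# Median of 100000 records with error = 0.0001:
print(rank_bounds(0.5, 0.0001, 100000))  # prints (49980, 50020)
# The same quantile with a looser tolerance:
print(rank_bounds(0.5, 0.001, 100000))   # prints (49800, 50200)
```

In the first case the returned split point may be off by at most 20 ranks from the exact median.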
Source code in federatedml/param/feature_binning_param.py
class FeatureBinningParam(BaseParam):
"""
Define the feature binning method
Parameters
----------
method : str, 'quantile', 'bucket' or 'optimal', default: 'quantile'
Binning method.
compress_thres: int, default: 10000
When the number of saved summaries exceed this threshold, it will call its compress function
head_size: int, default: 10000
The buffer size to store inserted observations. When head list reach this buffer size, the
QuantileSummaries object start to generate summary(or stats) and insert into its sampled list.
error: float, 0 <= error < 1 default: 0.001
The error of tolerance of binning. The final split point comes from original data, and the rank
of this value is close to the exact rank. More precisely,
floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N)
where p is the quantile in float, and N is total number of data.
bin_num: int, bin_num > 0, default: 10
The max bin number for binning
bin_indexes : list of int or int, default: -1
Specify which columns need to be binned. -1 represent for all columns. If you need to indicate specific
cols, provide a list of header index instead of -1.
bin_names : list of string, default: []
Specify which columns need to calculated. Each element in the list represent for a column name in header.
adjustment_factor : float, default: 0.5
the adjustment factor when calculating WOE. This is useful when there is no event or non-event in
a bin. Please note that this parameter will NOT take effect for setting in host.
category_indexes : list of int or int, default: []
Specify which columns are category features. -1 represent for all columns. List of int indicate a set of
such features. For category features, bin_obj will take its original values as split_points and treat them
as have been binned. If this is not what you expect, please do NOT put it into this parameters.
The number of categories should not exceed bin_num set above.
category_names : list of string, default: []
Use column names to specify category features. Each element in the list represent for a column name in header.
local_only : bool, default: False
Whether just provide binning method to guest party. If true, host party will do nothing.
Warnings: This parameter will be deprecated in future version.
transform_param: TransformParam
Define how to transfer the binned data.
need_run: bool, default True
Indicate if this module needed to be run
skip_static: bool, default False
If true, binning will not calculate iv, woe etc. In this case, optimal-binning
will not be supported.
"""
def __init__(self, method=consts.QUANTILE,
compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
head_size=consts.DEFAULT_HEAD_SIZE,
error=consts.DEFAULT_RELATIVE_ERROR,
bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
transform_param=TransformParam(),
local_only=False,
category_indexes=None, category_names=None,
need_run=True, skip_static=False):
super(FeatureBinningParam, self).__init__()
self.method = method
self.compress_thres = compress_thres
self.head_size = head_size
self.error = error
self.adjustment_factor = adjustment_factor
self.bin_num = bin_num
self.bin_indexes = bin_indexes
self.bin_names = bin_names
self.category_indexes = category_indexes
self.category_names = category_names
self.transform_param = copy.deepcopy(transform_param)
self.need_run = need_run
self.skip_static = skip_static
self.local_only = local_only
def check(self):
descr = "Binning param's"
self.check_string(self.method, descr)
self.method = self.method.lower()
self.check_positive_integer(self.compress_thres, descr)
self.check_positive_integer(self.head_size, descr)
self.check_decimal_float(self.error, descr)
self.check_positive_integer(self.bin_num, descr)
if self.bin_indexes != -1:
self.check_defined_type(self.bin_indexes, descr, ['list', 'RepeatedScalarContainer', "NoneType"])
self.check_defined_type(self.bin_names, descr, ['list', "NoneType"])
self.check_defined_type(self.category_indexes, descr, ['list', "NoneType"])
self.check_defined_type(self.category_names, descr, ['list', "NoneType"])
self.check_open_unit_interval(self.adjustment_factor, descr)
self.check_boolean(self.local_only, descr)
__init__(self, method='quantile', compress_thres=10000, head_size=10000, error=0.0001, bin_num=10, bin_indexes=-1, bin_names=None, adjustment_factor=0.5, transform_param=TransformParam(), local_only=False, category_indexes=None, category_names=None, need_run=True, skip_static=False)
special
¶Source code in federatedml/param/feature_binning_param.py
def __init__(self, method=consts.QUANTILE,
compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
head_size=consts.DEFAULT_HEAD_SIZE,
error=consts.DEFAULT_RELATIVE_ERROR,
bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
transform_param=TransformParam(),
local_only=False,
category_indexes=None, category_names=None,
need_run=True, skip_static=False):
super(FeatureBinningParam, self).__init__()
self.method = method
self.compress_thres = compress_thres
self.head_size = head_size
self.error = error
self.adjustment_factor = adjustment_factor
self.bin_num = bin_num
self.bin_indexes = bin_indexes
self.bin_names = bin_names
self.category_indexes = category_indexes
self.category_names = category_names
self.transform_param = copy.deepcopy(transform_param)
self.need_run = need_run
self.skip_static = skip_static
self.local_only = local_only
check(self)
¶Source code in federatedml/param/feature_binning_param.py
def check(self):
descr = "Binning param's"
self.check_string(self.method, descr)
self.method = self.method.lower()
self.check_positive_integer(self.compress_thres, descr)
self.check_positive_integer(self.head_size, descr)
self.check_decimal_float(self.error, descr)
self.check_positive_integer(self.bin_num, descr)
if self.bin_indexes != -1:
self.check_defined_type(self.bin_indexes, descr, ['list', 'RepeatedScalarContainer', "NoneType"])
self.check_defined_type(self.bin_names, descr, ['list', "NoneType"])
self.check_defined_type(self.category_indexes, descr, ['list', "NoneType"])
self.check_defined_type(self.category_names, descr, ['list', "NoneType"])
self.check_open_unit_interval(self.adjustment_factor, descr)
self.check_boolean(self.local_only, descr)
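The constructor above takes transform_param=TransformParam() as a default and then stores copy.deepcopy(transform_param). Python evaluates a default argument once, when the function is defined, so without the copy every instance created with the default would share one object. The toy classes below (Inner, SharedOuter, CopiedOuter are names invented for this sketch) show the difference.

```python
import copy


class Inner:
    """Toy stand-in for a nested parameter object."""

    def __init__(self):
        self.transform_type = "bin_num"


class SharedOuter:
    def __init__(self, inner=Inner()):  # default object is created only once
        self.inner = inner              # every instance references that same object


class CopiedOuter:
    def __init__(self, inner=Inner()):
        self.inner = copy.deepcopy(inner)  # each instance gets a private copy


a, b = SharedOuter(), SharedOuter()
a.inner.transform_type = "woe"
shared_result = b.inner.transform_type  # "woe": the change leaked into b

c, d = CopiedOuter(), CopiedOuter()
c.inner.transform_type = "woe"
copied_result = d.inner.transform_type  # "bin_num": d is unaffected

print(shared_result, copied_result)  # prints woe bin_num
```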
HeteroFeatureBinningParam (FeatureBinningParam)
¶Source code in federatedml/param/feature_binning_param.py
class HeteroFeatureBinningParam(FeatureBinningParam):
def __init__(self, method=consts.QUANTILE,
compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
head_size=consts.DEFAULT_HEAD_SIZE,
error=consts.DEFAULT_RELATIVE_ERROR,
bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
transform_param=TransformParam(), optimal_binning_param=OptimalBinningParam(),
local_only=False, category_indexes=None, category_names=None,
encrypt_param=EncryptParam(),
need_run=True, skip_static=False):
super(HeteroFeatureBinningParam, self).__init__(method=method, compress_thres=compress_thres,
head_size=head_size, error=error,
bin_num=bin_num, bin_indexes=bin_indexes,
bin_names=bin_names, adjustment_factor=adjustment_factor,
transform_param=transform_param,
category_indexes=category_indexes,
category_names=category_names,
need_run=need_run, local_only=local_only,
skip_static=skip_static)
self.optimal_binning_param = copy.deepcopy(optimal_binning_param)
self.encrypt_param = encrypt_param
def check(self):
descr = "Hetero Binning param's"
super(HeteroFeatureBinningParam, self).check()
self.check_valid_value(self.method, descr, [consts.QUANTILE, consts.BUCKET, consts.OPTIMAL])
self.optimal_binning_param.check()
self.encrypt_param.check()
if self.encrypt_param.method != consts.PAILLIER:
raise ValueError("Feature Binning support Paillier encrypt method only.")
if self.skip_static and self.method == consts.OPTIMAL:
raise ValueError("When skip_static, optimal binning is not supported.")
self.transform_param.check()
if self.skip_static and self.transform_param.transform_type == 'woe':
raise ValueError("To use woe transform, skip_static should set as False")
__init__(self, method='quantile', compress_thres=10000, head_size=10000, error=0.0001, bin_num=10, bin_indexes=-1, bin_names=None, adjustment_factor=0.5, transform_param=TransformParam(), optimal_binning_param=OptimalBinningParam(), local_only=False, category_indexes=None, category_names=None, encrypt_param=EncryptParam(), need_run=True, skip_static=False)
special
¶Source code in federatedml/param/feature_binning_param.py
def __init__(self, method=consts.QUANTILE,
compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
head_size=consts.DEFAULT_HEAD_SIZE,
error=consts.DEFAULT_RELATIVE_ERROR,
bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
transform_param=TransformParam(), optimal_binning_param=OptimalBinningParam(),
local_only=False, category_indexes=None, category_names=None,
encrypt_param=EncryptParam(),
need_run=True, skip_static=False):
super(HeteroFeatureBinningParam, self).__init__(method=method, compress_thres=compress_thres,
head_size=head_size, error=error,
bin_num=bin_num, bin_indexes=bin_indexes,
bin_names=bin_names, adjustment_factor=adjustment_factor,
transform_param=transform_param,
category_indexes=category_indexes,
category_names=category_names,
need_run=need_run, local_only=local_only,
skip_static=skip_static)
self.optimal_binning_param = copy.deepcopy(optimal_binning_param)
self.encrypt_param = encrypt_param
check(self)
¶Source code in federatedml/param/feature_binning_param.py
def check(self):
descr = "Hetero Binning param's"
super(HeteroFeatureBinningParam, self).check()
self.check_valid_value(self.method, descr, [consts.QUANTILE, consts.BUCKET, consts.OPTIMAL])
self.optimal_binning_param.check()
self.encrypt_param.check()
if self.encrypt_param.method != consts.PAILLIER:
raise ValueError("Feature Binning support Paillier encrypt method only.")
if self.skip_static and self.method == consts.OPTIMAL:
raise ValueError("When skip_static, optimal binning is not supported.")
self.transform_param.check()
if self.skip_static and self.transform_param.transform_type == 'woe':
raise ValueError("To use woe transform, skip_static should set as False")
HomoFeatureBinningParam (FeatureBinningParam)
¶Source code in federatedml/param/feature_binning_param.py
class HomoFeatureBinningParam(FeatureBinningParam):
def __init__(self, method=consts.VIRTUAL_SUMMARY,
compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
head_size=consts.DEFAULT_HEAD_SIZE,
error=consts.DEFAULT_RELATIVE_ERROR,
sample_bins=100,
bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
transform_param=TransformParam(),
category_indexes=None, category_names=None,
need_run=True, skip_static=False, max_iter=100):
super(HomoFeatureBinningParam, self).__init__(method=method, compress_thres=compress_thres,
head_size=head_size, error=error,
bin_num=bin_num, bin_indexes=bin_indexes,
bin_names=bin_names, adjustment_factor=adjustment_factor,
transform_param=transform_param,
category_indexes=category_indexes, category_names=category_names,
need_run=need_run,
skip_static=skip_static)
self.sample_bins = sample_bins
self.max_iter = max_iter
def check(self):
descr = "homo binning param's"
super(HomoFeatureBinningParam, self).check()
self.check_string(self.method, descr)
self.method = self.method.lower()
self.check_valid_value(self.method, descr, [consts.VIRTUAL_SUMMARY, consts.RECURSIVE_QUERY])
self.check_positive_integer(self.max_iter, descr)
if self.max_iter > 100:
raise ValueError("Max iter is not allowed exceed 100")
__init__(self, method='virtual_summary', compress_thres=10000, head_size=10000, error=0.0001, sample_bins=100, bin_num=10, bin_indexes=-1, bin_names=None, adjustment_factor=0.5, transform_param=TransformParam(), category_indexes=None, category_names=None, need_run=True, skip_static=False, max_iter=100)
special
¶Source code in federatedml/param/feature_binning_param.py
def __init__(self, method=consts.VIRTUAL_SUMMARY,
compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
head_size=consts.DEFAULT_HEAD_SIZE,
error=consts.DEFAULT_RELATIVE_ERROR,
sample_bins=100,
bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
transform_param=TransformParam(),
category_indexes=None, category_names=None,
need_run=True, skip_static=False, max_iter=100):
super(HomoFeatureBinningParam, self).__init__(method=method, compress_thres=compress_thres,
head_size=head_size, error=error,
bin_num=bin_num, bin_indexes=bin_indexes,
bin_names=bin_names, adjustment_factor=adjustment_factor,
transform_param=transform_param,
category_indexes=category_indexes, category_names=category_names,
need_run=need_run,
skip_static=skip_static)
self.sample_bins = sample_bins
self.max_iter = max_iter
check(self)
¶Source code in federatedml/param/feature_binning_param.py
def check(self):
descr = "homo binning param's"
super(HomoFeatureBinningParam, self).check()
self.check_string(self.method, descr)
self.method = self.method.lower()
self.check_valid_value(self.method, descr, [consts.VIRTUAL_SUMMARY, consts.RECURSIVE_QUERY])
self.check_positive_integer(self.max_iter, descr)
if self.max_iter > 100:
raise ValueError("Max iter is not allowed exceed 100")
feature_imputation_param
¶
Classes¶
FeatureImputationParam (BaseParam)
¶Define feature imputation parameters
Parameters¶
default_value : None, a single object, or a list
The value used to replace missing values. If None, the default value defined in federatedml/feature/imputer.py is used. If a single object, every missing value is filled with that object. If a list, its length must equal the feature dimension of the input data; a missing value in a column is replaced by the element at the same position in the list.
missing_fill_method : None, 'min', 'max', 'mean', or 'designated'
The method used to replace missing values.
col_missing_fill_method : None or dict of (column name, missing_fill_method) pairs
Specifies the method used to replace missing values for each column. Any column not specified uses missing_fill_method; if missing_fill_method is None, unspecified columns are not imputed.
missing_impute : None or list, default: None
Defines which values are considered missing. List elements can be of any type; if None, the values are generated automatically.
need_run : bool, default: True
Whether to run this component.
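The interaction between missing_fill_method and col_missing_fill_method can be sketched as follows. This is a minimal standalone illustration of the precedence rule described above, not the imputer's actual code; the helper name resolve_fill_methods and the column names are hypothetical.

```python
from typing import Dict, List, Optional


def resolve_fill_methods(
    columns: List[str],
    missing_fill_method: Optional[str] = None,
    col_missing_fill_method: Optional[Dict[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Return the fill method that applies to each column (None = not imputed)."""
    per_column = col_missing_fill_method or {}
    # A per-column entry wins; otherwise fall back to the global method.
    return {col: per_column.get(col, missing_fill_method) for col in columns}


methods = resolve_fill_methods(
    ["x0", "x1", "x2"],
    missing_fill_method=None,
    col_missing_fill_method={"x0": "mean", "x2": "max"},
)
# x1 is not listed and the global method is None, so x1 is left unimputed.
print(methods)
```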
Source code in federatedml/param/feature_imputation_param.py
class FeatureImputationParam(BaseParam):
"""
Define feature imputation parameters
Parameters
----------
default_value : None or single object type or list
the value to replace missing value.
if None, it will use default value defined in federatedml/feature/imputer.py,
if single object, will fill missing value with this object,
if list, it's length should be the same as input data' feature dimension,
means that if some column happens to have missing values, it will replace it
the value by element in the identical position of this list.
missing_fill_method : [None, 'min', 'max', 'mean', 'designated']
the method to replace missing value
col_missing_fill_method: None or dict of (column name, missing_fill_method) pairs
specifies method to replace missing value for each column;
any column not specified will take missing_fill_method,
if missing_fill_method is None, unspecified column will not be imputed;
missing_impute : None or list
element of list can be any type, or auto generated if value is None, define which values to be consider as missing, default: None
need_run: bool, default True
need run or not
"""
def __init__(self, default_value=0, missing_fill_method=None, col_missing_fill_method=None,
missing_impute=None, need_run=True):
self.default_value = default_value
self.missing_fill_method = missing_fill_method
self.col_missing_fill_method = col_missing_fill_method
self.missing_impute = missing_impute
self.need_run = need_run
def check(self):
descr = "feature imputation param's "
self.check_boolean(self.need_run, descr + "need_run")
if self.missing_fill_method is not None:
self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
['min', 'max', 'mean', 'designated'],
f"{descr}missing_fill_method ")
if self.col_missing_fill_method:
if not isinstance(self.col_missing_fill_method, dict):
raise ValueError(f"{descr}col_missing_fill_method should be a dict")
for k, v in self.col_missing_fill_method.items():
if not isinstance(k, str):
raise ValueError(f"{descr}col_missing_fill_method should contain str key(s) only")
v = self.check_and_change_lower(v,
['min', 'max', 'mean', 'designated'],
f"per column method specified in {descr} col_missing_fill_method dict")
self.col_missing_fill_method[k] = v
if self.missing_impute:
if not isinstance(self.missing_impute, list):
raise ValueError(f"{descr}missing_impute must be None or list.")
return True
__init__(self, default_value=0, missing_fill_method=None, col_missing_fill_method=None, missing_impute=None, need_run=True)
special
¶Source code in federatedml/param/feature_imputation_param.py
def __init__(self, default_value=0, missing_fill_method=None, col_missing_fill_method=None,
missing_impute=None, need_run=True):
self.default_value = default_value
self.missing_fill_method = missing_fill_method
self.col_missing_fill_method = col_missing_fill_method
self.missing_impute = missing_impute
self.need_run = need_run
check(self)
¶Source code in federatedml/param/feature_imputation_param.py
def check(self):
descr = "feature imputation param's "
self.check_boolean(self.need_run, descr + "need_run")
if self.missing_fill_method is not None:
self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
['min', 'max', 'mean', 'designated'],
f"{descr}missing_fill_method ")
if self.col_missing_fill_method:
if not isinstance(self.col_missing_fill_method, dict):
raise ValueError(f"{descr}col_missing_fill_method should be a dict")
for k, v in self.col_missing_fill_method.items():
if not isinstance(k, str):
raise ValueError(f"{descr}col_missing_fill_method should contain str key(s) only")
v = self.check_and_change_lower(v,
['min', 'max', 'mean', 'designated'],
f"per column method specified in {descr} col_missing_fill_method dict")
self.col_missing_fill_method[k] = v
if self.missing_impute:
if not isinstance(self.missing_impute, list):
raise ValueError(f"{descr}missing_impute must be None or list.")
return True
feature_selection_param
¶
deprecated_param_list
¶Classes¶
UniqueValueParam (BaseParam)
¶Use the difference between max-value and min-value to judge.
Parameters¶
eps : float, default: 1e-5
A column is filtered out if the difference between its max-value and min-value is smaller than eps.
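As a rough illustration of the rule above, the sketch below drops every column whose value range is below eps. It works on plain Python lists rather than federated tables, and the helper name unique_value_filter is an assumption made for this example.

```python
from typing import Dict, List


def unique_value_filter(columns: Dict[str, List[float]], eps: float = 1e-5) -> List[str]:
    """Return the names of the columns that survive the filter."""
    kept = []
    for name, values in columns.items():
        # A column whose value range is below eps is treated as constant.
        if max(values) - min(values) >= eps:
            kept.append(name)
    return kept


data = {"x0": [1.0, 1.0, 1.0], "x1": [0.2, 0.9, 0.5]}
kept = unique_value_filter(data)
print(kept)
```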
Source code in federatedml/param/feature_selection_param.py
class UniqueValueParam(BaseParam):
"""
Use the difference between max-value and min-value to judge.
Parameters
----------
eps : float, default: 1e-5
The column(s) will be filtered if its difference is smaller than eps.
"""
def __init__(self, eps=1e-5):
self.eps = eps
def check(self):
descr = "Unique value param's"
self.check_positive_number(self.eps, descr)
return True
__init__(self, eps=1e-05)
special
¶Source code in federatedml/param/feature_selection_param.py
def __init__(self, eps=1e-5):
self.eps = eps
check(self)
¶Source code in federatedml/param/feature_selection_param.py
def check(self):
descr = "Unique value param's"
self.check_positive_number(self.eps, descr)
return True
IVValueSelectionParam (BaseParam)
¶Use information values to select features.
Parameters¶
value_threshold : float, default: 0.0
Used if the iv_value_thres method is used in feature selection.
host_thresholds : list of float or None, default: None
Thresholds for the individual hosts. If None, each host uses the same threshold as the guest. If provided, the order must match the host id setting.
Source code in federatedml/param/feature_selection_param.py
class IVValueSelectionParam(BaseParam):
"""
Use information values to select features.
Parameters
----------
value_threshold: float, default: 1.0
Used if iv_value_thres method is used in feature selection.
host_thresholds: List of float or None, default: None
Set threshold for different host. If None, use same threshold as guest. If provided, the order should map with
the host id setting.
"""
def __init__(self, value_threshold=0.0, host_thresholds=None, local_only=False):
super().__init__()
self.value_threshold = value_threshold
self.host_thresholds = host_thresholds
self.local_only = local_only
def check(self):
if not isinstance(self.value_threshold, (float, int)):
raise ValueError("IV selection param's value_threshold should be float or int")
if self.host_thresholds is not None:
if not isinstance(self.host_thresholds, list):
raise ValueError("IV selection param's host_threshold should be list or None")
if not isinstance(self.local_only, bool):
raise ValueError("IV selection param's local_only should be bool")
return True
__init__(self, value_threshold=0.0, host_thresholds=None, local_only=False)
special
¶Source code in federatedml/param/feature_selection_param.py
def __init__(self, value_threshold=0.0, host_thresholds=None, local_only=False):
super().__init__()
self.value_threshold = value_threshold
self.host_thresholds = host_thresholds
self.local_only = local_only
check(self)
¶Source code in federatedml/param/feature_selection_param.py
def check(self):
if not isinstance(self.value_threshold, (float, int)):
raise ValueError("IV selection param's value_threshold should be float or int")
if self.host_thresholds is not None:
if not isinstance(self.host_thresholds, list):
raise ValueError("IV selection param's host_threshold should be list or None")
if not isinstance(self.local_only, bool):
raise ValueError("IV selection param's local_only should be bool")
return True
IVPercentileSelectionParam (BaseParam)
¶Use information values to select features.
Parameters¶
percentile_threshold : float, default: 1.0
Percentile threshold for the iv_percentile method. Must satisfy 0 <= percentile_threshold <= 1.0.
Source code in federatedml/param/feature_selection_param.py
class IVPercentileSelectionParam(BaseParam):
"""
Use information values to select features.
Parameters
----------
percentile_threshold: float
0 <= percentile_threshold <= 1.0, default: 1.0, Percentile threshold for iv_percentile method
"""
def __init__(self, percentile_threshold=1.0, local_only=False):
super().__init__()
self.percentile_threshold = percentile_threshold
self.local_only = local_only
def check(self):
descr = "IV selection param's"
if self.percentile_threshold != 0 or self.percentile_threshold != 1:
self.check_decimal_float(self.percentile_threshold, descr)
self.check_boolean(self.local_only, descr)
return True
__init__(self, percentile_threshold=1.0, local_only=False)
special
¶Source code in federatedml/param/feature_selection_param.py
def __init__(self, percentile_threshold=1.0, local_only=False):
super().__init__()
self.percentile_threshold = percentile_threshold
self.local_only = local_only
check(self)
¶Source code in federatedml/param/feature_selection_param.py
def check(self):
descr = "IV selection param's"
if self.percentile_threshold != 0 or self.percentile_threshold != 1:
self.check_decimal_float(self.percentile_threshold, descr)
self.check_boolean(self.local_only, descr)
return True
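One detail of the check above is worth spelling out: no value equals both 0 and 1, so the condition joined with `or` holds for every input and check_decimal_float is always called. The sketch below compares that expression with an `and` variant; the function names are hypothetical, and how check_decimal_float itself treats 0 and 1 is not shown in this section.

```python
def or_guard(percentile_threshold):
    # Same boolean expression as in check(): at least one of the two
    # inequalities holds for any value, so this is always True.
    return percentile_threshold != 0 or percentile_threshold != 1


def and_guard(percentile_threshold):
    # Hypothetical alternative that is False exactly for the values 0 and 1.
    return percentile_threshold != 0 and percentile_threshold != 1


results = {v: (or_guard(v), and_guard(v)) for v in (0, 0.5, 1)}
print(results)
```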
IVTopKParam (BaseParam)
¶Use information values to select features.
Parameters¶
k : int, default: 10
The number of features with the highest information values to select. Must be greater than 0.
Source code in federatedml/param/feature_selection_param.py
class IVTopKParam(BaseParam):
"""
Use information values to select features.
Parameters
----------
k: int
should be greater than 0, default: 10, Percentile threshold for iv_percentile method
"""
def __init__(self, k=10, local_only=False):
super().__init__()
self.k = k
self.local_only = local_only
def check(self):
descr = "IV selection param's"
self.check_positive_integer(self.k, descr)
self.check_boolean(self.local_only, descr)
return True
__init__(self, k=10, local_only=False)
special
¶Source code in federatedml/param/feature_selection_param.py
def __init__(self, k=10, local_only=False):
super().__init__()
self.k = k
self.local_only = local_only
check(self)
¶Source code in federatedml/param/feature_selection_param.py
def check(self):
descr = "IV selection param's"
self.check_positive_integer(self.k, descr)
self.check_boolean(self.local_only, descr)
return True
VarianceOfCoeSelectionParam (BaseParam)
¶Use coefficient of variation to select features. When judging, the absolute value will be used.
Parameters¶
value_threshold : float, default: 1.0
Used if the coefficient_of_variation_value_thres method is used in feature selection. Columns whose coefficient of variation is smaller than the threshold are filtered out.
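A minimal sketch of this filter is given below, assuming the coefficient of variation is the population standard deviation divided by the mean; the definition used by the library may differ (for example, a sample standard deviation). The helper names are hypothetical.

```python
import statistics
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """Population standard deviation divided by the mean (assumed definition)."""
    # Note: a column with mean 0 would raise ZeroDivisionError in this sketch.
    return statistics.pstdev(values) / statistics.mean(values)


def cv_filter(columns: Dict[str, List[float]], value_threshold: float = 1.0) -> List[str]:
    """Keep columns whose absolute coefficient of variation reaches the threshold."""
    kept = []
    for name, values in columns.items():
        if abs(coefficient_of_variation(values)) >= value_threshold:
            kept.append(name)
    return kept


data = {"flat": [10.0, 10.5, 9.5], "spread": [0.0, 0.0, 0.0, 8.0]}
kept = cv_filter(data)
print(kept)
```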
Source code in federatedml/param/feature_selection_param.py
class VarianceOfCoeSelectionParam(BaseParam):
"""
Use coefficient of variation to select features. When judging, the absolute value will be used.
Parameters
----------
value_threshold: float, default: 1.0
Used if coefficient_of_variation_value_thres method is used in feature selection. Filter those
columns who has smaller coefficient of variance than the threshold.
"""
def __init__(self, value_threshold=1.0):
self.value_threshold = value_threshold
def check(self):
descr = "Coff of Variances param's"
self.check_positive_number(self.value_threshold, descr)
return True
__init__(self, value_threshold=1.0)
special
¶Source code in federatedml/param/feature_selection_param.py
def __init__(self, value_threshold=1.0):
self.value_threshold = value_threshold
check(self)
¶Source code in federatedml/param/feature_selection_param.py
def check(self):
descr = "Coff of Variances param's"
self.check_positive_number(self.value_threshold, descr)
return True
OutlierColsSelectionParam (BaseParam)
¶Given a percentile and a threshold, check whether the value at that quantile point is larger than the threshold, and filter out the columns for which it is.
Parameters¶
percentile : float in [0., 1.], default: 1.0
The percentile point to compare.
upper_threshold : float, default: 1.0
The threshold that the value at the percentile point is compared against.
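The rule can be sketched as follows. The quantile here is computed with a simple nearest-rank method on local data, which is an assumption made for this example; the library computes quantiles with its own summary structures, and the helper names are hypothetical.

```python
import math
from typing import Dict, List


def quantile_point(values: List[float], percentile: float) -> float:
    """Nearest-rank quantile of a list (assumed method for this sketch)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered)))
    return ordered[rank - 1]


def outlier_cols_filter(
    columns: Dict[str, List[float]],
    percentile: float = 1.0,
    upper_threshold: float = 1.0,
) -> List[str]:
    """Keep columns whose quantile point does not exceed upper_threshold."""
    return [
        name
        for name, values in columns.items()
        if quantile_point(values, percentile) <= upper_threshold
    ]


data = {"x0": [0.1, 0.4, 0.9], "x1": [0.2, 0.5, 3.0]}
kept_at_max = outlier_cols_filter(data, percentile=1.0, upper_threshold=1.0)
kept_at_median = outlier_cols_filter(data, percentile=0.5, upper_threshold=1.0)
print(kept_at_max, kept_at_median)
```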
Source code in federatedml/param/feature_selection_param.py
class OutlierColsSelectionParam(BaseParam):
"""
Given percentile and threshold. Judge if this quantile point is larger than threshold. Filter those larger ones.
Parameters
----------
percentile: float, [0., 1.] default: 1.0
The percentile points to compare.
upper_threshold: float, default: 1.0
Percentile threshold for coefficient_of_variation_percentile method
"""
def __init__(self, percentile=1.0, upper_threshold=1.0):
self.percentile = percentile
self.upper_threshold = upper_threshold
def check(self):
descr = "Outlier Filter param's"
self.check_decimal_float(self.percentile, descr)
self.check_defined_type(self.upper_threshold, descr, ['float', 'int'])
return True
__init__(self, percentile=1.0, upper_threshold=1.0)
special
¶Source code in federatedml/param/feature_selection_param.py
def __init__(self, percentile=1.0, upper_threshold=1.0):
self.percentile = percentile
self.upper_threshold = upper_threshold
check(self)
¶Source code in federatedml/param/feature_selection_param.py
def check(self):
descr = "Outlier Filter param's"
self.check_decimal_float(self.percentile, descr)
self.check_defined_type(self.upper_threshold, descr, ['float', 'int'])
return True
CommonFilterParam (BaseParam)
¶All of the following parameters can be set to a single value or to a list of values. A single value means that only one metric is used to filter, while a list means that multiple metrics are used.
Please note that if any of the following parameters is set as a list, every list-valued parameter must have the same length as metrics; otherwise an error is raised. If any parameter is a list, metrics must be a list as well.
Parameters¶
metrics : str or list, default: depends on the specific filter
Indicates which metrics are used in this filter.
filter_type : str, default: threshold
Should be one of "threshold", "top_k" or "top_percentile".
take_high : bool, default: True
Whether to take the highest values when filtering.
threshold : float or int, default: 1
If filter_type is "threshold", this is the threshold value. If it is "top_k", this is the k value. If it is "top_percentile", this is the percentile threshold.
host_thresholds : list of float, list of list of float, or None, default: None
Thresholds for the individual hosts. If None, each host uses the same threshold as the guest. If provided, the order must match the host id setting.
select_federated : bool, default: True
Whether to select features jointly with the other parties or based on local variables only.
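The single-value versus list rule can be illustrated with a standalone function modelled on the _convert_to_list method in the source listing for this class. It is a sketch for illustration only: convert_to_list is a free function written for this example, and "metric_b" is a placeholder name, not a metric known to be supported.

```python
from typing import Any, Dict, List, Tuple


def convert_to_list(metrics: Any, **params: Any) -> Tuple[List[Any], Dict[str, List[Any]]]:
    """Normalize metrics and its companion parameters to equal-length lists."""
    if not isinstance(metrics, list):
        for name, value in params.items():
            if isinstance(value, list):
                raise ValueError(f"{name} should not be a list when metrics is not a list")
        return [metrics], {name: [value] for name, value in params.items()}

    normalized = {}
    for name, value in params.items():
        if isinstance(value, list):
            if len(value) != len(metrics):
                raise ValueError(f"{name} should have the same length as metrics")
            normalized[name] = value
        else:
            # A scalar is repeated once per metric.
            normalized[name] = [value] * len(metrics)
    return metrics, normalized


metrics, normalized = convert_to_list(
    ["iv", "metric_b"],
    filter_type=["threshold", "top_k"],
    threshold=[0.1, 5],
    take_high=True,
)
print(normalized)
```

Here take_high is given as a single value and is repeated for both metrics, while a list of the wrong length or a list combined with a non-list metrics raises ValueError.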
Source code in federatedml/param/feature_selection_param.py
class CommonFilterParam(BaseParam):
"""
All of the following parameters can set with a single value or a list of those values.
When setting one single value, it means using only one metric to filter while
a list represent for using multiple metrics.
Please note that if some of the following values has been set as list, all of them
should have same length. Otherwise, error will be raised. And if there exist a list
type parameter, the metrics should be in list type.
Parameters
----------
metrics: str or list, default: depends on the specific filter
Indicate what metrics are used in this filter
filter_type: str, default: threshold
Should be one of "threshold", "top_k" or "top_percentile"
take_high: bool, default: True
When filtering, taking highest values or not.
threshold: float or int, default: 1
If filter type is threshold, this is the threshold value.
If it is "top_k", this is the k value.
If it is top_percentile, this is the percentile threshold.
host_thresholds: List of float or List of List of float or None, default: None
Set threshold for different host. If None, use same threshold as guest. If provided, the order should map with
the host id setting.
select_federated: bool, default: True
Whether select federated with other parties or based on local variables
"""
def __init__(self, metrics, filter_type='threshold', take_high=True, threshold=1,
host_thresholds=None, select_federated=True):
super().__init__()
self.metrics = metrics
self.filter_type = filter_type
self.take_high = take_high
self.threshold = threshold
self.host_thresholds = host_thresholds
self.select_federated = select_federated
def check(self):
self._convert_to_list(param_names=["filter_type", "take_high",
"threshold", "select_federated"])
for v in self.filter_type:
if v not in ["threshold", "top_k", "top_percentile"]:
raise ValueError('filter_type should be one of '
'"threshold", "top_k", "top_percentile"')
descr = "hetero feature selection param's"
for v in self.take_high:
self.check_boolean(v, descr)
for idx, v in enumerate(self.threshold):
if self.filter_type[idx] == "threshold":
if not isinstance(v, (float, int)):
raise ValueError(descr + f"{v} should be a float or int")
elif self.filter_type[idx] == 'top_k':
self.check_positive_integer(v, descr)
else:
if not (v == 0 or v == 1):
self.check_decimal_float(v, descr)
if self.host_thresholds is not None:
if not isinstance(self.host_thresholds, list):
raise ValueError("IV selection param's host_threshold should be list or None")
assert isinstance(self.select_federated, list)
for v in self.select_federated:
self.check_boolean(v, descr)
def _convert_to_list(self, param_names):
    if not isinstance(self.metrics, list):
        for value_name in param_names:
            v = getattr(self, value_name)
            if isinstance(v, list):
                raise ValueError(f"{value_name}: {v} should not be a list when "
                                 f"metrics: {self.metrics} is not a list")
            setattr(self, value_name, [v])
        setattr(self, "metrics", [self.metrics])
    else:
        expected_length = len(self.metrics)
        for value_name in param_names:
            v = getattr(self, value_name)
            if isinstance(v, list):
                if len(v) != expected_length:
                    raise ValueError(f"The parameter {v} should have same length "
                                     f"with metrics")
            else:
                new_v = [v] * expected_length
                setattr(self, value_name, new_v)
__init__(self, metrics, filter_type='threshold', take_high=True, threshold=1, host_thresholds=None, select_federated=True)
special
¶Source code in federatedml/param/feature_selection_param.py
def __init__(self, metrics, filter_type='threshold', take_high=True, threshold=1,
host_thresholds=None, select_federated=True):
super().__init__()
self.metrics = metrics
self.filter_type = filter_type
self.take_high = take_high
self.threshold = threshold
self.host_thresholds = host_thresholds
self.select_federated = select_federated
check(self)
¶Source code in federatedml/param/feature_selection_param.py
def check(self):
self._convert_to_list(param_names=["filter_type", "take_high",
"threshold", "select_federated"])
for v in self.filter_type:
if v not in ["threshold", "top_k", "top_percentile"]:
raise ValueError('filter_type should be one of '
'"threshold", "top_k", "top_percentile"')
descr = "hetero feature selection param's"
for v in self.take_high:
self.check_boolean(v, descr)
for idx, v in enumerate(self.threshold):
if self.filter_type[idx] == "threshold":
if not isinstance(v, (float, int)):
raise ValueError(descr + f"{v} should be a float or int")
elif self.filter_type[idx] == 'top_k':
self.check_positive_integer(v, descr)
else:
if not (v == 0 or v == 1):
self.check_decimal_float(v, descr)
if self.host_thresholds is not None:
if not isinstance(self.host_thresholds, list):
raise ValueError("IV selection param's host_threshold should be list or None")
assert isinstance(self.select_federated, list)
for v in self.select_federated:
self.check_boolean(v, descr)
IVFilterParam (CommonFilterParam)
¶Parameters¶
mul_class_merge_type : str or list, default: "average"
Indicates how to merge multi-class IV results. Supports "average", "min" and "max".
Source code in federatedml/param/feature_selection_param.py
class IVFilterParam(CommonFilterParam):
"""
Parameters
----------
mul_class_merge_type: str or list, default: "average"
Indicate how to merge multi-class iv results. Support "average", "min" and "max".
"""
def __init__(self, filter_type='threshold', threshold=1,
host_thresholds=None, select_federated=True, mul_class_merge_type="average"):
super().__init__(metrics='iv', filter_type=filter_type, take_high=True, threshold=threshold,
host_thresholds=host_thresholds, select_federated=select_federated)
self.mul_class_merge_type = mul_class_merge_type
def check(self):
super(IVFilterParam, self).check()
self._convert_to_list(param_names=["mul_class_merge_type"])
__init__(self, filter_type='threshold', threshold=1, host_thresholds=None, select_federated=True, mul_class_merge_type='average')
special
¶Source code in federatedml/param/feature_selection_param.py
def __init__(self, filter_type='threshold', threshold=1,
host_thresholds=None, select_federated=True, mul_class_merge_type="average"):
super().__init__(metrics='iv', filter_type=filter_type, take_high=True, threshold=threshold,
host_thresholds=host_thresholds, select_federated=select_federated)
self.mul_class_merge_type = mul_class_merge_type
check(self)
¶Source code in federatedml/param/feature_selection_param.py
def check(self):
super(IVFilterParam, self).check()
self._convert_to_list(param_names=["mul_class_merge_type"])
CorrelationFilterParam (BaseParam)
¶This filter follows these rules:
1. Sort all the columns from high to low based on a specific metric, e.g. iv.
2. Traverse the sorted columns. For each column, any other column whose absolute correlation with it is larger than the threshold is filtered out.
Parameters¶
sort_metric : str, default: iv
Specifies which metric is used to sort the features.
threshold : float or int, default: 0.1
Correlation threshold.
select_federated : bool, default: True
Whether to select features jointly with the other parties or based on local variables only.
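The two rules can be sketched on local data as follows. This is an illustrative reading of the description, not the library's implementation: the function name, the dictionary-based correlation lookup, and the choice that an already removed column cannot remove others are all assumptions.

```python
from typing import Dict, FrozenSet, List


def correlation_filter(
    sort_values: Dict[str, float],
    corr: Dict[FrozenSet[str], float],
    threshold: float = 0.1,
) -> List[str]:
    """Return the surviving columns, ordered from high to low sort metric."""
    # Rule 1: sort columns from high to low by the sort metric (e.g. iv).
    ordered = sorted(sort_values, key=sort_values.get, reverse=True)
    removed = set()
    # Rule 2: each surviving column removes the others it correlates with.
    for col in ordered:
        if col in removed:
            continue
        for other in ordered:
            if other == col or other in removed:
                continue
            if abs(corr[frozenset((col, other))]) > threshold:
                removed.add(other)
    return [col for col in ordered if col not in removed]


iv = {"a": 0.5, "b": 0.3, "c": 0.1}
corr = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.05,
    frozenset(("b", "c")): -0.02,
}
kept = correlation_filter(iv, corr, threshold=0.1)
print(kept)
```

In this example column b is removed because its correlation with the higher-ranked column a exceeds the threshold, while c is kept.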
Source code in federatedml/param/feature_selection_param.py
class CorrelationFilterParam(BaseParam):
"""
This filter follow this specific rules:
1. Sort all the columns from high to low based on specific metric, eg. iv.
2. Traverse each sorted column. If there exists other columns with whom the
absolute values of correlation are larger than threshold, they will be filtered.
Parameters
----------
sort_metric: str, default: iv
Specify which metric to be used to sort features.
threshold: float or int, default: 0.1
Correlation threshold
select_federated: bool, default: True
Whether select federated with other parties or based on local variables
"""
def __init__(self, sort_metric='iv', threshold=0.1, select_federated=True):
super().__init__()
self.sort_metric = sort_metric
self.threshold = threshold
self.select_federated = select_federated
def check(self):
descr = "Correlation Filter param's"
self.sort_metric = self.sort_metric.lower()
support_metrics = ['iv']
if self.sort_metric not in support_metrics:
raise ValueError(f"sort_metric in Correlation Filter should be one of {support_metrics}")
self.check_positive_number(self.threshold, descr)
__init__(self, sort_metric='iv', threshold=0.1, select_federated=True)
special
¶Source code in federatedml/param/feature_selection_param.py
def __init__(self, sort_metric='iv', threshold=0.1, select_federated=True):
super().__init__()
self.sort_metric = sort_metric
self.threshold = threshold
self.select_federated = select_federated
check(self)
¶Source code in federatedml/param/feature_selection_param.py
def check(self):
descr = "Correlation Filter param's"
self.sort_metric = self.sort_metric.lower()
support_metrics = ['iv']
if self.sort_metric not in support_metrics:
raise ValueError(f"sort_metric in Correlation Filter should be one of {support_metrics}")
self.check_positive_number(self.threshold, descr)
PercentageValueParam (BaseParam)
¶Filter the columns that have a value that exceeds a certain percentage.
Parameters¶
upper_pct : float in [0.1, 1.], default: 1.0
The upper percentage threshold for filtering. upper_pct must not be less than 0.1.
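One plausible reading of this filter is that a column is dropped when its most frequent value accounts for more than upper_pct of the entries. The sketch below implements that reading on plain lists; both the interpretation and the helper name percentage_value_filter are assumptions, not confirmed by the source.

```python
from collections import Counter
from typing import Dict, List


def percentage_value_filter(columns: Dict[str, List[float]], upper_pct: float = 1.0) -> List[str]:
    """Keep columns whose most frequent value does not exceed upper_pct of the rows."""
    kept = []
    for name, values in columns.items():
        most_common_count = Counter(values).most_common(1)[0][1]
        if most_common_count / len(values) <= upper_pct:
            kept.append(name)
    return kept


data = {"x0": [0, 0, 0, 0, 1], "x1": [1, 2, 3, 4, 5]}
kept = percentage_value_filter(data, upper_pct=0.7)
print(kept)
```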
Source code in federatedml/param/feature_selection_param.py
class PercentageValueParam(BaseParam):
"""
Filter the columns that have a value that exceeds a certain percentage.
Parameters
----------
upper_pct: float, [0.1, 1.], default: 1.0
The upper percentage threshold for filtering, upper_pct should not be less than 0.1.
"""
def __init__(self, upper_pct=1.0):
super().__init__()
self.upper_pct = upper_pct
def check(self):
descr = "Percentage Filter param's"
if self.upper_pct not in [0, 1]:
self.check_decimal_float(self.upper_pct, descr)
if self.upper_pct < consts.PERCENTAGE_VALUE_LIMIT:
raise ValueError(descr + f" {self.upper_pct} not supported,"
f" should not be smaller than {consts.PERCENTAGE_VALUE_LIMIT}")
return True
__init__(self, upper_pct=1.0)
special
¶Source code in federatedml/param/feature_selection_param.py
def __init__(self, upper_pct=1.0):
super().__init__()
self.upper_pct = upper_pct
check(self)
¶Source code in federatedml/param/feature_selection_param.py
def check(self):
descr = "Percentage Filter param's"
if self.upper_pct not in [0, 1]:
self.check_decimal_float(self.upper_pct, descr)
if self.upper_pct < consts.PERCENTAGE_VALUE_LIMIT:
raise ValueError(descr + f" {self.upper_pct} not supported,"
f" should not be smaller than {consts.PERCENTAGE_VALUE_LIMIT}")
return True
ManuallyFilterParam (BaseParam)
¶Specify the columns that need to be filtered. If a specified column exists, it is filtered directly; otherwise it is ignored.
Both the filter_out and the left parameters apply to this specific filter only. For instance, if you set some columns as left in this filter but those columns are filtered out by other filters, they will NOT remain in the final result.
Please note that (left_col_indexes & left_col_names) cannot be used together with (filter_out_indexes & filter_out_names).
Parameters¶
filter_out_indexes : list of int, default: None
Specify the indexes of the columns to be filtered out.
filter_out_names : list of string, default: None
Specify the names of the columns to be filtered out.
left_col_indexes : list of int, default: None
Specify the indexes of the columns to be kept.
left_col_names : list of string, default: None
Specify the names of the columns to be kept.
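The mutual-exclusion rule between the two parameter groups can be sketched as below. This is a simplified standalone version that tests each argument against None individually, which is not exactly how the check method in the source listing combines them; the function name check_manual_filter is hypothetical.

```python
def check_manual_filter(filter_out_indexes=None, filter_out_names=None,
                        left_col_indexes=None, left_col_names=None):
    """Raise ValueError if both the filter_out group and the left group are used."""
    uses_filter_out = filter_out_indexes is not None or filter_out_names is not None
    uses_left = left_col_indexes is not None or left_col_names is not None
    if uses_filter_out and uses_left:
        raise ValueError("(left_col_indexes & left_col_names) cannot be used with "
                         "(filter_out_indexes & filter_out_names) simultaneously")
    return True


# Using only one group is accepted.
ok_filter_out = check_manual_filter(filter_out_names=["x1"], filter_out_indexes=[0])
ok_left = check_manual_filter(left_col_names=["x2"])

# Mixing the two groups is rejected.
try:
    check_manual_filter(filter_out_names=["x1"], left_col_names=["x2"])
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
print(ok_filter_out, ok_left, mixed_rejected)
```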
Source code in federatedml/param/feature_selection_param.py
class ManuallyFilterParam(BaseParam):
"""
Specified columns that need to be filtered. If exist, it will be filtered directly, otherwise, ignore it.
Both Filter_out or left parameters only works for this specific filter. For instances, if you set some columns left
in this filter but those columns are filtered by other filters, those columns will NOT left in final.
Please note that (left_col_indexes & left_col_names) cannot use with (filter_out_indexes & filter_out_names) simultaneously.
Parameters
----------
filter_out_indexes: list of int, default: None
Specify columns' indexes to be filtered out
filter_out_names : list of string, default: None
Specify columns' names to be filtered out
left_col_indexes: list of int, default: None
Specify left_col_index
left_col_names: list of string, default: None
Specify left col names
"""
def __init__(self, filter_out_indexes=None, filter_out_names=None, left_col_indexes=None,
left_col_names=None):
super().__init__()
self.filter_out_indexes = filter_out_indexes
self.filter_out_names = filter_out_names
self.left_col_indexes = left_col_indexes
self.left_col_names = left_col_names
def check(self):
descr = "Manually Filter param's"
self.check_defined_type(self.filter_out_indexes, descr, ['list', 'NoneType'])
self.check_defined_type(self.filter_out_names, descr, ['list', 'NoneType'])
self.check_defined_type(self.left_col_indexes, descr, ['list', 'NoneType'])
self.check_defined_type(self.left_col_names, descr, ['list', 'NoneType'])
if (self.filter_out_indexes or self.filter_out_names) is not None and \
(self.left_col_names or self.left_col_indexes) is not None:
raise ValueError("(left_col_indexes & left_col_names) cannot use with"
" (filter_out_indexes & filter_out_names) simultaneously")
return True
__init__(self, filter_out_indexes=None, filter_out_names=None, left_col_indexes=None, left_col_names=None)
special
Source code in federatedml/param/feature_selection_param.py
def __init__(self, filter_out_indexes=None, filter_out_names=None, left_col_indexes=None,
             left_col_names=None):
    super().__init__()
    self.filter_out_indexes = filter_out_indexes
    self.filter_out_names = filter_out_names
    self.left_col_indexes = left_col_indexes
    self.left_col_names = left_col_names
check(self)
Source code in federatedml/param/feature_selection_param.py
def check(self):
    descr = "Manually Filter param's"
    self.check_defined_type(self.filter_out_indexes, descr, ['list', 'NoneType'])
    self.check_defined_type(self.filter_out_names, descr, ['list', 'NoneType'])
    self.check_defined_type(self.left_col_indexes, descr, ['list', 'NoneType'])
    self.check_defined_type(self.left_col_names, descr, ['list', 'NoneType'])
    if (self.filter_out_indexes or self.filter_out_names) is not None and \
            (self.left_col_names or self.left_col_indexes) is not None:
        raise ValueError("(left_col_indexes & left_col_names) cannot use with"
                         " (filter_out_indexes & filter_out_names) simultaneously")
    return True
FeatureSelectionParam (BaseParam)
Define the feature selection parameters.
Parameters
select_col_indexes : list or int, default: -1
Specify which columns need to be calculated. -1 represents all columns.
select_names : list of string, default: []
Specify which columns need to be calculated. Each element in the list represents a column name in the header.
filter_methods : list of ["manually", "iv_filter", "statistic_filter", "psi_filter", "hetero_sbt_filter", "homo_sbt_filter", "hetero_fast_sbt_filter", "percentage_value", "vif_filter", "correlation_filter"], default: ["manually"]
The following methods will be deprecated in a future version: "unique_value", "iv_value_thres", "iv_percentile", "coefficient_of_variation_value_thres", "outlier_cols"
Specify the filter methods used in feature selection. Filters are applied in the order given in this list. Please note that if a percentile method is used after another filter method, the percentile is taken as a ratio of the remaining features.
e.g. If you have 10 features at the beginning and 8 remain after the first filter method, asking for the top 80% highest-iv features selects floor(0.8 * 8) = 6 features instead of floor(0.8 * 10) = 8.
unique_param : UniqueValueParam
Filter out a column if all values in this feature are the same.
iv_value_param : IVValueSelectionParam
Use information value to filter columns. If this method is set, a float threshold needs to be provided. Columns whose iv is smaller than the threshold are filtered out. Will be deprecated in the future.
iv_percentile_param : IVPercentileSelectionParam
Use information value to filter columns. If this method is set, a float ratio threshold needs to be provided. Pick the floor(ratio * feature_num) features with the highest iv. If multiple features around the threshold have the same iv, all of those columns will be kept. Will be deprecated in the future.
variance_coe_param : VarianceOfCoeSelectionParam
Use the coefficient of variation to decide whether a column is filtered out. Will be deprecated in the future.
outlier_param : OutlierColsSelectionParam
Filter out columns whose value at a certain percentile is larger than a threshold. Will be deprecated in the future.
percentage_value_param : PercentageValueParam
Filter out columns in which a single value accounts for more than a certain percentage.
iv_param : IVFilterParam
Set how to filter based on iv. It supports take-high mode only. All of "threshold", "top_k" and "top_percentile" are accepted. See CommonFilterParam for more details. To use this filter, the hetero-feature-binning module has to be provided.
statistic_param : CommonFilterParam
Set how to filter based on statistic values. All of "threshold", "top_k" and "top_percentile" are accepted. See CommonFilterParam for more details. To use this filter, the data_statistic module has to be provided.
psi_param : CommonFilterParam
Set how to filter based on psi values. All of "threshold", "top_k" and "top_percentile" are accepted. Its take_high property should be False so that features with lower psi are chosen. See CommonFilterParam for more details. To use this filter, the data_statistic module has to be provided.
need_run : bool, default True
Indicate whether this module needs to be run.
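The percentile arithmetic in the filter_methods description can be checked with a short sketch. The helper `top_percentile` and the IV values below are hypothetical and only illustrate the floor(ratio * remaining) rule; they are not FATE code, and tie handling is deliberately left out.

```python
import math


def top_percentile(scores, ratio):
    """Keep the floor(ratio * n) highest-scoring names out of the n remaining features."""
    keep = math.floor(ratio * len(scores))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


# Hypothetical IV values for the 8 features that survived a first filter
# (the job started with 10 features).
iv = {"x0": 0.9, "x1": 0.7, "x2": 0.5, "x3": 0.4,
      "x4": 0.3, "x5": 0.2, "x6": 0.1, "x7": 0.05}

selected = top_percentile(iv, 0.8)
print(len(selected), selected)  # 6 features: floor(0.8 * 8), not floor(0.8 * 10)
```

The ratio is applied to the features that remain at that point in the chain, which is why the order of filter_methods matters.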
Source code in federatedml/param/feature_selection_param.py
class FeatureSelectionParam(BaseParam):
"""
Define the feature selection parameters.
Parameters
----------
select_col_indexes: list or int, default: -1
Specify which columns need to be calculated. -1 represents all columns.
select_names : list of string, default: []
Specify which columns need to be calculated. Each element in the list represents a column name in the header.
filter_methods: list of ["manually", "iv_filter", "statistic_filter", "psi_filter", "hetero_sbt_filter", "homo_sbt_filter", "hetero_fast_sbt_filter", "percentage_value", "vif_filter", "correlation_filter"], default: ["manually"]
The following methods will be deprecated in a future version:
"unique_value", "iv_value_thres", "iv_percentile",
"coefficient_of_variation_value_thres", "outlier_cols"
Specify the filter methods used in feature selection. Filters are applied in the order given in this list.
Please note that if a percentile method is used after another filter method,
the percentile is taken as a ratio of the remaining features.
e.g. If you have 10 features at the beginning and 8 remain after the first filter method, asking for the
top 80% highest-iv features selects floor(0.8 * 8) = 6 features instead of floor(0.8 * 10) = 8.
unique_param: UniqueValueParam
Filter out a column if all values in this feature are the same.
iv_value_param: IVValueSelectionParam
Use information value to filter columns. If this method is set, a float threshold needs to be provided.
Columns whose iv is smaller than the threshold are filtered out. Will be deprecated in the future.
iv_percentile_param: IVPercentileSelectionParam
Use information value to filter columns. If this method is set, a float ratio threshold
needs to be provided. Pick the floor(ratio * feature_num) features with the highest iv. If multiple features
around the threshold have the same iv, all of those columns will be kept. Will be deprecated in the future.
variance_coe_param: VarianceOfCoeSelectionParam
Use the coefficient of variation to decide whether a column is filtered out.
Will be deprecated in the future.
outlier_param: OutlierColsSelectionParam
Filter out columns whose value at a certain percentile is larger than a threshold.
Will be deprecated in the future.
percentage_value_param: PercentageValueParam
Filter out columns in which a single value accounts for more than a certain percentage.
iv_param: IVFilterParam
Set how to filter based on iv. It supports take-high mode only. All of "threshold",
"top_k" and "top_percentile" are accepted. See CommonFilterParam for more details. To
use this filter, the hetero-feature-binning module has to be provided.
statistic_param: CommonFilterParam
Set how to filter based on statistic values. All of "threshold",
"top_k" and "top_percentile" are accepted. See CommonFilterParam for more details.
To use this filter, the data_statistic module has to be provided.
psi_param: CommonFilterParam
Set how to filter based on psi values. All of "threshold",
"top_k" and "top_percentile" are accepted. Its take_high property should be False
so that features with lower psi are chosen. See CommonFilterParam for more details.
To use this filter, the data_statistic module has to be provided.
need_run: bool, default True
Indicate whether this module needs to be run.
"""
def __init__(self, select_col_indexes=-1, select_names=None, filter_methods=None,
unique_param=UniqueValueParam(),
iv_value_param=IVValueSelectionParam(),
iv_percentile_param=IVPercentileSelectionParam(),
iv_top_k_param=IVTopKParam(),
variance_coe_param=VarianceOfCoeSelectionParam(),
outlier_param=OutlierColsSelectionParam(),
manually_param=ManuallyFilterParam(),
percentage_value_param=PercentageValueParam(),
iv_param=IVFilterParam(),
statistic_param=CommonFilterParam(metrics=consts.MEAN),
psi_param=CommonFilterParam(metrics=consts.PSI,
take_high=False),
vif_param=CommonFilterParam(metrics=consts.VIF,
threshold=5.0,
take_high=False),
sbt_param=CommonFilterParam(metrics=consts.FEATURE_IMPORTANCE),
correlation_param=CorrelationFilterParam(),
need_run=True
):
super(FeatureSelectionParam, self).__init__()
self.correlation_param = correlation_param
self.vif_param = vif_param
self.select_col_indexes = select_col_indexes
if select_names is None:
self.select_names = []
else:
self.select_names = select_names
if filter_methods is None:
self.filter_methods = [consts.MANUALLY_FILTER]
else:
self.filter_methods = filter_methods
# deprecate in the future
self.unique_param = copy.deepcopy(unique_param)
self.iv_value_param = copy.deepcopy(iv_value_param)
self.iv_percentile_param = copy.deepcopy(iv_percentile_param)
self.iv_top_k_param = copy.deepcopy(iv_top_k_param)
self.variance_coe_param = copy.deepcopy(variance_coe_param)
self.outlier_param = copy.deepcopy(outlier_param)
self.percentage_value_param = copy.deepcopy(percentage_value_param)
self.manually_param = copy.deepcopy(manually_param)
self.iv_param = copy.deepcopy(iv_param)
self.statistic_param = copy.deepcopy(statistic_param)
self.psi_param = copy.deepcopy(psi_param)
self.sbt_param = copy.deepcopy(sbt_param)
self.need_run = need_run
def check(self):
descr = "hetero feature selection param's"
self.check_defined_type(self.filter_methods, descr, ['list'])
for idx, method in enumerate(self.filter_methods):
method = method.lower()
self.check_valid_value(method, descr, [consts.UNIQUE_VALUE, consts.IV_VALUE_THRES, consts.IV_PERCENTILE,
consts.COEFFICIENT_OF_VARIATION_VALUE_THRES, consts.OUTLIER_COLS,
consts.MANUALLY_FILTER, consts.PERCENTAGE_VALUE,
consts.IV_FILTER, consts.STATISTIC_FILTER, consts.IV_TOP_K,
consts.PSI_FILTER, consts.HETERO_SBT_FILTER,
consts.HOMO_SBT_FILTER, consts.HETERO_FAST_SBT_FILTER,
consts.VIF_FILTER, consts.CORRELATION_FILTER])
self.filter_methods[idx] = method
self.check_defined_type(self.select_col_indexes, descr, ['list', 'int'])
self.unique_param.check()
self.iv_value_param.check()
self.iv_percentile_param.check()
self.iv_top_k_param.check()
self.variance_coe_param.check()
self.outlier_param.check()
self.manually_param.check()
self.percentage_value_param.check()
self.iv_param.check()
for th in self.iv_param.take_high:
if not th:
raise ValueError("Iv filter should take higher iv features")
for m in self.iv_param.metrics:
if m != consts.IV:
raise ValueError("For iv filter, metrics should be 'iv'")
self.statistic_param.check()
self.psi_param.check()
for th in self.psi_param.take_high:
if th:
raise ValueError("PSI filter should take lower psi features")
for m in self.psi_param.metrics:
if m != consts.PSI:
raise ValueError("For psi filter, metrics should be 'psi'")
self.sbt_param.check()
for th in self.sbt_param.take_high:
if not th:
raise ValueError("SBT filter should take higher feature_importance features")
for m in self.sbt_param.metrics:
if m != consts.FEATURE_IMPORTANCE:
raise ValueError("For SBT filter, metrics should be 'feature_importance'")
self.vif_param.check()
for m in self.vif_param.metrics:
if m != consts.VIF:
raise ValueError("For VIF filter, metrics should be 'vif'")
self.correlation_param.check()
self._warn_to_deprecate_param("iv_value_param", descr, "iv_param")
self._warn_to_deprecate_param("iv_percentile_param", descr, "iv_param")
self._warn_to_deprecate_param("iv_top_k_param", descr, "iv_param")
self._warn_to_deprecate_param("variance_coe_param", descr, "statistic_param")
self._warn_to_deprecate_param("unique_param", descr, "statistic_param")
self._warn_to_deprecate_param("outlier_param", descr, "statistic_param")
__init__(self, select_col_indexes=-1, select_names=None, filter_methods=None, unique_param=UniqueValueParam(), iv_value_param=IVValueSelectionParam(), iv_percentile_param=IVPercentileSelectionParam(), iv_top_k_param=IVTopKParam(), variance_coe_param=VarianceOfCoeSelectionParam(), outlier_param=OutlierColsSelectionParam(), manually_param=ManuallyFilterParam(), percentage_value_param=PercentageValueParam(), iv_param=IVFilterParam(), statistic_param=CommonFilterParam(metrics=consts.MEAN), psi_param=CommonFilterParam(metrics=consts.PSI, take_high=False), vif_param=CommonFilterParam(metrics=consts.VIF, threshold=5.0, take_high=False), sbt_param=CommonFilterParam(metrics=consts.FEATURE_IMPORTANCE), correlation_param=CorrelationFilterParam(), need_run=True)
special
Source code in federatedml/param/feature_selection_param.py
def __init__(self, select_col_indexes=-1, select_names=None, filter_methods=None,
unique_param=UniqueValueParam(),
iv_value_param=IVValueSelectionParam(),
iv_percentile_param=IVPercentileSelectionParam(),
iv_top_k_param=IVTopKParam(),
variance_coe_param=VarianceOfCoeSelectionParam(),
outlier_param=OutlierColsSelectionParam(),
manually_param=ManuallyFilterParam(),
percentage_value_param=PercentageValueParam(),
iv_param=IVFilterParam(),
statistic_param=CommonFilterParam(metrics=consts.MEAN),
psi_param=CommonFilterParam(metrics=consts.PSI,
take_high=False),
vif_param=CommonFilterParam(metrics=consts.VIF,
threshold=5.0,
take_high=False),
sbt_param=CommonFilterParam(metrics=consts.FEATURE_IMPORTANCE),
correlation_param=CorrelationFilterParam(),
need_run=True
):
super(FeatureSelectionParam, self).__init__()
self.correlation_param = correlation_param
self.vif_param = vif_param
self.select_col_indexes = select_col_indexes
if select_names is None:
self.select_names = []
else:
self.select_names = select_names
if filter_methods is None:
self.filter_methods = [consts.MANUALLY_FILTER]
else:
self.filter_methods = filter_methods
# deprecate in the future
self.unique_param = copy.deepcopy(unique_param)
self.iv_value_param = copy.deepcopy(iv_value_param)
self.iv_percentile_param = copy.deepcopy(iv_percentile_param)
self.iv_top_k_param = copy.deepcopy(iv_top_k_param)
self.variance_coe_param = copy.deepcopy(variance_coe_param)
self.outlier_param = copy.deepcopy(outlier_param)
self.percentage_value_param = copy.deepcopy(percentage_value_param)
self.manually_param = copy.deepcopy(manually_param)
self.iv_param = copy.deepcopy(iv_param)
self.statistic_param = copy.deepcopy(statistic_param)
self.psi_param = copy.deepcopy(psi_param)
self.sbt_param = copy.deepcopy(sbt_param)
self.need_run = need_run
check(self)
¶Source code in federatedml/param/feature_selection_param.py
def check(self):
descr = "hetero feature selection param's"
self.check_defined_type(self.filter_methods, descr, ['list'])
for idx, method in enumerate(self.filter_methods):
method = method.lower()
self.check_valid_value(method, descr, [consts.UNIQUE_VALUE, consts.IV_VALUE_THRES, consts.IV_PERCENTILE,
consts.COEFFICIENT_OF_VARIATION_VALUE_THRES, consts.OUTLIER_COLS,
consts.MANUALLY_FILTER, consts.PERCENTAGE_VALUE,
consts.IV_FILTER, consts.STATISTIC_FILTER, consts.IV_TOP_K,
consts.PSI_FILTER, consts.HETERO_SBT_FILTER,
consts.HOMO_SBT_FILTER, consts.HETERO_FAST_SBT_FILTER,
consts.VIF_FILTER, consts.CORRELATION_FILTER])
self.filter_methods[idx] = method
self.check_defined_type(self.select_col_indexes, descr, ['list', 'int'])
self.unique_param.check()
self.iv_value_param.check()
self.iv_percentile_param.check()
self.iv_top_k_param.check()
self.variance_coe_param.check()
self.outlier_param.check()
self.manually_param.check()
self.percentage_value_param.check()
self.iv_param.check()
for th in self.iv_param.take_high:
if not th:
raise ValueError("Iv filter should take higher iv features")
for m in self.iv_param.metrics:
if m != consts.IV:
raise ValueError("For iv filter, metrics should be 'iv'")
self.statistic_param.check()
self.psi_param.check()
for th in self.psi_param.take_high:
if th:
raise ValueError("PSI filter should take lower psi features")
for m in self.psi_param.metrics:
if m != consts.PSI:
raise ValueError("For psi filter, metrics should be 'psi'")
self.sbt_param.check()
for th in self.sbt_param.take_high:
if not th:
raise ValueError("SBT filter should take higher feature_importance features")
for m in self.sbt_param.metrics:
if m != consts.FEATURE_IMPORTANCE:
raise ValueError("For SBT filter, metrics should be 'feature_importance'")
self.vif_param.check()
for m in self.vif_param.metrics:
if m != consts.VIF:
raise ValueError("For VIF filter, metrics should be 'vif'")
self.correlation_param.check()
self._warn_to_deprecate_param("iv_value_param", descr, "iv_param")
self._warn_to_deprecate_param("iv_percentile_param", descr, "iv_param")
self._warn_to_deprecate_param("iv_top_k_param", descr, "iv_param")
self._warn_to_deprecate_param("variance_coe_param", descr, "statistic_param")
self._warn_to_deprecate_param("unique_param", descr, "statistic_param")
self._warn_to_deprecate_param("outlier_param", descr, "statistic_param")
feldman_verifiable_sum_param
¶
Classes¶
FeldmanVerifiableSumParam (BaseParam)
Define how to transfer the cols
Parameters
sum_cols : list of column index, default: None
Specify which columns need to be summed. If it is None, every column will be summed.
q_n : int, non-negative integer less than or equal to 16, default: 6
Number of significant decimal digits. If the data type is float, the maximum number of significant digits is 16, so the number of integer digits plus the number of significant decimal digits should be less than or equal to 16.
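One way to read q_n is as a fixed-point precision: a float is scaled by 10**q_n and rounded to an integer before summation, then scaled back afterwards. The sketch below illustrates that reading only; the helper names `to_fixed_point` and `from_fixed_point` are hypothetical, and the assumption that values are encoded this way is not confirmed by this page.

```python
def to_fixed_point(value, q_n=6):
    """Scale a float to an integer that keeps q_n decimal digits (illustrative only)."""
    if not isinstance(q_n, int) or not 0 <= q_n <= 16:
        raise ValueError(f"q_n {q_n} not supported, should be an int in [0, 16]")
    return int(round(value * 10 ** q_n))


def from_fixed_point(encoded, q_n=6):
    """Undo the scaling applied by to_fixed_point."""
    return encoded / 10 ** q_n


# Sum two values as integers with 2 decimal digits, then decode the result.
parts = [to_fixed_point(0.25, q_n=2), to_fixed_point(1.5, q_n=2)]
print(parts, from_fixed_point(sum(parts), q_n=2))
```

Because a double carries about 16 significant digits, a larger q_n leaves fewer digits for the integer part, which matches the constraint stated for q_n.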
Source code in federatedml/param/feldman_verifiable_sum_param.py
class FeldmanVerifiableSumParam(BaseParam):
    """
    Define how to transfer the cols

    Parameters
    ----------
    sum_cols : list of column index, default: None
        Specify which columns need to be summed. If it is None, every column will be summed.
    q_n : int, non-negative integer less than or equal to 16, default: 6
        Number of significant decimal digits. If the data type is float,
        the maximum number of significant digits is 16, so the number of integer digits plus
        the number of significant decimal digits should be less than or equal to 16.
    """

    def __init__(self, sum_cols=None, q_n=6):
        self.sum_cols = sum_cols
        if sum_cols is None:
            self.sum_cols = []
        self.q_n = q_n

    def check(self):
        if isinstance(self.sum_cols, list):
            for idx in self.sum_cols:
                if not isinstance(idx, int):
                    raise ValueError(f"type mismatch, column_indexes with element {idx}(type is {type(idx)})")
        if not isinstance(self.q_n, int):
            # The type information belongs inside the f-string, not as a second argument.
            raise ValueError(f"Init param's q_n {self.q_n} not supported, should be int type, type is {type(self.q_n)}")
        if self.q_n < 0:
            raise ValueError(f"param's q_n {self.q_n} not supported, should be non-negative int value")
        elif self.q_n > 16:
            raise ValueError(f"param's q_n {self.q_n} not supported, should be less than or equal to 16")
__init__(self, sum_cols=None, q_n=6)
special
Source code in federatedml/param/feldman_verifiable_sum_param.py
def __init__(self, sum_cols=None, q_n=6):
    self.sum_cols = sum_cols
    if sum_cols is None:
        self.sum_cols = []
    self.q_n = q_n
check(self)
Source code in federatedml/param/feldman_verifiable_sum_param.py
def check(self):
    if isinstance(self.sum_cols, list):
        for idx in self.sum_cols:
            if not isinstance(idx, int):
                raise ValueError(f"type mismatch, column_indexes with element {idx}(type is {type(idx)})")
    if not isinstance(self.q_n, int):
        # The type information belongs inside the f-string, not as a second argument.
        raise ValueError(f"Init param's q_n {self.q_n} not supported, should be int type, type is {type(self.q_n)}")
    if self.q_n < 0:
        raise ValueError(f"param's q_n {self.q_n} not supported, should be non-negative int value")
    elif self.q_n > 16:
        raise ValueError(f"param's q_n {self.q_n} not supported, should be less than or equal to 16")
ftl_param
¶
deprecated_param_list
¶Classes¶
FTLParam (BaseParam)
¶Source code in federatedml/param/ftl_param.py
class FTLParam(BaseParam):
def __init__(self, alpha=1, tol=0.000001,
n_iter_no_change=False, validation_freqs=None, optimizer={'optimizer': 'Adam', 'learning_rate': 0.01},
nn_define={}, epochs=1, intersect_param=IntersectParam(consts.RSA), config_type='keras', batch_size=-1,
encrypte_param=EncryptParam(),
encrypted_mode_calculator_param=EncryptedModeCalculatorParam(mode="confusion_opt"),
predict_param=PredictParam(), mode='plain', communication_efficient=False,
local_round=5, callback_param=CallbackParam()):
"""
Parameters
----------
alpha : float
a loss coefficient defined in paper, it defines the importance of alignment loss
tol : float
loss tolerance
n_iter_no_change : bool
check loss convergence or not
validation_freqs : None or positive integer or container object in python
Whether to do validation during training.
if None, no validation is done during training;
if a positive integer, data is validated every validation_freqs epochs;
if a container object in python, data is validated when the epoch belongs to this container.
e.g. validation_freqs = [10, 15] validates data at epochs 10 and 15.
The default value is None; 1 is suggested. You can set it to a number larger than 1 in order to
speed up training by skipping validation rounds. When it is larger than 1, a number that
divides "epochs" is recommended; otherwise, you will miss the validation scores
of the last training epoch.
optimizer : str or dict
optimizer method, accepts the following types:
1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"
2. a dict, with a required key-value pair keyed by "optimizer",
with optional key-value pairs such as learning rate.
defaults to {'optimizer': 'Adam', 'learning_rate': 0.01}
nn_define : dict
a dict represents the structure of neural network, it can be output by tf-keras
epochs : int
epochs num
intersect_param
define the intersect method
config_type : {'keras'}
config type; 'keras' is the only value accepted by check()
batch_size : int
batch size when computing transformed feature embedding, -1 use full data.
encrypte_param
encrypted param
encrypted_mode_calculator_param
encrypted mode calculator param:
predict_param
predict param
mode: {"plain", "encrypted"}
plain: will not use any encrypt algorithms, data exchanged in plaintext
encrypted: use paillier to encrypt gradients
communication_efficient: bool
will use communication efficient or not. when communication efficient is enabled, FTL model will
update gradients by several local rounds using intermediate data
local_round: int
local update round when using communication efficient
"""
super(FTLParam, self).__init__()
self.alpha = alpha
self.tol = tol
self.n_iter_no_change = n_iter_no_change
self.validation_freqs = validation_freqs
self.optimizer = optimizer
self.nn_define = nn_define
self.epochs = epochs
self.intersect_param = copy.deepcopy(intersect_param)
self.config_type = config_type
self.batch_size = batch_size
self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
self.encrypt_param = copy.deepcopy(encrypte_param)
self.predict_param = copy.deepcopy(predict_param)
self.mode = mode
self.communication_efficient = communication_efficient
self.local_round = local_round
self.callback_param = copy.deepcopy(callback_param)
def check(self):
self.intersect_param.check()
self.encrypt_param.check()
self.encrypted_mode_calculator_param.check()
self.optimizer = self._parse_optimizer(self.optimizer)
supported_config_type = ["keras"]
if self.config_type not in supported_config_type:
raise ValueError(f"config_type should be one of {supported_config_type}")
if not isinstance(self.tol, (int, float)):
raise ValueError("tol should be numeric")
if not isinstance(self.epochs, int) or self.epochs <= 0:
raise ValueError("epochs should be a positive integer")
if self.nn_define and not isinstance(self.nn_define, dict):
raise ValueError("nn_define should be a dict defining the structure of neural network")
if self.batch_size != -1:
if not isinstance(self.batch_size, int) \
or self.batch_size < consts.MIN_BATCH_SIZE:
raise ValueError(
" {} not supported, should be larger than 10 or -1 represent for all data".format(self.batch_size))
for p in deprecated_param_list:
# if self._warn_to_deprecate_param(p, "", ""):
if self._deprecated_params_set.get(p):
if "callback_param" in self.get_user_feeded():
raise ValueError(f"{p} and callback param should not be set simultaneously,"
f"{self._deprecated_params_set}, {self.get_user_feeded()}")
else:
self.callback_param.callbacks = ["PerformanceEvaluate"]
break
descr = "ftl's"
if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
self.callback_param.validation_freqs = self.validation_freqs
if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
self.callback_param.metrics = self.metrics
if self.validation_freqs is None:
pass
elif isinstance(self.validation_freqs, int):
if self.validation_freqs < 1:
raise ValueError("validation_freqs should be larger than 0 when it's integer")
elif not isinstance(self.validation_freqs, collections.abc.Container):  # collections.Container was removed in Python 3.10; needs "import collections.abc"
raise ValueError("validation_freqs should be None or positive integer or container")
assert isinstance(self.communication_efficient, bool), 'communication efficient must be a boolean'
assert self.mode in [
'encrypted', 'plain'], 'mode options: encrypted or plain, but {} is offered'.format(
self.mode)
self.check_positive_integer(self.epochs, 'epochs')
self.check_positive_number(self.alpha, 'alpha')
self.check_positive_integer(self.local_round, 'local round')
@staticmethod
def _parse_optimizer(opt):
"""
Examples:
1. "optimize": "SGD"
2. "optimize": {
"optimizer": "SGD",
"learning_rate": 0.05
}
"""
kwargs = {}
if isinstance(opt, str):
return SimpleNamespace(optimizer=opt, kwargs=kwargs)
elif isinstance(opt, dict):
optimizer = opt.get("optimizer", kwargs)
if not optimizer:
raise ValueError(f"optimizer config: {opt} invalid")
kwargs = {k: v for k, v in opt.items() if k != "optimizer"}
return SimpleNamespace(optimizer=optimizer, kwargs=kwargs)
else:
raise ValueError(f"invalid type for optimize: {type(opt)}")
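The two accepted optimizer forms can be tried out with a stand-alone copy of the parsing logic. The sketch below mirrors `_parse_optimizer` as a free function named `parse_optimizer` (a hypothetical name) so that it runs without FATE installed.

```python
from types import SimpleNamespace


def parse_optimizer(opt):
    """Normalise an optimizer config to a namespace with .optimizer and .kwargs."""
    if isinstance(opt, str):
        # Form 1: just the optimizer name, no extra arguments.
        return SimpleNamespace(optimizer=opt, kwargs={})
    if isinstance(opt, dict):
        # Form 2: a dict with a required "optimizer" key; the rest become kwargs.
        name = opt.get("optimizer")
        if not name:
            raise ValueError(f"optimizer config: {opt} invalid")
        kwargs = {k: v for k, v in opt.items() if k != "optimizer"}
        return SimpleNamespace(optimizer=name, kwargs=kwargs)
    raise ValueError(f"invalid type for optimize: {type(opt)}")


by_name = parse_optimizer("SGD")
by_dict = parse_optimizer({"optimizer": "Adam", "learning_rate": 0.01})
print(by_name.optimizer, by_name.kwargs)
print(by_dict.optimizer, by_dict.kwargs)
```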
__init__(self, alpha=1, tol=1e-06, n_iter_no_change=False, validation_freqs=None, optimizer={'optimizer': 'Adam', 'learning_rate': 0.01}, nn_define={}, epochs=1, intersect_param=IntersectParam(consts.RSA), config_type='keras', batch_size=-1, encrypte_param=EncryptParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(mode="confusion_opt"), predict_param=PredictParam(), mode='plain', communication_efficient=False, local_round=5, callback_param=CallbackParam())
special
Parameters
alpha : float
a loss coefficient defined in the paper; it defines the importance of the alignment loss
tol : float
loss tolerance
n_iter_no_change : bool
check loss convergence or not
validation_freqs : None or positive integer or container object in python
Whether to do validation during training. If None, no validation is done during training; if a positive integer, data is validated every validation_freqs epochs; if a container object in python, data is validated when the epoch belongs to this container, e.g. validation_freqs = [10, 15] validates data at epochs 10 and 15. The default value is None; 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number that divides "epochs" is recommended; otherwise, you will miss the validation scores of the last training epoch.
optimizer : str or dict
optimizer method, accepts the following types: 1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"; 2. a dict, with a required key-value pair keyed by "optimizer" and optional key-value pairs such as learning rate. Defaults to {'optimizer': 'Adam', 'learning_rate': 0.01}.
nn_define : dict
a dict that represents the structure of the neural network; it can be output by tf-keras
epochs : int
number of epochs
intersect_param
define the intersect method
config_type : {'keras'}
config type; 'keras' is the only value accepted by check()
batch_size : int
batch size when computing the transformed feature embedding; -1 uses the full data
encrypte_param
encrypted param
encrypted_mode_calculator_param
encrypted mode calculator param
predict_param
predict param
mode : {"plain", "encrypted"}
plain: no encryption algorithm is used and data is exchanged in plaintext; encrypted: Paillier is used to encrypt gradients
communication_efficient : bool
whether to use communication-efficient mode. When it is enabled, the FTL model updates gradients for several local rounds using intermediate data
local_round : int
number of local update rounds when communication-efficient mode is used
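The validation_freqs rules can be summarised in a few lines of Python. The helper `should_validate` is hypothetical and assumes epochs are counted from 1; it is a sketch of the documented semantics, not the FATE implementation.

```python
from collections.abc import Container


def should_validate(epoch, validation_freqs):
    """Return True if validation would run at this (1-based) epoch."""
    if validation_freqs is None:
        return False                          # never validate during training
    if isinstance(validation_freqs, int):
        return epoch % validation_freqs == 0  # every validation_freqs epochs
    if isinstance(validation_freqs, Container):
        return epoch in validation_freqs      # only at the listed epochs
    raise ValueError("validation_freqs should be None or positive integer or container")


epochs = 10
print([e for e in range(1, epochs + 1) if should_validate(e, 3)])   # 3 does not divide 10
print([e for e in range(1, epochs + 1) if should_validate(e, 5)])   # 5 divides 10
print([e for e in range(1, epochs + 1) if should_validate(e, [10, 15])])
```

With validation_freqs = 3 and epochs = 10 the last epoch is never validated, which is why a value that divides epochs is recommended.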
Source code in federatedml/param/ftl_param.py
def __init__(self, alpha=1, tol=0.000001,
n_iter_no_change=False, validation_freqs=None, optimizer={'optimizer': 'Adam', 'learning_rate': 0.01},
nn_define={}, epochs=1, intersect_param=IntersectParam(consts.RSA), config_type='keras', batch_size=-1,
encrypte_param=EncryptParam(),
encrypted_mode_calculator_param=EncryptedModeCalculatorParam(mode="confusion_opt"),
predict_param=PredictParam(), mode='plain', communication_efficient=False,
local_round=5, callback_param=CallbackParam()):
"""
Parameters
----------
alpha : float
a loss coefficient defined in paper, it defines the importance of alignment loss
tol : float
loss tolerance
n_iter_no_change : bool
check loss convergence or not
validation_freqs : None or positive integer or container object in python
Whether to do validation during training.
if None, no validation is done during training;
if a positive integer, data is validated every validation_freqs epochs;
if a container object in python, data is validated when the epoch belongs to this container.
e.g. validation_freqs = [10, 15] validates data at epochs 10 and 15.
The default value is None; 1 is suggested. You can set it to a number larger than 1 in order to
speed up training by skipping validation rounds. When it is larger than 1, a number that
divides "epochs" is recommended; otherwise, you will miss the validation scores
of the last training epoch.
optimizer : str or dict
optimizer method, accepts the following types:
1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"
2. a dict, with a required key-value pair keyed by "optimizer",
with optional key-value pairs such as learning rate.
defaults to {'optimizer': 'Adam', 'learning_rate': 0.01}
nn_define : dict
a dict represents the structure of neural network, it can be output by tf-keras
epochs : int
epochs num
intersect_param
define the intersect method
config_type : {'keras'}
config type; 'keras' is the only value accepted by check()
batch_size : int
batch size when computing transformed feature embedding, -1 use full data.
encrypte_param
encrypted param
encrypted_mode_calculator_param
encrypted mode calculator param:
predict_param
predict param
mode: {"plain", "encrypted"}
plain: will not use any encrypt algorithms, data exchanged in plaintext
encrypted: use paillier to encrypt gradients
communication_efficient: bool
will use communication efficient or not. when communication efficient is enabled, FTL model will
update gradients by several local rounds using intermediate data
local_round: int
local update round when using communication efficient
"""
super(FTLParam, self).__init__()
self.alpha = alpha
self.tol = tol
self.n_iter_no_change = n_iter_no_change
self.validation_freqs = validation_freqs
self.optimizer = optimizer
self.nn_define = nn_define
self.epochs = epochs
self.intersect_param = copy.deepcopy(intersect_param)
self.config_type = config_type
self.batch_size = batch_size
self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
self.encrypt_param = copy.deepcopy(encrypte_param)
self.predict_param = copy.deepcopy(predict_param)
self.mode = mode
self.communication_efficient = communication_efficient
self.local_round = local_round
self.callback_param = copy.deepcopy(callback_param)
check(self)
Source code in federatedml/param/ftl_param.py
def check(self):
self.intersect_param.check()
self.encrypt_param.check()
self.encrypted_mode_calculator_param.check()
self.optimizer = self._parse_optimizer(self.optimizer)
supported_config_type = ["keras"]
if self.config_type not in supported_config_type:
raise ValueError(f"config_type should be one of {supported_config_type}")
if not isinstance(self.tol, (int, float)):
raise ValueError("tol should be numeric")
if not isinstance(self.epochs, int) or self.epochs <= 0:
raise ValueError("epochs should be a positive integer")
if self.nn_define and not isinstance(self.nn_define, dict):
raise ValueError("nn_define should be a dict defining the structure of neural network")
if self.batch_size != -1:
if not isinstance(self.batch_size, int) \
or self.batch_size < consts.MIN_BATCH_SIZE:
raise ValueError(
" {} not supported, should be larger than 10 or -1 represent for all data".format(self.batch_size))
for p in deprecated_param_list:
# if self._warn_to_deprecate_param(p, "", ""):
if self._deprecated_params_set.get(p):
if "callback_param" in self.get_user_feeded():
raise ValueError(f"{p} and callback param should not be set simultaneously,"
f"{self._deprecated_params_set}, {self.get_user_feeded()}")
else:
self.callback_param.callbacks = ["PerformanceEvaluate"]
break
descr = "ftl's"
if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
self.callback_param.validation_freqs = self.validation_freqs
if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
self.callback_param.metrics = self.metrics
if self.validation_freqs is None:
pass
elif isinstance(self.validation_freqs, int):
if self.validation_freqs < 1:
raise ValueError("validation_freqs should be larger than 0 when it's integer")
elif not isinstance(self.validation_freqs, collections.Container):
raise ValueError("validation_freqs should be None or positive integer or container")
assert isinstance(self.communication_efficient, bool), 'communication efficient must be a boolean'
assert self.mode in [
'encrypted', 'plain'], 'mode options: encrypted or plain, but {} is offered'.format(
self.mode)
self.check_positive_integer(self.epochs, 'epochs')
self.check_positive_number(self.alpha, 'alpha')
self.check_positive_integer(self.local_round, 'local round')
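The `validation_freqs` semantics described in the FTLParam docstring above can be illustrated with a small sketch. It assumes epochs are counted from 1 and that an integer frequency means "validate every n-th epoch". The helper `validated_epochs` is hypothetical and not part of FATE, whose internal epoch counting may differ.

```python
def validated_epochs(epochs, validation_freqs):
    """Return the 1-based epochs at which validation would run.

    Hypothetical helper for illustration only; assumes epochs are
    counted from 1, which may not match FATE's internal indexing.
    """
    if validation_freqs is None:
        return []
    if isinstance(validation_freqs, int):
        # Integer: validate every `validation_freqs`-th epoch.
        return [e for e in range(1, epochs + 1) if e % validation_freqs == 0]
    # Container (list, tuple, set): validate exactly at the listed epochs.
    return [e for e in range(1, epochs + 1) if e in validation_freqs]


print(validated_epochs(20, 5))         # 5 divides 20, so the last epoch is validated
print(validated_epochs(20, 3))         # 3 does not divide 20, so epoch 20 is skipped
print(validated_epochs(20, [10, 15]))  # explicit epochs
```

Under these assumptions, a frequency that does not evenly divide `epochs` never lands on the final epoch, which is why the docstring recommends a divisor.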
glm_param
Classes
LinearModelParam (BaseParam)
Parameters used for GLM.
Parameters
penalty : {'L2' or 'L1'} Penalty method used in LinR. Please note that, when using encrypted version in HeteroLinR, 'L1' is not supported.
tol : float, default: 1e-4 The tolerance of convergence
alpha : float, default: 1.0 Regularization strength coefficient.
optimizer : {'sgd', 'rmsprop', 'adam', 'sqn', 'adagrad', 'nesterov_momentum_sgd'} Optimize method
batch_size : int, default: -1 Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.
learning_rate : float, default: 0.01 Learning rate
max_iter : int, default: 100 The maximum iteration for training.
init_param : InitParam object, default: default InitParam object
Init param method object.
early_stop : {'diff', 'abs', 'weight_diff'} Method used to judge convergence. a) diff: Use the difference of loss between two iterations to judge convergence. b) abs: Use the absolute value of loss to judge convergence, i.e. if loss < tol, it is converged. c) weight_diff: Use the difference between weights of two consecutive iterations.
encrypt_param : EncryptParam object, default: default EncryptParam object
encrypt param
cv_param : CrossValidationParam object, default: default CrossValidationParam object
cv param
decay : int or float, default: 1
Decay rate for the learning rate. The learning rate follows the decay schedule lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t), where t is the iter number.
decay_sqrt : bool, default: True
lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)
validation_freqs : int, list, tuple, set, or None
Validation frequency during training, required when using early stopping. The default value is None; 1 is suggested. You can set it to a number larger than 1 to speed up training by skipping validation rounds. When it is larger than 1, a number that evenly divides "max_iter" is recommended; otherwise, you will miss the validation scores of the last training iteration.
early_stopping_rounds : int, default: None
If a positive number is specified, the program checks the early stopping criteria every that many training rounds. validation_freqs must also be set when using early stopping.
metrics : list or None, default: None
Specify which metrics are used for evaluation during training. If the metrics have not improved within early_stopping_rounds rounds, training stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are ['root_mean_squared_error', 'mean_absolute_error']
use_first_metric_only : bool, default: False
Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.
floating_point_precision : None or integer
If not None, use floating_point_precision-bit to speed up calculation, e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide the result by 2**floating_point_precision in the end.
callback_param : CallbackParam object
callback param
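The decay schedule quoted for `decay` and `decay_sqrt` can be worked through numerically. The following is a minimal sketch of the two formulas only. The function name `decayed_learning_rate` is hypothetical, and whether FATE counts `t` from 0 or from 1 is not stated here, so the sketch simply takes `t` as an argument.

```python
import math


def decayed_learning_rate(lr0, decay, t, decay_sqrt=True):
    """Learning rate at iteration t under the documented schedule.

    decay_sqrt=True:  lr = lr0 / sqrt(1 + decay * t)
    decay_sqrt=False: lr = lr0 / (1 + decay * t)
    """
    denominator = 1 + decay * t
    if decay_sqrt:
        return lr0 / math.sqrt(denominator)
    return lr0 / denominator


# Defaults from the signature: learning_rate=0.01, decay=1, decay_sqrt=True.
for t in (0, 3, 8):
    sqrt_lr = decayed_learning_rate(0.01, 1, t)
    plain_lr = decayed_learning_rate(0.01, 1, t, decay_sqrt=False)
    print(t, sqrt_lr, plain_lr)
```

With the square-root variant the rate shrinks more slowly: at `t = 3` it has halved, while the plain variant has already dropped to a quarter of `lr0`.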
Source code in federatedml/param/glm_param.py
class LinearModelParam(BaseParam):
"""
Parameters used for GLM.
Parameters
----------
penalty : {'L2' or 'L1'}
Penalty method used in LinR. Please note that, when using encrypted version in HeteroLinR,
'L1' is not supported.
tol : float, default: 1e-4
The tolerance of convergence
alpha : float, default: 1.0
Regularization strength coefficient.
optimizer : {'sgd', 'rmsprop', 'adam', 'sqn', 'adagrad', 'nesterov_momentum_sgd'}
Optimize method
batch_size : int, default: -1
Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.
learning_rate : float, default: 0.01
Learning rate
max_iter : int, default: 100
The maximum iteration for training.
init_param: InitParam object, default: default InitParam object
Init param method object.
early_stop : {'diff', 'abs', 'weight_diff'}
Method used to judge convergence.
a) diff: Use difference of loss between two iterations to judge whether converge.
b) abs: Use the absolute value of loss to judge whether converge. i.e. if loss < tol, it is converged.
c) weight_diff: Use difference between weights of two consecutive iterations
encrypt_param: EncryptParam object, default: default EncryptParam object
encrypt param
cv_param: CrossValidationParam object, default: default CrossValidationParam object
cv param
decay: int or float, default: 1
Decay rate for learning rate. learning rate will follow the following decay schedule.
lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t)
where t is the iter number.
decay_sqrt: Bool, default: True
lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)
validation_freqs: int, list, tuple, set, or None
validation frequency during training, required when using early stopping.
The default value is None; 1 is suggested. You can set it to a number larger than 1 to speed up training by skipping validation rounds.
When it is larger than 1, a number that evenly divides "max_iter" is recommended; otherwise, you will miss the validation scores of the last training iteration.
early_stopping_rounds: int, default: None
If a positive number is specified, the program checks the early stopping criteria every that many training rounds.
validation_freqs must also be set when using early stopping.
metrics: list or None, default: None
Specify which metrics are used for evaluation during training. If the metrics have not improved within early_stopping_rounds rounds, training stops before convergence.
If set as empty, default metrics will be used. For regression tasks, default metrics are ['root_mean_squared_error', 'mean_absolute_error']
use_first_metric_only: bool, default: False
Indicate whether to use the first metric in `metrics` as the only criterion for early stopping judgement.
floating_point_precision: None or integer
if not None, use floating_point_precision-bit to speed up calculation,
e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
the result by 2**floating_point_precision in the end.
callback_param: CallbackParam object
callback param
"""
def __init__(self, penalty='L2',
tol=1e-4, alpha=1.0, optimizer='sgd',
batch_size=-1, learning_rate=0.01, init_param=InitParam(),
max_iter=100, early_stop='diff',
encrypt_param=EncryptParam(),
cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, validation_freqs=None,
early_stopping_rounds=None, stepwise_param=StepwiseParam(), metrics=None, use_first_metric_only=False,
floating_point_precision=23, callback_param=CallbackParam()):
super(LinearModelParam, self).__init__()
self.penalty = penalty
self.tol = tol
self.alpha = alpha
self.optimizer = optimizer
self.batch_size = batch_size
self.learning_rate = learning_rate
self.init_param = copy.deepcopy(init_param)
self.max_iter = max_iter
self.early_stop = early_stop
self.encrypt_param = encrypt_param
self.cv_param = copy.deepcopy(cv_param)
self.decay = decay
self.decay_sqrt = decay_sqrt
self.validation_freqs = validation_freqs
self.early_stopping_rounds = early_stopping_rounds
self.stepwise_param = copy.deepcopy(stepwise_param)
self.metrics = metrics or []
self.use_first_metric_only = use_first_metric_only
self.floating_point_precision = floating_point_precision
self.callback_param = copy.deepcopy(callback_param)
def check(self):
descr = "linear model param's "
if self.penalty is None:
self.penalty = 'NONE'
elif type(self.penalty).__name__ != "str":
raise ValueError(
descr + "penalty {} not supported, should be str type".format(self.penalty))
self.penalty = self.penalty.upper()
if self.penalty not in [consts.L1_PENALTY, consts.L2_PENALTY, consts.NONE.upper()]:
raise ValueError(
"penalty {} not supported, penalty should be 'L1', 'L2' or 'NONE'".format(self.penalty))
if type(self.tol).__name__ not in ["int", "float"]:
raise ValueError(
descr + "tol {} not supported, should be float type".format(self.tol))
if type(self.alpha).__name__ not in ["int", "float"]:
raise ValueError(
descr + "alpha {} not supported, should be float type".format(self.alpha))
if type(self.optimizer).__name__ != "str":
raise ValueError(
descr + "optimizer {} not supported, should be str type".format(self.optimizer))
else:
self.optimizer = self.optimizer.lower()
if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'sqn', 'nesterov_momentum_sgd']:
raise ValueError(
descr + "optimizer not supported, optimizer should be"
" 'sgd', 'rmsprop', 'adam', 'sqn', 'adagrad', or 'nesterov_momentum_sgd'")
if type(self.batch_size).__name__ not in ["int", "long"]:
raise ValueError(
descr + "batch_size {} not supported, should be int type".format(self.batch_size))
if self.batch_size != -1:
if type(self.batch_size).__name__ not in ["int", "long"] \
or self.batch_size < consts.MIN_BATCH_SIZE:
raise ValueError(descr + " {} not supported, should be larger than {} or "
"-1 represent for all data".format(self.batch_size, consts.MIN_BATCH_SIZE))
if type(self.learning_rate).__name__ not in ["int", "float"]:
raise ValueError(
descr + "learning_rate {} not supported, should be float type".format(
self.learning_rate))
self.init_param.check()
if type(self.max_iter).__name__ != "int":
raise ValueError(
descr + "max_iter {} not supported, should be int type".format(self.max_iter))
elif self.max_iter <= 0:
raise ValueError(
descr + "max_iter must be greater or equal to 1")
if type(self.early_stop).__name__ != "str":
raise ValueError(
descr + "early_stop {} not supported, should be str type".format(
self.early_stop))
else:
self.early_stop = self.early_stop.lower()
if self.early_stop not in ['diff', 'abs', 'weight_diff']:
raise ValueError(
descr + "early_stop not supported, early_stop should be 'weight_diff', 'diff' or 'abs'")
self.encrypt_param.check()
if type(self.decay).__name__ not in ["int", "float"]:
raise ValueError(
descr + "decay {} not supported, should be 'int' or 'float'".format(self.decay)
)
if type(self.decay_sqrt).__name__ not in ["bool"]:
raise ValueError(
descr + "decay_sqrt {} not supported, should be 'bool'".format(self.decay_sqrt)
)
self.stepwise_param.check()
for p in ["early_stopping_rounds", "validation_freqs", "metrics",
"use_first_metric_only"]:
if self._warn_to_deprecate_param(p, "", ""):
if "callback_param" in self.get_user_feeded():
raise ValueError(f"{p} and callback param should not be set simultaneously")
else:
self.callback_param.callbacks = ["PerformanceEvaluate"]
break
if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
self.callback_param.validation_freqs = self.validation_freqs
if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
self.callback_param.early_stopping_rounds = self.early_stopping_rounds
if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
self.callback_param.metrics = self.metrics
if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
self.callback_param.use_first_metric_only = self.use_first_metric_only
if self.floating_point_precision is not None and \
(not isinstance(self.floating_point_precision, int) or
self.floating_point_precision < 0 or self.floating_point_precision > 64):
raise ValueError("floating point precision should be null or a integer between 0 and 64")
self.callback_param.check()
return True
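The three `early_stop` options validated in `check` above can be read as three different convergence tests. The sketch below is one hedged interpretation, not FATE's implementation: the function `has_converged` is hypothetical, and using the Euclidean norm for `weight_diff` is an assumption, since the documentation only says "difference between weights".

```python
import math


def has_converged(early_stop, tol, loss=None, prev_loss=None,
                  weights=None, prev_weights=None):
    """Hypothetical convergence test mirroring the three early_stop options."""
    if early_stop == "diff":
        # Change in loss between two consecutive iterations.
        return prev_loss is not None and abs(loss - prev_loss) < tol
    if early_stop == "abs":
        # The loss value itself is compared with tol.
        return loss < tol
    if early_stop == "weight_diff":
        # Assumption: Euclidean norm of the weight change.
        if prev_weights is None:
            return False
        change = math.sqrt(sum((w - p) ** 2 for w, p in zip(weights, prev_weights)))
        return change < tol
    raise ValueError("early_stop should be 'weight_diff', 'diff' or 'abs'")


print(has_converged("diff", 1e-4, loss=0.50001, prev_loss=0.50002))
print(has_converged("abs", 1e-4, loss=0.3))
print(has_converged("weight_diff", 1e-4, weights=[1.0, 2.0], prev_weights=[1.0, 2.00001]))
```

Note that `abs` only triggers when the loss itself falls below `tol`, so with the default `tol=1e-4` it is a much stricter condition than `diff` for most loss functions.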
__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='sgd', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=100, early_stop='diff', encrypt_param=EncryptParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, stepwise_param=StepwiseParam(), metrics=None, use_first_metric_only=False, floating_point_precision=23, callback_param=CallbackParam())
Source code in federatedml/param/glm_param.py
def __init__(self, penalty='L2',
tol=1e-4, alpha=1.0, optimizer='sgd',
batch_size=-1, learning_rate=0.01, init_param=InitParam(),
max_iter=100, early_stop='diff',
encrypt_param=EncryptParam(),
cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, validation_freqs=None,
early_stopping_rounds=None, stepwise_param=StepwiseParam(), metrics=None, use_first_metric_only=False,
floating_point_precision=23, callback_param=CallbackParam()):
super(LinearModelParam, self).__init__()
self.penalty = penalty
self.tol = tol
self.alpha = alpha
self.optimizer = optimizer
self.batch_size = batch_size
self.learning_rate = learning_rate
self.init_param = copy.deepcopy(init_param)
self.max_iter = max_iter
self.early_stop = early_stop
self.encrypt_param = encrypt_param
self.cv_param = copy.deepcopy(cv_param)
self.decay = decay
self.decay_sqrt = decay_sqrt
self.validation_freqs = validation_freqs
self.early_stopping_rounds = early_stopping_rounds
self.stepwise_param = copy.deepcopy(stepwise_param)
self.metrics = metrics or []
self.use_first_metric_only = use_first_metric_only
self.floating_point_precision = floating_point_precision
self.callback_param = copy.deepcopy(callback_param)
check(self)
Source code in federatedml/param/glm_param.py
def check(self):
descr = "linear model param's "
if self.penalty is None:
self.penalty = 'NONE'
elif type(self.penalty).__name__ != "str":
raise ValueError(
descr + "penalty {} not supported, should be str type".format(self.penalty))
self.penalty = self.penalty.upper()
if self.penalty not in [consts.L1_PENALTY, consts.L2_PENALTY, consts.NONE.upper()]:
raise ValueError(
"penalty {} not supported, penalty should be 'L1', 'L2' or 'NONE'".format(self.penalty))
if type(self.tol).__name__ not in ["int", "float"]:
raise ValueError(
descr + "tol {} not supported, should be float type".format(self.tol))
if type(self.alpha).__name__ not in ["int", "float"]:
raise ValueError(
descr + "alpha {} not supported, should be float type".format(self.alpha))
if type(self.optimizer).__name__ != "str":
raise ValueError(
descr + "optimizer {} not supported, should be str type".format(self.optimizer))
else:
self.optimizer = self.optimizer.lower()
if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'sqn', 'nesterov_momentum_sgd']:
raise ValueError(
descr + "optimizer not supported, optimizer should be"
" 'sgd', 'rmsprop', 'adam', 'sqn', 'adagrad', or 'nesterov_momentum_sgd'")
if type(self.batch_size).__name__ not in ["int", "long"]:
raise ValueError(
descr + "batch_size {} not supported, should be int type".format(self.batch_size))
if self.batch_size != -1:
if type(self.batch_size).__name__ not in ["int", "long"] \
or self.batch_size < consts.MIN_BATCH_SIZE:
raise ValueError(descr + " {} not supported, should be larger than {} or "
"-1 represent for all data".format(self.batch_size, consts.MIN_BATCH_SIZE))
if type(self.learning_rate).__name__ not in ["int", "float"]:
raise ValueError(
descr + "learning_rate {} not supported, should be float type".format(
self.learning_rate))
self.init_param.check()
if type(self.max_iter).__name__ != "int":
raise ValueError(
descr + "max_iter {} not supported, should be int type".format(self.max_iter))
elif self.max_iter <= 0:
raise ValueError(
descr + "max_iter must be greater or equal to 1")
if type(self.early_stop).__name__ != "str":
raise ValueError(
descr + "early_stop {} not supported, should be str type".format(
self.early_stop))
else:
self.early_stop = self.early_stop.lower()
if self.early_stop not in ['diff', 'abs', 'weight_diff']:
raise ValueError(
descr + "early_stop not supported, early_stop should be 'weight_diff', 'diff' or 'abs'")
self.encrypt_param.check()
if type(self.decay).__name__ not in ["int", "float"]:
raise ValueError(
descr + "decay {} not supported, should be 'int' or 'float'".format(self.decay)
)
if type(self.decay_sqrt).__name__ not in ["bool"]:
raise ValueError(
descr + "decay_sqrt {} not supported, should be 'bool'".format(self.decay_sqrt)
)
self.stepwise_param.check()
for p in ["early_stopping_rounds", "validation_freqs", "metrics",
"use_first_metric_only"]:
if self._warn_to_deprecate_param(p, "", ""):
if "callback_param" in self.get_user_feeded():
raise ValueError(f"{p} and callback param should not be set simultaneously")
else:
self.callback_param.callbacks = ["PerformanceEvaluate"]
break
if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
self.callback_param.validation_freqs = self.validation_freqs
if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
self.callback_param.early_stopping_rounds = self.early_stopping_rounds
if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
self.callback_param.metrics = self.metrics
if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
self.callback_param.use_first_metric_only = self.use_first_metric_only
if self.floating_point_precision is not None and \
(not isinstance(self.floating_point_precision, int) or
self.floating_point_precision < 0 or self.floating_point_precision > 64):
raise ValueError("floating point precision should be null or a integer between 0 and 64")
self.callback_param.check()
return True
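The `floating_point_precision` parameter checked above describes a fixed-point encoding: a float is scaled by `2**floating_point_precision` and rounded to an integer before the Paillier operation, and the result is scaled back at the end. The sketch below shows only that arithmetic. Plain integer addition stands in for the homomorphic addition, and the helper names `to_fixed_point` and `from_fixed_point` are hypothetical.

```python
def to_fixed_point(x, precision=23):
    """Encode a float as an integer: round(x * 2**precision)."""
    return round(x * 2 ** precision)


def from_fixed_point(n, precision=23):
    """Decode by dividing the integer result by 2**precision."""
    return n / 2 ** precision


a, b = 0.125, 2.5
# Integer addition stands in for the addition performed on ciphertexts.
encoded_sum = to_fixed_point(a) + to_fixed_point(b)
print(from_fixed_point(encoded_sum))

# Values that are not exact multiples of 2**-precision pick up a rounding
# error of at most half a step, i.e. 2**-(precision + 1).
error = abs(from_fixed_point(to_fixed_point(0.1)) - 0.1)
print(error <= 2 ** -24)
```

A larger precision reduces the rounding error but makes the encoded integers longer, which is the trade-off the parameter exposes.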
hetero_kmeans_param
Classes
KmeansParam (BaseParam)
Parameters used for K-means.
k : int, default 5 The number of centroids to generate. Should be larger than 1 and no larger than 100 in this version.
max_iter : int, default 300 Maximum number of iterations of the hetero-k-means algorithm to run.
tol : float, default 0.001 Convergence tolerance.
random_stat : None or int Random seed.
Source code in federatedml/param/hetero_kmeans_param.py
class KmeansParam(BaseParam):
"""
Parameters used for K-means.
----------
k : int, default 5
The number of centroids to generate.
Should be larger than 1 and no larger than 100 in this version.
max_iter : int, default 300.
Maximum number of iterations of the hetero-k-means algorithm to run.
tol : float, default 0.001.
Convergence tolerance.
random_stat : None or int
random seed
"""
def __init__(self, k=5, max_iter=300, tol=0.001, random_stat=None):
super(KmeansParam, self).__init__()
self.k = k
self.max_iter = max_iter
self.tol = tol
self.random_stat = random_stat
def check(self):
descr = "Kmeans_param's "
if not isinstance(self.k, int):
raise ValueError(
descr + "k {} not supported, should be int type".format(self.k))
elif self.k <= 1:
raise ValueError(
descr + "k {} not supported, should be larger than 1".format(self.k))
elif self.k > 100:
raise ValueError(
descr + "k {} not supported, should be less than 100 in this version".format(self.k))
if not isinstance(self.max_iter, int):
raise ValueError(
descr + "max_iter {} not supported, should be int type".format(self.max_iter))
elif self.max_iter <= 0:
raise ValueError(
descr + "max_iter {} not supported, should be larger than 0".format(self.max_iter))
if not isinstance(self.tol, (float, int)):
raise ValueError(
descr + "tol {} not supported, should be float type".format(self.tol))
elif self.tol < 0:
raise ValueError(
descr + "tol {} not supported, should be larger than or equal to 0".format(self.tol))
if self.random_stat is not None:
if not isinstance(self.random_stat, int):
raise ValueError(descr + "random_stat {} not supported, should be int type".format(self.random_stat))
elif self.random_stat < 0:
raise ValueError(
descr + "random_stat {} not supported, should be larger than/equal to 0".format(self.random_stat))
__init__(self, k=5, max_iter=300, tol=0.001, random_stat=None)
Source code in federatedml/param/hetero_kmeans_param.py
def __init__(self, k=5, max_iter=300, tol=0.001, random_stat=None):
super(KmeansParam, self).__init__()
self.k = k
self.max_iter = max_iter
self.tol = tol
self.random_stat = random_stat
check(self)
Source code in federatedml/param/hetero_kmeans_param.py
def check(self):
descr = "Kmeans_param's "
if not isinstance(self.k, int):
raise ValueError(
descr + "k {} not supported, should be int type".format(self.k))
elif self.k <= 1:
raise ValueError(
descr + "k {} not supported, should be larger than 1".format(self.k))
elif self.k > 100:
raise ValueError(
descr + "k {} not supported, should be less than 100 in this version".format(self.k))
if not isinstance(self.max_iter, int):
raise ValueError(
descr + "max_iter {} not supported, should be int type".format(self.max_iter))
elif self.max_iter <= 0:
raise ValueError(
descr + "max_iter {} not supported, should be larger than 0".format(self.max_iter))
if not isinstance(self.tol, (float, int)):
raise ValueError(
descr + "tol {} not supported, should be float type".format(self.tol))
elif self.tol < 0:
raise ValueError(
descr + "tol {} not supported, should be larger than or equal to 0".format(self.tol))
if self.random_stat is not None:
if not isinstance(self.random_stat, int):
raise ValueError(descr + "random_stat {} not supported, should be int type".format(self.random_stat))
elif self.random_stat < 0:
raise ValueError(
descr + "random_stat {} not supported, should be larger than/equal to 0".format(self.random_stat))
hetero_nn_param
Classes
SelectorParam
Parameters used for Homo Neural Network.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method | None or str | back propagation select method, accept "relative" only | None |
selective_size | int | deque size to use, store the most recent selective_size historical loss | 1024 |
beta | int | sample whose selective probability >= power(np.random, beta) will be selected | 1 |
min_prob | Numeric | selective probability is max(min_prob, rank_rate) | 0 |
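The table above implies a selection rule built from three pieces: a bounded history of recent losses, a selective probability `max(min_prob, rank_rate)`, and a comparison against a random draw raised to the power `beta`. The sketch below is one possible reading. The definition of `rank_rate` (the share of stored losses that do not exceed the sample's loss) is an assumption, as are the helper names and the behaviour with an empty history. The uniform draw `u` is passed in explicitly so the example stays deterministic.

```python
from collections import deque


def selective_probability(loss, history, min_prob=0):
    """Return max(min_prob, rank_rate).

    Assumption: rank_rate is the share of stored historical losses
    that do not exceed `loss`.
    """
    if not history:
        return 1.0  # assumption: with no history, always select
    rank_rate = sum(1 for h in history if h <= loss) / len(history)
    return max(min_prob, rank_rate)


def is_selected(prob, u, beta=1):
    """Select when prob >= u ** beta, where u is a uniform draw from [0, 1)."""
    return prob >= u ** beta


history = deque(maxlen=4)  # plays the role of selective_size (default 1024)
for past_loss in [0.9, 0.1, 0.4, 0.2, 0.3]:
    history.append(past_loss)  # the oldest entry (0.9) is evicted

prob = selective_probability(0.35, history)
print(prob, is_selected(prob, u=0.5))
```

Under this reading, samples with a high loss relative to recent history are selected more often, `min_prob` keeps low-loss samples from being dropped entirely, and a larger `beta` lowers the threshold `u ** beta` so that more samples pass.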
Source code in federatedml/param/hetero_nn_param.py
class SelectorParam(object):
"""
Parameters used for Homo Neural Network.
Args:
method: None or str
back propagation select method, accept "relative" only, default: None
selective_size: int
deque size to use, store the most recent selective_size historical loss, default: 1024
beta: int
sample whose selective probability >= power(np.random, beta) will be selected
min_prob: Numeric
selective probability is max(min_prob, rank_rate)
"""
def __init__(self, method=None, beta=1, selective_size=consts.SELECTIVE_SIZE, min_prob=0, random_state=None):
self.method = method
self.selective_size = selective_size
self.beta = beta
self.min_prob = min_prob
self.random_state = random_state
def check(self):
if self.method is not None and self.method not in ["relative"]:
raise ValueError('selective method should be None or "relative"')
if not isinstance(self.selective_size, int) or self.selective_size <= 0:
raise ValueError("selective size should be a positive integer")
if not isinstance(self.beta, int):
raise ValueError("beta should be integer")
if not isinstance(self.min_prob, (float, int)):
raise ValueError("min_prob should be numeric")
__init__(self, method=None, beta=1, selective_size=1024, min_prob=0, random_state=None)
Source code in federatedml/param/hetero_nn_param.py
def __init__(self, method=None, beta=1, selective_size=consts.SELECTIVE_SIZE, min_prob=0, random_state=None):
self.method = method
self.selective_size = selective_size
self.beta = beta
self.min_prob = min_prob
self.random_state = random_state
check(self)
Source code in federatedml/param/hetero_nn_param.py
def check(self):
if self.method is not None and self.method not in ["relative"]:
raise ValueError('selective method should be None or "relative"')
if not isinstance(self.selective_size, int) or self.selective_size <= 0:
raise ValueError("selective size should be a positive integer")
if not isinstance(self.beta, int):
raise ValueError("beta should be integer")
if not isinstance(self.min_prob, (float, int)):
raise ValueError("min_prob should be numeric")
HeteroNNParam (BaseParam)
Parameters used for Hetero Neural Network.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
task_type | str | task type of hetero nn model, one of 'classification', 'regression'. | 'classification' |
config_type | str | accept "keras" only. | 'keras' |
bottom_nn_define | dict | a dict that represents the structure of the bottom neural network. | None |
interactive_layer_define | dict | a dict that represents the structure of the interactive layer. | None |
interactive_layer_lr | float | the learning rate of the interactive layer. | 0.9 |
top_nn_define | dict | a dict that represents the structure of the top neural network. | None |
optimizer | str or dict | optimizer method, accepts the following types: 1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"; 2. a dict, with a required key-value pair keyed by "optimizer", with optional key-value pairs such as learning rate. | 'SGD' |
loss | str | a string that defines the loss function used | None |
epochs | int | the maximum iteration for aggregation in training. | 100 |
batch_size | int | batch size when updating the model. -1 means use all data in a batch, i.e. do not use the mini-batch strategy. | -1 |
early_stop | str | accept 'diff' only in this version. Method used to judge convergence. a) diff: use the difference of loss between two iterations to judge convergence. | 'diff' |
floating_point_precision | None or integer | if not None, use floating_point_precision-bit to speed up calculation, e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide the result by 2**floating_point_precision in the end. | 23 |
drop_out_keep_rate | float | should be between 0 and 1; if not equal to 1.0, dropout is enabled | 1.0 |
callback_param | CallbackParam | CallbackParam object | CallbackParam() |
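The two accepted shapes of the `optimizer` argument listed in the table can be normalized into a name plus keyword arguments. The sketch below is illustrative only and does not reproduce FATE's own `_parse_optimizer`. The helper `normalize_optimizer` is hypothetical, and the key `learning_rate` is an assumed example of the "optional key-value pairs such as learning rate".

```python
SUPPORTED_OPTIMIZERS = {"Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"}


def normalize_optimizer(spec="SGD"):
    """Hypothetical helper: split an optimizer spec into (name, kwargs)."""
    if isinstance(spec, str):
        name, kwargs = spec, {}
    elif isinstance(spec, dict):
        if "optimizer" not in spec:
            raise ValueError('a dict spec requires the key "optimizer"')
        name = spec["optimizer"]
        # Everything except the required key is treated as an optimizer argument.
        kwargs = {k: v for k, v in spec.items() if k != "optimizer"}
    else:
        raise ValueError("optimizer should be a str or a dict")
    if name not in SUPPORTED_OPTIMIZERS:
        raise ValueError(f"unsupported optimizer: {name}")
    return name, kwargs


print(normalize_optimizer("Adam"))
print(normalize_optimizer({"optimizer": "SGD", "learning_rate": 0.05}))
```

The dict form is the one to use when the optimizer's defaults need to be overridden; the string form relies on defaults throughout.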
Source code in federatedml/param/hetero_nn_param.py
class HeteroNNParam(BaseParam):
"""
Parameters used for Hetero Neural Network.
Args:
task_type: str, task type of hetero nn model, one of 'classification', 'regression'.
config_type: str, accept "keras" only.
bottom_nn_define: a dict represents the structure of bottom neural network.
interactive_layer_define: a dict represents the structure of interactive layer.
interactive_layer_lr: float, the learning rate of interactive layer.
top_nn_define: a dict represents the structure of top neural network.
optimizer: optimizer method, accept following types:
1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"
2. a dict, with a required key-value pair keyed by "optimizer",
with optional key-value pairs such as learning rate.
defaults to "SGD"
loss: str, a string to define loss function used
epochs: int, the maximum iteration for aggregation in training.
batch_size : int, batch size when updating model.
-1 means use all data in a batch. i.e. Not to use mini-batch strategy.
defaults to -1.
early_stop : str, accept 'diff' only in this version, default: 'diff'
Method used to judge converge or not.
a) diff: Use difference of loss between two iterations to judge whether converge.
floating_point_precision: None or integer, if not None, means use floating_point_precision-bit to speed up calculation,
e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
the result by 2**floating_point_precision in the end.
drop_out_keep_rate: float, should be between 0 and 1; if not equal to 1.0, dropout is enabled
callback_param: CallbackParam object
"""
def __init__(self,
task_type='classification',
config_type="keras",
bottom_nn_define=None,
top_nn_define=None,
interactive_layer_define=None,
interactive_layer_lr=0.9,
optimizer='SGD',
loss=None,
epochs=100,
batch_size=-1,
early_stop="diff",
tol=1e-5,
encrypt_param=EncryptParam(),
encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
predict_param=PredictParam(),
cv_param=CrossValidationParam(),
validation_freqs=None,
early_stopping_rounds=None,
metrics=None,
use_first_metric_only=True,
selector_param=SelectorParam(),
floating_point_precision=23,
drop_out_keep_rate=1.0,
callback_param=CallbackParam()):
super(HeteroNNParam, self).__init__()
self.task_type = task_type
self.config_type = config_type
self.bottom_nn_define = bottom_nn_define
self.interactive_layer_define = interactive_layer_define
self.interactive_layer_lr = interactive_layer_lr
self.top_nn_define = top_nn_define
self.batch_size = batch_size
self.epochs = epochs
self.early_stop = early_stop
self.tol = tol
self.optimizer = optimizer
self.loss = loss
self.validation_freqs = validation_freqs
self.early_stopping_rounds = early_stopping_rounds
self.metrics = metrics or []
self.use_first_metric_only = use_first_metric_only
self.encrypt_param = copy.deepcopy(encrypt_param)
self.encrypted_model_calculator_param = encrypted_mode_calculator_param
self.predict_param = copy.deepcopy(predict_param)
self.cv_param = copy.deepcopy(cv_param)
self.selector_param = selector_param
self.floating_point_precision = floating_point_precision
self.drop_out_keep_rate = drop_out_keep_rate
self.callback_param = copy.deepcopy(callback_param)
def check(self):
self.optimizer = self._parse_optimizer(self.optimizer)
supported_config_type = ["keras"]
if self.task_type not in ["classification", "regression"]:
raise ValueError("task_type should be classification or regression")
if self.config_type not in supported_config_type:
raise ValueError(f"config_type should be one of {supported_config_type}")
if not isinstance(self.tol, (int, float)):
raise ValueError("tol should be numeric")
if not isinstance(self.epochs, int) or self.epochs <= 0:
raise ValueError("epochs should be a positive integer")
if self.bottom_nn_define and not isinstance(self.bottom_nn_define, dict):
raise ValueError("bottom_nn_define should be a dict defining the structure of neural network")
if self.top_nn_define and not isinstance(self.top_nn_define, dict):
raise ValueError("top_nn_define should be a dict defining the structure of neural network")
if self.interactive_layer_define is not None and not isinstance(self.interactive_layer_define, dict):
raise ValueError(
"the interactive_layer_define should be a dict defining the structure of interactive layer")
if self.batch_size != -1:
if not isinstance(self.batch_size, int) \
or self.batch_size < consts.MIN_BATCH_SIZE:
raise ValueError(
" {} not supported, should be larger than 10 or -1 represent for all data".format(self.batch_size))
if self.early_stop != "diff":
raise ValueError("early stop should be diff in this version")
if self.metrics is not None and not isinstance(self.metrics, list):
raise ValueError("metrics should be a list")
if self.floating_point_precision is not None and \
(not isinstance(self.floating_point_precision, int) or
self.floating_point_precision < 0 or self.floating_point_precision > 63):
raise ValueError("floating point precision should be null or a integer between 0 and 63")
if not isinstance(self.drop_out_keep_rate, (float, int)) or self.drop_out_keep_rate < 0.0 or \
self.drop_out_keep_rate > 1.0:
raise ValueError("drop_out_keep_rate should be in range [0.0, 1.0]")
self.encrypt_param.check()
self.encrypted_model_calculator_param.check()
self.predict_param.check()
self.selector_param.check()
descr = "hetero nn param's "
for p in ["early_stopping_rounds", "validation_freqs",
"use_first_metric_only"]:
if self._deprecated_params_set.get(p):
if "callback_param" in self.get_user_feeded():
raise ValueError(f"{p} and callback param should not be set simultaneously,"
f"{self._deprecated_params_set}, {self.get_user_feeded()}")
else:
self.callback_param.callbacks = ["PerformanceEvaluate"]
break
if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
self.callback_param.validation_freqs = self.validation_freqs
if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
self.callback_param.early_stopping_rounds = self.early_stopping_rounds
if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
if self.metrics:
self.callback_param.metrics = self.metrics
if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
self.callback_param.use_first_metric_only = self.use_first_metric_only
    @staticmethod
    def _parse_optimizer(opt):
        """
        Examples:
            1. "optimizer": "SGD"
            2. "optimizer": {
                   "optimizer": "SGD",
                   "learning_rate": 0.05
               }
        """
        kwargs = {}
        if isinstance(opt, str):
            return SimpleNamespace(optimizer=opt, kwargs=kwargs)
        elif isinstance(opt, dict):
            optimizer = opt.get("optimizer")
            if not optimizer:
                raise ValueError(f"optimizer config: {opt} invalid")
            # every key other than "optimizer" is passed through as an optimizer argument
            kwargs = {k: v for k, v in opt.items() if k != "optimizer"}
            return SimpleNamespace(optimizer=optimizer, kwargs=kwargs)
        elif opt is None:
            return None
        else:
            raise ValueError(f"invalid type for optimizer: {type(opt)}")
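The optimizer setting accepted by HeteroNNParam can be a plain string or a dict with a required "optimizer" key. The following is a minimal standalone sketch of that normalization, written without importing FATE; the function name `parse_optimizer` is hypothetical and only mirrors the logic of `_parse_optimizer` shown above.

```python
from types import SimpleNamespace


def parse_optimizer(opt):
    """Normalize an optimizer setting into a namespace with .optimizer and .kwargs.

    Hypothetical standalone helper that mirrors the _parse_optimizer logic above.
    """
    if opt is None:
        return None
    if isinstance(opt, str):
        return SimpleNamespace(optimizer=opt, kwargs={})
    if isinstance(opt, dict):
        name = opt.get("optimizer")
        if not name:
            raise ValueError(f"optimizer config: {opt} invalid")
        # Everything except the "optimizer" key becomes a keyword argument.
        kwargs = {k: v for k, v in opt.items() if k != "optimizer"}
        return SimpleNamespace(optimizer=name, kwargs=kwargs)
    raise ValueError(f"invalid type for optimizer: {type(opt)}")


plain = parse_optimizer("SGD")
detailed = parse_optimizer({"optimizer": "SGD", "learning_rate": 0.05})
print(plain.optimizer, plain.kwargs)
print(detailed.optimizer, detailed.kwargs)
```

Both forms end up in the same shape, so downstream code can read `.optimizer` and `.kwargs` without caring which form the user supplied.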
__init__(self, task_type='classification', config_type='keras', bottom_nn_define=None, top_nn_define=None, interactive_layer_define=None, interactive_layer_lr=0.9, optimizer='SGD', loss=None, epochs=100, batch_size=-1, early_stop='diff', tol=1e-05, encrypt_param=EncryptParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=True, selector_param=SelectorParam(), floating_point_precision=23, drop_out_keep_rate=1.0, callback_param=CallbackParam())
check(self)
hetero_sshe_linr_param¶
Classes¶
HeteroSSHELinRParam (LinearModelParam)¶
Parameters used for Hetero SSHE Linear Regression.
Parameters¶
penalty : {'L2' or 'L1'} Penalty method used in LinR. Note that 'L1' is not supported when using the encrypted version in HeteroLinR.
tol : float, default: 1e-4 The tolerance of convergence.
alpha : float, default: 1.0 Regularization strength coefficient.
optimizer : {'sgd', 'rmsprop', 'adam', 'adagrad'} Optimization method.
batch_size : int, default: -1 Batch size when updating the model. -1 means using all data in one batch, i.e. no mini-batch strategy.
learning_rate : float, default: 0.01 Learning rate.
max_iter : int, default: 20 The maximum number of training iterations.
init_param : InitParam object, default: default InitParam object Init param method object.
early_stop : {'diff', 'abs', 'weight_diff'} Method used to judge convergence. a) diff: use the difference of loss between two iterations to judge convergence. b) abs: use the absolute value of loss to judge convergence, i.e. if loss < tol, it is converged. c) weight_diff: use the difference between the weights of two consecutive iterations.
encrypt_param : EncryptParam object, default: default EncryptParam object Encrypt param.
encrypted_mode_calculator_param : EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object Encrypted mode calculator param.
cv_param : CrossValidationParam object, default: default CrossValidationParam object CV param.
decay : int or float, default: 1 Decay rate for the learning rate. The learning rate follows the schedule lr = lr0/(1+decay*t) if decay_sqrt is False, and lr = lr0 / sqrt(1+decay*t) if decay_sqrt is True, where t is the iteration number.
decay_sqrt : bool, default: True Selects the decay schedule: lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise lr = lr0 / sqrt(1+decay*t).
callback_param : CallbackParam object Callback param.
reveal_strategy : str, "respectively" or "encrypted_reveal_in_host", default: "respectively" "respectively": guest and host each reveal only their own part of the weights. "encrypted_reveal_in_host": the host's weights are revealed in encrypted mode, and the guest's weights are revealed in normal mode.
reveal_every_iter : bool, default: False Whether to reconstruct model weights every iteration. If so, regularization is available. Performance will also be better since the algorithm process is simplified.
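The two learning-rate decay schedules described for decay and decay_sqrt can be worked through numerically. The sketch below is a standalone illustration, not FATE code: the helper name `decayed_learning_rate` is hypothetical, and counting t from 0 is an assumption.

```python
import math


def decayed_learning_rate(lr0, decay, t, decay_sqrt=True):
    """Learning rate at iteration t.

    decay_sqrt=False: lr = lr0 / (1 + decay * t)
    decay_sqrt=True:  lr = lr0 / sqrt(1 + decay * t)
    """
    denom = 1 + decay * t
    return lr0 / math.sqrt(denom) if decay_sqrt else lr0 / denom


# Documented defaults: learning_rate=0.01, decay=1.
# Assumption: the iteration counter t starts at 0.
lr0, decay = 0.01, 1
for t in range(4):
    with_sqrt = decayed_learning_rate(lr0, decay, t, decay_sqrt=True)
    without_sqrt = decayed_learning_rate(lr0, decay, t, decay_sqrt=False)
    print(f"t={t}  sqrt schedule={with_sqrt:.5f}  plain schedule={without_sqrt:.5f}")
```

The square-root schedule shrinks the step size more slowly, which is why it is the default when decay is left at 1.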
Source code in federatedml/param/hetero_sshe_linr_param.py
class HeteroSSHELinRParam(LinearModelParam):
    """
    Parameters used for Hetero SSHE Linear Regression.

    Parameters
    ----------
    penalty : {'L2' or 'L1'}
        Penalty method used in LinR. Please note that, when using encrypted version in HeteroLinR,
        'L1' is not supported.
    tol : float, default: 1e-4
        The tolerance of convergence
    alpha : float, default: 1.0
        Regularization strength coefficient.
    optimizer : {'sgd', 'rmsprop', 'adam', 'adagrad'}
        Optimize method
    batch_size : int, default: -1
        Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.
    learning_rate : float, default: 0.01
        Learning rate
    max_iter : int, default: 20
        The maximum iteration for training.
    init_param: InitParam object, default: default InitParam object
        Init param method object.
    early_stop : {'diff', 'abs', 'weight_diff'}
        Method used to judge convergence.
            a) diff: Use difference of loss between two iterations to judge whether converge.
            b) abs: Use the absolute value of loss to judge whether converge. i.e. if loss < tol, it is converged.
            c) weight_diff: Use difference between weights of two consecutive iterations
    encrypt_param: EncryptParam object, default: default EncryptParam object
        encrypt param
    encrypted_mode_calculator_param: EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object
        encrypted mode calculator param
    cv_param: CrossValidationParam object, default: default CrossValidationParam object
        cv param
    decay: int or float, default: 1
        Decay rate for learning rate. learning rate will follow the following decay schedule.
        lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t)
        where t is the iter number.
    decay_sqrt: Bool, default: True
        lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)
    callback_param: CallbackParam object
        callback param
    reveal_strategy: str, "respectively", "encrypted_reveal_in_host", default: "respectively"
        "respectively": Means guest and host can reveal their own part of weights only.
        "encrypted_reveal_in_host": Means the host's weights are revealed in encrypted mode,
        and the guest's weights are revealed in normal mode.
    reveal_every_iter: bool, default: False
        Whether to reconstruct model weights every iteration. If so, regularization is available.
        The performance will be better as well since the algorithm process is simplified.
    """
    def __init__(self, penalty='L2',
                 tol=1e-4, alpha=1.0, optimizer='sgd',
                 batch_size=-1, learning_rate=0.01, init_param=InitParam(),
                 max_iter=20, early_stop='diff',
                 encrypt_param=EncryptParam(),
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
                 cv_param=CrossValidationParam(), decay=1, decay_sqrt=True,
                 callback_param=CallbackParam(),
                 use_mix_rand=True,
                 reveal_strategy="respectively",
                 reveal_every_iter=False
                 ):
        super(HeteroSSHELinRParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
                                                  batch_size=batch_size, learning_rate=learning_rate,
                                                  init_param=init_param, max_iter=max_iter, early_stop=early_stop,
                                                  encrypt_param=encrypt_param, cv_param=cv_param, decay=decay,
                                                  decay_sqrt=decay_sqrt,
                                                  callback_param=callback_param)
        self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
        self.use_mix_rand = use_mix_rand
        self.reveal_strategy = reveal_strategy
        self.reveal_every_iter = reveal_every_iter
    def check(self):
        descr = "sshe linear_regression_param's "
        super(HeteroSSHELinRParam, self).check()
        if self.encrypt_param.method != consts.PAILLIER:
            raise ValueError(
                descr + "encrypt method supports 'Paillier' only")
        self.check_boolean(self.reveal_every_iter, descr)
        if self.penalty is None:
            pass
        elif type(self.penalty).__name__ != "str":
            raise ValueError(
                f"{descr} penalty {self.penalty} not supported, should be str type")
        else:
            self.penalty = self.penalty.upper()
            """
            if self.penalty not in [consts.L1_PENALTY, consts.L2_PENALTY]:
                raise ValueError(
                    "logistic_param's penalty not supported, penalty should be 'L1', 'L2' or 'none'")
            """
            # without per-iteration reveal, only 'L2' or 'none' is accepted
            if not self.reveal_every_iter:
                if self.penalty not in [consts.L2_PENALTY, consts.NONE.upper()]:
                    raise ValueError(
                        "penalty should be 'L2' or 'none', when reveal_every_iter is False"
                    )
        if type(self.optimizer).__name__ != "str":
            raise ValueError(
                f"{descr} optimizer {self.optimizer} not supported, should be str type")
        else:
            self.optimizer = self.optimizer.lower()
            if self.reveal_every_iter:
                if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad']:
                    raise ValueError(
                        "When reveal_every_iter is True, "
                        f"{descr} optimizer not supported, optimizer should be"
                        " 'sgd', 'rmsprop', 'adam', or 'adagrad'")
            else:
                if self.optimizer not in ['sgd']:
                    raise ValueError("When reveal_every_iter is False, "
                                     f"{descr} optimizer not supported, optimizer should be"
                                     " 'sgd'")
        if self.callback_param.validation_freqs is not None:
            if self.reveal_every_iter is False:
                raise ValueError("When reveal_every_iter is False, validation every iter"
                                 " is not supported.")
        self.reveal_strategy = self.check_and_change_lower(self.reveal_strategy,
                                                           ["respectively", "encrypted_reveal_in_host"],
                                                           f"{descr} reveal_strategy")
        if self.reveal_strategy == "encrypted_reveal_in_host" and self.reveal_every_iter:
            raise PermissionError("reveal strategy: encrypted_reveal_in_host mode is not allowed to reveal every iter.")
        return True
__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='sgd', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=20, early_stop='diff', encrypt_param=EncryptParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, callback_param=CallbackParam(), use_mix_rand=True, reveal_strategy='respectively', reveal_every_iter=False)
check(self)
hetero_sshe_lr_param¶
Classes¶
HeteroSSHELRParam (LogisticParam)¶
Parameters used for Hetero SSHE Logistic Regression.
Parameters¶
penalty : str, 'L1', 'L2' or None, default: 'L2' Penalty method used in LR. If it is not None, weights are required to be reconstructed every iteration.
tol : float, default: 1e-4 The tolerance of convergence.
alpha : float, default: 1.0 Regularization strength coefficient.
optimizer : str, 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', or 'adagrad', default: 'sgd' Optimizer.
batch_size : int, default: -1 Batch size when updating the model. -1 means using all data in one batch, i.e. no mini-batch strategy.
learning_rate : float, default: 0.01 Learning rate.
max_iter : int, default: 100 The maximum number of training iterations.
early_stop : str, 'diff', 'weight_diff' or 'abs', default: 'diff' Method used to judge convergence. a) diff: use the difference of loss between two iterations to judge convergence. b) weight_diff: use the difference between the weights of two consecutive iterations. c) abs: use the absolute value of loss to judge convergence, i.e. if loss < tol, it is converged.
decay : int or float, default: 1 Decay rate for the learning rate. The learning rate follows the schedule lr = lr0/(1+decay*t) if decay_sqrt is False, and lr = lr0 / sqrt(1+decay*t) if decay_sqrt is True, where t is the iteration number.
decay_sqrt : bool, default: True Selects the decay schedule: lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise lr = lr0 / sqrt(1+decay*t).
encrypt_param : EncryptParam object, default: default EncryptParam object Encrypt param.
predict_param : PredictParam object, default: default PredictParam object Predict param.
cv_param : CrossValidationParam object, default: default CrossValidationParam object CV param.
multi_class : str, 'ovr', default: 'ovr' For a multi-class task, indicates which strategy to use. Currently only 'ovr' (short for one_vs_rest) is supported.
reveal_strategy : str, "respectively" or "encrypted_reveal_in_host", default: "respectively" "respectively": guest and host each reveal only their own part of the weights. "encrypted_reveal_in_host": the host's weights are revealed in encrypted mode, and the guest's weights are revealed in normal mode.
reveal_every_iter : bool, default: False Whether to reconstruct model weights every iteration. If so, regularization is available. Performance will also be better since the algorithm process is simplified.
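The three early_stop criteria listed above can be sketched as a single convergence test. This is an illustrative standalone helper, not the FATE implementation: the name `has_converged` is hypothetical, and measuring the weight difference with the Euclidean norm is an assumption.

```python
import math


def has_converged(early_stop, tol, loss=None, prev_loss=None,
                  weights=None, prev_weights=None):
    """Return True when the chosen early_stop criterion is satisfied."""
    if early_stop == "diff":
        # Difference of loss between two iterations.
        return prev_loss is not None and abs(loss - prev_loss) < tol
    if early_stop == "abs":
        # Absolute value of the loss itself.
        return abs(loss) < tol
    if early_stop == "weight_diff":
        # Difference between weights of two consecutive iterations.
        # Assumption: the distance is the Euclidean norm.
        if prev_weights is None:
            return False
        dist = math.sqrt(sum((w - p) ** 2 for w, p in zip(weights, prev_weights)))
        return dist < tol
    raise ValueError(f"unsupported early_stop: {early_stop}")


print(has_converged("diff", 1e-4, loss=0.30001, prev_loss=0.30002))
print(has_converged("abs", 1e-4, loss=0.3))
print(has_converged("weight_diff", 1e-4, weights=[0.5, 0.1], prev_weights=[0.5, 0.2]))
```

Note that 'diff' and 'weight_diff' need values from the previous iteration, so neither can report convergence on the very first iteration.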
Source code in federatedml/param/hetero_sshe_lr_param.py
class HeteroSSHELRParam(LogisticParam):
    """
    Parameters used for Hetero SSHE Logistic Regression

    Parameters
    ----------
    penalty : str, 'L1', 'L2' or None. default: 'L2'
        Penalty method used in LR. If it is not None, weights are required to be reconstructed every iter.
    tol : float, default: 1e-4
        The tolerance of convergence
    alpha : float, default: 1.0
        Regularization strength coefficient.
    optimizer : str, 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', or 'adagrad', default: 'sgd'
        Optimizer
    batch_size : int, default: -1
        Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.
    learning_rate : float, default: 0.01
        Learning rate
    max_iter : int, default: 100
        The maximum iteration for training.
    early_stop : str, 'diff', 'weight_diff' or 'abs', default: 'diff'
        Method used to judge converge or not.
            a) diff: Use difference of loss between two iterations to judge whether converge.
            b) weight_diff: Use difference between weights of two consecutive iterations
            c) abs: Use the absolute value of loss to judge whether converge. i.e. if loss < tol, it is converged.
    decay: int or float, default: 1
        Decay rate for learning rate. learning rate will follow the following decay schedule.
        lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t)
        where t is the iter number.
    decay_sqrt: Bool, default: True
        lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)
    encrypt_param: EncryptParam object, default: default EncryptParam object
        encrypt param
    predict_param: PredictParam object, default: default PredictParam object
        predict param
    cv_param: CrossValidationParam object, default: default CrossValidationParam object
        cv param
    multi_class: str, 'ovr', default: 'ovr'
        If it is a multi_class task, indicate what strategy to use. Currently, support 'ovr' short for one_vs_rest only.
    reveal_strategy: str, "respectively", "encrypted_reveal_in_host", default: "respectively"
        "respectively": Means guest and host can reveal their own part of weights only.
        "encrypted_reveal_in_host": Means the host's weights are revealed in encrypted mode,
        and the guest's weights are revealed in normal mode.
    reveal_every_iter: bool, default: False
        Whether to reconstruct model weights every iteration. If so, regularization is available.
        The performance will be better as well since the algorithm process is simplified.
    """
    def __init__(self, penalty='L2',
                 tol=1e-4, alpha=1.0, optimizer='sgd',
                 batch_size=-1, learning_rate=0.01, init_param=InitParam(),
                 max_iter=100, early_stop='diff', encrypt_param=EncryptParam(),
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 decay=1, decay_sqrt=True,
                 multi_class='ovr', use_mix_rand=True,
                 reveal_strategy="respectively",
                 reveal_every_iter=False,
                 callback_param=CallbackParam(),
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam()
                 ):
        super(HeteroSSHELRParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
                                                batch_size=batch_size,
                                                learning_rate=learning_rate,
                                                init_param=init_param, max_iter=max_iter, early_stop=early_stop,
                                                predict_param=predict_param, cv_param=cv_param,
                                                decay=decay,
                                                decay_sqrt=decay_sqrt, multi_class=multi_class,
                                                encrypt_param=encrypt_param, callback_param=callback_param)
        self.use_mix_rand = use_mix_rand
        self.reveal_strategy = reveal_strategy
        self.reveal_every_iter = reveal_every_iter
        self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
    def check(self):
        descr = "logistic_param's"
        super(HeteroSSHELRParam, self).check()
        self.check_boolean(self.reveal_every_iter, descr)
        if self.penalty is None:
            pass
        elif type(self.penalty).__name__ != "str":
            raise ValueError(
                "logistic_param's penalty {} not supported, should be str type".format(self.penalty))
        else:
            self.penalty = self.penalty.upper()
            """
            if self.penalty not in [consts.L1_PENALTY, consts.L2_PENALTY, consts.NONE.upper()]:
                raise ValueError(
                    "logistic_param's penalty not supported, penalty should be 'L1', 'L2' or 'NONE'")
            """
            # without per-iteration reveal, only 'L2' or 'none' is accepted
            if not self.reveal_every_iter:
                if self.penalty not in [consts.L2_PENALTY, consts.NONE.upper()]:
                    raise ValueError(
                        "penalty should be 'L2' or 'none', when reveal_every_iter is False"
                    )
        if type(self.optimizer).__name__ != "str":
            raise ValueError(
                "logistic_param's optimizer {} not supported, should be str type".format(self.optimizer))
        else:
            self.optimizer = self.optimizer.lower()
            if self.reveal_every_iter:
                if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'nesterov_momentum_sgd']:
                    raise ValueError(
                        "When reveal_every_iter is True, "
                        "sshe logistic_param's optimizer not supported, optimizer should be"
                        " 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', or 'adagrad'")
            else:
                if self.optimizer not in ['sgd', 'nesterov_momentum_sgd']:
                    raise ValueError("When reveal_every_iter is False, "
                                     "sshe logistic_param's optimizer not supported, optimizer should be"
                                     " 'sgd', 'nesterov_momentum_sgd'")
        if self.encrypt_param.method not in [consts.PAILLIER, None]:
            raise ValueError(
                "logistic_param's encrypted method support 'Paillier' or None only")
        if self.callback_param.validation_freqs is not None:
            if self.reveal_every_iter is False:
                raise ValueError("When reveal_every_iter is False, validation every iter"
                                 " is not supported.")
        self.reveal_strategy = self.check_and_change_lower(self.reveal_strategy,
                                                           ["respectively", "encrypted_reveal_in_host"],
                                                           f"{descr} reveal_strategy")
        if self.reveal_strategy == "encrypted_reveal_in_host" and self.reveal_every_iter:
            raise PermissionError("reveal strategy: encrypted_reveal_in_host mode is not allowed to reveal every iter.")
        self.encrypted_mode_calculator_param.check()
        return True
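The checks above tie several settings to reveal_every_iter. The following standalone sketch collects those rules in one place so that a configuration can be sanity-checked before a job is submitted. It does not import FATE: the function name `sshe_lr_conflicts` is hypothetical, and the literals 'L2' and 'NONE' are assumed to match `consts.L2_PENALTY` and `consts.NONE.upper()`.

```python
def sshe_lr_conflicts(penalty="L2", optimizer="sgd", reveal_every_iter=False,
                      reveal_strategy="respectively", validation_freqs=None):
    """Return a list of human-readable conflicts; an empty list means no conflict found."""
    problems = []
    penalty = None if penalty is None else penalty.upper()
    optimizer = optimizer.lower()
    reveal_strategy = reveal_strategy.lower()

    if reveal_every_iter:
        allowed = ("sgd", "rmsprop", "adam", "adagrad", "nesterov_momentum_sgd")
        if optimizer not in allowed:
            problems.append(f"optimizer {optimizer!r} is not supported")
        if reveal_strategy == "encrypted_reveal_in_host":
            problems.append("encrypted_reveal_in_host cannot be combined with reveal_every_iter")
    else:
        # Assumed values of consts.L2_PENALTY and consts.NONE.upper().
        if penalty is not None and penalty not in ("L2", "NONE"):
            problems.append("penalty must be 'L2' or 'none' when reveal_every_iter is False")
        if optimizer not in ("sgd", "nesterov_momentum_sgd"):
            problems.append("optimizer must be 'sgd' or 'nesterov_momentum_sgd' "
                            "when reveal_every_iter is False")
        if validation_freqs is not None:
            problems.append("validation_freqs requires reveal_every_iter to be True")
    return problems


print(sshe_lr_conflicts())                                  # defaults
print(sshe_lr_conflicts(penalty="L1", optimizer="adam"))    # two conflicts
print(sshe_lr_conflicts(penalty="L1", optimizer="adam", reveal_every_iter=True))
```

Returning a list instead of raising on the first failure is a design choice for this sketch only; the real check() raises as soon as one rule is violated.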
__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='sgd', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=100, early_stop='diff', encrypt_param=EncryptParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, multi_class='ovr', use_mix_rand=True, reveal_strategy='respectively', reveal_every_iter=False, callback_param=CallbackParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam())
check(self)
¶Source code in federatedml/param/hetero_sshe_lr_param.py
def check(self):
descr = "logistic_param's"
super(HeteroSSHELRParam, self).check()
self.check_boolean(self.reveal_every_iter, descr)
if self.penalty is None:
pass
elif type(self.penalty).__name__ != "str":
raise ValueError(
"logistic_param's penalty {} not supported, should be str type".format(self.penalty))
else:
self.penalty = self.penalty.upper()
"""
if self.penalty not in [consts.L1_PENALTY, consts.L2_PENALTY, consts.NONE.upper()]:
raise ValueError(
"logistic_param's penalty not supported, penalty should be 'L1', 'L2' or 'NONE'")
"""
if not self.reveal_every_iter:
if self.penalty not in [consts.L2_PENALTY, consts.NONE.upper()]:
raise ValueError(
f"penalty should be 'L2' or 'none', when reveal_every_iter is False"
)
if type(self.optimizer).__name__ != "str":
raise ValueError(
"logistic_param's optimizer {} not supported, should be str type".format(self.optimizer))
else:
self.optimizer = self.optimizer.lower()
if self.reveal_every_iter:
if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'nesterov_momentum_sgd']:
raise ValueError(
"When reveal_every_iter is True, "
"sshe logistic_param's optimizer not supported, optimizer should be"
" 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', or 'adagrad'")
else:
if self.optimizer not in ['sgd', 'nesterov_momentum_sgd']:
raise ValueError("When reveal_every_iter is False, "
"sshe logistic_param's optimizer not supported, optimizer should be"
" 'sgd', 'nesterov_momentum_sgd'")
if self.encrypt_param.method not in [consts.PAILLIER, None]:
raise ValueError(
"logistic_param's encrypted method support 'Paillier' or None only")
if self.callback_param.validation_freqs is not None:
if self.reveal_every_iter is False:
raise ValueError(f"When reveal_every_iter is False, validation every iter"
f" is not supported.")
self.reveal_strategy = self.check_and_change_lower(self.reveal_strategy,
["respectively", "encrypted_reveal_in_host"],
f"{descr} reveal_strategy")
if self.reveal_strategy == "encrypted_reveal_in_host" and self.reveal_every_iter:
raise PermissionError("reveal strategy: encrypted_reveal_in_host mode is not allowed to reveal every iter.")
self.encrypted_mode_calculator_param.check()
return True
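The cross-parameter rules that `check` enforces can be condensed into a standalone sketch. The function `validate_sshe_lr` and its plain-string arguments are hypothetical stand-ins and not part of FATE; the sketch also assumes `consts.L2_PENALTY` is `'L2'` and that `penalty` is always passed as a string. Only the rules themselves are taken from the listing above.

```python
def validate_sshe_lr(penalty, optimizer, reveal_every_iter,
                     reveal_strategy="respectively", validation_freqs=None):
    """Hypothetical helper mirroring the cross-parameter rules in check()."""
    penalty = penalty.upper()
    optimizer = optimizer.lower()
    reveal_strategy = reveal_strategy.lower()

    if reveal_every_iter:
        allowed_optimizers = {"sgd", "rmsprop", "adam", "adagrad",
                              "nesterov_momentum_sgd"}
    else:
        # Without per-iteration reveal, only L2 or no penalty is accepted,
        # the optimizer choice is narrower, and validation is unavailable.
        allowed_optimizers = {"sgd", "nesterov_momentum_sgd"}
        if penalty not in {"L2", "NONE"}:
            raise ValueError("penalty should be 'L2' or 'none' "
                             "when reveal_every_iter is False")
        if validation_freqs is not None:
            raise ValueError("validation requires reveal_every_iter=True")

    if optimizer not in allowed_optimizers:
        raise ValueError(f"optimizer should be one of {sorted(allowed_optimizers)}")

    if reveal_strategy not in {"respectively", "encrypted_reveal_in_host"}:
        raise ValueError("unknown reveal_strategy")
    if reveal_strategy == "encrypted_reveal_in_host" and reveal_every_iter:
        raise PermissionError("encrypted_reveal_in_host cannot reveal every iter")
    return True
```

For instance, `validate_sshe_lr("L1", "sgd", False)` raises `ValueError`, while the same call with `"L2"` passes.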
homo_nn_param
¶
Classes¶
HomoNNParam (BaseParam)
¶Parameters used for Homo Neural Network.
Parameters¶
secure_aggregate : bool
enable secure aggregation or not, defaults to True.
aggregate_every_n_epoch : int
aggregate model every n epochs, defaults to 1.
config_type : {"nn", "keras", "pytorch"}
config type
nn_define : dict
a dict that represents the structure of the neural network.
optimizer : str or dict
optimizer method, accepts the following types:
1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"
2. a dict, with a required key-value pair keyed by "optimizer", and optional key-value pairs such as learning rate.
defaults to "SGD"
loss : str
loss
metrics : str or list of str
metrics
max_iter : int
the maximum iteration for aggregation in training.
batch_size : int
batch size when updating model. -1 means use all data in a batch, i.e. no mini-batch strategy. defaults to -1.
early_stop : {'diff', 'weight_diff', 'abs'}
Method used to judge whether training has converged.
a) diff: use the difference of loss between two iterations.
b) weight_diff: use the difference between weights of two consecutive iterations.
c) abs: use the absolute value of loss, i.e. if loss < eps, it is converged.
encode_label : bool
encode label to one_hot.
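The three `early_stop` criteria can be illustrated with a short sketch. The function `has_converged` is hypothetical and not part of FATE; `eps` stands for the threshold the source code refers to as `early_stop.eps`, and the Euclidean norm used for `weight_diff` is an assumption.

```python
import math


def has_converged(method, eps, loss=None, prev_loss=None,
                  weights=None, prev_weights=None):
    """Hypothetical illustration of the 'diff', 'weight_diff' and 'abs' criteria."""
    if method == "diff":
        # Change in loss between two consecutive iterations.
        return prev_loss is not None and abs(loss - prev_loss) < eps
    if method == "weight_diff":
        # Distance between two consecutive weight vectors (Euclidean norm assumed).
        if prev_weights is None:
            return False
        dist = math.sqrt(sum((w - p) ** 2 for w, p in zip(weights, prev_weights)))
        return dist < eps
    if method == "abs":
        # Absolute value of the loss itself.
        return abs(loss) < eps
    raise ValueError(f"unknown early_stop method: {method}")
```

With `eps=1e-3`, a loss moving from 0.5004 to 0.5 satisfies `diff`, whereas `abs` would still report no convergence because the loss itself is far above the threshold.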
Source code in federatedml/param/homo_nn_param.py
class HomoNNParam(BaseParam):
"""
Parameters used for Homo Neural Network.
Parameters
----------
secure_aggregate : bool
enable secure aggregation or not, defaults to True.
aggregate_every_n_epoch : int
aggregate model every n epoch, defaults to 1.
config_type : {"nn", "keras", "pytorch"}
config type
nn_define : dict
a dict represents the structure of neural network.
optimizer : str or dict
optimizer method, accept following types:
1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"
2. a dict, with a required key-value pair keyed by "optimizer",
with optional key-value pairs such as learning rate.
defaults to "SGD"
loss : str
loss
metrics: str or list of str
metrics
max_iter: int
the maximum iteration for aggregation in training.
batch_size : int
batch size when updating model.
-1 means use all data in a batch. i.e. Not to use mini-batch strategy.
defaults to -1.
early_stop : {'diff', 'weight_diff', 'abs'}
Method used to judge whether training has converged.
a) diff: Use the difference of loss between two iterations.
b) weight_diff: Use the difference between weights of two consecutive iterations.
c) abs: Use the absolute value of loss, i.e. if loss < eps, it is converged.
encode_label : bool
encode label to one_hot.
"""
def __init__(
self,
api_version: int = 0,
secure_aggregate: bool = True,
aggregate_every_n_epoch: int = 1,
config_type: str = "nn",
nn_define: dict = None,
optimizer: typing.Union[str, dict, SimpleNamespace] = "SGD",
loss: str = None,
metrics: typing.Union[str, list] = None,
max_iter: int = 100,
batch_size: int = -1,
early_stop: typing.Union[str, dict, SimpleNamespace] = "diff",
encode_label: bool = False,
predict_param=PredictParam(),
cv_param=CrossValidationParam(),
callback_param=CallbackParam(),
):
super(HomoNNParam, self).__init__()
self.api_version = api_version
self.secure_aggregate = secure_aggregate
self.aggregate_every_n_epoch = aggregate_every_n_epoch
self.config_type = config_type
self.nn_define = nn_define or []
self.encode_label = encode_label
self.batch_size = batch_size
self.max_iter = max_iter
self.early_stop = early_stop
self.metrics = metrics
self.optimizer = optimizer
self.loss = loss
self.predict_param = copy.deepcopy(predict_param)
self.cv_param = copy.deepcopy(cv_param)
self.callback_param = copy.deepcopy(callback_param)
def check(self):
supported_config_type = ["nn", "keras", "pytorch"]
if self.config_type not in supported_config_type:
raise ValueError(f"config_type should be one of {supported_config_type}")
self.early_stop = _parse_early_stop(self.early_stop)
self.metrics = _parse_metrics(self.metrics)
self.optimizer = _parse_optimizer(self.optimizer)
def generate_pb(self):
from federatedml.protobuf.generated import nn_model_meta_pb2
pb = nn_model_meta_pb2.HomoNNParam()
pb.secure_aggregate = self.secure_aggregate
pb.encode_label = self.encode_label
pb.aggregate_every_n_epoch = self.aggregate_every_n_epoch
pb.config_type = self.config_type
if self.config_type == "nn":
for layer in self.nn_define:
pb.nn_define.append(json.dumps(layer))
elif self.config_type == "keras":
pb.nn_define.append(json.dumps(self.nn_define))
elif self.config_type == "pytorch":
for layer in self.nn_define:
pb.nn_define.append(json.dumps(layer))
pb.batch_size = self.batch_size
pb.max_iter = self.max_iter
pb.early_stop.early_stop = self.early_stop.converge_func
pb.early_stop.eps = self.early_stop.eps
for metric in self.metrics:
pb.metrics.append(metric)
pb.optimizer.optimizer = self.optimizer.optimizer
pb.optimizer.args = json.dumps(self.optimizer.kwargs)
pb.loss = self.loss
return pb
def restore_from_pb(self, pb, is_warm_start_mode: bool = False):
self.secure_aggregate = pb.secure_aggregate
self.encode_label = pb.encode_label
self.aggregate_every_n_epoch = pb.aggregate_every_n_epoch
self.config_type = pb.config_type
if self.config_type == "nn":
for layer in pb.nn_define:
self.nn_define.append(json.loads(layer))
elif self.config_type == "keras":
self.nn_define = json.loads(pb.nn_define[0])
elif self.config_type == "pytorch":
for layer in pb.nn_define:
self.nn_define.append(json.loads(layer))
else:
raise ValueError(f"{self.config_type} is not supported")
self.batch_size = pb.batch_size
if not is_warm_start_mode:
self.max_iter = pb.max_iter
self.optimizer = _parse_optimizer(
dict(optimizer=pb.optimizer.optimizer, **json.loads(pb.optimizer.args))
)
self.early_stop = _parse_early_stop(
dict(early_stop=pb.early_stop.early_stop, eps=pb.early_stop.eps)
)
self.metrics = list(pb.metrics)
self.loss = pb.loss
return pb
__init__(self, api_version=0, secure_aggregate=True, aggregate_every_n_epoch=1, config_type='nn', nn_define=None, optimizer='SGD', loss=None, metrics=None, max_iter=100, batch_size=-1, early_stop='diff', encode_label=False, predict_param=PredictParam(), cv_param=CrossValidationParam(), callback_param=CallbackParam())
special
¶Source code in federatedml/param/homo_nn_param.py
def __init__(
self,
api_version: int = 0,
secure_aggregate: bool = True,
aggregate_every_n_epoch: int = 1,
config_type: str = "nn",
nn_define: dict = None,
optimizer: typing.Union[str, dict, SimpleNamespace] = "SGD",
loss: str = None,
metrics: typing.Union[str, list] = None,
max_iter: int = 100,
batch_size: int = -1,
early_stop: typing.Union[str, dict, SimpleNamespace] = "diff",
encode_label: bool = False,
predict_param=PredictParam(),
cv_param=CrossValidationParam(),
callback_param=CallbackParam(),
):
super(HomoNNParam, self).__init__()
self.api_version = api_version
self.secure_aggregate = secure_aggregate
self.aggregate_every_n_epoch = aggregate_every_n_epoch
self.config_type = config_type
self.nn_define = nn_define or []
self.encode_label = encode_label
self.batch_size = batch_size
self.max_iter = max_iter
self.early_stop = early_stop
self.metrics = metrics
self.optimizer = optimizer
self.loss = loss
self.predict_param = copy.deepcopy(predict_param)
self.cv_param = copy.deepcopy(cv_param)
self.callback_param = copy.deepcopy(callback_param)
check(self)
¶Source code in federatedml/param/homo_nn_param.py
def check(self):
supported_config_type = ["nn", "keras", "pytorch"]
if self.config_type not in supported_config_type:
raise ValueError(f"config_type should be one of {supported_config_type}")
self.early_stop = _parse_early_stop(self.early_stop)
self.metrics = _parse_metrics(self.metrics)
self.optimizer = _parse_optimizer(self.optimizer)
generate_pb(self)
¶Source code in federatedml/param/homo_nn_param.py
def generate_pb(self):
from federatedml.protobuf.generated import nn_model_meta_pb2
pb = nn_model_meta_pb2.HomoNNParam()
pb.secure_aggregate = self.secure_aggregate
pb.encode_label = self.encode_label
pb.aggregate_every_n_epoch = self.aggregate_every_n_epoch
pb.config_type = self.config_type
if self.config_type == "nn":
for layer in self.nn_define:
pb.nn_define.append(json.dumps(layer))
elif self.config_type == "keras":
pb.nn_define.append(json.dumps(self.nn_define))
elif self.config_type == "pytorch":
for layer in self.nn_define:
pb.nn_define.append(json.dumps(layer))
pb.batch_size = self.batch_size
pb.max_iter = self.max_iter
pb.early_stop.early_stop = self.early_stop.converge_func
pb.early_stop.eps = self.early_stop.eps
for metric in self.metrics:
pb.metrics.append(metric)
pb.optimizer.optimizer = self.optimizer.optimizer
pb.optimizer.args = json.dumps(self.optimizer.kwargs)
pb.loss = self.loss
return pb
restore_from_pb(self, pb, is_warm_start_mode=False)
¶Source code in federatedml/param/homo_nn_param.py
def restore_from_pb(self, pb, is_warm_start_mode: bool = False):
self.secure_aggregate = pb.secure_aggregate
self.encode_label = pb.encode_label
self.aggregate_every_n_epoch = pb.aggregate_every_n_epoch
self.config_type = pb.config_type
if self.config_type == "nn":
for layer in pb.nn_define:
self.nn_define.append(json.loads(layer))
elif self.config_type == "keras":
self.nn_define = json.loads(pb.nn_define[0])
elif self.config_type == "pytorch":
for layer in pb.nn_define:
self.nn_define.append(json.loads(layer))
else:
raise ValueError(f"{self.config_type} is not supported")
self.batch_size = pb.batch_size
if not is_warm_start_mode:
self.max_iter = pb.max_iter
self.optimizer = _parse_optimizer(
dict(optimizer=pb.optimizer.optimizer, **json.loads(pb.optimizer.args))
)
self.early_stop = _parse_early_stop(
dict(early_stop=pb.early_stop.early_stop, eps=pb.early_stop.eps)
)
self.metrics = list(pb.metrics)
self.loss = pb.loss
return pb
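The way `generate_pb` and `restore_from_pb` treat `nn_define` differs by `config_type`: for "nn" and "pytorch" each layer becomes its own JSON string, while for "keras" the whole definition is stored as one string. The sketch below shows that round trip with a plain Python list standing in for the protobuf repeated field. The helper names and the sample layer dicts are hypothetical and do not reflect FATE's actual layer schema.

```python
import json


def dump_nn_define(config_type, nn_define):
    """Hypothetical stand-in for the nn_define part of generate_pb."""
    if config_type in ("nn", "pytorch"):
        return [json.dumps(layer) for layer in nn_define]  # one entry per layer
    if config_type == "keras":
        return [json.dumps(nn_define)]                     # single entry
    # Note: the original generate_pb silently skips unknown types;
    # raising here is a choice made for this sketch.
    raise ValueError(f"{config_type} is not supported")


def load_nn_define(config_type, entries):
    """Hypothetical stand-in for the nn_define part of restore_from_pb."""
    if config_type in ("nn", "pytorch"):
        return [json.loads(entry) for entry in entries]
    if config_type == "keras":
        return json.loads(entries[0])
    raise ValueError(f"{config_type} is not supported")


# Illustrative definitions only.
layers = [{"layer": "Dense", "units": 8}, {"layer": "Dense", "units": 1}]
keras_model = {"class_name": "Sequential", "config": {"layers": layers}}
```

Round-tripping `layers` through the "nn" branch yields two serialized entries, while `keras_model` through the "keras" branch yields one.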
homo_onehot_encoder_param
¶
Classes¶
HomoOneHotParam (BaseParam)
¶Parameters¶
transform_col_indexes : list or int, default: -1
Specify which columns need to be calculated. -1 represents all columns.
need_run : bool, default True
Indicate whether this module needs to be run.
need_alignment : bool, default True
Indicate whether alignment of features is turned on.
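As a rough illustration of what feature alignment could mean for one-hot encoding in a homogeneous setting, the sketch below builds a shared header from the union of categories seen by all parties, so every party produces vectors of the same width. This interpretation, the function names, and the `column_value` naming scheme are all assumptions and are not taken from FATE.

```python
def aligned_one_hot_header(values_per_party, col_name):
    """Hypothetical: union the categories each party observed for one column."""
    categories = sorted(set().union(*map(set, values_per_party)))
    return [f"{col_name}_{category}" for category in categories]


def one_hot(value, col_name, header):
    """Encode a single value against the shared header."""
    return [1 if f"{col_name}_{value}" == name else 0 for name in header]


# Party A saw {"a", "b"}, party B saw {"b", "c"}.
header = aligned_one_hot_header([["a", "b"], ["b", "c"]], "x")
```

Without such a shared header, party A would emit two columns for `x` and party B a different two, and their models could not be aggregated column by column.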
Source code in federatedml/param/homo_onehot_encoder_param.py
class HomoOneHotParam(BaseParam):
"""
Parameters
----------
transform_col_indexes: list or int, default: -1
Specify which columns need to be calculated. -1 represents all columns.
need_run: bool, default True
Indicate whether this module needs to be run.
need_alignment: bool, default True
Indicate whether alignment of features is turned on.
"""
def __init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True, need_alignment=True):
super(HomoOneHotParam, self).__init__()
if transform_col_names is None:
transform_col_names = []
self.transform_col_indexes = transform_col_indexes
self.transform_col_names = transform_col_names
self.need_run = need_run
self.need_alignment = need_alignment
def check(self):
descr = "One-hot encoder with alignment param's"
self.check_defined_type(self.transform_col_indexes, descr, ['list', 'int'])
self.check_boolean(self.need_run, descr)
self.check_boolean(self.need_alignment, descr)
return True
__init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True, need_alignment=True)
special
¶Source code in federatedml/param/homo_onehot_encoder_param.py
def __init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True, need_alignment=True):
super(HomoOneHotParam, self).__init__()
if transform_col_names is None:
transform_col_names = []
self.transform_col_indexes = transform_col_indexes
self.transform_col_names = transform_col_names
self.need_run = need_run
self.need_alignment = need_alignment
check(self)
¶Source code in federatedml/param/homo_onehot_encoder_param.py
def check(self):
descr = "One-hot encoder with alignment param's"
self.check_defined_type(self.transform_col_indexes, descr, ['list', 'int'])
self.check_boolean(self.need_run, descr)
self.check_boolean(self.need_alignment, descr)
return True
init_model_param
¶
Classes¶
InitParam (BaseParam)
¶Parameters used to initialize a model.
Parameters¶
init_method : {'random_uniform', 'random_normal', 'ones', 'zeros' or 'const'} Initial method.
init_const : int or float, default: 1 Required when init_method is 'const'. Specify the constant.
fit_intercept : bool, default: True Whether to initialize the intercept or not.
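To make the `init_method` options concrete, here is a minimal sketch using only the standard library. The function `init_weights` is hypothetical; the ranges chosen for the random methods (uniform on [0, 1), standard normal) and the choice to store the intercept as one extra trailing slot are assumptions, not FATE's documented behavior.

```python
import random


def init_weights(n_features, init_method="random_uniform", init_const=1,
                 fit_intercept=True, random_seed=None):
    """Hypothetical initializer covering the five documented init_method values."""
    rng = random.Random(random_seed)           # seeded for reproducibility
    size = n_features + 1 if fit_intercept else n_features
    method = init_method.lower()
    if method == "random_uniform":
        return [rng.random() for _ in range(size)]
    if method == "random_normal":
        return [rng.gauss(0.0, 1.0) for _ in range(size)]
    if method == "ones":
        return [1.0] * size
    if method == "zeros":
        return [0.0] * size
    if method == "const":
        return [float(init_const)] * size      # init_const is only used here
    raise ValueError(f"init_method {init_method} not supported")
```

Passing the same `random_seed` twice yields identical vectors, which is the usual reason to expose a seed parameter.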
Source code in federatedml/param/init_model_param.py
class InitParam(BaseParam):
"""
Parameters used to initialize a model.
Parameters
----------
init_method : {'random_uniform', 'random_normal', 'ones', 'zeros' or 'const'}
Initial method.
init_const : int or float, default: 1
Required when init_method is 'const'. Specify the constant.
fit_intercept : bool, default: True
Whether to initialize the intercept or not.
"""
def __init__(self, init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None):
super().__init__()
self.init_method = init_method
self.init_const = init_const
self.fit_intercept = fit_intercept
self.random_seed = random_seed
def check(self):
if type(self.init_method).__name__ != "str":
raise ValueError(
"Init param's init_method {} not supported, should be str type".format(self.init_method))
else:
self.init_method = self.init_method.lower()
if self.init_method not in ['random_uniform', 'random_normal', 'ones', 'zeros', 'const']:
raise ValueError(
"Init param's init_method {} not supported, init_method should be one of 'random_uniform',"
" 'random_normal', 'ones', 'zeros' or 'const'".format(self.init_method))
if type(self.init_const).__name__ not in ['int', 'float']:
raise ValueError(
"Init param's init_const {} not supported, should be int or float type".format(self.init_const))
if type(self.fit_intercept).__name__ != 'bool':
raise ValueError(
"Init param's fit_intercept {} not supported, should be bool type".format(self.fit_intercept))
if self.random_seed is not None:
if type(self.random_seed).__name__ != 'int':
raise ValueError(
"Init param's random_seed {} not supported, should be int type".format(self.random_seed))
return True
__init__(self, init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None)
special
¶Source code in federatedml/param/init_model_param.py
def __init__(self, init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None):
super().__init__()
self.init_method = init_method
self.init_const = init_const
self.fit_intercept = fit_intercept
self.random_seed = random_seed
check(self)
¶Source code in federatedml/param/init_model_param.py
def check(self):
if type(self.init_method).__name__ != "str":
raise ValueError(
"Init param's init_method {} not supported, should be str type".format(self.init_method))
else:
self.init_method = self.init_method.lower()
if self.init_method not in ['random_uniform', 'random_normal', 'ones', 'zeros', 'const']:
raise ValueError(
"Init param's init_method {} not supported, init_method should be one of 'random_uniform',"
" 'random_normal', 'ones', 'zeros' or 'const'".format(self.init_method))
if type(self.init_const).__name__ not in ['int', 'float']:
raise ValueError(
"Init param's init_const {} not supported, should be int or float type".format(self.init_const))
if type(self.fit_intercept).__name__ != 'bool':
raise ValueError(
"Init param's fit_intercept {} not supported, should be bool type".format(self.fit_intercept))
if self.random_seed is not None:
if type(self.random_seed).__name__ != 'int':
raise ValueError(
"Init param's random_seed {} not supported, should be int type".format(self.random_seed))
return True
intersect_param
¶
DEFAULT_RANDOM_BIT
¶Classes¶
EncodeParam (BaseParam)
¶Define the hash method for raw intersect method
Parameters¶
salt : str
the src data string will be str = str + salt, default is the empty string
encode_method : {"none", "md5", "sha1", "sha224", "sha256", "sha384", "sha512", "sm3"}
the hash method of src data string, support md5, sha1, sha224, sha256, sha384, sha512, sm3, default "none"
base64 : bool
if True, the result of hash will be changed to base64, default False
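The salt, hash method, and base64 options combine as sketched below using Python's `hashlib`. The function `encode_id` is hypothetical. It covers only the methods that `hashlib` provides under the same names (so "sm3" and "none" are left out), and it assumes the base64 option encodes the raw digest bytes; FATE's exact encoding may differ.

```python
import base64
import hashlib

# Subset of the documented methods that hashlib supports directly.
SUPPORTED = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def encode_id(raw_id, salt="", method="sha256", use_base64=False):
    """Hypothetical: hash (raw_id + salt) and optionally base64-encode the digest."""
    method = method.lower()
    if method not in SUPPORTED:
        raise ValueError(f"{method} is not covered by this sketch")
    digest = hashlib.new(method, (raw_id + salt).encode("utf-8"))
    if use_base64:
        # Assumption: base64 is applied to the raw digest bytes.
        return base64.b64encode(digest.digest()).decode("ascii")
    return digest.hexdigest()
```

Both parties must agree on the same `salt` and `method`, otherwise identical ids hash to different values and the intersection comes out empty.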
Source code in federatedml/param/intersect_param.py
class EncodeParam(BaseParam):
"""
Define the hash method for raw intersect method
Parameters
----------
salt: str
the src data string will be str = str + salt, default is the empty string
encode_method: {"none", "md5", "sha1", "sha224", "sha256", "sha384", "sha512", "sm3"}
the hash method of src data string, support md5, sha1, sha224, sha256, sha384, sha512, sm3, default "none"
base64: bool
if True, the result of hash will be changed to base64, default False
"""
def __init__(self, salt='', encode_method='none', base64=False):
super().__init__()
self.salt = salt
self.encode_method = encode_method
self.base64 = base64
def check(self):
if type(self.salt).__name__ != "str":
raise ValueError(
"encode param's salt {} not supported, should be str type".format(
self.salt))
descr = "encode param's "
self.encode_method = self.check_and_change_lower(self.encode_method,
["none", consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
f"{descr}encode_method")
if type(self.base64).__name__ != "bool":
raise ValueError(
"encode param's base64 {} not supported, should be bool type".format(self.base64))
LOGGER.debug("Finish EncodeParam check!")
LOGGER.warning("'EncodeParam' will be replaced by 'RAWParam' in future release. "
"Please do not rely on current param naming in application.")
return True
__init__(self, salt='', encode_method='none', base64=False)
special
¶Source code in federatedml/param/intersect_param.py
def __init__(self, salt='', encode_method='none', base64=False):
super().__init__()
self.salt = salt
self.encode_method = encode_method
self.base64 = base64
check(self)
¶Source code in federatedml/param/intersect_param.py
def check(self):
if type(self.salt).__name__ != "str":
raise ValueError(
"encode param's salt {} not supported, should be str type".format(
self.salt))
descr = "encode param's "
self.encode_method = self.check_and_change_lower(self.encode_method,
["none", consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
f"{descr}encode_method")
if type(self.base64).__name__ != "bool":
raise ValueError(
"encode param's base64 {} not supported, should be bool type".format(self.base64))
LOGGER.debug("Finish EncodeParam check!")
LOGGER.warning("'EncodeParam' will be replaced by 'RAWParam' in future release. "
"Please do not rely on current param naming in application.")
return True
RAWParam (BaseParam)
¶Specify parameters for raw intersect method
Parameters¶
use_hash : bool
whether to hash ids for raw intersect
salt : str
the src data string will be str = str + salt, default is the empty string
hash_method : str
the hash method of src data string, support md5, sha1, sha224, sha256, sha384, sha512, sm3, default "none"
base64 : bool
if True, the result of hash will be changed to base64, default False
join_role : {"guest", "host"}
role who joins ids, supports "guest" and "host" only and effective only for raw. If it is "guest", the host sends its ids to the guest and the intersection is computed on the guest side; if it is "host", the guest sends its ids to the host. Default "guest".
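The effect of `join_role` on the direction of data flow can be sketched as follows. The function `raw_intersect` and its return structure are hypothetical; the sketch runs both parties in one process and omits hashing, so it only shows which side sends its ids and which side computes the join.

```python
def raw_intersect(guest_ids, host_ids, join_role="guest"):
    """Hypothetical single-process illustration of the raw method's join_role."""
    join_role = join_role.lower()
    if join_role not in ("guest", "host"):
        raise ValueError("join_role must be 'guest' or 'host'")

    if join_role == "guest":
        sender, sent_ids, local_ids = "host", host_ids, guest_ids
    else:
        sender, sent_ids, local_ids = "guest", guest_ids, host_ids

    received = set(sent_ids)  # what the joining party gets to see
    return {
        "joined_by": join_role,
        "ids_sent_by": sender,
        "intersection": {i for i in local_ids if i in received},
    }
```

The intersection is the same either way; what changes is which party learns the other side's full id list, which is why the raw method offers weaker privacy than the RSA or DH methods.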
Source code in federatedml/param/intersect_param.py
class RAWParam(BaseParam):
"""
Specify parameters for raw intersect method
Parameters
----------
use_hash: bool
whether to hash ids for raw intersect
salt: str
the src data string will be str = str + salt, default is the empty string
hash_method: str
the hash method of src data string, support md5, sha1, sha224, sha256, sha384, sha512, sm3, default "none"
base64: bool
if True, the result of hash will be changed to base64, default False
join_role: {"guest", "host"}
role who joins ids, supports "guest" and "host" only and effective only for raw.
If it is "guest", the host sends its ids to the guest and the intersection is computed
on the guest side; if it is "host", the guest sends its ids to the host. Default "guest".
"""
def __init__(self, use_hash=False, salt='', hash_method='none', base64=False, join_role=consts.GUEST):
super().__init__()
self.use_hash = use_hash
self.salt = salt
self.hash_method = hash_method
self.base64 = base64
self.join_role = join_role
def check(self):
descr = "raw param's "
self.check_boolean(self.use_hash, f"{descr}use_hash")
self.check_string(self.salt, f"{descr}salt")
self.hash_method = self.check_and_change_lower(self.hash_method,
["none", consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
f"{descr}hash_method")
self.check_boolean(self.base64, f"{descr}base64")
self.join_role = self.check_and_change_lower(self.join_role, [consts.GUEST, consts.HOST], f"{descr}join_role")
LOGGER.debug("Finish RAWParam check!")
return True
__init__(self, use_hash=False, salt='', hash_method='none', base64=False, join_role='guest')
special
¶Source code in federatedml/param/intersect_param.py
def __init__(self, use_hash=False, salt='', hash_method='none', base64=False, join_role=consts.GUEST):
super().__init__()
self.use_hash = use_hash
self.salt = salt
self.hash_method = hash_method
self.base64 = base64
self.join_role = join_role
check(self)
¶Source code in federatedml/param/intersect_param.py
def check(self):
descr = "raw param's "
self.check_boolean(self.use_hash, f"{descr}use_hash")
self.check_string(self.salt, f"{descr}salt")
self.hash_method = self.check_and_change_lower(self.hash_method,
["none", consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
f"{descr}hash_method")
self.check_boolean(self.base64, f"{descr}base64")
self.join_role = self.check_and_change_lower(self.join_role, [consts.GUEST, consts.HOST], f"{descr}join_role")
LOGGER.debug("Finish RAWParam check!")
return True
RSAParam (BaseParam)
¶Specify parameters for RSA intersect method
Parameters¶
salt : str
the src data string will be str = str + salt, default ''
hash_method : str
the hash method of src data string, support sha256, sha384, sha512, sm3, default sha256
final_hash_method : str
the hash method of result data string, support md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256
split_calculation : bool
if True, Host & Guest split operations for faster performance, recommended on large data set
random_base_fraction : positive float
if not None, generate (fraction * public key id count) of r for encryption and reuse generated r; note that value greater than 0.99 will be taken as 1, and value less than 0.01 will be rounded up to 0.01
key_length : int
value >= 1024, bit count of rsa key, default 1024
random_bit : positive int
it will define the size of blinding factor in rsa algorithm, default 128
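The roles of `hash_method`, `final_hash_method`, and the blinding factor sized by `random_bit` can be seen in a toy RSA blind-signature intersection. This is a single-process sketch under stated assumptions, not FATE's implementation: the key is built from two small Mersenne primes (far below the required `key_length >= 1024`), SHA-256 is used for both hash steps, and all function names are hypothetical. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
import hashlib
import math
import secrets

# Toy key for illustration only; real use requires key_length >= 1024.
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q
E = 65537                                   # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))           # private exponent, host only


def first_hash(raw_id, salt=""):
    """hash_method step: map (id + salt) to an integer modulo N."""
    return int(hashlib.sha256((raw_id + salt).encode()).hexdigest(), 16) % N


def final_hash(value):
    """final_hash_method step applied to the RSA-signed value."""
    return hashlib.sha256(str(value).encode()).hexdigest()


def random_blind(random_bit=64):
    """Draw a blinding factor r of at most random_bit bits, invertible mod N."""
    while True:
        r = secrets.randbits(random_bit)
        if r > 1 and math.gcd(r, N) == 1:
            return r


def rsa_intersect(guest_ids, host_ids):
    # Host signs its own ids and shares only the final hashes.
    host_tokens = {final_hash(pow(first_hash(i), D, N)) for i in host_ids}
    common = set()
    for raw_id in guest_ids:
        r = random_blind()
        blinded = (first_hash(raw_id) * pow(r, E, N)) % N   # guest -> host
        signed = pow(blinded, D, N)                         # host -> guest
        unblinded = (signed * pow(r, -1, N)) % N            # equals hash^D mod N
        if final_hash(unblinded) in host_tokens:
            common.add(raw_id)
    return common
```

Because the host only ever sees `hash * r^E mod N`, it cannot link a blinded value to a guest id; drawing a fresh `r` per id is the costly part that `random_base_fraction` trades off by reusing generated values.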
Source code in federatedml/param/intersect_param.py
class RSAParam(BaseParam):
"""
Specify parameters for RSA intersect method
Parameters
----------
salt: str
the src data string will be str = str + salt, default ''
hash_method: str
the hash method of src data string, support sha256, sha384, sha512, sm3, default sha256
final_hash_method: str
the hash method of result data string, support md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256
split_calculation: bool
if True, Host & Guest split operations for faster performance, recommended on large data set
random_base_fraction: positive float
if not None, generate (fraction * public key id count) of r for encryption and reuse generated r;
note that value greater than 0.99 will be taken as 1, and value less than 0.01 will be rounded up to 0.01
key_length: int
value >= 1024, bit count of rsa key, default 1024
random_bit: positive int
it will define the size of blinding factor in rsa algorithm, default 128
"""
def __init__(self, salt='', hash_method='sha256', final_hash_method='sha256',
split_calculation=False, random_base_fraction=None, key_length=consts.DEFAULT_KEY_LENGTH,
random_bit=DEFAULT_RANDOM_BIT):
super().__init__()
self.salt = salt
self.hash_method = hash_method
self.final_hash_method = final_hash_method
self.split_calculation = split_calculation
self.random_base_fraction = random_base_fraction
self.key_length = key_length
self.random_bit = random_bit
def check(self):
descr = "rsa param's "
self.check_string(self.salt, f"{descr}salt")
self.hash_method = self.check_and_change_lower(self.hash_method,
[consts.SHA256, consts.SHA384, consts.SHA512, consts.SM3],
f"{descr}hash_method")
self.final_hash_method = self.check_and_change_lower(self.final_hash_method,
[consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
f"{descr}final_hash_method")
self.check_boolean(self.split_calculation, f"{descr}split_calculation")
if self.random_base_fraction:
self.check_positive_number(self.random_base_fraction, f"{descr}random_base_fraction")
self.check_decimal_float(self.random_base_fraction, f"{descr}random_base_fraction")
self.check_positive_integer(self.key_length, f"{descr}key_length")
if self.key_length < 1024:
raise ValueError(f"key length must be >= 1024")
self.check_positive_integer(self.random_bit, f"{descr}random_bit")
LOGGER.debug("Finish RSAParam parameter check!")
return True
__init__(self, salt='', hash_method='sha256', final_hash_method='sha256', split_calculation=False, random_base_fraction=None, key_length=1024, random_bit=128)
special
¶Source code in federatedml/param/intersect_param.py
def __init__(self, salt='', hash_method='sha256', final_hash_method='sha256',
split_calculation=False, random_base_fraction=None, key_length=consts.DEFAULT_KEY_LENGTH,
random_bit=DEFAULT_RANDOM_BIT):
super().__init__()
self.salt = salt
self.hash_method = hash_method
self.final_hash_method = final_hash_method
self.split_calculation = split_calculation
self.random_base_fraction = random_base_fraction
self.key_length = key_length
self.random_bit = random_bit
check(self)
¶Source code in federatedml/param/intersect_param.py
def check(self):
descr = "rsa param's "
self.check_string(self.salt, f"{descr}salt")
self.hash_method = self.check_and_change_lower(self.hash_method,
[consts.SHA256, consts.SHA384, consts.SHA512, consts.SM3],
f"{descr}hash_method")
self.final_hash_method = self.check_and_change_lower(self.final_hash_method,
[consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
f"{descr}final_hash_method")
self.check_boolean(self.split_calculation, f"{descr}split_calculation")
if self.random_base_fraction:
self.check_positive_number(self.random_base_fraction, f"{descr}random_base_fraction")
self.check_decimal_float(self.random_base_fraction, f"{descr}random_base_fraction")
self.check_positive_integer(self.key_length, f"{descr}key_length")
if self.key_length < 1024:
raise ValueError(f"key length must be >= 1024")
self.check_positive_integer(self.random_bit, f"{descr}random_bit")
LOGGER.debug("Finish RSAParam parameter check!")
return True
DHParam (BaseParam)
¶Define the hash method for DH intersect method
Parameters¶
salt : str
the src data string will be str = str + salt, default ''
hash_method : str
the hash method of src data string, support none, md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256
key_length : int, value >= 1024
the key length of the commutative cipher p, default 1024
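A commutative cipher of the kind `key_length` refers to can be sketched with modular exponentiation: each party raises hashed ids to its own secret exponent modulo a prime p, and because exponentiation commutes, doubly encrypted values match exactly when the underlying ids match. The code below is a single-process toy under stated assumptions, not FATE's implementation: the modulus is the 127-bit Mersenne prime rather than a prime of at least 1024 bits, SHA-256 is the hash, and all names are hypothetical.

```python
import hashlib
import math
import secrets

PRIME = 2**127 - 1  # toy modulus; real use requires key_length >= 1024


def hash_to_group(raw_id, salt=""):
    """hash_method step: map (id + salt) to an integer modulo PRIME."""
    return int(hashlib.sha256((raw_id + salt).encode()).hexdigest(), 16) % PRIME


def private_exponent():
    """Secret exponent coprime to PRIME - 1, so encryption is a bijection."""
    while True:
        k = secrets.randbelow(PRIME - 3) + 2
        if math.gcd(k, PRIME - 1) == 1:
            return k


def dh_intersect(guest_ids, host_ids):
    a, b = private_exponent(), private_exponent()   # guest key, host key
    guest_once = [pow(hash_to_group(i), a, PRIME) for i in guest_ids]  # guest -> host
    guest_twice = [pow(c, b, PRIME) for c in guest_once]   # host -> guest, order kept
    host_once = [pow(hash_to_group(i), b, PRIME) for i in host_ids]    # host -> guest
    host_twice = {pow(c, a, PRIME) for c in host_once}     # computed by guest
    # H(id)^(a*b) is identical on both paths when the ids are equal.
    return {i for i, c in zip(guest_ids, guest_twice) if c in host_twice}
```

Neither party sees the other's ids in the clear; each only receives values already encrypted under the other side's secret exponent.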
Source code in federatedml/param/intersect_param.py
class DHParam(BaseParam):
"""
Define the hash method for DH intersect method
Parameters
----------
salt: str
the src data string will be str = str + salt, default ''
hash_method: str
the hash method of src data string, support none, md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256
key_length: int, value >= 1024
the key length of the commutative cipher p, default 1024
"""
def __init__(self, salt='', hash_method='sha256', key_length=consts.DEFAULT_KEY_LENGTH):
super().__init__()
self.salt = salt
self.hash_method = hash_method
self.key_length = key_length
def check(self):
descr = "dh param's "
self.check_string(self.salt, f"{descr}salt")
self.hash_method = self.check_and_change_lower(self.hash_method,
["none", consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
f"{descr}hash_method")
self.check_positive_integer(self.key_length, f"{descr}key_length")
if self.key_length < 1024:
raise ValueError(f"key length must be >= 1024")
LOGGER.debug("Finish DHParam parameter check!")
return True
__init__(self, salt='', hash_method='sha256', key_length=1024)
special
¶Source code in federatedml/param/intersect_param.py
def __init__(self, salt='', hash_method='sha256', key_length=consts.DEFAULT_KEY_LENGTH):
super().__init__()
self.salt = salt
self.hash_method = hash_method
self.key_length = key_length
check(self)
¶Source code in federatedml/param/intersect_param.py
def check(self):
descr = "dh param's "
self.check_string(self.salt, f"{descr}salt")
self.hash_method = self.check_and_change_lower(self.hash_method,
["none", consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
f"{descr}hash_method")
self.check_positive_integer(self.key_length, f"{descr}key_length")
if self.key_length < 1024:
raise ValueError(f"key length must be >= 1024")
LOGGER.debug("Finish DHParam parameter check!")
return True
IntersectCache (BaseParam)
¶Source code in federatedml/param/intersect_param.py
class IntersectCache(BaseParam):
def __init__(self, use_cache=False, id_type=consts.PHONE, encrypt_type=consts.SHA256):
"""
Parameters
----------
use_cache: bool
whether to use cached ids; with ver1.7 and above, this param is ignored
id_type
with ver1.7 and above, this param is ignored
encrypt_type
with ver1.7 and above, this param is ignored
"""
super().__init__()
self.use_cache = use_cache
self.id_type = id_type
self.encrypt_type = encrypt_type
def check(self):
descr = "intersect_cache param's "
# self.check_boolean(self.use_cache, f"{descr}use_cache")
self.check_and_change_lower(self.id_type,
[consts.PHONE, consts.IMEI],
f"{descr}id_type")
self.check_and_change_lower(self.encrypt_type,
[consts.MD5, consts.SHA256],
f"{descr}encrypt_type")
__init__(self, use_cache=False, id_type='phone', encrypt_type='sha256')
special
¶Parameters¶
use_cache : bool
whether to use cached ids; with ver1.7 and above, this param is ignored
id_type
with ver1.7 and above, this param is ignored
encrypt_type
with ver1.7 and above, this param is ignored
Source code in federatedml/param/intersect_param.py
def __init__(self, use_cache=False, id_type=consts.PHONE, encrypt_type=consts.SHA256):
"""
Parameters
----------
use_cache: bool
whether to use cached ids; with ver1.7 and above, this param is ignored
id_type
with ver1.7 and above, this param is ignored
encrypt_type
with ver1.7 and above, this param is ignored
"""
super().__init__()
self.use_cache = use_cache
self.id_type = id_type
self.encrypt_type = encrypt_type
check(self)
¶Source code in federatedml/param/intersect_param.py
def check(self):
descr = "intersect_cache param's "
# self.check_boolean(self.use_cache, f"{descr}use_cache")
self.check_and_change_lower(self.id_type,
[consts.PHONE, consts.IMEI],
f"{descr}id_type")
self.check_and_change_lower(self.encrypt_type,
[consts.MD5, consts.SHA256],
f"{descr}encrypt_type")
IntersectPreProcessParam (BaseParam)¶
Specify parameters for pre-processing and cardinality-only mode
Parameters¶
false_positive_rate : float
initial target false positive rate when creating the Bloom filter; must be positive and <= 0.5, default 1e-3. When run_preprocess is True, IntersectParam.check() additionally requires a value no less than 0.01.
encrypt_method : str
encrypt method for encrypting ids when performing a cardinality_only task; supports rsa only, default rsa; specify rsa parameter settings with RSAParam
hash_method : str
the hash method for inserting ids; supports md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256
preprocess_method : str
the hash method for encoding ids before insertion into the filter, default sha256; only effective for preprocessing
preprocess_salt : str
salt to be appended to the hash result by preprocess_method before insertion into the filter, default ''; only effective for preprocessing
random_state : int
seed for the random salt generator when constructing hash functions; the salt is appended to the hash result by hash_method when performing insertion, default None
filter_owner : str
role that constructs the filter, either guest or host, default guest; only effective for preprocessing
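To give a feel for what false_positive_rate controls, the sketch below applies the textbook Bloom filter sizing formulas. It is not taken from the FATE implementation, which may size its filter differently; `bloom_filter_size` and its arguments are hypothetical names used only for illustration.

```python
import math
from typing import Tuple


def bloom_filter_size(n_ids: int, false_positive_rate: float) -> Tuple[int, int]:
    """Return (bit_count, hash_count) for a textbook Bloom filter.

    Hypothetical helper for illustration only; not part of FATE.
    """
    if not 0 < false_positive_rate <= 0.5:
        raise ValueError("false_positive_rate must be in (0, 0.5]")
    if n_ids <= 0:
        raise ValueError("n_ids must be positive")
    # m = -n * ln(p) / (ln 2)^2 bits for n ids and target rate p
    bit_count = math.ceil(-n_ids * math.log(false_positive_rate) / math.log(2) ** 2)
    # k = (m / n) * ln 2 hash functions is optimal for that m
    hash_count = max(1, round(bit_count / n_ids * math.log(2)))
    return bit_count, hash_count


# A stricter target rate costs more bits and more hash functions.
bits_strict, hashes_strict = bloom_filter_size(1000, 1e-3)
bits_loose, hashes_loose = bloom_filter_size(1000, 1e-2)
```

Under these formulas, lowering the target rate from 1e-2 to 1e-3 increases the filter size by roughly half, which is the trade-off to keep in mind when choosing false_positive_rate.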
Source code in federatedml/param/intersect_param.py
class IntersectPreProcessParam(BaseParam):
"""
Specify parameters for pre-processing and cardinality-only mode
Parameters
----------
false_positive_rate: float
initial target false positive rate when creating Bloom Filter,
must be <= 0.5, default 1e-3
encrypt_method: str
encrypt method for encrypting id when performing cardinality_only task,
supports rsa only, default rsa;
specify rsa parameter setting with RSAParam
hash_method: str
the hash method for inserting ids, support md5, sha1, sha224, sha256, sha384, sha512, sm3,
default sha256
preprocess_method: str
the hash method for encoding ids before insertion into filter, default sha256,
only effective for preprocessing
preprocess_salt: str
salt to be appended to hash result by preprocess_method before insertion into filter,
default '', only effective for preprocessing
random_state: int
seed for random salt generator when constructing hash functions,
salt is appended to hash result by hash_method when performing insertion, default None
filter_owner: str
role that constructs filter, either guest or host, default guest,
only effective for preprocessing
"""
def __init__(self, false_positive_rate=1e-3, encrypt_method=consts.RSA, hash_method='sha256',
preprocess_method='sha256', preprocess_salt='', random_state=None, filter_owner=consts.GUEST):
super().__init__()
self.false_positive_rate = false_positive_rate
self.encrypt_method = encrypt_method
self.hash_method = hash_method
self.preprocess_method = preprocess_method
self.preprocess_salt = preprocess_salt
self.random_state = random_state
self.filter_owner = filter_owner
def check(self):
descr = "intersect preprocess param's false_positive_rate "
self.check_decimal_float(self.false_positive_rate, descr)
self.check_positive_number(self.false_positive_rate, descr)
if self.false_positive_rate > 0.5:
raise ValueError(f"{descr} must be positive float no greater than 0.5")
descr = "intersect preprocess param's encrypt_method "
self.encrypt_method = self.check_and_change_lower(self.encrypt_method, [consts.RSA], descr)
descr = "intersect preprocess param's random_state "
if self.random_state:
self.check_nonnegative_number(self.random_state, descr)
descr = "intersect preprocess param's hash_method "
self.hash_method = self.check_and_change_lower(self.hash_method,
[consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
descr)
descr = "intersect preprocess param's preprocess_salt "
self.check_string(self.preprocess_salt, descr)
descr = "intersect preprocess param's preprocess_method "
self.preprocess_method = self.check_and_change_lower(self.preprocess_method,
[consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
descr)
descr = "intersect preprocess param's filter_owner "
self.filter_owner = self.check_and_change_lower(self.filter_owner,
[consts.GUEST, consts.HOST],
descr)
LOGGER.debug("Finish IntersectPreProcessParam parameter check!")
return True
__init__(self, false_positive_rate=0.001, encrypt_method='rsa', hash_method='sha256', preprocess_method='sha256', preprocess_salt='', random_state=None, filter_owner='guest') special¶
Source code in federatedml/param/intersect_param.py
def __init__(self, false_positive_rate=1e-3, encrypt_method=consts.RSA, hash_method='sha256',
preprocess_method='sha256', preprocess_salt='', random_state=None, filter_owner=consts.GUEST):
super().__init__()
self.false_positive_rate = false_positive_rate
self.encrypt_method = encrypt_method
self.hash_method = hash_method
self.preprocess_method = preprocess_method
self.preprocess_salt = preprocess_salt
self.random_state = random_state
self.filter_owner = filter_owner
check(self)¶
Source code in federatedml/param/intersect_param.py
def check(self):
descr = "intersect preprocess param's false_positive_rate "
self.check_decimal_float(self.false_positive_rate, descr)
self.check_positive_number(self.false_positive_rate, descr)
if self.false_positive_rate > 0.5:
raise ValueError(f"{descr} must be positive float no greater than 0.5")
descr = "intersect preprocess param's encrypt_method "
self.encrypt_method = self.check_and_change_lower(self.encrypt_method, [consts.RSA], descr)
descr = "intersect preprocess param's random_state "
if self.random_state:
self.check_nonnegative_number(self.random_state, descr)
descr = "intersect preprocess param's hash_method "
self.hash_method = self.check_and_change_lower(self.hash_method,
[consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
descr)
descr = "intersect preprocess param's preprocess_salt "
self.check_string(self.preprocess_salt, descr)
descr = "intersect preprocess param's preprocess_method "
self.preprocess_method = self.check_and_change_lower(self.preprocess_method,
[consts.MD5, consts.SHA1, consts.SHA224,
consts.SHA256, consts.SHA384, consts.SHA512,
consts.SM3],
descr)
descr = "intersect preprocess param's filter_owner "
self.filter_owner = self.check_and_change_lower(self.filter_owner,
[consts.GUEST, consts.HOST],
descr)
LOGGER.debug("Finish IntersectPreProcessParam parameter check!")
return True
IntersectParam (BaseParam)¶
Define the intersect method
Parameters¶
intersect_method : str
supports 'rsa', 'raw', and 'dh'; default 'rsa'
random_bit : positive int
size of the blinding factor in the rsa algorithm, default 128; this param will be deprecated in a future version, please use random_bit in RSAParam instead
sync_intersect_ids : bool
In rsa, True means guest or host will send the intersect results to the other parties, and False means it will not. In raw, True means the role given by "join_role" will send the intersect results and the other parties will receive them. Default True.
join_role : str
role that joins ids; supports "guest" and "host" only and is effective only for raw. If it is "guest", the host sends its ids to guest and the intersection is computed on the guest side; if it is "host", the guest sends its ids to host. Default "guest"; this param will be deprecated in a future version, please use 'join_role' in raw_params instead
only_output_key : bool
if False, the intersection results include both the key and the value from the input data; if True, they include only the key from the input data, and the value will be empty or filled with a uniform string like "intersect_id"
with_encode : bool
if True, a hash method is applied to the ids before intersection; effective for raw method only. This param will be deprecated in a future version, please use 'use_hash' in raw_params; currently, if this param is set to True, the specification in 'encode_params' is used instead of 'raw_params'.
encode_params : EncodeParam
effective only when with_encode is True; this param will be deprecated in a future version, use 'raw_params' instead
raw_params : RAWParam
effective for raw method only
rsa_params : RSAParam
effective for rsa method only
dh_params : DHParam
effective for dh method only
join_method : {'inner_join', 'left_join'}
if 'left_join', all participants will include sample_id_generator's (imputed) ids in the output; default 'inner_join'. check() rejects 'left_join' when sync_intersect_ids is False.
new_sample_id : bool
whether to generate new ids for sample_id_generator's ids; only effective when join_method is 'left_join' or when the input data are instances with match id, default False
sample_id_generator : str
role whose ids are to be kept; effective only when join_method is 'left_join' or when the input data are instances with match id, default 'guest'
intersect_cache_param : IntersectCache
specification for cache generation; with ver1.7 and above, this param is ignored.
run_cache : bool
whether to store Host's encrypted ids; only valid when intersect method is 'rsa' or 'dh', default False
cardinality_only : bool
whether to output the estimated intersection count (cardinality); if sync_cardinality is True, the cardinality count is synced with host(s); default False
sync_cardinality : bool
whether to sync cardinality with all participants, default False; only effective when cardinality_only is set to True
run_preprocess : bool
whether to run the preprocess step, default False
intersect_preprocess_params : IntersectPreProcessParam
used for preprocessing and cardinality_only mode
repeated_id_process : bool
if True, intersection will process ids that may be repeated; in ver 1.7 and above, repeated id processing is applied automatically to data with instance id, and this param is ignored
repeated_id_owner : str
which role has the repeated ids; in ver 1.7 and above, this param is ignored
allow_info_share : bool
in ver 1.7 and above, this param is ignored
info_owner : str
in ver 1.7 and above, this param is ignored
with_sample_id : bool
data with sample id or not, default False; in ver 1.7 and above, this param is ignored
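Several of these parameters constrain one another, and check() rejects conflicting combinations. The sketch below restates those cross-parameter rules on a plain dict so they can be read in one place. `find_conflicts` is a hypothetical helper that does not import or call FATE, and the default of False for `rsa_params.split_calculation` is an assumption.

```python
def find_conflicts(cfg: dict) -> list:
    """List conflicts in a plain-dict intersect config.

    Hypothetical helper mirroring the cross-parameter rules in
    IntersectParam.check(); defaults follow the signature shown in the docs,
    except split_calculation, whose default (False) is assumed.
    """
    method = cfg.get("intersect_method", "rsa").lower()
    split = cfg.get("rsa_params", {}).get("split_calculation", False)
    fpr = cfg.get("intersect_preprocess_params", {}).get("false_positive_rate", 1e-3)
    cardinality_only = cfg.get("cardinality_only", False)
    run_preprocess = cfg.get("run_preprocess", False)
    run_cache = cfg.get("run_cache", False)

    conflicts = []
    if cfg.get("join_method", "inner_join") == "left_join" and not cfg.get("sync_intersect_ids", True):
        conflicts.append("left_join requires sync_intersect_ids=True")
    if cardinality_only:
        if method != "rsa":
            conflicts.append("cardinality_only supports rsa only")
        elif split:
            conflicts.append("cardinality_only does not support split_calculation")
    if run_preprocess:
        if fpr < 0.01:
            conflicts.append("run_preprocess requires false_positive_rate >= 0.01")
        if cardinality_only:
            conflicts.append("cardinality_only cannot run preprocessing")
    if run_cache:
        if method not in ("rsa", "dh"):
            conflicts.append("run_cache supports rsa or dh only")
        if method == "rsa" and split:
            conflicts.append("run_cache does not support rsa split_calculation")
        if cardinality_only:
            conflicts.append("run_cache is not available in cardinality_only mode")
        if run_preprocess:
            conflicts.append("run_cache is not available with preprocessing")
    return conflicts


ok = find_conflicts({"intersect_method": "rsa", "cardinality_only": True})
bad = find_conflicts({"intersect_method": "dh", "cardinality_only": True})
default_fpr = find_conflicts({"run_preprocess": True})
```

Note the last case: the default false_positive_rate of 1e-3 is below the 0.01 floor that check() enforces for preprocessing, so enabling run_preprocess appears to require setting false_positive_rate explicitly.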
Source code in federatedml/param/intersect_param.py
class IntersectParam(BaseParam):
"""
Define the intersect method
Parameters
----------
intersect_method: str
it supports 'rsa', 'raw', and 'dh', default by 'rsa'
random_bit: positive int
it will define the size of blinding factor in rsa algorithm, default 128
note that this param will be deprecated in future, please use random_bit in RSAParam instead
sync_intersect_ids: bool
In rsa, 'sync_intersect_ids' is True means guest or host will send intersect results to the others, and False will not.
while in raw, 'sync_intersect_ids' is True means the role of "join_role" will send intersect results and the others will get them.
Default by True.
join_role: str
role who joins ids, supports "guest" and "host" only and effective only for raw.
If it is "guest", the host will send its ids to guest and find the intersection of
ids in guest; if it is "host", the guest will send its ids to host. Default by "guest";
note this param will be deprecated in future version, please use 'join_role' in raw_params instead
only_output_key: bool
if false, the results of intersection will include key and value which from input data; if true, it will just include key from input
data and the value will be empty or filled by uniform string like "intersect_id"
with_encode: bool
if True, it will use hash method for intersect ids, effective for raw method only;
note that this param will be deprecated in future version, please use 'use_hash' in raw_params;
currently if this param is set to True,
specification by 'encode_params' will be taken instead of 'raw_params'.
encode_params: EncodeParam
effective only when with_encode is True;
this param will be deprecated in future version, use 'raw_params' in future implementation
raw_params: RAWParam
effective for raw method only
rsa_params: RSAParam
effective for rsa method only
dh_params: DHParam
effective for dh method only
join_method: {'inner_join', 'left_join'}
if 'left_join', participants will all include sample_id_generator's (imputed) ids in output,
default 'inner_join'
new_sample_id: bool
whether to generate new id for sample_id_generator's ids,
only effective when join_method is 'left_join' or when input data are instance with match id,
default False
sample_id_generator: str
role whose ids are to be kept,
effective only when join_method is 'left_join' or when input data are instance with match id,
default 'guest'
intersect_cache_param: IntersectCacheParam
specification for cache generation,
with ver1.7 and above, this param is ignored.
run_cache: bool
whether to store Host's encrypted ids, only valid when intersect method is 'rsa' or 'dh', default False
cardinality_only: bool
whether to output estimated intersection count(cardinality);
if sync_cardinality is True, then sync cardinality count with host(s)
sync_cardinality: bool
whether to sync cardinality with all participants, default False,
only effective when cardinality_only set to True
run_preprocess: bool
whether to run preprocess process, default False
intersect_preprocess_params: IntersectPreProcessParam
used for preprocessing and cardinality_only mode
repeated_id_process: bool
if true, intersection will process the ids which can be repeatable;
in ver 1.7 and above,repeated id process
will be automatically applied to data with instance id, this param will be ignored
repeated_id_owner: str
which role has the repeated id; in ver 1.7 and above, this param is ignored
allow_info_share: bool
in ver 1.7 and above, this param is ignored
info_owner: str
in ver 1.7 and above, this param is ignored
with_sample_id: bool
data with sample id or not, default False; in ver 1.7 and above, this param is ignored
"""
def __init__(self, intersect_method: str = consts.RSA, random_bit=DEFAULT_RANDOM_BIT, sync_intersect_ids=True,
join_role=consts.GUEST, only_output_key: bool = False,
with_encode=False, encode_params=EncodeParam(),
raw_params=RAWParam(), rsa_params=RSAParam(), dh_params=DHParam(),
join_method=consts.INNER_JOIN, new_sample_id: bool = False, sample_id_generator=consts.GUEST,
intersect_cache_param=IntersectCache(), run_cache: bool = False,
cardinality_only: bool = False, sync_cardinality: bool = False,
run_preprocess: bool = False,
intersect_preprocess_params=IntersectPreProcessParam(),
repeated_id_process=False, repeated_id_owner=consts.GUEST,
with_sample_id=False, allow_info_share: bool = False, info_owner=consts.GUEST):
super().__init__()
self.intersect_method = intersect_method
self.random_bit = random_bit
self.sync_intersect_ids = sync_intersect_ids
self.join_role = join_role
self.with_encode = with_encode
self.encode_params = copy.deepcopy(encode_params)
self.raw_params = copy.deepcopy(raw_params)
self.rsa_params = copy.deepcopy(rsa_params)
self.only_output_key = only_output_key
self.sample_id_generator = sample_id_generator
self.intersect_cache_param = copy.deepcopy(intersect_cache_param)
self.run_cache = run_cache
self.repeated_id_process = repeated_id_process
self.repeated_id_owner = repeated_id_owner
self.allow_info_share = allow_info_share
self.info_owner = info_owner
self.with_sample_id = with_sample_id
self.join_method = join_method
self.new_sample_id = new_sample_id
self.dh_params = copy.deepcopy(dh_params)
self.cardinality_only = cardinality_only
self.sync_cardinality = sync_cardinality
self.run_preprocess = run_preprocess
self.intersect_preprocess_params = copy.deepcopy(intersect_preprocess_params)
def check(self):
descr = "intersect param's "
self.intersect_method = self.check_and_change_lower(self.intersect_method,
[consts.RSA, consts.RAW, consts.DH],
f"{descr}intersect_method")
if self._warn_to_deprecate_param("random_bit", descr, "rsa_params' 'random_bit'"):
if "rsa_params.random_bit" in self.get_user_feeded():
raise ValueError(f"random_bit and rsa_params.random_bit should not be set simultaneously")
self.rsa_params.random_bit = self.random_bit
self.check_boolean(self.sync_intersect_ids, f"{descr}intersect_ids")
if self._warn_to_deprecate_param("encode_param", "", ""):
if "raw_params" in self.get_user_feeded():
raise ValueError(f"encode_param and raw_params should not be set simultaneously")
else:
self.callback_param.callbacks = ["PerformanceEvaluate"]
if self._warn_to_deprecate_param("join_role", descr, "raw_params' 'join_role'"):
if "raw_params.join_role" in self.get_user_feeded():
raise ValueError(f"join_role and raw_params.join_role should not be set simultaneously")
self.raw_params.join_role = self.join_role
self.check_boolean(self.only_output_key, f"{descr}only_output_key")
self.join_method = self.check_and_change_lower(self.join_method, [consts.INNER_JOIN, consts.LEFT_JOIN],
f"{descr}join_method")
self.check_boolean(self.new_sample_id, f"{descr}new_sample_id")
self.sample_id_generator = self.check_and_change_lower(self.sample_id_generator,
[consts.GUEST, consts.HOST],
f"{descr}sample_id_generator")
if self.join_method == consts.LEFT_JOIN:
if not self.sync_intersect_ids:
raise ValueError(f"Cannot perform left join without sync intersect ids")
self.check_boolean(self.run_cache, f"{descr} run_cache")
if self._warn_to_deprecate_param("encode_params", descr, "raw_params") or \
self._warn_to_deprecate_param("with_encode", descr, "raw_params' 'use_hash'"):
# self.encode_params.check()
if "with_encode" in self.get_user_feeded() and "raw_params.use_hash" in self.get_user_feeded():
raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
if "raw_params" in self.get_user_feeded() and "encode_params" in self.get_user_feeded():
raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
LOGGER.warning(f"Param values from 'encode_params' will override 'raw_params' settings.")
self.raw_params.use_hash = self.with_encode
self.raw_params.hash_method = self.encode_params.encode_method
self.raw_params.salt = self.encode_params.salt
self.raw_params.base64 = self.encode_params.base64
self.raw_params.check()
self.rsa_params.check()
self.dh_params.check()
# self.intersect_cache_param.check()
self.check_boolean(self.cardinality_only, f"{descr}cardinality_only")
self.check_boolean(self.sync_cardinality, f"{descr}sync_cardinality")
self.check_boolean(self.run_preprocess, f"{descr}run_preprocess")
self.intersect_preprocess_params.check()
if self.cardinality_only:
if self.intersect_method not in [consts.RSA]:
raise ValueError(f"cardinality-only mode only support rsa.")
if self.intersect_method == consts.RSA and self.rsa_params.split_calculation:
raise ValueError(f"cardinality-only mode only supports unified calculation.")
if self.run_preprocess:
if self.intersect_preprocess_params.false_positive_rate < 0.01:
raise ValueError(f"for preprocessing ids, false_positive_rate must be no less than 0.01")
if self.cardinality_only:
raise ValueError(f"cardinality_only mode cannot run preprocessing.")
if self.run_cache:
if self.intersect_method not in [consts.RSA, consts.DH]:
raise ValueError(f"Only rsa or dh method supports cache.")
if self.intersect_method == consts.RSA and self.rsa_params.split_calculation:
raise ValueError(f"RSA split_calculation does not support cache.")
if self.cardinality_only:
raise ValueError(f"cache is not available for cardinality_only mode.")
if self.run_preprocess:
raise ValueError(f"Preprocessing does not support cache.")
deprecated_param_list = ["repeated_id_process", "repeated_id_owner", "intersect_cache_param",
"allow_info_share", "info_owner", "with_sample_id"]
for param in deprecated_param_list:
self._warn_deprecated_param(param, descr)
LOGGER.debug("Finish intersect parameter check!")
return True
__init__(self, intersect_method='rsa', random_bit=128, sync_intersect_ids=True, join_role='guest', only_output_key=False, with_encode=False, encode_params=EncodeParam(), raw_params=RAWParam(), rsa_params=RSAParam(), dh_params=DHParam(), join_method='inner_join', new_sample_id=False, sample_id_generator='guest', intersect_cache_param=IntersectCache(), run_cache=False, cardinality_only=False, sync_cardinality=False, run_preprocess=False, intersect_preprocess_params=IntersectPreProcessParam(), repeated_id_process=False, repeated_id_owner='guest', with_sample_id=False, allow_info_share=False, info_owner='guest') special¶
Source code in federatedml/param/intersect_param.py
def __init__(self, intersect_method: str = consts.RSA, random_bit=DEFAULT_RANDOM_BIT, sync_intersect_ids=True,
join_role=consts.GUEST, only_output_key: bool = False,
with_encode=False, encode_params=EncodeParam(),
raw_params=RAWParam(), rsa_params=RSAParam(), dh_params=DHParam(),
join_method=consts.INNER_JOIN, new_sample_id: bool = False, sample_id_generator=consts.GUEST,
intersect_cache_param=IntersectCache(), run_cache: bool = False,
cardinality_only: bool = False, sync_cardinality: bool = False,
run_preprocess: bool = False,
intersect_preprocess_params=IntersectPreProcessParam(),
repeated_id_process=False, repeated_id_owner=consts.GUEST,
with_sample_id=False, allow_info_share: bool = False, info_owner=consts.GUEST):
super().__init__()
self.intersect_method = intersect_method
self.random_bit = random_bit
self.sync_intersect_ids = sync_intersect_ids
self.join_role = join_role
self.with_encode = with_encode
self.encode_params = copy.deepcopy(encode_params)
self.raw_params = copy.deepcopy(raw_params)
self.rsa_params = copy.deepcopy(rsa_params)
self.only_output_key = only_output_key
self.sample_id_generator = sample_id_generator
self.intersect_cache_param = copy.deepcopy(intersect_cache_param)
self.run_cache = run_cache
self.repeated_id_process = repeated_id_process
self.repeated_id_owner = repeated_id_owner
self.allow_info_share = allow_info_share
self.info_owner = info_owner
self.with_sample_id = with_sample_id
self.join_method = join_method
self.new_sample_id = new_sample_id
self.dh_params = copy.deepcopy(dh_params)
self.cardinality_only = cardinality_only
self.sync_cardinality = sync_cardinality
self.run_preprocess = run_preprocess
self.intersect_preprocess_params = copy.deepcopy(intersect_preprocess_params)
check(self)¶
Source code in federatedml/param/intersect_param.py
def check(self):
descr = "intersect param's "
self.intersect_method = self.check_and_change_lower(self.intersect_method,
[consts.RSA, consts.RAW, consts.DH],
f"{descr}intersect_method")
if self._warn_to_deprecate_param("random_bit", descr, "rsa_params' 'random_bit'"):
if "rsa_params.random_bit" in self.get_user_feeded():
raise ValueError(f"random_bit and rsa_params.random_bit should not be set simultaneously")
self.rsa_params.random_bit = self.random_bit
self.check_boolean(self.sync_intersect_ids, f"{descr}intersect_ids")
if self._warn_to_deprecate_param("encode_param", "", ""):
if "raw_params" in self.get_user_feeded():
raise ValueError(f"encode_param and raw_params should not be set simultaneously")
else:
self.callback_param.callbacks = ["PerformanceEvaluate"]
if self._warn_to_deprecate_param("join_role", descr, "raw_params' 'join_role'"):
if "raw_params.join_role" in self.get_user_feeded():
raise ValueError(f"join_role and raw_params.join_role should not be set simultaneously")
self.raw_params.join_role = self.join_role
self.check_boolean(self.only_output_key, f"{descr}only_output_key")
self.join_method = self.check_and_change_lower(self.join_method, [consts.INNER_JOIN, consts.LEFT_JOIN],
f"{descr}join_method")
self.check_boolean(self.new_sample_id, f"{descr}new_sample_id")
self.sample_id_generator = self.check_and_change_lower(self.sample_id_generator,
[consts.GUEST, consts.HOST],
f"{descr}sample_id_generator")
if self.join_method == consts.LEFT_JOIN:
if not self.sync_intersect_ids:
raise ValueError(f"Cannot perform left join without sync intersect ids")
self.check_boolean(self.run_cache, f"{descr} run_cache")
if self._warn_to_deprecate_param("encode_params", descr, "raw_params") or \
self._warn_to_deprecate_param("with_encode", descr, "raw_params' 'use_hash'"):
# self.encode_params.check()
if "with_encode" in self.get_user_feeded() and "raw_params.use_hash" in self.get_user_feeded():
raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
if "raw_params" in self.get_user_feeded() and "encode_params" in self.get_user_feeded():
raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
LOGGER.warning(f"Param values from 'encode_params' will override 'raw_params' settings.")
self.raw_params.use_hash = self.with_encode
self.raw_params.hash_method = self.encode_params.encode_method
self.raw_params.salt = self.encode_params.salt
self.raw_params.base64 = self.encode_params.base64
self.raw_params.check()
self.rsa_params.check()
self.dh_params.check()
# self.intersect_cache_param.check()
self.check_boolean(self.cardinality_only, f"{descr}cardinality_only")
self.check_boolean(self.sync_cardinality, f"{descr}sync_cardinality")
self.check_boolean(self.run_preprocess, f"{descr}run_preprocess")
self.intersect_preprocess_params.check()
if self.cardinality_only:
if self.intersect_method not in [consts.RSA]:
raise ValueError(f"cardinality-only mode only support rsa.")
if self.intersect_method == consts.RSA and self.rsa_params.split_calculation:
raise ValueError(f"cardinality-only mode only supports unified calculation.")
if self.run_preprocess:
if self.intersect_preprocess_params.false_positive_rate < 0.01:
raise ValueError(f"for preprocessing ids, false_positive_rate must be no less than 0.01")
if self.cardinality_only:
raise ValueError(f"cardinality_only mode cannot run preprocessing.")
if self.run_cache:
if self.intersect_method not in [consts.RSA, consts.DH]:
raise ValueError(f"Only rsa or dh method supports cache.")
if self.intersect_method == consts.RSA and self.rsa_params.split_calculation:
raise ValueError(f"RSA split_calculation does not support cache.")
if self.cardinality_only:
raise ValueError(f"cache is not available for cardinality_only mode.")
if self.run_preprocess:
raise ValueError(f"Preprocessing does not support cache.")
deprecated_param_list = ["repeated_id_process", "repeated_id_owner", "intersect_cache_param",
"allow_info_share", "info_owner", "with_sample_id"]
for param in deprecated_param_list:
self._warn_deprecated_param(param, descr)
LOGGER.debug("Finish intersect parameter check!")
return True
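The check() method above copies the deprecated top-level parameters (random_bit, join_role, with_encode, encode_params) into rsa_params and raw_params. The following sketch shows one way an existing plain-dict config could be rewritten into the nested form ahead of time. `fold_deprecated` is a hypothetical helper, not a FATE function; unlike check(), it removes the deprecated keys from the result, and its conflict detection is simplified.

```python
import copy


def fold_deprecated(cfg: dict) -> dict:
    """Move deprecated top-level intersect params into their nested replacements.

    Hypothetical helper for illustration; works on plain dicts only.
    """
    out = copy.deepcopy(cfg)

    if "random_bit" in out:
        rsa = out.setdefault("rsa_params", {})
        if "random_bit" in rsa:
            raise ValueError("random_bit and rsa_params.random_bit should not be set simultaneously")
        rsa["random_bit"] = out.pop("random_bit")

    if "with_encode" in out or "encode_params" in out:
        # Simplification: refuse any user-supplied raw_params alongside encode settings.
        if cfg.get("raw_params"):
            raise ValueError("'raw_params' and 'encode_params' should not be set simultaneously.")
        raw = out.setdefault("raw_params", {})
        enc = out.pop("encode_params", {})
        raw["use_hash"] = out.pop("with_encode", False)
        # encode_method is renamed to hash_method; salt and base64 keep their names.
        for old, new in (("encode_method", "hash_method"), ("salt", "salt"), ("base64", "base64")):
            if old in enc:
                raw[new] = enc[old]

    if "join_role" in out:
        raw = out.setdefault("raw_params", {})
        if "join_role" in raw:
            raise ValueError("join_role and raw_params.join_role should not be set simultaneously")
        raw["join_role"] = out.pop("join_role")

    return out


legacy = {
    "intersect_method": "raw",
    "join_role": "host",
    "with_encode": True,
    "encode_params": {"encode_method": "sha256", "salt": "abc"},
}
migrated = fold_deprecated(legacy)
```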
label_transform_param¶
Classes¶
LabelTransformParam (BaseParam)¶
Define the label transform param used in label transform.
Parameters¶
label_encoder : None or dict, default: None
Specify (label, encoded label) key-value pairs for transforming labels to new values, e.g. {"Yes": 1, "No": 0}
label_list : None or list, default: None
List all input labels, used for matching the types of the original keys in the label_encoder dict; its length should match the key count in label_encoder, e.g. ["Yes", "No"]
need_run : bool, default: True
Specify whether to run label transform
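One plausible reading of label_list is that it restores key types that a JSON config cannot carry, since JSON object keys are always strings. The sketch below illustrates that idea on plain Python objects; `align_encoder_keys` is a hypothetical helper and is not how FATE is confirmed to implement the matching.

```python
def align_encoder_keys(label_encoder: dict, label_list: list) -> dict:
    """Rebuild label_encoder so its keys have the types given in label_list.

    Hypothetical helper for illustration; not part of FATE.
    """
    if len(label_list) != len(label_encoder):
        raise ValueError("label_list length should match label_encoder key count")
    typed = {}
    for label in label_list:
        if label in label_encoder:
            # Key already has the right type, e.g. "Yes".
            typed[label] = label_encoder[label]
        elif str(label) in label_encoder:
            # Key was stringified, e.g. 1 became "1" in a JSON config.
            typed[label] = label_encoder[str(label)]
        else:
            raise ValueError(f"label {label!r} has no entry in label_encoder")
    return typed


# Integer labels whose encoder keys arrived as strings.
encoder = align_encoder_keys({"1": 0, "2": 1}, [1, 2])
encoded = [encoder[y] for y in [1, 2, 2, 1]]
```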
Source code in federatedml/param/label_transform_param.py
class LabelTransformParam(BaseParam):
"""
Define label transform param that used in label transform.
Parameters
----------
label_encoder : None or dict, default : None
Specify (label, encoded label) key-value pairs for transforming labels to new values.
e.g. {"Yes": 1, "No": 0}
label_list : None or list, default : None
List all input labels, used for matching types of original keys in label_encoder dict,
length should match key count in label_encoder
e.g. ["Yes", "No"]
need_run: bool, default: True
Specify whether to run label transform
"""
def __init__(self, label_encoder=None, label_list=None, need_run=True):
super(LabelTransformParam, self).__init__()
self.label_encoder = label_encoder
self.label_list = label_list
self.need_run = need_run
def check(self):
model_param_descr = "label transform param's "
BaseParam.check_boolean(self.need_run, f"{model_param_descr} need run ")
if self.label_encoder is not None:
if not isinstance(self.label_encoder, dict):
raise ValueError(f"{model_param_descr} label_encoder should be dict type")
if self.label_list is not None:
if not isinstance(self.label_list, list):
raise ValueError(f"{model_param_descr} label_list should be list type")
if self.label_encoder and len(self.label_list) != len(self.label_encoder.keys()):
raise ValueError(f"label_list length should match label_encoder key count")
LOGGER.debug("Finish label transformer parameter check!")
return True
__init__(self, label_encoder=None, label_list=None, need_run=True) special¶
Source code in federatedml/param/label_transform_param.py
def __init__(self, label_encoder=None, label_list=None, need_run=True):
super(LabelTransformParam, self).__init__()
self.label_encoder = label_encoder
self.label_list = label_list
self.need_run = need_run
check(self)¶
Source code in federatedml/param/label_transform_param.py
def check(self):
model_param_descr = "label transform param's "
BaseParam.check_boolean(self.need_run, f"{model_param_descr} need run ")
if self.label_encoder is not None:
if not isinstance(self.label_encoder, dict):
raise ValueError(f"{model_param_descr} label_encoder should be dict type")
if self.label_list is not None:
if not isinstance(self.label_list, list):
raise ValueError(f"{model_param_descr} label_list should be list type")
if self.label_encoder and len(self.label_list) != len(self.label_encoder.keys()):
raise ValueError(f"label_list length should match label_encoder key count")
LOGGER.debug("Finish label transformer parameter check!")
return True
linear_regression_param¶
Classes¶
LinearParam (LinearModelParam)¶
Parameters used for Linear Regression.
Parameters¶
penalty : {'L2', 'L1'}
Penalty method used in LinR. Note that 'L1' is not supported when using the encrypted version in HeteroLinR.
tol : float, default: 1e-4
The tolerance of convergence
alpha : float, default: 1.0
Regularization strength coefficient.
optimizer : {'sgd', 'rmsprop', 'adam', 'sqn', 'adagrad'}
Optimization method
batch_size : int, default: -1
Batch size when updating the model. -1 means all data are used in a single batch, i.e. no mini-batch strategy.
learning_rate : float, default: 0.01
Learning rate
max_iter : int, default: 20
The maximum number of training iterations.
init_param : InitParam object, default: default InitParam object
Init param method object.
early_stop : {'diff', 'abs', 'weight_diff'}
Method used to judge convergence. a) diff: use the difference of loss between two iterations to judge convergence. b) abs: use the absolute value of loss to judge convergence, i.e. if loss < tol, training has converged. c) weight_diff: use the difference between the weights of two consecutive iterations.
encrypt_param : EncryptParam object, default: default EncryptParam object
encrypt param
encrypted_mode_calculator_param : EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object
encrypted mode calculator param
cv_param : CrossValidationParam object, default: default CrossValidationParam object
cv param
decay : int or float, default: 1
Decay rate for the learning rate. The learning rate follows the schedule lr = lr0/(1+decay*t) if decay_sqrt is False, and lr = lr0 / sqrt(1+decay*t) if decay_sqrt is True, where t is the iteration number.
decay_sqrt : bool, default: True
lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise lr = lr0 / sqrt(1+decay*t)
validation_freqs : int, list, tuple, set, or None
Validation frequency during training; required when using early stopping. The default value is None; 1 is suggested. A number larger than 1 speeds up training by skipping validation rounds. In that case a divisor of "max_iter" is recommended; otherwise the validation scores of the last training iteration will be missed.
early_stopping_rounds : int, default: None
If a positive number is specified, the program checks the early stopping criteria every that many training rounds. validation_freqs must also be set when using early stopping.
metrics : list or None, default: None
Specify which metrics to use when performing evaluation during training. If the metrics have not improved within early_stopping_rounds, training stops before convergence. If left empty, default metrics are used. For regression tasks, the default metrics are ['root_mean_squared_error', 'mean_absolute_error']
use_first_metric_only : bool, default: False
Indicate whether to use the first metric in metrics as the only criterion for the early stopping judgement.
floating_point_precision : None or integer
If not None, use floating_point_precision bits to speed up calculation, e.g. convert x to round(x * 2**floating_point_precision) during Paillier operations, then divide the result by 2**floating_point_precision at the end.
callback_param : CallbackParam object
callback param
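Two of the entries above are stated as formulas: the learning-rate schedule governed by decay and decay_sqrt, and the fixed-point conversion governed by floating_point_precision. The sketch below writes both out directly from those descriptions. The function names are hypothetical and the code is independent of FATE; in particular, it uses plain integers where FATE would operate on Paillier ciphertexts.

```python
import math


def decayed_lr(lr0: float, decay: float, t: int, decay_sqrt: bool = True) -> float:
    """Learning rate at iteration t under the documented decay schedule."""
    denom = 1 + decay * t
    return lr0 / math.sqrt(denom) if decay_sqrt else lr0 / denom


def to_fixed_point(x: float, precision: int) -> int:
    """Encode x as the integer round(x * 2**precision)."""
    return round(x * 2 ** precision)


def from_fixed_point(n: int, precision: int) -> float:
    """Decode by dividing the integer result by 2**precision."""
    return n / 2 ** precision


# With decay=1, sqrt decay falls more slowly than plain inverse decay.
sqrt_schedule = [decayed_lr(0.01, 1, t) for t in range(4)]
plain_schedule = [decayed_lr(0.01, 1, t, decay_sqrt=False) for t in range(4)]

# Integer addition stands in for the homomorphic addition performed on encoded values.
precision = 23  # illustrative value
total = from_fixed_point(to_fixed_point(0.1, precision) + to_fixed_point(0.2, precision), precision)
```

Each encoded value carries a rounding error of at most 2**-(precision + 1), so the decoded sum of two values is within 2**-precision of the exact result.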
Source code in federatedml/param/linear_regression_param.py
class LinearParam(LinearModelParam):
    """
    Parameters used for Linear Regression.

    Parameters
    ----------
    penalty : {'L2' or 'L1'}
        Penalty method used in LinR. Please note that, when using encrypted version in HeteroLinR,
        'L1' is not supported.
    tol : float, default: 1e-4
        The tolerance of convergence
    alpha : float, default: 1.0
        Regularization strength coefficient.
    optimizer : {'sgd', 'rmsprop', 'adam', 'sqn', 'adagrad'}
        Optimization method
    batch_size : int, default: -1
        Batch size when updating model. -1 means use all data in a single batch, i.e. no mini-batch strategy.
    learning_rate : float, default: 0.01
        Learning rate
    max_iter : int, default: 20
        The maximum iteration for training.
    init_param : InitParam object, default: default InitParam object
        Init param method object.
    early_stop : {'diff', 'abs', 'weight_diff'}
        Method used to judge convergence.
            a) diff: Use the difference of loss between two iterations to judge convergence.
            b) abs: Use the absolute value of loss to judge convergence, i.e. if loss < tol, it is converged.
            c) weight_diff: Use the difference between weights of two consecutive iterations
    encrypt_param : EncryptParam object, default: default EncryptParam object
        encrypt param
    encrypted_mode_calculator_param : EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object
        encrypted mode calculator param
    cv_param : CrossValidationParam object, default: default CrossValidationParam object
        cv param
    decay : int or float, default: 1
        Decay rate for learning rate. The learning rate follows this decay schedule:
        lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t)
        where t is the iteration number.
    decay_sqrt : bool, default: True
        lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)
    validation_freqs : int, list, tuple, set, or None
        Validation frequency during training, required when using early stopping.
        The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds.
        When it is larger than 1, a divisor of max_iter is recommended, otherwise, you will miss the validation scores of the last training iteration.
    early_stopping_rounds : int, default: None
        If a positive number is specified, the program checks the early stopping criteria every that many training rounds.
        validation_freqs must also be set when using early stopping.
    metrics : list or None, default: None
        Specify which metrics to use when performing evaluation during training. If metrics have not improved within early_stopping_rounds rounds, training stops before convergence.
        If set as empty, default metrics will be used. For regression tasks, default metrics are ['root_mean_squared_error', 'mean_absolute_error']
    use_first_metric_only : bool, default: False
        Indicate whether to use the first metric in `metrics` as the only criterion for early stopping judgement.
    floating_point_precision : None or integer
        if not None, use floating_point_precision-bit to speed up calculation,
        e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
        the result by 2**floating_point_precision in the end.
    callback_param : CallbackParam object
        callback param
    """

    def __init__(self, penalty='L2',
                 tol=1e-4, alpha=1.0, optimizer='sgd',
                 batch_size=-1, learning_rate=0.01, init_param=InitParam(),
                 max_iter=20, early_stop='diff',
                 encrypt_param=EncryptParam(), sqn_param=StochasticQuasiNewtonParam(),
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
                 cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, validation_freqs=None,
                 early_stopping_rounds=None, stepwise_param=StepwiseParam(), metrics=None, use_first_metric_only=False,
                 floating_point_precision=23, callback_param=CallbackParam()):
        super(LinearParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
                                          batch_size=batch_size, learning_rate=learning_rate,
                                          init_param=init_param, max_iter=max_iter, early_stop=early_stop,
                                          encrypt_param=encrypt_param, cv_param=cv_param, decay=decay,
                                          decay_sqrt=decay_sqrt, validation_freqs=validation_freqs,
                                          early_stopping_rounds=early_stopping_rounds,
                                          stepwise_param=stepwise_param, metrics=metrics,
                                          use_first_metric_only=use_first_metric_only,
                                          floating_point_precision=floating_point_precision,
                                          callback_param=callback_param)
        self.sqn_param = copy.deepcopy(sqn_param)
        self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)

    def check(self):
        descr = "linear_regression_param's "
        super(LinearParam, self).check()
        if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'sqn']:
            raise ValueError(
                descr + "optimizer not supported, optimizer should be"
                " 'sgd', 'rmsprop', 'adam', 'sqn' or 'adagrad'")
        self.sqn_param.check()
        if self.encrypt_param.method != consts.PAILLIER:
            raise ValueError(
                descr + "encrypt method supports 'Paillier' only")
        return True
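The interaction between validation_freqs, max_iter and early_stopping_rounds described in the docstring can be illustrated with a small sketch. This is not FATE code: both helper names are hypothetical, iterations are assumed to be numbered from 1, validation is assumed to run when the iteration number is a multiple of an integer validation_freqs, and a higher score is assumed to be better.

```python
from typing import List, Sequence


def validated_iterations(max_iter: int, validation_freqs: int) -> List[int]:
    """Iterations (assumed 1-based) at which validation would run."""
    return [it for it in range(1, max_iter + 1) if it % validation_freqs == 0]


def should_stop_early(scores: Sequence[float], early_stopping_rounds: int) -> bool:
    """True once the best score is at least `early_stopping_rounds` validations old.

    Assumes a higher score is better (e.g. auc); negate the scores for error metrics.
    """
    if not scores:
        return False
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return len(scores) - 1 - best_index >= early_stopping_rounds


print(validated_iterations(20, 5))  # [5, 10, 15, 20]
print(validated_iterations(20, 6))  # [6, 12, 18]; iteration 20 is never validated
```

The second call shows why a divisor of max_iter is recommended: with max_iter=20 and a frequency of 6, the final iteration is skipped.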
__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='sgd', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=20, early_stop='diff', encrypt_param=EncryptParam(), sqn_param=StochasticQuasiNewtonParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, stepwise_param=StepwiseParam(), metrics=None, use_first_metric_only=False, floating_point_precision=23, callback_param=CallbackParam())
check(self)
local_baseline_param¶
Classes¶
LocalBaselineParam (BaseParam)¶
Defines the local baseline model parameters.
Parameters¶
model_name : str, default: "LogisticRegression" Name of the sklearn model trained as the local baseline. Currently only "LogisticRegression" is accepted.
model_opts : dict or None, default: None Parameters passed to the baseline model.
predict_param : PredictParam object, default: default PredictParam object Prediction parameters.
need_run : bool, default: True Indicates whether this module needs to be run.
Source code in federatedml/param/local_baseline_param.py
class LocalBaselineParam(BaseParam):
    """
    Define the local baseline model param

    Parameters
    ----------
    model_name : str
        sklearn model used to train on baseline model
    model_opts : dict or None, default None
        Param to be used as input into baseline model
    predict_param : PredictParam object, default: default PredictParam object
        predict param
    need_run : bool, default True
        Indicate if this module needs to be run
    """

    def __init__(self, model_name="LogisticRegression", model_opts=None, predict_param=PredictParam(), need_run=True):
        super(LocalBaselineParam, self).__init__()
        self.model_name = model_name
        self.model_opts = model_opts
        self.predict_param = copy.deepcopy(predict_param)
        self.need_run = need_run

    def check(self):
        descr = "local baseline param"
        self.model_name = self.check_and_change_lower(self.model_name,
                                                      ["logisticregression"],
                                                      descr)
        self.check_boolean(self.need_run, descr)
        if self.model_opts is not None:
            if not isinstance(self.model_opts, dict):
                raise ValueError(descr + " model_opts must be None or dict.")
        if self.model_opts is None:
            self.model_opts = {}
        self.predict_param.check()
        return True
__init__(self, model_name='LogisticRegression', model_opts=None, predict_param=PredictParam(), need_run=True)
check(self)
logistic_regression_param¶
Classes¶
LogisticParam (LinearModelParam)¶
Parameters used for Logistic Regression in both Homo and Hetero modes.
Parameters¶
penalty : {'L2', 'L1' or None} Penalty method used in LR. Note that 'L1' is not supported when using the encrypted version of HomoLR.
tol : float, default: 1e-4 The tolerance of convergence.
alpha : float, default: 1.0 Regularization strength coefficient.
optimizer : {'rmsprop', 'sgd', 'adam', 'nesterov_momentum_sgd', 'adagrad'}, default: 'rmsprop' Optimization method.
batch_strategy : str, {'full', 'random'}, default: "full" Strategy to generate batch data. a) full: use the full data to generate batch data; the number of batches per iteration is ceil(data_size / batch_size). b) random: select data randomly from the full data; the number of batches per iteration is 1.
batch_size : int, default: -1 Batch size when updating the model. -1 means use all data in a single batch, i.e. no mini-batch strategy.
shuffle : bool, default: True Works only in hetero logistic regression; batch data is shuffled in every iteration.
masked_rate : int or float, default: 5 Use masked data to enhance the security of hetero logistic regression.
learning_rate : float, default: 0.01 Learning rate.
max_iter : int, default: 100 The maximum number of training iterations.
early_stop : {'diff', 'weight_diff', 'abs'}, default: 'diff' Method used to judge convergence. a) diff: use the difference of loss between two iterations. b) weight_diff: use the difference between weights of two consecutive iterations. c) abs: use the absolute value of loss, i.e. if loss < tol, it is converged. Note that for the hetero-lr multi-host situation, this parameter supports "weight_diff" only.
decay : int or float, default: 1 Decay rate for the learning rate. The learning rate follows the schedule lr = lr0/(1+decay*t) if decay_sqrt is False, and lr = lr0/sqrt(1+decay*t) if decay_sqrt is True, where t is the iteration number.
decay_sqrt : bool, default: True Selects the decay schedule: lr = lr0/(1+decay*t) if False, otherwise lr = lr0/sqrt(1+decay*t).
encrypt_param : EncryptParam object, default: default EncryptParam object Encryption parameters.
predict_param : PredictParam object, default: default PredictParam object Prediction parameters.
callback_param : CallbackParam object Callback parameters.
cv_param : CrossValidationParam object, default: default CrossValidationParam object Cross-validation parameters.
multi_class : {'ovr'}, default: 'ovr' Strategy used for multi-class tasks. Currently only 'ovr' (short for one_vs_rest) is supported.
validation_freqs : int, list, tuple, set, or None, default: None Validation frequency during training.
early_stopping_rounds : int, default: None Training stops if a metric does not improve in the last early_stopping_rounds rounds.
metrics : list or None, default: None Metrics used when evaluating during training. If left empty, the default metrics for the specific task type are used. For binary classification, the default metrics are ['auc', 'ks'].
use_first_metric_only : bool, default: False Whether to use only the first metric for the early stopping decision.
floating_point_precision : None or integer, default: 23 If not None, use floating_point_precision bits to speed up calculation, e.g. convert x to round(x * 2**floating_point_precision) during Paillier operations and divide the result by 2**floating_point_precision at the end.
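The fixed-point conversion behind floating_point_precision can be worked through with plain integers, without any encryption involved. The sketch below only illustrates the arithmetic in the description; the helper names encode_fixed_point and decode_fixed_point are hypothetical and are not FATE functions.

```python
def encode_fixed_point(x: float, precision: int = 23) -> int:
    """Map a float to an integer by scaling with 2**precision and rounding."""
    return round(x * 2 ** precision)


def decode_fixed_point(n: int, precision: int = 23) -> float:
    """Undo the scaling applied by encode_fixed_point."""
    return n / 2 ** precision


# Integers can be summed exactly (as an additively homomorphic scheme would do
# on ciphertexts); a single division at the end recovers the floating-point sum.
a = encode_fixed_point(0.5)
b = encode_fixed_point(0.25)
total = decode_fixed_point(a + b)
print(a, b, total)
```

Here 0.5 and 0.25 become 4194304 and 2097152 (multiples of 2**23 = 8388608), and their integer sum decodes to 0.75. For values that are not exactly representable, the rounding error per encoded value is at most 2**-(precision + 1).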
Source code in federatedml/param/logistic_regression_param.py
class LogisticParam(LinearModelParam):
    """
    Parameters used for Logistic Regression both for Homo mode or Hetero mode.

    Parameters
    ----------
    penalty : {'L2', 'L1' or None}
        Penalty method used in LR. Please note that, when using encrypted version in HomoLR,
        'L1' is not supported.
    tol : float, default: 1e-4
        The tolerance of convergence
    alpha : float, default: 1.0
        Regularization strength coefficient.
    optimizer : {'rmsprop', 'sgd', 'adam', 'nesterov_momentum_sgd', 'adagrad'}, default: 'rmsprop'
        Optimization method.
    batch_strategy : str, {'full', 'random'}, default: "full"
        Strategy to generate batch data.
            a) full: use full data to generate batch_data, batch_nums every iteration is ceil(data_size / batch_size)
            b) random: select data randomly from full data, batch_num will be 1 every iteration.
    batch_size : int, default: -1
        Batch size when updating model. -1 means use all data in a single batch, i.e. no mini-batch strategy.
    shuffle : bool, default: True
        Works only in hetero logistic regression; batch data is shuffled in every iteration.
    masked_rate : int or float, default: 5
        Use masked data to enhance security of hetero logistic regression
    learning_rate : float, default: 0.01
        Learning rate
    max_iter : int, default: 100
        The maximum iteration for training.
    early_stop : {'diff', 'weight_diff', 'abs'}, default: 'diff'
        Method used to judge convergence.
            a) diff: Use the difference of loss between two iterations to judge convergence.
            b) weight_diff: Use the difference between weights of two consecutive iterations
            c) abs: Use the absolute value of loss to judge convergence, i.e. if loss < tol, it is converged.
        Please note that for hetero-lr multi-host situation, this parameter supports "weight_diff" only.
    decay : int or float, default: 1
        Decay rate for learning rate. The learning rate follows this decay schedule:
        lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t)
        where t is the iteration number.
    decay_sqrt : bool, default: True
        lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)
    encrypt_param : EncryptParam object, default: default EncryptParam object
        encrypt param
    predict_param : PredictParam object, default: default PredictParam object
        predict param
    callback_param : CallbackParam object
        callback param
    cv_param : CrossValidationParam object, default: default CrossValidationParam object
        cv param
    multi_class : {'ovr'}, default: 'ovr'
        If it is a multi_class task, indicate what strategy to use. Currently, support 'ovr' short for one_vs_rest only.
    validation_freqs : int or list or tuple or set, or None, default None
        validation frequency during training.
    early_stopping_rounds : int, default: None
        Will stop training if one metric doesn't improve in the last early_stopping_rounds rounds
    metrics : list or None, default: None
        Indicate which metrics will be used when executing evaluation during training. If set as empty,
        default metrics for specific task type will be used. As for binary classification, default metrics are
        ['auc', 'ks']
    use_first_metric_only : bool, default: False
        Indicate whether to use the first metric only for early stopping judgement.
    floating_point_precision : None or integer
        if not None, use floating_point_precision-bit to speed up calculation,
        e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
        the result by 2**floating_point_precision in the end.
    """

    def __init__(self, penalty='L2',
                 tol=1e-4, alpha=1.0, optimizer='rmsprop',
                 batch_size=-1, shuffle=True, batch_strategy="full", masked_rate=5,
                 learning_rate=0.01, init_param=InitParam(),
                 max_iter=100, early_stop='diff', encrypt_param=EncryptParam(),
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 decay=1, decay_sqrt=True,
                 multi_class='ovr', validation_freqs=None, early_stopping_rounds=None,
                 stepwise_param=StepwiseParam(), floating_point_precision=23,
                 metrics=None,
                 use_first_metric_only=False,
                 callback_param=CallbackParam()
                 ):
        super(LogisticParam, self).__init__()
        self.penalty = penalty
        self.tol = tol
        self.alpha = alpha
        self.optimizer = optimizer
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.init_param = copy.deepcopy(init_param)
        self.max_iter = max_iter
        self.early_stop = early_stop
        self.encrypt_param = encrypt_param
        self.shuffle = shuffle
        self.batch_strategy = batch_strategy
        self.masked_rate = masked_rate
        self.predict_param = copy.deepcopy(predict_param)
        self.cv_param = copy.deepcopy(cv_param)
        self.decay = decay
        self.decay_sqrt = decay_sqrt
        self.multi_class = multi_class
        self.validation_freqs = validation_freqs
        self.stepwise_param = copy.deepcopy(stepwise_param)
        self.early_stopping_rounds = early_stopping_rounds
        self.metrics = metrics or []
        self.use_first_metric_only = use_first_metric_only
        self.floating_point_precision = floating_point_precision
        self.callback_param = copy.deepcopy(callback_param)

    def check(self):
        descr = "logistic_param's"
        super(LogisticParam, self).check()
        self.predict_param.check()
        if self.encrypt_param.method not in [consts.PAILLIER, None]:
            raise ValueError(
                "logistic_param's encrypted method support 'Paillier' or None only")
        self.multi_class = self.check_and_change_lower(self.multi_class, ["ovr"], f"{descr}")
        if not isinstance(self.masked_rate, (float, int)) or self.masked_rate < 0:
            raise ValueError("masked rate should be non-negative numeric number")
        if not isinstance(self.batch_strategy, str) or self.batch_strategy.lower() not in ["full", "random"]:
            raise ValueError("batch strategy should be full or random")
        self.batch_strategy = self.batch_strategy.lower()
        if not isinstance(self.shuffle, bool):
            raise ValueError("shuffle should be boolean type")
        return True
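The three early_stop criteria listed in the docstring above can be compared side by side. The function below is only an illustrative sketch with a hypothetical name and signature; in particular, measuring weight_diff as the Euclidean norm of the weight change and comparing it directly against tol are assumptions, not a statement of how FATE computes it.

```python
import math
from typing import Optional, Sequence


def has_converged(early_stop: str, tol: float,
                  loss: Optional[float] = None, prev_loss: Optional[float] = None,
                  weights: Optional[Sequence[float]] = None,
                  prev_weights: Optional[Sequence[float]] = None) -> bool:
    """Apply one of the 'diff', 'abs' or 'weight_diff' criteria against tol."""
    if early_stop == "diff":
        # Change in loss between two consecutive iterations.
        return abs(loss - prev_loss) < tol
    if early_stop == "abs":
        # Absolute value of the current loss.
        return loss < tol
    if early_stop == "weight_diff":
        # Assumed: Euclidean distance between consecutive weight vectors.
        distance = math.sqrt(sum((w - p) ** 2 for w, p in zip(weights, prev_weights)))
        return distance < tol
    raise ValueError("early_stop should be 'diff', 'abs' or 'weight_diff'")


print(has_converged("diff", 1e-4, loss=0.50001, prev_loss=0.5))
print(has_converged("abs", 1e-4, loss=0.5))
```

The example shows how the criteria can disagree: a loss that has stopped changing satisfies diff, while abs is only satisfied once the loss itself drops below tol.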
__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, shuffle=True, batch_strategy='full', masked_rate=5, learning_rate=0.01, init_param=InitParam(), max_iter=100, early_stop='diff', encrypt_param=EncryptParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, stepwise_param=StepwiseParam(), floating_point_precision=23, metrics=None, use_first_metric_only=False, callback_param=CallbackParam())
check(self)
HomoLogisticParam (LogisticParam)¶
Parameters¶
re_encrypt_batches : int, default: 2 Required when using the encrypted version of HomoLR. Since updating coefficients over multiple batches may cause an overflow error, the model needs to be re-encrypted every several batches. Please be careful when setting this parameter: too many batches between re-encryptions may cause training failure.
aggregate_iters : int, default: 1 Indicates how many iterations are run before each aggregation.
use_proximal : bool, default: False Whether to turn on the additional proximal term. For more details of FedProx, please refer to https://arxiv.org/abs/1812.06127
mu : float, default: 0.1 Scaling factor of the proximal term.
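The role of mu can be made concrete with the proximal term from the FedProx paper linked above, which adds (mu / 2) * ||w - w_global||^2 to the local objective. The sketch below uses plain Python lists and hypothetical function names; it shows the penalty and its gradient only, and is not taken from FATE's optimizer code.

```python
from typing import List, Sequence


def proximal_term(weights: Sequence[float], global_weights: Sequence[float], mu: float = 0.1) -> float:
    """Penalty (mu / 2) * ||w - w_global||^2 added to the local loss."""
    return 0.5 * mu * sum((w - g) ** 2 for w, g in zip(weights, global_weights))


def proximal_gradient(weights: Sequence[float], global_weights: Sequence[float], mu: float = 0.1) -> List[float]:
    """Gradient of the proximal term with respect to the local weights: mu * (w - w_global)."""
    return [mu * (w - g) for w, g in zip(weights, global_weights)]


local_w = [1.0, 2.0]
global_w = [0.0, 0.0]
print(proximal_term(local_w, global_w))      # penalty added to the local loss
print(proximal_gradient(local_w, global_w))  # extra gradient pulling towards global_w
```

A larger mu pulls each party's local weights more strongly towards the last aggregated model, which limits drift when local data distributions differ.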
Source code in federatedml/param/logistic_regression_param.py
class HomoLogisticParam(LogisticParam):
    """
    Parameters
    ----------
    re_encrypt_batches : int, default: 2
        Required when using encrypted version HomoLR. Since updating coefficients over multiple batches may cause
        overflow error, the model needs to be re-encrypted every several batches. Please be careful when setting
        this parameter. Too large batches may cause training failure.
    aggregate_iters : int, default: 1
        Indicate how many iterations are aggregated once.
    use_proximal : bool, default: False
        Whether to turn on additional proximal term. For more details of FedProx, please refer to
        https://arxiv.org/abs/1812.06127
    mu : float, default 0.1
        To scale the proximal term
    """

    def __init__(self, penalty='L2',
                 tol=1e-4, alpha=1.0, optimizer='rmsprop',
                 batch_size=-1, learning_rate=0.01, init_param=InitParam(),
                 max_iter=100, early_stop='diff',
                 encrypt_param=EncryptParam(method=None), re_encrypt_batches=2,
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 decay=1, decay_sqrt=True,
                 aggregate_iters=1, multi_class='ovr', validation_freqs=None,
                 early_stopping_rounds=None,
                 metrics=['auc', 'ks'],
                 use_first_metric_only=False,
                 use_proximal=False,
                 mu=0.1, callback_param=CallbackParam()
                 ):
        super(HomoLogisticParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
                                                batch_size=batch_size,
                                                learning_rate=learning_rate,
                                                init_param=init_param, max_iter=max_iter, early_stop=early_stop,
                                                encrypt_param=encrypt_param, predict_param=predict_param,
                                                cv_param=cv_param, multi_class=multi_class,
                                                validation_freqs=validation_freqs,
                                                decay=decay, decay_sqrt=decay_sqrt,
                                                early_stopping_rounds=early_stopping_rounds,
                                                metrics=metrics, use_first_metric_only=use_first_metric_only,
                                                callback_param=callback_param)
        self.re_encrypt_batches = re_encrypt_batches
        self.aggregate_iters = aggregate_iters
        self.use_proximal = use_proximal
        self.mu = mu

    def check(self):
        super().check()
        if type(self.re_encrypt_batches).__name__ != "int":
            raise ValueError(
                "logistic_param's re_encrypt_batches {} not supported, should be int type".format(
                    self.re_encrypt_batches))
        elif self.re_encrypt_batches < 0:
            raise ValueError(
                "logistic_param's re_encrypt_batches must be greater or equal to 0")
        if not isinstance(self.aggregate_iters, int):
            raise ValueError(
                "logistic_param's aggregate_iters {} not supported, should be int type".format(
                    self.aggregate_iters))
        if self.encrypt_param.method == consts.PAILLIER:
            if self.optimizer != 'sgd':
                raise ValueError("Paillier encryption mode supports 'sgd' optimizer method only.")
            if self.penalty == consts.L1_PENALTY:
                raise ValueError("Paillier encryption mode supports 'L2' penalty or None only.")
        return True
__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=100, early_stop='diff', encrypt_param=EncryptParam(method=None), re_encrypt_batches=2, predict_param=PredictParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, aggregate_iters=1, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, metrics=['auc', 'ks'], use_first_metric_only=False, use_proximal=False, mu=0.1, callback_param=CallbackParam())
check(self)
HeteroLogisticParam (LogisticParam)¶
Source code in federatedml/param/logistic_regression_param.py
class HeteroLogisticParam(LogisticParam):
    def __init__(self, penalty='L2',
                 tol=1e-4, alpha=1.0, optimizer='rmsprop',
                 batch_size=-1, shuffle=True, batch_strategy="full", masked_rate=5,
                 learning_rate=0.01, init_param=InitParam(),
                 max_iter=100, early_stop='diff',
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 decay=1, decay_sqrt=True, sqn_param=StochasticQuasiNewtonParam(),
                 multi_class='ovr', validation_freqs=None, early_stopping_rounds=None,
                 metrics=['auc', 'ks'], floating_point_precision=23,
                 encrypt_param=EncryptParam(),
                 use_first_metric_only=False, stepwise_param=StepwiseParam(),
                 callback_param=CallbackParam()
                 ):
        super(HeteroLogisticParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
                                                  batch_size=batch_size, shuffle=shuffle, batch_strategy=batch_strategy,
                                                  masked_rate=masked_rate,
                                                  learning_rate=learning_rate,
                                                  init_param=init_param, max_iter=max_iter, early_stop=early_stop,
                                                  predict_param=predict_param, cv_param=cv_param,
                                                  decay=decay,
                                                  decay_sqrt=decay_sqrt, multi_class=multi_class,
                                                  validation_freqs=validation_freqs,
                                                  early_stopping_rounds=early_stopping_rounds,
                                                  metrics=metrics, floating_point_precision=floating_point_precision,
                                                  encrypt_param=encrypt_param,
                                                  use_first_metric_only=use_first_metric_only,
                                                  stepwise_param=stepwise_param,
                                                  callback_param=callback_param)
        self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
        self.sqn_param = copy.deepcopy(sqn_param)

    def check(self):
        super().check()
        self.encrypted_mode_calculator_param.check()
        self.sqn_param.check()
        return True
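HeteroLogisticParam passes batch_size and batch_strategy through to LogisticParam, whose documentation gives the number of batches per iteration as ceil(data_size / batch_size) for 'full' and 1 for 'random'. The sketch below restates that rule; the helper name batches_per_iteration is hypothetical, and treating batch_size=-1 as "one batch containing all data" follows the parameter description rather than any FATE code.

```python
import math


def batches_per_iteration(data_size: int, batch_size: int = -1, batch_strategy: str = "full") -> int:
    """Number of batches processed in one training iteration."""
    strategy = batch_strategy.lower()
    if strategy not in ("full", "random"):
        raise ValueError("batch strategy should be full or random")
    if strategy == "random":
        # One randomly drawn batch per iteration.
        return 1
    # batch_size == -1 means all data is used in a single batch.
    effective_batch_size = data_size if batch_size == -1 else batch_size
    return math.ceil(data_size / effective_batch_size)


print(batches_per_iteration(1000, 320))            # ceil(1000 / 320) = 4
print(batches_per_iteration(1000, 320, "random"))  # 1
print(batches_per_iteration(1000))                 # 1
```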
__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, shuffle=True, batch_strategy='full', masked_rate=5, learning_rate=0.01, init_param=InitParam(), max_iter=100, early_stop='diff', encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, sqn_param=StochasticQuasiNewtonParam(), multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, metrics=['auc', 'ks'], floating_point_precision=23, encrypt_param=EncryptParam(), use_first_metric_only=False, stepwise_param=StepwiseParam(), callback_param=CallbackParam())
check(self)
¶Source code in federatedml/param/logistic_regression_param.py
def check(self):
super().check()
self.encrypted_mode_calculator_param.check()
self.sqn_param.check()
return True
one_vs_rest_param
¶
Classes¶
OneVsRestParam (BaseParam)
¶Define the one_vs_rest parameters.
Parameters¶
has_arbiter : bool, default: True
Some algorithms, for instance SecureBoost in FATE, may not have an arbiter; for these algorithms, it should be set to False.
Source code in federatedml/param/one_vs_rest_param.py
class OneVsRestParam(BaseParam):
    """
    Define the one_vs_rest parameters.

    Parameters
    ----------
    has_arbiter: bool, default: true
        For some algorithm, may not has arbiter, for instances, secureboost of FATE,
        for these algorithms, it should be set to false.
    """

    def __init__(self, need_one_vs_rest=False, has_arbiter=True):
        super().__init__()
        self.need_one_vs_rest = need_one_vs_rest
        self.has_arbiter = has_arbiter

    def check(self):
        if type(self.has_arbiter).__name__ != "bool":
            raise ValueError(
                "one_vs_rest param's has_arbiter {} not supported, should be bool type".format(
                    self.has_arbiter))
        LOGGER.debug("Finish one_vs_rest parameter check!")
        return True
__init__(self, need_one_vs_rest=False, has_arbiter=True)
special
¶Source code in federatedml/param/one_vs_rest_param.py
def __init__(self, need_one_vs_rest=False, has_arbiter=True):
    super().__init__()
    self.need_one_vs_rest = need_one_vs_rest
    self.has_arbiter = has_arbiter
check(self)
¶Source code in federatedml/param/one_vs_rest_param.py
def check(self):
    if type(self.has_arbiter).__name__ != "bool":
        raise ValueError(
            "one_vs_rest param's has_arbiter {} not supported, should be bool type".format(
                self.has_arbiter))
    LOGGER.debug("Finish one_vs_rest parameter check!")
    return True
onehot_encoder_param
¶
Classes¶
OneHotEncoderParam (BaseParam)
¶Parameters¶
transform_col_indexes : list or int, default: -1
Specifies which columns need to be calculated. -1 represents all columns.
transform_col_names : list of string, default: []
Specifies which columns need to be calculated. Each element in the list represents a column name in the header.
need_run : bool, default: True
Indicates whether this module needs to be run.
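As a rough illustration of how the two column selectors could be combined, the sketch below resolves transform_col_indexes and transform_col_names against a header and one-hot encodes the chosen columns. It is a standalone approximation, not the FATE implementation: the helpers resolve_columns and one_hot_encode, the sample header, and the choice to take the union of both selectors are assumptions made for this example.

```python
def resolve_columns(header, transform_col_indexes=-1, transform_col_names=None):
    """Return the selected column names (hypothetical helper, not part of FATE)."""
    if transform_col_indexes == -1:
        return list(header)  # -1 selects every column
    by_index = [header[i] for i in (transform_col_indexes or [])]
    # Assumption: both selectors are combined as a union, kept in header order.
    wanted = set(by_index) | set(transform_col_names or [])
    return [col for col in header if col in wanted]


def one_hot_encode(rows, header, columns):
    """One-hot encode `columns`; all other columns pass through unchanged."""
    categories = {
        col: sorted({row[header.index(col)] for row in rows}) for col in columns
    }
    new_header = []
    for col in header:
        if col in categories:
            new_header.extend(f"{col}_{value}" for value in categories[col])
        else:
            new_header.append(col)
    encoded = []
    for row in rows:
        out = []
        for col, value in zip(header, row):
            if col in categories:
                out.extend(1 if value == cat else 0 for cat in categories[col])
            else:
                out.append(value)
        encoded.append(out)
    return new_header, encoded


header = ["x0", "x1", "x2"]
rows = [[1, "a", 0.5], [2, "b", 0.7], [1, "a", 0.9]]
columns = resolve_columns(header, transform_col_indexes=[0], transform_col_names=["x1"])
new_header, encoded = one_hot_encode(rows, header, columns)
```

Here x0 and x1 are each expanded into one indicator column per distinct value, while x2 is left untouched.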
Source code in federatedml/param/onehot_encoder_param.py
class OneHotEncoderParam(BaseParam):
    """
    Parameters
    ----------
    transform_col_indexes: list or int, default: -1
        Specify which columns need to calculated. -1 represent for all columns.
    transform_col_names : list of string, default: []
        Specify which columns need to calculated. Each element in the list represent for a column name in header.
    need_run: bool, default True
        Indicate if this module needed to be run
    """

    def __init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True):
        super(OneHotEncoderParam, self).__init__()
        if transform_col_names is None:
            transform_col_names = []
        self.transform_col_indexes = transform_col_indexes
        self.transform_col_names = transform_col_names
        self.need_run = need_run

    def check(self):
        descr = "One-hot encoder param's"
        self.check_defined_type(self.transform_col_indexes, descr, ['list', 'int', 'NoneType'])
        self.check_defined_type(self.transform_col_names, descr, ['list', 'NoneType'])
        return True
__init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True)
special
¶Source code in federatedml/param/onehot_encoder_param.py
def __init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True):
    super(OneHotEncoderParam, self).__init__()
    if transform_col_names is None:
        transform_col_names = []
    self.transform_col_indexes = transform_col_indexes
    self.transform_col_names = transform_col_names
    self.need_run = need_run
check(self)
¶Source code in federatedml/param/onehot_encoder_param.py
def check(self):
    descr = "One-hot encoder param's"
    self.check_defined_type(self.transform_col_indexes, descr, ['list', 'int', 'NoneType'])
    self.check_defined_type(self.transform_col_names, descr, ['list', 'NoneType'])
    return True
pearson_param
¶
Classes¶
PearsonParam (BaseParam)
¶param for pearson correlation
Parameters¶
column_names : list of string List of column names.
column_indexes : list of int List of column indexes.
cross_parties : bool, default: True If True, calculate the correlation of columns from both parties.
need_run : bool, default: True Set to False to skip this party.
use_mix_rand : bool, default: False Mix system random and pseudo-random numbers for quicker calculation.
calc_local_vif : bool, default: True Calculate VIF for local columns.
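The constraints that check() places on the column selectors can be summarised in a standalone function. The sketch below mirrors the rules visible in the source listing: column_names must be a list of str, column_indexes may be a list of int or the int -1, and at least one column must be given when need_run is True. The names validate_pearson_columns and is_valid are hypothetical and do not exist in FATE.

```python
def validate_pearson_columns(column_names=None, column_indexes=None, need_run=True):
    """Standalone approximation of the column checks in PearsonParam.check()."""
    column_names = [] if column_names is None else column_names
    column_indexes = [] if column_indexes is None else column_indexes

    if not isinstance(column_names, list):
        raise ValueError(f"column_names must be a list, got {type(column_names)}")
    for name in column_names:
        if not isinstance(name, str):
            raise ValueError(f"column name {name!r} is not a str")

    if isinstance(column_indexes, list):
        for idx in column_indexes:
            if not isinstance(idx, int):
                raise ValueError(f"column index {idx!r} is not an int")
    elif isinstance(column_indexes, int) and column_indexes != -1:
        raise ValueError("an int column_indexes must be -1")

    # With two empty lists and need_run=True there is nothing to correlate.
    if need_run and isinstance(column_indexes, list):
        if not column_indexes and not column_names:
            raise ValueError("provide at least one column")
    return True


def is_valid(**kwargs):
    """Return True if the selectors pass validation, False otherwise."""
    try:
        return validate_pearson_columns(**kwargs)
    except ValueError:
        return False
```

For example, is_valid(column_indexes=-1) and is_valid(column_names=["x0", "x1"]) both pass, whereas is_valid() fails because no column is selected while need_run is True.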
Source code in federatedml/param/pearson_param.py
class PearsonParam(BaseParam):
"""
param for pearson correlation
Parameters
----------
column_names : list of string
list of column names
column_index : list of int
list of column index
cross_parties : bool, default: True
if True, calculate correlation of columns from both party
need_run : bool
set False to skip this party
use_mix_rand : bool, defalut: False
mix system random and pseudo random for quicker calculation
calc_loca_vif : bool, default True
calculate VIF for columns in local
"""
def __init__(
self,
column_names=None,
column_indexes=None,
cross_parties=True,
need_run=True,
use_mix_rand=False,
calc_local_vif=True,
):
super().__init__()
self.column_names = column_names
self.column_indexes = column_indexes
self.cross_parties = cross_parties
self.need_run = need_run
self.use_mix_rand = use_mix_rand
if column_names is None:
self.column_names = []
if column_indexes is None:
self.column_indexes = []
self.calc_local_vif = calc_local_vif
def check(self):
if not isinstance(self.use_mix_rand, bool):
raise ValueError(
f"use_mix_rand accept bool type only, {type(self.use_mix_rand)} got"
)
if self.cross_parties and (not self.need_run):
raise ValueError(
f"need_run should be True(which is default) when cross_parties is True."
)
if not isinstance(self.column_names, list):
raise ValueError(
f"type mismatch, column_names with type {type(self.column_names)}"
)
for name in self.column_names:
if not isinstance(name, str):
raise ValueError(
f"type mismatch, column_names with element {name}(type is {type(name)})"
)
if isinstance(self.column_indexes, list):
for idx in self.column_indexes:
if not isinstance(idx, int):
raise ValueError(
f"type mismatch, column_indexes with element {idx}(type is {type(idx)})"
)
if isinstance(self.column_indexes, int) and self.column_indexes != -1:
raise ValueError(
f"column_indexes with type int and value {self.column_indexes}(only -1 allowed)"
)
if self.need_run:
if isinstance(self.column_indexes, list) and isinstance(
self.column_names, list
):
if len(self.column_indexes) == 0 and len(self.column_names) == 0:
raise ValueError(f"provide at least one column")
__init__(self, column_names=None, column_indexes=None, cross_parties=True, need_run=True, use_mix_rand=False, calc_local_vif=True)
special
¶Source code in federatedml/param/pearson_param.py
def __init__(
self,
column_names=None,
column_indexes=None,
cross_parties=True,
need_run=True,
use_mix_rand=False,
calc_local_vif=True,
):
super().__init__()
self.column_names = column_names
self.column_indexes = column_indexes
self.cross_parties = cross_parties
self.need_run = need_run
self.use_mix_rand = use_mix_rand
if column_names is None:
self.column_names = []
if column_indexes is None:
self.column_indexes = []
self.calc_local_vif = calc_local_vif
check(self)
¶Source code in federatedml/param/pearson_param.py
def check(self):
if not isinstance(self.use_mix_rand, bool):
raise ValueError(
f"use_mix_rand accept bool type only, {type(self.use_mix_rand)} got"
)
if self.cross_parties and (not self.need_run):
raise ValueError(
f"need_run should be True(which is default) when cross_parties is True."
)
if not isinstance(self.column_names, list):
raise ValueError(
f"type mismatch, column_names with type {type(self.column_names)}"
)
for name in self.column_names:
if not isinstance(name, str):
raise ValueError(
f"type mismatch, column_names with element {name}(type is {type(name)})"
)
if isinstance(self.column_indexes, list):
for idx in self.column_indexes:
if not isinstance(idx, int):
raise ValueError(
f"type mismatch, column_indexes with element {idx}(type is {type(idx)})"
)
if isinstance(self.column_indexes, int) and self.column_indexes != -1:
raise ValueError(
f"column_indexes with type int and value {self.column_indexes}(only -1 allowed)"
)
if self.need_run:
if isinstance(self.column_indexes, list) and isinstance(
self.column_names, list
):
if len(self.column_indexes) == 0 and len(self.column_names) == 0:
raise ValueError(f"provide at least one column")
poisson_regression_param
¶
Classes¶
PoissonParam (LinearModelParam)
¶Parameters used for Poisson Regression.
Parameters¶
penalty : {'L2', 'L1'}, default: 'L2'
Penalty method used in Poisson regression. Note that 'L1' is not supported when using the encrypted version in HeteroPoisson.
tol : float, default: 1e-4
The convergence tolerance.
alpha : float, default: 1.0
Regularization strength coefficient.
optimizer : {'rmsprop', 'sgd', 'adam', 'adagrad'}, default: 'rmsprop'
Optimization method.
batch_size : int, default: -1
Batch size when updating the model. -1 means using all data in one batch, i.e. not using a mini-batch strategy.
learning_rate : float, default: 0.01
Learning rate.
max_iter : int, default: 20
The maximum number of training iterations.
init_param : InitParam object, default: default InitParam object
Init param method object.
early_stop : {'weight_diff', 'diff', 'abs'}, default: 'diff'
Method used to judge convergence. a) diff: use the difference of loss between two iterations to judge convergence. b) weight_diff: use the difference between the weights of two consecutive iterations. c) abs: use the absolute value of the loss to judge convergence, i.e. if loss < eps, the model has converged.
exposure_colname : str or None, default: None
Name of the optional exposure variable in dTable.
encrypt_param : EncryptParam object, default: default EncryptParam object
Encrypt param.
encrypted_mode_calculator_param : EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object
Encrypted mode calculator param.
cv_param : CrossValidationParam object, default: default CrossValidationParam object
Cross-validation param.
stepwise_param : StepwiseParam object, default: default StepwiseParam object
Stepwise param.
decay : int or float, default: 1
Decay rate for the learning rate. The learning rate follows the schedule lr = lr0 / (1 + decay*t) if decay_sqrt is False, and lr = lr0 / sqrt(1 + decay*t) if decay_sqrt is True, where t is the iteration number.
decay_sqrt : bool, default: True
lr = lr0 / (1 + decay*t) if decay_sqrt is False; otherwise, lr = lr0 / sqrt(1 + decay*t).
validation_freqs : int, list, tuple, set, or None, default: None
Validation frequency during training; required when using early stopping. The default value is None; 1 is suggested. Setting it to a number larger than 1 speeds up training by skipping validation rounds. When it is larger than 1, a divisor of max_iter is recommended; otherwise, the validation scores of the last training iteration will be missed.
early_stopping_rounds : int, default: None
If a positive number is specified, the program checks the early stopping criteria at every specified number of training rounds. validation_freqs must also be set when using early stopping.
metrics : list or None, default: None
Specifies which metrics are used for evaluation during training. If the metrics have not improved within early_stopping_rounds, training stops before convergence. If left empty, default metrics are used. For regression tasks, the default metrics are ['root_mean_squared_error', 'mean_absolute_error'].
use_first_metric_only : bool, default: False
Indicates whether to use the first metric in metrics as the only criterion for the early stopping judgement.
floating_point_precision : None or integer, default: 23
If not None, use floating_point_precision bits to speed up calculation, e.g. convert an x to round(x * 2**floating_point_precision) during Paillier operations and divide the result by 2**floating_point_precision at the end.
callback_param : CallbackParam object
Callback param.
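The two decay schedules given for decay and decay_sqrt can be written as one small function. This is a minimal sketch of the formulas above, not FATE code: the function name decayed_learning_rate is hypothetical, and counting iterations from t = 0 is an assumption.

```python
import math


def decayed_learning_rate(lr0, t, decay=1, decay_sqrt=True):
    """Learning rate at iteration t under the documented decay schedules."""
    denominator = 1 + decay * t
    if decay_sqrt:
        return lr0 / math.sqrt(denominator)  # lr0 / sqrt(1 + decay*t)
    return lr0 / denominator                 # lr0 / (1 + decay*t)


# Compare both schedules over the first four iterations (t = 0..3).
schedule_sqrt = [decayed_learning_rate(0.01, t) for t in range(4)]
schedule_linear = [decayed_learning_rate(0.01, t, decay_sqrt=False) for t in range(4)]
```

With decay = 1, the square-root schedule halves the learning rate after three iterations, whereas the plain schedule halves it after one.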
Source code in federatedml/param/poisson_regression_param.py
class PoissonParam(LinearModelParam):
"""
Parameters used for Poisson Regression.
Parameters
----------
penalty : {'L2', 'L1'}, default: 'L2'
Penalty method used in Poisson. Please note that, when using encrypted version in HeteroPoisson,
'L1' is not supported.
tol : float, default: 1e-4
The tolerance of convergence
alpha : float, default: 1.0
Regularization strength coefficient.
optimizer : {'rmsprop', 'sgd', 'adam', 'adagrad'}, default: 'rmsprop'
Optimize method
batch_size : int, default: -1
Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.
learning_rate : float, default: 0.01
Learning rate
max_iter : int, default: 20
The maximum iteration for training.
init_param: InitParam object, default: default InitParam object
Init param method object.
early_stop : str, 'weight_diff', 'diff' or 'abs', default: 'diff'
Method used to judge convergence.
a) diff: Use difference of loss between two iterations to judge whether converge.
b) weight_diff: Use difference between weights of two consecutive iterations
c) abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.
exposure_colname: str or None, default: None
Name of optional exposure variable in dTable.
encrypt_param: EncryptParam object, default: default EncryptParam object
encrypt param
encrypted_mode_calculator_param: EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object
encrypted mode calculator param
cv_param: CrossValidationParam object, default: default CrossValidationParam object
cv param
stepwise_param: StepwiseParam object, default: default StepwiseParam object
stepwise param
decay: int or float, default: 1
Decay rate for learning rate. learning rate will follow the following decay schedule.
lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t)
where t is the iter number.
decay_sqrt: bool, default: True
lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)
validation_freqs: int, list, tuple, set, or None
validation frequency during training, required when using early stopping.
The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds.
When it is larger than 1, a number which is divisible by "max_iter" is recommended, otherwise, you will miss the validation scores of the last training iteration.
early_stopping_rounds: int, default: None
If positive number specified, at every specified training rounds, program checks for early stopping criteria.
Validation_freqs must also be set when using early stopping.
metrics: list or None, default: None
Specify which metrics to be used when performing evaluation during training process. If metrics have not improved at early_stopping rounds, trianing stops before convergence.
If set as empty, default metrics will be used. For regression tasks, default metrics are ['root_mean_squared_error', 'mean_absolute_error']
use_first_metric_only: bool, default: False
Indicate whether to use the first metric in `metrics` as the only criterion for early stopping judgement.
floating_point_precision: None or integer
if not None, use floating_point_precision-bit to speed up calculation,
e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
the result by 2**floating_point_precision in the end.
callback_param: CallbackParam object
callback param
"""
def __init__(self, penalty='L2',
tol=1e-4, alpha=1.0, optimizer='rmsprop',
batch_size=-1, learning_rate=0.01, init_param=InitParam(),
max_iter=20, early_stop='diff',
exposure_colname=None,
encrypt_param=EncryptParam(),
encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
cv_param=CrossValidationParam(), stepwise_param=StepwiseParam(),
decay=1, decay_sqrt=True,
validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False,
floating_point_precision=23, callback_param=CallbackParam()):
super(PoissonParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
batch_size=batch_size, learning_rate=learning_rate,
init_param=init_param, max_iter=max_iter,
early_stop=early_stop, cv_param=cv_param, decay=decay,
decay_sqrt=decay_sqrt, validation_freqs=validation_freqs,
early_stopping_rounds=early_stopping_rounds, metrics=metrics,
floating_point_precision=floating_point_precision,
encrypt_param=encrypt_param,
use_first_metric_only=use_first_metric_only,
stepwise_param=stepwise_param,
callback_param=callback_param)
self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
self.exposure_colname = exposure_colname
def check(self):
descr = "poisson_regression_param's "
super(PoissonParam, self).check()
if self.encrypt_param.method != consts.PAILLIER:
raise ValueError(
descr + "encrypt method supports 'Paillier' only")
if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad']:
raise ValueError(
descr + "optimizer not supported, optimizer should be"
" 'sgd', 'rmsprop', 'adam', or 'adagrad'")
if self.exposure_colname is not None:
if type(self.exposure_colname).__name__ != "str":
raise ValueError(
descr + "exposure_colname {} not supported, should be string type".format(self.exposure_colname))
self.encrypted_mode_calculator_param.check()
return True
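The floating_point_precision description amounts to a fixed-point encoding: scale by 2**floating_point_precision, round to an integer, operate on integers, then divide by the same factor. The sketch below illustrates only that arithmetic. The helper names are hypothetical, and plain integer addition stands in for the homomorphic addition that would be performed on Paillier ciphertexts.

```python
def encode_fixed_point(x, floating_point_precision=23):
    """Convert x to the integer round(x * 2**floating_point_precision)."""
    return round(x * 2 ** floating_point_precision)


def decode_fixed_point(encoded, floating_point_precision=23):
    """Divide by 2**floating_point_precision to recover a float."""
    return encoded / 2 ** floating_point_precision


values = [0.125, -3.75, 0.1]
encoded = [encode_fixed_point(v) for v in values]
# Integer addition is a stand-in for addition under encryption (assumption).
total = decode_fixed_point(sum(encoded))
```

Values such as 0.125 are represented exactly, while 0.1 incurs a rounding error of at most 2**-24, so the decoded sum agrees with the floating-point sum to roughly seven decimal places.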
__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=20, early_stop='diff', exposure_colname=None, encrypt_param=EncryptParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), cv_param=CrossValidationParam(), stepwise_param=StepwiseParam(), decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False, floating_point_precision=23, callback_param=CallbackParam())
special
¶Source code in federatedml/param/poisson_regression_param.py
def __init__(self, penalty='L2',
tol=1e-4, alpha=1.0, optimizer='rmsprop',
batch_size=-1, learning_rate=0.01, init_param=InitParam(),
max_iter=20, early_stop='diff',
exposure_colname=None,
encrypt_param=EncryptParam(),
encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
cv_param=CrossValidationParam(), stepwise_param=StepwiseParam(),
decay=1, decay_sqrt=True,
validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False,
floating_point_precision=23, callback_param=CallbackParam()):
super(PoissonParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
batch_size=batch_size, learning_rate=learning_rate,
init_param=init_param, max_iter=max_iter,
early_stop=early_stop, cv_param=cv_param, decay=decay,
decay_sqrt=decay_sqrt, validation_freqs=validation_freqs,
early_stopping_rounds=early_stopping_rounds, metrics=metrics,
floating_point_precision=floating_point_precision,
encrypt_param=encrypt_param,
use_first_metric_only=use_first_metric_only,
stepwise_param=stepwise_param,
callback_param=callback_param)
self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
self.exposure_colname = exposure_colname
check(self)
¶Source code in federatedml/param/poisson_regression_param.py
def check(self):
descr = "poisson_regression_param's "
super(PoissonParam, self).check()
if self.encrypt_param.method != consts.PAILLIER:
raise ValueError(
descr + "encrypt method supports 'Paillier' only")
if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad']:
raise ValueError(
descr + "optimizer not supported, optimizer should be"
" 'sgd', 'rmsprop', 'adam', or 'adagrad'")
if self.exposure_colname is not None:
if type(self.exposure_colname).__name__ != "str":
raise ValueError(
descr + "exposure_colname {} not supported, should be string type".format(self.exposure_colname))
self.encrypted_mode_calculator_param.check()
return True
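The advice on validation_freqs for PoissonParam (pick a value that divides max_iter evenly) can be checked with a few lines of arithmetic. The sketch below lists the rounds on which validation would run. It assumes rounds are numbered from 1 and that an integer frequency f triggers validation on every f-th round; the function name validated_rounds is hypothetical and the actual scheduling inside FATE may differ.

```python
def validated_rounds(max_iter, validation_freqs):
    """Rounds (1-based) on which validation runs, under the stated assumptions."""
    if validation_freqs is None:
        return []
    if isinstance(validation_freqs, int):
        return [t for t in range(1, max_iter + 1) if t % validation_freqs == 0]
    # A list, tuple, or set is taken as the explicit rounds to validate on.
    return sorted(t for t in set(validation_freqs) if 1 <= t <= max_iter)


every_5 = validated_rounds(20, 5)  # 5 divides max_iter, so round 20 is included
every_6 = validated_rounds(20, 6)  # 6 does not, so the final round is skipped
```

With max_iter = 20, a frequency of 5 covers the final round, while a frequency of 6 stops at round 18 and the scores of the last iteration are never computed.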
predict_param
¶
Classes¶
PredictParam (BaseParam)
¶Define the predict method of HomoLR, HeteroLR, SecureBoosting
Parameters¶
threshold : float or int, default: 0.5
The threshold used to separate the positive and negative classes. Normally, it should lie in (0, 1).
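A threshold of this kind is typically applied to predicted scores as shown below. This is only a sketch: the function classify is hypothetical, and treating a score strictly greater than the threshold as positive is an assumption, since the description does not say how ties are handled.

```python
def classify(scores, threshold=0.5):
    """Label a score 1 (positive) if it exceeds the threshold, else 0."""
    return [1 if score > threshold else 0 for score in scores]


labels_default = classify([0.1, 0.5, 0.9])
labels_low = classify([0.1, 0.5, 0.9], threshold=0.05)
```

Lowering the threshold labels more instances as positive, trading precision for recall.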
Source code in federatedml/param/predict_param.py
class PredictParam(BaseParam):
    """
    Define the predict method of HomoLR, HeteroLR, SecureBoosting

    Parameters
    ----------
    threshold: float or int
        The threshold use to separate positive and negative class. Normally, it should be (0,1)
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def check(self):
        if type(self.threshold).__name__ not in ["float", "int"]:
            raise ValueError("predict param's predict_param {} not supported, should be float or int".format(
                self.threshold))
        LOGGER.debug("Finish predict parameter check!")
        return True
__init__(self, threshold=0.5)
special
¶Source code in federatedml/param/predict_param.py
def __init__(self, threshold=0.5):
    self.threshold = threshold
check(self)
¶Source code in federatedml/param/predict_param.py
def check(self):
    if type(self.threshold).__name__ not in ["float", "int"]:
        raise ValueError("predict param's predict_param {} not supported, should be float or int".format(
            self.threshold))
    LOGGER.debug("Finish predict parameter check!")
    return True
psi_param
¶
PSIParam (BaseParam)
¶Source code in federatedml/param/psi_param.py
class PSIParam(BaseParam):

    def __init__(self, max_bin_num=20, need_run=True, dense_missing_val=None,
                 binning_error=consts.DEFAULT_RELATIVE_ERROR):
        super(PSIParam, self).__init__()
        self.max_bin_num = max_bin_num
        self.need_run = need_run
        self.dense_missing_val = dense_missing_val
        self.binning_error = binning_error

    def check(self):
        assert isinstance(self.max_bin_num, int) and self.max_bin_num > 0, 'max bin must be an integer larger than 0'
        assert isinstance(self.need_run, bool)
        if self.dense_missing_val is not None:
            assert isinstance(self.dense_missing_val, str) or isinstance(self.dense_missing_val, int) or \
                isinstance(self.dense_missing_val, float), \
                'missing value type {} not supported'.format(type(self.dense_missing_val))
        self.check_decimal_float(self.binning_error, "psi's param")
__init__(self, max_bin_num=20, need_run=True, dense_missing_val=None, binning_error=0.0001)
special
¶Source code in federatedml/param/psi_param.py
def __init__(self, max_bin_num=20, need_run=True, dense_missing_val=None,
             binning_error=consts.DEFAULT_RELATIVE_ERROR):
    super(PSIParam, self).__init__()
    self.max_bin_num = max_bin_num
    self.need_run = need_run
    self.dense_missing_val = dense_missing_val
    self.binning_error = binning_error
check(self)
¶Source code in federatedml/param/psi_param.py
def check(self):
    assert isinstance(self.max_bin_num, int) and self.max_bin_num > 0, 'max bin must be an integer larger than 0'
    assert isinstance(self.need_run, bool)
    if self.dense_missing_val is not None:
        assert isinstance(self.dense_missing_val, str) or isinstance(self.dense_missing_val, int) or \
            isinstance(self.dense_missing_val, float), \
            'missing value type {} not supported'.format(type(self.dense_missing_val))
    self.check_decimal_float(self.binning_error, "psi's param")
rsa_param
¶
Classes¶
RsaParam (BaseParam)
¶Define the RSA parameters
Parameters¶
rsa_key_n : integer
RSA modulus. default: None
rsa_key_e : integer
RSA public exponent. default: None
rsa_key_d : integer
RSA private exponent. default: None
save_out_table_namespace : str
Namespace of the table where the output data is stored. default: None
save_out_table_name : str
Name of the table where the output data is stored. default: None
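To make the roles of rsa_key_n, rsa_key_e, and rsa_key_d concrete, the sketch below runs a textbook RSA round trip with a deliberately tiny key. The numbers are a classroom example (p = 61, q = 53) chosen for this illustration and are far too small for real use; how FATE generates or consumes these keys is not shown here.

```python
# Toy key built from the primes p = 61 and q = 53 (illustrative only).
rsa_key_n = 61 * 53   # modulus, 3233
rsa_key_e = 17        # public exponent
rsa_key_d = 2753      # private exponent, the inverse of e modulo (p-1)*(q-1) = 3120

message = 65
# The public operation raises to e modulo n; the private one raises to d modulo n.
ciphertext = pow(message, rsa_key_e, rsa_key_n)
recovered = pow(ciphertext, rsa_key_d, rsa_key_n)
```

The private operation undoes the public one because e * d is congruent to 1 modulo (p-1)*(q-1), which is why all three values must come from the same key pair.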
Source code in federatedml/param/rsa_param.py
class RsaParam(BaseParam):
"""
Define the sample method
Parameters
----------
rsa_key_n: integer
RSA modulus, default: None
rsa_key_e: integer
RSA public exponent, default: None
rsa_key_d: integer
RSA private exponent, default: None
save_out_table_namespace: str
namespace of table where stores the output data. default: None
save_out_table_name: str
name of table where stores the output data. default: None
"""
def __init__(
self,
rsa_key_n=None,
rsa_key_e=None,
rsa_key_d=None,
save_out_table_namespace=None,
save_out_table_name=None):
self.rsa_key_n = rsa_key_n
self.rsa_key_e = rsa_key_e
self.rsa_key_d = rsa_key_d
self.save_out_table_namespace = save_out_table_namespace
self.save_out_table_name = save_out_table_name
def check(self):
descr = "rsa param"
self.check_positive_integer(self.rsa_key_n, descr)
self.check_positive_integer(self.rsa_key_e, descr)
self.check_positive_integer(self.rsa_key_d, descr)
self.check_string(self.save_out_table_namespace, descr)
self.check_string(self.save_out_table_name, descr)
return True
__init__(self, rsa_key_n=None, rsa_key_e=None, rsa_key_d=None, save_out_table_namespace=None, save_out_table_name=None)
special
¶Source code in federatedml/param/rsa_param.py
def __init__(
self,
rsa_key_n=None,
rsa_key_e=None,
rsa_key_d=None,
save_out_table_namespace=None,
save_out_table_name=None):
self.rsa_key_n = rsa_key_n
self.rsa_key_e = rsa_key_e
self.rsa_key_d = rsa_key_d
self.save_out_table_namespace = save_out_table_namespace
self.save_out_table_name = save_out_table_name
check(self)
¶Source code in federatedml/param/rsa_param.py
def check(self):
descr = "rsa param"
self.check_positive_integer(self.rsa_key_n, descr)
self.check_positive_integer(self.rsa_key_e, descr)
self.check_positive_integer(self.rsa_key_d, descr)
self.check_string(self.save_out_table_namespace, descr)
self.check_string(self.save_out_table_name, descr)
return True
sample_param
¶
Classes¶
SampleParam (BaseParam)
¶Define the sample method
Parameters¶
mode : {'random', 'stratified'}, default: 'random'
Specifies the sampling mode to use.
method : {'downsample', 'upsample'}, default: 'downsample'
Specifies the sampling method.
fractions : None or float or list, default: None
If mode is 'random', it should be a float greater than 0; otherwise, it should be a list of pairs like [label_i, sample_rate_i], e.g. [[0, 0.5], [1, 0.8], [2, 0.3]].
random_state : int, RandomState instance or None, default: None
Random state.
need_run : bool, default: True
Indicates whether this module needs to be run.
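The difference between the two shapes of fractions is easiest to see in code. The following sketch performs local down-sampling only, with a single rate in 'random' mode and one rate per label in 'stratified' mode. It uses Python's random module rather than FATE's sampler; the function name downsample and the rounding of the sample size are assumptions, and rates are assumed not to exceed 1.

```python
import random


def downsample(labels, mode="random", fractions=None, random_state=None):
    """Return the sorted indices kept by random or stratified down-sampling."""
    rng = random.Random(random_state)
    if mode == "random":
        # A single rate applies to the whole data set.
        k = round(len(labels) * fractions)
        return sorted(rng.sample(range(len(labels)), k))
    # Stratified: one rate per label, given as [label, rate] pairs.
    kept = []
    for label, rate in fractions:
        members = [i for i, y in enumerate(labels) if y == label]
        kept.extend(rng.sample(members, round(len(members) * rate)))
    return sorted(kept)


labels = [0] * 10 + [1] * 10
random_idx = downsample(labels, "random", 0.5, random_state=42)
stratified_idx = downsample(labels, "stratified", [[0, 0.5], [1, 0.8]], random_state=42)
```

In the stratified call, 5 of the 10 instances with label 0 and 8 of the 10 with label 1 are kept, so the class ratio of the output is controlled directly by the per-label rates.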
Source code in federatedml/param/sample_param.py
class SampleParam(BaseParam):
"""
Define the sample method
Parameters
----------
mode: {'random', 'stratified'}'
specify sample to use, default: 'random'
method: {'downsample', 'upsample'}, default: 'downsample'
specify sample method
fractions: None or float or list
if mode equals to random, it should be a float number greater than 0,
otherwise a list of elements of pairs like [label_i, sample_rate_i], e.g. [[0, 0.5], [1, 0.8], [2, 0.3]]. default: None
random_state: int, RandomState instance or None, default: None
random state
need_run: bool, default True
Indicate if this module needed to be run
"""
def __init__(self, mode="random", method="downsample", fractions=None, random_state=None, task_type="hetero",
need_run=True):
self.mode = mode
self.method = method
self.fractions = fractions
self.random_state = random_state
self.task_type = task_type
self.need_run = need_run
def check(self):
descr = "sample param"
self.mode = self.check_and_change_lower(self.mode,
["random", "stratified"],
descr)
self.method = self.check_and_change_lower(self.method,
["upsample", "downsample"],
descr)
if self.mode == "stratified" and self.fractions is not None:
if not isinstance(self.fractions, list):
raise ValueError("fractions of sample param when using stratified should be list")
for ele in self.fractions:
if not isinstance(ele, collections.Container) or len(ele) != 2:
raise ValueError(
"element in fractions of sample param using stratified should be a pair like [label_i, rate_i]")
return True
__init__(self, mode='random', method='downsample', fractions=None, random_state=None, task_type='hetero', need_run=True)
special
¶Source code in federatedml/param/sample_param.py
def __init__(self, mode="random", method="downsample", fractions=None, random_state=None, task_type="hetero",
need_run=True):
self.mode = mode
self.method = method
self.fractions = fractions
self.random_state = random_state
self.task_type = task_type
self.need_run = need_run
check(self)
¶Source code in federatedml/param/sample_param.py
def check(self):
descr = "sample param"
self.mode = self.check_and_change_lower(self.mode,
["random", "stratified"],
descr)
self.method = self.check_and_change_lower(self.method,
["upsample", "downsample"],
descr)
if self.mode == "stratified" and self.fractions is not None:
if not isinstance(self.fractions, list):
raise ValueError("fractions of sample param when using stratified should be list")
for ele in self.fractions:
if not isinstance(ele, collections.Container) or len(ele) != 2:
raise ValueError(
"element in fractions of sample param using stratified should be a pair like [label_i, rate_i]")
return True
sample_weight_param
¶
Classes¶
SampleWeightParam (BaseParam)
¶Define sample weight parameters
Parameters¶
class_weight : str or dict, or None, default: None Class weight dictionary or class weight computation mode. A string value only accepts 'balanced'. If a dict is provided, each key should be a class (label), and the weights will not be normalized, e.g. {'0': 1, '1': 2}. If both class_weight and sample_weight_name are None, the original input data is returned.
sample_weight_name : str Name of the column (feature) that specifies the sample weight. If both class_weight and sample_weight_name are None, the original input data is returned.
normalize : bool, default: False Whether to normalize the sample weights extracted from the sample_weight_name column.
need_run : bool, default: True Whether to run this module.
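When class_weight is a dict, each instance simply receives the weight configured for its label. The sketch below shows that lookup under two assumptions that the description does not confirm: keys are the string form of the label, as in the {'0': 1, '1': 2} example, and labels missing from the dict fall back to a weight of 1. The function assign_class_weights is hypothetical.

```python
def assign_class_weights(labels, class_weight):
    """Return one weight per instance, looked up by label (no normalization)."""
    # Assumption: keys are str(label); unknown labels default to weight 1.
    return [class_weight.get(str(label), 1) for label in labels]


weights = assign_class_weights([0, 1, 1, 0, 2], {"0": 1, "1": 2})
```

Instances of class 1 count twice as much as instances of class 0 here, and the weights are passed through as given rather than rescaled.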
Source code in federatedml/param/sample_weight_param.py
class SampleWeightParam(BaseParam):
"""
Define sample weight parameters
Parameters
----------
class_weight : str or dict, or None, default None
class weight dictionary or class weight computation mode, string value only accepts 'balanced';
If dict provided, key should be class(label), and weight will not be normalize, e.g.: {'0': 1, '1': 2}
If both class_weight and sample_weight_name are None, return original input data.
sample_weight_name : str
name of column which specifies sample weight.
feature name of sample weight; if both class_weight and sample_weight_name are None, return original input data
normalize : bool, default False
whether to normalize sample weight extracted from `sample_weight_name` column
need_run : bool, default True
whether to run this module or not
"""
def __init__(self, class_weight=None, sample_weight_name=None, normalize=False, need_run=True):
self.class_weight = class_weight
self.sample_weight_name = sample_weight_name
self.normalize = normalize
self.need_run = need_run
def check(self):
descr = "sample weight param's"
if self.class_weight:
if not isinstance(self.class_weight, str) and not isinstance(self.class_weight, dict):
raise ValueError(f"{descr} class_weight must be str, dict, or None.")
if isinstance(self.class_weight, str):
self.class_weight = self.check_and_change_lower(self.class_weight,
[consts.BALANCED],
f"{descr} class_weight")
if isinstance(self.class_weight, dict):
for k, v in self.class_weight.items():
if v < 0:
LOGGER.warning(f"Negative value {v} provided for class {k} as class_weight.")
if self.sample_weight_name:
self.check_string(self.sample_weight_name, f"{descr} sample_weight_name")
self.check_boolean(self.need_run, f"{descr} need_run")
self.check_boolean(self.normalize, f"{descr} normalize")
return True
__init__(self, class_weight=None, sample_weight_name=None, normalize=False, need_run=True)
special
¶Source code in federatedml/param/sample_weight_param.py
def __init__(self, class_weight=None, sample_weight_name=None, normalize=False, need_run=True):
self.class_weight = class_weight
self.sample_weight_name = sample_weight_name
self.normalize = normalize
self.need_run = need_run
check(self)
¶Source code in federatedml/param/sample_weight_param.py
def check(self):
descr = "sample weight param's"
if self.class_weight:
if not isinstance(self.class_weight, str) and not isinstance(self.class_weight, dict):
raise ValueError(f"{descr} class_weight must be str, dict, or None.")
if isinstance(self.class_weight, str):
self.class_weight = self.check_and_change_lower(self.class_weight,
[consts.BALANCED],
f"{descr} class_weight")
if isinstance(self.class_weight, dict):
for k, v in self.class_weight.items():
if v < 0:
LOGGER.warning(f"Negative value {v} provided for class {k} as class_weight.")
if self.sample_weight_name:
self.check_string(self.sample_weight_name, f"{descr} sample_weight_name")
self.check_boolean(self.need_run, f"{descr} need_run")
self.check_boolean(self.normalize, f"{descr} normalize")
return True
scale_param
¶
Classes¶
ScaleParam (BaseParam)
¶Define the feature scale parameters.
Parameters¶
method : {"standard_scale", "min_max_scale"} Scaling method, analogous to the scalers in sklearn. "min_max_scale" and "standard_scale" are currently supported; other methods may be added later. Default: "standard_scale".
mode : {"normal", "cap"} In "normal" mode, feat_upper and feat_lower are plain values such as 10 or 3.1; in "cap" mode, feat_upper and feat_lower must lie between 0 and 1 and are interpreted as percentiles of the column. Default: "normal".
feat_upper : int or float or list of int or float The upper limit of the column. If a list is used, mode must be "normal" and the list length should equal the number of features to scale. If the scaled value is larger than feat_upper, it is set to feat_upper.
feat_lower : int or float or list of int or float The lower limit of the column. If a list is used, mode must be "normal" and the list length should equal the number of features to scale. If the scaled value is less than feat_lower, it is set to feat_lower.
scale_col_indexes : list Indexes of the columns to scale; columns whose index is not in scale_col_indexes are not scaled. Must be -1 (the default) or a list.
scale_names : list of string Names of the columns to scale. Each element is a column name from the header. Default: [].
with_mean : bool Used for "standard_scale". Default True.
with_std : bool Used for "standard_scale". Default True. The standard scale of column x is calculated as z = (x - u) / s, where u is the mean of the column and s is the standard deviation of the column. If with_mean is False, u is 0; if with_std is False, s is 1.
need_run : bool Whether this module needs to be run. Default True.
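The effect of with_mean and with_std on z = (x - u) / s can be sketched on a single local column as follows. This is an illustration only: the function name is hypothetical, the use of the population standard deviation is an assumption, and the zero-variance guard is a choice made for this sketch rather than documented FATE behavior.

```python
import statistics


def standard_scale(column, with_mean=True, with_std=True):
    """Scale one column as z = (x - u) / s (hypothetical helper, not FATE code)."""
    u = statistics.mean(column) if with_mean else 0.0
    # Assumption: population standard deviation; FATE's choice is not shown here.
    s = statistics.pstdev(column) if with_std else 1.0
    if s == 0:
        s = 1.0  # sketch choice: avoid division by zero on constant columns
    return [(x - u) / s for x in column]


col = [1.0, 2.0, 3.0]
centered_only = standard_scale(col, with_mean=True, with_std=False)
untouched = standard_scale(col, with_mean=False, with_std=False)
full = standard_scale(col)
```

Setting both flags to False leaves the column unchanged, which is a quick way to check that u and s fall back to 0 and 1.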
Source code in federatedml/param/scale_param.py
class ScaleParam(BaseParam):
"""
Define the feature scale parameters.
Parameters
----------
method : {"standard_scale", "min_max_scale"}
like scale in sklearn, now it support "min_max_scale" and "standard_scale", and will support other scale method soon.
Default standard_scale, which will do nothing for scale
mode : {"normal", "cap"}
for mode is "normal", the feat_upper and feat_lower is the normal value like "10" or "3.1"
and for "cap", feat_upper and feature_lower will between 0 and 1, which means the percentile of the column. Default "normal"
feat_upper : int or float or list of int or float
the upper limit in the column.
If use list, mode must be "normal", and list length should equal to the number of features to scale.
If the scaled value is larger than feat_upper, it will be set to feat_upper
feat_lower: int or float or list of int or float
the lower limit in the column.
If use list, mode must be "normal", and list length should equal to the number of features to scale.
If the scaled value is less than feat_lower, it will be set to feat_lower
scale_col_indexes: list
the idx of column in scale_column_idx will be scaled, while the idx of column is not in, it will not be scaled.
scale_names : list of string
Specify which columns need to scaled. Each element in the list represent for a column name in header. default: []
with_mean : bool
used for "standard_scale". Default True.
with_std : bool
used for "standard_scale". Default True.
The standard scale of column x is calculated as : $z = (x - u) / s$ , where $u$ is the mean of the column and $s$ is the standard deviation of the column.
if with_mean is False, $u$ will be 0, and if with_std is False, $s$ will be 1.
need_run : bool
Indicate if this module needed to be run, default True
"""
def __init__(
self,
method="standard_scale",
mode="normal",
scale_col_indexes=-1,
scale_names=None,
feat_upper=None,
feat_lower=None,
with_mean=True,
with_std=True,
need_run=True):
super().__init__()
self.scale_names = [] if scale_names is None else scale_names
self.method = method
self.mode = mode
self.feat_upper = feat_upper
# LOGGER.debug("self.feat_upper:{}, type:{}".format(self.feat_upper, type(self.feat_upper)))
self.feat_lower = feat_lower
self.scale_col_indexes = scale_col_indexes
self.with_mean = with_mean
self.with_std = with_std
self.need_run = need_run
def check(self):
if self.method is not None:
descr = "scale param's method"
self.method = self.check_and_change_lower(self.method,
[consts.MINMAXSCALE, consts.STANDARDSCALE],
descr)
descr = "scale param's mode"
self.mode = self.check_and_change_lower(self.mode,
[consts.NORMAL, consts.CAP],
descr)
# LOGGER.debug("self.feat_upper:{}, type:{}".format(self.feat_upper, type(self.feat_upper)))
# if type(self.feat_upper).__name__ not in ["float", "int"]:
# raise ValueError("scale param's feat_upper {} not supported, should be float or int".format(
# self.feat_upper))
if self.scale_col_indexes != -1 and not isinstance(self.scale_col_indexes, list):
raise ValueError("scale_col_indexes is should be -1 or a list")
if self.scale_names is None:
self.scale_names = []
if not isinstance(self.scale_names, list):
raise ValueError("scale_names is should be a list of string")
else:
for e in self.scale_names:
if not isinstance(e, str):
raise ValueError("scale_names is should be a list of string")
self.check_boolean(self.with_mean, "scale_param with_mean")
self.check_boolean(self.with_std, "scale_param with_std")
self.check_boolean(self.need_run, "scale_param need_run")
LOGGER.debug("Finish scale parameter check!")
return True
__init__(self, method='standard_scale', mode='normal', scale_col_indexes=-1, scale_names=None, feat_upper=None, feat_lower=None, with_mean=True, with_std=True, need_run=True)
special
¶Source code in federatedml/param/scale_param.py
def __init__(
self,
method="standard_scale",
mode="normal",
scale_col_indexes=-1,
scale_names=None,
feat_upper=None,
feat_lower=None,
with_mean=True,
with_std=True,
need_run=True):
super().__init__()
self.scale_names = [] if scale_names is None else scale_names
self.method = method
self.mode = mode
self.feat_upper = feat_upper
# LOGGER.debug("self.feat_upper:{}, type:{}".format(self.feat_upper, type(self.feat_upper)))
self.feat_lower = feat_lower
self.scale_col_indexes = scale_col_indexes
self.with_mean = with_mean
self.with_std = with_std
self.need_run = need_run
check(self)
¶Source code in federatedml/param/scale_param.py
def check(self):
if self.method is not None:
descr = "scale param's method"
self.method = self.check_and_change_lower(self.method,
[consts.MINMAXSCALE, consts.STANDARDSCALE],
descr)
descr = "scale param's mode"
self.mode = self.check_and_change_lower(self.mode,
[consts.NORMAL, consts.CAP],
descr)
# LOGGER.debug("self.feat_upper:{}, type:{}".format(self.feat_upper, type(self.feat_upper)))
# if type(self.feat_upper).__name__ not in ["float", "int"]:
# raise ValueError("scale param's feat_upper {} not supported, should be float or int".format(
# self.feat_upper))
if self.scale_col_indexes != -1 and not isinstance(self.scale_col_indexes, list):
raise ValueError("scale_col_indexes is should be -1 or a list")
if self.scale_names is None:
self.scale_names = []
if not isinstance(self.scale_names, list):
raise ValueError("scale_names is should be a list of string")
else:
for e in self.scale_names:
if not isinstance(e, str):
raise ValueError("scale_names is should be a list of string")
self.check_boolean(self.with_mean, "scale_param with_mean")
self.check_boolean(self.with_std, "scale_param with_std")
self.check_boolean(self.need_run, "scale_param need_run")
LOGGER.debug("Finish scale parameter check!")
return True
scorecard_param
¶
Classes¶
ScorecardParam (BaseParam)
¶Define method used for transforming prediction score to credit score
Parameters¶
method : {"credit"}, default: 'credit' score method, currently only supports "credit"
offset : int or float, default: 500 score baseline
factor : int or float, default: 20 scoring step, when odds double, result score increases by this factor
factor_base : int or float, default: 2 factor base, value ln(factor_base) is used for calculating result score
upper_limit_ratio : int or float, default: 3 upper bound for odds, credit score upper bound is upper_limit_ratio * offset
lower_limit_value : int or float, default: 0 lower bound for result score
need_run : bool, default: True Indicate if this module needs to be run.
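Read together, the parameters above suggest a score of the form offset + factor / ln(factor_base) * ln(odds), clipped to the stated bounds. The sketch below follows that reading. The function name is hypothetical, and it takes the odds directly because this section does not say how odds are derived from the prediction score.

```python
import math


def credit_score(odds, offset=500, factor=20, factor_base=2,
                 upper_limit_ratio=3, lower_limit_value=0):
    """Map odds to a clipped credit score (sketch under the assumptions above)."""
    raw = offset + factor / math.log(factor_base) * math.log(odds)
    upper = upper_limit_ratio * offset  # score upper bound per the docs
    return min(max(raw, lower_limit_value), upper)


baseline = credit_score(1)      # ln(1) = 0, so the score equals offset
doubled = credit_score(2)       # one doubling of the odds adds `factor`
capped = credit_score(1e40)     # very large odds hit the upper bound
floored = credit_score(1e-40)   # very small odds hit the lower bound
```

With the defaults, each doubling of the odds moves the score by 20 points, and the result always stays within [0, 1500].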
Source code in federatedml/param/scorecard_param.py
class ScorecardParam(BaseParam):
"""
Define method used for transforming prediction score to credit score
Parameters
----------
method : {"credit"}, default: 'credit'
score method, currently only supports "credit"
offset : int or float, default: 500
score baseline
factor : int or float, default: 20
scoring step, when odds double, result score increases by this factor
factor_base : int or float, default: 2
factor base, value ln(factor_base) is used for calculating result score
upper_limit_ratio : int or float, default: 3
upper bound for odds, credit score upper bound is upper_limit_ratio * offset
lower_limit_value : int or float, default: 0
lower bound for result score
need_run : bool, default: True
Indicate if this module needs to be run.
"""
def __init__(
self,
method="credit",
offset=500,
factor=20,
factor_base=2,
upper_limit_ratio=3,
lower_limit_value=0,
need_run=True):
super(ScorecardParam, self).__init__()
self.method = method
self.offset = offset
self.factor = factor
self.factor_base = factor_base
self.upper_limit_ratio = upper_limit_ratio
self.lower_limit_value = lower_limit_value
self.need_run = need_run
def check(self):
descr = "scorecard param"
if not isinstance(self.method, str):
raise ValueError(f"{descr}method {self.method} not supported, should be str type")
else:
user_input = self.method.lower()
if user_input == "credit":
self.method = consts.CREDIT
else:
raise ValueError(f"{descr} method {user_input} not supported")
if type(self.offset).__name__ not in ["int", "long", "float"]:
raise ValueError(f"{descr} offset must be numeric,"
f"received {type(self.offset)} instead.")
if type(self.factor).__name__ not in ["int", "long", "float"]:
raise ValueError(f"{descr} factor must be numeric,"
f"received {type(self.factor)} instead.")
if type(self.factor_base).__name__ not in ["int", "long", "float"]:
raise ValueError(f"{descr} factor_base must be numeric,"
f"received {type(self.factor_base)} instead.")
if type(self.upper_limit_ratio).__name__ not in ["int", "long", "float"]:
raise ValueError(f"{descr} upper_limit_ratio must be numeric,"
f"received {type(self.upper_limit_ratio)} instead.")
if type(self.lower_limit_value).__name__ not in ["int", "long", "float"]:
raise ValueError(f"{descr} lower_limit_value must be numeric,"
f"received {type(self.lower_limit_value)} instead.")
BaseParam.check_boolean(self.need_run, descr=descr + "need_run ")
LOGGER.debug("Finish Scorecard parameter check!")
return True
__init__(self, method='credit', offset=500, factor=20, factor_base=2, upper_limit_ratio=3, lower_limit_value=0, need_run=True)
special
¶Source code in federatedml/param/scorecard_param.py
def __init__(
self,
method="credit",
offset=500,
factor=20,
factor_base=2,
upper_limit_ratio=3,
lower_limit_value=0,
need_run=True):
super(ScorecardParam, self).__init__()
self.method = method
self.offset = offset
self.factor = factor
self.factor_base = factor_base
self.upper_limit_ratio = upper_limit_ratio
self.lower_limit_value = lower_limit_value
self.need_run = need_run
check(self)
¶Source code in federatedml/param/scorecard_param.py
def check(self):
descr = "scorecard param"
if not isinstance(self.method, str):
raise ValueError(f"{descr}method {self.method} not supported, should be str type")
else:
user_input = self.method.lower()
if user_input == "credit":
self.method = consts.CREDIT
else:
raise ValueError(f"{descr} method {user_input} not supported")
if type(self.offset).__name__ not in ["int", "long", "float"]:
raise ValueError(f"{descr} offset must be numeric,"
f"received {type(self.offset)} instead.")
if type(self.factor).__name__ not in ["int", "long", "float"]:
raise ValueError(f"{descr} factor must be numeric,"
f"received {type(self.factor)} instead.")
if type(self.factor_base).__name__ not in ["int", "long", "float"]:
raise ValueError(f"{descr} factor_base must be numeric,"
f"received {type(self.factor_base)} instead.")
if type(self.upper_limit_ratio).__name__ not in ["int", "long", "float"]:
raise ValueError(f"{descr} upper_limit_ratio must be numeric,"
f"received {type(self.upper_limit_ratio)} instead.")
if type(self.lower_limit_value).__name__ not in ["int", "long", "float"]:
raise ValueError(f"{descr} lower_limit_value must be numeric,"
f"received {type(self.lower_limit_value)} instead.")
BaseParam.check_boolean(self.need_run, descr=descr + "need_run ")
LOGGER.debug("Finish Scorecard parameter check!")
return True
secure_add_example_param
¶
SecureAddExampleParam (BaseParam)
¶Source code in federatedml/param/secure_add_example_param.py
class SecureAddExampleParam(BaseParam):
def __init__(self, seed=None, partition=1, data_num=1000):
self.seed = seed
self.partition = partition
self.data_num = data_num
def check(self):
if self.seed is not None and type(self.seed).__name__ != "int":
raise ValueError("random seed should be None or integers")
if type(self.partition).__name__ != "int" or self.partition < 1:
raise ValueError("partition should be an integer large than 0")
if type(self.data_num).__name__ != "int" or self.data_num < 1:
raise ValueError("data_num should be an integer large than 0")
__init__(self, seed=None, partition=1, data_num=1000)
special
¶Source code in federatedml/param/secure_add_example_param.py
def __init__(self, seed=None, partition=1, data_num=1000):
self.seed = seed
self.partition = partition
self.data_num = data_num
check(self)
¶Source code in federatedml/param/secure_add_example_param.py
def check(self):
if self.seed is not None and type(self.seed).__name__ != "int":
raise ValueError("random seed should be None or integers")
if type(self.partition).__name__ != "int" or self.partition < 1:
raise ValueError("partition should be an integer large than 0")
if type(self.data_num).__name__ != "int" or self.data_num < 1:
raise ValueError("data_num should be an integer large than 0")
sir_param
¶
Classes¶
SecureInformationRetrievalParam (BaseParam)
¶Parameters¶
security_level : float, default 0.5 Security level; must be set to a value in [0, 1]. A security_level of 0.0 means raw data retrieval.
oblivious_transfer_protocol : {"OT_Hauck"} OT type; only OT_Hauck is supported.
commutative_encryption : {"CommutativeEncryptionPohligHellman"} The commutative encryption scheme used.
non_committing_encryption : {"aes"} The non-committing encryption scheme used.
dh_params Parameters for Pohlig-Hellman encryption.
key_size : int, value >= 1024 The key length of the commutative cipher. This parameter will be deprecated in the future; specify key_length in PHParam instead.
raw_retrieval : bool Perform raw retrieval if True.
target_cols : str or list of str, default None Target columns to retrieve; any values not retrieved are marked as "unretrieved". If target_cols is None, the label is retrieved, matching the behavior of previous versions.
Source code in federatedml/param/sir_param.py
class SecureInformationRetrievalParam(BaseParam):
"""
Parameters
----------
security_level: float, default 0.5
security level, should set value in [0, 1]
if security_level equals 0.0 means raw data retrieval
oblivious_transfer_protocol: {"OT_Hauck"}
OT type, only supports OT_Hauck
commutative_encryption : {"CommutativeEncryptionPohligHellman"}
the commutative encryption scheme used
non_committing_encryption : {"aes"}
the non-committing encryption scheme used
dh_params
params for Pohlig-Hellman Encryption
key_size: int, value >= 1024
the key length of the commutative cipher;
note that this param will be deprecated in future, please specify key_length in PHParam instead.
raw_retrieval: bool
perform raw retrieval if raw_retrieval
target_cols: str or list of str
target cols to retrieve;
any values not retrieved will be marked as "unretrieved",
if target_cols is None, label will be retrieved, same behavior as in previous version
default None
"""
def __init__(self, security_level=0.5,
oblivious_transfer_protocol=consts.OT_HAUCK,
commutative_encryption=consts.CE_PH,
non_committing_encryption=consts.AES,
key_size=consts.DEFAULT_KEY_LENGTH,
dh_params=DHParam(),
raw_retrieval=False,
target_cols=None):
super(SecureInformationRetrievalParam, self).__init__()
self.security_level = security_level
self.oblivious_transfer_protocol = oblivious_transfer_protocol
self.commutative_encryption = commutative_encryption
self.non_committing_encryption = non_committing_encryption
self.dh_params = dh_params
self.key_size = key_size
self.raw_retrieval = raw_retrieval
self.target_cols = [] if target_cols is None else target_cols
def check(self):
descr = "secure information retrieval param's "
self.check_decimal_float(self.security_level, descr + "security_level")
self.oblivious_transfer_protocol = self.check_and_change_lower(self.oblivious_transfer_protocol,
[consts.OT_HAUCK.lower()],
descr + "oblivious_transfer_protocol")
self.commutative_encryption = self.check_and_change_lower(self.commutative_encryption,
[consts.CE_PH.lower()],
descr + "commutative_encryption")
self.non_committing_encryption = self.check_and_change_lower(self.non_committing_encryption,
[consts.AES.lower()],
descr + "non_committing_encryption")
if self._warn_to_deprecate_param("key_size", descr, "dh_param's key_length"):
self.dh_params.key_length = self.key_size
self.dh_params.check()
if self._warn_to_deprecate_param("raw_retrieval", descr, "dh_param's security_level = 0"):
self.check_boolean(self.raw_retrieval, descr)
if not isinstance(self.target_cols, list):
self.target_cols = [self.target_cols]
for col in self.target_cols:
self.check_string(col, descr + "target_cols")
if len(self.target_cols) == 0:
LOGGER.warning(f"Both 'target_cols' and 'target_indexes' are empty. Label will be retrieved.")
__init__(self, security_level=0.5, oblivious_transfer_protocol='OT_Hauck', commutative_encryption='CommutativeEncryptionPohligHellman', non_committing_encryption='aes', key_size=1024, dh_params=DHParam(), raw_retrieval=False, target_cols=None)
special
¶Source code in federatedml/param/sir_param.py
def __init__(self, security_level=0.5,
oblivious_transfer_protocol=consts.OT_HAUCK,
commutative_encryption=consts.CE_PH,
non_committing_encryption=consts.AES,
key_size=consts.DEFAULT_KEY_LENGTH,
dh_params=DHParam(),
raw_retrieval=False,
target_cols=None):
super(SecureInformationRetrievalParam, self).__init__()
self.security_level = security_level
self.oblivious_transfer_protocol = oblivious_transfer_protocol
self.commutative_encryption = commutative_encryption
self.non_committing_encryption = non_committing_encryption
self.dh_params = dh_params
self.key_size = key_size
self.raw_retrieval = raw_retrieval
self.target_cols = [] if target_cols is None else target_cols
check(self)
¶Source code in federatedml/param/sir_param.py
def check(self):
descr = "secure information retrieval param's "
self.check_decimal_float(self.security_level, descr + "security_level")
self.oblivious_transfer_protocol = self.check_and_change_lower(self.oblivious_transfer_protocol,
[consts.OT_HAUCK.lower()],
descr + "oblivious_transfer_protocol")
self.commutative_encryption = self.check_and_change_lower(self.commutative_encryption,
[consts.CE_PH.lower()],
descr + "commutative_encryption")
self.non_committing_encryption = self.check_and_change_lower(self.non_committing_encryption,
[consts.AES.lower()],
descr + "non_committing_encryption")
if self._warn_to_deprecate_param("key_size", descr, "dh_param's key_length"):
self.dh_params.key_length = self.key_size
self.dh_params.check()
if self._warn_to_deprecate_param("raw_retrieval", descr, "dh_param's security_level = 0"):
self.check_boolean(self.raw_retrieval, descr)
if not isinstance(self.target_cols, list):
self.target_cols = [self.target_cols]
for col in self.target_cols:
self.check_string(col, descr + "target_cols")
if len(self.target_cols) == 0:
LOGGER.warning(f"Both 'target_cols' and 'target_indexes' are empty. Label will be retrieved.")
sqn_param
¶
Classes¶
StochasticQuasiNewtonParam (BaseParam)
¶Parameters used for stochastic quasi-newton method.
Parameters¶
update_interval_L : int, default: 3 Number of iterations between updates of the Hessian matrix.
memory_M : int, default: 5 Stack size of the curvature information, i.e. y_k and s_k in the paper.
sample_size : int, default: 5000 Number of data samples used to update the Hessian matrix.
random_seed : int, default: None Random seed; if not None, it must be a positive integer.
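How update_interval_L and memory_M interact can be pictured with a bounded stack of curvature pairs. The sketch below is only a schedule illustration: the function name is hypothetical, the pairs are string placeholders rather than real vectors, and the actual Hessian update in FATE is not reproduced.

```python
from collections import deque


def curvature_schedule(n_iter, update_interval_L=3, memory_M=5):
    """Return the (s_k, y_k) placeholders kept after n_iter iterations."""
    memory = deque(maxlen=memory_M)  # the oldest pair is dropped automatically
    for k in range(1, n_iter + 1):
        if k % update_interval_L == 0:
            # Placeholder for the curvature pair computed at iteration k.
            memory.append((f"s_{k}", f"y_{k}"))
    return list(memory)


short_run = curvature_schedule(9)
long_run = curvature_schedule(30)
```

After 30 iterations with the defaults, ten updates have happened but only the five most recent pairs are retained.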
Source code in federatedml/param/sqn_param.py
class StochasticQuasiNewtonParam(BaseParam):
"""
Parameters used for stochastic quasi-newton method.
Parameters
----------
update_interval_L : int, default: 3
Set how many iteration to update hess matrix
memory_M : int, default: 5
Stack size of curvature information, i.e. y_k and s_k in the paper.
sample_size : int, default: 5000
Sample size of data that used to update Hess matrix
"""
def __init__(self, update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None):
super().__init__()
self.update_interval_L = update_interval_L
self.memory_M = memory_M
self.sample_size = sample_size
self.random_seed = random_seed
def check(self):
descr = "hetero sqn param's"
self.check_positive_integer(self.update_interval_L, descr)
self.check_positive_integer(self.memory_M, descr)
self.check_positive_integer(self.sample_size, descr)
if self.random_seed is not None:
self.check_positive_integer(self.random_seed, descr)
return True
__init__(self, update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None)
special
¶Source code in federatedml/param/sqn_param.py
def __init__(self, update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None):
super().__init__()
self.update_interval_L = update_interval_L
self.memory_M = memory_M
self.sample_size = sample_size
self.random_seed = random_seed
check(self)
¶Source code in federatedml/param/sqn_param.py
def check(self):
descr = "hetero sqn param's"
self.check_positive_integer(self.update_interval_L, descr)
self.check_positive_integer(self.memory_M, descr)
self.check_positive_integer(self.sample_size, descr)
if self.random_seed is not None:
self.check_positive_integer(self.random_seed, descr)
return True
statistics_param
¶
Classes¶
StatisticsParam (BaseParam)
¶Define statistics params
Parameters¶
statistics : list, string, default "summary" Specify the statistic types to be computed. "summary" represents the list: [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION, consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS]
column_names : list of string, default [] Specify the columns to be used for statistic computation by column name in the header.
column_indexes : list of int, default -1 Specify the columns to be used for statistic computation by column order in the header; -1 means statistics are computed over all columns.
bias : bool, default: True If False, the calculations of skewness and kurtosis are corrected for statistical bias.
need_run : bool, default True Indicate whether to run this module.
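One detail of the `check` method in the class source is easy to miss: a user-supplied statistics value extends the basic list rather than replacing it. The sketch below mirrors that logic in plain Python. The string values in `BASIC_STAT` are placeholders chosen for illustration; the real values live in `consts` and are not shown in this section.

```python
import copy

# Placeholder names (assumption); the real strings are defined in consts.
BASIC_STAT = ["sum", "mean", "stddev", "median", "min", "max",
              "missing_ratio", "missing_count", "skewness", "kurtosis",
              "coefficient_of_variance"]


def resolve_statistics(statistics):
    """Mirror how check() merges user input into the basic statistics."""
    resolved = copy.copy(BASIC_STAT)
    if not isinstance(statistics, list):
        statistics = [] if statistics == "summary" else [statistics]
    for name in statistics:
        if name not in resolved:
            resolved.append(name)
    return resolved


summary = resolve_statistics("summary")
with_quantile = resolve_statistics(["mean", "95%"])
```

So requesting `["mean", "95%"]` still yields every basic statistic, with `"95%"` appended at the end.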
Source code in federatedml/param/statistics_param.py
class StatisticsParam(BaseParam):
"""
Define statistics params
Parameters
----------
statistics: list, string, default "summary"
Specify the statistic types to be computed.
"summary" represents list: [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION,
consts.MEDIAN, consts.MIN, consts.MAX,
consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS]
column_names: list of string, default []
Specify columns to be used for statistic computation by column names in header
column_indexes: list of int, default -1
Specify columns to be used for statistic computation by column order in header
-1 indicates to compute statistics over all columns
bias: bool, default: True
If False, the calculations of skewness and kurtosis are corrected for statistical bias.
need_run: bool, default True
Indicate whether to run this modules
"""
LEGAL_STAT = [consts.COUNT, consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION,
consts.MEDIAN, consts.MIN, consts.MAX, consts.VARIANCE,
consts.COEFFICIENT_OF_VARIATION, consts.MISSING_COUNT,
consts.MISSING_RATIO,
consts.SKEWNESS, consts.KURTOSIS]
BASIC_STAT = [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION,
consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_RATIO,
consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS,
consts.COEFFICIENT_OF_VARIATION]
LEGAL_QUANTILE = re.compile("^(100)|([1-9]?[0-9])%$")
def __init__(self, statistics="summary", column_names=None,
column_indexes=-1, need_run=True, abnormal_list=None,
quantile_error=consts.DEFAULT_RELATIVE_ERROR, bias=True):
super().__init__()
self.statistics = statistics
self.column_names = column_names
self.column_indexes = column_indexes
self.abnormal_list = abnormal_list
self.need_run = need_run
self.quantile_error = quantile_error
self.bias = bias
if column_names is None:
self.column_names = []
if column_indexes is None:
self.column_indexes = []
if abnormal_list is None:
self.abnormal_list = []
# @staticmethod
# def extend_statistics(statistic_name):
# basic_metrics = [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION,
# consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_RATIO,
# consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS,
# consts.COEFFICIENT_OF_VARIATION]
# if statistic_name == "summary":
# return basic_metrics
#
# if statistic_name == "describe":
# return [consts.COUNT, consts.MEAN, consts.STANDARD_DEVIATION,
# consts.MIN, consts.MAX]
@staticmethod
def find_stat_name_match(stat_name):
if stat_name in StatisticsParam.LEGAL_STAT or StatisticsParam.LEGAL_QUANTILE.match(stat_name):
return True
return False
# match_result = [legal_name == stat_name for legal_name in StatisticsParam.LEGAL_STAT]
# match_result.append(0 if LEGAL_QUANTILE.match(stat_name) is None else True)
# match_found = sum(match_result) > 0
# return match_found
def check(self):
model_param_descr = "Statistics's param statistics"
BaseParam.check_boolean(self.need_run, model_param_descr)
statistics = copy.copy(self.BASIC_STAT)
if not isinstance(self.statistics, list):
if self.statistics in [consts.SUMMARY]:
self.statistics = statistics
else:
if self.statistics not in statistics:
statistics.append(self.statistics)
self.statistics = statistics
else:
for s in self.statistics:
if s not in statistics:
statistics.append(s)
self.statistics = statistics
for stat_name in self.statistics:
match_found = StatisticsParam.find_stat_name_match(stat_name)
if not match_found:
raise ValueError(f"Illegal statistics name provided: {stat_name}.")
model_param_descr = "Statistics's param column_names"
if not isinstance(self.column_names, list):
raise ValueError(f"column_names should be list of string.")
for col_name in self.column_names:
BaseParam.check_string(col_name, model_param_descr)
model_param_descr = "Statistics's param column_indexes"
if not isinstance(self.column_indexes, list) and self.column_indexes != -1:
raise ValueError(f"column_indexes should be list of int or -1.")
if self.column_indexes != -1:
for col_index in self.column_indexes:
if not isinstance(col_index, int):
raise ValueError(f"{model_param_descr} should be int or list of int")
if col_index < -consts.FLOAT_ZERO:
raise ValueError(f"{model_param_descr} should be non-negative int value(s)")
if not isinstance(self.abnormal_list, list):
raise ValueError(f"abnormal_list should be list of int or string.")
self.check_decimal_float(self.quantile_error, "Statistics's param quantile_error ")
self.check_boolean(self.bias, "Statistics's param bias ")
return True
BASIC_STAT
¶LEGAL_QUANTILE
¶LEGAL_STAT
¶__init__(self, statistics='summary', column_names=None, column_indexes=-1, need_run=True, abnormal_list=None, quantile_error=0.0001, bias=True)
special
¶Source code in federatedml/param/statistics_param.py
def __init__(self, statistics="summary", column_names=None,
column_indexes=-1, need_run=True, abnormal_list=None,
quantile_error=consts.DEFAULT_RELATIVE_ERROR, bias=True):
super().__init__()
self.statistics = statistics
self.column_names = column_names
self.column_indexes = column_indexes
self.abnormal_list = abnormal_list
self.need_run = need_run
self.quantile_error = quantile_error
self.bias = bias
if column_names is None:
self.column_names = []
if column_indexes is None:
self.column_indexes = []
if abnormal_list is None:
self.abnormal_list = []
find_stat_name_match(stat_name)
staticmethod
¶Source code in federatedml/param/statistics_param.py
@staticmethod
def find_stat_name_match(stat_name):
if stat_name in StatisticsParam.LEGAL_STAT or StatisticsParam.LEGAL_QUANTILE.match(stat_name):
return True
return False
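A note on `LEGAL_QUANTILE`: because `|` has the lowest precedence in a regular expression, `^` applies only to the `(100)` branch and `%$` only to the second branch. Combined with `match`, this means strings such as "100xyz" are accepted. The sketch below demonstrates this and compares it with a stricter pattern; `STRICT_QUANTILE` is a hypothetical alternative for illustration, not something defined in FATE.

```python
import re

# Pattern exactly as it appears in the class source.
LEGAL_QUANTILE = re.compile("^(100)|([1-9]?[0-9])%$")
# Hypothetical stricter variant: both anchors wrap the whole alternation.
STRICT_QUANTILE = re.compile(r"^(100|[1-9]?[0-9])%$")


def accepted(pattern, name):
    """Return True if pattern.match(name) succeeds."""
    return pattern.match(name) is not None


loose_ok = accepted(LEGAL_QUANTILE, "95%")
loose_surprise = accepted(LEGAL_QUANTILE, "100xyz")
strict_rejects = accepted(STRICT_QUANTILE, "100xyz")
strict_ok = accepted(STRICT_QUANTILE, "100%")
```

Whether this matters in practice depends on what statistic names reach `find_stat_name_match`; well-formed names such as "95%" behave the same under both patterns.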
check(self)
¶Source code in federatedml/param/statistics_param.py
def check(self):
model_param_descr = "Statistics's param statistics"
BaseParam.check_boolean(self.need_run, model_param_descr)
statistics = copy.copy(self.BASIC_STAT)
if not isinstance(self.statistics, list):
if self.statistics in [consts.SUMMARY]:
self.statistics = statistics
else:
if self.statistics not in statistics:
statistics.append(self.statistics)
self.statistics = statistics
else:
for s in self.statistics:
if s not in statistics:
statistics.append(s)
self.statistics = statistics
for stat_name in self.statistics:
match_found = StatisticsParam.find_stat_name_match(stat_name)
if not match_found:
raise ValueError(f"Illegal statistics name provided: {stat_name}.")
model_param_descr = "Statistics's param column_names"
if not isinstance(self.column_names, list):
raise ValueError(f"column_names should be list of string.")
for col_name in self.column_names:
BaseParam.check_string(col_name, model_param_descr)
model_param_descr = "Statistics's param column_indexes"
if not isinstance(self.column_indexes, list) and self.column_indexes != -1:
raise ValueError(f"column_indexes should be list of int or -1.")
if self.column_indexes != -1:
for col_index in self.column_indexes:
if not isinstance(col_index, int):
raise ValueError(f"{model_param_descr} should be int or list of int")
if col_index < -consts.FLOAT_ZERO:
raise ValueError(f"{model_param_descr} should be non-negative int value(s)")
if not isinstance(self.abnormal_list, list):
raise ValueError(f"abnormal_list should be list of int or string.")
self.check_decimal_float(self.quantile_error, "Statistics's param quantile_error ")
self.check_boolean(self.bias, "Statistics's param bias ")
return True
stepwise_param¶
Classes¶
StepwiseParam (BaseParam)¶
Define stepwise params.
Parameters¶
score_name: {"AIC", "BIC"}, default: 'AIC'
Specify which model selection criterion to use.
mode: {"Hetero", "Homo"}, default: 'Hetero'
Indicate the mode of the current task.
role: {"Guest", "Host", "Arbiter"}, default: 'Guest'
Indicate the role of the current party.
direction: {"both", "forward", "backward"}, default: 'both'
Indicate which direction stepwise selection takes. 'forward' means forward selection; 'backward' means backward elimination; 'both' means candidate models from both directions are examined at each step.
max_step: int, default: 10
Specify the total number of steps to run before a forced stop.
nvmin: int, default: 2
Specify the minimum subset size of the final model; cannot be lower than 2. When nvmin > 2, the final model may be smaller than nvmin because of the max_step limit.
nvmax: int, default: None
Specify the maximum subset size of the final model; must satisfy 2 <= nvmin <= nvmax. The final model may be larger than nvmax because of the max_step limit.
need_stepwise: bool, default: False
Indicate whether this module needs to be run.
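To make score_name and direction concrete, here is a minimal sketch using the textbook definitions of AIC and BIC and a plain enumeration of one-step candidate feature subsets. It is an illustration under stated assumptions, not FATE's stepwise implementation: the function names aic, bic and candidate_subsets are hypothetical, and the nvmin/nvmax/max_step limits are not applied here.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


def candidate_subsets(current, all_features, direction="both"):
    """List the feature subsets reachable in one step from `current`.

    'forward' adds one unused feature, 'backward' removes one used feature,
    'both' returns the candidates of both directions.
    """
    direction = direction.lower()
    if direction not in ("forward", "backward", "both"):
        raise ValueError(f"unsupported direction: {direction}")
    current = frozenset(current)
    candidates = []
    if direction in ("forward", "both"):
        candidates += [current | {f} for f in sorted(all_features) if f not in current]
    if direction in ("backward", "both"):
        candidates += [current - {f} for f in sorted(current)]
    return candidates


if __name__ == "__main__":
    for subset in candidate_subsets({"x0", "x1"}, ["x0", "x1", "x2"], "both"):
        print(sorted(subset))
```

In a stepwise loop, each candidate would be fitted and scored with the chosen criterion, and the best-scoring candidate would become the current model for the next step.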
Source code in federatedml/param/stepwise_param.py
class StepwiseParam(BaseParam):
    """
    Define stepwise params

    Parameters
    ----------
    score_name: {"AIC", "BIC"}, default: 'AIC'
        Specify which model selection criterion to be used
    mode: {"Hetero", "Homo"}, default: 'Hetero'
        Indicate what mode is current task
    role: {"Guest", "Host", "Arbiter"}, default: 'Guest'
        Indicate what role is current party
    direction: {"both", "forward", "backward"}, default: 'both'
        Indicate which direction to go for stepwise.
        'forward' means forward selection; 'backward' means elimination; 'both' means possible models of both directions are examined at each step.
    max_step: int, default: '10'
        Specify total number of steps to run before forced stop.
    nvmin: int, default: '2'
        Specify the min subset size of final model, cannot be lower than 2. When nvmin > 2, the final model size may be smaller than nvmin due to max_step limit.
    nvmax: int, default: None
        Specify the max subset size of final model, 2 <= nvmin <= nvmax. The final model size may be larger than nvmax due to max_step limit.
    need_stepwise: bool, default False
        Indicate if this module needed to be run
    """

    def __init__(self, score_name="AIC", mode=consts.HETERO, role=consts.GUEST, direction="both",
                 max_step=10, nvmin=2, nvmax=None, need_stepwise=False):
        super(StepwiseParam, self).__init__()
        self.score_name = score_name
        self.mode = mode
        self.role = role
        self.direction = direction
        self.max_step = max_step
        self.nvmin = nvmin
        self.nvmax = nvmax
        self.need_stepwise = need_stepwise

    def check(self):
        model_param_descr = "stepwise param's"
        self.score_name = self.check_and_change_lower(self.score_name, ["aic", "bic"], model_param_descr)
        self.check_valid_value(self.mode, model_param_descr, valid_values=[consts.HOMO, consts.HETERO])
        self.check_valid_value(self.role, model_param_descr, valid_values=[consts.HOST, consts.GUEST, consts.ARBITER])
        self.direction = self.check_and_change_lower(self.direction, ["forward", "backward", "both"], model_param_descr)
        self.check_positive_integer(self.max_step, model_param_descr)
        self.check_positive_integer(self.nvmin, model_param_descr)
        if self.nvmin < 2:
            raise ValueError(model_param_descr + " nvmin must be no less than 2.")
        if self.nvmax is not None:
            self.check_positive_integer(self.nvmax, model_param_descr)
            if self.nvmin > self.nvmax:
                raise ValueError(model_param_descr + " nvmax must be greater than nvmin.")
        self.check_boolean(self.need_stepwise, model_param_descr)
__init__(self, score_name='AIC', mode='hetero', role='guest', direction='both', max_step=10, nvmin=2, nvmax=None, need_stepwise=False) special¶
Source code in federatedml/param/stepwise_param.py
def __init__(self, score_name="AIC", mode=consts.HETERO, role=consts.GUEST, direction="both",
             max_step=10, nvmin=2, nvmax=None, need_stepwise=False):
    super(StepwiseParam, self).__init__()
    self.score_name = score_name
    self.mode = mode
    self.role = role
    self.direction = direction
    self.max_step = max_step
    self.nvmin = nvmin
    self.nvmax = nvmax
    self.need_stepwise = need_stepwise
check(self)¶
Source code in federatedml/param/stepwise_param.py
def check(self):
    model_param_descr = "stepwise param's"
    self.score_name = self.check_and_change_lower(self.score_name, ["aic", "bic"], model_param_descr)
    self.check_valid_value(self.mode, model_param_descr, valid_values=[consts.HOMO, consts.HETERO])
    self.check_valid_value(self.role, model_param_descr, valid_values=[consts.HOST, consts.GUEST, consts.ARBITER])
    self.direction = self.check_and_change_lower(self.direction, ["forward", "backward", "both"], model_param_descr)
    self.check_positive_integer(self.max_step, model_param_descr)
    self.check_positive_integer(self.nvmin, model_param_descr)
    if self.nvmin < 2:
        raise ValueError(model_param_descr + " nvmin must be no less than 2.")
    if self.nvmax is not None:
        self.check_positive_integer(self.nvmax, model_param_descr)
        if self.nvmin > self.nvmax:
            raise ValueError(model_param_descr + " nvmax must be greater than nvmin.")
    self.check_boolean(self.need_stepwise, model_param_descr)
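The size and name constraints enforced by check can be exercised without a FATE installation. The sketch below is a standalone approximation: validate_stepwise_settings and _is_positive_int are hypothetical names, and the BaseParam helpers used by the real class are replaced with plain Python checks, so edge-case behavior may differ from the library.

```python
def _is_positive_int(value):
    # Reject bools explicitly: in Python, True and False are instances of int.
    return type(value) is int and value > 0


def validate_stepwise_settings(score_name="AIC", direction="both",
                               max_step=10, nvmin=2, nvmax=None):
    """Normalize and validate stepwise settings; return them as a dict."""
    score_name = score_name.lower()
    if score_name not in ("aic", "bic"):
        raise ValueError(f"unsupported score_name: {score_name}")
    direction = direction.lower()
    if direction not in ("forward", "backward", "both"):
        raise ValueError(f"unsupported direction: {direction}")
    for name, value in (("max_step", max_step), ("nvmin", nvmin)):
        if not _is_positive_int(value):
            raise ValueError(f"{name} must be a positive integer")
    if nvmin < 2:
        raise ValueError("nvmin must be no less than 2")
    if nvmax is not None:
        if not _is_positive_int(nvmax):
            raise ValueError("nvmax must be a positive integer")
        if nvmin > nvmax:
            # nvmin == nvmax is allowed; only nvmin > nvmax is rejected
            raise ValueError("nvmax must be no less than nvmin")
    return {"score_name": score_name, "direction": direction,
            "max_step": max_step, "nvmin": nvmin, "nvmax": nvmax}


if __name__ == "__main__":
    print(validate_stepwise_settings("BIC", "Forward", max_step=5, nvmin=2, nvmax=4))
```

Note that the listing above only raises when nvmin > nvmax, so nvmin equal to nvmax passes validation.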
test special¶
Modules¶
param_json_test¶
home_dir¶
TestParamExtract (TestCase)¶
Source code in federatedml/param/test/param_json_test.py
class TestParamExtract(unittest.TestCase):
    def setUp(self):
        self.param = FeatureBinningParam()
        json_config_file = home_dir + '/param_feature_binning.json'
        self.config_path = json_config_file
        with open(json_config_file, 'r', encoding='utf-8') as load_f:
            role_config = json.load(load_f)
        self.config_json = role_config

    # def tearDown(self):
    #     os.system("rm -r " + self.config_path)

    def test_directly_extract(self):
        param_obj = FeatureBinningParam()
        extractor = ParamExtract()
        param_obj = extractor.parse_param_from_config(param_obj, self.config_json)
        self.assertTrue(param_obj.method == "quantile")
        self.assertTrue(param_obj.transform_param.transform_type == 'bin_num')
setUp(self)¶
Hook method for setting up the test fixture before exercising it.
Source code in federatedml/param/test/param_json_test.py
def setUp(self):
    self.param = FeatureBinningParam()
    json_config_file = home_dir + '/param_feature_binning.json'
    self.config_path = json_config_file
    with open(json_config_file, 'r', encoding='utf-8') as load_f:
        role_config = json.load(load_f)
    self.config_json = role_config
test_directly_extract(self)¶
Source code in federatedml/param/test/param_json_test.py
def test_directly_extract(self):
    param_obj = FeatureBinningParam()
    extractor = ParamExtract()
    param_obj = extractor.parse_param_from_config(param_obj, self.config_json)
    self.assertTrue(param_obj.method == "quantile")
    self.assertTrue(param_obj.transform_param.transform_type == 'bin_num')
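The test above loads a JSON config and checks that its values end up on a parameter object, including a nested transform_param. How ParamExtract does this is not shown in this section; the following is only a rough sketch of the general idea, with hypothetical stand-in classes (BinningParamSketch, TransformParamSketch) and a hypothetical helper apply_config that copies matching keys onto attributes and recurses into nested parameter objects.

```python
import json


class TransformParamSketch:
    """Hypothetical stand-in for a nested parameter object."""

    def __init__(self, transform_type="bin_num"):
        self.transform_type = transform_type


class BinningParamSketch:
    """Hypothetical stand-in for a top-level parameter object."""

    def __init__(self, method="quantile", bin_num=10):
        self.method = method
        self.bin_num = bin_num
        self.transform_param = TransformParamSketch()


def apply_config(param_obj, config):
    """Copy values from a nested dict onto matching attributes of param_obj."""
    for key, value in config.items():
        if not hasattr(param_obj, key):
            continue  # keys without a matching attribute are ignored in this sketch
        current = getattr(param_obj, key)
        if isinstance(value, dict) and hasattr(current, "__dict__"):
            apply_config(current, value)  # recurse into nested parameter objects
        else:
            setattr(param_obj, key, value)
    return param_obj


if __name__ == "__main__":
    raw = '{"method": "quantile", "bin_num": 5, "transform_param": {"transform_type": "woe"}}'
    param = apply_config(BinningParamSketch(), json.loads(raw))
    print(param.method, param.bin_num, param.transform_param.transform_type)
```

Attributes that the config does not mention keep their defaults, which is consistent with the test asserting default-looking values after extraction.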
union_param¶
Classes¶
UnionParam (BaseParam)¶
Define the union method for combining multiple dTables and keeping entries with the same id.
Parameters¶
need_run: bool, default: True
Indicate whether this module needs to be run.
allow_missing: bool, default: False
Whether to allow a mismatch between feature length and header length in the result. Note that empty tables are always skipped regardless of this setting.
keep_duplicate: bool, default: False
Whether to keep entries with duplicated keys. If set to True, a new id is generated for each duplicated entry in the format {id}_{table_name}.
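The effect of keep_duplicate can be illustrated with plain dictionaries. This is a minimal sketch under stated assumptions, not the Union component itself: union_tables is a hypothetical function, tables are modeled as dicts keyed by id, and the choice to keep the first occurrence when keep_duplicate is False is an assumption that this section does not confirm.

```python
def union_tables(tables, keep_duplicate=False):
    """Merge several {id: value} tables into one.

    tables: dict mapping table_name -> {id: value}, in merge order.
    """
    merged = {}
    for table_name, rows in tables.items():
        # An empty table contributes nothing, so it is skipped implicitly.
        for key, value in rows.items():
            if key not in merged:
                merged[key] = value
            elif keep_duplicate:
                # Renamed using the documented format {id}_{table_name}
                merged[f"{key}_{table_name}"] = value
            # else: assumption, the later duplicate is dropped
    return merged


if __name__ == "__main__":
    tables = {
        "a": {"id1": [1], "id2": [2]},
        "b": {"id2": [20], "id3": [3]},
        "c": {},
    }
    print(union_tables(tables))
    print(union_tables(tables, keep_duplicate=True))
```

With keep_duplicate=True, the second id2 entry survives under the key id2_b instead of being discarded.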
Source code in federatedml/param/union_param.py
class UnionParam(BaseParam):
    """
    Define the union method for combining multiple dTables and keep entries with the same id

    Parameters
    ----------
    need_run: bool, default True
        Indicate if this module needed to be run
    allow_missing: bool, default False
        Whether allow mismatch between feature length and header length in the result. Note that empty tables will always be skipped regardless of this param setting.
    keep_duplicate: bool, default False
        Whether to keep entries with duplicated keys. If set to True, a new id will be generated for duplicated entry in the format {id}_{table_name}.
    """

    def __init__(self, need_run=True, allow_missing=False, keep_duplicate=False):
        super().__init__()
        self.need_run = need_run
        self.allow_missing = allow_missing
        self.keep_duplicate = keep_duplicate

    def check(self):
        descr = "union param's "
        if type(self.need_run).__name__ != "bool":
            raise ValueError(
                descr + "need_run {} not supported, should be bool".format(
                    self.need_run))
        if type(self.allow_missing).__name__ != "bool":
            raise ValueError(
                descr + "allow_missing {} not supported, should be bool".format(
                    self.allow_missing))
        if type(self.keep_duplicate).__name__ != "bool":
            raise ValueError(
                descr + "keep_duplicate {} not supported, should be bool".format(
                    self.keep_duplicate))
        LOGGER.info("Finish union parameter check!")
        return True
__init__(self, need_run=True, allow_missing=False, keep_duplicate=False) special¶
Source code in federatedml/param/union_param.py
def __init__(self, need_run=True, allow_missing=False, keep_duplicate=False):
    super().__init__()
    self.need_run = need_run
    self.allow_missing = allow_missing
    self.keep_duplicate = keep_duplicate
check(self)¶
Source code in federatedml/param/union_param.py
def check(self):
    descr = "union param's "
    if type(self.need_run).__name__ != "bool":
        raise ValueError(
            descr + "need_run {} not supported, should be bool".format(
                self.need_run))
    if type(self.allow_missing).__name__ != "bool":
        raise ValueError(
            descr + "allow_missing {} not supported, should be bool".format(
                self.allow_missing))
    if type(self.keep_duplicate).__name__ != "bool":
        raise ValueError(
            descr + "keep_duplicate {} not supported, should be bool".format(
                self.keep_duplicate))
    LOGGER.info("Finish union parameter check!")
    return True
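The check above compares type names instead of using isinstance, which makes it strict: values that merely behave like booleans are rejected. A small standalone sketch of that comparison follows; the helper name is_strict_bool is hypothetical.

```python
def is_strict_bool(value):
    """Same comparison as in check(): only real bool objects pass."""
    return type(value).__name__ == "bool"


if __name__ == "__main__":
    for candidate in (True, False, 1, 0, "True", None):
        print(repr(candidate), is_strict_bool(candidate))
```

In this sketch 1, 0, "True" and None all fail, so a config that supplies need_run as 1 or as the string "True" would be rejected by a check written this way.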