Adds validation_fraction to Keras Galaxy models to replace validation_data. validation_data can still be set via fit_params, in which case validation_fraction is ignored.
Deprecates galaxy_ml.model_validations._fit_and_score in tools. Update: _fit_and_score is still needed for KerasGBatchClassifier.
Adds verbose support to model.fit for Keras Galaxy models.
Adds drop_last_proba for stacking classifiers in stacking ensembles tools.
Expands secondary scoring of binarize target scorers to all scores.
Adds gen_compute_scores in utils and cleans params ending with _dir in clean_params.
Adds utils dump_model_to_h5 and load_model_from_h5, which support arbitrary mixes of sklearn and Keras objects and store hyperparameters.
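The precedence rule between validation_fraction and validation_data can be illustrated with a small pure-Python sketch. The helper resolve_validation_data is hypothetical and not part of Galaxy-ML; it assumes a seeded, shuffled holdout, whereas the actual Keras Galaxy models may split the data differently.

```python
import random
from typing import Any, Dict, List, Optional, Sequence, Tuple


def resolve_validation_data(
    X: Sequence[Any],
    y: Sequence[Any],
    validation_fraction: float = 0.1,
    fit_params: Optional[Dict[str, Any]] = None,
    seed: Optional[int] = None,
) -> Tuple[List[Any], List[Any], Optional[Any]]:
    """Hypothetical helper returning (X_train, y_train, validation_data).

    An explicit validation_data in fit_params wins; validation_fraction
    is then ignored and all samples are used for training.
    """
    fit_params = fit_params or {}
    if fit_params.get("validation_data") is not None:
        return list(X), list(y), fit_params["validation_data"]
    if not 0.0 <= validation_fraction < 1.0:
        raise ValueError("validation_fraction must be in [0, 1)")
    n_val = int(len(X) * validation_fraction)
    if n_val == 0:
        # Nothing to hold out: train on everything, no validation set.
        return list(X), list(y), None
    # Assumption: a seeded shuffle decides which samples are held out.
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    validation = ([X[i] for i in val_idx], [y[i] for i in val_idx])
    return X_train, y_train, validation
```

For example, with 10 samples and validation_fraction=0.2, two samples are held out unless fit_params already carries validation_data.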
Bug Fixes
Fixes BinarizeTargetClassifier pickle error when persisting a fitted object.
Version 0.8.1 / tool_main: 1.0.8.1 / keras: 0.5.0 (12-12-2019)
Version 0.8.0 / tool_main: 1.0.8.0 / keras: 0.5.0 (12-10-2019)
New Features
Adds circleci config for both api and tool tests.
Adds train_test_split tool which supports shufflesplit, stratifiedshufflesplit, groupshufflesplit and orderedtarget split.
Adds fitted_model_eval tool.
Refactors binarize target estimators. There are many improvements; one of them is that the estimator family now supports most sklearn scorers.
Adds clean_params in utils
Adds cv_results_ outputs for nested inner CV and unfitted searchCV object from searchCV tool.
Adds keras training and evaluation tool.
Adds support of decision_function for binarize target classifiers.
Adds matplotlib svg format option in ml_visualization_ex tool.
Adds 'sklearn.ensemble.HistGradientBoostingClassifier' and 'sklearn.ensemble.HistGradientBoostingRegressor'
Adds new regression scorer max_error.
Upgrades scikit-learn to v0.21.3, mlxtend to v0.17.0, imbalanced-learn to v0.5.0, keras to v2.3.1 and tensorflow to v1.15.0.
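The max_error regression scorer listed above reports the largest absolute residual, max |y_true - y_pred|. The following dependency-free sketch computes the same quantity as sklearn.metrics.max_error; it is an illustration, not the scorer wiring used by the tools.

```python
from typing import Sequence


def max_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Largest absolute difference between a target and its prediction."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    return max(abs(t - p) for t, p in zip(y_true, y_pred))
```

Because a single bad prediction dominates the score, max_error is a worst-case measure rather than an average one; scikit-learn treats it as lower-is-better when used as a scorer.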
Changes
Replaces all generators' fit with set_processing_attrs.
Raises ValueError instead of applying [0, 1] normalization when predictions from BinarizeTargetRegressor go out of range.
Refactors the iraps_classifier module. Binarize target estimators now make the same predictions as the wrapped estimator. A dedicated predict_score method is added to work with binarize scorers.
Changes precision-recall curve and ROC curve to take headers and upgrades plotly to v4.3.0 in ml_visualization_ex tool.
Changes to dynamic output of the pipeline or the final main estimator.
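The idea behind binarize target scoring can be sketched in plain Python: a continuous target is turned into 0/1 labels around a threshold, and continuous predictions are used as ranking scores for ROC AUC. The names binarize, roc_auc and binarize_auc_score, and the parameters discretize_value and less_is_positive, are assumptions for illustration; this is not Galaxy-ML's predict_score implementation.

```python
from typing import List, Sequence


def binarize(y: Sequence[float], discretize_value: float,
             less_is_positive: bool = True) -> List[int]:
    """Map a continuous target to 1 (positive) / 0 (negative) around a threshold."""
    if less_is_positive:
        return [1 if v < discretize_value else 0 for v in y]
    return [1 if v > discretize_value else 0 for v in y]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute ROC AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def binarize_auc_score(y_true: Sequence[float], y_pred: Sequence[float],
                       discretize_value: float, less_is_positive: bool = True) -> float:
    """Score continuous predictions against a binarized continuous target."""
    labels = binarize(y_true, discretize_value, less_is_positive)
    # When small values are positive, negate predictions so that a higher
    # score still means "more likely positive".
    scores = [-p if less_is_positive else p for p in y_pred]
    return roc_auc(labels, scores)
```

Keeping predict unchanged and scoring through a separate function mirrors the split described above between ordinary predictions and a dedicated scoring path.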
Bug Fixes
Fixes random_state error in _predict_generator.
Fixes stale path issue by replacing relative paths with full paths.
Adds setup.py and requirement.txt for APIs installation.
Makes Galaxy-ML APIs a library installable via PyPI and Bioconda.
Adds GenomicIntervalBatchGenerator, an online data generator that transforms genomic sequences on the fly from a reference genome and intervals. Aiming to offer the same functionality as selene, GenomicIntervalBatchGenerator is implemented by 1) reusing selene's Cython backend; 2) extending keras.utils.Sequence, which makes it capable of multiprocessing and queueing; 3) being compatible with scikit-learn APIs such as KFold and GridSearchCV. GenomicIntervalBatchGenerator is designed to be fast and memory-efficient.
Adds parameter steps_per_epoch, validation_steps to BaseKerasModel.
Adds parameter prediction_steps to KerasGBatchClassifier.
Adds class_weight-like parameter class_positive_factor to KerasGBatchGenerator for imbalanced training.
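The kind of on-the-fly transformation GenomicIntervalBatchGenerator performs can be sketched as one-hot encoding a reference subsequence for each interval. This is a simplified, hypothetical illustration: the reference genome is assumed to be an in-memory dict of chromosome name to sequence, intervals are assumed 0-based and half-open, and unknown bases such as N are assumed to map to all-zero rows. The real generator reuses selene's backend and may encode differently.

```python
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"  # column order of the one-hot encoding (assumption)


def one_hot_interval(reference: Dict[str, str], chrom: str,
                     start: int, end: int) -> List[List[int]]:
    """One-hot encode reference[chrom][start:end] (0-based, half-open)."""
    if start < 0 or start >= end or end > len(reference[chrom]):
        raise ValueError("interval out of bounds")
    rows = []
    for base in reference[chrom][start:end].upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:  # unknown bases such as 'N' stay all-zero
            row[idx] = 1
        rows.append(row)
    return rows


def batch_from_intervals(reference: Dict[str, str],
                         intervals: Sequence[Tuple[str, int, int]]) -> List[List[List[int]]]:
    """Encode a batch of (chrom, start, end) intervals, one matrix per interval."""
    return [one_hot_interval(reference, c, s, e) for c, s, e in intervals]
```

Encoding per batch rather than up front is what keeps memory usage low: only the sequences of the current batch are ever materialized.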
Changes
Refactors fast array generators and introduces a fit method.
Refactors iraps_classifier random index generator, reducing fit time by about 45%.
Adds sklearn.ensemble.VotingClassifier and VotingRegressor (to be available in sklearn v0.21).
Enhances security of try_get_attr by adding check_def argument.
Adds __all__ attribute together with try_get_attr to manage custom module and names.
Adds keras callbacks. Now supports EarlyStopping, RemoteMonitor, TerminateOnNaN and ReduceLROnPlateau, and partially supports ModelCheckpoint and CSVLogger.
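The combination of an __all__ whitelist and a check_def flag in try_get_attr can be sketched as follows. This version is hypothetical: it assumes check_def means "the object must be defined in the module itself rather than imported into it", which may not match Galaxy-ML's exact semantics.

```python
import types
from typing import Any


def try_get_attr(module: types.ModuleType, name: str, check_def: bool = True) -> Any:
    """Hypothetical whitelist lookup restricted to names listed in module.__all__."""
    allowed = getattr(module, "__all__", None)
    if allowed is None or name not in allowed:
        raise NameError(f"{name!r} is not exported by {module.__name__}")
    obj = getattr(module, name)
    if check_def and getattr(obj, "__module__", None) != module.__name__:
        # Reject objects that were merely imported into the module.
        raise NameError(f"{name!r} is not defined in {module.__name__}")
    return obj
```

Resolving names only through an explicit export list means user-supplied strings from a tool form cannot reach arbitrary attributes such as imported helpers.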
Changes
Bumps stacking_ensembles tool to version 0.2.0.
Renames KerasBatchClassifier to KerasGBatchClassifier.
Bug Fixes
Fixes duplicate naming problem in voting estimators.
Version 0.6.0 / tool_main: 1.0.6.0 / keras: 0.2.0 (05-13-2019)
New Features
Adds Nested CV to searchcv tool.
Adds BinarizeTargetClassifier.classifier_, BinarizeTargetRegressor.regressor_ and IRAPSClassifier.get_signature() in estimator_attributes tool.
Reformats the output of cross_validate.
Adds KerasBatchClassifier.
Makes KerasGClassifier and KerasGRegressor support multi-dimension array.
Changes
Changes min value of n_splits from 2 to 1.
The main tool version now changes on the second-to-last number instead of the last one.
Bug Fixes
Fixes train_test_split which didn't work with default scoring.
Version 0.5.0 / tool_main: 1.0.0.5 / keras: 0.2.0 (05-13-2019)
New Features
Extends binarize target scorers to support stacking estimators, i.e., using a binarize target estimator as the meta estimator.
Adds cv_results attributes to estimator_attributes tool.
Adds support for loading a prefitted model for prediction in keras_model_builder tool.
Adds save_weights and load_weights for keras classifier/regressor models.
Merges keras model builder
Changes
Refactors the multiple scoring input for searchcv and simplifies cv_results output.
Refactors the import system, getting rid of exec imports.
Bug Fixes
Fixes stacking estimators whitelist issue and other import issues.
Fixes bases typo error in stacking ensembles tool
Fixes multiple scoring error in train_test_split mode
Version 0.4.0 / tool_main: 1.0.0.4 (04-29-2019)
New Features
Adds StackingCVClassifier, StackingClassifier and StackingRegressor to Stacking_ensembles tool, and adds explicit building options for base estimators and the meta estimator.
Adds .gitattributes and .gitignore.
Changes
Changes extended_ensemble_ml.xml to stacking_ensembles.xml.
Moves src to subfolder Galaxy-ML
Bug Fixes
Fixes SafePickler classobj issue.
Version 0.3.0 / tool_main: 1.0.0.3 (04-23-2019)
New Features
Adds RepeatedOrderedKFold.
Adds train_test_split tool and adds train_test_split to searchcv tool.
Adds jpickle to persist sklearn objects.
Adds TDMScaler.
Makes search parameter options in the search_model_validation tool use from_dataset, the get_params output of the estimator_attributes tool.
Restructures estimator_attributes tool to be workflow friendly.
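OrderedKFold and RepeatedOrderedKFold target continuous outcomes. A minimal sketch, assuming the ordered strategy sorts samples by target value and deals them round-robin into folds so that every fold spans the whole target range, could look like this; ordered_kfold and repeated_ordered_kfold are hypothetical names and the shuffling scheme is an assumption, not Galaxy-ML's code.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def ordered_kfold(y: Sequence[float], n_splits: int = 3, shuffle: bool = False,
                  seed: Optional[int] = None) -> Iterator[Split]:
    """Yield (train, test) index lists with folds dealt in target order."""
    n = len(y)
    if not 2 <= n_splits <= n:
        raise ValueError("n_splits must be between 2 and the number of samples")
    order = sorted(range(n), key=lambda i: y[i])
    if shuffle:
        rng = random.Random(seed)
        # Shuffle only within each block of n_splits consecutive ranks so
        # fold assignment is randomized while target coverage is preserved.
        for start in range(0, n, n_splits):
            block = order[start:start + n_splits]
            rng.shuffle(block)
            order[start:start + n_splits] = block
    for k in range(n_splits):
        test = sorted(order[k::n_splits])
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        yield train, test


def repeated_ordered_kfold(y: Sequence[float], n_splits: int = 3, n_repeats: int = 2,
                           seed: Optional[int] = None) -> Iterator[Split]:
    """Repeat the shuffled ordered split with a different seed each time."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        yield from ordered_kfold(y, n_splits, shuffle=True, seed=rng.randrange(2**32))
```

For y = [5, 1, 4, 2, 3, 0] and three folds, each test fold pairs a low target with a high one, so no fold is evaluated only on extreme values.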
Changes
Separates OrderedKFold into model_validations module.
Refactors SafePickler class and pickle white list loading system for better CPU and Memory efficiency.
Separates feature_selector module out from utils.
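A whitelist-based unpickler like SafePickler usually works by overriding pickle.Unpickler.find_class, the hook the Python documentation describes for restricting globals. The sketch below shows that technique with a hypothetical one-entry whitelist; Galaxy-ML's actual white list and loading system are more extensive and are not reproduced here.

```python
import io
import pickle
from typing import FrozenSet, Tuple

# Hypothetical whitelist of (module, name) globals allowed during unpickling.
DEFAULT_WHITELIST: FrozenSet[Tuple[str, str]] = frozenset({
    ("collections", "OrderedDict"),
})


class SafeUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not explicitly whitelisted."""

    def __init__(self, file, whitelist: FrozenSet[Tuple[str, str]] = DEFAULT_WHITELIST):
        super().__init__(file)
        self.whitelist = whitelist

    def find_class(self, module: str, name: str):
        if (module, name) in self.whitelist:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data: bytes, whitelist: FrozenSet[Tuple[str, str]] = DEFAULT_WHITELIST):
    """Deserialize bytes, allowing only whitelisted globals."""
    return SafeUnpickler(io.BytesIO(data), whitelist).load()
```

Plain containers such as dicts and lists need no globals and load normally, while any class outside the whitelist is rejected before it can be instantiated. Storing the whitelist as a frozenset keeps each lookup cheap.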
Bug Fixes
Fixes SafePickler classobj issue.
Version 0.2.0 (03-24-2019)
New Features
SearchCV tool selects params from the get_params() dataset.
Adds extended_ensemble_ml tool, which wraps StackingCVRegressor for ensemble machine learning.
Extends estimator_attributes tool to output get_params().
Adds multiprocessing support in IRAPSCore.
Changes
Removes the limit of n_jobs=1 for IRAPSClassifier
Changes named estimators in pipeline_builder tool: uses make_pipeline instead of Pipeline instantiation.
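With make_pipeline, scikit-learn names each step after its lowercased class name and appends "-1", "-2", and so on when a class appears more than once. The pure-Python sketch below reproduces that naming convention without importing scikit-learn; name_estimators is a hypothetical stand-in for illustration only.

```python
from collections import Counter
from typing import Any, List, Tuple


def name_estimators(estimators: List[Any]) -> List[Tuple[str, Any]]:
    """Name steps after lowercased class names; number duplicates from 1."""
    names = [type(est).__name__.lower() for est in estimators]
    counts = Counter(names)
    seen: Counter = Counter()
    named = []
    for name, est in zip(names, estimators):
        if counts[name] > 1:
            seen[name] += 1
            name = f"{name}-{seen[name]}"
        named.append((name, est))
    return named
```

Deriving names from classes removes the need for users to supply step names, and the numeric suffixes keep names unique so that get_params keys such as "standardscaler-1__with_mean" stay unambiguous.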