Getting started
Docker
Docker image available at
registry.gitlab.com/sray/cmu-ta2:latest
D3M dataset
The AutonML can accept any type of D3M dataset. If there is no D3M dataset, this is how a raw dataset is converted to a D3M First install the d3m package
pip install d3m
python create_d3m_dataset.py <train_data.csv> <test_data.csv> <label> <metric> -t classification <-t ...>
Currently d3m package needs Python 3.6 only
Detailed description of dataset type(s), task type(s) and metrics provided in the Converting raw data to d3m datasets section
Starting script
Requires docker on your OS.
Update location of your dataset for target “input” to the docker run.
Run the following script
./scripts/start_container.sh
The above script has 4 mount points for the docker
input: Path of the input dataset
output: Directory where all outputs will be stored
static: Location of all static files (Use static directory of this repository)
scripts: Location of this repository’s scripts.
Search and predictions
The above script will do the following
Pull docker image and run search for best pipelines for the specified dataset using TRAIN data
JSON pipelines (with ranks) will be output in JSON format at /output/<search_dir>/pipelines_ranked/
CSV prediction files of the pipelines trained on TRAIN data and predicted on TEST data will be available at /output/<search_dir>/predictions/
Training data predictions (cross-validated mostly) are produced in the current directory as /output/<search_dir>/training_predictions/<pipeline_id>_train_predictions.csv.
Python code equivalent of executing a JSON pipeline on a dataset produced at /output/<search_dir>/executables/
This code can be run as
python <generated_code.py> <path_to_dataset> <predictions_output_file>
An example
python /output/6b92f2f7-74d2-4e86-958d-4e62bbd89c51/executables/131542c6-ea71-4403-9c2d-d899e990e7bd.json.code.py 185_baseball predictions.csv
If feature_importances and intermediate outputs are desired, call scripts/run_outputs.sh instead of scripts/run.sh from scripts/start_container.sh
Features importances and intermediate step outputs will be produced in /output/<search_dir>/pipeline_runs/