These scripts are used to perform the various tasks described in the rfCon manuscript. For more detailed instructions on running each script, read the included comments in each file.

avgVarImportance.py = finds the average variable importance of each category of features.
chooseRandomFold.py = for generating 50%off data. Chooses a random 50% of the input file's negative and positive classes.
chooseRandomPercent.py = for generating 50%off data.
chooseRandomPercent_v2.0.py = same as above, but maintains imbalance ratios by allowing independent pos/neg resampling.
combineBlindSdaPreds.py = blind-test SDA prediction target data generation script for combining raw pred files.
combineSdaPreds.py = SDA prediction target data generation script for combining raw pred training files.
convertSdaScoresToFeatures.py = creates feature files for the SDA ensemble by combining the output of individual SDA models.
equalize_v2.py = balances the example classes in a target file.
featureCount.py = counts the features present.
featureRemoval_meanDecreaseAccuracy_sep12.py = performs sep12 feature selection.
featureRemoval_meanDecreaseAccuracy_sep24.py = performs sep24 feature selection.
featureRemoval_meanDecreaseAccuracy_sep6.py = performs sep6 feature selection.
findDupProt.py = finds duplicate proteins based on the id tag on example lines.
genRegression.py = converts examples to regression examples based on the formula in the rfCon manuscript.
get_only_X.py = SDA data input conversion.
labelSdaEnsemblePreds.py = labels the features for SDA ensemble predictions (adds comments to the examples).
matchBalancedExamps.py = an efficient script for matching example lines between two different files. Used to reproduce training fold files. Capable of scanning the 1TB of total feature files for over 1 million matches in under 30 minutes.
matchBalancedExamps_v0.py = same as above, but with less flexible input options.
meanDecreaseGini_featureRemovalTest_sep6.py = sep6 feature selection with mean decrease Gini.
multiClassifySVMplus.py = classifies many SVM targets at once in parallel.
multiRF_predict.py = classifies many RF targets at once.
multiSdaPredict.py = classifies many SDA targets at once in parallel.
multiTopPred.py = converts multiple output prediction files to CASP competition format for easier understanding.
pickAndLabelRf.py = processes RF output predictions.
pickle_edit.py = organizes input data folds for SDA training. Creates Python pickle files (X only) for input data conversion.
pickSinglePred.py = extracts only the highest vote (pos/neg) for each SDA prediction.
pickTopPreds_reg.py = converts output prediction files to CASP competition format for easier understanding. For use only on the final SDA ensemble models, which use regression-based examples (not binary classification).
posneg.py = determines the positive-to-negative ratio of an example or prediction file.
removeDuplicates.py = compares two example files for examples derived from matching proteins, removes those cases, and generates a third file that is free of these duplicates.
rrEval.py = evaluates a single predictor on a single target.
runMultiRrEval.py = evaluates a predictor's performance across multiple targets.
sda_pick_positive.py = processes SDA prediction values and extracts the predictions counted as positive.
svmFeaturesToRfFeatures.py = converts features to the random forest input format.
targetValFix.py = converts features from SVM to SDA format (changes target values of -1 to 0).
targetVal_svmConvert.py = converts features from SDA to SVM format (changes target values of 0 to -1).
topToArray.py = processes feature selection results to create an array of the top 100 features for easier input into the feature selection scripts.
PDB_IDs.txt = a list of all PDB files used in our datasets. The corresponding data for each one can be downloaded from the Protein Data Bank website at https://www.rcsb.org/
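
The target value conversion described for targetValFix.py and targetVal_svmConvert.py (-1 to 0 and 0 to -1) can be illustrated with a minimal sketch. This is not the code from either script. It assumes, without confirmation from this README, that each example line starts with its target value followed by a space, as in the common SVM-light style layout. The function name convert_target_values and the sample lines are hypothetical.

```python
def convert_target_values(lines, old_label, new_label):
    """Return copies of the example lines with the leading target value remapped.

    Assumption (not confirmed by the README): the target value is the first
    space-separated token on each example line.
    """
    converted = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Pass blank lines and whole-line comments through unchanged.
        if not line.strip() or line.lstrip().startswith("#"):
            converted.append(line)
            continue
        # Split off only the first token so feature values are never touched.
        label, sep, rest = line.partition(" ")
        try:
            matches = float(label) == float(old_label)
        except ValueError:
            matches = False  # first token is not numeric: leave the line alone
        converted.append((new_label if matches else label) + sep + rest)
    return converted


# Hypothetical example lines in an SVM-style layout (targets are -1 / +1).
svm_examples = ["-1 1:0.5 2:0.1 # protA", "+1 1:0.9 2:0.3 # protB"]

# SVM -> SDA direction: -1 becomes 0.
sda_examples = convert_target_values(svm_examples, "-1", "0")

# SDA -> SVM direction: 0 becomes -1, which restores the original lines.
round_trip = convert_target_values(sda_examples, "0", "-1")
```

Only the first token is compared and rewritten, so a feature whose value happens to be -1 or 0 is left alone. Check the comments in the scripts themselves for the actual input layout they expect.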