Extend this class to create a Mahout CLI driver.
Adds parsing of Spark-specific options to the option parser.
Extends the Reader trait to supply org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark as the type read, along with element and row reader functions for reading text delimited files as described by the org.apache.mahout.math.indexeddataset.Schema.
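The Schema mentioned above is essentially a mutable map of formatting options (delimiters, whether strengths are written, and so on). A self-contained sketch of how such a schema can be keyed and overridden; the key names below mirror common text-delimited options but are illustrative, not guaranteed to match Mahout's exact keys:

```scala
import scala.collection.mutable

// Illustrative Schema-like option map for text-delimited formatting.
// Key names here are examples, not necessarily Mahout's exact keys.
object SchemaSketch {
  def defaultSchema(): mutable.HashMap[String, Any] = mutable.HashMap(
    "rowKeyDelim"           -> "\t", // separates the row ID from the elements
    "columnIdStrengthDelim" -> ":",  // separates a column ID from its strength
    "elementDelim"          -> " ",  // separates elements from each other
    "omitScore"             -> false) // write bare column IDs without strengths

  def main(args: Array[String]): Unit = {
    val schema = defaultSchema()
    schema("omitScore") = true // override a single option
    println(schema("rowKeyDelim"))
    println(schema("omitScore"))
  }
}
```

Readers and writers consult such a map at construction time, which is why the classes below take their trait parameters in their constructors.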
A combined trait that reads and writes.
Extends the Writer trait to supply the type being written and the writer function.
Reads text delimited files into an IndexedDataset.
Reads text delimited files into an IndexedDataset. Classes can be used to supply trait params in their constructor.
The source is supplied to Reader#readElementsFrom.
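A reader of this kind turns each delimited text line into a (rowID, columnID, strength) element. A minimal self-contained sketch of that parsing step, assuming a tab separator and an optional strength field (both assumptions, chosen to match the default formats described later in this document):

```scala
object ElementParser {
  // Parse one "(rowID, columnID[, strength])" text line into a triple.
  // A missing strength defaults to 1.0, mirroring the boolean-data case.
  def parseElement(line: String, delim: String = "\t"): (String, String, Double) = {
    val fields = line.split(delim)
    val strength = if (fields.length > 2) fields(2).toDouble else 1.0
    (fields(0), fields(1), strength)
  }

  def main(args: Array[String]): Unit = {
    println(parseElement("u1\titem5\t3.0"))
    println(parseElement("u1\titem7")) // strength defaults to 1.0
  }
}
```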
Reads and writes text delimited files to/from an IndexedDataset.
Reads and writes text delimited files to/from an IndexedDataset. Classes are needed to supply trait params in their constructor.
Writes an IndexedDataset to text delimited files.
Writes an IndexedDataset to text delimited files. Classes can be used to supply trait params in their constructor.
The destination is supplied to Writer#writeTo.
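The writer's core step is the inverse of the reader's: each row of the dataset is rendered as one delimited text line. A minimal sketch, assuming the default delimiters described later in this document (tab between row ID and elements, ":" between a column ID and its strength, space between elements):

```scala
object RowFormatter {
  // Format one row as "rowID<tab>columnID1:strength1<space>columnID2:strength2..."
  // using assumed default delimiters (tab, ":", space).
  def formatRow(rowID: String, elements: Seq[(String, Double)]): String =
    rowID + "\t" + elements.map { case (col, s) => s"$col:$s" }.mkString(" ")

  def main(args: Array[String]): Unit =
    println(formatRow("u1", Seq("item5" -> 3.0, "item7" -> 1.0)))
}
```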
Command line interface for org.apache.mahout.math.cf.SimilarityAnalysis#cooccurrencesIDSs.
Command line interface for org.apache.mahout.math.cf.SimilarityAnalysis#cooccurrencesIDSs. Reads text lines that contain (row id, column id, ...). The IDs are user-specified strings that will be preserved in the output. The individual elements are accumulated into a matrix-like org.apache.mahout.math.indexeddataset.IndexedDataset, and org.apache.mahout.math.cf.SimilarityAnalysis#cooccurrencesIDSs is used to calculate row-wise self-similarity; when filters or two inputs are used, two matrices are generated and both the self-similarity of the primary matrix and the row-wise similarity of the primary to the secondary are calculated. Returns one or two directories of text files formatted as specified in the options. The options allow flexible control of the input schema, file discovery, output schema, and algorithm parameters. To get help run
mahout spark-itemsimilarity
for a full explanation of options. To process simple elements of text delimited values (userID,itemID), with or without strengths and with a separator of tab, comma, or space, you can specify only the input and output file or directory; all else will default to the correct values. Each output line will contain an item ID and its similar items sorted by LLR strength, descending.
To use with a Spark cluster see the --master option; if you run out of heap space, check the --sparkExecutorMemory option. Other org.apache.spark.SparkConf key-value pairs can be set with the -D:k=v option.
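The LLR strength mentioned above is Dunning's log-likelihood ratio (G²) applied to 2x2 co-occurrence counts. A self-contained sketch of the computation, reimplemented here for illustration in the entropy form Mahout's LogLikelihood class uses:

```scala
object LLR {
  // Dunning's log-likelihood ratio (G²) for a 2x2 co-occurrence table:
  //   k11 = both events occurred, k12/k21 = only one occurred, k22 = neither.
  // entropy(ks) = xLogX(sum(ks)) - sum(xLogX(k)), with 0·ln0 treated as 0.
  private def xLogX(x: Double): Double = if (x == 0) 0.0 else x * math.log(x)
  private def entropy(ks: Double*): Double = xLogX(ks.sum) - ks.map(xLogX).sum

  def logLikelihoodRatio(k11: Double, k12: Double, k21: Double, k22: Double): Double = {
    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val colEntropy = entropy(k11 + k21, k12 + k22)
    val matEntropy = entropy(k11, k12, k21, k22)
    // Guard against tiny negative values from floating-point rounding.
    if (rowEntropy + colEntropy < matEntropy) 0.0
    else 2.0 * (rowEntropy + colEntropy - matEntropy)
  }

  def main(args: Array[String]): Unit = {
    println(logLikelihoodRatio(5, 5, 5, 5))   // independent counts score 0
    println(logLikelihoodRatio(10, 0, 0, 10)) // strongly associated counts score high
  }
}
```

Independent counts (all cells equal) score 0, while counts concentrated on the diagonal score high, which is why sorting by LLR descending puts the most anomalous co-occurrences first.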
Command line interface for org.apache.mahout.math.cf.SimilarityAnalysis#rowSimilarityIDS.
Command line interface for org.apache.mahout.math.cf.SimilarityAnalysis#rowSimilarityIDS. Reads a text delimited file containing rows of an org.apache.mahout.math.indexeddataset.IndexedDataset with domain-specific IDs of the form (row id, column id: strength, ...). The IDs will be preserved in the output. The rows define a matrix and org.apache.mahout.math.cf.SimilarityAnalysis#rowSimilarityIDS will be used to calculate row-wise similarity using log-likelihood. The options allow control of the input schema, file discovery, output schema, and algorithm parameters.
To get help run
mahout spark-rowsimilarity
for a full explanation of options. The default formatting will read (rowID<tab>columnID1:strength1<space>columnID2:strength2....) and write (rowID<tab>rowID1:strength1<space>rowID2:strength2....). Each output line will contain a row ID and similar columns sorted by LLR strength, descending.
To use with a Spark cluster see the --master option; if you run out of heap space, check the --sparkExecutorMemory option.
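The default input format described above can be parsed with a few splits. A minimal self-contained sketch, assuming the default delimiters (tab, ":", space):

```scala
object RowParser {
  // Parse "rowID<tab>col1:s1<space>col2:s2..." into (rowID, Map(col -> strength)).
  def parseRow(line: String): (String, Map[String, Double]) = {
    val Array(rowID, elements) = line.split("\t", 2)
    val pairs = elements.split(" ").map { e =>
      val Array(col, s) = e.split(":", 2)
      col -> s.toDouble
    }
    (rowID, pairs.toMap)
  }

  def main(args: Array[String]): Unit =
    println(parseRow("u1\titem5:3.0 item7:1.0"))
}
```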
Extend this class to create a Mahout CLI driver. Minimally you must override process and main. Also define a Map of options for the command line parser. The following template may help:
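A hedged sketch of such a driver skeleton. The object name, option keys, and defaults are invented for illustration, and the MahoutOptionParser (a scopt wrapper), parseIOOptions, and start()/stop() lifecycle calls are assumptions about the driver base class's API rather than a verified template:

```scala
// Illustrative skeleton only -- option names and defaults are invented,
// and the parser/lifecycle calls assume Mahout's driver API.
object SomeDriver extends MahoutSparkDriver {

  // Define only the options specific to this driver; generic ones are inherited.
  final val SomeOptions = HashMap[String, Any](
    "maxThings" -> 500, // hypothetical algorithm parameter
    "appName"   -> "SomeDriver")

  override def main(args: Array[String]): Unit = {
    val parser = new MahoutOptionParser(programName = "somedriver") {
      head("somedriver", "Mahout")
      parseIOOptions()           // input/output options, non-driver specific
      opts = opts ++ SomeOptions // merge in the driver-specific defaults
      opt[Int]("maxThings") abbr "mt" action { (x, options) =>
        options + ("maxThings" -> x)
      } text "An example algorithm-control option"
    }
    parser.parse(args, parser.opts) map { parsedOpts =>
      parser.opts = parsedOpts
      process()
    }
  }

  override def process(): Unit = {
    start() // set up the Spark context
    // read input, run the algorithm, and write output using parser.opts here
    stop()
  }
}
```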