org.apache.mahout

drivers

package drivers


Type Members

  1. abstract class MahoutSparkDriver extends MahoutDriver

    Extend this class to create a Mahout CLI driver.

    Extend this class to create a Mahout CLI driver. Minimally you must override process and main. Also define a Map of options for the command line parser. The following template may help:

    object SomeDriver extends MahoutDriver {
    
      // define only the options specific to this driver, inherit the generic ones
      private final val SomeOptions = HashMap[String, Any](
          "maxThings" -> 500,
          "minThings" -> 100,
          "appName" -> "SomeDriver")
    
      override def main(args: Array[String]): Unit = {
    
        val parser = new MahoutOptionParser(programName = "shortname") {
          head("somedriver", "Mahout 1.0-SNAPSHOT")
    
          // Input output options, non-driver specific
          parseIOOptions()
    
          // Algorithm specific options
          // Add in the new options
          opts = opts ++ SomeOptions
          note("\nAlgorithm control options:")
          opt[Int]("maxThings") abbr "mt" action { (x, options) =>
            options + ("maxThings" -> x)
          }
        }
        parser.parse(args, parser.opts) map { opts =>
          parser.opts = opts
          process()
        }
      }
    
      override def process(): Unit = {
        start() // override to change the default Kryo or SparkConf before the distributed context is created
        // do the work here
        stop()
      }
    }
  2. class MahoutSparkOptionParser extends MahoutOptionParser

    Adds parsing of Spark specific options to the option parser
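    The parsers accumulate settings in an immutable Map[String, Any]: generic defaults are installed first and each parsed flag overlays a new binding (the `opts = opts ++ SomeOptions` step in the template above). A pure-Scala sketch of that pattern, with illustrative key names that are not the parser's real keys:

```scala
// Sketch of the option-map accumulation pattern used by the parsers:
// defaults go in first, each parsed flag overlays a new binding.
// Key names and default values below are illustrative only.
val defaults = Map[String, Any](
  "master" -> "local",          // assumed default Spark master
  "sparkExecutorMem" -> "2g")   // assumed default executor memory

// Applying a flag returns a new map with the overridden value;
// untouched keys keep their defaults.
def applyFlag(opts: Map[String, Any], key: String, value: Any): Map[String, Any] =
  opts + (key -> value)

val parsed = applyFlag(defaults, "master", "spark://host:7077")
```

    Because the map is immutable, each flag application is a cheap copy and the defaults remain available for keys the user never set.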

  3. trait TDIndexedDatasetReader extends Reader[IndexedDatasetSpark]

    Extends the Reader trait to supply org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark as the type read, along with element and row reader functions for reading text delimited files as described in the org.apache.mahout.math.indexeddataset.Schema.
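    To illustrate what a Schema drives, here is a pure-Scala sketch of an element reader that pulls its delimiter from a Schema-like map of parsing parameters. It mimics, but does not use, the Mahout classes, and the key name is an assumption, not the real Schema key:

```scala
import scala.collection.mutable

// A Schema is conceptually a map of parsing parameters; the "delim" key
// here is an illustrative stand-in for the real Schema keys.
val schema = mutable.HashMap[String, Any]("delim" -> "\t")

// Read one text-delimited element line into a (row ID, column ID) pair,
// preserving the user-supplied ID strings unchanged.
def readElement(line: String, schema: mutable.Map[String, Any]): (String, String) = {
  val delim = schema("delim").asInstanceOf[String]
  val fields = line.split(delim)
  (fields(0), fields(1))
}
```

    Changing the schema map changes how every line is parsed, which is the point of keeping the format description separate from the reader logic.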

  4. trait TDIndexedDatasetReaderWriter extends TDIndexedDatasetReader with TDIndexedDatasetWriter

    A combined trait that supports both reading and writing.

  5. trait TDIndexedDatasetWriter extends Writer[IndexedDatasetSpark]

    Extends the Writer trait to supply the type being written and the writer function.

  6. class TextDelimitedIndexedDatasetReader extends TDIndexedDatasetReader

    Reads text delimited files into an IndexedDataset.

    Reads text delimited files into an IndexedDataset. Classes can be used to supply trait params in their constructor.

    Note

    The source is supplied to Reader#readElementsFrom.

  7. class TextDelimitedIndexedDatasetReaderWriter extends TDIndexedDatasetReaderWriter

    Reads and writes text delimited files to/from an IndexedDataset.

    Reads and writes text delimited files to/from an IndexedDataset. Classes are needed to supply trait params in their constructor.

  8. class TextDelimitedIndexedDatasetWriter extends TDIndexedDatasetWriter

    Writes an IndexedDataset to text delimited files.

    Writes an IndexedDataset to text delimited files. Classes can be used to supply trait params in their constructor.

    Note

    The destination is supplied to Writer#writeTo.
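    As a rough illustration of the write side, here is a pure-Scala formatter for one output row in the default text-delimited layout (rowID, tab, then space-separated columnID:strength pairs). This mimics the output format only, not the actual writer code:

```scala
// Format one row of an IndexedDataset-like structure as text:
//   rowID<tab>columnID1:strength1<space>columnID2:strength2 ...
// A sketch of the default layout, not TextDelimitedIndexedDatasetWriter itself.
def writeRow(rowId: String, elems: Seq[(String, Double)]): String =
  rowId + "\t" + elems.map { case (id, s) => s"$id:$s" }.mkString(" ")
```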

Value Members

  1. object ItemSimilarityDriver extends MahoutSparkDriver

    Command line interface for org.apache.mahout.math.cf.SimilarityAnalysis#cooccurrencesIDSs.

    Command line interface for org.apache.mahout.math.cf.SimilarityAnalysis#cooccurrencesIDSs. Reads text lines that contain (row id, column id, ...). The IDs are user specified strings which will be preserved in the output. The individual elements will be accumulated into a matrix like org.apache.mahout.math.indexeddataset.IndexedDataset, and org.apache.mahout.math.cf.SimilarityAnalysis#cooccurrencesIDSs will be used to calculate row-wise self-similarity; when using filters or two inputs, it will generate two matrices and calculate both the self-similarity of the primary matrix and the row-wise similarity of the primary to the secondary. Returns one or two directories of text files formatted as specified in the options. The options allow flexible control of the input schema, file discovery, output schema, and algorithm parameters. To get help run

    mahout spark-itemsimilarity

    for a full explanation of options. To process simple elements of text delimited values (userID,itemID), with or without strengths and with a separator of tab, comma, or space, you can specify only the input and output file or directory; all else will default to the correct values. Each output line will contain the Item ID and similar items sorted by LLR strength descending.

    Note

    To use with a Spark cluster see the --master option; if you run out of heap space check the --sparkExecutorMemory option. Other org.apache.spark.SparkConf key value pairs can be set with the -D:k=v option.
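    The LLR strength used to rank similar items is Dunning's log-likelihood ratio over the 2x2 table of co-occurrence counts. A self-contained sketch of that computation, following the entropy formulation used in Mahout's math library (this is the scoring idea, not the driver code itself):

```scala
// Dunning's log-likelihood ratio over a 2x2 contingency table:
//   k11 = co-occurrences of both items, k12/k21 = one item without the
//   other, k22 = neither item. Higher score = stronger association.
def xLogX(x: Long): Double = if (x == 0) 0.0 else x * math.log(x.toDouble)

def entropy(elements: Long*): Double =
  xLogX(elements.sum) - elements.map(xLogX).sum

def logLikelihoodRatio(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
  val rowEntropy = entropy(k11 + k12, k21 + k22)
  val columnEntropy = entropy(k11 + k21, k12 + k22)
  val matrixEntropy = entropy(k11, k12, k21, k22)
  // Guard against tiny negative values from floating-point rounding.
  if (rowEntropy + columnEntropy < matrixEntropy) 0.0
  else 2.0 * (rowEntropy + columnEntropy - matrixEntropy)
}
```

    An evenly spread table scores zero (no association), while counts concentrated on the diagonal score high, which is why sorting by LLR descending puts genuinely co-occurring items first.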

  2. object RowSimilarityDriver extends MahoutSparkDriver

    Command line interface for row-wise similarity (see org.apache.mahout.math.cf.SimilarityAnalysis).

    Command line interface for row-wise similarity (see org.apache.mahout.math.cf.SimilarityAnalysis). Reads a text delimited file containing rows of an org.apache.mahout.math.indexeddataset.IndexedDataset with domain specific IDs of the form (row id, column id: strength, ...). The IDs will be preserved in the output. The rows define a matrix, and row-wise similarity will be calculated using log-likelihood. The options allow control of the input schema, file discovery, output schema, and algorithm parameters.

    To get help run

    mahout spark-rowsimilarity

    for a full explanation of options. The default values for formatting will read (rowID<tab>columnID1:strength1<space>columnID2:strength2....) and write (rowID<tab>rowID1:strength1<space>rowID2:strength2....). Each output line will contain a row ID and similar columns sorted by LLR strength descending.

    Note

    To use with a Spark cluster see the --master option; if you run out of heap space check the --sparkExecutorMemory option.
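    The default input layout above can be illustrated with a pure-Scala parser for one row line. Delimiters follow the (rowID<tab>columnID:strength<space>...) layout described above; this is a sketch of the format, not the driver's actual reader:

```scala
// Parse one input row in the default layout:
//   rowID<tab>columnID1:strength1<space>columnID2:strength2 ...
// IDs stay as user-supplied strings; strengths become doubles.
def readRow(line: String): (String, Map[String, Double]) = {
  val tab = line.indexOf('\t')
  val rowId = line.substring(0, tab)
  val pairs = line.substring(tab + 1).split(" ").map { field =>
    val sep = field.indexOf(':')
    field.substring(0, sep) -> field.substring(sep + 1).toDouble
  }
  (rowId, pairs.toMap)
}
```

    The write side is symmetric, substituting similar row IDs and their LLR strengths for the column IDs.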

  3. object TestNBDriver extends MahoutSparkDriver

    Command line interface for testing a trained Naive Bayes model.

  4. object TrainNBDriver extends MahoutSparkDriver

    Command line interface for training a Naive Bayes model.
