
Manticore CSV Import Example

Difficulty: Beginner
Estimated Time: 10 minutes
Note: this course contains examples of how to work with plain (not real-time) tables, which cannot be reproduced with a default Manticore configuration file. For more details about the operating modes and the corresponding configuration settings, see our Manual.

Manticoresearch - Manticore CSV import

In this tutorial you will learn how to import a CSV file into Manticore.

The environment will launch a small Docker instance with Manticore Search.

The dataset used in this scenario is https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset

Importing from CSV

One great thing about Manticore is that it can easily ingest data from several kinds of sources.

One of these sources can be a CSV or TSV file. Importing a CSV takes two main steps:

  1. Prepare the CSV file

Manticore requires the first column to be the document id, which must be a unique integer. If our CSV doesn't have one, we can easily add it with an awk command:

awk -v OFS=, 'NR ==1 {print "ID", $0; next} {print (NR-1), $0}' movie_metadata.csv > movie.csv

Manticore also requires the CSV to have no header row. Our sample CSV has one, so we're going to remove it:

sed -i '1d' movie.csv
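
The two preparation steps can also be combined into a single pass. The following is a minimal sketch, assuming a small made-up file (`sample_metadata.csv` and its rows are illustrative, not the Kaggle dataset) and assuming no row contains a line break inside a quoted field:

```shell
#!/bin/sh
# Build a tiny sample CSV with a header row (illustrative data only).
workdir=$(mktemp -d)
cat > "$workdir/sample_metadata.csv" <<'EOF'
color,director_name,duration
Color,James Cameron,178
Color,Gore Verbinski,169
EOF

# Skip the header (NR == 1) and prefix every remaining row with a
# sequential integer id starting at 1.
awk -v OFS=, 'NR > 1 {print NR-1, $0}' \
    "$workdir/sample_metadata.csv" > "$workdir/sample.csv"

cat "$workdir/sample.csv"
```

This avoids the in-place `sed -i` edit, whose syntax differs between GNU and BSD sed.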

  2. Create source and table

In the source we define the source type, in this case 'csvpipe'. The data is read from the output of the command set in 'csvpipe_command'. In our case we simply read a static file, but the command can be a script, written in PHP or Java for example, that builds the data from a database or data warehouse, from files, or even from an external API service.

Next we declare each of the CSV columns. Note that the first column, the unique id, is not declared because it is implicit.

    source movies
    {
      type = csvpipe
      csvpipe_command = cat /tutorial/movie.csv
      csvpipe_attr_string = color
      csvpipe_field_string = director_name
      csvpipe_attr_uint = num_critic_for_reviews
      csvpipe_attr_uint = duration
      csvpipe_attr_uint = director_facebook_likes
      csvpipe_attr_uint = actor_3_facebook_likes
      csvpipe_field_string = actor_2_name
      csvpipe_attr_uint = actor_1_facebook_likes
      csvpipe_attr_uint = gross
      csvpipe_field = genres
      csvpipe_field_string = actor_1_name
      csvpipe_field_string = movie_title
      csvpipe_attr_uint = num_voted_users
      csvpipe_attr_uint = cast_total_facebook_likes
      csvpipe_field_string = actor_3_name
      csvpipe_attr_uint = facenumber_in_poster
      csvpipe_field = plot_keywords
      csvpipe_attr_string = movie_imdb_link
      csvpipe_attr_uint = num_user_for_reviews
      csvpipe_attr_string = language
      csvpipe_attr_string = country
      csvpipe_attr_string = content_rating
      csvpipe_attr_uint = budget
      csvpipe_attr_uint = title_year
      csvpipe_attr_uint = actor_2_facebook_likes
      csvpipe_attr_float = imdb_score
      csvpipe_attr_float = aspect_ratio
      csvpipe_attr_uint = movie_facebook_likes
    }
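
Because the source reads whatever the command writes to standard output, the static `cat` could be replaced by a script. Here is a minimal sketch in shell, assuming a hypothetical script name (`export_movies.sh`), made-up records, and values that contain no commas; a real script would fetch its rows from a database, files, or an API:

```shell
#!/bin/sh
# export_movies.sh (hypothetical name): write CSV rows to stdout.
# The first column is the unique integer id; no header row is printed.
emit_rows() {
    id=0
    # Illustrative records in "title|year" form.
    for record in "Avatar|2009" "Spectre|2015"; do
        id=$((id + 1))
        title=${record%%|*}   # text before the first "|"
        year=${record##*|}    # text after the last "|"
        printf '%d,%s,%s\n' "$id" "$title" "$year"
    done
}

emit_rows
```

The source would then point `csvpipe_command` at the script instead of `cat`, with column declarations that match the columns the script emits.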

The table declaration is simple. We specify the location of the table and tell the table to use the above source.

    table movies
    {
      type = plain
      path = /var/lib/manticore/data/movies
      source = movies
    }

Indexing

Plain tables are built with the indexer tool. The table files that indexer generates must have permissions that allow searchd (the Manticore Search server) to access them. With the official Linux packages, searchd runs under the manticore user, so indexer must also run under the manticore user:

sudo -u manticore indexer movies

If you are running searchd differently, you might be able to omit the sudo -u manticore part. Just make sure that the user under which your searchd instance runs has read/write permissions on the tables you generate.
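
One way to verify the read/write requirement is a small check run as the same user that runs searchd. This is a sketch only: the helper name `check_rw` is made up for illustration, and the demo uses a throwaway directory rather than the real table directory.

```shell
#!/bin/sh
# Report whether the current user can read and write a directory.
check_rw() {
    dir=$1
    if [ -d "$dir" ] && [ -r "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir"
    else
        echo "no read/write access: $dir"
        return 1
    fi
}

# Demo on a temporary directory. In the tutorial setup the directory to
# check would be /var/lib/manticore/data (from the table's path setting).
demo_dir=$(mktemp -d)
check_rw "$demo_dir"
```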

The search daemon loads a new table when it detects a config change. In our case, the table was already in the configuration when searchd was started, so there is no config change to detect. To load our table, we need to issue the 'RELOAD TABLES' command:

mysql -P9306 -h0

RELOAD TABLES;

If the table is loaded, it will appear in the list of active tables:

SHOW TABLES;exit;

If the CSV receives new data, we need to reindex the table. As the table is already loaded, we pass the '--rotate' parameter to the 'indexer' tool, which tells the daemon to pick up the new table version without a restart:

sudo -u manticore indexer movies --rotate
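
To decide when a reindex is due, a script can compare file modification times. The sketch below assumes a hypothetical helper named `needs_rebuild` and uses throwaway files standing in for the raw and prepared CSVs; it only reports the decision and does not call indexer itself.

```shell
#!/bin/sh
# Succeed when the prepared CSV is missing or older than the raw CSV.
needs_rebuild() {
    raw=$1
    prepared=$2
    [ ! -e "$prepared" ] || [ "$raw" -nt "$prepared" ]
}

# Demo with temporary files standing in for movie_metadata.csv (raw)
# and movie.csv (prepared). The prepared file does not exist yet.
demo=$(mktemp -d)
touch "$demo/raw.csv"
if needs_rebuild "$demo/raw.csv" "$demo/prepared.csv"; then
    echo "rebuild needed"
fi
```

When a rebuild is needed, the preparation commands from the first step would be rerun, followed by the indexer command with '--rotate' shown above.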

Using the new table

Now we can start running queries:

mysql -P9306 -h0

SELECT movie_title FROM movies WHERE MATCH('Tom Hanks');

SELECT * FROM movies WHERE match('Tom Hanks') LIMIT 40 FACET content_rating FACET INTERVAL(gross,0,10000000,100000000,1000000000) FACET title_year ORDER BY title_year DESC FACET imdb_score ORDER BY imdb_score DESC;
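
Queries can also be run without an interactive session. The following sketch assumes a made-up helper, `build_query`, that assembles the statement, and assumes the search text contains no single quotes (no escaping is done):

```shell
#!/bin/sh
# Print a full-text query for the given table, search text, and row limit.
# Assumption: the search text has no single quotes; nothing is escaped.
build_query() {
    table=$1
    text=$2
    limit=$3
    printf "SELECT movie_title FROM %s WHERE MATCH('%s') LIMIT %d;\n" \
        "$table" "$text" "$limit"
}

build_query movies 'Tom Hanks' 5
```

With searchd listening on port 9306 as in this tutorial, the generated statement could be passed to the client's `-e` option: `mysql -P9306 -h0 -e "$(build_query movies 'Tom Hanks' 5)"`.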