>>
Manticore CSV Import Example
Importing from CSV
One great thing about Manticore is that it can grab data from several sources in an easy way.
One of the data sources can be a CSV or TSV file. There are mainly 2 things we need to do in order to import a CSV:
- A small preparation of the CSV file.
Manticore requires the first column to represent the document id, which needs to be an unique integer number. If our CSV doesn't have this, we can add it easy with an awk command:
awk -v OFS=, 'NR ==1 {print "ID", $0; next} {print (NR-1), $0}' movie_metadata.csv > movie.csv
Manticore also requires to have no header in the CSV. In our sample CSV we have header, so we're going to remove it:
sed -i '1d' movie.csv
- Create source and table
In the source we define the source type, in this case 'csvpipe'. The data is read from the output of the 'csvpipe_command'. In our case we simply read a static file, but it can be a script written in PHP or Java that builds the data by getting it from a database/warehouse software, files or even an external API service.
Next we declare the each one of the CSV columns. Note that the first column - the unique id - is not declared as it's implicit.
source movies { type = csvpipe csvpipe_command = cat /tutorial/movie.csv csvpipe_attr_string = color csvpipe_field_string = director_name csvpipe_attr_uint = num_critic_for_reviews csvpipe_attr_uint = duration csvpipe_attr_uint = director_facebook_likes csvpipe_attr_uint = actor_3_facebook_likes csvpipe_field_string = actor_2_name csvpipe_attr_uint = actor_1_facebook_likes csvpipe_attr_uint = gross csvpipe_field = genres csvpipe_field_string = actor_1_name csvpipe_field_string = movie_title csvpipe_attr_uint = num_voted_users csvpipe_attr_uint = cast_total_facebook_likes csvpipe_field_string = actor_3_name csvpipe_attr_uint = facenumber_in_poster csvpipe_field = plot_keywords csvpipe_attr_string = movie_imdb_link csvpipe_attr_uint = num_user_for_reviews csvpipe_attr_string = language csvpipe_attr_string = country csvpipe_attr_string = content_rating csvpipe_attr_uint = budget csvpipe_attr_uint = title_year csvpipe_attr_uint = actor_2_facebook_likes csvpipe_attr_float = imdb_score csvpipe_attr_float = aspect_ration csvpipe_attr_uint = movie_facebook_likes }
The table declaration is simple. We specify the location of the table and tell the table to use the above source.
table movies { type = plain path = /var/lib/manticore/data/movies source = movies }
Indexing
Plain tables are build with the indexer tool. When using indexer, the generated table files must be made with permissions that allow searchd((Manticore Search server) to access them. In case of Linux official packages searchd runs under manticore user, so indexer must also run under manticore user:
sudo -u manticore indexer movies
If you are running searchd differently you might be able to omit the sudo -u manticore part, just make sure that the user under which your searchd instance is running has read/write permissions to the tables you generate.
Search daemon loads a new table if a config change is detected. In our case, the new table was already in the configuration when searchd was started. To load our table, we need to issue 'RELOAD TABLES' command:
mysql -P9306 -h0
RELOAD TABLES;
If the table is loaded, it will appear in the list of active tables:
SHOW TABLES;exit;
If the CSV receives new data, we need to reindex the table. As the table is already loaded, we will pass '--rotate' parameter to 'indexer' tool which will inform the daemon to pick the new table version, without daemon restart:
sudo -u manticore indexer movies --rotate
Using the new table
Now we can start running queries :
mysql -P9306 -h0
SElECT movie_title FROM movies WHERE MATCH('Tom Hanks');
SELECT * FROM movies WHERE match('Tom Hanks') LIMIT 40 FACET content_rating FACET INTERVAL(gross,0,10000000,100000000,1000000000) FACET title_year ORDER BY title_year DESC FACET imdb_score ORDER BY imdb_score DESC;