philosophy,mathematics,technology

MLAT: a set of natural language processing tools in OCaml


An early version of MLAT can be downloaded from the
phimatex SourceForge file list.

A short presentation

I want to achieve something like this to obtain histograms of n-grams:
   let dico = new dictionary range default_parameters;;
   dico#add_to_forget_list file_name;;
   dico#add_source stream_name optional_parameters;;
   dico#info;; (* to get some information *)
   dico#add_source stream_name optional_parameters;; (* to add another source *)
   dico#quantify forget_levels;;
   dico#info;; 
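
To make the idea of an n-gram histogram concrete, here is a minimal sketch in plain OCaml. It is not MLAT code: the names tokenize, take and ngram_histogram are hypothetical, the tokenizer only knows ASCII letters, and the optional forget argument is merely an assumption about what a forget list could do (drop the listed words before counting).

```ocaml
(* Hypothetical helpers for illustration; none of these names come from MLAT. *)

(* Split a text into lowercase words, cutting on any non-ASCII-letter. *)
let tokenize (text : string) : string list =
  let buf = Buffer.create 16 in
  let words = ref [] in
  let flush () =
    if Buffer.length buf > 0 then begin
      words := Buffer.contents buf :: !words;
      Buffer.clear buf
    end
  in
  String.iter
    (fun c ->
      match Char.lowercase_ascii c with
      | 'a' .. 'z' as l -> Buffer.add_char buf l
      | _ -> flush ())
    text;
  flush ();
  List.rev !words

(* First n elements of l, or None when l has fewer than n elements. *)
let rec take n l =
  if n = 0 then Some []
  else
    match l with
    | [] -> None
    | x :: rest ->
      (match take (n - 1) rest with
       | None -> None
       | Some t -> Some (x :: t))

(* Count every run of n consecutive words.
   Assumption: words in the forget list are removed before counting. *)
let ngram_histogram ?(forget = []) (n : int) (text : string)
  : (string, int) Hashtbl.t =
  if n < 1 then invalid_arg "ngram_histogram: n must be >= 1";
  let words =
    List.filter (fun w -> not (List.mem w forget)) (tokenize text) in
  let histo = Hashtbl.create 64 in
  let rec loop l =
    match take n l with
    | None -> ()
    | Some gram ->
      let key = String.concat " " gram in
      let count = try Hashtbl.find histo key with Not_found -> 0 in
      Hashtbl.replace histo key (count + 1);
      (match l with [] -> () | _ :: rest -> loop rest)
  in
  loop words;
  histo
```

For example, ngram_histogram 2 "The cat sat on the mat. The cat ran." counts the bigram "the cat" twice, and passing ~forget:["the"] removes that word before the bigrams are formed.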
   
and to quantify the corpus:
   dico#set_feature_space final_dimension internal_dimension;;
   dico#component_cloud format projective_dimension;; (* to get information; format is one of ASCII|XML|HTML|ASCII4MAT *)
   dico#sources_cloud   format projective_dimension;; 
   
or, in a more advanced version, to produce automatic clustering:
   dico#learn_supervised_clustering new_key flags_set1 flags_set2 algorithm;; (* to cluster set1 vs set2 *)
   dico#learn_info new_key;;
   let k = dico#add_source stream_name optional_parameters;; 
   dico#clusterize key set_of_source_keys;;
   
or to provide unsupervised clustering with something like:
   dico#learn_clusters new_key set_of_source_keys;; (* or replacing set_of_source_keys by set_of_flag_keys *)
   dico#learn_info new_key;;
   let k = dico#add_source stream_name optional_parameters;; 
   dico#clusterize key set_of_source_keys;;
   
Some friends have already asked me for a way to compute distances, as in:
   let d = dico#distance_between source1 source2;;
   let k = dico#closest_to source;;
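
Which distance these methods would use is not decided here. As one possible reading, the sketch below computes a cosine distance between two n-gram histograms stored as hash tables, and picks the closest candidate from a list. Everything in it is an assumption made for illustration: the choice of cosine distance, the names of_counts, dot, cosine_distance and closest_to, and the representation of a source as a (string, int) Hashtbl.t.

```ocaml
(* Hypothetical sketch; MLAT may use another metric and representation. *)

(* Build a histogram from (n-gram, count) pairs. *)
let of_counts (pairs : (string * int) list) : (string, int) Hashtbl.t =
  let h = Hashtbl.create 16 in
  List.iter (fun (k, c) -> Hashtbl.replace h k c) pairs;
  h

(* Dot product of two sparse count vectors: only shared keys contribute. *)
let dot (a : (string, int) Hashtbl.t) (b : (string, int) Hashtbl.t) : float =
  Hashtbl.fold
    (fun key ca acc ->
      match Hashtbl.find_opt b key with
      | Some cb -> acc +. float_of_int (ca * cb)
      | None -> acc)
    a 0.0

(* Cosine distance: about 0 for proportional histograms, 1 when they
   share no n-gram. An empty histogram is treated as maximally distant. *)
let cosine_distance a b : float =
  let norm h = sqrt (dot h h) in
  let na = norm a and nb = norm b in
  if na = 0.0 || nb = 0.0 then 1.0
  else 1.0 -. dot a b /. (na *. nb)

(* Key of the candidate histogram closest to source, or None if the
   candidate list is empty. Ties keep the earliest candidate. *)
let closest_to source (candidates : ('k * (string, int) Hashtbl.t) list)
  : 'k option =
  match candidates with
  | [] -> None
  | (k0, h0) :: rest ->
    let best, _ =
      List.fold_left
        (fun (bk, bd) (k, h) ->
          let d = cosine_distance source h in
          if d < bd then (k, d) else (bk, bd))
        (k0, cosine_distance source h0)
        rest
    in
    Some best
```

Cosine distance ignores the overall length of a source, so a short text and a long text with the same n-gram proportions come out close; a raw Euclidean distance on counts would not have that property.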
   

TODO list

VERSION 0.1 - load it into the OCaml toplevel with #use "MLAT.ml";;

this is a site by CAL