Access:

» Writing an Interpreter in OCaml

Related categories: Programming in generall | OCaml

Artur Siekielski
Viewed: 5096 | Article date: 2006-04-20 15:42:15

OCaml is a relatively popular functional-imperative language with a built-in object system. It has a great compiler which generates code with performance similar to the output of the best C compilers. It also boasts a rich collection of libraries and tools; among the latter one cam mention ocamllex and ocamlyacc. In this article, the author will briefly present the aforementioned tools.

OCaml is a relatively popular functional-imperative language with a built-in object system. It has a great compiler which generates code with performance similar to the output of the best C compilers. It also boasts a rich collection of libraries and tools; among the latter one cam mention ocamllex and ocamlyacc (provided together with the compiler), assisting implementation of compilers and interpreters.

About the author

Artur Siekielski is a fourth-year student of computer science at the Faculty of Mathematics, Physics and Computer Science of the Gdańsk University. Interests: programming languages, software engineering, formal specifications.

Contact with the author: asiekiel@manta.univ.gda.pl

Thanks to those tools, a seemingly difficult task of development of a working interpreter of a programming language can be accomplished in just a few hours. What is more, the tools are so easy to use that even a person with no experience with them and not knowing OCaml itself can, after a short introduction, successfully get the job done.

The point of this article is to demonstrate that this is indeed possible. In its first part we will briefly present the aforementioned tools, in the second one we will make use of them to implement an interpreter of a simple programming language containing while loops, if instructions and assignments. Meanwhile we will be explaining programming constructs of OCaml and the usage of its compiler.

Installation of OCaml

Appropriate installation files have been provided on the disc enclosed with the magazine.

Under Linux the easiest way is to install a distribution package or, if the first option is not possible, compile the source code of the compiler available from the ocaml-3.08.3.tar.bz2 file. Having unpacked the archive we execute ./configure -prefix <path>, followed by make world opt install. In order to make it easier for ourselves to use the interactive mode, we can additionally install a wrapper called ledit - it grants us access to the history of commands. After the installation, instead of ocaml we execute ledit ocaml.

Under Windows we can use one of two installation files: ocaml-3.08.3-win-mgw.exe and ocaml-3.08.3-win-msvc.exe (the contents of the first one have been compiled in the MinGW environment, of the second - using Visual Studio). After we have run one of them a graphical installer appears - further steps here require no comments. In order to use Makefile files, we have to install GNU Make and a number of programs such as rm or cp; the easiest way here is to install MinGW and use OCaml tools from the level of the Msys shell.

Figure 1. An example derivation tree in context-less grammar

Implementation of a Simple Lexer

According to what has been said in the Structure of Interpreters inset, we should begin implementing an interpreter by implementing a lexer. We can save a lot of time here by using the aforementioned ocamllex program, which generates the source code of a lexer basing on description of its token structure (programs which do this are often called lexer generators).

Listing 1. Definition of a simple lexer (an input file for ocamllex)

 

{
type token_t =
PLUS
|
MINUS
|
LPAREN
|
RPAREN
|
NUM of int
|
EOF
}

let digit = ['0'-'9']

rule tokenize = parse
| '+'
{ PLUS }
| '-'
{ MINUS }
| '
(' { LPAREN }
| '
)' { RPAREN }
|
digit+ as num
{ NUM (int_of_string num) }
|
eof { EOF }
|
[' ' 't'] { tokenize lexbuf }

An example input file for this tool has been provided as Listing 1. Its first part is the header, enclosed within the characters: { and }, which can be any OCaml code; it is placed in the beginning of the code of the generated lexer. The second part is a description of tokens; this is not OCaml code.

In our examples the following tokens are valid: the characters: +, -, ( and ), integer numbers and the end-of-file character (eof). This is reflected by the definition of the token_t type (placed in the header) which represents allowable values of tokens. This is a so-called algebraic type; objects of this type are created by using one of the constructors (the names of which are separated by | characters) and this is they only way of creating an object of such a type. Algebraic-type objects are immutable.

Page: 1 2 3 4 5
Buy article Buy subscription
Buy now add to cart
add to cart
Standard price: 2€/$3 Standard price: 25€/$30
Buy article for as little as (2€/$3) each allow access to individual articles. Buy a full access to our Software Developers's Journal archive portal. You will be able to read the articles from all archive issues from year 2005 and 2006. For just 25€/$30 you get unrestricted access to the entire website for the whole year.
SDJhakin9

.SDJ Users:


.:Login
.:Password

[Register]
[Forgotten your password?]

...Shopping Cart

sum: 0 €
Choose currency:

...Topics

...Advertisement

www.acunetix.com www.verifysoft.com

...Conferences




...Print Edition Archive

...Affiliate Program



 

 

Subscribe | Contact Us | Newsletter | Privacy policy | Regulations | See all issues | About SDJ
Copyright C 2006 by Software Developer's Journal. All rights reserved.