How to create a DCG-predicate that is usable as both a parser and a generator?

Hi everyone!

I recently wanted to become better at Prolog (having used it in the past only in an “Introduction to Logic” course at university). As I am also learning to speak a bit of Esperanto, which is a constructed language and therefore reasonably regular and systematic, I thought it would be an interesting challenge to write a parser/generator for simple Esperanto sentences in Prolog.

For instance, in Esperanto all (singular) nouns end with -o. It is thus easy to write a predicate like

stem_noun(Stem, Noun, singular) :- 
  string_concat(Stem, "o", Noun).

noun(noun(Stem, Number)) -->
  [Noun],
  { stem_noun(Stem, Noun, Number) }.

This works fine for parsing, when used like phrase(noun(Res), ["knabo"]). It also works fine for testing, when all arguments are filled in.

It does not work for generating, however, when used like phrase(noun(_), Res), because string_concat will complain that Stem is not sufficiently instantiated.
Arguably this is reasonable; after all, when generating words it might make sense to stick to a vocabulary of known words.
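
For reference, here is a small self-contained sketch of the two modes. It assumes SWI-Prolog 7 or later, where double-quoted text is a string object by default; the queries in the comments are only illustrations.

```prolog
% Same definitions as above, repeated so the snippet loads on its own.
stem_noun(Stem, Noun, singular) :-
    string_concat(Stem, "o", Noun).

noun(noun(Stem, Number)) -->
    [Noun],
    { stem_noun(Stem, Noun, Number) }.

% Parsing: Noun is bound, so string_concat/3 can strip the "o" suffix.
% ?- phrase(noun(Res), ["knabo"]).
% Res = noun("knab", singular).

% Generating: neither Stem nor Noun is bound, so string_concat/3
% has nothing to work from and raises an instantiation error.
% ?- catch(phrase(noun(_), Res), error(Err, _), true).
```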

Thus I might rewrite stem_noun as:

stem_translation("knab", "boy").
stem_translation("flor", "flower").
% etc
% ...

stem_noun(Stem, Noun, singular) :-
  stem_translation(Stem, _),
  string_concat(Stem, "o", Noun).

However, this restricts the user to words whose stem is known. That would be too restrictive, because (a) the vocabulary can never be complete, as new words get made from time to time, and (b) names (of persons, for instance) are also used as nouns.

So how is this managed? One approach I considered is

stem_noun(Stem, Noun, singular) :-
  (nonvar(Noun) ; stem_translation(Stem, _)),
  string_concat(Stem, "o", Noun).

which will make the flow different depending on the direction the predicate is used. I believe it “works as intended”, but I’m not sure how idiomatic this is. Are there better ways to approach this problem?
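
One detail worth checking in the version above: because the two branches are joined with a plain disjunction, parsing a word whose stem is also in the vocabulary succeeds twice (once through nonvar/1 and once more, on backtracking, through stem_translation/2). A possible variant, sketched here under the assumption of SWI-Prolog 7 or later with string objects, commits to one branch with if-then-else:

```prolog
stem_translation("knab", "boy").
stem_translation("flor", "flower").

% If Noun is already known we are parsing or testing, so any stem is
% accepted; otherwise we are generating and draw stems from the vocabulary.
stem_noun(Stem, Noun, singular) :-
    (   nonvar(Noun)
    ->  true
    ;   stem_translation(Stem, _)
    ),
    string_concat(Stem, "o", Noun).

noun(noun(Stem, Number)) -->
    [Noun],
    { stem_noun(Stem, Noun, Number) }.

% ?- phrase(noun(N), ["knabo"]).       % a single solution
% ?- phrase(noun(N), ["zamenhofo"]).   % unknown stems still parse
% ?- phrase(noun(_), Words).           % enumerates the known nouns
```

The trade-off is the same as before: generation is still limited to known stems, so asking for the word of an unknown stem with the noun left unbound simply fails.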


Some simple advice.

  1. When first learning to use DCGs for both parsing and generating, I have found that it makes more sense to develop them independently until you have them both working. Then if there is enough commonality in the code they can be combined. Sometimes it works seamlessly, sometimes it appears to work seamlessly when using the code but the code is not easy to understand (e.g. lots of conditionals), sometimes it turns into a nightmare and can’t be done seamlessly. Using constraints also helps at times.

  2. Since you are working with a spoken language and not a programming language for your exercise, you will need to look at work on natural language processing. Luckily there is a nice free intro book.

“Prolog and Natural-Language Analysis” by Fernando C. N. Pereira and Stuart M. Shieber
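
As a rough illustration of point 1, the parser and the generator could first be written as two separate nonterminals and only then put behind one entry point. This is just a sketch: parse_noun//1, generate_noun//1 and noun_words/2 are hypothetical names, the vocabulary is copied from the question, and SWI-Prolog 7 or later (string objects) is assumed.

```prolog
% Vocabulary in the same shape as stem_translation/2 in the question.
stem_translation("knab", "boy").
stem_translation("flor", "flower").

% Parser only: accepts any string ending in -o, known stem or not.
parse_noun(noun(Stem, singular)) -->
    [Noun],
    { string(Noun),
      string_concat(Stem, "o", Noun) }.

% Generator only: enumerates nouns built from the known stems.
generate_noun(noun(Stem, singular)) -->
    { stem_translation(Stem, _),
      string_concat(Stem, "o", Noun) },
    [Noun].

% Combined entry point, added once both halves work on their own.
% A proper list of words means "parse"; anything else means "generate".
noun_words(Tree, Words) :-
    (   is_list(Words)
    ->  phrase(parse_noun(Tree), Words)
    ;   phrase(generate_noun(Tree), Words)
    ).

% ?- noun_words(T, ["zamenhofo"]).   % parses an unknown name
% ?- noun_words(T, Words).           % generates from the vocabulary
```

Keeping the two halves apart makes each one easy to test, and the direction check lives in a single place instead of being spread through the grammar.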