

During June 2008 I began to think more about AI: how to represent facts about the world, and then how to use that representation to parse language. To view this research log from beginning to end (rather than seeing more recent posts), click here.

The Turing Test
September 27

The Turing test is a setup where you have a person sitting at a computer chatting via something like an instant messenger program (like MSN Messenger) with someone at the other end that could either be another person or a computer. The job of the person is to figure out whether what they're chatting with is a person or a computer. If it is a computer and they can tell, then the computer has failed the Turing test. Designing AI to pass the Turing Test is one of the holy grails of computer science.

My guess is that the first AI to pass this test will be more akin to a magician pulling off an amazing feat, and less about creating a machine that truly does what the Turing test is supposed to test for: AI with roughly the same intelligence as a human mind. It would be like a magician who causes someone to disappear, convincing an audience full of guests that he's done the impossible. In reality, what he has done is use smoke and mirrors in a clever way -- he certainly didn't make a person vanish into thin air. And so passing the Turing test becomes a game of deceit; creating a facade that fools the observer.

This doesn't make the Turing Test an unworthy pursuit, and no doubt the machine will have to be mighty clever, but I think we need to realize that the first AI to pass it may not be as advanced as we would suppose.

Link: BBC: George, an online robot breaking new ground

Update: July 8, 2008: To be fair, I think it depends on how you define the Turing test. If the test is passed the first time any human being thinks they're chatting with a person when in reality they're chatting with an AI, then I think my above observation is a good one. (And perhaps there have already been people who have been fooled) But if you define passing the test to involve fooling people who are trained at interrogating AIs with a > 90% success ratio, then I think it's going to take a lot more than smoke and mirrors.


Artificial Intelligence
July 8

During the last year and a half, I've spent a bit of time thinking about the Turing Test and AI in general.

Over the last month, I've had some more time to expand on and refine those ideas, and the process has been very rewarding. My brain is bursting at the seams with ideas! I'd like to start writing these ideas down as blog posts so that I don't forget them.

At this point, I'm torn between the possibility of doing a masters that focuses on AI versus one that focuses on inventing a new programming language.


Does it matter how an AI encodes information?
July 8

One of the fundamental things you need for an AI is a way to encode information about the world. You need a data structure.

Computer programmers are used to arrays, hashes, linked lists, trees, etc. But the question arises: How does the brain do it? And is that an important consideration?

One of my theses is that the data structure used will have a significant impact on the process of developing an AI and, ultimately, the end result. Said differently, I theorize that intelligent algorithms arise quite naturally out of the "right" data structure, and much more unnaturally out of the "wrong" data structure.


Encoding information like the brain does
July 8

Although it remains a mystery how the brain encodes information, we do have some clues. The brain uses neurons to create a web of interconnectedness, forming new connections as necessary.

From a math perspective, this is somewhat analogous to a graph. A graph has nodes and edges. Directed graphs have nodes and directed edges.

Thesis: Modeling an AI's understanding of the world (data structure) using nodes and edges will give rise more naturally to intelligent algorithms.


Relationship
July 8

Thesis: One of the fundamental properties of the universe is relationship.

In both the physical and spiritual world, the idea of relationship looms large. It is at the very fabric of the universe.

Without relationship, numbers couldn't exist, space and time couldn't exist -- pretty much nothing except for perhaps a singularity, and even that doesn't make much sense.


Encoding a relationship using nodes and edges
July 8

A relationship consists of three things: Two entities and their relation:

(entity1) (relation) (entity2)

An ambiguous relationship between two entities can be modeled using nodes and edges as follows:

(entity1) --- (entity2)

This is a binary relationship, but what we're really after is a three-way, directed relationship. For example, "Daniel likes software":



This solves the problem of encoding the information, but we're already rubbing up against our desire to stick to only a directed graph representation, which doesn't include directed ternary edges. We'll let this slide for the moment.


Has AI delivered?
July 8

Here's a good article that talks about the lofty promises of AI, which go back as far as the 50s, and have yet to materialize. The article also gives some examples of recent AI successes. One of the summary points is that the classical definition of AI isn't where most of the progress is being made. Rather, it is in specialized systems.

A rather astonishing figure is that Microsoft Research employs nearly 1,000 PhDs that are actively working in the field of AI. Wow. That's about a factor of 10 larger than I would have guessed.


The is_a relationship
July 9

Perhaps the most fundamental relationship is the is_a relationship. Some examples:

(grass) (is_a) (plant)

(Daniel) (is_a) (person)

This directional relationship can be used in two different contexts:

1. To indicate that one general concept builds upon another. ex. (grass) (is_a) (plant)

2. To indicate that a specific object in the world is of a certain type. ex. (Daniel) (is_a) (person)

The difference here is that (grass) is a general concept whereas (Daniel) is a real thing.


The has_a relationship
July 9

The is_a relationship allows us to encode a lot of information about the world, but it only gets us so far. The next logical step is to introduce the has_a relationship, which allows us to describe what something consists of. Some examples:

(person) (has_a) (name)

(dog) (has_a) (tail)


Challenges with has_a
July 9

The difficulty with the has_a relationship becomes apparent when you try to represent the fact that a person has two eyes. The following doesn't do the trick, and raises the question: Does "has_a" imply that an entity has just one of the specified thing?

(person) (has_a) (eye)

In this instance, we can be clever and say:

(left_eye) (is_a) (eye)
(right_eye) (is_a) (eye)
(person) (has_a) (left_eye)
(person) (has_a) (right_eye)

But this approach doesn't work in the general case.

The has_a relationship actually represents a couple of related ideas:

1.
has(n), where n is a number from 0 to infinity
has(0) means "has none of"
has(1) means "has one of"
has(2) means "has two of"
etc.

2.
has(n+), where n is a number from 0 to infinity
has(0+) means "has zero or more of"
has(1+) means "has one or more of"
has(2+) means "has two or more of"
etc.

This gives us a lot more flexibility. For example:

(person) (has(2)) (eye)

(person) (has(0+)) (sister)

If you're uncomfortable with the fact that "eye" and "sister" aren't plural here, you need not worry: That is a carry over from the English language which isn't necessary to encode the bare meaning.

Although the use of has(n) and has(n+) does the trick in these cases, they represent a strict departure from one of our primary goals, which is to be able to represent knowledge with little more than a directed graph. This example underscores how tempting it is, even early on in the design process of an AI's knowledge representation, to abandon a directed graph representation.

A problem highlighted here is how we deal with relationships that are parameterized. Is there a way to represent parameterized relationships using a graph? In a sense, we already have, since every ternary relationship has one parameter: The relationship itself. (Remember that the fundamental relationship is the binary relationship which simply says that there is a relationship between two things but doesn't specify what that relationship is)

What is emerging here is that a relationship probably isn't best represented using just an edge of a graph. Instead, each instance of a relationship needs to be represented by its own node in the graph. This allows us to attach properties to the relationship itself.

Watered down has_a

Another strategy is to water down the has_a relationship to mean "has zero or more of". The problem with this is that it doesn't tell us a whole lot. The advantage is that it maintains the goal of using a directed graph.

Specific but limited has_a

A third strategy is to allow more precision but in a limited way by using specific relationships rather than generalized ones:

has_1
has_2
has_3
has_4
...
has_0+

This is probably the weakest long-term solution, but there is perhaps some merit in adopting its simplicity for the sake of not abandoning our directed graph mandate while allowing for forward progress.

Complex "has"

What is being hinted at here is that we need to represent the has relationship with its own node and then attach properties to it.


Representing relationships with nodes
July 9

To allow us to attach properties to instances of relationships, relationships must themselves have a node.



In the above diagram, the white circle in between (person) and (eye) is itself a node, and so we're back to being able to represent a relationship with a simple directed graph.

We can represent this notationally like this:

person has:2 eye

Complex relationships

How could we model "has zero or more"? One possibility:



This could be represented textually as:

person has:(>:0) sister
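
As a rough sketch of the idea above, here is how the notation might be read into a relationship node that carries its own properties. The RelationshipNode class, the parse function, and the "count" property name are all made up for illustration; they aren't part of the representation itself.

```python
from dataclasses import dataclass, field


@dataclass
class RelationshipNode:
    """One instance of a relationship, stored as its own node (hypothetical class)."""
    subject: str
    relation: str
    obj: str
    properties: dict = field(default_factory=dict)


def parse(statement):
    """Read notation such as 'person has:2 eye' into a RelationshipNode."""
    subject, relation, obj = statement.split()
    name, _, param = relation.partition(":")
    properties = {}
    if param.startswith("("):
        # A parameter such as (>:0) becomes an (operator, number) pair.
        op, _, number = param.strip("()").partition(":")
        properties["count"] = (op, int(number))
    elif param:
        # A bare number such as 2 is treated as an exact count.
        properties["count"] = ("=", int(param))
    return RelationshipNode(subject, name, obj, properties)


print(parse("person has:2 eye"))
print(parse("person has:(>:0) sister"))
```

The point of the sketch is only that the number hangs off the relationship instance rather than off the edge type, so has doesn't need to multiply into has_1, has_2, and so on.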


The layers of an AI
July 9

An AI, in the classical sense, has different layers, and each of those layers represents an area of study with its own data structure and algorithms.

Core layer

The core of the AI is the data structure in which it represents its understanding of the world. Paired with this are a set of algorithms that allow basic operations to be performed on the data structure, and building on top of that are algorithms to use the data structure to evaluate whether an idea "makes sense", to answer questions via reasoning, etc.

Language in layer

Beyond this core is the need to interface with language as an input. This is a substantially different problem and requires new data structures and algorithms, but this layer has a strong dependency on the core layer to evaluate whether a possible interpretation of a phrase makes sense, and ultimately to store the resultant understanding.

Language out layer

AI doesn't just need to accept language as an input, it also needs to use language as an output medium.

Auditory in layer

Translating the spoken word into a textual representation.

Auditory out layer

Translating text into the spoken word.

Social layer

Any social agent in this world needs more than intelligence and language abilities: It needs an understanding of how to behave appropriately. For example, having a conversation is quite a complex interaction, with many unwritten rules.

Vision layer

Creating a mental model of a spatial environment via image analysis.

Personality layer

Perhaps least important, but still of interest, is the personality layer, perhaps mostly for its relationship with the social layer. How does an AI add color to its personality? Is this something that happens implicitly, or does it represent another layer of complexity that needs to be developed?

My areas of interest

I'm personally the most interested in the core and language in layers.


Graphing
July 9

To make graphing more practical, the fact that relationships have their own node will be an implied understanding. We'll also use has_a to mean has:1. For example:



The full graph would have been:




Textual representation
July 10

Although graphing can be useful for reinforcing our directed graph representation, we ultimately need a textual representation.

Terminology

Keywords: A keyword is an alphanumeric string that may contain underscores but not spaces. We'll also use symbols such as = as keywords.

Entity: An entity is analogous to a node in a graph. A keyword can be associated with an entity, but the same keyword can't be used to identify more than one entity.

Connection: A connection is analogous to an edge, and can be non-directional, uni-directional, or bi-directional.

Relationship: A relationship has four parts:
A node to represent the instance of the relationship
A non-directional connection from the first entity (ex. Daniel)
A connection to the second entity (ex. person). This is often a directional connection, but can also be non-directional.
A non-directional connection to the type of relationship (ex. is_a)

Entities

An entity can be defined by simply listing a keyword on its own line. For example, the following six entities can be defined like this:

is_a
has
1
Daniel
person
first_name

Note that while entities can be explicitly defined, they are implicitly defined if and when they are first used in a connection or relationship.

Connections

A non-directional connection can be defined like this:

x - y

This results in a non-directional connection between x and y.

A uni-directional connection can be defined like this:

x -> y

A bi-directional connection can be defined like this:

x <-> y

Note that while connections can be explicitly defined, they are most often implicitly defined when relationships are used.

Relationships

A non-directional relationship can be defined like this:

Daniel friend Graham

This implies the following entities:

Daniel
friend
Graham

It also implies the following connections, where x is an un-named node in the graph:

Daniel - x
x - friend
x - Graham

A uni-directional relationship can be defined like this:

Daniel is_a> person

The greater than sign acts like an arrow to represent the direction. This implies the following entities:

Daniel
is_a
person

It also implies the following connections, where x is a new, un-named node in the graph:

Daniel - x
x - is_a
x -> person

Since certain relationships, such as is_a and has are used so often, we will hard code into our system the fact that they are directional so that we don't have to specify the arrow each time.
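
The expansion from a relationship statement to its implied connections can be sketched in a few lines. This is only an illustration under my own assumptions: the expand function, the tuple format for connections, and the default node name "x" are invented here.

```python
def expand(statement, node="x", directional=("is_a", "has")):
    """Expand 'A rel B' or 'A rel> B' into the three connections it implies.

    Connections are returned as (left, connector, right) tuples, where the
    connector is "-" (non-directional) or "->" (uni-directional).
    """
    first, relation, second = statement.split()
    directed = relation.endswith(">")
    relation = relation.rstrip(">")
    # Relations that are hard coded as directional don't need the arrow.
    directed = directed or relation in directional
    return [
        (first, "-", node),
        (node, "-", relation),
        (node, "->" if directed else "-", second),
    ]


for connection in expand("Daniel friend Graham"):
    print(*connection)
for connection in expand("Daniel is_a> person"):
    print(*connection)
```

With is_a in the hard-coded list, "Daniel is_a person" and "Daniel is_a> person" expand to the same connections.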


Assigning values
July 10

Given the following graph, how do we assign a value for Daniel's age?



Assigning values to a path

One option is to allow a value to be assigned to a path of the graph. For example, the path:

Daniel -> person -> age

... could be assigned the value 27.

Although simple and intuitive, this approach abandons the directed graph representation, since there is no direct way to represent a path in a graph having a value.

A directed graph representation

We can stick with our directed graph representation as follows:



Notice that the value assignment involves a relationship within a relationship: The entity "Daniel" is related to "age", and the definition of that relationship instance is an equality relationship with "27".

This works, but is more complex than seems necessary. In terms of the graph representation, it can be represented by:

Daniel x age
x = 27

Where x is a new, un-named node. Note also that here, = does not represent any new notation and instead is simply a relation between two things uniquely referenced by the = keyword.

An alternative graph representation



This representation is simpler, and can be represented using two directed connections:

Daniel -> 27
27 -> age

It also allows us to represent values for has:n relationships quite easily. For example:



i.e. Daniel has two sisters, Rebekah and Hannah.

A textual representation

For ease of use, we'll define value assignment as:

Daniel.age = 27

In cases where more than one value is required, we'll use:

Daniel.sister[0] = Rebekah
Daniel.sister[1] = Hannah

Note that at the moment this is a bit dishonest, since there is no inherent order among multiple values in the way they are represented in the graph.
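
Here is a minimal sketch of the alternative graph representation, where a value assignment is nothing more than two directed connections. The assign and values helpers and the edges set are names I've made up for the example. Lookups return a set, which reflects the lack of inherent order among multiple values.

```python
edges = set()  # directed connections stored as (source, target) pairs


def assign(entity, attribute, value):
    """Record entity.attribute = value as entity -> value and value -> attribute."""
    edges.add((entity, value))
    edges.add((value, attribute))


def values(entity, attribute):
    """Return every node v such that entity -> v and v -> attribute (unordered)."""
    return {v for (source, v) in edges
            if source == entity and (v, attribute) in edges}


assign("Daniel", "age", "27")
assign("Daniel", "sister", "Rebekah")
assign("Daniel", "sister", "Hannah")

print(values("Daniel", "age"))
print(values("Daniel", "sister"))
```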

A note about node labels

To make graphs readable, we have been labeling nodes with their associated keyword. But this is merely an aid for the reader, and is not a part of the representation itself. Thus, our current representation is more accurately graphed as follows:



This raises an important point: Until we connect our representation to language in some way, shape, or form, it can't do very much.

A note about collisions

Consider the following:

Daniel.first_name = Daniel

This is a circular reference, since the "Daniel" entity is used twice. The problem is that the two references to "Daniel" are actually intended to reference different things:

1. The first use is intended to reference the entity Daniel which is a person
2. The second use is intended to reference the entity Daniel which is a word (a string of characters)

This highlights how using keywords to uniquely identify entities becomes tricky due to collisions. It also hints at the need already to connect our representation with language, since "Daniel" is a string of characters. ie. A word.


Representing names (words)
July 10

Although we haven't formally started designing the language component of the AI, we're already needing to reference names, which are words. Words are strings of letters.

Textual representation

A word can be represented by surrounding it with double quotes. For example:

"Daniel"

Graphical representation

The keyword used to uniquely identify a word entity is the same as its textual representation. ie. The word surrounded by double quotes. Example:




Representing variant structure
July 10

Using the has relationship to represent structure allows us to represent a lot about the world, but sometimes the structure of something can vary.

For example, the following isn't always true:

man has:1 wife

How can we represent the fact that a guy can be married or single?

Using weak relationships

The first option is to weaken all relationships so that they represent the way things might be. That way, the following...

man has:0 wife
man has:1 wife

... means:

Maybe a man doesn't have a wife
Maybe a man has a wife

Using scalar weighted relationships

A second option is to optionally assign a scalar value between 0 and 1 to a relationship to indicate roughly how often it is the case.

0.4: man has:0 wife
0.6: man has:1 wife

... means:

Approximately 40% of the time, a man doesn't have a wife
Approximately 60% of the time, a man has a wife

As usual, this represents a bit of a challenge to the directed graph representation, but this is a case where I feel it is appropriate and perhaps analogous to neural connections having weights.


Summary: Entities, relationships, is_a, has, and assignment
July 10

The framework we've laid out so far will allow us to model quite a bit about the world. For example, we can model the following:

Daniel is_a man
man is_a person
person has_a age
Daniel.age = 27
person has_a first_name
Daniel.first_name = "Daniel"
person has_a last_name
Daniel.last_name = "Bigham"
0.4: man has:0 wife
0.6: man has:1 wife
Daniel has:1 wife
wife is_a woman
woman is_a person
Meredith is_a wife
Daniel.wife = Meredith
Meredith.first_name = "Meredith"
Meredith.last_name = "Bigham"
0.4: person has:0 sister
0.6: person has:(>:0) sister
Daniel has:2 sister
sister is_a woman
Rebekah is_a sister
Hannah is_a sister
Daniel.sister[0] = Rebekah
Daniel.sister[1] = Hannah
Rebekah.first_name = "Rebekah"
Rebekah.last_name = "de Bueger"
Hannah.first_name = "Hannah"
Hannah.last_name = "Bigham"



The need for language
July 14

Developing a data structure for an AI's knowledge about the world is the first step, but until it is connected to the real world via a "language in" layer, it can't do very much. People need to be able to interact with it.

The basic setup that is required is to have a language layer that parses simple English statements and then updates the AI's data structure to reflect what was said. What makes this interesting is that, inevitably, the data structure is also involved in the process of language parsing.

A more advanced language layer also allows simple questions to be asked, completing the loop, so that a human participant can make a statement and then ask a question to test whether the AI understood the statement.

For example:

Statement: "My name is Daniel"
Data structure representation: speaker.first_name = "Daniel"

Question: "What is my name?"
Answer: "Daniel"


An example of language parsing
July 14

Before we get too generic about how to parse language, let's start with a simple problem: How to parse the statement "My name is Daniel" and update our data structure to reflect this statement.

Defining inputs and outputs

Our input is a list of words:

"My", "name", "is", "Daniel"

Our desired output is a value assignment that will modify our data structure:

speaker.first_name = "Daniel"

... where speaker is a new entity in the data structure to represent the person we're conversing with.

Transformations

We can achieve our goal by applying a series of transformations to the list of words. For instance:

my {noun} -> speaker.$1

What this says is that if the word "my" is followed by a noun, that could be referring to speaker.noun. In our example:

my name -> speaker.name

However, we have a bit of an issue since we want speaker.first_name. What this highlights is that "name" is still a word; it hasn't been mapped to an entity in the data structure yet. What we need is a mapping from words to entities. In our example, we want:

"name" -> first_name

Thus, when a transformation such as "my {noun} -> speaker.$1" is applied, a second step will be resolving the noun to its possible entities.

What we have now is an intermediate representation, such as:

[speaker.first_name] is Daniel

The first part, speaker.first_name, is fully transformed, but the rest of the statement is still a list of words. The next transformation we need is:

{noun} is {word} -> $1 = $2

What this says is that if the word "is" gets placed between a noun and a word, that could mean that the word defines the value of the noun. In our example, the word "Daniel" defines the value of "My name":

[speaker.first_name] is Daniel -> speaker.first_name = "Daniel"
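
The two transformations above can be sketched in code. This is a deliberately hard-wired illustration, not a general engine: the parse_statement function, the NOUNS set, and the ("ref", ...) tuple used to mark an already-transformed fragment are my own inventions for the example.

```python
WORD_TO_ENTITY = {"name": "first_name"}  # the word/entity mapping used above
NOUNS = {"name"}                         # words we treat as nouns


def parse_statement(words):
    """Apply 'my {noun} -> speaker.$1', then '{noun} is {word} -> $1 = $2'."""
    tokens = list(words)

    # Transformation 1: my {noun} -> speaker.$1, resolving the noun to an entity.
    if len(tokens) >= 2 and tokens[0].lower() == "my" and tokens[1].lower() in NOUNS:
        entity = WORD_TO_ENTITY[tokens[1].lower()]
        tokens[0:2] = [("ref", f"speaker.{entity}")]

    # Transformation 2: {noun} is {word} -> $1 = $2
    if len(tokens) == 3 and isinstance(tokens[0], tuple) and tokens[1] == "is":
        return f'{tokens[0][1]} = "{tokens[2]}"'

    return None  # no sequence of transformations applied


print(parse_statement(["My", "name", "is", "Daniel"]))
```

The intermediate list after the first step corresponds to the "[speaker.first_name] is Daniel" form: part of it is fully transformed and the rest is still plain words.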


The language-in layer
July 15

In summary:

The language-in layer's input is a list of words.

The language-in layer's output is one or more assignments that can be applied to the AI's data structure.

Transformations are applied iteratively to convert inputs to outputs. Each transformation consists of an input specification and an output specification. For example:
{noun} is {word} -> $1 = $2

There will be many cases where it will be ambiguous which transformation to apply. ie. There will be more than one possibility. A depth-first or breadth-first search will need to be employed here, possibly using heuristics to determine which transformations to try first.

Some transformations imply additional work:
Output specifications that contain x.$1 imply that $1 needs to be mapped from a word to an entity before the transformation can be applied.
Output specifications that contain x = y imply that y need not necessarily be mapped to an entity. In some cases, it will remain a string. For example:
speaker.first_name = "Daniel"

Transformations require two major data sets:
A mapping from words to entities. This highlights that one word might map to several different entities. When a word is encountered, it is ambiguous which entity it represents until the context is taken into account.
For each entity that represents a word, we need to define whether it is a noun, verb, etc.

This is only a very basic outline but gets the ball rolling.
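
To make the search idea concrete, here is a small breadth-first sketch. It cheats by treating transformations as regular-expression rewrites on a string rather than on a list of words, and the second rule (mapping "my name" to last_name) is a hypothetical ambiguity added only so that there is something to search over.

```python
import re
from collections import deque

# (pattern, replacement) pairs standing in for transformations.
TRANSFORMATIONS = [
    (r"\bmy name\b", "[speaker.first_name]"),
    (r"\bmy name\b", "[speaker.last_name]"),   # hypothetical competing reading
    (r"^\[(\S+)\] is (\w+)$", r'\1 = "\2"'),
]


def interpretations(sentence, max_depth=5):
    """Breadth-first search over transformations; collect every assignment reached."""
    results = []
    seen = {sentence}
    queue = deque([(sentence, 0)])
    while queue:
        form, depth = queue.popleft()
        if " = " in form:            # an assignment is a finished interpretation
            results.append(form)
            continue
        if depth == max_depth:
            continue
        for pattern, replacement in TRANSFORMATIONS:
            new_form, replaced = re.subn(pattern, replacement, form, count=1)
            if replaced and new_form not in seen:
                seen.add(new_form)
                queue.append((new_form, depth + 1))
    return results


print(interpretations("my name is Daniel"))
```

Both readings come back as candidates. Choosing between them is where heuristics, or the core layer's sense of what "makes sense", would have to come in.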


The language-in data structure
July 15

Introduction

Separate from the AI's core data structure is its language data structure. We can represent the language data structure using a separate textual representation.

Mapping words to entities

We can represent mappings like this:

"dog" -> dog
"cat" -> cat
"house" -> house
"house" -> house_v

Notice how the word "house" maps to two different entities: House the noun and house the verb.

Defining nouns, verbs, etc.

We need to define the entities that words map to as being a noun, verb, etc. We will modify the above representation to do this inline:

"dog" -> dog: noun
"cat" -> cat: noun
"house" -> house: noun
"house" -> house_v: verb

Although our noun/verb/etc. designations fall into the realm of the language-in textual representation, they end up getting applied as is_a relationships in the core data structure.
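
Loading this textual representation is straightforward. The sketch below assumes exactly the line format shown above; the load_lexicon function and the choice of a dictionary of lists are mine.

```python
from collections import defaultdict

LEXICON_TEXT = '''
"dog" -> dog: noun
"cat" -> cat: noun
"house" -> house: noun
"house" -> house_v: verb
'''


def load_lexicon(text):
    """Map each word to a list of (entity, part of speech) pairs."""
    lexicon = defaultdict(list)
    for line in text.strip().splitlines():
        word, _, rest = line.partition(" -> ")
        entity, _, kind = rest.partition(": ")
        lexicon[word.strip('"')].append((entity, kind))
    return dict(lexicon)


lexicon = load_lexicon(LEXICON_TEXT)
print(lexicon["house"])
```

Because each word maps to a list, an ambiguous word like "house" simply carries more than one candidate entity until context picks one.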


Asking questions
July 15

A nice thing to have is the ability to query the AI with questions. This allows a statement to be posed and then a question to be asked to determine whether the AI understood the statement.

Parsing a question uses transformations much like parsing a statement. The only difference is that our end goal is a reference to an entity's value rather than an assignment.

For example:

What is my name?

We already have transformations that will get us to:

What is [speaker.first_name]?

Next we apply:

what is {noun} -> $1

Which results in:

[speaker.first_name]?

A built-in rule is that any value followed by a question mark prompts that the value be output:

"Daniel"


Stories
July 15

The basis for interacting with a basic AI is to tell it stories and then to ask questions to evaluate its understanding. I say stories, but in the most trivial sense, a story is just a simple statement about the world. Our first story is as follows:

My name is Daniel

Our question that follows is:

What is my name?

And the expected answer is:

Daniel

More advanced stories might consist of several statements and several questions, each question having an "expected answer".

In this way, stories act as unit tests and provide a method of regression testing. Progress is made by giving an AI the ability to understand new stories without breaking its understanding of previous stories.
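
A story harness along these lines might look like the following sketch. The run_story function and the toy_tell / toy_ask stand-ins are invented for illustration; the stand-in "AI" understands nothing beyond story 1.

```python
def run_story(tell, ask, statements, expectations):
    """Feed statements to the AI, then return the questions it answered wrongly."""
    for statement in statements:
        tell(statement)
    failures = []
    for question, expected in expectations:
        actual = ask(question)
        if actual != expected:
            failures.append((question, expected, actual))
    return failures


# A stand-in "AI" that only handles story 1.
memory = {}


def toy_tell(statement):
    prefix = "My name is "
    if statement.startswith(prefix):
        memory["speaker.first_name"] = statement[len(prefix):]


def toy_ask(question):
    if question == "What is my name?":
        return memory.get("speaker.first_name")
    return None


failures = run_story(toy_tell, toy_ask,
                     ["My name is Daniel"],
                     [("What is my name?", "Daniel")])
print("failures:", failures)
```

Re-running every earlier story after each change is what turns the collection of stories into a regression suite.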


Story 2: I am 27
July 15

Story: I am 27
Question: How old am I?
Answer: 27

Transformation 1

I am {#}. -> speaker.age = $1

We introduce a special notation for referring to numbers: {#}

We also introduce the use of a period in the input specification to imply that it represents a whole statement. This prevents it from being used to parse the following: "I am one step away from grounding you for the whole week!".

Transformation 2

How old am I -> speaker.age

For the time being we are using a very simplistic parsing strategy for a statement like this. But this highlights how we can progress in a very iterative manner, starting with simplistic transformations and revisiting them later as needed.

Alternative transformations

The following represents another, more generic, possibility:

How old {noun} -> $1.age

am I -> speaker
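
Transformations 1 and 2 of this story could be sketched with regular expressions, as below. This is only one possible reading: the tell and ask functions and the facts dictionary are hypothetical names, \d+ stands in for {#}, and matching the whole string plays the role of the period in the input specification.

```python
import re

facts = {}  # stands in for the AI's data structure


def tell(statement):
    """I am {#}. -> speaker.age = $1 (must match the whole statement)."""
    match = re.fullmatch(r"I am (\d+)\.?", statement.strip())
    if match is None:
        return False
    facts["speaker.age"] = int(match.group(1))
    return True


def ask(question):
    """How old am I -> speaker.age"""
    if question.strip().rstrip("?") == "How old am I":
        return facts.get("speaker.age")
    return None


print(tell("I am 27."))
print(ask("How old am I?"))
print(tell("I am one step away from grounding you for the whole week!"))
```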


Story 3: My wife's name is Meredith
July 15

Story: My wife's name is Meredith
Question: What is my wife's name?
Answer: Meredith

Transformation 1

{noun}'s {noun} -> $1.$2

Application:

wife's name -> wife.first_name

Note: This uses the following word/entity mapping:

"name" -> first_name

Transformation 2

From story 1:

my {noun} -> speaker.$1

Application:

my [wife.first_name] -> speaker.wife.first_name

Required knowledge about the world:

Before we can apply this transformation, we need to verify that speaker.wife makes sense. This requires the following knowledge about the world:

speaker is_a person
0.5: person is_a man
0.5: person is_a woman
0.4: man has:0 wife
0.6: man has:1 wife

Transformation 3

From story 1:

{noun} is {word} -> $1 = $2

Application:

speaker.wife.first_name = "Meredith"


Story 4: I live in Waterloo
July 15

Story: I live in Waterloo
Question: Where do I live?
Answer: Waterloo

Note: We'll cheat for the time being by using an overly simplified transformation.

Required knowledge about the world

Waterloo is_a city
"Waterloo" -> Waterloo: noun

Transformation 1

I live in {city} -> speaker.residence.city = $1

Transformation 2

Where do I live -> speaker.residence.location

But we don't know the value of speaker.residence.location, so we're stuck. This highlights the need to be able to derive facts from other related facts. What we need is a rule that says: "If I know a person's city of residence, then I can derive in a more abstract sense where they live".


Rules
July 16

If an AI knows the value of speaker.residence.city, then it should be able to derive the value of speaker.residence.location. ie. Generalize.

This represents a whole area of research: How do we give an AI with some knowledge about the world the ability to deduce other things?

One strategy is to employ rules.

Rules are like transformations in that they have an input specification, called the conditions, and an output specification, called the implication.

Conditions:

$x is_a residence
$x.city = $y

Implication:

$x.location = $y

What this says is that if we have an entity in our data set that is a residence, and its city attribute has a value, then we can deduce more generally the location of that residence as being the same.

Rules are used in reverse, however: We start with the problem, "How can I deduce speaker.residence.location", and search for rules that can imply that information. When we find one, we evaluate whether we can meet the conditions necessary to make the implication.

Textual representation

rule: $x is_a residence & $x.city = $y -> $x.location = $y

Alternate form:

rule:
  $x is_a residence
  $x.city = $y
  -> $x.location = $y
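
Using a rule in reverse can be sketched as follows. The Rule class here is a hypothetical, narrowed-down shape that only fits rules like the one above (one is_a condition, one attribute copied to another), and the types and values dictionaries stand in for the core data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    required_type: str  # $x is_a <required_type>
    source_attr: str    # $x.<source_attr> = $y
    implied_attr: str   # -> $x.<implied_attr> = $y


RULES = [Rule("residence", "city", "location")]

types = {"speaker.residence": "residence"}       # is_a facts
values = {"speaker.residence.city": "Waterloo"}  # value assignments


def derive(path):
    """Return the value at path, working backwards through rules if it isn't stored."""
    if path in values:
        return values[path]
    entity, _, attr = path.rpartition(".")
    for rule in RULES:
        if rule.implied_attr != attr:
            continue  # this rule can't imply what we're looking for
        if types.get(entity) != rule.required_type:
            continue  # the is_a condition isn't met
        source = derive(f"{entity}.{rule.source_attr}")
        if source is not None:
            return source
    return None


print(derive("speaker.residence.location"))
```

The search starts from the thing we want (location), finds a rule whose implication could supply it, and only then checks the conditions.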


Story 5: I am 27. What year was I born in?
July 16

Story: I am 27.
Question: What year was I born in?
Answer: 1980 or 1981 (assuming the current year is 2008)

We've already parsed the statement. The new challenge is answering the question, which requires us to derive speaker.birth.date.year from speaker.age. What makes this even more complicated is that the answer could be one of two years, depending on what month and day the person was born in.

Required knowledge about the world

person has_a birth
birth is_a event
event has_a date
date has_a year
date has_a month
date has_a day
"I" -> speaker
"born" -> birth: noun

Transformation 1

what {fragment} -> $1

The word "what" doesn't do a whole lot other than make it clear to the reader that a question is being asked. We can drop it. We introduce the term {fragment} to mean "list of words".

Application:

What year was I born in? -> year was I born in?

Transformation 2

{noun} was {noun} {verb} {participle} -> $2.(($3 $4 $1))

Application:

year was I born in -> speaker.((born in year))

Here we introduce another intermediate form that is partly represented in English. The two round brackets indicate that the English fragment inside has yet to be evaluated, but represents a property of speaker.

Transformation 3

{verb} in {year} -> $1.date.$2

Application:

speaker.((born in year)) -> speaker.birth.date.year

Of interest here is the mapping of the word "born", which is a verb, to the noun birth.

Rules

Although we can transform the question to speaker.birth.date.year, we don't know its value. We only know speaker.age, so we'll need to use rules to derive speaker.birth.date.year.

0.5: rule:
  $x.age = $y
  -> $x.birth.date.year = current_date.year - $x.age

0.5: rule:
  $x.age = $y
  -> $x.birth.date.year = current_date.year - $x.age - 1

Aside from introducing some math into our rules, what is interesting here is that determining a person's birth year given their age isn't as simple as subtracting their age from the current year.

To accommodate this, we also introduce a sort of weight with each rule, which says that there is roughly a 50% chance that the rule can be applied. Thus, when the AI tries to derive speaker.birth.date.year, it will find both rules, and because it will be ambiguous which rule to apply, it must report both answers as possibilities.
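
As a rough illustration, the two weighted rules can be sketched in Python (the project itself is planned in C#). The function name and passing the current year as an argument are assumptions made for this example only.

```python
def birth_year_candidates(age, current_year):
    """Return one (weight, year) pair per rule; both rules match any known age."""
    return [
        (0.5, current_year - age),      # birthday has already passed this year
        (0.5, current_year - age - 1),  # birthday is still to come this year
    ]


candidates = birth_year_candidates(27, 2008)
print(sorted(year for _, year in candidates))  # [1980, 1981]
```

Because neither rule outweighs the other, a caller has no basis for picking one and would report both years.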


current_date
July 17

The introduction of the current_date entity is especially interesting. In people, this entity might be much like any other, but in designing an AI, it's tempting to wire it into the computer's clock.

It's strange to think what it would be like to know the time intuitively.


Story 6: I was born in Canada
July 17

Story: I was born in Canada.
Question: What country was I born in?
Answer: Canada

Required knowledge about the world

person has_a birth
birth is_a event
event has_a location
0.9: location has_a country
"Canada" -> Canada: noun
Canada is_a country
country is_a location
"born" -> birth: noun

Parsing the statement

{noun} was {fragment} -> $1.(($2))
I was born in Canada -> speaker.((born in Canada))

{verb} in {location} -> $1.location = $2
speaker.((born in Canada)) -> speaker.birth.location = Canada

Parsing the question

what {fragment}? -> $1
What country was I born in? -> country was I born in

{noun} was {noun} {verb} {participle} -> $2.(($3 $4 $1))
country was I born in -> speaker.((born in country))

{verb} in {location} -> $1.$2
speaker.((born in country)) -> speaker.birth.country

Comments

Of interest are the following two transformations:

{verb} in {location} -> $1.location = $2
{verb} in {location} -> $1.$2

The first is used to parse a statement, the second to answer a question. What makes the second transformation more interesting is that it takes an intermediate form as its input. In this example, the fragment "born in country" wasn't a part of the original question and isn't quite valid English.

Also of interest is that the question resolves to speaker.birth.country rather than speaker.birth.location.country. Under the covers, the engine needs to search the has_a graph to determine which child of birth has a country node. The successful search path is:

birth is_a event
event has_a location
location has_a country
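
One way such a search could work is sketched below in Python. The dictionary layout and the names ancestors and find_path are assumptions for illustration; the engine's real data structures may look quite different.

```python
# Relationships from this story, stored as simple adjacency lists.
IS_A = {"birth": ["event"], "Canada": ["country"], "country": ["location"]}
HAS_A = {"person": ["birth"], "event": ["location"], "location": ["country"]}


def ancestors(entity):
    """Yield the entity itself, then everything it is_a, depth-first."""
    yield entity
    for parent in IS_A.get(entity, []):
        yield from ancestors(parent)


def find_path(owner, target, seen=None):
    """Return the chain of has_a properties leading from owner to target."""
    seen = set() if seen is None else seen
    if owner in seen:          # guard against has_a / is_a cycles
        return None
    seen.add(owner)
    for kind in ancestors(owner):
        for prop in HAS_A.get(kind, []):
            if prop == target:
                return [prop]
            rest = find_path(prop, target, seen)
            if rest is not None:
                return [prop] + rest
    return None


print(find_path("birth", "country"))  # ['location', 'country']
```

The returned chain is what lets the short form birth.country stand in for birth.location.country.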


AI and search engines
July 26

One of the most sensible uses of an AI that can read information, update a data structure to reflect that information, and then answer questions about that information, is a search engine. In fact, that's exactly what a search engine is, the difference being that today the internal representation that search engines use doesn't contain much meaning.

What this brings to mind is that the Internet is an important aspect of AI: It serves as the raw knowledge base for an AI to read to learn about the world. Now, this has some serious limitations given that anyone can write whatever they want on the 'Net, but overall I think the Internet will serve as one of the primary means for AI's learning about the world.

Let's fast forward into the future and ask Google some questions that will test its understanding of the world:

Search: What is Bill Clinton's birthday?
Answer: August 19, 1946

Search: What year did World War II end?
Answer: 1945

Search: What month comes after November?
Answer: December

Search: How many feet are there in a yard?
Answer: 3

Search: What day of the week was it when man first landed on the moon?
Answer: ...


Powerset.com
July 26

A website that came to my attention a couple of months ago is www.powerset.com. This search engine has tried to make advances like I've just described. For example, when I search for "What year did World War II end?", it comes back with:

World War II: End date
September 2, 1945 (62 years ago)


Not bad. That's definitely an advance. What makes this especially impressive is that the search engine didn't suggest an article, but gave me the answer to my question directly as a value.

While this search engine is impressive, their implementation isn't very exhaustive. Other simple questions such as "How many feet are there in a yard?" don't return a specific value. As search engines continue to advance in the coming years, more and more of these questions will result in specific answers, and to push the boundaries, these search engines will need better language parsing and better internal representations of the world.

Here's a list of queries I tried on Powerset, and how it fared with each:

Ok: Bill Clinton's birthday
Fails: What is Bill Clinton's birthday?
Ok: What year was Bill Clinton born?
Ok: Bill Clinton's daughter
Ok: Who is Bill Clinton's daughter?
Fails: What is the name of Bill Clinton's daughter?
Ok: Who is Bill Clinton's wife?
Fails: Who was the first president of the United States?
Fails: First president of the United States
Fails: What year did man first go to the moon?
Ok: What is the capital city of Florida?
Ok: What is the capital city of Belgium?
Partial: Who invented the telephone?
Ok: What is the population of Mexico?
Fails: How many states are there in the United States?



Design VS Implementation
July 26

Something that's tempting for programmers is to, well, program. And so when we're in a research and design mode, there is a tension between not wanting to slow down the research by writing code, and not wanting to let theoretical ideas get too far down the road before they're implemented and tested.

If anything, I think that the temptation is to implement algorithms before necessary and so slow down research. Something that I've really enjoyed while designing a new programming language is letting my imagination have free rein without concerning myself with the implementation, and I think that can be a very good strategy for encouraging creativity and innovation.

That being said, I'm interested in implementing the AI research that I've completed thus far. This won't necessarily serve as a solid platform to keep building on, but it should provide a lot of insight into things that I've perhaps overlooked.


Implementation thoughts
July 26

I'd like to use C#
I'd like to have a web interface so that others can play with it
The basic web UI would simply allow statements and questions to be posed
A more advanced UI would allow the user to edit the AI's internal data structures
The input to the program will consist of the following:
A file that will represent the AI's knowledge of the world, consisting of entities, relationships, values, rules, etc.
A file that will represent the AI's mapping of words to entities
A file that will contain the linguistic transformations
A set of stories. Each story will consist of one or more statements, followed by one or more questions, each question having an expected answer.
For now, the data files will be parsed and read into memory at the start of the program's execution. Perhaps in the future they would be stored in a database.
I'll use NUnit to execute each of the stories to make sure that as new stories are added, old ones aren't broken.
I may implement a UI to make it easier to represent what the engine is doing. A big part of a successful implementation would be developing debugging tools... however, I don't know how much of that I want to get into at this point. I will likely create them as needed.

As for what to name the C# project, I'll go with Plato for now.

To make the implementation manageable, I'll break it down into "Exercises" that will iteratively get me closer to my goal.


Exercise 1: Publishing an ASP.NET website
July 27

Summary

Create an ASP.NET website with Visual Studio and publish it to the web.

Solution

Unfortunately, netfirms.ca would charge me an extra $10/month for ASP.NET hosting
Registered domain name http://platoai.com for $11/year
Signed up with http://webhostforasp.net for $4.50/month



Exercise 2: Entities, Relationships, and Values
July 27

Summary

Design an Entity class
An entity can have an id
Implement a Set method which allows a property of an entity to be given a value. For the time being, implement words as entities whose IDs are surrounded by single quotes. ex. 'Daniel'
Implement a Get method which returns the value of an entity's property
Implement an AddRelationship method
Design a Relationship class as a subclass of Entity
Design a Brain class which allows entities to be created via CreateEntity and relationships to be created via CreateRelationship. Make the Entity and Relationship classes internal to this class and expose their functionality with the interfaces IEntity and IRelationship.
Enforce entity IDs, if specified, to be unique.
Pass the following test case:
[Test]
public void Test1()
{
   // Relationship types are themselves entities.
   IEntity is_a = Brain.CreateEntity("is_a");
   IEntity has_a = Brain.CreateEntity("has_a");

   // Daniel is_a person; person has_a first_name.
   IEntity Daniel = Brain.CreateEntity("Daniel");
   IEntity person = Brain.CreateEntity("person");
   Brain.CreateRelationship(Daniel, is_a, person);
   IEntity first_name = Brain.CreateEntity("first_name");
   Brain.CreateRelationship(person, has_a, first_name);

   // Words are entities whose IDs are wrapped in single quotes.
   IEntity wDaniel = Brain.CreateEntity("'Daniel'");
   Daniel.Set(first_name, wDaniel);

   Assert.AreEqual("'Daniel'", Daniel.Get(first_name).Id);
}
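
For readers who want to experiment without a C# toolchain, the same shape can be sketched in Python. This is only an approximation of the exercise: the snake_case method names, the instance-based Brain, and the omission of the IEntity/IRelationship interfaces are all assumptions of this sketch.

```python
class Entity:
    def __init__(self, entity_id=None):
        self.id = entity_id
        self.relationships = []  # Relationship objects starting at this entity
        self.values = {}         # property entity -> value entity

    def add_relationship(self, relationship):
        self.relationships.append(relationship)

    def set(self, prop, value):
        self.values[prop] = value

    def get(self, prop):
        return self.values[prop]


class Relationship(Entity):
    """A relationship is itself an entity linking a source to a target."""

    def __init__(self, source, relation, target):
        super().__init__()
        self.source, self.relation, self.target = source, relation, target


class Brain:
    def __init__(self):
        self._by_id = {}

    def create_entity(self, entity_id=None):
        # IDs are optional, but must be unique when given.
        if entity_id is not None and entity_id in self._by_id:
            raise ValueError(f"duplicate entity id: {entity_id}")
        entity = Entity(entity_id)
        if entity_id is not None:
            self._by_id[entity_id] = entity
        return entity

    def create_relationship(self, source, relation, target):
        relationship = Relationship(source, relation, target)
        source.add_relationship(relationship)
        return relationship


brain = Brain()
is_a = brain.create_entity("is_a")
daniel = brain.create_entity("Daniel")
person = brain.create_entity("person")
brain.create_relationship(daniel, is_a, person)
first_name = brain.create_entity("first_name")
daniel.set(first_name, brain.create_entity("'Daniel'"))
print(daniel.get(first_name).id)  # 'Daniel'
```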

Solution

Click here


Exercise 3: is_a and has_a
July 29

Summary

Implement an IsA function for the Entity class which will perform a depth-first search on is_a relationships to determine whether an entity has a direct or indirect is_a relationship with another entity.
Implement a HasA function for the Entity class which will perform a depth-first search on the is_a relationship followed by a depth of one search of has_a.
Throw an exception from Entity.Get as well as Entity.Set if the requested property does not exist.
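
The two searches might look roughly like this in Python. The adjacency-list dictionaries and the extra "mammal" entity are invented for the example; only the search strategy follows the description above.

```python
IS_A = {"Daniel": ["person"], "person": ["mammal"]}
HAS_A = {"person": ["first_name"], "mammal": ["birth"]}


def is_a(entity, target, seen=None):
    """Depth-first search over is_a links, direct or indirect."""
    seen = set() if seen is None else seen
    if entity in seen:
        return False
    seen.add(entity)
    for parent in IS_A.get(entity, []):
        if parent == target or is_a(parent, target, seen):
            return True
    return False


def has_a(entity, prop, seen=None):
    """Walk up is_a depth-first, checking only direct has_a links at each level."""
    seen = set() if seen is None else seen
    if entity in seen:
        return False
    seen.add(entity)
    if prop in HAS_A.get(entity, []):
        return True
    return any(has_a(parent, prop, seen) for parent in IS_A.get(entity, []))


print(is_a("Daniel", "mammal"))      # True
print(has_a("Daniel", "first_name")) # True
print(has_a("Daniel", "tail"))       # False
```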

Test case:

Click here

Solution

Click here


has_a: Properties VS possessions
July 29

As I was working on an implementation for exercise 3 tonight, I realized that there is a difference between has_a in the sense of a property VS has_a in terms of a "possession".

Example of a property:

Daniel has_a nose

Example of a possession:

Daniel has_a dog

Where this difference comes into play is in the implementation of the HasA function, and can be highlighted by the following:

Test case 1:

Daniel has_a human_body
human_body has_a human_head
human_head has_a nose

The property Daniel.nose makes sense because each of the above has_a relationships describes a property, or part.

Test case 2:

Daniel has_a dog
dog has_a tail

The property Daniel.tail doesn't make sense because the relationship Daniel has_a dog is describing dog as a possession of Daniel, rather than as a property.
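
One possible way to capture the distinction is to store the two kinds of has_a separately and only follow the part-like kind transitively. The sketch below is a Python illustration; the names HAS_PART, HAS_POSSESSION and has_property are hypothetical and not part of the design so far.

```python
# Part-like has_a links chain together; possession links do not.
HAS_PART = {
    "Daniel": ["human_body"],
    "human_body": ["human_head"],
    "human_head": ["nose"],
    "dog": ["tail"],
}
HAS_POSSESSION = {"Daniel": ["dog"]}


def has_property(entity, prop):
    """True if prop is reachable from entity through part links only."""
    for part in HAS_PART.get(entity, []):
        if part == prop or has_property(part, prop):
            return True
    return False


print(has_property("Daniel", "nose"))  # True
print(has_property("Daniel", "tail"))  # False: the dog is a possession
print(has_property("dog", "tail"))     # True
```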


danielbigham.ca now supports rss
July 29

At long last, I set up RSS for my website. Each blog topic represents its own 'feed' which can be subscribed to with any modern browser or RSS reader.

I don't use RSS much myself because most of the websites that I check on a daily basis are such that I want to see the stories surrounded by the layout of the website, and they are updated frequently enough that I don't want to be notified each time something new is posted.

I guess where I should use RSS is to keep tabs on sites that only update once every few days, where I want a notification to go look for the new content.


Exercise 4: Parsing
July 30

Summary

Write a simple parser to parse input consisting of one or more lines according to the following grammar. The parser should then update the brain instance accordingly.

Define an entity:

{entity}

Define a relationship:

{entity1} {relation} {entity2}

Specify a value assignment:

{entity1}.{entity2} = {entity3}
{entity1}.{entity2}.{entity3} = {entity4}
etc.
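
A minimal Python sketch of a line parser for this grammar follows. It only classifies each line and returns a tuple; updating the brain instance is left out, and the tuple layout and function name are assumptions of the sketch.

```python
def parse_line(line):
    """Classify one input line as an entity, relationship, or assignment."""
    line = line.strip()
    if "=" in line:
        # {entity1}.{entity2}... = {entityN}
        left, right = (side.strip() for side in line.split("=", 1))
        return ("assign", left.split("."), right)
    tokens = line.split()
    if len(tokens) == 1:
        return ("entity", tokens[0])
    if len(tokens) == 3:
        return ("relationship", tokens[0], tokens[1], tokens[2])
    raise ValueError(f"cannot parse: {line!r}")


for text in ["Daniel", "Daniel is_a person", "speaker.birth.location = Canada"]:
    print(parse_line(text))
```

Checking for "=" first keeps the assignment form from being mistaken for a three-token relationship.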

Test cases

Click here

Solution

Click here

Web UI

Click here


Exercise 5: Language transformation classes
July 30

Summary

Implement classes to represent language transformations.

Specification

Click here

Interfaces:

IToken: A token.
IInputToken: Tokens that may be contained by a transformation's input specification.
IPropertyPathToken: Tokens that may be contained by a property, as part of its path.
IOutputPropertyPathToken: Tokens that may be contained by a property which are themselves contained in a transformation's output specification. (Permits variable tokens)

Required classes:

Transformation
TransInput
TransOutput
Token
LiteralToken
EntityTypeToken
EntityToken
PropertyToken
VariableToken
Fragment
FragmentToken

Enumerations:

TokenType: Literal, EntityType, Entity, Fragment, Variable, Property

Test cases

Click here

Solution

Click here


Algorithm: Applying transformations
July 31

For an introduction to these concepts, see: An example of language parsing

The language parsing framework that has been laid out thus far uses a set of transformations. Say this set is of size N, where N is a large number. Perhaps N = 1,000,000.

If our reduction algorithm, that is, the algorithm that reduces a sentence to our internal representation, is O(N^2), then it's not feasible. We don't want to spend on the order of 1,000,000,000,000 (a trillion) operations just to parse a simple sentence.

Even if our algorithm is O(N), we're spending on the order of a million or perhaps a billion operations to parse a simple sentence. This is feasible, but not desirable.

My intuition is that the algorithm should be at most O(log(N)), but more realistically constant time on the size of the transformation set. What this hints at is the use of a hash or tree to index the transformation set.

For example, the input specification:

{person} is {#}

... could be indexed on the first token, followed by the second token, followed by the third token. Determining whether a transformation exists for this input specification, then, involves a lookup in an index.
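
A token-by-token index of this kind is essentially a trie. The Python sketch below shows the idea; the class name and the placeholder output string "$1.age = $3" are assumptions made for illustration.

```python
class TransformationIndex:
    """Trie keyed on the tokens of each transformation's input specification."""

    def __init__(self):
        self.root = {}

    def add(self, input_tokens, output):
        node = self.root
        for token in input_tokens:
            node = node.setdefault(token, {})
        node[None] = output  # the None key marks a complete input specification

    def lookup(self, tokens):
        node = self.root
        for token in tokens:
            node = node.get(token)
            if node is None:
                return None
        return node.get(None)


index = TransformationIndex()
index.add(["{person}", "is", "{#}"], "$1.age = $3")
print(index.lookup(["{person}", "is", "{#}"]))  # $1.age = $3
print(index.lookup(["{person}", "is"]))         # None
```

The cost of a lookup grows with the number of tokens in the fragment, not with the number of transformations stored.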

The implication of this is that the first step in processing a sentence is to break each of its words down into all of the possible entities that they might represent, and then permuting those to create a long list of possible interpretations. This list could be extremely long, perhaps even O(W^5) or O(W^6), and hints at the need to process long sentences in chunks/fragments rather than as a whole. (15^6 = 11,390,625)

One curious fact about language is that people don't interpret statements as a whole, but rather from beginning to end, parsing as we go. When we listen to someone speak, we get served one word at a time and must assemble the meaning iteratively. Perhaps this is a hint to the parsing strategy that should be used. Could it be that human languages, at their very core, are designed to be parsed in an iterative manner from beginning to end?


platoai.com
August 1

Tonight I uploaded my first demo program to platoai.com, which is an ASP.NET account I've purchased from webhostforasp.net. It's an implementation of some core concepts I've been working on with regards to AI and language parsing.

As I complete further exercises, I'll upload additional programs.

Plato Version 1: Exercise 4


Exercise 6: Language transformation parser
August 5

Summary

With classes to represent language transformations, we can now write a parser.

Test cases

Click here

Solution

Click here


Exercise 7: Mapping words to entities
August 5

Summary

Before we can transform language, we need to convert words to entities. We can do this by defining word-to-entity mappings. For example:

'I' -> speaker: noun

What this says is that the word "I" might refer to the speaker entity, which is a noun. The specification that speaker is a noun is a convenience feature and simply ensures that the relationship speaker is_a noun exists. Any entity can be used in place of noun. For example: adjective, verb, adverb, etc.

Implement a representation and parser. Also implement a web UI that allows the user to specify one or more word-to-entity mappings to be parsed.

In addition to these mappings, allow the user to specify a word, which will return all of the entities that that word maps to. Implement this feature with classes called EntityTree and EntityTreeNode which will represent the tree of is_a relationships that stem from a word. The tree should be doubly linked so that, given a node, you can get either its children or its parent, and it should prevent duplicate tree nodes. The tree will contain a method named GetList which will traverse the tree and return a flat list of EntityTreeNodes.
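
The tree described above could be sketched in Python as follows. The is_a data (including "speaker is_a person") and the snake_case get_list name are assumptions for the example; the exercise itself calls for C# classes with a GetList method.

```python
class EntityTreeNode:
    def __init__(self, entity, parent=None):
        self.entity = entity
        self.parent = parent   # link back up the tree
        self.children = []     # links down the tree


class EntityTree:
    def __init__(self, root_entity, is_a):
        self._seen = {}        # entity -> node, used to prevent duplicates
        self.root = self._build(root_entity, None, is_a)

    def _build(self, entity, parent, is_a):
        node = EntityTreeNode(entity, parent)
        self._seen[entity] = node
        if parent is not None:
            parent.children.append(node)
        for target in is_a.get(entity, []):
            if target not in self._seen:
                self._build(target, node, is_a)
        return node

    def get_list(self):
        """Return every node as a flat list, in depth-first order."""
        result, stack = [], [self.root]
        while stack:
            node = stack.pop()
            result.append(node)
            stack.extend(reversed(node.children))
        return result


IS_A = {"'I'": ["speaker"], "speaker": ["person", "noun"], "person": ["noun"]}
tree = EntityTree("'I'", IS_A)
print([node.entity for node in tree.get_list()])
# ["'I'", 'speaker', 'person', 'noun']
```

Here noun is reachable two ways but appears only once, under the first parent that reached it.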

Test cases

Click here

Solution

Click here

Web UI

Click here


Exercise 8: Language transformations algorithm
August 5

Summary

Create a web application to allow a user to type a statement or question and have the system respond.

If at any time the system doesn't understand, no response should be given. (Only debug output: See below)

If a question is asked, and the system understands, the system should respond with the value that answers the question. If the system understands but doesn't know the answer, it should respond with "I don't know [...]", and fill in the square brackets with the path to the requested property. For example: "I don't know [speaker.age]". At this time, only questions that can be answered with a value are permitted.

If a statement is entered, and the system understands, it should update its data structure and respond with "Ok: ...", and fill in the appropriate assignment. For example: "Ok: speaker.age = 27".

A link will be provided named "Show details", which will show a hidden pane. This pane will show debug output, which is described later on.

A link will be provided to a second page entitled "Brain state" which will contain three large text areas. The first text area will contain a dump of the brain's core state. The text area can be edited and a "Save" button clicked to update the brain state. Likewise, there should be text areas for editing the word-to-entity mappings as well as the transformations.

Applying transformations

The core of this exercise is to create an algorithm that, given a statement or question, determines which sequence of transformations to apply to end up with a representation that can be directly interpreted.

The first step is to transform a list of words into a list of EntityTrees. Each tree represents the is_a relationships of the word it corresponds to. Rather than storing the tree, however, it should be converted into a flat list using the GetList() method. We'll call this list of lists the entityTypeLists structure.

The next step is to permute the lists to arrive at all possible combinations of entity types. For example, taking the first entity type from each list arrives at one combination. Taking the second entity type from the first list and first entity type from the rest of the lists arrives at another combination, etc. We'll call this the permutedEntityTypes list.

The core algorithm iterates over the permutedEntityTypes list. Each permutation of entity types consists of a list of entity types. Taking the first two entity types, we determine which transformations can be applied. To determine this, we need an indexed structure that allows us to get all applicable transformations given two sequential entity types. For example, we could use a two-deep hash.

Once we have determined what transformations we can apply, we go ahead and apply each one separately. (If there are no transformations to apply, then we need to look for transformations that can be applied starting at the second token, etc.) For each application, we have to ask ourselves whether the resultant fragment is complete, i.e. in a format that can be directly interpreted. If it is, we add it to a list of interpretations. If it is not, we add it back onto the permutedEntityTypes stack.

Note that because transformations take Fragments as their input and produce Fragments as their output, the permutedEntityTypes list needs to be a list of Fragments. Thus we'll call it the unresolvedFragments list.

The class that orchestrates this process should be called TransformationSearch.
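
The overall loop can be sketched in Python as below. This is a simplification under several assumptions: fragments are plain tuples of entity-type names, transformations rewrite types without binding variables, every start position is tried rather than only the first matching one, and the transformations are scanned linearly instead of through a two-deep hash. The function names are hypothetical.

```python
from itertools import product


def permute(entity_type_lists):
    """All combinations that take one entity type from each word's list."""
    return [tuple(combo) for combo in product(*entity_type_lists)]


def search(entity_type_lists, transformations, is_complete):
    unresolved = permute(entity_type_lists)  # used as a stack of fragments
    seen = set(unresolved)                   # avoid revisiting a fragment
    interpretations = []
    while unresolved:
        fragment = unresolved.pop()
        for start in range(len(fragment)):
            for pattern, output in transformations.items():
                end = start + len(pattern)
                if fragment[start:end] != pattern:
                    continue
                result = fragment[:start] + output + fragment[end:]
                if is_complete(result):
                    interpretations.append(result)
                elif result not in seen:
                    seen.add(result)
                    unresolved.append(result)
    return interpretations


# Toy input standing in for the words "twenty three thousand".
lists = [["20", "group_of_ten"], ["3", "digit"], ["1000", "multiplier"]]
rules = {
    ("group_of_ten", "digit"): ("number_part",),
    ("number_part", "multiplier"): ("number",),
}
print(len(permute(lists)))                            # 8
print(search(lists, rules, lambda f: f == ("number",)))  # [('number',)]
```

Of the eight starting fragments, only one can be reduced all the way, which is why a single interpretation comes back.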

Debug output

The debug output will be displayed if the "View details" link is clicked. The primary purpose of the debug output is to see what happened when an unexpected response was given.

For each word, the list of possible entities will be listed. Format:
word1: entity1, entity2, entity3, etc.
etc.

The initial list of permuted fragments will be listed.

If a fragment is clicked, the list of applicable transformations will be listed.

If one of the applicable transformations is clicked, the resultant fragments and outputs will be listed.

This is effectively a recursive process that allows the user to drill down into the search space.

Test cases

Click here

Solution

Click here

Web UI

Click here


Better brain scans
August 7

An important tool in the quest to understand the human brain is our ability to image it. You might think that today's fancy MRI scans are impressive, but I'll bet that the imagery they'll be producing in 2100 will make today's imagery look like the stone age.

Today I noticed this lovely link on Gizmodo.

Here's a nice video too.


Ramona on KurzweilAI.net
August 13

I was just playing with Ramona on KurzweilAI.net. Gotta love it:

Daniel: What is my name?
Ramona: He was killed in an auto accident the day before I was born.

hmm. That's a pretty specific and pretty wrong answer!


Animating digitally rendered people
August 20

http://gizmodo.com/5039308/emily-isnt-real-but-would-you-have-guessed

Check out this amazing video. It's weird to think that the day is coming when they'll be able to animate digitally rendered people so that you truly can't tell whether what you're looking at is real or synthetic. Eek!


Interesting language parsing article on CodeProject
August 21

http://www.codeproject.com/KB/recipes/englishparsing.aspx

This is quite an interesting looking article about natural language parsing.


Penn Treebank Project
August 21

Here's an interesting project, called the Penn Treebank Project.

The Penn Treebank Project annotates naturally-occurring text for linguistic structure. Most notably, we produce skeletal parses showing rough syntactic and semantic information -- a bank of linguistic trees. We also annotate text with part-of-speech tags, and for the Switchboard corpus of telephone conversations, dysfluency annotation. We are located in the LINC Laboratory of the Computer and Information Science Department at the University of Pennsylvania.

Also:

CC    Coordinating conjunction  RP    Particle
CD    Cardinal number           SYM   Symbol
DT    Determiner                TO    to
EX    Existential there         UH    Interjection
FW    Foreign word              VB    Verb, base form
IN    Preposition/subordinate   VBD   Verb, past tense
      conjunction
JJ    Adjective                 VBG   Verb, gerund/present
                                      participle
JJR   Adjective, comparative    VBN   Verb, past participle
JJS   Adjective, superlative    VBP   Verb, non-3rd
                                      ps. sing. present
LS    List item marker          VBZ   Verb, 3rd ps. sing. present
MD    Modal                     WDT   wh-determiner
NN    Noun, singular or mass    WP    wh-pronoun
NNP   Proper noun, singular     WP$   Possessive wh-pronoun
NNPS  Proper noun, plural       WRB   wh-adverb
NNS   Noun, plural              ``    Left open double quote
PDT   Predeterminer             ,     Comma
POS   Possessive ending         ''    Right close double quote
PRP   Personal pronoun          .     Sentence-final punctuation
PRP$  Possessive pronoun        :     Colon, semi-colon
RB    Adverb                    $     Dollar sign
RBR   Adverb, comparative       #     Pound sign
RBS   Adverb, superlative       -LRB- Left parenthesis *
                                -RRB- Right parenthesis *

* The Penn Treebank uses the ( and ) symbols,
  but these are used elsewhere by the OpenNLP parser.

This is all stuff I need to get my head into.


Principle of maximum entropy
August 21

Something to investigate is the principle of maximum entropy, which the parser in the CodeProject article relies on.


Stanford Parser
August 21

Today I came across the Stanford Natural Language Parsing Group's website, and was delighted to find that they have put their language parser online.

Click here

This should be a great learning tool.


Finished exercise 8
August 28

This past month, I have been making progress by defining exercises for myself much like one would be assigned in a university course. This week, I finished my 8th exercise, which is a milestone because it serves as an end-to-end implementation of the basic design that I've been researching.

It allows the following things to be defined:

Entities
Relationships between entities
Value assignments
Word-to-entity mappings
Linguistic transformations that allow statements to be transformed into value assignments and questions to be transformed into value queries.
A transformation search engine to make it all come to life

The source code is written in C# and can be found here.

The web UI can be found here:

http://www.platoai.com/plato1.8/


www.a-i.com
August 30

I was doing some Googling tonight and have come across the most impressive bot yet at www.a-i.com:

User says:
My name is Daniel

Plato says:
Hello Daniel. Nice to meet you .

User says:
What is my name?

Plato says:
Daniel. Your name is Daniel.

User says:
I am 27

Plato says:
I see. You're 27 years old.

User says:
How old am I?

Plato says:
You are 27.

User says:
My age is 27.

Plato says:
I'll try to remember that.

User says:
My age is 28.

Plato says:
I'll try to remember that.

User says:
How old am I?

Plato says:
You are 28.

User says:
What is my age?

Plato says:
You are 28.

User says:
How old are you?

Plato says:
Well, my basic 'Alan' personality was created on Jan 1, 2000. But as for myself - I'm just a few weeks old.

User says:
What is your age?

Plato says:
I already told you that, Daniel.

User says:
My wife's name is Meredith.

Plato says:
I'll try to remember that.

User says:
What is my wife's name?

Plato says:
Meredith.

User says:
My last name is Bigham.

Plato says:
So, Daniel Bigham, what would you like to talk about?

User says:
What is my last name?

Plato says:
I do have the tendency to forget, but I still remember your last name: Smith. Sorry... Bigham.

Besides some of the "cute" responses, I'm impressed! For once a bot that can interpret statements, create a mental model of the world, and then play back that knowledge when asked questions.

From their about page: Ai Research is headed by Jack Dunietz, a 23-year veteran of the technology industry, who has established several successful publicly traded firms. The Ai team consists of scientists, researchers, engineers, and language specialists. All work together to nurture Ai's child machine, and to ensure that it will successfully pass the Turing Test within the next 10 years.

Impressive. But what I find odd about these smart people is that they make statements such as "to ensure that it will successfully pass the Turing Test within the next 10 years". Do they really think that, or is it something they tell themselves or tell investors to keep up interest? Odd.

They've still got some work to do:

User says:
My eye color is blue.

Plato says:
Your eye color is blue?

User says:
yes

Plato says:
Agreement is the seed of salvation.

User says:
What is my eye color?

Plato says:
I will have to think about it.

Our research plan is based on an iterative cycle, designed to improve the language skills of the system with each software update ("brain upgrade").

Excellent approach, I like it.


Speaking
September 6

Although I haven't put much thought into the language out problem (transforming ideas into written statements), something occurred to me today: Any time a person, or an AI, goes to say something, they have the opportunity to try and parse what they're about to say using their language in layer. If the resulting meaning is the same or similar enough to the idea they were trying to express, they have the green light to go ahead and speak. Otherwise, the speaker needs to go back to the drawing board and figure out how else to express their idea.


Exercise 9: Relationships with properties
September 6

Summary

The core objective of exercise 9 is to allow relationships to be defined that themselves have parameters. For example:

person has(count:0+) sister

This says that a person can have 0 or more sisters. It defines the base relationship person has sister and then attaches the property count = 0+ to that relationship. Another way it could be notated:

(person has sister).count = 0+

A transformation's output specification must support this new notation, which may contain variables. For example:

{noun} {has_have} {number} {noun} -> $1 has(count:$3) $4

A secondary objective is to ensure that entities can have not only properties, but properties of properties. For example:

speaker.eye.color = blue
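
Both objectives can be illustrated with a small Python sketch. The Relationship constructor signature, the nested-dictionary store, and the set_path/get_path helpers are assumptions of this example, not the exercise's actual design.

```python
class Relationship:
    """A relationship that carries its own properties, e.g. count."""

    def __init__(self, source, relation, target, **properties):
        self.source, self.relation, self.target = source, relation, target
        self.properties = dict(properties)


def set_path(store, path, value):
    """Assign a value at a dotted path, creating intermediate levels."""
    *parents, last = path.split(".")
    node = store
    for name in parents:
        node = node.setdefault(name, {})
    node[last] = value


def get_path(store, path):
    node = store
    for name in path.split("."):
        node = node[name]
    return node


# person has(count:0+) sister
has_sister = Relationship("person", "has", "sister", count="0+")
print(has_sister.properties["count"])  # 0+

# speaker.eye.color = blue
values = {}
set_path(values, "speaker.eye.color", "blue")
print(get_path(values, "speaker.eye.color"))  # blue
```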

Test cases

Click here

Solution

Click here

Web UI

Click here


Exercise 9 thoughts
September 6

It feels good to have completed the next exercise after the long haul implementing exercise 8... the hope was that I wouldn't burn out and stall making more progress, so this is a good sign.

Adding parametric relationships as a transformation output was fairly challenging. It made me realize that I needed to generalize the transformation process so that, regardless of the output specification's structure, the same logic could be applied. Hopefully this will pay dividends as I continue to add new transformation types.


Numbers and the human brain
September 6

Something that I realized while working on exercise 9 is that it is quite mysterious how the human brain represents and deals with numbers, or more generally, quantities. While concepts and entities are fairly discrete things, numbers are more slippery. While they are discrete, to a certain extent, there are an infinite number of them, i.e. you can't create an entity in the brain to represent each one. How do you create a kind of generic entity in the brain to represent any given number?

I have cheated in a sense, since I have implemented numbers using the computer's internal representation. I think this is the most sensible thing to do: Why lament that the computer makes some aspects of intelligence easier?!

On a related note, I am pondering how to create a scheme to transform numbers like "five hundred and twenty three" into the brain's internal representation.


Parsing numbers
September 6

After some thought, here is a strategy for parsing numbers.

Step 1: Word mappings and entity types

The following word mappings and entity types are required:

'one' -> 1: digit
'two' -> 2: digit
'three' -> 3: digit
'four' -> 4: digit
'five' -> 5: digit
'six' -> 6: digit
'seven' -> 7: digit
'eight' -> 8: digit
'nine' -> 9: digit

'eleven' -> 11: teen_number
'twelve' -> 12: teen_number
'thirteen' -> 13: teen_number
'fourteen' -> 14: teen_number
'fifteen' -> 15: teen_number
'sixteen' -> 16: teen_number
'seventeen' -> 17: teen_number
'eighteen' -> 18: teen_number
'nineteen' -> 19: teen_number

'twenty' -> 20: group_of_ten
'thirty' -> 30: group_of_ten
'forty' -> 40: group_of_ten
'fifty' -> 50: group_of_ten
'sixty' -> 60: group_of_ten
'seventy' -> 70: group_of_ten
'eighty' -> 80: group_of_ten
'ninety' -> 90: group_of_ten

'hundred' -> 100: multiplier
'thousand' -> 1000: multiplier
'million' -> 1000000: multiplier
'billion' -> 1000000000: multiplier
'trillion' -> 1000000000000: multiplier
'quadrillion' -> 1000000000000000: multiplier

number_part
digit is_a number_part
teen_number is_a number_part
group_of_ten is_a number_part
100+_number_part is_a number_part
100+_number_part is_a number

Step 2: New transformation type

The next step is to introduce a new transformation type which evaluates a numerical formula. For example:

{group_of_ten} {digit} -> # $1 + $2

The # prefix indicates that the transformation's output specification is a numeric formula.

In addition, it would be helpful to be able to specify, as part of any transformation, what entity type the result should be considered to have. For example:

{group_of_ten} {digit} -> # $1 + $2 (number_part)

Step 3: Transformations

{group_of_ten} {digit} -> # $1 + $2 (number_part)
{number_part} {multiplier} -> # $1 * $2 (100+_number_part)
{multiplier} {number_part} -> # $1 + $2 (100+_number_part)
{100+_number_part} {100+_number_part} -> # $1 + $2 (100+_number_part)
{100+_number_part} {number_part} -> # $1 + $2
{100+_number_part} and {number_part} -> # $1 + $2
{number} + {number} -> # $1 + $2
{number} * {number} -> # $1 * $2
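
As a rough illustration of what the mappings and transformations above are meant to compute, here is a minimal Python sketch. It does not implement the transformation engine itself: it swaps in a simple running-total approach, and the names SMALL, MULTIPLIERS and parse_number are made up for this sketch. It only knows the words listed in Step 1, and it treats 'hundred' separately from the larger multipliers.

```python
# Word tables copied from Step 1 (digits, teens and groups of ten merged).
SMALL = {
    'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
    'six': 6, 'seven': 7, 'eight': 8, 'nine': 9,
    'eleven': 11, 'twelve': 12, 'thirteen': 13, 'fourteen': 14,
    'fifteen': 15, 'sixteen': 16, 'seventeen': 17, 'eighteen': 18,
    'nineteen': 19,
    'twenty': 20, 'thirty': 30, 'forty': 40, 'fifty': 50,
    'sixty': 60, 'seventy': 70, 'eighty': 80, 'ninety': 90,
}
MULTIPLIERS = {
    'thousand': 10**3, 'million': 10**6, 'billion': 10**9,
    'trillion': 10**12, 'quadrillion': 10**15,
}


def parse_number(text):
    """Convert a long-form number such as 'five hundred and twenty three'."""
    total = 0   # value of the finished groups (thousands and above)
    group = 0   # value of the group currently being built (below 1000)
    for word in text.lower().split():
        if word == 'and':
            continue                       # 'and' carries no value
        if word in SMALL:
            group += SMALL[word]           # e.g. twenty + three
        elif word == 'hundred':
            group = max(group, 1) * 100    # e.g. five * 100
        elif word in MULTIPLIERS:
            total += max(group, 1) * MULTIPLIERS[word]
            group = 0                      # start the next group
        else:
            raise ValueError('unknown number word: ' + word)
    return total + group
```

With this sketch, parse_number("one thousand one hundred and eleven") gives 1111.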


IsA Trees
September 8

Something that I've noticed is that IsA trees, which are directed graphs that describe how an entity relates to other entities via the is_a relationship, are quite interesting and seem to be a vital part of constructing an AI. It will be interesting to see how my use of this structure evolves over time.


Exercise 10: Parsing numbers
September 9

Summary

This exercise involves introducing yet another transformation type, the numeric transformation, which allows numbers to be added or multiplied. It is used to parse numbers written in long form into the integers they represent.

For example, "one thousand one hundred and eleven" is parsed to become 1111.

See this post for more details.

Test cases

Click here

Solution

Click here

Web UI

Click here


Exercise 11: Drawing IsA trees
September 9

Summary

Enhance the view details mode of the web application to support drawing graphical IsA trees. Accomplish this by using GraphViz to render a PNG.
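
To give an idea of the GraphViz side, here is a small Python sketch (not the C# or Perl solution) that turns a list of is_a facts into DOT text. The function name isa_to_dot and the (child, parent) pair format are assumptions made for this example.

```python
def isa_to_dot(is_a_pairs):
    """Return GraphViz DOT text with one edge per (child, parent) is_a fact."""
    lines = ['digraph isa {', '  rankdir=BT;']   # BT: parents drawn above children
    for child, parent in is_a_pairs:
        # Names are quoted so entities like 100+_number_part stay valid DOT.
        lines.append('  "%s" -> "%s" [label="is_a"];' % (child, parent))
    lines.append('}')
    return '\n'.join(lines)


if __name__ == '__main__':
    print(isa_to_dot([
        ('digit', 'number_part'),
        ('teen_number', 'number_part'),
        ('group_of_ten', 'number_part'),
    ]))
```

Saving the output as isa.dot and running "dot -Tpng isa.dot -o isa.png" should produce the PNG.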

Solution

C# code
Perl code

Web UI

Click here

NOTE: The graph drawing component will only work if my home PC is turned on.




Deriving facts using rules
September 11

The next area of research I'd like to explore is that of deriving facts about the world using rules.

Rules consist of two parts:

1. A list of conditions that must be met for the rule to apply.

2. A conclusion or implication, which says something is true if all of the conditions are true.

What makes rules useful is that any part of a condition can be left undefined and represented by a variable, which is then used in another condition or the conclusion.

For example:

Conditions:

1. A person's first name is $1
2. A person's last name is $2

Conclusion:

The person's full name is '$1 $2'.

This can be notated as follows:

rule:
  $1 is_a person
  $1.first_name = $2
  $1.last_name = $3
  -> $1.full_name = '$2 $3'

Some more examples:

rule:
  $1 on_top_of $2
  -> ! $2 on_top_of $1

rule:
  $1 is_a person
  $1 born_in Canada
  -> $1 is_a Canadian

rule:
  $1 is_a person
  $2 is_a person
  $1.birth_date < $2.birth_date
  -> $1 older_than $2

etc.

Rules come in very handy when trying to answer questions. For instance, consider the following story:

My first name is Daniel
My last name is Bigham
What is my full name?

The AI wouldn't know the value of speaker.full_name directly because it was never told its value. But it can derive that value given what it knows.

There are three primary cases where rules can be applied:

1. When the AI is told something. It can search for rules that have a condition matching what it was told. Having found such rules, it can check whether it has all the information needed to show that the conclusion is true, and if so, it can update its brain state with the derived information in addition to what it was specifically told.

In the example above, an AI could apply speaker.full_name = 'Daniel Bigham' as soon as the second statement was made, long before the question was even posed.

2. When the AI is asked something. In other words, when asked for the speaker's full name, the AI could search for rules whose conclusions match the information it is looking for, and then determine whether all of the conditions are met.

3. In a sort of recursive scenario, checking whether a rule can be applied may involve using other values that are not explicitly known. This can trigger a second rule search to see whether that value can be derived, and so on. What you get is a kind of search tree. Obviously you wouldn't want a scenario where the AI goes off and spends 5 minutes exploring a huge search tree, so the depth to which it explores would need to be capped at a reasonable number.
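
Cases 2 and 3 can be sketched in a few lines of Python. This is only a toy model under stated assumptions: facts are stored in a dictionary keyed by (entity, property), each rule is reduced to "the properties it needs plus a function that combines them", and the 'greeting' rule is invented purely to show one rule depending on another. None of these names come from the actual implementation.

```python
# What the AI has been told.
facts = {
    ('speaker', 'first_name'): 'Daniel',
    ('speaker', 'last_name'): 'Bigham',
}

# Hypothetical rules: concluded property -> (needed properties, combiner).
RULES = {
    'full_name': (('first_name', 'last_name'),
                  lambda first, last: first + ' ' + last),
    'greeting': (('full_name',),
                 lambda name: 'Hello, ' + name),
}


def derive(facts, entity, prop, max_depth=3):
    """Return the known or derived value of entity.prop, or None."""
    if (entity, prop) in facts:
        return facts[(entity, prop)]          # explicitly known
    if max_depth == 0 or prop not in RULES:
        return None                           # give up: too deep, or no rule
    needs, conclude = RULES[prop]
    # Each needed value may itself have to be derived (the search tree).
    values = [derive(facts, entity, p, max_depth - 1) for p in needs]
    if any(v is None for v in values):
        return None                           # a condition is not met
    return conclude(*values)
```

Asking for derive(facts, 'speaker', 'full_name') returns 'Daniel Bigham', while a max_depth of 1 is too shallow to reach 'greeting', which needs full_name to be derived first.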


Exercise 12: Rule class design
September 11

Summary

The objective of this exercise is to complete a class design for rules as described in an earlier post.

A rule can have one or more conditions.
A rule has exactly one conclusion.
A rule condition is basically the same as a transformation output specification, whereby any token can be replaced with a numbered variable.
Likewise, a rule conclusion is basically the same as a transformation output specification.
The rule conclusion can only use variables that are referenced in one or more rule conditions.
A rule has a type: A rule is either an "assignment rule" or a "relationship rule".



Exercise 13: Rule parser
September 11

Summary

Write a parser to allow rules to be added to the brain. Add a hash to the brain called "rules" that indexes rules on the following three items:

1. Their type
2. Their first non-variable entity type
3. Their second non-variable entity type

Test cases

Click here


Exercise 14: Implement SolvePathVariable
September 11

Summary

Inputs:

A parent entity, which might be null.
A parent relation type which describes the mandatory relationship between the parent entity and the variable entity that is being solved for.
A child entity, which might be null.
A child relation type which describes the mandatory relationship between the variable entity that is being solved for and the child entity.

Outputs:

All entities that satisfy the conditions.

To accomplish this, new indices and their respective look up functions will need to be added to the Entity and Brain classes:

New index:

Given an entity Y and relation type R, what entities X satisfy the relationship X R Y?

New lookup functions:

1. GetInverseHasList: Given an entity X, which entities have a has_a relationship with X?

2. GetInverseIsAList: Given an entity X, which entities have an is_a relationship with X? (Actually, it appears that this one might not be needed)
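
Here is a minimal Python sketch of the inverse index idea and how a path variable might be solved with it. The class and method names (Brain, relate, get_inverse_list, solve_path_variable) are assumptions for illustration and are not the real Entity and Brain classes.

```python
from collections import defaultdict


class Brain:
    def __init__(self):
        self.forward = defaultdict(set)   # (X, R) -> every Y with X R Y
        self.inverse = defaultdict(set)   # (Y, R) -> every X with X R Y

    def relate(self, x, relation, y):
        """Record the fact 'x relation y' in both indices."""
        self.forward[(x, relation)].add(y)
        self.inverse[(y, relation)].add(x)

    def get_inverse_list(self, y, relation):
        """Given Y and R, which entities X satisfy X R Y?"""
        return set(self.inverse.get((y, relation), ()))

    def solve_path_variable(self, parent, parent_rel, child, child_rel):
        """Find every V with 'parent parent_rel V' and 'V child_rel child'.

        Either end may be None, in which case it is not constrained.
        """
        candidates = None
        if parent is not None:
            candidates = set(self.forward.get((parent, parent_rel), ()))
        if child is not None:
            from_child = self.get_inverse_list(child, child_rel)
            if candidates is None:
                candidates = from_child
            else:
                candidates = candidates & from_child
        return candidates or set()
```

For example, after relating speaker has_a bike and bike is_a vehicle, solving for "speaker has_a V, V is_a vehicle" returns the set containing bike.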


Exercise 15: Implement SolveAssignmentVariable and SolveRelationshipVariable
September 12

SolveAssignmentVariable Summary

If the variable is within the path, use SolvePathVariable. Once solved, if the path is free of variables and the value is known, ensure the value matches.

If the variable is the value of the assignment, then simply look it up.

SolveRelationshipVariable Summary

If there is only one variable, then use the appropriate entity's relation index to return all possibilities.

If there are two variables, then this subroutine should not have been called.

Input:

An assignment and a position within that assignment.

Output:

A list of entities.


Exercise 16: Implement SolveCondition
September 12

Summary

Inputs:

A rule condition with 1 or more variable tokens.
A hash which defines the values of zero or more variables.
The current token position.

Outputs:

A list of hashes. Each hash represents a set of variable values that satisfies the condition.

Algorithm:

Tokens are processed from left to right.
Non-variable tokens are skipped over.
Each variable which is found to exist in the variable hash should have its value substituted.
Each unknown variable token needs to be solved. The resulting list of possible entities is used to expand the variable hash into N new hashes, one per entity. For each hash, the function recurses.
This subroutine should not be called for relationships with two variables.
The output of each recursive call is a list of hashes. Each of these hashes represents part of the final output of the function.



Exercise 17: Implement SolveConditions
September 12

Summary

Input:

A rule, with 1 or more conditions.
A hash to be used to store variable values. Initially it is only populated with the values of variables that could be determined by comparing the rule conclusion to what is to be derived.

Output:

A list of hashes. Each hash represents a complete set of variable values that satisfy all conditions.

Step 1:

Iteratively solve all conditions that have only variables with known values. If any of the conditions fails, then return an empty list.

Step 2:

Iteratively solve all conditions with a single unresolved variable using SolveCondition. For each hash returned from SolveCondition, recurse. Add each hash returned from the recursive call to the output.

Step 3:

Iteratively solve all conditions with two unresolved variables. If a condition fails to solve, then try the next one, and so on. If none of the two-variable conditions resolve, then return an empty list.
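
The overall shape of SolveCondition and SolveConditions can be sketched in Python as follows. This is a simplified stand-in, not the real design: facts and conditions are plain (subject, relation, object) tuples, variables are strings starting with "$", and every condition is solved by scanning the whole fact list rather than by using indices. Unlike the design above, this toy solve_condition also accepts conditions with two unknown variables.

```python
def is_var(token):
    return token.startswith('$')


def solve_condition(facts, condition, bindings):
    """Return every extension of bindings under which condition matches a fact."""
    results = []
    for fact in facts:
        new = dict(bindings)
        for pattern, value in zip(condition, fact):
            if is_var(pattern):
                # Bind the variable, or check it against its existing value.
                if new.setdefault(pattern, value) != value:
                    break
            elif pattern != value:
                break
        else:
            results.append(new)     # every token matched
    return results


def unresolved(condition, bindings):
    """Count the variables in condition that have no value yet."""
    return sum(1 for t in condition if is_var(t) and t not in bindings)


def solve_conditions(facts, conditions, bindings=None):
    """Return all sets of variable values that satisfy every condition."""
    bindings = bindings or {}
    if not conditions:
        return [bindings]
    # Roughly mirrors the steps above: fewest unknown variables first.
    ordered = sorted(conditions, key=lambda c: unresolved(c, bindings))
    first, rest = ordered[0], ordered[1:]
    output = []
    for extended in solve_condition(facts, first, bindings):
        output.extend(solve_conditions(facts, rest, extended))
    return output
```

Given facts stating that Daniel and Meredith are each a person and that Daniel was born in Canada, solving the conditions of the "born_in Canada" rule leaves only the binding of $1 to Daniel.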


Exercise 18: Completed rule implementation
September 12

Implement SolveRule and Derive:

SolveRule

SolveRule Input:

A rule.
The derivation target: An ITransOutput that represents what is to be derived.

SolveRule Output:

A list of entities. For relationship rules, this will be either null to represent failure or the true entity to represent success. For assignment rules, this will be a list of entities that satisfy the value of the assignment.

SolveRule Algorithm:

Run SolveConditions.
If the resultant list is empty, return an empty list.
If the resultant list isn't empty and the conclusion type is a relation, then return the true entity.
If the resultant list isn't empty and the conclusion type is a property value, then process each result iteratively. Use the conclusion's value specifier to create the result. This can be accomplished by implementing an Eval function for IValueToken which takes a dictionary of variable values, substitutes them, and returns the resultant entity.

For each result:

Add the resultant token to the output list.

Derive

Derive is called with a PropertyToken or RelationshipToken.

Algorithm:

Look up rules that might be able to derive the value/relationship.
For each potential rule, determine whether it is really a match. In the process, determine the value of any variables in the conclusion by doing a token-by-token comparison of the conclusion with the derivation target. Implement this functionality as RuleSolver.Match.
Attempt to solve any rules that match.
If any rules can be solved, return the resulting value, which will be an IEntity for derived property values or a bool for derived relationships.
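
The token-by-token comparison can be pictured with a short Python sketch. It assumes that a conclusion and a derivation target are both plain tuples of string tokens, and the function name match is only loosely modelled on RuleSolver.Match; the real token classes are not shown here.

```python
def match(conclusion, target):
    """Compare a rule conclusion with a derivation target, token by token.

    Returns a dict of variable values if they match, otherwise None.
    """
    if len(conclusion) != len(target):
        return None
    bindings = {}
    for pattern, token in zip(conclusion, target):
        if pattern.startswith('$'):
            # A variable matches anything, but must stay consistent.
            if bindings.setdefault(pattern, token) != token:
                return None
        elif pattern != token:
            return None             # literal tokens must be identical
    return bindings
```

For the conclusion "$1 is_a Canadian" and the target "speaker is_a Canadian", this yields a binding of $1 to speaker, which can then seed the variable hash passed to SolveConditions.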

Test Cases

...

Web UI

Add a new text area to the web UI that allows rules to be specified.
If the answer to a user's question is a property whose value is not known, try to derive it. If it can be derived, return the result. Always indicate whether a value was derived.



Milestone: Working implementation of rules
September 16

As of today I have a working implementation of rules. There are still some details to be fleshed out, but the value derivation engine works.

http://www.platoai.com/plato1.18/

This represents a major milestone for a fledgling AI project: The ability to use rules to derive facts about the world using what it has already been told.

The two working examples are:

"My first name is Daniel"
"My last name is Bigham"
"What is my full name?"

And:

"I was born in Canada"
"What is my nationality?"

The next thing on my agenda is to be able to use math in rules. For example:

"I have two sisters"
"I don't have a brother"
"How many siblings do I have?"

rule:
  $1 is_a person
  $1 has(count:$2) sister
  $1 has(count:$3) brother
  $count = # $2 + $3
  -> $1 has(count:$count) sibling
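
In plain Python the intent of this rule is roughly the following. The dictionary layout and the function name derive_sibling_count are assumptions for this sketch; note that "I don't have a brother" has to be stored as a count of 0 for the rule to fire.

```python
def derive_sibling_count(has_counts, person):
    """has_counts maps (person, kind) -> count, e.g. ('speaker', 'sister') -> 2."""
    sisters = has_counts.get((person, 'sister'))
    brothers = has_counts.get((person, 'brother'))
    if sisters is None or brothers is None:
        return None                 # a condition is unmet, so the rule does not apply
    return sisters + brothers       # $count = # $2 + $3
```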


Verbs
September 21

Something that I've been undecided about is how to represent verbs using an entity-relationship model, but I've realized that it's perhaps not as awkward as I thought. For example:

"I like running" -> speaker likes running
"I'm going to the store" -> speaker going_to store
"I'm driving on the 401" -> speaker driving_on highway401

The verb and preposition get fused together to represent the relation -- the thing that relates the two entities.


Current inputs and future inputs
September 24

Here I'm brainstorming a series of inputs that I'd like to work towards supporting. These are all statements about me.

Currently supported statements, and their core representation:

My first name is Daniel
  speaker.first_name = 'Daniel'
My middle name is James
  speaker.middle_name = 'James'
My last name is Bigham
  speaker.last_name = 'Bigham'
  speaker.full_name = 'Daniel Bigham' (derived)
I am a Christian
  speaker is_a Christian
I work at Navtech
  speaker.employer = NavtechInc
I am a man
  speaker is_a man
I am 27
  speaker.age = 27
My birth month is November
  speaker.birth.date.month = November
My birth year is 1980
  speaker.birth.date.year = 1980
My wife's name is Meredith
  speaker.wife.first_name = 'Meredith'
I live in Waterloo
  speaker.home.city = Waterloo
I live in Canada
  speaker.home.country = Canada
I live in Ontario
  speaker.home.province = Ontario
My eye color is blue
  speaker.eye.color = blue
My hair color is strawberry blond
  speaker.hair.color = strawberry_blond

Future:

My birthday is Nov 20
Requires parsing 'Nov 20' into a structure that contains a day-of-month and a month
speaker.birth.date = $d, $d is_a date, $d.month = November, $d.day = 20

...


"On Intelligence"
September 24

I've done a little more poking around recently to see what else is out there, and today I came across what looks to be a real winner. The book On Intelligence, a surprisingly "old" book published in 2004, was written by Palm's inventor Jeff Hawkins. From what I've read, Jeff's ideas can be summarized as Hierarchical Temporal Memory. I remember blogging about a news article that made reference to that term, but I didn't delve into reading more at the time.

I couldn't resist: I've ordered the book off of ebay.



Intelligence VS Learning
September 24

Something that I'm debating in my mind is the connection between intelligence and learning. The reason I bring this up is that many of the comprehensive AI efforts out there are based on producing a system that can learn as a means for achieving intelligence.

My intuition, on the other hand, is that this approach is like trying to run before you can walk. (I'm not convinced of this, but it's a hunch)

I feel that it should be easier to create a system that is intelligent, but has very limited learning capacity. A system such as this would surely require a lot of hand crafting, which would no doubt be very expensive, but ultimately much more attainable than a system that can simply learn from experience.

I think there's plenty of evidence against my assertion... I get the sense that many people have tried over the decades to create brittle, hand crafted systems, and failed, but I'm still tempted to think this is the way to go.


Speech recognition, etc.
September 24

Ever since Meredith got home from Seattle and told me about "Grace", the AI that runs the Microsoft Home of the Future, (I love the name Grace as the name of an in-home AI agent!) I seem to be drawn in by the promise of creating a full spectrum AI system with the following components:

A speech recognition layer
A language parsing layer
A core knowledge representation
A deduction engine
A web interface for looking up well defined information such as the weather via web services
A voice synthesis layer

And to top it all off:

A subtle, well designed microphone, speaker, and LCD installation in the kitchen.

I'm not really interested at this point in working on the voice recognition or voice synthesis layers; I just want a cheap but easy-to-use and ultimately effective system.

Here are some fun use cases to think of:

"Grace, what is the weather forecast for tomorrow?"
  theweathernetwork.com would load on the LCD for Waterloo, ON.
  Grace would speak the weather forecast.
"Grace, how many new emails do I have?"
  My GMail account would load on the LCD.
  Grace would speak the number of unread emails.
"Grace, what time is it?"
  Grace would speak the current time.
"Grace, what is my Social Insurance Number?"
"Grace, what is aunt Beryl's phone number?"
"Grace, what is the police phone number?"

A little bit more out there:

"Grace, order meal #1 from swisschalet.com"
Grace would speak the details of meal #1 and ask for confirmation.
Grace would speak an update once the meal was ordered, and would indicate the expected delivery time.

etc. How fun would that be to tinker with!


Exercise 19
September 25

Summary

Create an application which can, upon hearing a specific voice command, reply by speaking a specific phrase.

Solution

Microsoft's Speech API makes it surprisingly easy to add both voice recognition and text-to-speech to an application. I was hoping to use version 5.3 of the API, but after downloading and installing Windows SDK 6.1, which includes Speech API 5.3, it wasn't listed in the "Add References" dialog in Visual Studio 2002.

I ended up using version 5 of the API, which is all the way back from 2001. The text-to-speech quality is brutal, but the programming steps would be about the same regardless of which API you use.

Download

Click here


Quality of Microsoft TTS 5.1
September 25

I'm currently downloading Microsoft's TTS (Text To Speech) API version 5.3, which was released within the last year.

I sure hope the quality of the new engine is better than the one in version 5.1. Have a listen to the synthesis of "What is tomorrow's weather forecast?" Yuck! Granted, this is using 2001 technology.


Best Text-to-Speech yet: IVO
September 25

Have a listen to "What is the weather forecast for tomorrow?" Very impressive.

I've tried some other phrases and the results haven't been as impressive as this one example, but overall I like what I hear.


Technology as art
September 26

When Meredith and I were in Madrid this summer, we saw an exhibit that used technology and the Internet to create a dynamic display. It matched keywords on websites against a number of color-coded topics, and then marked on a world map where each website was located. It was more complicated than that, but what I found interesting was that technology was being used to create art, and it was in an art gallery!

Here's my idea:

Create a visually interesting object such as a polished half sphere which would contain invisible cameras, microphones, and speakers. It would use the cameras to visually track people walking around the room, and when someone came close to the object, it would say, "Hi, what's your name?" in a pleasant voice indistinguishable from a person's. The person would then say something like "Daniel", "Daniel Bigham", "My name is Daniel", etc. and using voice recognition the object would be able to parse what was being said. It would then say "Hi Daniel, nice to meet you".

If the person then walked away and came back, the computer, having tracked the person walking away and coming back, would say, "Hi Daniel". If the person left the room and came back, an attempt would be made to recognize the person based on a visual fingerprint.

To be fun and impressive:

1. The voice synthesis, and specifically, name synthesis, would need to be nearly perfect, and sound very pleasant, with good intonation.

2. The system would need to be able to construct names that it hadn't previously heard by analyzing the phonemes spoken, and then to be able to re-speak those phonemes.

3. The visual tracking algorithms would have to be really good.

I think this would be a neat "art" piece, albeit a bit noisy.


Exercise 20: Deriving properties of relationships
October 2

Summary

In addition to deriving property values and the existence of relationships, there are times we want to derive properties of relationships. For example:

"I have two sisters"
"I don't have a brother"
"How many siblings do I have?"

rule:
  $1 is_a person
  $1 has(count:$2) sister
  $1 has(count:$3) brother
  $count = # $2 + $3
  -> $1 has(count:$count) sibling

Web UI

Click here

Solution

Click here


Entities and relationships VS the brain
October 4

As I'm reading chapter 4 of On Intelligence, the author makes the comment that "All memories are stored in the synaptic connections between neurons".

Immediately my mind wandered to the ideas of entities and relationships that I've been playing with these last few months: Is this statement about the brain analogous?

In a sense, I already knew that it was since I used this model for the very reason that it was analogous to nodes and edges, and ultimately, neurons/axons/synapses. But what suddenly struck me is that, since the brain uses neurons at different levels, it would likely be useful for what I'm working on to use the entity/relationship pattern at different levels and in different contexts.

What I'm getting at is that while an "entity" can represent a concrete idea such as "person", "dog", "run", etc., it might be as well suited to represent something at a completely different level of processing, such as how bright a certain pixel on the screen is, etc. These other "layers" are completely different areas of study, but this is an interesting idea to consider.


Book review: On Intelligence
October 6

On Friday when I got home from work, there was my ebay order in some brown cardboard packaging. It was a busy weekend, but somehow I managed to read the whole book!

Overall, I'm excited by Jeff Hawkins's efforts. Like the author, I've been dumbfounded by the lack of any overarching theories of the brain. There are literally thousands of neurobiologists working on the minute details, but those details haven't been brought together in any satisfactory way.

The author's theory is that the neocortex plays the central role in intelligence. He is convinced that the cortex is a layered, hierarchical structure that uses prediction to evaluate, and ultimately, interpret, inputs. This theory is intimately linked with the observation that the cortex is remarkably uniform in nature regardless of whether you're looking at the regions which process vision, sound, touch, etc: It is thought that all of these areas use essentially the same cortical algorithm.

Here is my positive and negative feedback:

The Good

I like the overall theory very much. I think Jeff is on to something.

I love the thought of predictions flowing down from the upper layers connecting with sensory inputs flowing up from the lower layers. I envision it sort of like lightning where you have two probing fingers that finally meet and then *WHAMO*. This is an image I've had in my head a number of times when I've thought about how to use a directed graph of neurons to solve problems, so it was a big "aha" moment to see the idea being used in this way.

The Bad

While I think the theory explains a lot, I have this lingering sense that there is a lot that it doesn't explain. It seems more focused on perception: How we transform millions of inputs into an interpretation, but what about rational thought? Jeff's description is that rational thought is simply the result of higher layers in the hierarchy, but this analogy doesn't quite fit for me.

I was a bit alarmed when I read the opening of the book and the author, in a very forceful way, says that behavior is not where intelligence is at. Wow. That's a pretty bold thing to say, especially for a computer guy. I think I get what he's trying to say: That you can be intelligent without behaving. I agree 100%, but the way in which he says it makes it sound like using a behavior-mindset is bad, which I don't agree with. I think he went too far pooh-poohing behavior.

Consciousness. Ok, let me be honest: The purpose of Jeff's theory isn't to explain consciousness, and he spends all of about a page talking about it, but I was a bit disappointed that Jeff pooh-poohed the concept of conscious experience a bit. C'mon people: It's pretty much the most amazing, unexplained problem in the universe.

Evolution: Of course, I disagree with Jeff when he delves into the evolutionary history of the brain, etc.

Even though I've listed more negative stuff, my overall impression of the book is extremely positive. I think the author has done a great job putting together a solid theory, and has gone so far as to list a bunch of testable predictions. It will be interesting to see how the theory fares.

See also: http://www.rni.org/OnIntelligence.html



Exercise 21: Query transformations
October 6

Summary

In language, we often qualify an ambiguous noun with an attribute that resolves that ambiguity. For example, since I have two sisters, I can't just refer to my older sister by saying:

"My sister"

I need to say either:

"My older sister"

or

"My sister Rebekah"

In the first case, "older" is a qualifier. In the second case, the name "Rebekah" resolves the ambiguity.

What we need is a transformation such as:

my sister {first_name}
-> $x: speaker sister $x, $x.first_name = $1

In other words, we're not just transforming a fragment into a literal entity, we're transforming it into an entity that needs to be determined by resolving a set of conditions.

For the other example, we could use:

my older sister
-> $x: speaker sister $x, speaker sister $y, $x.age > $y.age
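
To make the "set of conditions" idea concrete, here is a tiny Python sketch that resolves the second transformation by brute force. The function name, the second sister's placeholder name, and the ages are all made up for illustration; the real system would solve these conditions with the rules machinery rather than with nested loops.

```python
def my_older_sister(sisters_of_speaker, ages):
    """Solve: speaker sister $x, speaker sister $y, $x.age > $y.age."""
    matches = []
    for x in sisters_of_speaker:          # speaker sister $x
        for y in sisters_of_speaker:      # speaker sister $y
            if ages[x] > ages[y] and x not in matches:
                matches.append(x)         # $x is older than at least one sister
    return matches


if __name__ == '__main__':
    # Hypothetical data: only the name Rebekah comes from the text above.
    print(my_older_sister(['Rebekah', 'OtherSister'],
                          {'Rebekah': 30, 'OtherSister': 25}))
```

Note that with three sisters these conditions would match both the oldest and the middle sister, since each is older than at least one other.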

Solution

Click here

Web UI

Click here


Exercise 19 completed
October 6

Although I wasn't able to get version 5.3 of Microsoft's Speech API to work in Windows XP / Visual Studio 2002, I was able to get version 5.1 to work. The text-to-speech quality is brutal, but it works.

Solution

Click here


Milestone: Working implementation of transformation queries
October 7

Today I completed an implementation of transformation queries which represents another milestone for this project. The end result is being able to interpret a phrase such as "my oldest sister", which qualifies an entity with a set of conditions that need to be solved.

The implementation actually went very well, and it was nice to be able to extend and re-use the essence of the rules implementation to do the heavy lifting. After I finished the implementation, I was pleased to see that not only did the basic case work, but combining it with previous functionality worked right away. Here are some examples:

"My older sister's name"
"My younger sister's last name is Bigham"

I have yet to actually play with this new concept very much, but I get the sense that it adds a lot more power to the transformation system.


Exercise 22: Query transformation examples
October 7

Summary

The point of this exercise is to evaluate the query transformation system by exploring a number of examples.

"Daniel's sister Rebekah": Create a more general transformation:
{noun} 's {noun} {first_name} -> $x: $1 $2 $x, $x.first_name = $3

"My grandpa Bigham": Qualifying a relation using a last name.

"My laptop computer"
my {adjective} {noun} -> speaker.$2: speaker.$2 is_a $1

"What is my older sister's brother's name?": A tricky question.

Web UI

Click here


"Nearly" passing the Turing Test?
October 13

http://gizmodo.com/5062385/computer-nearly-passes-turing-test-for-artificial-intelligence

Today I realized just how "gray" the Turing Test really is. I've been aware for some time that a man by the name of Hugh Loebner set up an official Turing Test, but I didn't realize it would be this easy for developers to use smoke-and-mirror techniques to score well on it. It's all a little disappointing, really.

If I were to define a Turing Test, it would involve producing a machine that attained something like a 90% fool rate after sessions with 10 highly trained interrogators who each asked the machine 100 questions. The "highly trained interrogators" part is key here.

For example, I can guarantee you that none of the machines tested this weekend would have even a small hope of answering a statement/question such as this:

Question: "I'm thinking of something that is big and round and 6 billion people live on it. What am I thinking of?"

Trained interrogators would ask simple questions such as this, and today, the machines would likely score 0% on a series of 100 questions and attain a 0% fool rate. And that, I think, would demonstrate where we're at!

On the other hand, I think it's good to have a contest that allows the current state of the art to show off and gives researchers something manageable to work towards. But I wish they wouldn't call it the Turing Test, and then start talking about sentient life forms... please...


IVO and Cepstral voices
October 13

During the last few weeks the best text-to-speech systems that I've come across are Cepstral and IVONA. They both have impressive products at the right price ($30-$49), so I've purchased the Jennifer voice from IVONA and the Allison voice from Cepstral to play with. It's nice to see companies working on this problem and offering their services at very affordable prices.


Exercise 22: IVONA Speech Synthesis
October 13

Summary

Write a class that will connect to the Expressivo application via TCP port 17024 and send a "SayIt|..." command.

Solution

Click here


Exercise 23: Grace
October 13

Summary

Construct a Windows application in C# that runs in the tray and constantly listens for audio input. If the word "Grace" is recognized, then accept audio commands for the next two seconds. Or, if the word "Grace" is detected as the first word in a recognized string, then act on the audio command that is found in the latter part of the string. In addition:

Use a grammar.
Log all activity to a log file using log4net.
Respond to the command "What time is it?", or "What is the time?", or "What's the time?", or "What's the current time?" by reading the time using the Expressivo Jennifer voice. The time should be spoken as, for example, "four oh nine".
Test the program in a kitchen or living room setup with a laptop where the laptop is across the room from the speaker.
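
Two of the smaller pieces, spotting "Grace" as the first recognized word and wording the time as "four oh nine", can be sketched independently of the speech APIs. The Python below is only an illustration: the function names are made up, the "o'clock" wording for exact hours is an assumption, and the real application is meant to be written in C#.

```python
ONES = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
        'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
        'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen']
TENS = {2: 'twenty', 3: 'thirty', 4: 'forty', 5: 'fifty'}


def two_digit_words(n):
    """Words for a minute value from 10 to 59."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + ' ' + ONES[ones]


def spoken_time(hour, minute):
    """Word a 24-hour time the way it should be spoken, e.g. 'four oh nine'."""
    hour12 = hour % 12 or 12
    if minute == 0:
        return ONES[hour12] + " o'clock"     # assumed wording for exact hours
    if minute < 10:
        return ONES[hour12] + ' oh ' + ONES[minute]
    return ONES[hour12] + ' ' + two_digit_words(minute)


def command_after_wake_word(recognized, wake_word='grace'):
    """Return the command following the wake word, or None if it is absent."""
    words = recognized.lower().replace(',', ' ').split()
    if words and words[0] == wake_word:
        return ' '.join(words[1:])
    return None
```

So the recognized string "Grace, what time is it?" yields the command "what time is it?", and at 16:09 the reply text would be "four oh nine".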



Disappointment with voice recognition
October 13

My initial playing around with Microsoft's Speech API was quite successful... its speech recognition accuracy seemed very solid, even if I took my headset off and rested it beside the monitor, or put it on the ground. (Wow!) That was until I tried using it in dictation mode, which increases its vocabulary from a few words to a few thousand words. Suddenly it was recognizing a short sentence correctly less than 5% of the time.

I figured that by reducing its vocabulary to, say, 150 words I'd get the accuracy back, but I couldn't find any easy way to do that. As a workaround, I created a custom grammar that supports statements consisting of 1 to 6 words, each word needing to come from a list of 150 possibilities. Even then its accuracy was remarkably poor. For example, here is what it recognized on repeated attempts:

"My name is Daniel" was recognized as:

my name is Daniel
I name is Daniel
my name is Daniel
nineteen's Daniel

"I live in Waterloo" was recognized as:

"mon"
"man"
"I man name Waterloo"
"Bigham eighteen mon"
"mon"
"I live name one million"
...

In other words, about 50% in the best cases, and < 10% in many other cases. I would have hoped that the accuracy would have been at least 95% with such a small vocabulary.


Exercise 24: Basic context
October 14

Summary

The goal of this exercise is to implement basic support for the contextual words he, she, his, her, it, and its.

Context must be supported within a statement and across statements. For example:
Within a statement: "Daniel likes his bike."
Across statements: "Daniel is 27. He likes running."
If an entity is_a instance and is_a male, then it qualifies as a he.
If an entity is_a instance and is_a female, then it qualifies as a she.
If an entity is_a instance and is neither a male nor a female, it qualifies as an it.
A very simple strategy will be used whereby any token which is interpreted to be a he, she, or it will become the contextual resolution for its respective class until it is replaced by the next match.
The entities context.he, context.she, and context.it will hold this information once a statement or question has finished processing.
The is_a instance relationship will be used to specify whether an entity is an actual instance of an object, rather than being a concept/class of object.
A short form notation will be introduced such that if an entity is defined as, for example, i:DanielBigham, an implicit relationship DanielBigham is_a instance will be created. All objects that should fulfill context must have this relationship.
Add basic support for multiple sentences. For the time being, simply split on ". ".

Tasks

Add support for the i: syntax and apply it where necessary so that all of the appropriate entities have the is_a relationship.
Add an IsInstance property to IEntity.
Add basic support for multiple sentences. For the time being, simply split on ". " and "? ".
Design and implement a Context class which will be used for resolving contextual words within the transformation process.
Add a ContextType enumeration with He, She and It as members.
Add a hash named localContext to the Context class that maps a ContextType to an entity. This will be used within the transformation process where we don't want to update context.he, context.she, or context.it since any time we consider an interpretation we might end up abandoning it.
Add properties He, She and It to the Context class, which return the current context for each respective type.
If the context isn't found in the localContext hash, get it from the context.he, context.she, and context.it entities. (Thus, the Context object will need a reference to the brain state)
Implement a ProcessToken method, which will take a token as its input and update the local context as needed.
Implement a Clone method.
Whenever a token is processed during the transformation search, call context.ProcessToken.
Add a Context property to the Fragment class, and set it with a clone of the current context each time a fragment is added during the transformation search.
Add a new transformation output type called TransOutput_ContextToken and thus a new token type called ContextToken. The syntax should be:
he -> context(he)
she -> context(she)
it -> context(it)
 
Update context.he, context.she, and context.it after a statement/question is successfully parsed.
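
The tasks above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's own code: the brain state is assumed to be a plain dictionary, the is_a facts are assumed to be a dictionary of sets, and the method names (classify, resolve, commit) are hypothetical.

```python
from enum import Enum


class ContextType(Enum):
    HE = "he"
    SHE = "she"
    IT = "it"


class Context:
    """Tracks the most recent entity that qualifies as he, she, or it."""

    def __init__(self, brain_state, facts):
        # brain_state: dict holding the committed context.he / context.she / context.it
        # facts: dict mapping an entity ID to the set of things it is_a (assumed shape)
        self.brain_state = brain_state
        self.facts = facts
        self.local_context = {}

    def classify(self, entity):
        """Return the ContextType an entity qualifies for, or None."""
        kinds = self.facts.get(entity, set())
        if "instance" not in kinds:
            return None  # concepts/classes never fulfill context
        if "male" in kinds:
            return ContextType.HE
        if "female" in kinds:
            return ContextType.SHE
        return ContextType.IT

    def process_token(self, entity):
        """Update the local context; the latest match replaces the previous one."""
        kind = self.classify(entity)
        if kind is not None:
            self.local_context[kind] = entity

    def resolve(self, kind):
        """Look in the local context first, then fall back to the brain state."""
        if kind in self.local_context:
            return self.local_context[kind]
        return self.brain_state.get("context." + kind.value)

    def clone(self):
        """Copy the local context so an abandoned interpretation leaves no trace."""
        copy = Context(self.brain_state, self.facts)
        copy.local_context = dict(self.local_context)
        return copy

    def commit(self):
        """Write the local context back once a statement has parsed successfully."""
        for kind, entity in self.local_context.items():
            self.brain_state["context." + kind.value] = entity
```

Cloning per fragment means each branch of the transformation search carries its own view of he/she/it, and only the branch that finally parses calls commit.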

Web UI

Click here


IBM research paper on far-field ASR
October 14

I came across this research paper which was interesting to read through. The highlights were:

1. The affirmation that far-field speech recognition is challenging.
2. A mentioned word error rate (WER) of around 60%! That would translate to a command accuracy of < 10%.
3. Mention of microphone arrays. They appear to help, but not as much as one might hope.

Here's another paper.


Exercise 25: Alternatives to far-field ASR
October 17

Summary

Although far-field ASR may be an ideal component of an in-house AI, current technology doesn't afford it as an option. This exercise is to determine a viable alternative.

Solution

A wireless microphone is a viable alternative to far-field ASR.

A Bluetooth headset is a good option in terms of cost, non-intrusive design, and range of operation.


Exercise 26: Web queries
October 17

Summary

To be useful, an in-house AI needs to be able to answer questions that involve getting data from the web.

At its base level, this will be handled as a linguistic transformation. For example:

what temperature is it outside? -> temperature()

A new mapping type called Programs will be defined, which will associate function names with their implementations. For example:

temperature -> Plato.Programs.Temperature.Query

Optionally, a DLL name can be specified:

temperature -> plato.dll: Plato.Programs.Temperature.Query

Arguments can be specified as well:

what's the current temperature in {city} ? -> city_temperature($1)

city_temperature -> Plato.Programs.Temperature.CityQuery

All arguments are of type string and consist of the entity's ID. The function return value is also a string.

Implement the Query and CityQuery functions mentioned here.
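
To make the idea concrete, here is a minimal sketch in Python of a Programs mapping and the two transformations above. It is an illustration under assumptions, not Plato's implementation: the two query functions return canned strings instead of fetching anything from the web, the tokenizer is a simple regular expression, and slot values are passed to the program in the order the slots appear (standing in for $1).

```python
import re

# Hypothetical stand-ins for Plato.Programs.Temperature.Query / CityQuery.
# A real implementation would fetch data from the web; these return canned strings.
def temperature_query():
    return "4"

def city_temperature_query(city_id):
    canned = {"tokyo": "12", "waterloo": "4"}
    return canned.get(city_id, "unknown")

# Programs mapping: function name -> implementation.
PROGRAMS = {
    "temperature": temperature_query,
    "city_temperature": city_temperature_query,
}

# Transformations: phrase pattern -> program call.
TRANSFORMATIONS = [
    ("what temperature is it outside?", "temperature()"),
    ("what's the current temperature in {city} ?", "city_temperature($1)"),
]

def tokenize(text):
    """Split into {slots}, words (apostrophes kept), and punctuation marks."""
    return re.findall(r"\{\w+\}|[\w']+|[^\w\s]", text.lower())

def answer(question):
    """Return the program's string result, or None if no transformation matches."""
    tokens = tokenize(question)
    for pattern, call in TRANSFORMATIONS:
        expected = tokenize(pattern)
        if len(expected) != len(tokens):
            continue
        args = []
        for want, got in zip(expected, tokens):
            if want.startswith("{"):
                args.append(got)  # slot: capture the entity ID as a string
            elif want != got:
                break
        else:
            name = call[:call.index("(")]
            return PROGRAMS[name](*args)
    return None
```

For example, answer("What's the current temperature in Tokyo?") looks up city_temperature in the mapping and calls it with the single string argument "tokyo".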


Exercise 27: Visual feedback
October 17

Summary

Implement a class that, given a string, opens a full screen window with no title bar, a black background, and the given text displayed in white letters centered horizontally and vertically. The text should remain visible for 3 seconds and then the window should disappear.


Exercise 28: Integrating Plato with Grace
October 17

Summary

The goal of this exercise is to integrate the Plato engine with the Grace UI, and to demonstrate the temperature() functionality.

Combine the Grace and Plato projects into a single solution.
Create data files (.txt) to represent each of Plato's data structures, and have Grace load those into memory when it initializes.
Every 5 seconds, have Grace examine the file sizes and reload them as needed.
Create a new data structure for Grace which will represent the phrases that will be recognized. They may use entity types, such as:
what is the current temperature in {city}?
 
Optional words are denoted with round brackets, and vertical bars can be used to denote several options. For required words that have several options, use square brackets. For example:
[what is|what's] the (current) temperature in {city}?
 
Transform these specifications into an SR grammar, write it to disk, compile it, and then load it.
When a statement is recognized, pass it on to Plato.
If Plato can't parse the statement/question, abort all operations. Simply do nothing.
If Plato's output is a value, display it and speak it.
If Plato's output is an assignment or relationship, briefly display a full-screen green window. (1 second)
If Plato's output is a TransOutput_Program, display a full-screen gray window while the query executes. If the query completes properly, display and speak the answer. If it fails, display a full-screen red window for 1 second.
Test the setup in a kitchen with a laptop, speakers, and a Bluetooth headset.
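
As an illustration of the phrase syntax described above, the following Python sketch expands a specification into every concrete phrase it allows. It is an assumption-laden toy: the function name expand is hypothetical, nested brackets are not handled, and entity types such as {city} are left in place rather than turned into SR grammar rules.

```python
import itertools
import re

def expand(spec):
    """Return every concrete phrase a specification allows."""
    # Split into [required] groups, (optional) groups, and plain text runs.
    pieces = re.findall(r"\[[^\]]*\]|\([^)]*\)|[^\[\]()]+", spec)
    choices = []
    for piece in pieces:
        if piece.startswith("["):
            choices.append(piece[1:-1].split("|"))         # required: pick one
        elif piece.startswith("("):
            choices.append(piece[1:-1].split("|") + [""])  # optional: one or none
        else:
            choices.append([piece])                        # literal text
    phrases = []
    for combo in itertools.product(*choices):
        # Collapse the double spaces left behind by skipped optional words.
        phrases.append(" ".join("".join(combo).split()))
    return phrases
```

Applied to the example specification, this yields four phrases: two openings ("what is" or "what's"), each with and without "current".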

...


Exercise 29: Queries
October 18

Summary

Implement the following queries:

1. What time is it?
2. What is {first_name}'s phone number?
3. We need more {food_item}.
4. What's on the grocery list?
5. Clear the grocery list.

Support at least 10 phone numbers and 25 grocery items.

Implementing the "What's on the grocery list?" query will involve enhancing the query system so that it can return multiple results. Implementing the "Clear the grocery list" command will involve some new syntax to allow a relationship to be cleared.

Web UI

Click here

Grace application

Click here

NOTE: The Grace application requires that you have Expressivo installed for text-to-speech.


Exercise 30: Web UI
October 18

Summary

An in-house AI's main interface is voice communication, but it should also support remote interaction via the web.

Create a web UI to view, add to, and clear the current grocery list. Add a web service to Grace to allow the ASP.NET page to submit a textual query and get a string response back.

Solution

Since Grace is a standalone Windows application (not an ASP.NET web site), adding a web service isn't as straightforward as it otherwise would be. It requires Web Services Enhancements (WSE) 3.0 and a URI such as soap.tcp://99.236.137.8/graceservice.

Aside from the web service quirks, implementing this functionality was actually quite straightforward!

Click here

Web UI

Click here

NOTE: This will only work if my PC is running.


Milestone: Working version of Grace published
October 29

Tonight marks a significant milestone: Getting the Grace application up and running and published on the web with a ClickOnce installer. (Note that Expressivo is required) The application responds to voice commands starting with the word "Grace". Some example queries are:

"Grace, what is the current time?"

or

"Grace, what is the temperature in Tokyo?"

Other simple queries revolve around phone numbers and the family grocery list. It will be fun to expand on this over the next few months.


The gold pen
November 2

A few months ago I was looking for a pen and ended up grabbing one out of the drawer that had gold ink. It wasn't what I was looking for, but it worked. I ended up using that pen for about three months and filled my notebooks with thoughts, questions, designs, and diagrams. Now that it has run out of ink, I realize that I've grown quite fond of it. I think I need another gold pen :)

Thanks Emily! .. and Bob! :)


Exercise 31: Music, math, and email queries
November 2

Implement the following use cases:

1. "Play {song_title}", "Play the song {song_title}", and "Play the song {song_title} at {number} percent volume". All of a user's iTunes songs should work. Implement "Pause", "Pause the song", "Resume", "Resume the song".

2. "What is X * Y?", "What is X + Y?". For now, only implement math for numbers between 1 and 19 since command and control voice recognition won't recognize larger numbers by default.

3. "Ask Meredith when she'll be home". Send a pre-formatted email to Meredith asking her when she'll be home, and use a return email address of "grace.bigham@gmail.com". Check the Gmail account once every minute for a reply. If the email is replied to, speak the response.

Solution

Click here


Exercise 32: Large numbers
November 4

Summary

Implement a command and control grammar to recognize large numbers so that Grace can be asked things such as "What is five hundred and ninety times forty one?"

Enhance the voice command syntax to support named rules. For example:
number_part: {group_of_ten} {digit}

What this says is that a rule named number_part consists of a group_of_ten token followed by a digit token.

Allow rules to be referenced using the following syntax:
{number_part}

Since this overlaps the syntax used to reference an entity type, we will treat entity types as an implicit rule. If there is a rule that uses the same identifier, it will override the entity type.

Use the large number translations as a basis for the voice command grammar.

Solution

It appears that voice recognition grammar rules can't contain any circular references, so the transformations that I've been using can't be directly adapted. For the time being, I'm using the following:

small_number: [{digit} | {teen_number} | {group_of_ten} | {group_of_ten} {digit}]
number: [{small_number} | {small_number} {multiplier} | {small_number} {multiplier} (and) {small_number}]
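
To show what these two non-recursive rules accept, here is a small Python sketch that turns a recognized phrase into a value by following the rules literally. The word lists are assumptions (in particular, "ten" is placed with the teen numbers here, and the multipliers are taken to be hundred, thousand, and million); the function names are hypothetical.

```python
DIGITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
          "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
MULTIPLIERS = {"hundred": 100, "thousand": 1000, "million": 1000000}

def parse_small_number(words, pos):
    """small_number: digit | teen_number | group_of_ten | group_of_ten digit.
    Returns (value, next position), or None if nothing matches at pos."""
    if pos >= len(words):
        return None
    word = words[pos]
    if word in DIGITS:
        return DIGITS[word], pos + 1
    if word in TEENS:
        return TEENS[word], pos + 1
    if word in TENS:
        if pos + 1 < len(words) and words[pos + 1] in DIGITS:
            return TENS[word] + DIGITS[words[pos + 1]], pos + 2
        return TENS[word], pos + 1
    return None

def parse_number(text):
    """number: small_number | small_number multiplier
               | small_number multiplier (and) small_number.
    Returns the value, or None if the whole phrase does not fit the rule."""
    words = text.lower().replace("-", " ").split()
    first = parse_small_number(words, 0)
    if first is None:
        return None
    value, pos = first
    if pos < len(words) and words[pos] in MULTIPLIERS:
        value *= MULTIPLIERS[words[pos]]
        pos += 1
        rest_pos = pos + 1 if pos < len(words) and words[pos] == "and" else pos
        rest = parse_small_number(words, rest_pos)
        if rest is not None:
            value, pos = value + rest[0], rest[1]
    return value if pos == len(words) else None
```

Because neither rule refers back to itself, a phrase like "five hundred thousand" is rejected, which mirrors the limitation of avoiding circular references.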

The other issue I ran into is that my approach of enumerating all entity type permutations before trying to parse a phrase isn't scalable past about 6 words. I've bumped into it before, but it's a bit more obvious when dealing with numbers since numbers tend to be very wordy.

I think I'll take a crack at rewriting this part of the language parsing algorithm to see whether it is possible to enumerate on the fly. I expect it is. This approach might make the algorithm scalable up to, say, 10 words, or it might make it scalable past 15; it's hard to know.


Exercise 33: Unit conversions and dietary information
November 5

Use cases:

Unit conversions. For example:
"Convert X kilograms to pounds"
"Convert X kilometers to miles"
"Convert X metres to feet"
Alternatively "What is X kilograms in pounds?" or "How many pounds are in X kilograms?"
Support: pounds, kilograms, grams, ounces, metres, feet, inches, feet_and_inches, yards, miles, kilometres, celsius, fahrenheit, kelvin, cups, tablespoons, teaspoons, litres, millilitres, gallons, cubic metres

Dietary information:
How many calories are in {quantity} of {food_item}?
ex. How many calories are in a can of Pepsi? 
How many grams of {nutrient} are in {quantity} of {food_item}
Nutrient types: carbohydrate, protein, fat 
Support at least 10 food items.
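
The unit conversions listed above can be sketched by converting through a base unit per dimension. The Python below is a minimal illustration covering only mass, length, and temperature; the table layout and the convert function are assumptions, and volume units are left out because cups and gallons differ between US and imperial measures.

```python
# Factor that converts each unit to its dimension's base unit (kilograms, metres).
TO_BASE = {
    "kilograms": ("mass", 1.0),
    "grams": ("mass", 0.001),
    "pounds": ("mass", 0.45359237),
    "ounces": ("mass", 0.028349523125),
    "metres": ("length", 1.0),
    "kilometres": ("length", 1000.0),
    "feet": ("length", 0.3048),
    "inches": ("length", 0.0254),
    "yards": ("length", 0.9144),
    "miles": ("length", 1609.344),
}

# Temperature scales are offset as well as scaled, so they go through kelvin.
TO_KELVIN = {
    "kelvin": lambda t: t,
    "celsius": lambda t: t + 273.15,
    "fahrenheit": lambda t: (t - 32) * 5 / 9 + 273.15,
}
FROM_KELVIN = {
    "kelvin": lambda k: k,
    "celsius": lambda k: k - 273.15,
    "fahrenheit": lambda k: (k - 273.15) * 9 / 5 + 32,
}

def convert(value, source, target):
    """Convert value from source units to target units of the same dimension."""
    if source in TO_KELVIN and target in FROM_KELVIN:
        return FROM_KELVIN[target](TO_KELVIN[source](value))
    if source not in TO_BASE or target not in TO_BASE:
        raise ValueError(f"cannot convert {source} to {target}")
    source_dim, source_factor = TO_BASE[source]
    target_dim, target_factor = TO_BASE[target]
    if source_dim != target_dim:
        raise ValueError(f"cannot convert {source} to {target}")
    return value * source_factor / target_factor
```

A phrase such as "Convert X kilograms to pounds" would then map to convert(X, "kilograms", "pounds").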

Solution

Click here

Grace application

Click here


Voice Tracker Array Microphone
November 6

While I was doing some Google searches the other day, I came across a link to a developer who had put together a from-scratch voice recognition application for his Pocket PC. I seem to be impressed by the spirit of curiosity in people that drives them to figure out how complicated things work, and so I sent him an email to tell him that I liked his project. I also mentioned in the email that I was interested in the field of voice recognition and specifically home automation. His response surprised me: He had just started working on the very same project!

He sent me a link to the Voice Tracker Array Microphone and it has been very eye opening. It's great to see an example of a company working on the very technology that I was reading about in a paper a couple of weeks ago. Perhaps this microphone would be capable of implementing far-field command and control ASR in a kitchen environment?


One Voice Technologies
November 6

I followed a link from the Voice Tracker Array Microphone to One Voice Technologies, which is a company that is actively pursuing the home automation sector. Reading their website, it sounds like they understand the big picture pretty well and have some primitive but working products. Very cool!


Exercise 34: Appointments
November 6

Summary

"My next dentist appointment is on November 6th at 3:45" or "I have a dentist appointment on {date} at {time}" or "I have an appointment with {person} on {date} at {time}". Appointments to support: Dentist, massage, doctor, church, and work. Also support non-specific appointments such as "I have an appointment on {date} at {time}".
When an appointment is created, add it to the individual's Google Calendar.
"What appointments do I have today?" should result in "You have a dentist appointment at {time}".
"What appointments do I have this week?" should result in "You have a dentist appointment on Thursday at 3:45, a massage appointment on Friday at 7:00, and a church appointment on Saturday at 8:00".
For the time being, implement a speaker specification via the command "This is {person} speaking". So, for example: "Grace, this is Daniel speaking. I have a dentist appointment on November 6th at 3:45". To make this work it will be necessary to implement a secondary command and control grammar that doesn't require the "Grace" prefix. This grammar should be enabled for three seconds after a previous command is interpreted, at which point the standard command and control grammar should be reverted to.
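
The switch between the standard grammar and the prefix-free secondary grammar can be pictured as a small timer. The Python sketch below is an assumption about how that might look, not the Grace implementation: the class name is hypothetical, and the current time is passed in explicitly so the behavior is easy to follow.

```python
class GrammarSwitch:
    """Accepts 'Grace ...' commands always, and prefix-free commands
    for a short window after the previous accepted command."""

    WINDOW_SECONDS = 3.0

    def __init__(self):
        self.last_command_at = None

    def accepts(self, utterance, now):
        words = utterance.lower().replace(",", " ").split()
        prefixed = bool(words) and words[0] == "grace"
        in_window = (self.last_command_at is not None
                     and now - self.last_command_at <= self.WINDOW_SECONDS)
        if prefixed or in_window:
            self.last_command_at = now  # each accepted command reopens the window
            return True
        return False
```

With this, "Grace, this is Daniel speaking" followed two seconds later by "I have a dentist appointment on November 6th at 3:45" is accepted, while the same sentence spoken after a long pause is ignored.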

Solution

Click here

Comments

This one ended up being a very involved effort -- not a few days, or even a week, but several weeks. I needed to overhaul the linguistic transformation algorithm for performance reasons, which I did as exercises 39 and 40, and there were several other significant changes that needed to be made. Finishing this exercise has culminated in having a working installation of the application in our kitchen here at home, running on an iMac. I will use the ability to create and query appointments as a test case over the next few months to see how the technology feels to use.


Exercise 35: Order meal #1 from Swiss Chalet
November 6

Summary

This exercise will explore the challenges involved with carrying out a complex web action autonomously where XML web services aren't available.

Using screen scraping techniques to extract information from web pages. These techniques need to be easily adapted when service providers change their web pages.
Error handling to ensure that problems are detected and operations aborted upon unexpected results or upon unhandled exceptions within the Grace application.
Communication of errors in a concise way using text-to-speech.
Storage of credit card information in an encrypted format on the hard disk.
Relaying the price and expected delivery time via text-to-speech.



Exercise 36: Ordering groceries
November 6

Summary

Building upon exercise 35, implement a basic system to order groceries from http://www.grocerygateway.com/.

Support 100 food items added to the grocery list via "We need more {food_item}" or "Add {food_item} to the grocery list".
Add a "display the grocery list" command. This should display a full-screen white form with the header "Grocery List" at the top and with the list in a scrollable container. A command "Close that" should hide this form.
Add a "print the grocery list" command.
The command "Order the groceries for delivery at {time}" should place an order via the web. Once the user speaks this command, the computer should reply with "Speak the confirmation code to continue". The confirmation code should consist of the word "delta" followed by the number that represents the current day of week. So for Tuesday, the confirmation code is "delta two". Alternatively, the command "cancel that" can be used to abort. Techniques used from exercise 35 should be employed to implement a system that detects unexpected results and aborts operations. A successful order should be communicated as "Your grocery order was placed successfully. The total came to 85 dollars and 21 cents.".
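
The confirmation step described above can be sketched in a few lines of Python. The assumptions are labeled in the comments: days are numbered with Monday as one (the text only fixes Tuesday as "delta two"), the order total is held in cents, and the function names are hypothetical.

```python
import datetime

NUMBER_WORDS = ["one", "two", "three", "four", "five", "six", "seven"]

def confirmation_code(day):
    """'delta' plus the day-of-week number, assuming Monday counts as one."""
    return "delta " + NUMBER_WORDS[day.isoweekday() - 1]

def handle_reply(reply, day):
    """Return 'order', 'cancel', or 'retry' for the user's spoken reply."""
    spoken = reply.strip().lower()
    if spoken == "cancel that":
        return "cancel"
    if spoken == confirmation_code(day):
        return "order"
    return "retry"

def order_summary(total_cents):
    """Build the sentence spoken after a successful order."""
    dollars, cents = divmod(total_cents, 100)
    return ("Your grocery order was placed successfully. "
            f"The total came to {dollars} dollars and {cents} cents.")
```

For instance, on a Tuesday confirmation_code returns "delta two", and order_summary(8521) produces the sentence quoted above.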



Speaker identification
November 6

Speaker identification is the task of listening to a person's speech and figuring out who they are. People are pretty good at this, although it can be tricky over the phone.

The use case in home automation is resolving the words "I", "me", and "my". If a user says, "When is my next dentist appointment", the computer needs to determine who is speaking to be able to answer the question.

In thinking about this, I realized that the problem is made easier by the fact that a user is always going to be using the words "I", "me", or "my" in these cases, and so the system, if it has a voice print for the user saying these words, has a place to start.


Exercise 37: Alarm clock
November 6

Summary

An interesting use of this technology is an alarm clock:

"Set the alarm clock for 6:30"
"Wake us up tomorrow morning at 6:30"
"Wake us up tomorrow morning at 6:30 with Chariots of Fire"

The concept of an alarm clock could be extended in many ways. For instance, it could read a weather forecast, remind a person about what they're doing that day, etc. An example:

Good morning.
It's Thursday, November 6th.

Your grandpa Bigham would have been 98 years old today.

It is currently 4 degrees celsius;
today will be sunny with a high of 14.

Remember your dentist appointment at 3:45.

-> Plays CBC podcast of the morning news


Using computers for meal planning
November 6

It strikes me that computers and home automation could be used to improve health and give people more free time if they were used for meal planning. Consider the following scenario:

"Grace, make a meal plan for this week."

The computer would have the following information at its disposal:

The family recipes, including ingredients lists, and therefore, nutritional value. Each recipe would have a rating to indicate how much it was liked.
An inventory of what food was currently in the fridge, freezer, and cupboards.
A record of what meals were eaten at home in previous weeks.

Using this information, a meal plan could be put together that would:

1. Be nutritionally balanced and not deficient in any important area.
2. Use foods more often that are already in the fridge, freezer, or cupboards.
3. Use foods more often that are in season, and likewise, foods that could be purchased from local producers.
4. Use recipes more often that are liked.
5. Not use recipes that were already used very recently.

There are a number of variables to optimize, but that's what computers do best: Crunch possibilities and come up with something that is optimum. Since different families would value different things, there could be "sliders" to adjust how important the different criteria were, such as how important it is to use local foods.

Another, related use case would be to ask: "Grace, recommend a recipe for tonight", which would take into account what foods were already in the house to plan a tasty meal, helping to use up foods that would otherwise go bad and reducing unnecessary trips to the grocery store.


The word 'Love'
November 6

Tonight I saw the following error:

Error: The word 'Love' is not recognized

That's a very fitting statement for an AI...


LifeSize video conferencing
November 10

Just for fun today I Googled for HD video conferencing and came across LifeSize, a company that specializes in making that very thing happen. I'm really impressed! I'd love to see a demo of their stuff.


Kitchen computing
November 10

Something that I'm coming to realize is that my interests in natural language processing, voice interface, and home automation all converge on the kitchen.

And so this concept of "kitchen computing" comes to mind, which seems like a relatively untapped area. We've been using our PCs in the study, bedroom, or in the den for a couple of decades. In the last 10 years, computing has entered the living room in the form of the Media Center PC, Xbox, and PS3. But the kitchen has yet to change much. I think that the lack of progress in the areas of natural language processing, voice interfaces, microphone technology, and home automation is the reason for this, and so as I explore these areas, I'm intrigued by the possibilities.

Some random thoughts:

I see kitchen computing as being much more "hands free" than computing in other areas, even more so than the living room. More often our hands are busy or we're in the middle of doing something and don't want to walk across the room to use a traditional PC. (Or our hands aren't clean) This is where the need for better microphone, voice interface, and natural language parsing technology comes in.

The kitchen is really the hub of the family's operations, and it's the place that most often brings us together, so it's an area that is ripe with opportunity.

Meal planning

Meal preparation

Purchasing groceries

Managing the family calendar

Email is a very efficient means of communication, and the BlackBerry and other hand held devices have pushed (hah) email to the next level. But there's still room to grow. Paired with a voice interface and home automation system, email can be retrieved and sent on a person's behalf, greatly extending their reach. For example, consider standing in the kitchen working on supper and saying "Grace, ask Meredith when she'll be home", or "Ask Billy where he is", or "Tell Billy that supper is at 6". A person with a BlackBerry receives the question within seconds, and their typed reply can be spoken to the person who asked the question in the kitchen: "Meredith has replied. She says: I'll be back by 6". There are many advantages to this kind of communication over making a voice phone call.

HD video conferencing: An LCD in the kitchen becomes a viewport into the world, and brings with it the possibility of "replacing" the phone with HD video conferencing, the next leap after VOIP. Back in the '90s we would imagine the day when we'd have video phones, and that day is getting closer. Both the broadband Internet and camera technology exist today to make this a reality.



Exercise 38: Upload the project to SourceForge
November 15

Summary

Any project of a reasonable size should live in a source code repository.

Solution

Fortunately SourceForge is free. Setting up a project with a CVS repository was a little more involved than I thought it would be, but I got it working.

https://sourceforge.net/projects/platoai/


Google taking spoken queries?
November 15

This article hints that Google is working on a voice interface for the iPhone. Very interesting. I guess cell phones are an ideal candidate since you hold them right up to your face, so the proximity to a person's mouth is great, and cell phones already need to have decent quality microphones since that's part of their core functionality: To record a voice into a digitized format and transmit it.

Prediction: The years 2010 to 2020 will see very significant developments in the area of voice interfaces.


Exercise 39: Alternative transformation algorithm
December 2

Summary

Implement an alternative transformation algorithm that doesn't start by permuting each of the entity type sets, but rather works by permuting the entity type sets of each pair of adjacent words.

This alternative approach will be known as the Adjacent Pair Permutation algorithm, or APP algorithm. As part (b), add a hash that will track what fragments have already been considered so that they aren't re-considered. This in effect prunes the tree. The resulting algorithm will be called Unique Adjacent Pair Permutation.

The original approach will be known as the Exhaustive Permutation algorithm, or EP algorithm.

Motivation

It is impractical to permute each of the entity type sets of a sentence, since a 15 word sentence with 10 entity types per word results in 10^15 permutations. That's one thousand trillion, and the initial permutations are only the first level of a tree!


Exercise 40: Mathematical analysis of the Adjacent Pair Permutation method
December 2

Summary

The APP algorithm processes each pair permutation. Each pair permutation forms a search tree. Perform an analysis on the number of nodes in a typical pair permutation tree given that:

There are W words
There are E entity types per word
The fraction of pair permutations that transform is Q

Solution

I wasn't quite sure how to solve this one, so I called on Mr. Vaughn Climenhaga for some help, and he delivered! Thanks Vaughn!

The following solution assumes that only the first two pair permutations are considered, which is a best-case scenario. (I'm unclear on whether the algorithm is feasible with this limitation)



So basically O(E^(2W) * Q^W).

In the case where W = 15, E = 10, and Q = 0.1, we have:

344,926,315,789,473,684,201 (roughly 345 million trillion)

In the case where W = 15, E = 10, and Q = 0.01, we have:

6,553,401

In the case where W = 15, E = 10, and Q = 1/150, we have:

44,175

In the case where W = 7, E = 10, and Q = 0.01, we have:

25,401

In the case where W = 7, E = 10, and Q = 1/150, we have:

3,892
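
For anyone wanting to experiment with other values of W, E, and Q, here is a small Python sketch. The closed form it uses is an assumption, not necessarily the expression from the solution: it is one formula that agrees with the bound above and reproduces the figures quoted for Q = 0.1 and Q = 0.01 exactly, while for Q = 1/150 it lands within about 1% of the quoted values. Q should be passed as a string or a Fraction so that the arithmetic stays exact.

```python
from fractions import Fraction

def node_count(words, entity_types, q):
    """Assumed closed form: 1 + 2*E^2 * sum over k = 0..W-1 of (2*E^2*Q)^k.

    words: W, entity_types: E, q: Q given as a string such as "0.01"
    or as a Fraction (a float would introduce rounding error).
    """
    e_squared = entity_types ** 2
    branching = 2 * e_squared * Fraction(q)
    return 1 + 2 * e_squared * sum(branching ** k for k in range(words))
```

The dominant term is (2 * E^2 * Q)^W, which grows or shrinks depending on whether 2 * E^2 * Q is above or below one; that is why a small change in Q makes such a large difference in the figures above.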

Discussion

In practice, the algorithm's search tree appears to be much, much smaller than this, so perhaps Q can't be modeled as a constant.

The other important consideration is that Unique APP avoids branching to fragments it has already searched, which heavily prunes the tree.


BlackBerry Bold
December 5

After work today Meredith showed off her new BlackBerry Bold, and am I ever impressed. It has only been in the last year or so that the BlackBerry has really grabbed my attention: They've improved to the point that they are extremely attractive, versatile devices.

The Bold's screen is fantastic. I'm blown away by the resolution of these new displays, and on-screen fonts are extremely crisp and well proportioned. In addition to the great screen, I love how the overall look of the screen and upper portion is flat and glossy.

After playing with the music player, video player, and camera, I am reminded of the amazing convergence that is happening and will continue to happen. We're now at the point of combining telephone, email, web search, maps, music, video, camera, calendar + tasks, GPS, voice notes, ... wow. Incredible.

It occurred to me this week that the BlackBerry and iPhone represent great AI devices. I guess Google's voice search hints at this, but I'm thinking more directly about what I've been working on. Imagine wearing a wireless earbud and a wireless lapel microphone. The BlackBerry/iPhone would interpret speech and relay responses to the user via the earbud.

The amazing thing is that it would be as if the whole of the Internet were inside your head. You could be anywhere at any time and ask almost any question and get an answer "inside your head".



Nuance = Nuisance
December 15

After reading David Pogue's very positive review of Dragon NaturallySpeaking 10, I figured I'd check it out and see if there was an SDK to play with. There was mention of an SDK on their web site, but strangely no download link, only a link to send a sales request.

I couldn't believe the response: It costs $5000 to download their SDK, or if you just want to take it for a spin, an evaluation costs $2500. What a joke!

Maybe I'm missing something, but it seems like Nuance doesn't understand why SDKs are something you give away: So that developers will create systems that use your product, thus driving sales. I did a bit of Googling and found other people scratching their heads.

I looked up the company's stock and it lost 65% of its value between May and November.



Review of the Voice Tracker array microphone
December 25

A few weeks ago I decided to go ahead and order the Voice Tracker array microphone. It arrived in the mail, and it was one of those times like Christmas where you tear open the packaging and behold something shiny and new.

After doing some testing I can say that I'm very impressed with the technology. Acoustic Magic has done some very good work on this product, and I'm happy with my purchase. (My only regret is spending $50 more for the USB version: I was under the impression that it would mean a better analog-to-digital conversion, but in reality it's just a cheap dongle that most likely isn't any better than your sound card)

The first test I did was using the USB connector, which I figured would have the best quality. The results weren't very good: Only 75% accuracy with a command and control grammar from 12 feet away in the kitchen.

The next test I did was with the standard microphone jack set at about a 15% input level. I arrived at this setting by using Sound Recorder and testing what the audio coming in at 12 feet sounded like, which even at 15% I felt was louder than the input that was coming in through the USB connector. (Even with the USB connector's volume set at 100%)

The result of the test was 92% accuracy, which is pretty good considering the distance.

The final test I did was setting the microphone input to about 30%. This resulted in only 1 error out of 40, for an accuracy of 97.5% from 12 feet, which I was extremely pleased with.

There were still some frustrations, such as it interpreting my saying "17" as "70" more often than not, but overall I can't complain. The other thing to realize is that I'm using XP's voice recognition, which is circa 2001, a.k.a. SAPI 5.1. Perhaps if I were to repeat the test with SAPI 5.3 in Vista I would have more luck with the 17 vs. 70 problem. My hope is that Windows 7 will include another update to SAPI, but only time will tell.

Anyway, I am extremely pleased with the results. Thank you Acoustic Magic!


The next six months
December 25

After working on this project for 6-7 months, we've arrived at Christmas '08, and it is nearly time to direct energies elsewhere:

Because I've spent so much time working on this project, I've been procrastinating on my goal of reading the Bible in a year, so I will need to devote 5-6 hours a week to that until June.

With the hope of biking across Canada this summer, I need to start getting back into shape, and there is lots of planning to be done.

Littlehamberg is set to arrive late March / early April!

Work for Wilf on various web sites.

Hopefully gone all summer biking across the country!

With all of these things on the horizon, I don't expect to be spending more than a couple of hours a week on the Plato/Grace projects. Before I completely change gears, I would like to get the following up and running:

1. The plan is to take one of my two LCDs and set it up in the corner of our kitchen with the Voice Tracker array microphone in front and below it. This will require purchasing a used computer (hopefully for cheap) off of Kijiji and putting it below the kitchen counter. Connecting the monitor/mic to the computer will unfortunately involve drilling a 2" hole through the kitchen counter.

2. Grace will require some changes to reduce the number of false positives: that is, cases where she responds to a perceived command that is really just two people talking in the background. To help prevent this, I'm going to change the grammar to have "Grace" be a command that must be recognized on its own before the speaker can speak the command. Currently the word "Grace" can simply be spoken as the first word of a sentence, and I expect that plays into the number of false positives. I'll also need to add a "Confirm/Cancel" mechanism to certain commands to give additional control and further reduce undesired actions such as spontaneously playing songs.

3.I need to finish up exercise 34, which allows appointments to be added. This will serve as the base use case of the system, allowing me to add events and ask queries such as "What appointments do I have this week?" or "What is my next appointment?". I will use this over the next number of months as a test bed for the technology.

4.Program state needs to be saved on a regular basis and whenever the program closes.

5.Some basic power management needs to be added so that the computer monitor is turned off soon after a command/question has been answered. This will save energy and make the system more elegant. (Who wants an LCD monitor shining from the corner of the room all the time?) Power management to put the computer into hibernation mode also needs to be added so that when we leave for work and at bed time, the system will automatically turn off to save energy.

6.An interface needs to be added to the outside world so that, via the web, commands can be given and queries answered from any computer or handheld device. I'll also add a command line utility to my Linux environment at work so that I can type something such as:

grace "I have a work appointment tomorrow at 9:30 am"

7.I would like to create a simple, concise web site to present the ideas I've cultivated while working on this project. This will likely be a work in progress over the next few months.
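
The wake-word and "Confirm/Cancel" ideas from item 2 can be sketched as a small state machine. This is a minimal illustration in Python, not the actual Grace code: the class name, the 5-second window, and the choice of "play" as a command needing confirmation are all assumptions made for the example.

```python
import time


class WakeWordGate:
    """Pass a command through only if the wake word was heard on its own first."""

    def __init__(self, wake_word="grace", window_seconds=5.0,
                 confirm_verbs=frozenset({"play"}), clock=time.monotonic):
        # All defaults here are hypothetical values chosen for illustration.
        self.wake_word = wake_word
        self.window_seconds = window_seconds
        self.confirm_verbs = confirm_verbs
        self._clock = clock
        self._armed_at = None   # when the wake word was last heard
        self._pending = None    # command waiting for "confirm"

    def hear(self, phrase):
        """Feed one recognized phrase; return a command to run, or None."""
        command = phrase.strip()
        text = command.lower()
        if not text:
            return None
        now = self._clock()

        # A pending command runs only on "confirm"; anything else drops it.
        if self._pending is not None:
            pending, self._pending = self._pending, None
            return pending if text == "confirm" else None

        # The wake word by itself arms the gate for a short window.
        if text == self.wake_word:
            self._armed_at = now
            return None

        armed = (self._armed_at is not None
                 and now - self._armed_at <= self.window_seconds)
        self._armed_at = None
        if not armed:
            return None  # treated as background conversation

        # Risky commands (e.g. playing a song) wait for confirmation.
        if text.split()[0] in self.confirm_verbs:
            self._pending = command
            return None
        return command
```

With this shape, "Open my inbox" heard out of the blue is ignored, while "Grace" followed within a few seconds by "Open my inbox" goes through.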

And with that, it will be time to wrap things up until next fall at the earliest!


Microsoft speaker identification
January 5

Something that I'd like to add to the Grace application is speaker identification. There are two benefits:

1.If you can identify the speaker, you can switch to their recognition profile which improves the accuracy.

2.The speaker can use words like "I" and "me". ex. "I have an appointment tomorrow at 3:00 PM".

Until I get something like this working, it would be impossible for Meredith to make use of a kitchen setup, since the recognition accuracy for her voice against my recognition profile would be unusable.

I came upon an interesting page at Microsoft today: http://research.microsoft.com/en-us/projects/whisperid/default.aspx. It's neat to see that they're working on some of the same things.


A significant day
January 11

Today was a significant day for me: I finished work on the first phase of my "Grace" project, which concludes about 7 months and 250 hours of work. The application is up and running in the kitchen, powered by a 20" iMac, a Voice Tracker array microphone, and a lot of elbow grease!

I opened up our calendar and spoke each of our January appointments to the application, followed by a few spoken queries, and things appear to be working quite well. Meredith came downstairs for me to show it off, and we had some good laughs as we often do. If it wasn't for Meredith's idea to get an iMac, we'd have a 2" hole drilled in our kitchen counter right now!

Working on this project has been a phenomenal opportunity for me personally. There is something magical about using your imagination, letting your mind dream, and then with hard work turning that vision into reality.

Along those lines, one of the overarching themes of this project has been to use my own imagination rather than researching and adapting what others have done, and I think that is the aspect of the project that I enjoyed the most. Surely research is important, and the way things are usually done, but I wonder whether progress is sometimes limited by the approach of always adapting the work of others. For me, it is wonderfully freeing to blow the doors wide open and simply dive in with as much imagination as possible.

So with that, I express my thankfulness to our Heavenly Creator for a wonderful opportunity, when life is sweet and appreciated dearly.



Video podcasts, the future, the past, and a whole lotta rambling
January 19

Today was a little mind blowing: I feel like someone hit the fast forward button and I'm sitting here at some point in the future. (Ok, so technically every moment in time is the "future")

Let's review: Last night I was in iTunes and I came across podcasts, which I've never really explored. When they showed up on the scene a few years ago I liked the idea but I didn't find enough content to get me excited. As I came across them again last night, I was intrigued. Ok, so maybe it was the fact that it was a link to a NASA podcast and the space nerd inside of me was tickled.

This morning as I was getting ready for work, I checked out the NASA podcast, and what I saw next blew my mind: An NBC Nightly News podcast, an ABC News podcast, and a CBS News podcast. Video. Podcast. Video on demand. TV on demand! Throw a kitchen computer into the mix and you have the ability to watch the news in the evening or morning as you're sitting there enjoying your munchies. Revolutionary, really.

So tonight Meredith and I watched the news together, which we haven't done in a really long time. We cut our cable last year, and even before that, stopping what we were doing and going downstairs in time to catch the news was far too much work. It was like it became culturally passe for us to have to sit at the TV at a specific time of day. No: Not our generation; we want to watch something when we want to watch it. Or, most times, we won't be bothered to watch it at all.

And the basement, it's a nice "far off" place to watch a movie, but we live our lives on the main floor. So having the news beamed to the kitchen, on demand, to a beautiful 20" iMac screen, it's mouth watering. So video podcasts have become one of the killer apps for this concept of a "kitchen computer" being a central tool in the home.

Before I close, I wanted to touch on what made this day feel so "in the future". Beyond the video-on-demand, the actual subject matter of the news was Barack Obama's inauguration. Some of the African Americans that were interviewed talked about it being a quantum leap, and I think they're right on the money. Bono's remarks, too, were very revealing. I forget how exactly he put it, but basically he said that it changes everything, that somehow, in some mysterious way, this changes everything. I don't know whether he's right, but I feel too like we're sensing the tip of an iceberg: That this event is symbolic of something big.

Next: After watching ABC News, I was curious to find out more about this "Andrew Bird" musician that they interviewed, so I typed his name into the iTunes music store and within 30 seconds I had purchased two of his songs, which are fantastic by the way. Do you see how everything was connected? I was able to get the News when and where I wanted it, and that meant that I actually watched it, and because I'm watching it on a PC, I then immediately put into action the things that the news brought to mind, and an economic transaction happened within 1 minute. That's an economy moving at the speed of light compared with how that series of events would have unfolded 20 years ago. Pretty amazing stuff, thus the feeling that I'm sitting here in the future.

And to contrast all of these themes, the Old Testament: I've just finished first and second Kings, and the enormity of the world and its history is fresh on my mind: What a rich globe we live on, what a rich history. And in addition to the Bible being at the centre of Christian theory, it is a mind blowing (term of the day apparently) account of history. I find it breathtaking that I can read intimate stories of people, families, and an entire civilization thousands of years in the past. Almost everything in this world fades to dust, but these scriptures and stories are intact. Wow.

The future, the present, the past, if you haven't been reminded in a while, this is an amazing planet. Praise be to God!


Evaluation of Grace: Part I: 3rd Party Technology
February 26

It has been a couple of months now using "Grace", so it's time to do some evaluation of the technologies I've used to make this application work. (The context here is that last year I wrote an application named "Grace" that runs on a computer in the kitchen and can be interacted with via voice)

Here are the biggest challenges, the things that don't work well:

Did you say something?

An aspect of SAPI (5.1) that I have found very frustrating of late is how increasingly often it interprets/recognizes non-vocal noise as if it were speech. Back in January when I first started using Grace, this was a significant but manageable issue, but in the last couple of weeks it has made Grace almost unusable in the noisy kitchen environment. Simply walking across the room or opening a drawer causes SAPI to recognize the command "Grace". Put a glass on the kitchen counter and it will recognize the sentence "Open my inbox". This is where I draw the line: Behavior like that is ridiculous, especially the latter example. A couple of days ago I was making some bread and chatting with Meredith, and apparently it heard the sentence "Open my inbox" about 5 times.

I expect that one of the culprits here is that SAPI tries to "learn" over time, adjusting its internal probabilities so that words or phrases that it has heard more often are more likely to be recognized. The obvious problem with this approach is that once you have used a command or phrase a few dozen times, it becomes weighted so heavily that more and more often background noise will match the word or phrase, to the point that you start seeing behavior like I have described above. I believe there is a way to disable this adaptation, which I will likely have to do, but there is a downside to doing this, because I expect that for the most part, this adaptation has a positive effect on recognition rates.

Overall, this is a commentary on where voice recognition technology is at for use in environments that aren't perfectly quiet. If I were to assign a grade on how well SAPI protects itself from recognizing noise as speech, it would have to be an "F". More research needs to be done in this area.

Keeping the monitor off

Grace is primarily a voice interface: You speak a command or query, and it speaks back the answer. To make this work, the computer needs to be running, but there is no need for the monitor to be on until if and when information needs to be displayed to the user. Indeed, in today's world where the environment and energy conservation are important issues, it would be very wasteful to have a computer monitor on all day when it's not needed.

There are Windows APIs that a program can use to put an LCD monitor into and then later out of sleep mode, and at first glance, this seems to solve the problem: The software can keep the monitor off until information needs to be displayed, at which point, it can turn the monitor on. LCDs can come out of standby mode within a second or two -- perfect, right?

Unfortunately, SAPI contains a "feature" whereby audio input automatically takes the monitor out of standby mode. The reasoning is that if a computer is employing a voice interface, audio input is the equivalent of a mouse movement or keyboard key press. Thus, if you're in the kitchen and open a cupboard or even shift in your chair, the monitor turns back on.

The only workaround that I've come up with is to run a loop that tells the monitor to go to standby mode 20 times a second, so that when SAPI goes to bring the monitor out of standby mode, the software immediately overrides it. I worry though that this may be causing additional stress on the hardware. And even with this workaround in place, the software needs to make sure that a black window is completely obscuring the screen, otherwise when you move around in the kitchen the monitor is constantly flickering as it comes out of and then back into standby mode, displaying the Windows desktop for a fraction of a second each time. Gross.
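
For illustration, a workaround of this kind might look like the Python sketch below. It assumes the standby request is the standard Win32 WM_SYSCOMMAND / SC_MONITORPOWER message, which the post does not actually specify, and the function names are made up for the example. The power-off call is injectable so the loop can be exercised without touching a real display.

```python
import sys
import time

# Standard Win32 constants for the "monitor power" system command.
HWND_BROADCAST = 0xFFFF
WM_SYSCOMMAND = 0x0112
SC_MONITORPOWER = 0xF170
MONITOR_OFF = 2  # lParam: 2 = off, 1 = low power, -1 = on


def send_monitor_off():
    """Ask Windows to power the display down; does nothing on other platforms."""
    if sys.platform != "win32":
        return
    import ctypes
    ctypes.windll.user32.PostMessageW(
        HWND_BROADCAST, WM_SYSCOMMAND, SC_MONITORPOWER, MONITOR_OFF)


def hold_monitor_off(should_stay_off, send_off=send_monitor_off,
                     rate_hz=20.0, sleep=time.sleep):
    """Re-issue the power-off request rate_hz times a second while needed.

    should_stay_off is a hypothetical callback that returns False once the
    application wants to show something on screen. Returns the number of
    requests that were sent.
    """
    sent = 0
    while should_stay_off():
        send_off()
        sent += 1
        sleep(1.0 / rate_hz)
    return sent
```

The 20 Hz rate matches the figure mentioned above; whether a lower rate would still beat the wake-up is something that would need testing on the actual hardware.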

Microsoft: The conclusion here is that for SAPI to be used in an always-on environment where electricity needs to be conserved by keeping a monitor in standby mode, this setting needs to be configurable. Until that time, ugly ugly hacks are required.

When to listen

Another challenge is for the software to know when to listen and when not to listen. For example, if you are playing some music in the kitchen, you obviously don't want SAPI listening. Fortunately, iTunes offers a COM interface that allows the software to know when music starts and stops, so recognition can be enabled or disabled.

Unfortunately, I'm not currently aware of any integrations for Windows Media Player, so there doesn't seem to be any way of being smart about stopping/starting recognition while listening to a radio station through Media Player. Perhaps there is a more direct way to accomplish this via DirectShow, etc.
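
The start/stop logic itself is simple, whatever the source of the playback events. Here is a minimal Python sketch, assuming some adapter (not shown, and hypothetical) turns player events into calls on this object; the class and method names are illustrative and are not part of the iTunes COM interface or SAPI.

```python
class ListeningGate:
    """Keep speech recognition paused while any audio source is playing."""

    def __init__(self, set_recognition_enabled):
        # set_recognition_enabled is a hypothetical callback that turns
        # the recognizer on (True) or off (False).
        self._set_enabled = set_recognition_enabled
        self._playing = set()
        self._set_enabled(True)

    def playback_started(self, source):
        self._playing.add(source)
        self._set_enabled(False)

    def playback_stopped(self, source):
        self._playing.discard(source)
        # Only resume listening once every source has gone quiet.
        if not self._playing:
            self._set_enabled(True)
```

Tracking sources by name means a second player, such as a radio stream, could be added later without recognition switching back on while something is still playing.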

...

Ok, so those are the challenges, the things that don't work very well. Here are the things that work pretty well, but have room for improvement:

...

Recognition accuracy

While far from perfect, I'm relatively happy with recognition accuracy, that is, when you are actually speaking to the software. Grace uses a fairly complex command and control grammar that allows for natural language commands and queries, and accuracy isn't bad. I'm sure this is an area of research that will improve over time, but I can live with where things are at.

One area that hasn't worked that well is numbers. For example, the recognizer seems to have a lot of difficulty distinguishing between words like "seventy" and "seventeen".

Occasionally it will recognize completely bizarre statements that are nothing even close to what I said, but this doesn't happen too often. Interestingly, accuracy seems to be improved when commands and queries are longer VS shorter. For instance, playing a song by saying "play the song Chariots of Fire" will result in fewer mis-recognitions than if the grammar allowed for "play Chariots of Fire". This is a nice attribute to have for a system that prefers commands and queries be spoken in natural language, but sometimes it does make more sense for a command to be short and concise, and it's frustrating when that translates to more mis-recognitions.

iTunes

It has turned out that iTunes has been an important component of a kitchen computer: Music playback, yes, but more importantly video podcasts. I can watch the nightly news by saying "Play the ABC news podcast", likewise the NASA podcast, and TED podcast.

Having a COM interface has made interfacing with iTunes possible. Without a COM interface, there would have been some serious problems such as knowing when to listen and when not to listen. And as it turns out, many podcasts seem to have a relatively low volume, so the software can also adjust the system volume to an appropriate level when a podcast is being viewed, and then restore it to the default level when it stops being played.

While iTunes has been a very important piece, there are frustrations: For instance, if the Windows tray opens an information balloon, video playback drops to about 0.2 frames per second, and you have to get up and fight with the computer trying to close the darn thing before you can continue watching your program. It also seems impossible to make the video full screen via the COM API, which is unfortunate.

...

And finally, things that have worked very well:

...

iMac

The iMac hardware is really ideal for a kitchen installation. It's very quiet, pretty, and compact, all of which are very important. And of course, it now runs Windows.

What I'm most impressed by is how quiet it is: Probably an order of magnitude quieter than many desktop computers I've owned, and it ends up being virtually silent in the kitchen environment. This can easily be a show stopper for a kitchen installation since a noisy fan is extremely tiring to listen to, and many people, myself included, wouldn't have patience for it.

I also love how the iMac looks: The screen is a beautiful glossy black when it's off, which looks great in the corner of the kitchen, and the anodized aluminum looks similarly nice. I wouldn't want an ugly computer in the corner of my kitchen, so this is an important attribute for it to have.

The compactness: I couldn't be more pleased with how compact it is: It saved me drilling a hole in my kitchen counter which would have been required if I had used a desktop + LCD monitor. Even the keyboard is understated. Perfect.

And finally, the Apple remote! What a wonderful gadget, and this turns out to be quite important because there's no way to pause audio or video, skip tracks, or adjust the volume using a voice interface because SAPI isn't going to be able to hear you over the audio that the computer is playing.

My one gripe has been that the wireless adapter appears to have gone flaky and then died on me -- and what's with Apple mice? I replaced the standard mouse with a wireless Microsoft mouse.

Anyway, the iMac has been a very important component of this project and has worked remarkably well. It was Meredith's idea too, so good thinking Meredith!

VoiceTracker Array Microphone

I'm very happy with this purchase: It's an array microphone that even works from 12 feet away, albeit with moderate performance at times from that distance.

A project like this is only really possible with a high quality array microphone. I experimented with Bluetooth headsets, but:

1.Who wants to wear one around the house?  Not me.

2.Recognition accuracy sucked.

Another alternative would have been to use a high quality wireless microphone, but the whole idea here is for the system to be hands free, because when you're in the kitchen, you're often busy doing things, or have wet or grimy hands and don't want to have to stop what you're doing to handle a device.

So bravo to the VoiceTracker team!

My one beef here is that the USB adapter they send you has been gimped so that it only produces 1/10th the volume that it would by default. This makes recognition from 12 feet lousy. I would normally just bypass this and plug the microphone directly into the iMac, but as I discovered, the iMac doesn't have a microphone input. How's that for frustrating! I ended up purchasing something called an 'iBooster' to get around this, but I'm unclear as to how well this is working. I wonder whether it is causing clipping when I'm actually close to the computer, and I'm a bit confused because the line input volume seems to jump around: Does Windows automatically adjust line input volume when it's used for SR? I'll need to do some more playing around with this.


Wolfram Search
March 9

Alongside Powerset comes Wolfram Search, due to launch in May. This is exciting stuff, and I can't wait to see what they've come up with. From the little I've read, it sounds as though they're actually modeling the knowledge and then using natural language parsing to allow the user to query that model. Very cool.



My first BlackBerry application
April 6

A couple of weeks ago, Meredith came home from work with a shiny new BlackBerry Bold for me, which I had been eying for quite some time. Getting a data plan through Rogers turned out to be more than a little frustrating: A $25 data plan quickly turned into $43/mo because they charge you $10/mo for not taking a voice plan (lame) and another $7/mo for a system access fee. (sigh)

Anyway, once I started playing with the device, I was curious to see what the development tools were like, and I have to admit, I'm impressed! RIM makes available a Java Development Environment "JDE", which consists of an IDE and tools for copying apps to the device, etc.

Last weekend I got my application up and running. A fun experience!


Smart alarm clock
April 21

Something that I've wanted to try for some time is to create a smart alarm clock that would determine whether you have an appointment the next day and wake you up in time. (I'm great at forgetting that I have a chiro appointment at 8:15 am)

The other half of this is to have the alarm clock speak useful information to you such as:

The current weather and forecast weather.
What appointments you have that day.
What birthdays of family or friends are that day, or what wedding anniversaries.
etc.

I got a BlackBerry last month and realized shortly thereafter that it, in combination with its charging dock, makes a great piece of hardware for implementing this alarm clock.
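
The "wake you up in time" part comes down to a small calculation. The Python sketch below is only an illustration of the idea, not the BlackBerry code: the function name and the 90-minute lead time are assumptions.

```python
from datetime import datetime, timedelta


def alarm_time(appointments, usual_alarm, lead_time=timedelta(minutes=90)):
    """Return when to ring: the usual alarm, or earlier if an appointment needs it.

    appointments is an iterable of datetimes; only those on the same day as
    usual_alarm are considered. lead_time (getting ready plus travel) is a
    hypothetical default.
    """
    same_day = [a for a in appointments if a.date() == usual_alarm.date()]
    if not same_day:
        return usual_alarm
    return min(usual_alarm, min(same_day) - lead_time)
```

For example, with a usual alarm of 8:00 am and a chiro appointment at 8:15 am, this rings at 6:45 am; an afternoon appointment leaves the usual alarm unchanged.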

Here's a video

I also came across this link today which is quite interesting.


IBM does Jeopardy
April 26

http://gizmodo.com/5228887/ibm-prepping-soul+crushing-watson-computer-to-compete-on-jeopardy

Very cool. A group of IBM researchers are trying to develop a question-answering system that uses natural language processing. I will be very curious to see how successful they are. I wouldn't be surprised if they're not entirely successful at this.


Home automation
May 2

After doing a little bit of research, it appears that Z-wave technology is making it realistic to remote control many household devices, such as lights, thermostats, garage doors, etc. All that is required is a USB stick for your computer, a C# API, and then Z-wave devices. Even the dimmer switch that we just bought for the baby's room appears to support Z-wave.

It would be fun to be able to control household devices via a voice interface as well as a BlackBerry interface.

Here's one clever application: Tie in my BlackBerry's GPS coordinates to Grace, and then configure it to watch my rate of speed and location. If my rate of speed is approximately that of a bicycle and I'm pedaling down our street towards home on my way home from work, open the garage door.
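
As a rough sketch of that rule in Python: the thresholds (a 150 m radius, 8 to 35 km/h as "bicycle speed") and all names below are assumptions for illustration, and actually opening the door through a Z-wave API is left out entirely.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fix:
    """One GPS reading: position in degrees and speed in km/h."""
    lat: float
    lon: float
    speed_kmh: float


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres, using the haversine formula."""
    earth_radius_m = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def should_open_garage(previous, current, home,
                       radius_m=150.0, bike_speed_kmh=(8.0, 35.0)):
    """True when moving at bicycle speed, close to home, and getting closer."""
    before = distance_m(previous.lat, previous.lon, home[0], home[1])
    now = distance_m(current.lat, current.lon, home[0], home[1])
    slowest, fastest = bike_speed_kmh
    at_bike_speed = slowest <= current.speed_kmh <= fastest
    return at_bike_speed and now <= radius_m and now < before
```

Comparing two consecutive readings is a crude stand-in for "pedaling towards home"; a real version would probably also want the time of day and a check that the door isn't already open.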

A lot of fun could be had with this stuff!


Gizmodo on Wolfram Alpha
May 5

Article

This is the first time I've stumbled across coverage of Wolfram Alpha on one of the main news sites that I read. Of the examples they gave, I'm pretty impressed by Wolfram Alpha's response.

One of the curious aspects to me is the presentation layer, which figures out what information to present/compute for a given thing, and then how exactly to present it. For instance, the graphic that illustrates the great circle path between two cities. I wonder how they've made that work in such a broad way?


Evaluation of Wolfram Alpha
June 1

This is long overdue, but here are some initial tests that I've performed on Wolfram|Alpha. The tests are broken down into six rounds of tests -- I only wrote the round 2 tests after writing and performing the round 1 tests, etc.

Round 1

7/8: PASS: 0.875
7 / 8: PASS: 0.875
what time is it?: FAIL
the current time: PASS: 1:55:29 pm
what is the current date?: FAIL
the current date: FAIL
what is the current temperature in Toronto?: PASS: 15 C
what is the capital city of Canada?: PASS: Ottawa
what is the capital city of Ontario?: FAIL
what is the current temperature in the capital city of Canada?: PASS: Ottawa
convert 0 degrees Celsius to Fahrenheit: PASS: 32 F
0 Celsius = ? Fahrenheit: PASS: 32 F
0 C = ? F: PASS: 32 F
what is President Clinton's first name?: FAIL
what is Bill Clinton's birthday?: PASS: 19-08-1946
how old is Bill Clinton?: PASS: 62 years

SCORE: 11 / 16

Round 2

what is 7 plus 8?: PASS: 15
what is the third prime number?: FAIL: Expected 5
what is the first day of the week?: FAIL: Expected Sunday or Monday
when did World War II end?: FAIL: Expected 02-09-1945
how many miles are in a kilometer?: PASS: 0.6214
how many miles are in a marathon?: PASS: 26.22
how far is Waterloo Ontario from Ottawa?: FAIL

SCORE: 3 / 7

Round 3

how many calories are in an apple?: PASS: 91 calories
how many grams of fat are in a BigMac?: FAIL
how many calories are in 10 apples?: FAIL
how many calories are in a cubic meter of cheese?: FAIL
what is Bill Clinton's first name?: FAIL
how far is it from the moon to Earth?: FAIL

SCORE: 1 / 6

Round 4

how many prime numbers are less than 100?: FAIL
how old is Canada?: FAIL
who was the first Prime Minister of Canada?: FAIL
what is the square root of one hundred forty four?: PASS
what is four fifths times five?: PASS
what is the population of Canada / USA?: PASS

SCORE: 3 / 6

Round 5

x = y^2: PASS
x = y^2 where y = 4: FAIL
how many nautical miles is it from Toronto to Ottawa?: FAIL
how long would it take sound to travel from Toronto to Ottawa?: FAIL
what is the wavelength of red light?: FAIL
is water denser than lead?: FAIL

SCORE: 1 / 6

Round 6

how many vowels are in the word "Test"?: FAIL
how many letters are in the alphabet?: FAIL
what is $300 + 10%?: PASS
how many water atoms are in a liter of water?: FAIL

SCORE: 1 / 4

FINAL SCORE: 20 / 45
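
As a quick check of the arithmetic, the per-round scores above can be tallied in a few lines of Python (the numbers are copied from the rounds; the 90% target is the one discussed below):

```python
import math

# (passed, total) for each round, as listed above.
rounds = {1: (11, 16), 2: (3, 7), 3: (1, 6), 4: (3, 6), 5: (1, 6), 6: (1, 4)}

passed = sum(p for p, _ in rounds.values())
total = sum(t for _, t in rounds.values())
pass_rate = passed / total

# How many of these queries would need to pass to reach 90%?
needed_for_90 = math.ceil(0.9 * total)

print(f"{passed} / {total} = {pass_rate:.1%}")
print(f"passes needed for 90%: {needed_for_90}")
```

That works out to 20 of 45, or about 44%, against the 41 passes that a 90% score would take.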

Overall, I'm quite impressed. As far as I know, this has never been done before.

Still, deep down inside, I think everyone knows that systems can be created to do much better than this. The question is whether Wolfram|Alpha will evolve to become a system that scores 90% on a test like this, or not.

Anyway, for the time being Wolfram|Alpha takes the crown as being the most impressive search system that allows natural language (ish) queries and specific search results.

Well done!


Bing?
June 1

Ok, I have yet to really test Bing, but since I've heard a couple places claim that Bing has some similarities to Wolfram|Alpha, I thought I'd repeat my first round of Wolfram|Alpha tests on Bing. Wolfram|Alpha scored 11/16 on these tests... so how did Bing do?

7/8: PASS: 0.875
7 / 8: PASS: 0.875
what time is it?: FAIL
the current time: FAIL
what is the current date?: FAIL
the current date: FAIL
what is the current temperature in Toronto?: FAIL
what is the capital city of Canada?: PASS: Ottawa
what is the capital city of Ontario?: PASS: Toronto
what is the current temperature in the capital city of Canada?: FAIL
convert 0 degrees Celsius to Fahrenheit: FAIL
0 Celsius = ? Fahrenheit: FAIL
0 C = ? F: FAIL
what is President Clinton's first name?: FAIL
what is Bill Clinton's birthday?: FAIL
how old is Bill Clinton?: FAIL

SCORE: 4 / 16

Ouch. So Bing fails utterly in terms of providing specific answers, which I thought was supposed to be a big part of what it was about. (Given Microsoft's acquisition of Powerset and other things I've heard)


Nutrition
June 30

I've been working on some enhancements to my Grace project that allow for queries such as "how many calories are in 1 Tbsp peanut butter?", or "how many calories are in {recipe}"? (And likewise for other nutrients such as fat, saturated fat, trans fat, carbs, fibre, sugar, protein, vitamin A, vitamin C, etc.) It also allows you to say "I had 3/4 cup raisin bran", etc., and then later on to ask "what have I eaten today?", or "how many calories have I eaten today?". (And likewise for the other nutrients)

As I've been collecting nutritional information for various foods, I'm reminded of how thankful I am for laws that have been passed to promote consistent labeling. (Thank you government!)

One complication is that I'd like the actual number of g/mg/ug for the various vitamins and minerals, but the nutrition labels give a "% Daily Value". I did a Google search for the Government of Canada web page, but I was unsure whether these percentage values were based on the recent RDA/AI (recommended dietary allowance /adequate intake) numbers.

I ended up sending an email to the link on the "Contact Us" page and within 7 minutes I got an email reply back with a bunch of helpful information and a link that answered my precise question. Wow! I'm impressed.
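
Once the reference amounts are known, converting a label's "% Daily Value" into an absolute amount is a one-line calculation. In the Python sketch below, the reference amounts are placeholders for illustration only and would need to be replaced with the official figures; the function name is also made up.

```python
# Placeholder reference amounts in milligrams. These are NOT official
# figures; substitute the values published by the government.
DAILY_VALUE_MG = {
    "calcium": 1100.0,
    "iron": 14.0,
    "vitamin C": 60.0,
}


def amount_mg(nutrient, percent_dv):
    """Convert a label's % Daily Value into milligrams of that nutrient."""
    return DAILY_VALUE_MG[nutrient] * percent_dv / 100.0
```

With these placeholder numbers, a serving labelled "30% DV" of calcium would come to 330 mg.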


Nutrition graph
July 8

Here's my nutrition graph for July 8 2009:


This is the output of my Grace project.


Grocery shopping
July 21

Today was the first day I had been in a grocery store since spending some time working on some nutrition software.

To give some background, I have written a subroutine so that you can ask "recommend a food", and the software will consider what foods will maximize your nutrition graph for today -- that is, to minimize the amount of vitamins, minerals, and nutrients still required for the current day, given what you've already eaten.

I have noticed that this subroutine often recommends foods such as:

GoLean cereal
Raisin Bran cereal
milk
orange juice
cranberry juice
bananas
pears
yogourt
chicken breast
...

And so, as I walked along I had it in my head to pick up some GoLean, Raisin Bran, OJ, bananas, pears, etc.

In addition to picking these items up, I was more interested in comparing the nutrition labels on various kinds of bread, as well as orange juice. I chose the bread that minimized sodium while maximizing omega-3... and I picked the orange juice with 30% DV of calcium; I got the PC variety rather than the Tropicana variety since it was $2.99 VS $4.50.

A real breakthrough was finding Miss Vickie's unsalted chips. They're just potatoes and sunflower oil, and they taste great! I can't believe it: A potato chip with 0 sodium!

I have also started entering costs into the system so that, while maximizing nutrition, it will also minimize cost. Fun stuff.
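
One simple way a "recommend a food" routine like the one described above could work is to try each candidate food and pick the one that leaves the smallest remaining shortfall for the day. The Python sketch below is an assumption about the approach, not the actual Grace subroutine, and it ignores cost; the function names and data layout are invented for the example.

```python
def shortfall(targets, eaten):
    """Sum of the unmet fraction of each daily target (0 means all are met).

    targets and eaten map nutrient name -> amount; targets must be positive.
    """
    return sum(max(0.0, target - eaten.get(nutrient, 0.0)) / target
               for nutrient, target in targets.items())


def recommend_food(foods, targets, eaten):
    """Return the food whose single serving reduces today's shortfall the most.

    foods maps food name -> {nutrient: amount per serving}.
    """
    def shortfall_after(serving):
        combined = dict(eaten)
        for nutrient, amount in serving.items():
            combined[nutrient] = combined.get(nutrient, 0.0) + amount
        return shortfall(targets, combined)

    return min(foods, key=lambda name: shortfall_after(foods[name]))
```

Measuring each nutrient as a fraction of its target keeps a nutrient counted in milligrams from swamping one counted in grams. Cost could be folded in by, say, dividing the improvement by the price per serving.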


Having fun with food
August 30

Over the last couple of months I've implemented some more features for my Grace project. These features are focused around food, which was the original dream for the project, so it has been nice to finally get around to that.

At first, I focused on expanding my database of foods and adding their nutritional information to the program. Next, I added the ability to tell the computer what you have eaten and have it draw a nice graph showing you what essential nutrients you are missing for the current day. Using the nutrition tracking has been an interesting exercise, and has confirmed my suspicion that I consume upwards of 5000 mg of sodium each day, twice the recommended amount. It has also been very helpful when certain other members of the family want to keep track of what they're eating without always having to look up points values.

With a good food database in place, I was able to move on to more interesting things. The first step was to use the food database to create a recipe database, and thus to be able to calculate the nutritional information for any of our recipes. The second step was to allow the program to keep an inventory of all of the foods in our kitchen by saying, for example, "add milk to the fridge", or "add frozen strawberries to the freezer".

The immediate reason for keeping an inventory was that we have so many cans in the cupboard (and things in our freezer) that it becomes very frustrating if you want to figure out whether you have a certain ingredient -- you end up pulling half the cans out of the cupboard searching. Now, you can simply ask "do we have any refried beans?", and if you need to narrow down the location, you could ask "where are the refried beans?".

Although the inventory is helpful, the real money is being able to answer the question "what recipes can I make?". For example, we were on our way back from church today, and as is usually the case, both of us were famished and didn't have much energy to figure out what we could cobble together for lunch. We figured we'd just grab some food on our way home, but instead I typed "what recipes can we make?" into my BlackBerry and a list of several options came back, such as Rotini and Fresh Tomato, Basil, and Parmesan. That sounded pretty good, so we drove home instead and had a nice home-cooked meal.
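
At its core, "what recipes can we make?" is a subset check: a recipe is possible when every ingredient is somewhere in the inventory. Here is a minimal Python sketch of the inventory and that query; the class and method names are illustrative assumptions, and quantities are ignored.

```python
class Kitchen:
    """Tracks which foods are stored in which location."""

    def __init__(self):
        self.locations = {}  # location name -> set of foods

    def add(self, food, location):
        """Handle e.g. "add milk to the fridge"."""
        self.locations.setdefault(location, set()).add(food)

    def has(self, food):
        """Handle e.g. "do we have any refried beans?"."""
        return any(food in foods for foods in self.locations.values())

    def where(self, food):
        """Handle e.g. "where are the refried beans?"."""
        return sorted(loc for loc, foods in self.locations.items() if food in foods)

    def makeable(self, recipes):
        """Return the recipes whose ingredients are all in stock.

        recipes maps recipe name -> set of ingredient names.
        """
        stock = set().union(*self.locations.values())
        return sorted(name for name, ingredients in recipes.items()
                      if ingredients <= stock)
```

A fuller version would track amounts as well, so that using the last of the milk takes the recipes that need it off the list.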


Starting at Wolfram Research
September 7

Here I am in Boston, on Labour Day, and tomorrow I start working for Wolfram Research. When I first read about Wolfram Alpha last spring, I never imagined that something like this would have worked out. What a neat opportunity!

Speaking about opportunity, my team has its work cut out for it. As I've read what the press has had to say about W|A this summer, one thing is clear: The part of the system that most limits its ability to answer people's questions is its ability to understand user input. Much of the time, the system has the data and algorithms it needs to answer a question, but is unable to map a question to those data and algorithms.

And of course, this isn't surprising, since it is really the language parsing layer of W|A that is the most difficult. Ever since computers were invented, people have dreamed about creating a computer that could understand questions -- and after several decades of trying, people became disillusioned with this vision. As of 2009, there are still decades of work to be done. It is a very difficult problem!

In a recent report written by Stephen Wolfram, it is stated that the team has been able to reduce the "fall through" rate by 10%, which is fantastic progress for only one summer of work! Hopefully this trend continues, and I look forward to helping make that happen.

Today we did a "duck tour" of Boston, which involved an amphibious vehicle... first driving around, and then boating down the Charles River. I even got to drive the boat, and let it be known that I was the only person out of about a dozen who could keep the boat going straight! What a hero.

We also took a tour of Fenway Park. One of the tour guides said that we were the first group in about six months to be allowed onto the field. And when I say "field", I mean the warning track. They wouldn't let us on the grass. Think he was telling the truth, or just trying to make us feel special?!

Eli has been fantastic for the most part, but he was pretty upset when we made it back to the hotel this afternoon... apparently, disrupting a baby's nap schedule is a recipe for one very angry baby! Poor guy.

It will be fun getting up tomorrow, navigating the subways, and then arriving to meet the folks of Wolfram Cambridge!


Wowza
October 19

Impressive? Very. Scary? Yes.

http://www.wired.com/magazine/2009/10/mf_optigenetics


The prez made a funny
November 24

http://gizmodo.com/5411752/obama-to-robots-im-watching-you

lol, I love it.


Defining intelligence
December 16

Thought of the day:

Intelligence is the ability to form high level concepts from low level observations, and then to be able to relate those high level concepts to new low level observations to solve problems.


Good Job Ray
February 10

Ray Kurzweil has some pretty elaborate and seemingly unlikely theories about the future, but reading the following was a wonderful thing:

"I get very excited about discussions about the true nature of consciousness, because I've been thinking about this issue for literally 50 years, going back to junior high school. And it's a very difficult subject. When some article purports to present the neurological basis of consciousness... I read it. And the articles usually start out, “Well, we think that consciousness is caused by...” You know, fill in the blank. And then it goes on with a big extensive examination of that phenomenon. And at the end of the article, I inevitably find myself thinking... where is the link to consciousness? Where is any justification for believing that this phenomenon should cause consciousness? Why would it cause consciousness?"

It's very rare that I read someone's thoughts about consciousness and think "Wow, this person really gets it, they're thinking about it in a very similar way to how I think about it".

But I have to say, this was very encouraging.