The notes from last semester are available in chronological order at my backup site. Click on the category link on the right-hand side to view notes from that class only. Saves you from having to read bottom-up…
The notes from last semester are available in chronological order at my backup site. Click on the category link on the right-hand side to view notes from that class only. Saves you from having to read bottom-up…
Posted on January 10, 2006 in Computer Architecture, Computer Forensics I, Programming Languages | Permalink | Comments (2) | TrackBack (0)
Things to be familiar with for the exam:
Posted on December 02, 2005 in Programming Languages | Permalink | Comments (1) | TrackBack (0)
In determining the "goodness" of a programming language, you must find a good balance of what you're looking for — things like readability, writability, reliability, etc.
Syntax issues arise, such as, "How am I going to represent a collection or list — commas, spaces, square brackets, curly braces?"
A lot of things will affect the cost of a language. If you know some Java, it's probably going to be easier to learn C than if you had come from Fortran (you're familiar with int, float, curly braces, etc). You spend less time becoming productive and more time being productive. Because many of our current languages are modeled after existing languages, so the history of programming languages is important.
The movement from machine language to any kind of symbolic (e.g. assembly) language was a big step forward. That jump added readability and writability because it was hard to read and write machine language. The assembly languages did not, however, move toward thinking link a human. You had to think like the architecture (Von Neumann).
Fortran proved that a high level language could be devised that could adjust to the programmer and be useful. There were a lot of constraints on the languages, and their developers didn't make the best choices with array dimensions, statements to include in the langauge, words to represent things, symbols that could be used in more than one way, etc.
One way to improve a language is to find out what the language does to allow programmers to make common errors. New languages can make errors easy to catch and correct or just disallow the error altogether.
Pointers were seen as a step backwards, but they had the ability to deal with dynamic data structures. Pointers made things less readable and more writable. We understand that pointers are error-prone, so languages like Java create dynamic data structures and have all the error-prone pointer functions running in the background.
Support was added to Turbo Pascal that enabled us to deal with objects and instances of objects. We had a sorted dynamic list notion; initial lists were declared to have an initial size and a grow factor. It grows and cleans up after itself and returns to the function when the list needs to grow. Some of the burdens on C programmers have been lifted in languages like Java and Turbo Pascal.
Posted on November 30, 2005 in Programming Languages | Permalink | Comments (0) | TrackBack (0)
When we type a number like
const x = 6143; const y = 0xa2b3;
The compiler sees a string of ASCII characters. We are thinking x is storing the number 6143 or hexadecimal a2b3 (decimal 41651). So if we are looking at a hex number…
<hex_num> ::= <hex_digit> | <hex_num> <hex_digit> <hex_digit> ::= '0'|'1'|'2'|…|'a'|'b'|'c'|'d'|'e'|'f'|'A'|'B'|…|'F'
Now we need an evaluation function (a meaning).
V('0') = 0
V('1') = 1
.
.
V('9') = 9
V('a') = 10
.
.
V('f') = 16
V('A') = 10
.
.
V('F') = 16
We also need "the value of the left-hand side is the value of the right-hand side under those circumstances." Looking at the symantics…
V(<hex_digit_x>) = V('x')
Then we have to deal with the first rule above…
for <hex_num> ::= <hex_digit>
V(<hex_num>) = V(<hex_digit>)
value of l.h.s. = val of r.h.s.
<hex_num>[1] ::= <hex_num>[2] <hex_digit>
V(<hex_num>[1]) = V(<hex_num>[2]) * 16 + V(<hex_digit>)
value l.h.s. is 16 * val of r.h.s._hex_num + val of hex_digit
<hex_num> ::= <hex_num> <hex_digit>
::= <hex_num> 3
::= <hex_num> <hex_digit> 3
::= <hex_num> b3
::= <hex_num> <hex_digit> b3
::= <hex_num> 2b3
::= <hex_digit> 2b3
::= a2b3
We touched on pre- and post- conditions in class (i.e. a=6 needs a precondition that a is an int and a post-condition that a is equal to 6).
Exceptions are unexpected error conditions (anything unexpected will be an exception that you'll have to deal with). For example, you expect someone to put in a number and they type "twelve" (because they are just not smart). Divide by zero is a type of exception… Range errors (index out of range) lead to buffer overflow, DoS attacks, etc.
C++ was one of the first language to deal with exception handlers. In C++ you throw an exception and in ADA you "raise" an exception. The difference is that C++ didn't use the word "raise" because they had a raise function in another library (they didn't want conflict between the names).
When you start talking about exception handling being built into the language? Are you going to have default exceptions such as divide-by-zero? If you are, you run into a slight problem: you might want to handle them differently in different places. The idea of a zero divide error being an exception, you'd have to say "on zero divide" throw an exception function. You might want to terminate the program gracefully by closing open files and exiting instead of a blue screen of death. In another spot, if you have a count of zero on a list average, you might just want to say "there's no information in the file" and you want to continue processing. You shouldn't have to close the program because you tried to perform a function on a non-existant file.
We also discussed event handlers and where to return to after the handler has done its job. Given that the event could have occurred in a variety of places, how do you call an exception handler?
Read Chapter 14!
Posted on November 28, 2005 in Programming Languages | Permalink | Comments (0) | TrackBack (0)
These are the notes exactly from class, but there are lots of [sic] errors in them. Consult the book for this section instead!!
subprog::-<ret_type><name>[1](<arg_list>) begin <body> end <name>[2]
<name>[1].string = <name>[2].string
S ← nil
S ← <num>,S
<num> ← 0|1|2|3|4|5|6|7|8|9
The deviations from average (4) are -1, 0, and +1. To get the deviations, you need to get the average. For the average, you need the total and the count. This grammar can generate a list that is arbitrarily long. To get the sum and the count, you would have to traverse the list. You might traverse the list up to three times, but you should do something like counting and totaling at the same time and pass through the list only once. You can't use the grammar to make that happen.
To keep track of the root, add a rule to your grammar:
Root ← S
The sum is going to come from the values of the children. Nil should get a value of zero and 3, 4, and 5 will have their own respective values. To add, you can start at nil and move up the parse tree. Once the total and count arrive at the root, you can generate the average and send it back down to generate the deviations. To do this we need additional rules. Note that with the following you could generate a divide by zero error.
L.H.S. count = R.H.S. count
L.H.S. sum = R.H.S. sum
(synthesized - get info for parent from child)
L.H.S. average = L.H.S. sum / L.H.S. count
R.H.S. average = L.H.S. average (pass info from parent to child)
S ← nil
L.H.S. count = 0
L.H.S. sum = 0
S ← x,S
L.H.S. count = 1 + R.H.S. count
L.H.S. sum = x + R.H.S. sum
R.H.S. average = L.H.S. average
To avoid the divide by zero error, you could change the rule to
Root ← <num> | <num>,S
…but of course that would add some cases to the rules above.
R count = 3, sum = 12, avg = 4 ←
|
S count = 1+2 = 3, sum = 3 + 9 = 12, avg = ? ← 4
/|\
3 , S count = 1+1 = 2, sum = 4+5 = 9, avg = ? ← 4
/|\
4 , S count = 1+0 = 1, sum = 4, avg = ? ← 4
/|\
5 , S count = 0, sum = 0, avg = ? ← 4
|
nil
Posted on November 21, 2005 in Programming Languages | Permalink | Comments (0) | TrackBack (0)
It's probably a little late for this, but here are some LISP notes sent to me by one of the students in our class. They're much more in depth than what I have on this site and may be helpful for the next LISP assignment and final exam.
Posted on November 19, 2005 in Programming Languages | Permalink | Comments (0) | TrackBack (0)
A grammar is a metalanguage that can describe a programming language. From Wikipedia:
Formal grammars fall into two main categories: generative and analytic.
In short, an analytic grammar describes how to read a language, whereas a generative grammar describes how to write it. For example,
S S::=if <cond> then <stmt>
S is the "start" symbol. The "if" and the "then" are terminal symbols. "Cond" and "stmt" are additional non-terminals (yet to be defined). A rule will be made up of something that looks like a left-hand side, a replacement operator, and a right-hand-side. The right-hand side will be made up of terminals and non-terminals. The above rules describe a context-free grammar.
S
S::=if <bool_exp> then <stmt>
<bool_exp> ::= bool_var and <bool_exp> |
bool_var or <bool_exp> |
bool_var
"And" and "or" usually have special symbols depending on the language it describes, so I've left the words here for this general description.
For the most part, a generational description (like a grammar) is a more effective way of teaching a programmer than, say, learning a language from a compiler. However, a programmer doesn't go back to the grammar to check if something is valid code; they throw it at the compiler and see what it returns.
Subprog ::= <ret_type> <name> ( <arg_list> )
begin
<stmt_list>
end <name>
With non-terminals, you cannot force them to be the same thing in two different places. So, <name> won't necessarily mean the same thing on line 1 as it does on line 4 above.
S → aSa | ab S → ab
S → aSa → aabb = a2b2
S → aSb → aaSbb → aaabbb = a3b3
≡ {anbn | n >= 1 }
There's a small section in Chapter 3 on Attribute Grammars, which allow you to add additional information in. You have a rule for the grammar, and in addition to the rule, you have some attribute functions. So you would have something like
Subprog ::= <ret_type> <name>[1]( <arg_list> )
begin
<stmt_list>
end <name>[1]
And then you would have an attribute rule in addition to the form rule with some additional information like
Rule: <name>[1] = <name>[2]
With this, you can force the <name>s to be the same. So with a C compiler,
x = y + z;
You can check whether y and z may be added together. If y or z are a float and x is an int, the language will know to reject the statement because of a type mismatch.
<assign_stmt> ::= <var> = <exp> <var> ::= x|y|z <exp> ::= <var> | <var> | <var>
The tree for this would look something like this (forgive me for not making it pretty for you):
<assign_stmt>
/ | \
<var> = <exp>
| / | \
x <var> + <var>
| |
y z
Readability and writability were also important in describing languages. Grammars made it more writable to describe recursion, etc.
Posted on November 16, 2005 in Programming Languages | Permalink | Comments (0) | TrackBack (0)
When designing a language you need to decide on data types and operators (quotient, remainder, power, sum, difference, power), and how each data type and operator will be implemented. Also you have to decide whether you'll use hybrid operators. "Closed" operators are something like +, -, etc. Hybrid operators might be comparisons where the inputs are different from their outputs (something like less than).
The number of bits and the storage mechanism you choose will create a range of values that variables of a certain data type can assume. Most languages have chosen to use a standard ASCII character for the character data type, but we may start to see 16- or 32-bit unicode characters (to support a Chinese character set, etc.).
The character data type gives rise to one of the earlier structured data types, a string. A string is a sequence of characters, but you don't have to directly include a string data type because they included something more general that would do the job, an array. For all practical purposes, a sequence of characters can be thought of as an array of characters that you know the length of (or some other attribute). The length is implemented by either storing the length with the array or by marking the end of a string with a null terminating character.
PASCAL borrowed a C array of bytes and chose to interpret the 0 byte as the length (a short unsigned integer). That gave them up to a maximum of 255. Bytes #1-255 were interpreted as ASCII characters, not integers. So, the first byte of a string is a short unsigned integer and the last 255 bytes are characters. Every operation on the string must update the length byte, and there can't be a string longer than 255 characters. You may also specify the length so you don't waste bytes on unused characters.
VAR Y: STRING[4];
will give you a string of 4 bytes and a length byte, where C would give you bytes 0-3 (where 3 must be the null-terminating character).
In C, you must remember to allocate the space for the null terminating character. Since arrays in Modula-2 know how big they are, you don't have to mark the end of a string with a null-terminating character unless it's not full. For four-letter words in Modula-2, you would just declare an array of 4 characters and you'd use all of the space allocated for the string.
The notion of a linked list of characters is not very effective, although it might be a nice way of manipulating dynamic strings. The pointers are too big (~4 bytes) so it's just not space effective. There is an alternate model, whereby the language can allocate a string space. The string space is a huge array of characters, which we'll call SS (string space). We can think of this as global data like a heap, which holds any strings we want to put in there.
x:string
In the above, X has a start and a length. So if you had a start at 3 and a length of 4, you might have something like this in the string space:
SS ----- | | 0 ----- | | 1 ----- | | 2 ----- | L | 3 ----- | O | 4 ----- | V | 5 ----- | E | 6 ----- | | 7 ----- | | 8 -----
Over time, the string space gets fragmented. You have to decide whether new strings get added with a best-fit or first-fit algorithm, or by occasional garbage collection. If you change the model so that strings are lists of pairs instead of just one pair, the string can be fragmented where pairs are little linked-lists.
Consult this page of notes for information on storing unions and structures.
Some of the issues we talked about with expressions are parenthetical (operator precedence, associativity, overriding, etc). In APL, everything has the same precedence with right-to-left associativity, which is really unusual. Without operator precedence and/or associativity, we have ambiguity, and different compilers might then produce different results, making the code where it isn't portable.
Subprograms introduce parameter passing (in, out, in-out, pass-by-reference, pass-by-value, copy result). Copy result/in-out has the same long-term consequence of pass-by-reference but the same drawback as pass-by-value. Copy result, then, is kind of a different way of implementing in-out parameter passing.
We also talked about overloading names, local referencing environments…
Posted on November 09, 2005 in Programming Languages | Permalink | Comments (0) | TrackBack (0)
In the LISP environment, it's common to use one subprogram as input for another subprogram.
f(g(x))
It's also possible in C with pointers
f( &g(x) )
In the LISP model, f acts on the return value of g. Suppose you set f up so that its argument is of type int and you call it with a function that returns a float. C can't put a float into an int, but C doesn't know that because it doesn't type-check arguments.
In C, all functions are defined at the file level. In addition, you can define variables inside functions (but you can't define functions inside functions). In PASCAL, you can define functions inside of functions.
prog
fA
fB
fC
fD
fE
fH
In the above code, fA can call fB, fC, and fD but not fE. Only fD can call fE. The rule is that a function can see its children and its siblings, but not beyond its children. So they can see at most one layer down, and their siblings.
Suppose fA has a variable x and fE uses x. fE can use x! As long as fD doesn't have an x and fB and fC don't, it keeps going up the hierarchy until it finds a match.
At runtime, each function needs to know the location of its static parents in addition to the function call.
Data encapsulation and data hiding have become important issues, particularly in Java. You have an object which encapsulates data and processes. You use the object by sending a message and having it return a value back.
Depending on your grammar, a language might evaluate expressions left-to-right, right-to-left, or according to operator precedence.
E ← E + F | E - F | F
F ← F * G | F / G | G
G ← (E) | id
The above grammar would represent "normal" order of operations for multiplication, addition, subtraction, parentheses, and division. C has many more layers of precedence than this.
Posted on November 07, 2005 in Programming Languages | Permalink | Comments (0) | TrackBack (0)
Macros saved time in the early days and even now because it makes for less to read and less to have to copy/paste. Macros are an expansion model. In the early days of computing, it saved punch cards and now it makes programs more readable because you can say something like, for example,
output = sort( input );
instead of repeating the sort() code over and over every time you need to sort. Implementing the macro was trivial because it was nothing more than taking a two-pass assembler and adding a third pass for preprocessing. Now, in a three-pass assembler, you expand macros first, build the symbol tables second, and finally generate the program. We're naming things (abstraction) and moving toward reusability. Initial macros didn't take parameters so they used inline expansion. That was the subprogram implementation for COBOL. By the time we moved onto that higher level language, they were still doing an expansion insertion of code. This doesn't support recursion (you can't just continue expanding because the expansion process usually won't realize that it's only a finite expansion).
When you do a #include in the C language, it's similar to a macro because there is an expansion step in preprocessing that goes out and expands the header files and that is what gets compiled. #define is another preprocessor directive that is substituted everywhere it is used before the code is sent to the compiler.
Formal arguments occur when you define parameters. Actual arguments occur when the function is called. It is in a sense a binding because the actual argument is bound to the formal argument's location. Call-by-value binds the value to the location and pass-by-reference binds the address to the location.
Posted on November 02, 2005 in Programming Languages | Permalink | Comments (0) | TrackBack (0)
| Sun | Mon | Tue | Wed | Thu | Fri | Sat |
|---|---|---|---|---|---|---|
| 1 | 2 | |||||
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 10 | 11 | 12 | 13 | 14 | 15 | 16 |
| 17 | 18 | 19 | 20 | 21 | 22 | 23 |
| 24 | 25 | 26 | 27 | 28 | 29 | 30 |
| 31 |