Generic programming


In computer science, generics is a technique that allows one value to take different datatypes (a form of polymorphism), as long as certain contracts, such as subtype relationships and signatures, are kept. The programming style that emphasizes this technique is called generic programming.

For example, to create a list using generics, one possible declaration is List<T>, where T represents the element type. When instantiating it, one could create List<Integer> or List<String>. The list is then treated as a list of whichever type is specified.

Generic facilities first appeared in the 1970s in such languages as CLU and Ada, and were subsequently adopted by many object-based and object-oriented languages, including Eiffel, DEC's now defunct Trellis-Owl language, BETA, C++ and D. The template construct of C++ is widely cited as the generic programming construct that popularized the notion among programmers and language designers. Java has provided generic programming facilities since the introduction of J2SE 5.0. C# 2.0 and Visual Basic .NET 2005 have constructs that take advantage of the support for generics present in the Microsoft .NET Framework since version 2.0.

The ML family of programming languages encourages generic programming through parametric polymorphism and generic modules called functors. The type class mechanism of Haskell also supports generic programming.

Dynamic typing, as featured in Objective-C, together with judicious use of protocols where necessary, removes the need for generic programming techniques, since a general type exists that can contain any object. Java also has such a general type, but the casts it requires break the discipline of static typing. Generics are one way of achieving some of the benefits of dynamic typing while keeping the advantages of static typing.

Generics in Ada

Ada has had generics since it was first designed in 1977-1980. The standard library uses generics to provide many services. Ada 2005 adds a comprehensive generic container library to the standard library, inspired by C++'s Standard Template Library.

A generic unit is a package or a subprogram that takes one or more generic formal parameters.

A generic formal parameter is a value, a variable, a constant, a type, a subprogram, or even an instance of another, designated, generic unit. For generic formal types, the syntax distinguishes between discrete, floating-point, fixed-point, access (pointer) types, etc. Some formal parameters can have default values.

To instantiate a generic unit, the programmer passes actual parameters for each formal. The generic instance then behaves just like any other unit. It is possible to instantiate generic units at run-time, for example inside a loop.

Example

The specification of a generic package:

generic
   Max_Size : Natural; -- a generic formal value
   type Element_Type is private; -- a generic formal type; accepts any nonlimited type
package Stacks is
   type Size_Type is range 0 .. Max_Size;
   type Stack is limited private;
   procedure Create (S : out Stack;
                     Initial_Size : in Size_Type := Max_Size);
   procedure Push (Into : in out Stack; Element : in Element_Type);
   procedure Pop (From : in out Stack; Element : out Element_Type);
   Overflow : exception;
   Underflow : exception;
private
   subtype Index_Type is Size_Type range 1 .. Max_Size;
   type Vector is array (Index_Type range <>) of Element_Type;
   type Stack (Allocated_Size : Size_Type := 0) is record
      Top : Index_Type;
      Storage : Vector (1 .. Allocated_Size);
   end record;
end Stacks;

Instantiating the generic package:

type Bookmark_Type is new Natural;
-- records a location in the text document we are editing

package Bookmark_Stacks is new Stacks (Max_Size => 20,
                                       Element_Type => Bookmark_Type);
-- Allows the user to jump between recorded locations in a document

Using an instance of a generic package:

type Document_Type is record
   Contents : Ada.Strings.Unbounded.Unbounded_String;
   Bookmarks : Bookmark_Stacks.Stack;
end record;

procedure Edit (Document_Name : in String) is
   Document : Document_Type;
begin
   -- Initialise the stack of bookmarks:
   Bookmark_Stacks.Create (S => Document.Bookmarks, Initial_Size => 10);
   -- Now, open the file Document_Name and read it in...
end Edit;

Advantages and limitations

The language syntax allows very precise specification of constraints on generic formal parameters. For example, it is possible to specify that a generic formal type will only accept a modular type as the actual. It is also possible to express constraints between generic formal parameters; for example:

generic
   type Index_Type is (<>); -- must be a discrete type
   type Element_Type is private; -- can be any nonlimited type
   type Array_Type is array (Index_Type range <>) of Element_Type;

In this example, Array_Type is constrained by both Index_Type and Element_Type. When instantiating the unit, the programmer must pass an actual array type that satisfies these constraints.

The disadvantage of this fine-grained control is a complicated syntax, but, because all generic formal parameters are completely defined in the specification, the compiler can instantiate generics without looking at the body of the generic.

Unlike C++, Ada does not allow specialised generic instances, and requires that all generics be instantiated explicitly.

Templates in C++

Templates are of great utility to programmers in C++, especially when combined with multiple inheritance and operator overloading. The C++ Standard Template Library (STL) provides many useful functions within a framework of connected templates.

As the templates in C++ are very expressive they may be used for things other than generic programming. One such use is called template metaprogramming, which is a way of pre-evaluating some of the code at compile-time rather than run-time. Further discussion here only relates to templates as a method of generic programming.

Technical overview

There are two kinds of templates: function templates and class templates. A function template behaves like an ordinary function that can accept arguments of many different, possibly unrelated types. For example, the C++ Standard Template Library contains the function template max(x, y), which returns either x or y, whichever is larger. max() could be defined like this:

template <typename T>
T max(T x, T y) { return x < y ? y : x; }

This template can be called just like a function:

cout << max(3, 7);   // outputs 7
The compiler determines by examining the arguments that this is a call to max(int, int) and instantiates a version of the function where the type T is int.

This works whether the arguments x and y are integers, strings, or any other type for which it makes sense to say "x < y". There does not need to be any common inheritance for the set of types that can be used, and so it is actually a form of static duck typing. If a program defines a custom data type, all it needs to do is to use operator overloading to define the meaning of < for that type, thus allowing it to be used by the max() function. While this may seem a minor benefit in this isolated example, in the context of a comprehensive library like the STL it allows the programmer to get extensive functionality for a new data type, just by defining a few operators for it. Merely defining < allows a type to be used with the standard sort(), stable_sort(), and binary_search() algorithms; or put inside data structures such as sets, heaps, and associative arrays; and more.
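As a minimal sketch of this point, the following defines < for a user-defined type and then reuses generic code with it. The type Version, its fields, and the helpers larger and sorted_copy are hypothetical names chosen for illustration; larger has the same shape as the max() template above but is renamed to avoid clashing with std::max.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical user-defined type; the name and fields are illustrative only.
struct Version {
    int major_part;
    int minor_part;
};

// Defining < is all that the generic code below requires of Version.
bool operator<(const Version& a, const Version& b) {
    if (a.major_part != b.major_part) return a.major_part < b.major_part;
    return a.minor_part < b.minor_part;
}

// Same shape as the max() template discussed above.
template <typename T>
T larger(T x, T y) { return x < y ? y : x; }

// Returns a sorted copy using the standard sort(); works for any T with <.
template <typename T>
std::vector<T> sorted_copy(std::vector<T> values) {
    std::sort(values.begin(), values.end());
    return values;
}
```

With these definitions, larger and sorted_copy accept int, std::string, and Version values alike, although the three types share no base class.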

C++ templates are completely type safe at compile time. As a demonstration, the standard type complex does not define the < operator, because there is no strict order on complex numbers. Therefore max(x, y) will fail with a compile error if x and y are complex values. Likewise, other templates that rely on < cannot be applied to complex data. Unfortunately, compilers historically generate somewhat esoteric and unhelpful error messages for this sort of error. Ensuring that a certain object adheres to a method protocol can alleviate this issue.
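One possible way to make such errors more readable is to check the requirement explicitly before using it. The sketch below is only an illustration of that idea, not a standard facility: make_void, has_less, and checked_max are hypothetical names, and the technique assumes a compiler supporting C++11 or later.

```cpp
#include <complex>
#include <type_traits>
#include <utility>

// Maps any well-formed list of types to void (a C++11-compatible
// stand-in for std::void_t).
template <typename... Ts>
struct make_void { typedef void type; };

// Primary template: assume T has no usable operator<.
template <typename T, typename = void>
struct has_less : std::false_type {};

// Selected only when the expression `a < b` compiles for two const T values.
template <typename T>
struct has_less<T, typename make_void<
        decltype(std::declval<const T&>() < std::declval<const T&>())>::type>
    : std::true_type {};

// Variant of max() that states its requirement up front, so that a
// missing operator< is reported with a short, explicit message.
template <typename T>
T checked_max(T x, T y) {
    static_assert(has_less<T>::value, "checked_max requires operator< for T");
    return x < y ? y : x;
}
```

Here has_less<int>::value is true while has_less<std::complex<double> >::value is false, so calling checked_max on complex values is rejected at compile time with the message given in the static_assert.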

The second kind of template, a class template, extends the same concept to classes. Class templates are often used to make generic containers. For example, the STL has a linked list container. To make a linked list of integers, one writes list<int>. A list of strings is denoted list<string>. A list has a set of standard functions associated with it, which work no matter what is put between the brackets.
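The following sketch illustrates both sides of a class template: using the standard list with different element types, and writing a small container of one's own. count_items and Stack are hypothetical names introduced for this example; Stack is not part of the STL.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Works for a std::list of any element type: the member functions are
// the same no matter what appears between the angle brackets.
template <typename T>
std::size_t count_items(const std::list<T>& items) {
    return items.size();
}

// A hypothetical user-written class template.
template <typename T>
class Stack {
public:
    void push(const T& value) { items_.push_back(value); }

    // Removes and returns the most recently pushed element.
    // Precondition: the stack is not empty.
    T pop() {
        T top = items_.back();
        items_.pop_back();
        return top;
    }

    bool empty() const { return items_.empty(); }

private:
    std::vector<T> items_;
};
```

Stack<int> and Stack<std::string> are distinct types generated from the same definition, just as list<int> and list<string> are.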

Template specialization

A powerful feature of C++'s templates is template specialization. This allows alternative implementations to be provided based upon certain characteristics of the parameterized type that is being instantiated. Template specialization has two purposes: to allow certain forms of optimization, and to help reduce code bloat.

For example, consider a sort() template function. One of the primary activities that such a function does is to swap or exchange the values in two of the container's positions. If the values are large (in terms of the number of bytes it takes to store each of them), then it is often quicker to first build a separate list of pointers to the objects, sort those pointers, and then build the final sorted sequence. If the values are quite small though it is usually fastest to just swap the values in-place as needed. Furthermore if the parameterized type is already of some pointer-type, then there is no need to build a separate pointer array. Template specialization allows the template creator to write several different implementations and to specify the characteristics that the parameterized type(s) must have for each implementation to be used.
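The sketch below shows only the mechanism, not the full sorting strategy described above: a primary template handles ordinary types, and a partial specialization is selected automatically when the parameterized type is a pointer. ByValue and sort_by_value are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <vector>

// Primary template: order elements by comparing the values directly.
template <typename T>
struct ByValue {
    bool operator()(const T& a, const T& b) const { return a < b; }
};

// Partial specialization, chosen when the element type is already a
// pointer: compare the objects pointed to instead of the addresses.
template <typename T>
struct ByValue<T*> {
    bool operator()(const T* a, const T* b) const { return *a < *b; }
};

// The caller never names the specialization; the compiler picks the
// matching implementation from the element type.
template <typename T>
void sort_by_value(std::vector<T>& items) {
    std::sort(items.begin(), items.end(), ByValue<T>());
}
```

Calling sort_by_value on a std::vector<int> uses the primary template, while calling it on a std::vector<int*> uses the pointer specialization and moves only the pointers.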

Advantages and disadvantages

Some uses of templates, such as the max() function, were previously filled by function-like preprocessor macros (a legacy of the C programming language). For example, here is a possible max() macro:

#define max(a,b)   ((a) < (b) ? (b) : (a))
Both macros and templates are expanded at compile time. Macros are always expanded inline; templates can also be expanded as inline functions when the compiler deems it appropriate. Thus both function-like macros and function templates have no run-time overhead.

However, templates are generally considered an improvement over macros for these purposes. Templates are type-safe. Templates avoid some of the common errors found in code that makes heavy use of function-like macros. Perhaps most importantly, templates were designed to be applicable to much larger problems than macros.
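One of the common macro errors is evaluating an argument more than once. The sketch below makes this visible by counting calls; MAX_MACRO, max_template, Counter, and the two calls_through_* functions are hypothetical names introduced for the illustration.

```cpp
// Function-like macro in the style of the max() macro above.
#define MAX_MACRO(a, b) ((a) < (b) ? (b) : (a))

// Function template with the same intent.
template <typename T>
T max_template(T a, T b) { return a < b ? b : a; }

// Hypothetical helper with a visible side effect: counts how often it runs.
struct Counter {
    int calls;
    Counter() : calls(0) {}
    int next() { return ++calls; }
};

// Returns how many times next() ran when used as a macro argument.
int calls_through_macro() {
    Counter c;
    int result = MAX_MACRO(c.next(), 0);  // c.next() appears twice after expansion
    (void)result;
    return c.calls;
}

// Returns how many times next() ran when used as a template argument.
int calls_through_template() {
    Counter c;
    int result = max_template(c.next(), 0);  // argument is evaluated exactly once
    (void)result;
    return c.calls;
}
```

In this sketch the macro version runs next() twice, because the selected operand is evaluated again after the comparison, while the template version runs it once. The template is also stricter about types: max_template(1, 2.5) does not compile because T cannot be deduced as both int and double, whereas the macro silently mixes the two.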

There are three primary drawbacks to the use of templates: compiler support, poor error messages, and code bloat. Many compilers historically had very poor support for templates, so the use of templates can make code somewhat less portable. Support may also be poor when a C++ compiler is used with a linker that is not C++-aware, or when attempting to use templates across shared library boundaries. Most modern compilers, though, now have fairly robust and standard template support.

Almost all compilers produce confusing, long, or sometimes unhelpful error messages when errors are detected in code that uses templates. This can make templates difficult to develop.

Finally, the use of templates may cause the compiler to generate extra code (an instantiation of the template), so the indiscriminate use of templates can lead to code bloat, resulting in excessively large executables. However, judicious use of template specialization can dramatically reduce such code bloat in some cases. The extra instantiations generated by templates can also cause debuggers to have difficulty working gracefully with templates. For example, setting a debug breakpoint within a template from a source file may either miss setting the breakpoint in the actual instantiation desired or may set a breakpoint in every place the template is instantiated.

Templates in D

The D programming language supports templates. Like C++ templates, they can be used for simple generic programming or advanced metaprogramming.

The basic premise in D's philosophy about templates is that many benefits of templates in C++ were "discovered" rather than "designed". Thus, D takes a step back, looks at the benefits of templates and what they can be used for, and redesigns them. The result is that D's templates are much simpler than C++ templates, yet they can be more powerful in many respects.

For more details, see the article "Templates Revisited", by Walter Bright. http://www.digitalmars.com/d/templates-revisited.html

The most obvious difference between D's syntax and C++'s syntax is that D doesn't use < > at all for definition and instantiation of templates. Instead, D uses ! as a binary operator for template instantiation, thus a!(b); is the equivalent of a<b>; in C++.
For definition, D uses plain parentheses ( ).

template Foo(T)

The max template function in D

The max() template function used in the C++ example might be implemented in D as follows.
T max (T) (T a, T b)
{
    return a < b ? b : a;
}

Note that it provides the further feature of throwing an exception when trying to compare unordered data. That check is only included in the expanded and instantiated template when the type T is implicitly castable to one of real, ireal, creal. This is the case when T is any one of float, double, real, ifloat, idouble, ireal, cfloat, cdouble, creal, all of which are floating-point/fractional data types capable of an unordered state (such as NaN). The template is called just as a normal function would be:

writefln(max(3, 7));

Generics in Java

Generics were added to the Java programming language in 2004 as part of J2SE 5.0. Unlike C++ templates, generic Java code generates only one compiled version of a generic class. Generic Java classes can only use object types as type parameters—primitive types are not allowed. Thus a List<Integer> is legal, while a List<int> is not.

In Java, generics are checked at compile time for type correctness. The generic type information is then removed via a process called type erasure, leaving a non-generic version of the code. For example, List<Integer> will be converted into a non-generic (raw) List, which can contain arbitrary objects. However, due to the compile-time check, the resulting code is guaranteed to be type correct, as long as the code generated no unchecked compiler warnings.

One side-effect of this process is that the generic type information is not known at runtime. Thus, at runtime, a List<Integer> and a List<String> refer to the same class, List. One way to mitigate this side effect is through the use of Java's Collections.checkedSet() method, which will decorate the declared Collection and check for improper use (i.e. insertion of an inappropriate type) of a typed Collection at runtime. This can be useful in situations when legacy code is interoperating with code that makes use of generics.

Wildcards

Generic type parameters in Java are not limited to specific classes. Java allows the use of wildcards to specify bounds on what type of parameters a given generic object may have. For example, List<?> indicates a list which has an unknown object type. Methods which take such a list as an argument could take any type of list. Reading from the list will return objects of type Object, and writing non-null elements to the list is not allowed, since the parameter type is not known.

To specify the upper bound of a generic element, the extends keyword is used, which indicates that the generic type is a subclass (either extends the class, or implements the interface) of the bounding class. So List<? extends Number> means that the given list contains objects which extend the Number class; for example, the list could be List<Float> or List<Number>. Thus, reading an element from the list will return a Number, while writing non-null elements is once again not allowed, since it is not known what type of element the list holds.

To specify the lower bound of a generic element, the super keyword is used, which indicates that the generic type is a superclass of the bounding class. So List<? super Number> could be List<Number> or List<Object>. Reading from the list returns objects of type Object, while any element of type Number can be added to the list, since it is guaranteed to be a valid type to store in the list.

Limitations

A limitation of the Java implementation of generics makes it impossible to create an array of a type parameter, since there is no way to determine what the array type should be. Thus, if a method has a type parameter T, the programmer cannot create a new array of that type, such as via new T[size]. (It is possible to work around this limitation using Java's reflection mechanisms: if an instance of class T is available, one can obtain from that object the Class object corresponding to T and use java.lang.reflect.Array.newInstance to create the array.)

Another limitation is that it is impossible to create an array of a generic class with a type argument other than <?>. This is due to the way arrays are handled in the language, and is necessary to ensure that code which compiles without warnings and uses no explicit casts is guaranteed to be type safe.

Generic Programming in Haskell

The Haskell programming language has parameterized types, parametric polymorphism and type classes, which together support a style of programming somewhat similar to what is possible with Java generics or common usage idioms of C++ templates. Use of these constructs in Haskell programs is ubiquitous and even difficult to avoid. Haskell also has some more unusual generic programming features, which have inspired programmers and language designers to strive for even more genericity and code re-use than polymorphism can provide.

Six of the predefined type classes in Haskell (including Eq, the types that can be compared for equality, and Show, the types whose values can be rendered as strings) have the special property of supporting derived instances. This means that a programmer defining a new type can state that this type is to be an instance of one of these special type classes, without providing implementations of the class methods as is usually necessary when declaring class instances. All the necessary methods will be "derived" -- that is, constructed automatically -- based on the structure of the type. For instance, the following declaration of a type of binary trees states that it is to be an instance of the classes Eq and Show:

data BinTree a = Leaf a | Node (BinTree a) a (BinTree a)
      deriving (Eq, Show)

This results in an equality function (==) and a string representation function (show) being automatically defined for any type of the form BinTree T provided that T itself supports those operations.

The support for derived instances of Eq and Show makes their methods == and show generic in a qualitatively different way from parametrically polymorphic functions: these "functions" (more accurately, type-indexed families of functions) can be applied to values of many different types, and although they behave differently for every argument type, very little work is needed to add support for a new type. Ralf Hinze (2004) has shown that a similar effect can be achieved for user-defined type classes by certain programming techniques. Many other researchers have proposed approaches to this and other kinds of genericity in the context of Haskell and extensions to Haskell (discussed below).

Generic programming features in other languages

Templates were left out of the first version of C#, largely due to the problems with templates in C++. However, C# 2.0 adopted generic programming features comparable to those of Java. One major difference is that generic information in C# is kept at runtime, rather than being erased at compile time.

Many functional programming languages support small-scale generic programming in the form of parameterized types and parametric polymorphism. In addition, Standard ML and OCaml provide functors, which are similar to class templates and to Ada's generic packages.

Recent Developments

As of 2006, the development of language constructs and programming techniques to support greater degrees of genericity and code reuse, particularly in functional programming languages, is an active research area. Many of the recent contributions are presented as extensions to Haskell and rely to some extent on that language's type class mechanism.

PolyP

PolyP was the first generic programming language extension to Haskell. In PolyP, generic functions are called polytypic. The language introduces a special construct in which such polytypic functions can be defined via structural induction over the structure of the pattern functor of a regular datatype. Regular datatypes in PolyP are a subset of Haskell datatypes. A regular datatype t must be of kind * → *, and if a is the formal type argument in the definition, then all recursive calls to t must have the form t a. These restrictions rule out higher kinded datatypes as well as nested datatypes, where the recursive calls are of a different form. The flatten function in PolyP is here provided as an example:
flatten :: Regular d => d a -> [a]
flatten = cata fl

polytypic fl :: f a [a] -> [a]
  case f of
    g+h -> either fl fl
    g*h -> \(x,y) -> fl x ++ fl y
    () -> \x -> []
    Par -> \x -> [x]
    Rec -> \x -> x
    d@g -> concat . flatten . pmap fl
    Con t -> \x -> []

cata :: Regular d => (FunctorOf d a b -> b) -> d a -> b

Generic Haskell

Generic Haskell is another extension to Haskell, developed at Utrecht University. Among the extensions it provides are type-indexed values, defined by cases over type constructors; the resulting type-indexed value can be specialised to any type. As an example, the equality function in Generic Haskell:

type Eq {[ * ]} t1 t2 = t1 -> t2 -> Bool
type Eq {[ k -> l ]} t1 t2 = forall u1 u2. Eq {[ k ]} u1 u2 -> Eq {[ l ]} (t1 u1) (t2 u2)

eq {| t :: k |} :: Eq {[ k ]} t t
eq {| Unit |} _ _ = True
eq {| :+: |} eqA eqB (Inl a1) (Inl a2) = eqA a1 a2
eq {| :+: |} eqA eqB (Inr b1) (Inr b2) = eqB b1 b2
eq {| :+: |} eqA eqB _ _ = False
eq {| :*: |} eqA eqB (a1 :*: b1) (a2 :*: b2) = eqA a1 a2 && eqB b1 b2
eq {| Int |} = (==)
eq {| Char |} = (==)
eq {| Bool |} = (==)

The "Scrap your boilerplate" approach

The Scrap your boilerplate approach is a lightweight generic programming approach for Haskell (Lämmel and Peyton Jones, 2003). The approach is supported in the GHC >= 6.0 implementation of Haskell. Using this approach, the programmer can write generic functions such as traversal schemes (e.g., everywhere and everything), as well as generic read, generic show and generic equality (i.e., gread, gshow, and geq). This approach is based on just a few primitives for type-safe cast and processing constructor applications.



From Wikipedia, the Free Encyclopedia. Original article here. Support Wikipedia by contributing or donating.
All text is available under the terms of the GNU Free Documentation License. See Wikipedia Copyrights for details.

