JythonTutorial:9 Collections

From jWork.ORG
Jump to: navigation, search
Limitted access. Reguest membership or login to this link first if you are already a member
Contents


Collections

Collections are designed to keep multiple numbers, strings and objects in a single object representing all objects as a single group.

Lists

A list a data container which keeps values (numbers, strings, objects). The values should have the same type, which can be checked with the command type(value). Values in a list are numbered, starting from zero - the first one is numbered zero, the second 1, the third 2, etc. Let us make a list of numbers:


L=[1,2,3,4,5]
print L

The list can be written as a list of comma-separated values (items) between square brackets.

In fact, one can first specify an empty list and then add the numbers using the append method.


L=[]          # define an empty list
L.append(1)   # append value 1
L.append(2)
L.append(3)
print L

We can calculate the length of this list using the function len(). Also you can access any element of the list using it's index:


L=[10,20,30,40,50]
print len(L)  # print the length of this list
print L[2]    # print the value at the position 2

Often we need to create a large lists, in which case typing a long list become tedious. In this case use the statement range() It takes at least 2 parameters: first the number at which the list it returns should start, and second, the number up to which the list should go.


L=range(5,20)
print L

range() takes an optimal third parameter that specifies the step between each number in the list:


L=range(0,5,2)
print L

Looping Over Lists

The for-in statement makes it easy to loop over the items in a list:


L=["ok","no","yes"]
for item in L:
     print item

If you need both the index and the item, use the enumerate function:


L=["ok","no","yes"]
for index, item in enumerate(L):
        print index, item

If you need only the index, use range and len:


L=["ok","no","yes"]
for index in range(len(L)):
             print index

Python provides various shortcuts for common list operations. For example, if a list contains numbers, the built-in sum function gives you the sum:


L=[1,2,3,4,5]
print sum(L)

If a list contains strings, you can combine the string into a single long string using the join string method:


L=["1","2","3","4","5"]
s = ''.join(L)
print s

Python also provides built-in operations to search for items, and to sort the list.

Modifying Lists

The list type also allows you to assign to individual items or slices, and to delete them.


L=["ok","non","again"]
L[1] = "test"
L[1:2] =["1","2"]
print L

Making a copy

To create a separate list, you can use slicing or the list function to quickly create a copy:


L = [1,2,3,4,5]
M = L[:] # create a copy
print M
M.extend([6,7,8])
M.insert(0,10)
print M

The append method adds a single item to the end of the list, the extend method adds items from another list (or any sequence) to the end, and insert inserts an item at a given index, and move the remaining items to the right.

To insert items from another list or sequence at some other location, use slicing syntax:

You can also remove items. The del statement can be used to remove an individual item, or to remove all items identified by a slice. The pop method removes an individual item and returns it, while remove searches for an item, and removes the first matching item from the list.


L=[1,2,3,4,5,6,7,8]
del L[1]
del L[5:8]
print L

The list type allows you to quickly reverse the order of the list, L.reverse()

/*

   out = []
   for object in L:
       out.append(function(object))

The above can be better written using either the built-in map function, or as a list comprehension:

   out = map(function, L)
   out = [function(object) for object in L]

For straightforward function calls, the map solution is more efficient, since the function object only needs to be fetched once. For other constructs (e.g. expressions or calls to object methods), you have to use a callback or a lambda to wrap the operation; in such cases, the list comprehension is more efficient, and usually also easier to read.


Again, if you need both the item and the index, use enumerate:

   out = [function(index, object) for index, object in enumerate(L)]

You can use the list type to implement simple data structures, such as stacks and queues.

   stack = []
   stack.append(object) # push
   object = stack.pop() # pop from end
   queue = []
   queue.append(object) # push
   object = queue.pop(0) # pop from beginning

The list type isn’t optimized for this, so this works best when the structures are small (typically a few hundred items or smaller). For larger structures, you may need a specialized data structure, such as collections.deque.

Another data structure for which a list works well in practice, as long as the structure is reasonably small, is an LRU (least-recently-used) container. The following statements moves an object to the end of the list:

   lru.remove(item)
   lru.append(item)

If you do the above every time you access an item in the LRU list, the least recently used items will move towards the beginning of the list. (for a simple cache implementation using this approach, see Caching.) Searching Lists #

The in operator can be used to check if an item is present in the list:

   if value in L:
       print "list contains", value

To get the index of the first matching item, use index:

   i = L.index(value)

The index method does a linear search, and stops at the first matching item. If no matching item is found, it raises a ValueError exception.

   try:
       i = L.index(value)
   except ValueError:
       i = -1 # no match

To get the index for all matching items, you can use a loop, and pass in a start index:

   i = -1
   try:
       while 1:
           i = L.index(value, i+1)
           print "match at", i
   except ValueError:
       pass

Moving the loop into a helper function makes it easier to use:

   def findall(L, value, start=0):
       # generator version
       i = start - 1
       try:
           i = L.index(value, i+1)
           yield i
       except ValueError:
           pass
   for index in findall(L, value):
       print "match at", i

To count matching items, use the count method:

   n = L.count(value)

Note that count loops over the entire list, so if you just want to check if a value is present in the list, you should use in or, where applicable, index.

To get the smallest or largest item in a list, use the built-in min and max functions:

   lo = min(L)
   hi = max(L)

As with sort (see below), you can pass in a key function that is used to map the list items before they are compared:

   lo = min(L, key=int)
   hi = max(L, key=int)

Sorting Lists #

The sort method sorts a list in place.

   L.sort()

To get a sorted copy, use the built-in sorted function:

   out = sorted(L)

An in-place sort is slightly more efficient, since Python does not have to allocate a new list to hold the result.

By default, Python’s sort algorithm determines the order by comparing the objects in the list against each other. You can override this by passing in a callable object that takes two items, and returns -1 for “less than”, 0 for “equal”, and 1 for “greater than”. The built-in cmp function is often useful for this:

   def compare(a, b):
       return cmp(int(a), int(b)) # compare as integers
   L.sort(compare)
   def compare_columns(a, b):
       # sort on ascending index 0, descending index 2
       return cmp(a[0], b[0]) or cmp(b[2], a[2])
   out = sorted(L, compare_columns)

Alternatively, you can specify a mapping between list items and search keys. If you do this, the sort algorithm will make one pass over the data to build a key array, and then sort both the key array and the list based on the keys.

   L.sort(key=int)
   out = sorted(L, key=int)

If the transform is complex, or the list is large, this can be a lot faster than using a compare function, since the items only have to be transformed once.

Python’s sort is stable; the order of items that compare equal will be preserved. Printing Lists #

By default, the list type does a repr on all items, and adds brackets and commas as necessary. In other words, for built-in types, the printed list looks like the corresponding list display:

   print [1, 2, 3] # prints [1, 2, 3]

To control formatting, use the string join method, combined with either map or a list comprehension or generator expression.

   print "".join(L) # if all items are strings
   print ", ".join(map(str, L))
   print "|".join(str(v) for v in L if v > 0)

To print a list of string fragments to a file, you can use writelines instead of write:

   sys.stdout.writelines(L) # if all items are strings

Performance Notes #

The list object consists of two internal parts; one object header, and one separately allocated array of object references. The latter is reallocated as necessary.

The list has the following performance characteristics:

   The list object stores pointers to objects, not the actual objects themselves. The size of a list in memory depends on the number of objects in the list, not the size of the objects.
   The time needed to get or set an individual item is constant, no matter what the size of the list is (also known as “O(1)” behaviour).
   The time needed to append an item to the list is “amortized constant”; whenever the list needs to allocate more memory, it allocates room for a few items more than it actually needs, to avoid having to reallocate on each call (this assumes that the memory allocator is fast; for huge lists, the allocation overhead may push the behaviour towards O(n*n)).
   The time needed to insert an item depends on the size of the list, or more exactly, how many items that are to the right of the inserted item (O(n)). In other words, inserting items at the end is fast, but inserting items at the beginning can be relatively slow, if the list is large.
   The time needed to remove an item is about the same as the time needed to insert an item at the same location; removing items at the end is fast, removing items at the beginning is slow.
   The time needed to reverse a list is proportional to the list size (O(n)).
   The time needed to sort a list varies; the worst case is O(n log n), but typical cases are often a lot better than that. 
  • /

Tuples

Tuples are just like lists, but you can't change their values. Again, each value is numbered starting from zero, for easy reference.


months = ('January','February','March','April','May')
print months

Note we use parentheses rather than square brackets.


Sometime tuples are used to associate some values in pairs. In this example we made a list with pairs. Each pair is tuple (so cannot be changed).


stop=1
attention=2
go=3
pairs = [(stop, 'red'), (attention, 'yellow'), (go, 'green')]
print pairs

Here we associated integer values with strings. How we can work with such compound object? As usual, one can loop over each item (pair) and print it:



stop=1
attention=2
go=3
pairs = [(stop, 'red'), (attention, 'yellow'), (go, 'green')]
for j in pairs:
     print j         # we print pairs
     print j[0],j[1] # print each value in the pair

Dictionaries

Dictionaries are similar to what their name suggests - a dictionary. In a dictionary, we have an 'index' of words, and for each of them a definition. In Python, the word is called a 'key', and the definition a 'value'. The values in a dictionary are not numbered, unlike the lists.

Let us make a small dictionary:


data = dict()
data['hello'] = 'hola'
data['yes'] = 'si'
data['one'] = 'uno'
data['two'] = 'dos'
print data

What do you see?

To retrieve a value corresponding to a given key, just use the key:


data = dict()
data['hello'] = 'hola'
data['yes'] = 'si'
data['one'] = 'uno'
print data['hello']

What do you see?

A dictionary is defined using a shortcut {} instead of calling dict():


data = {}
data['hello'] = 'hola'
print data

One can also create a dictionary during its initialisation:


data = {'hello':'hola'}
print data


Let us now make a loop over keys of a dictionary:



data = {}
data['hello'] = 'hola'
data['yes'] = 'si'
data['one'] = 'uno'
data['two'] = 'dos'
for key, value in data.iteritems():
    print "%s -> %s" % (key, value)


Sets

A set is an unordered collection of objects that cannot have duplicate members. They are similar to lists, except that they are unordered (like dictionaries). Sets are useful because they do not allow duplicate entries and they can be used for mathematical comparisons.



set = set([1, 2, 3, 4, 5, 6, 7, 8, 9])
print set