API documentation of the 'org.clapper.util.misc.FileHashMap' Java class
FileHashMap
org.clapper.util.misc

Class FileHashMap<K,V>

  • All Implemented Interfaces:
    Map<K,V>


    public class FileHashMap<K,V> extends AbstractMap<K,V>

    FileHashMap implements a java.util.Map that keeps the keys in memory, but stores the values as serialized objects in a random access disk file. When an application attempts to access the value associated with a given key, the FileHashMap object determines the location of the serialized value object, seeks to that location in the random access file, and reads and deserializes the value object. In a sense, a FileHashMap is akin to a simple, classic indexed sequential file. This approach gives a FileHashMap object the following characteristics:

    • Because the map keys are cached in memory, access to the key space is relatively efficient.
    • Since the values are stored in a file, access to the values is slower than if they were in memory, but you can store a lot more data in a FileHashMap object than in a wholly memory-resident object, such as a HashMap or TreeMap.
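    To make the design concrete, here is a deliberately simplified sketch of the same idea. It is not FileHashMap's actual code: the class name TinyFileMap and its methods are hypothetical. Keys sit in an in-memory HashMap together with an (offset, length) record for each value, while the values themselves are serialized into a java.io.RandomAccessFile. The sketch always appends new values at the end of the file and ignores persistence, transient mode, and gap reuse.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified illustration: keys in memory, values in a random access file. */
public class TinyFileMap<K, V extends Serializable> implements Closeable {
    // In-memory index entry: where a value's serialized bytes live in the data file.
    private static final class Entry {
        final long offset;
        final int length; // 32-bit length, matching the size restriction described later
        Entry(long offset, int length) { this.offset = offset; this.length = length; }
    }

    private final Map<K, Entry> index = new HashMap<>();
    private final RandomAccessFile data;

    public TinyFileMap(File dataFile) throws IOException {
        this.data = new RandomAccessFile(dataFile, "rw");
    }

    public void put(K key, V value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value); // only Serializable values can be stored
        }
        byte[] buf = bytes.toByteArray();
        long offset = data.length(); // append; space of replaced values is not reclaimed
        data.seek(offset);
        data.write(buf);
        index.put(key, new Entry(offset, buf.length));
    }

    @SuppressWarnings("unchecked")
    public V get(K key) throws IOException, ClassNotFoundException {
        Entry e = index.get(key);
        if (e == null) return null;
        byte[] buf = new byte[e.length];
        data.seek(e.offset);   // seek to the value's location,
        data.readFully(buf);   // read its bytes,
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buf))) {
            return (V) in.readObject(); // and deserialize it
        }
    }

    /** Key access needs no file I/O: the keys are already in memory. */
    public Set<K> keySet() { return index.keySet(); }

    @Override
    public void close() throws IOException { data.close(); }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("tinymap", ".db");
        f.deleteOnExit();
        try (TinyFileMap<String, String> map = new TinyFileMap<>(f)) {
            map.put("a", "alpha");
            map.put("b", "beta");
            System.out.println(map.get("a") + " " + map.get("b")); // alpha beta
        }
    }
}
```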

    File Name Conventions

    A FileHashMap is instantiated with a base file name (i.e., a file name without an extension). This base file name specifies the path to the file(s) used to store the map's data; the FileHashMap appends the following extensions to this prefix to arrive at the actual file names.

    Extension   Meaning
    .ix         The saved in-memory index. This file is created only if the FileHashMap is not marked as transient (see below). The INDEX_FILE_SUFFIX constant defines this string.
    .db         The on-disk data (value) file, where the serialized objects are stored. The DATA_FILE_SUFFIX constant defines this string.

    For instance, if you create a FileHashMap with the statement

     Map map = new FileHashMap("/tmp/mymap"); 

    the serialized value objects will be stored in file "/tmp/mymap.db", and the index (if saved) will be stored in "/tmp/mymap.ix".
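    The naming convention amounts to simple string concatenation. The sketch below assumes the suffix values listed in the extension table (.ix and .db), which FileHashMap exposes as its INDEX_FILE_SUFFIX and DATA_FILE_SUFFIX constants; the helper names indexFile and dataFile are hypothetical.

```java
import java.io.File;

public class FileNames {
    // Assumed suffix values, taken from the extension table; FileHashMap defines
    // them as INDEX_FILE_SUFFIX and DATA_FILE_SUFFIX.
    static final String INDEX_SUFFIX = ".ix";
    static final String DATA_SUFFIX = ".db";

    /** Hypothetical helper: the saved-index file for a base file name. */
    public static File indexFile(String prefix) {
        return new File(prefix + INDEX_SUFFIX);
    }

    /** Hypothetical helper: the data (value) file for a base file name. */
    public static File dataFile(String prefix) {
        return new File(prefix + DATA_SUFFIX);
    }

    public static void main(String[] args) {
        // On Unix-like systems these print /tmp/mymap.db and /tmp/mymap.ix.
        System.out.println(dataFile("/tmp/mymap").getPath());
        System.out.println(indexFile("/tmp/mymap").getPath());
    }
}
```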

    Transient versus Persistent Maps

    A FileHashMap is persistent by default. When a FileHashMap object is finalized or explicitly closed (using the close() method), its in-memory index is saved to disk. You can reopen the saved map by instantiating another FileHashMap object and specifying the same file prefix. The new FileHashMap object will load its initial in-memory index from the saved index; any modifications to the new object will be written back to the on-disk index when the new object is finalized or closed.

    A FileHashMap can be marked as non-persistent, or transient, by passing the TRANSIENT flag to the constructor. A transient map is not saved; its disk files are removed when the map is finalized or manually closed. You cannot create a transient map using a file prefix that specifies an existing, saved map. That is, the disk files for a transient map must not exist at the time the map is created.
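    The persistence rules above can be sketched as follows. This is a hypothetical illustration, not FileHashMap's code: the class IndexLifecycle is invented for the example, it assumes the index is a HashMap from key to an {offset, length} pair saved with standard Java serialization, and it handles only close(), not finalization.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class IndexLifecycle implements Closeable {
    private final File indexFile;
    private final File dataFile;
    private final boolean isTransient;
    // Hypothetical index layout: key -> {offset, length} of the value in the data file.
    private HashMap<String, long[]> index = new HashMap<>();

    @SuppressWarnings("unchecked")
    public IndexLifecycle(String prefix, boolean isTransient)
            throws IOException, ClassNotFoundException {
        this.indexFile = new File(prefix + ".ix");
        this.dataFile = new File(prefix + ".db");
        this.isTransient = isTransient;
        if (isTransient) {
            // A transient map may not be created over an existing map's files.
            if (indexFile.exists() || dataFile.exists()) {
                throw new IOException("files for " + prefix + " already exist");
            }
        } else if (indexFile.exists()) {
            // Reopening a persistent map: start from the previously saved index.
            try (ObjectInputStream in =
                     new ObjectInputStream(new FileInputStream(indexFile))) {
                index = (HashMap<String, long[]>) in.readObject();
            }
        }
    }

    public Map<String, long[]> index() {
        return index;
    }

    @Override
    public void close() throws IOException {
        if (isTransient) {
            // Transient: nothing is saved, and the disk files are removed.
            indexFile.delete();
            dataFile.delete();
        } else {
            // Persistent: write the in-memory index back to disk.
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(indexFile))) {
                out.writeObject(index);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String prefix = new File(System.getProperty("java.io.tmpdir"), "lifecycle-demo").getPath();
        try (IndexLifecycle map = new IndexLifecycle(prefix, false)) {
            map.index().put("k", new long[] {0L, 5L});
        }
        try (IndexLifecycle map = new IndexLifecycle(prefix, false)) {
            System.out.println(map.index().get("k")[1]); // 5, read back from the saved index
        }
    }
}
```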

    Optimizations

    The FileHashMap class attempts to optimize access to the disk-resident values and to conserve memory use. These optimizations include the following specific measures.

    Sequential access to values

    The iterators attempt to minimize file seeking while looping over the stored values. The values() method returns a Collection whose iterator loops through the values in the order they appear in the data file. Traversing the values via this iterator therefore accesses the FileHashMap's data file sequentially, from top to bottom. For example, the following code fragment loops through a FileHashMap's values; because the iterator returns the values in the order they appear in the data file, the code fragment accesses the data file sequentially:

     import java.util.Iterator;
     import org.clapper.util.misc.FileHashMap;

     FileHashMap<String, Object> map = new FileHashMap<>("myfile");
     for (Iterator<Object> values = map.values().iterator(); values.hasNext(); ) {
         Object value = values.next();
         System.out.println(value);
     }

    Similarly, the Set objects returned by the keySet() and entrySet() methods use iterators that return the keys (or entries) in the order that the associated values appear in the disk file. For example, the following code fragment also accesses the data file sequentially:

     import java.util.Iterator;
     import org.clapper.util.misc.FileHashMap;

     FileHashMap<String, Object> map = new FileHashMap<>("myfile");
     for (Iterator<String> keys = map.keySet().iterator(); keys.hasNext(); ) {
         String key   = keys.next();
         Object value = map.get(key);
         System.out.println("key=" + key + ", value=" + value);
     }

    Note that this optimization strategy can be foiled in a number of ways. For instance, the sequential access behavior can be thwarted if a second thread is accessing the map while the first thread is iterating over it, or if the thread that's iterating over the values is simultaneously inserting values into the map.

    Accessing the keys, without reading the values, does not result in any file access at all, because the keys are cached in memory. See the next section for more details.
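    Returning keys in data-file order only requires sorting the in-memory index by value offset, with no file access at all. A minimal sketch, assuming each index entry is an {offset, length} pair; the class and method names are hypothetical and do not describe FileHashMap's internals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FileOrderKeys {
    /** Returns the keys sorted by the file offset of their values (index entry = {offset, length}). */
    public static <K> List<K> keysInFileOrder(Map<K, long[]> index) {
        List<K> keys = new ArrayList<>(index.keySet());
        keys.sort(Comparator.comparingLong(k -> index.get(k)[0]));
        return keys;
    }

    public static void main(String[] args) {
        Map<String, long[]> index = new HashMap<>();
        index.put("c", new long[] {0L, 40L});
        index.put("a", new long[] {40L, 25L});
        index.put("b", new long[] {65L, 30L});
        System.out.println(keysInFileOrder(index)); // [c, a, b]
    }
}
```

    Reading the values in this key order then moves through the data file from top to bottom, which is the access pattern the iterators described above aim for.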

    Memory Conservation

    The values stored in the map are serialized and written to a data file; they are not cached in memory at all. Therefore, you can store a lot more objects in a FileHashMap before running out of memory (assuming you have enough disk space available).

    The keys are stored in memory, however. Each key is associated with a special internal object that keeps track of the location and length of the associated value in the disk file. As you place more items in a FileHashMap, the index will grow, consuming memory. But it will consume far less memory than if you use an in-memory Map, such as a java.util.HashMap, which stores the keys and the values in memory.

    The Iterator, Set, and Collection objects returned by the entrySet() and values() methods contain proxy objects, not real values. Instead of the values themselves, they hold proxies that record where each value is located in the disk file. A value is loaded from disk only when you actually retrieve it from the Iterator or collection.
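    The proxy idea can be pictured as an iterator over value locations that performs the read only when next() is called. In this hypothetical sketch the loader function stands in for "seek to offset, read length bytes, deserialize"; a counter in the usage example shows that nothing is loaded until a value is requested. LazyValues is an illustrative name, not one of FileHashMap's internal classes.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;

public class LazyValues<V> implements Iterable<V> {
    // Each element is only an {offset, length} pair, not a loaded value.
    private final List<long[]> locations;
    // Hypothetical loader: reads `length` bytes at `offset` and deserializes them.
    private final BiFunction<Long, Integer, V> loader;

    public LazyValues(List<long[]> locations, BiFunction<Long, Integer, V> loader) {
        this.locations = locations;
        this.loader = loader;
    }

    @Override
    public Iterator<V> iterator() {
        Iterator<long[]> it = locations.iterator();
        return new Iterator<V>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public V next() {
                long[] loc = it.next();
                // The value is loaded only here, when it is actually requested.
                return loader.apply(loc[0], (int) loc[1]);
            }
        };
    }

    public static void main(String[] args) {
        int[] loads = {0};
        LazyValues<String> values = new LazyValues<>(
            List.of(new long[] {0, 10}, new long[] {10, 7}),
            (off, len) -> { loads[0]++; return "value@" + off; });
        Iterator<String> it = values.iterator();
        System.out.println(loads[0]);  // 0: nothing loaded yet
        System.out.println(it.next()); // value@0
        System.out.println(loads[0]);  // 1
    }
}
```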

    Reclaiming Gaps in the File

    Normally, when you remove an object from the map, the space where the object was stored in the data file is not reclaimed. This strategy allows for faster insertions, since new objects are always added to the end of the disk file. However, for long-lived FileHashMap objects that are periodically modified, this strategy may not be appropriate. For that reason, you can pass a special RECLAIM_FILE_GAPS flag to the constructor. If specified, the flag tells the object to keep track of "gaps" in the file, and reuse them if possible. When a new object is inserted into the map, and RECLAIM_FILE_GAPS is enabled, the object will attempt to find the smallest unused area in the file to accommodate the new object. It will only add the new object to the end of the file (enlarging the file) if it cannot find a suitable gap.

    This mode is not the default, because it can add time to processing. However, it does not access the file at all; the file gap maintenance logic uses in-memory data only. So, while it adds a small amount of computational overhead, the difference between running with RECLAIM_FILE_GAPS enabled and running with it disabled should not be dramatic.
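    A best-fit search over free gaps needs only in-memory bookkeeping. The sketch below shows one possible approach, not FileHashMap's actual algorithm: the hypothetical GapList keeps gaps in a TreeMap keyed by size, so ceilingEntry() finds the smallest gap that is large enough. It assumes the unused tail of a reused gap is kept as a smaller gap, and it does not merge adjacent gaps.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical best-fit free list for reusing gaps in the data file (in-memory only). */
public class GapList {
    // Gap size -> offsets of free gaps of exactly that size.
    private final TreeMap<Integer, Deque<Long>> gapsBySize = new TreeMap<>();

    public void addGap(long offset, int size) {
        gapsBySize.computeIfAbsent(size, s -> new ArrayDeque<Long>()).push(offset);
    }

    /** Returns the offset of the smallest gap that fits, or -1 to append at the end of the file. */
    public long allocate(int size) {
        Map.Entry<Integer, Deque<Long>> e = gapsBySize.ceilingEntry(size);
        if (e == null) return -1; // no gap is large enough
        long offset = e.getValue().pop();
        if (e.getValue().isEmpty()) gapsBySize.remove(e.getKey());
        int leftover = e.getKey() - size;
        if (leftover > 0) addGap(offset + size, leftover); // keep the unused tail
        return offset;
    }

    public static void main(String[] args) {
        GapList gaps = new GapList();
        gaps.addGap(0, 100);
        gaps.addGap(500, 40);
        System.out.println(gaps.allocate(30));  // 500: the 40-byte gap is the best fit
        System.out.println(gaps.allocate(50));  // 0
        System.out.println(gaps.allocate(200)); // -1: append to the end of the file
    }
}
```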

    Restrictions

    This class currently has the following restrictions and unimplemented behavior.

    • An object cannot be stored in a FileHashMap unless it implements java.io.Serializable.
    • The size of a serialized stored object must fit in a 32-bit integer. This restriction is unlikely to cause problems in practice, and it keeps each in-memory index entry small.
    • To prevent multiple Java VMs from updating the file containing the serialized values, this class throws an exception if it detects an attempted concurrent modification of a map. (Locking the map, then synchronizing the in-memory indexes across multiple updaters, is non-trivial. The most obvious approach, using the underlying java.nio.channels.FileChannel object to map the index file into memory, isn't guaranteed to work. According to the JDK documentation, "Whether changes made to the content or size of the underlying file, by this program or another, are propagated to the [mapped memory] buffer is unspecified. The rate at which changes to the buffer are propagated to the file is unspecified.")
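    How the concurrent-modification check is implemented is not described here. One possible approach, shown as a hypothetical sketch rather than FileHashMap's mechanism, is an exclusive java.nio.channels.FileLock on the data file: FileChannel.tryLock() returns null when another program holds an overlapping lock, and throws OverlappingFileLockException when the same JVM already holds one. The JDK documents file locking as system-dependent, so on some platforms such locks are only advisory.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DataFileGuard {
    /**
     * Takes an exclusive lock on the whole data file, or throws if another
     * updater already holds one. The channel must be open for writing.
     */
    public static FileLock lockOrFail(FileChannel channel) throws IOException {
        FileLock lock;
        try {
            // Returns null if another program holds an overlapping lock.
            lock = channel.tryLock();
        } catch (OverlappingFileLockException e) {
            // Thrown if this same JVM already holds an overlapping lock.
            lock = null;
        }
        if (lock == null) {
            throw new IllegalStateException("data file is already being updated");
        }
        return lock;
    }

    public static void main(String[] args) throws IOException {
        Path data = Files.createTempFile("guard", ".db");
        try (FileChannel ch = FileChannel.open(data, StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = lockOrFail(ch)) {
            System.out.println("exclusive access: " + lock.isValid()); // exclusive access: true
        }
    }
}
```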
