FileHashMap
org.clapper.util.misc

Class FileHashMap<K,V>

  • All Implemented Interfaces:
    Map<K,V>


    public class FileHashMap<K,V>extends AbstractMap<K,V>

    FileHashMap implements a java.util.Map that keeps the keys in memory, but stores the values as serialized objects in a random access disk file. When an application attempts to access the value associated with a given key, the FileHashMap object determines the location of the serialized value object, seeks to that location in the random access file, and reads and deserializes the value object. In a sense, a FileHashMap is akin to a simple, classic indexed sequential file. This approach gives a FileHashMap object the following characteristics:

    • Because the map keys are cached in memory, access to the key space is relatively efficient.
    • Since the values are stored in a file, access to the values is slower than if they were in memory, but you can store a lot more data in a FileHashMap object than in a wholly memory-resident object, such as a HashMap or TreeMap.

    File Name Conventions

    A FileHashMap is instantiated with a base file name (i.e., a file name without an extension). This base file name specifies the path to the file(s) that used to store the map's data; the FileHashMap tacks the following extensions onto the prefix to arrive at the actual file names.

    Extension Meaning
    .ix The saved in-memory index. This file is created only if the FileHashMap is not marked as transient. (See below.) The INDEX_FILE_SUFFIX constant defines this string.
    .db The on-disk data (value) file, where the serialized objects are stored. The DATA_FILE_SUFFIX constant defines this string.

    For instance, if you create a FileHashMap with the statement

     Map map = new FileHashMap("/tmp/mymap"); 

    the serialized value objects will be stored in file "/tmp/mymap.db", and the index (if saved) will be stored in "/tmp/mymap.ix".

    Transient versus Persistent Maps

    A FileHashMap is persistent by default. When a FileHashMap object is finalized or explicitly closed (using the close() method), its in-memory index is saved to disk. You can reopen the saved map by instantiating another FileHashMap object, and specifying the same file prefix. The new FileHashMap object will load its initial in-memory index from the saved index; any modifications to the new object will be written back to the on-disk index with the new object finalized or closed.

    A FileHashMap can be marked as non-persistent, or transient, by passing the TRANSIENT flag to the constructor. A transient map is not saved; its disk files are removed when the map is finalized or manually closed. You cannot create a transient map using a file prefix that specifies an existing, saved map. That is, the disk files for a transient map must not exist at the time the map is created.

    Optimizations

    The FileHashMap class attempts to optimize access to the disk-resident values and to conserve memory use. These optimizations include the following specific measures.

    Sequential access to values

    The iterators attempt to minimize file seeking while looping over the stored values. The values() method returns a Set whose iterator loops through the values in the order they were written to the data file. Traversing the value set via the iterator will access the FileHashMap's data file sequentially, from top to bottom. For example, the following code fragment loops through a FileHashMap's values; because the iterator returns the values in the order they appear in the data file, the code fragment accesses the data file sequentially:

     FileHashMap map = new FileHashMap("myfile"); for (Iterator values = map.values().iterator(); values.hasNext(); ) {     Object value = it.next();     System.out.println(value); } 

    Similarly, the Set objects returned by the keySet() and entrySet() methods use iterators that return the keys in the order that the associated values appear in the disk file. For example, the following code fragment also accesses the data file sequentially:

     FileHashMap map = new FileHashMap("myfile"); for (Iterator keys = map.keySet().iterator(); keys.hasNext(); ) {     Object key   = it.next();     Object value = map.get(key);     System.out.println("key=" + key + ", value=" + value); } 

    Note that this optimization strategy can be foiled in a number of ways. For instance, the sequential access behavior can be thwarted if a second thread is accessing the map while the first thread is iterating over it, or if the thread that's iterating over the values is simultaneously inserting values in the table.

    Accessing the keys, without reading the values, does not result in any file access at all, because the keys are cached in memory. See the next section for more details.

    Memory Conservation

    The values stored in the map are serialized and written to a data file; they are not cached in memory at all. Therefore, you can store a lot more objects in a FileHashMap before running out of memory (assuming you have enough disk space available).

    The keys are stored in memory, however. Each key is associated with a special internal object that keeps track of the location and length of the associated value in the disk file. As you place more items in a FileHashMap, the index will grow, consuming memory. But it will consume far less memory than if you use an in-memory Map, such as a java.util.HashMap, which stores the keys and the values in memory.

    The Iterator and Set objects returned by the entrySet() and values() methods contain proxy objects, not real values. That is, they do not actually contain any values. Instead, they contain proxies for the values, objects that specify the locations of the values in the disk file. A value is loaded from disk only when you actually attempt to retrieve it from the Iterator or Set.

    Reclaiming Gaps in the File

    Normally, when you remove an object from the map, the space where the object was stored in the data file is not reclaimed. This strategy allows for faster insertions, since new objects are always added to the end of the disk file. However, for long-lived FileHashMap objects that are periodically modified, this strategy may not be appropriate. For that reason, you can pass a special RECLAIM_FILE_GAPS flag to the constructor. If specified, the flag tells the object to keep track of "gaps" in the file, and reuse them if possible. When a new object is inserted into the map, and RECLAIM_FILE_GAPS is enabled, the object will attempt to find the smallest unused area in the file to accommodate the new object. It will only add the new object to the end of the file (enlarging the file) if it cannot find a suitable gap.

    This mode is not the default, because it can add time to processing. However, it does not access the file at all; the file gap maintenance logic uses in-memory data only. So, while it adds a small amount of computational overhead, the difference between running with RECLAIM_FILE_GAPS enabled and running with it disabled should not be dramatic.

    Restrictions

    This class currently has the following restrictions and unimplemented behavior.

    • An object cannot be stored in a FileHashMap unless it implements java.io.Serializable.
    • The maximum size of a serialized stored object is confined to a 32-bit integer. This restriction is unlikely to cause anyone problems, and it keeps the keyspace down.
    • To prevent multiple Java VMs from updating the file containing the serialized values, this class detect throws an exception if it detects an attempted concurrent modification of a map. (Locking the map, then synchronizing the in-memory indexes across multiple updaters, is non-trivial. The most obvious method that leaps to mind, mapping the underlying java.nio.channels.FileChannel object to map the index file into memory, isn't guaranteed to work. Cccording to the JDK documentation, "Whether changes made to the content or size of the underlying file, by this program or another, are propagated to the [mapped memory] buffer is unspecified. The rate at which changes to the buffer are propagated to the file is unspecified.")
    • Field Summary

      Fields 
      Modifier and TypeField and Description
      static StringDATA_FILE_SUFFIX
      Data file suffix.
      static intFORCE_OVERWRITE
      Constructor flag value: Ignored unless TRANSIENT is also specified, FORCE_OVERWRITE tells the constructor to overwrite the on-disk files if they already exist, instead of throwing an exception.
      static StringINDEX_FILE_SUFFIX
      Index file suffix.
      static intNO_CREATE
      Constructor flag value: If specified, the disk files will not be created if they don't exist; instead, the constructor will throw an exception.
      static intRECLAIM_FILE_GAPS
      Constructor flag value: Tells the object to reclaim unused space in the file, whenever possible.
      static intTRANSIENT
      Constructor flag value: If specified, the on-disk hash table is considered to be transient and will be removed when this object is closed or finalized.
    • Constructor Summary

      Constructors 
      Constructor and Description
      FileHashMap()
      Create a new transient FileHashMap object.
      FileHashMap(String tempFilePrefix)
      Create a new transient FileHashMap object.
      FileHashMap(String pathPrefix, int flags)
      Create a new FileHashMap object that will read its data from and/or store its data in files derived from the specified prefix.
    • Method Summary

      Methods 
      Modifier and TypeMethod and Description
      voidclear()
      Removes all mappings from this map.
      voidclose()
      Close this map.
      booleancontainsKey(Object key)
      Returns true if this map contains a mapping for the specified key.
      booleancontainsValue(Object value)
      Returns true if this map maps one or more keys that are mapped to the specified value.
      voiddelete()
      Deletes the files backing this FileHashMap.
      Set<Map.Entry<K,V>>entrySet()
      Returns a "thin" set view of the mappings contained in this map.
      booleanequals(Object o)
      Compares the specified object with this map for equality.
      Vget(Object key)
      Returns the value associated with the the specified key.
      inthashCode()
      Returns the hash code value for this map.
      booleanisValid()
      Determine whether the FileHashMap is valid or not.
      Set<K>keySet()
      Returns a Set containing all the keys in this map.
      Vput(K key, V value)
      Associates the specified value with the specified key in this map.
      Vremove(Object key)
      Removes the mapping for this key from this map, if present.
      voidsave()
      Save any in-memory index changes to disk without closing the map.
      intsize()
      Returns the number of key-value mappings in this map.
      Collection<V>values()
      Returns a collection view of the values contained in this map.
    • Field Detail

      • NO_CREATE

        public static final int NO_CREATE
        Constructor flag value: If specified, the disk files will not be created if they don't exist; instead, the constructor will throw an exception. By default, if the files don't exist, the constructor creates them.
        See Also:
        Constant Field Values
      • TRANSIENT

        public static final int TRANSIENT
        Constructor flag value: If specified, the on-disk hash table is considered to be transient and will be removed when this object is closed or finalized. Note that this flag cannot be combined with NO_CREATE. Also, if TRANSIENT is specified to the constructor, but the on-disk files already exist, the constructor will throw an exception unless FORCE_OVERWRITE is also specified.
        See Also:
        Constant Field Values
      • FORCE_OVERWRITE

        public static final int FORCE_OVERWRITE
        Constructor flag value: Ignored unless TRANSIENT is also specified, FORCE_OVERWRITE tells the constructor to overwrite the on-disk files if they already exist, instead of throwing an exception.
        See Also:
        Constant Field Values
      • RECLAIM_FILE_GAPS

        public static final int RECLAIM_FILE_GAPS
        Constructor flag value: Tells the object to reclaim unused space in the file, whenever possible. If this flag is not specified, then the space occupied by objects removed from the hash table is not reclaimed. This flag is not persistent. It's possible to create a persistent FileHashMap without using this flag, and later reopen the on-disk map via a new FileHashMap object that does have this flag set. Whenever the flag is set, the in-memory FileHashMap object attempts to reuse gaps in the file. When the flag is not set, the in-memory FileHashMap object ignores gaps in the file.
        See Also:
        Constant Field Values
    • Constructor Detail

      • FileHashMap

        public FileHashMap()            throws IOException

        Create a new transient FileHashMap object. The transient disk files will be in the normal temporary directory and will have the prefix "FileHashMap". The names of the associated disk files are guaranteed to be unique, and will go away when the object is finalized or when the Java VM goes away. Call this constructor is identical to calling FileHashMap(String) with a null filePrefix parameter.

        Note that this constructor implicitly sets the TRANSIENT flag.

        Throws:
        IOException - Unable to create temp file
        See Also:
        FileHashMap(String,int), FileHashMap(String)
      • FileHashMap

        public FileHashMap(String tempFilePrefix)            throws IOException

        Create a new transient FileHashMap object. The transient disk files will be in the normal temporary directory and will have the specified temporary file prefix. If no temporary file prefix is specified (i.e., the tempFilePrefix parameter is null), then "fhm" will be used. The names of the associated disk files are guaranteed to be unique, and they will go away when the object is finalized or when the Java VM goes away.

        Note that this constructor implicitly sets the TRANSIENT flag.

        Parameters:
        tempFilePrefix - the prefix to use with the temporary file, or null for default prefix "fhm"
        Throws:
        IOException - Unable to create temp file
        See Also:
        FileHashMap(String,int), FileHashMap()
      • FileHashMap

        public FileHashMap(String pathPrefix,           int flags)            throws FileNotFoundException,                   ObjectExistsException,                   ClassNotFoundException,                   VersionMismatchException,                   IOException

        Create a new FileHashMap object that will read its data from and/or store its data in files derived from the specified prefix. The prefix is a file prefix onto which data and index suffixes will be appended.

        Values for the flag parameter

        The flags parameter consists of a set of logically OR'd constants.

        If (flags & NO_CREATE) is zero, the constructor will attempt to create the index files if they don't exist. Otherwise, it expects to find a saved hash map, and it will throw an exception if the files are not present.

        If (flags & TRANSIENT) is non-zero, the constructor will attempt to create a transient map. A transient map is not saved; its disk files are removed when the map is finalized or manually closed. You cannot create a transient map using a file prefix that specifies an existing, saved map. Specifying the TRANSIENT flag causes any NO_CREATE flag to be ignored.

        For example, the following statement creates a transient FileHashMap:

         Map map = new FileHashMap ("/my/temp/dir", FileHashMap.TRANSIENT); 

        whereas this statements opens a persistent FileHashMap, creating it if it doesn't already exist:

         Map map = new FileHashMap ("/my/map/dir", FileHashMap.CREATE); 
        Parameters:
        pathPrefix - The pathname prefix to the files to be used
        flags - Flags that control the disposition of the files. A value of 0 means no flags are set. See above for details.
        Throws:
        FileNotFoundException - The specified hash files do not exist, and the NO_CREATE flag was specified.
        ClassNotFoundException - Failed to deserialize an object
        VersionMismatchException - Bad or unsupported version stamp in FileHashMap index file
        ObjectExistsException - One or both of the files already exist, but the TRANSIENT flag was set and the FORCE_OVERWRITE flag was not set.
        IOException - Other errors
        See Also:
        NO_CREATE, TRANSIENT, FileHashMap(), FileHashMap(String)
    • Method Detail

      • clear

        public void clear()

        Removes all mappings from this map. The data file is cleared by closing it, deleting it, and reopening it. If an I/O error occurs at any point, this object will be closed and marked invalid.

        Specified by:
        clear in interface Map<K,V>
        Overrides:
        clear in class AbstractMap<K,V>
      • containsKey

        public boolean containsKey(Object key)

        Returns true if this map contains a mapping for the specified key. Since the keys are cached in an in-memory index, this method does not have to access the data file, and is fairly cheap.

        Specified by:
        containsKey in interface Map<K,V>
        Overrides:
        containsKey in class AbstractMap<K,V>
        Parameters:
        key - key whose presence in this map is to be tested
        Returns:
        true if this map contains a mapping for the specified key, false otherwise.
      • containsValue

        public boolean containsValue(Object value)

        Returns true if this map maps one or more keys that are mapped to the specified value. Because it must iterate through some portion of the on-disk value set, this method can be slow.

        Specified by:
        containsValue in interface Map<K,V>
        Overrides:
        containsValue in class AbstractMap<K,V>
        Parameters:
        value - value whose presence in this map is to be tested.
        Returns:
        true if this map maps one or more keys to the specified value, false otherwise.
      • delete

        public void delete()
        Deletes the files backing this FileHashMap. This method implicitly calls close().
      • entrySet

        public Set<Map.Entry<K,V>> entrySet()

        Returns a "thin" set view of the mappings contained in this map. Each element in the returned set is a Map.Entry; each Map.Entry contains a key and a proxy for the associated value. The map's values themselves are not loaded into memory until requested via a call to Map.Entry.getValue().

        The set is backed by the map, so changes to the map are reflected in the set. If the map is modified while an iteration over the set is in progress, the results of the iteration are undefined. Neither the set nor its associated iterator supports any of the set-modification methods (e.g., add(), remove(), etc). If you attempt to call any of those methods, the called method will throw an UnsupportedOperationException.

        NOTE: Calling this method on an invalid or closed FileHashMap will result in an exception.

        Specified by:
        entrySet in interface Map<K,V>
        Specified by:
        entrySet in class AbstractMap<K,V>
        Returns:
        a set view of the mappings contained in this map.
        See Also:
        keySet(), values(), UnsupportedOperationException
      • equals

        public boolean equals(Object o)

        Compares the specified object with this map for equality. Returns true if the given object is also a map and the two Maps represent the same mappings. More formally, two maps t1 and t2 represent the same mappings if t1.entrySet().equals(t2.entrySet()). This ensures that the equals method works properly across different implementations of the Map interface.

        Warning:: Because this method must compare the actual values stored in the map, and the values in a file, this method can be quite slow.

        Specified by:
        equals in interface Map<K,V>
        Overrides:
        equals in class AbstractMap<K,V>
        Parameters:
        o - object to be compared for equality with this map.
        Returns:
        true if the specified object is equal to this map.
      • get

        public V get(Object key)

        Returns the value associated with the the specified key. Returns null if the map contains no mapping for this key.

        Specified by:
        get in interface Map<K,V>
        Overrides:
        get in class AbstractMap<K,V>
        Parameters:
        key - key whose associated value is to be returned.
        Returns:
        the value to which this map maps the specified key, or null if the map contains no mapping for this key.
        Throws:
        ClassCastException - if the key is of an inappropriate type for this map.
        NullPointerException - key is null
        See Also:
        containsKey(Object)
      • hashCode

        public int hashCode()

        Returns the hash code value for this map. The hash code of a map is defined to be the sum of the hash codes of each entry in the map's entrySet() view. This ensures that t1.equals(t2) implies that t1.hashCode()==t2.hashCode() for any two maps t1 and t2, as required by the general contract of Object.hashCode.

        Warning:: The recommended hash code for each entry in the entrySet() view is a combination of hash codes for the entry's key and the entry's value. (See java.util.Map.Entry for details.) Because the values in a FileHashMap object are stored in a file, this method can be quite slow.

        Specified by:
        hashCode in interface Map<K,V>
        Overrides:
        hashCode in class AbstractMap<K,V>
        Returns:
        the hash code value for this map.
        See Also:
        Object.hashCode(), Object.equals(Object), equals(Object)
      • isValid

        public boolean isValid()
        Determine whether the FileHashMap is valid or not. Once a FileHashMap has been closed, it is invalid and can no longer be used.
        Returns:
        true if this object is valid, false if not
      • keySet

        public Set<K> keySet()

        Returns a Set containing all the keys in this map. Since the keys are cached in memory, this method is relatively efficient. The keys are returned in an order that optimizes sequential access to the data file; see the Optimizations section in the main class documentation for details.

        The set is backed by the map, so changes to the map are reflected in the set. If the map is modified while an iteration over the set is in progress, the results of the iteration are undefined. Neither the set nor its associated iterator supports any of the set-modification methods (e.g., add(), remove(), etc). If you attempt to call any of those methods, the called method will throw an UnsupportedOperationException.

        Specified by:
        keySet in interface Map<K,V>
        Overrides:
        keySet in class AbstractMap<K,V>
        Returns:
        a set view of the keys contained in this map.
      • put

        public V put(K key,    V value)      throws ClassCastException,             IllegalArgumentException,             NullPointerException

        Associates the specified value with the specified key in this map. If the map previously contained a mapping for this key, the old value is replaced. This map class does not permit a null value to be stored.

        Specified by:
        put in interface Map<K,V>
        Overrides:
        put in class AbstractMap<K,V>
        Parameters:
        key - key with which the specified value is to be associated.
        value - value to be associated with the specified key.
        Returns:
        previous value associated with specified key, or null if there was no mapping for key.
        Throws:
        ClassCastException - if the class of the specified key or value prevents it from being stored in this map.
        IllegalArgumentException - Value not serializable, or I/O error while attempting to serialize value.
        NullPointerException - the specified key or value is null.
      • remove

        public V remove(Object key)

        Removes the mapping for this key from this map, if present. Note: The space occupied by the serialized value in the data file is not coalesced at this point.

        Specified by:
        remove in interface Map<K,V>
        Overrides:
        remove in class AbstractMap<K,V>
        Parameters:
        key - key whose mapping is to be removed from the map.
        Returns:
        previous value associated with specified key, or null if there was no mapping for key.
      • save

        public void save()          throws IOException,                 NotSerializableException

        Save any in-memory index changes to disk without closing the map. You can call this method even if the map is marked as temporary; on a temporary map, save() simply returns without doing anything.

        Throws:
        IOException - Error saving changes to disk.
        NotSerializableException - Can't save index because it contains one or more objects that cannot be serialized.
        See Also:
        close()
      • size

        public int size()

        Returns the number of key-value mappings in this map. If the map contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE. This method queries the in-memory key index, so it is relatively efficient.

        Specified by:
        size in interface Map<K,V>
        Overrides:
        size in class AbstractMap<K,V>
        Returns:
        the number of key-value mappings in this map.
      • values

        public Collection<V> values()

        Returns a collection view of the values contained in this map. The returned collection is a "thin" view of the values contained in this map. The collection contains proxies for the actual disk-resident values; the values themselves are not loaded until a Collection method such as contains() is called.

        The collection is backed by the map, so changes to the map are reflected in the set. If the map is modified while an iteration over the set is in progress, the results of the iteration are undefined.

        The set does not support any of the add() methods.

        Warning:: The toArray() methods can be dangerous, since they will attempt to load every value from the data file into an in-memory array.

        Specified by:
        values in interface Map<K,V>
        Overrides:
        values in class AbstractMap<K,V>
        Returns:
        a collection view of the values contained in this map.
        See Also:
        keySet(), values()

SCaVis 1.8 © jWork.org