MediaWiki API result

This is the HTML representation of the JSON format. HTML is good for debugging, but is unsuitable for application use.

Specify the format parameter to change the output format. To see the non-HTML representation of the JSON format, set format=json.

See the complete documentation, or the API help for more information.

{
    "batchcomplete": "",
    "continue": {
        "gapcontinue": "ReDigi",
        "continue": "gapcontinue||"
    },
    "warnings": {
        "main": {
            "*": "Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."
        },
        "revisions": {
            "*": "Because \"rvslots\" was not specified, a legacy format has been used for the output. This format is deprecated, and in the future the new format will always be used."
        }
    },
    "query": {
        "pages": {
            "78161": {
                "pageid": 78161,
                "ns": 0,
                "title": "Re-Pair",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "{{short description|Lossless, but memory-consuming, data compression algorithm}}\n'''Re-Pair''' (short for '''recursive pairing''') is a [[Grammar-based code|grammar-based compression]] algorithm that, given an input text, builds a [[Straight-line program|straight-line program]], i.e. a [[Context-free grammar|context-free grammar]] generating a single string: the input text. In order to perform the compression in linear time, it consumes the amount of memory that is approximately five times the size of its input.\n\nThe grammar is built by recursively replacing the most frequent pair of characters occurring in the text. \nOnce there is no pair of characters occurring twice, the resulting string is used as the axiom of the grammar. \nTherefore, the output grammar is such that all rules but the axiom have two symbols on the right-hand side.\n\n== How it works ==\n[[File:Re pair example.gif|thumb|right|500px|Construction of a Straight Line Program that generates the string w=\"xabcabcy123123zabc\" by using Re-Pair]]'''Re-Pair''' was first introduced by NJ. Larsson and A. Moffat<ref name=Offline>Larsson, N. J., & Moffat, A. (2000). Off-line dictionary-based compression. Proceedings of the IEEE, 88(11), 1722\u20131732.</ref> in 1999. \nIn their paper the algorithm is presented together with a detailed description of the data structures required to implement it with linear time and space complexity.\nThe experiments showed that '''Re-Pair''' achieves high compression ratios and offers good performance for decompression.\nHowever, the major drawback of the algorithm is its memory consumption, which is approximately 5 times the size of the input. Such memory usage is required in order to perform the compression in linear time but makes the algorithm impractical for compressing large files.\n\nThe image on the right shows how the algorithm works compresses the string <math>w=xabcabcy123123zabc</math>.\n\nDuring the first iteration, the pair <math>ab</math>, which occurs three times in <math>w</math>, is replaced by a new symbol <math>R_1</math>.\nOn the second iteration, the most frequent pair in the string <math>w=xR_1cR_1cy123123zR_1c</math>, which is <math>R_1c</math>, is replaced by a new symbol <math>R_2</math>.\nThus, at the end of the second iteration, the remaining string is <math>w=xR_2R_2y123123zR_2</math>.\nIn the next two iterations, the pairs <math>12</math> and <math>R_{3}3</math> are replaced by symbols <math>R_3</math> and <math>R_4</math> respectively.\nFinally, the string <math>w=xR_2R_2yR_4R_4zR_2</math> contains no repeated pair and therefore it is used as the axiom of the output grammar.\n\n== Data structures ==\nIn order to achieve linear time complexity, '''Re-Pair''' requires the following data structures\n* '''A sequence''' representing the input string. Position <math>i</math> of the sequence contains the ''i''-th symbol of the input string plus two references to other positions in the sequence. These references point to the next/previous positions, say <math>k</math> and <math>m</math>, such that the same substring begins at <math>w[i]</math>, <math>w[k]</math> and <math>w[m]</math> and all three occurrences are captured by the same reference (i.e. there is a variable in the grammar generating the string).\n* '''A priority queue'''. Each element of the queue is a pair of symbols (terminals or previously defined pairs) that occur consecutively in the sequence. The priority of a pair is given by the number of occurrences of the pair in the remaining sequence. Each time a new pair is created, the priority queue is updated.\n* '''A hash table''' to keep track of already defined pairs. This table is updated each time a new pair is created or removed.\n\nSince the hash table and the priority queue refer to the same elements (pairs), they can be implemented by a common data structure called PAIR with pointers for the hash table (h_next) and the priority queue (p_next and p_prev). Furthermore, each PAIR points to the beginning of the first (f_pos) and the last (b_pos) occurrences of the string represented by the PAIR in the sequence. The following picture shows an overview of this data structure.\n\n[[File:Structure repair.png|900px|Data structure to implement the Recursive Pairing algorithm with linear runtime and space usage.]]\n\nThe following two pictures show an example of how these data structures look after the initialization and after applying one step of the pairing process (pointers to NULL are not displayed):\n\n[[File:Slide00.png|450px|State of the data structures used by the Recursive Pairing algorithm after going through the input text.]]\n[[File:Slide01.png|450px|State of the data structures used by the Recursive Pairing algorithm after performing the first pair replacement.]]\n\n=== Encoding the grammar ===\nOnce the grammar has been built for a given input string, in order to achieve effective compression, this grammar has to be encoded efficiently.\nOne of the simplest methods for encoding the grammar is the '''implicit encoding''', which consists on invoking function <code>encodeCFG(X)</code>, described below, sequentially on all the axiom's symbols.\nIntuitively, rules are encoded as they are visited in a depth-first traversal of the grammar. The first time a rule is visited, its right hand side is encoded recursively and a new code is assigned to the rule. From that point, whenever the rule is reached, the assigned value is written.\n\n<source lang=\"c\">\nnum_rules_encoded = 256 // By default, the extended ASCII charset are the terminals of the grammar.\n\nwriteSymbol(symbol s) {\n  bitslen = log(num_rules_encoded); // Initially 8, to describe any extended ASCII character\n  write s in binary using bitslen bits\n}\n\nvoid encodeCFG_rec(symbol s) {\n  if (s is non-terminal and this is the first time symbol s appears) {\n  \ttake rule s \u2192 X Y;\n    write bit 1;\n    encodeCFG_rec(X);\n    encodeCFG_rec(Y);\n    assign to symbol s value ++num_rules_encoded;\n  } else {\n    write bit 0;\n    writeSymbol(terminal/value assigned)\n  }\n}\n\nvoid encodeCFG(symbol s) {\n  encodeCFG_rec(s);\n  write bit 1;\n}\n</source>\n\nAnother possibility is to separate the rules of the grammar into generations such that a rule <math>X \\to YZ</math> belongs to generation <math>i</math> iff at least one of <math>Y</math> or <math>Z</math> belongs to generation <math>i-1</math> and the other belongs to generation <math>j</math> with <math>j \\leq i-1</math>. Then these generations are encoded subsequently starting from generation <math>0</math>. This was the method proposed originally when '''Re-Pair''' was first introduced. However, most implementations of Re-Pair use the implicit encoding method due to its simplicity and good performance. Furthermore, it allows on-the-fly decompression.\n\n== Versions ==\nThere exists a number of different implementations of '''Re-Pair'''. Each of these versions aims at improving one specific aspect of the algorithm, such as reducing the runtime, reducing the space consumption or increasing the compression ratio.\n\n{| class=\"wikitable\"\n|-\n! Improvement !! Year !! Implementation !! Description\n|-\n| Phrase Browsing<ref>R. Wan. \"Browsing and Searching Compressed Documents\". PhD thesis, University of Melbourne, Australia, December 2003.</ref> || 2003 || [https://github.com/rwanwork/Re-Pair] || Instead of manipulating the input string as a sequence of characters, this tool first groups the characters into phrases (for instance, words). The compression algorithm works as Re-Pair but considering the identified phrases as the terminals of the grammar. The tool accepts different options to decide which sort of phrases are considered and encodes the resulting grammar into separated files: one containing the axiom and one with the rest of the rules. \n|-\n| Original || 2011 || [https://code.google.com/archive/p/re-pair/] || This is one of the most popular implementations of Re-Pair. It uses the data structures described here (the ones proposed when it was originally published<ref name=Offline/>) and encodes the resulting grammar using the implicit encoding method. Most later versions of Re-Pair are implemented starting from this one.\n|-\n| Encoding<ref>Satoshi Yoshida and Takuya Kida, Effective Variable-Length-to-Fixed-Length Coding via a Re-Pair Algorithm, In Proc. of Data Compression Conference 2013 (DCC 2013), p. 532, Snowbird, Utah, USA, March 2013.</ref> || 2013 || [https://github.com/syoshid/Re-Pair-VF] || Instead of the implicit encoding method, this implementation uses a [[Variable-length code|Variable Length]] to Fixed Length method, in which each rule (represented with a string of Variable Length) is encoded using a code of Fixed Length.\n|-\n| Memory usage<ref>Bille, P., G\u00f8rtz, I. L., & Prezza, N. (2017, April). Space-efficient re-pair compression. In 2017 (DCC) (pp. 171-180). IEEE.</ref> || 2017 || [http://github.com/nicolaprezza/Re-Pair] || The algorithm performs in two phases. During the first phase, it considers the ''high frequency pairs'', i.e. those occurring more than <math> \\lceil\\sqrt{n}/3\\rceil</math> times, while the ''low frequency pairs'' are considered in the second. The main difference between the two phases is the implementation of the corresponding priority queues. \n|-\n| Compression<ref>Ga\u0144czorz, M., & Je\u017c, A. (2017, April). Improvements on Re-Pair grammar compressor. In 2017 Data Compression Conference (DCC) (pp. 181\u2013190). IEEE.</ref> || 2017 || [https://bitbucket.org/IguanaBen/repairimproved/src/master/] || This version modifies the way the next pair to be replaced is chosen. Instead of simply considering the most frequent pair, it employs a heuristic which penalizes pairs that are not coherent with a Lempel\u2013Ziv factorisation of the input string.\n|-\n| Compression<ref>Furuya, I., Takagi, T., Nakashima, Y., Inenaga, S., Bannai, H., & Kida, T. (2018). MR-RePair: Grammar Compression based on Maximal Repeats. arXiv preprint arXiv:1811.04596.</ref> || 2018 || [https://github.com/tkida/MR-Repair] || This algorithm reduces the size of the grammar generated by Re-Pair by replacing maximal repeats first. When a pair is identified as the most frequent pair, i.e. the one to be replaced in the current step of the algorithm, MR-repair extends the pair to find the longest string that occurs the same number of times as the pair to be replaced. The provided implementation encodes the grammar by simply listing the rules in text, hence this tool is purely for research purposes and cannot be used for compression as it is.\n|}\n\n== See also ==\n* [[Byte pair encoding]]\n* [[Sequitur algorithm]]\n\n== References ==\n<!-- Inline citations added to your article will automatically display here. See en.wikipedia.org/wiki/WP:REFB for instructions on how to add citations. -->\n{{reflist}}\n\n{{Compression Methods}}\n\n[[Category:Lossless compression algorithms]]\n\n{{Sourceattribution|Re-Pair}}"
                    }
                ]
            },
            "71481": {
                "pageid": 71481,
                "ns": 0,
                "title": "Re-order buffer",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "A '''re-order buffer''' ('''ROB''') is a hardware unit used in an extension to the [[Tomasulo algorithm]] to support [[Out-of-order execution|out-of-order]] and [[Speculative execution|speculative]] instruction execution.  The extension forces instructions to be committed in-order.\n\nThe buffer is a [[Circular buffer|circular buffer]] (to provide a [[FIFO (computing and electronics)|FIFO]] instruction ordering queue) implemented as an [[Array (data structure)|array/vector]] (which allows recording of results against instructions as they complete out of order).\n\nThere are three stages to the Tomasulo algorithm: \"Issue\", \"Execute\", \"Write Result\". In an extension to the algorithm, there is an additional \"Commit\" stage. During the Commit stage, instruction results are stored in a register or memory. The \"Write Result\" stage is modified to place results in the re-order buffer. Each instruction is tagged in the [[Reservation station|reservation station]] with its index in the ROB for this purpose.\n\nThe contents of the buffer are used for data dependencies of other instructions scheduled in the buffer. The head of the buffer will be committed once its result is valid. Its dependencies will have already been calculated and committed since they must be ahead of the instruction in the buffer though not necessarily adjacent to it. Data dependencies between instructions would normally stall the pipeline while an instruction waits for its dependent values. The ROB allows the pipeline to continue to process other instructions while ensuring results are committed in order to prevent data hazards such as read ahead of write (RAW), write ahead of read (WAR) and write ahead of write (WAW). \n\nThere are additional fields in every entry of the buffer to support the extended algorithm:\n* Instruction type (jump, store to memory, store to register)\n* Destination (either memory address or register number)\n* Result (value that goes to destination or indication of a (un)successful jump)\n* Validity (does the result already exist?)\n\nThe consequences of the re-order buffer include precise [[Exception handling|exceptions]] and easy [[Rollback (data management)|rollback]] control of [[Branch target predictor|target address mis-predictions]] (branch or jump). When jump prediction is not correct or a nonrecoverable exception is encountered in the instruction stream, the ROB is cleared of all instructions (by setting the circular queue tail to the head) and reservation stations are re-initialized.\n\n==References==\n{{Reflist}}\n\n==External links==\n*[https://www.youtube.com/watch?v=mFtX28lH4O8 Hardware Based Speculation - Prof. Dr. Ben H. Juurlink]\n*[https://web.archive.org/web/20040724215416/http://lgjohn.okstate.edu/6253/lectures/reorder.pdf Reorder Buffer]\n\n[[Category:Instruction processing]]\n\n\n{{Sourceattribution|Re-order buffer}}"
                    }
                ]
            }
        }
    }
}