Looking at the prefixes and suffixes truncated by light-10, one can easily recognize that the removed affixes are focused on nouns rather than verbs. To sum up this part, the literature concludes that light stemming approaches are better than heavy stemming techniques. Unlike root-based techniques, which employ Arabic rules to extract roots, light stemmers attempt to remove the most frequent prefixes and suffixes, with or without using an Arabic corpus [15] [16] [17] [18]. verbs should be distinguished from nouns. The first run, which was called Light 10, makes use of the light 10 stemmer and serves as a baseline to which the other two experiments are compared. Table 3 shows the average precision obtained for the three runs, while Figure 2 compares the three curves of the average precision at 11 recall points over the 75 queries for the three experiments. So when a word is to be stemmed, a probability for each possible combination of (prefix, suffix and stem template) is computed and the stem with the highest probability is chosen. In this paper, the Stanford Arabic POS Tagger has been used [33]. As shown in the figure, both of the proposed stemmers (ExtendedS and LingStem) are consistently better than light 10. During stemming, the algorithm uses those corpus statistics to choose the most appropriate stem. Several code snippets were written for this purpose and to determine which prefixes and suffixes would suppress some of the drawbacks found in the best-known light stemming approach. The first stemmer is a developed version of the best and most widely known stemmer for Arabic, which is light 10. However, it should be noted that the proposed Extended-Light stemmer, which was described above, can be implemented on its own or together with this second proposed linguistic stemmer. Both stemmers have been widely used in Arabic IR.
triconsonantal root, which is not a word in itself but contains the idea or A similar technique has also been used by Hmeidi et al. [27], who tested both Dice and Manhattan coefficients for measuring similarity between bi-grams of words in documents and queries. Thus, we believe that the process of stemming should not depend on a single approach. Standard Arabic and the modern dialects use different strategies to form the The second experiment tested the proposed Extended-Light stemmer alone, as described earlier in the paper. This technique is different from the one used by Ababneh et al. Morphological analysis that classifies words into POS tags could be employed to determine which technique is to be used. It is typically found in newspapers and includes many technical terms that did not originate in the language. Thus, one of the major aims of the proposed Extended-Light stemmer is to consider some of the verb and noun prefixes and suffixes that light-10 neglects. Modern Standard Arabic has ten commonly used forms. For example, the word (meaning: we will surely support them) can be decomposed as follows: (antefix: , prefix: , root: , suffix: and postfix: ). As a result of this phenomenon, both light 10 and the proposed Extended-Light stemmer fail to group many verbs that have a single meaning into the same cluster, while the proposed linguistic stemmer does. The experiments reported in this work concluded that stemming in such a way outperformed full-word stems. The third experiment, which was called LingStem, was conducted to show the impact of using the proposed linguistic stemmer, in which nouns are stemmed differently from verbs. However, each of the two paradigms has its pros and cons. Consequently, the latter problem affects performance solely as it reduces the possibility of matching between posted queries and indexed documents.
antefixes, prefixes, suffixes) to these roots, resulting in a large number of possible words in Arabic for each root. However, a considerable number of the reported papers concluded that stem-based methods are much more effective than root-based ones. For example, a word like (meaning: the proper noun Kamil), which matches the pattern , will be preserved in Ababneh's study as it matches the entry in the stated list. and in the examples above will be stemmed as and , in which only the definite article was removed. The Khoja and Garside [9] and Buckwalter [10] stemmers are examples of heavy stemming algorithms, which attempt to extract the root of the input Arabic words. This feature of light stemming approaches is important for Arabic nouns, which represent a large proportion of Arabic words. [22] in two points. This may be the major reason why only small text collections were used to test the approaches in both Al-Shammari and Lin [28] and Mansour et al. as a template to discuss word formation. Examples also include the conjunction between the letter (FAA) and the letter (BAA) to form as in (meaning: By the motherland). Motivated by the results reported in the literature, in which light stemming is the best-known approach to Arabic, the first approach is a new light stemmer that has been developed on top of the best light stemmer reported in the Arabic literature. Examples include Dice Coefficient, Mutual Information, etc. The prefixes and are also not included in light 10, although they are often attached to nouns. From that perspective, the proposed stemmer is made up of several subcomponents, each of which has a certain role in the process. Table 3. However, unlike the popular Semitic languages, words are often written in a cursive (non-concatenative), rather than discontinuous, longhand style [2] but with spaces to delimit words from each other.
In order to extract such a strong feature, different similarity and association measures have been used. At first, each word is matched against predefined lists of noun and verb patterns. This is totally different from the assumption behind light 10, which states that stemmed words may consist of 2 or 3 letters under certain conditions. An explanation for this fact is that the class of most Arabic words cannot be determined from the preceding words alone. In particular, Arabic titles with their descriptions were used as queries. Due to orthographic variations of some characters in Arabic, the process of letter normalization often renders different forms of a letter with a single Unicode representation. For instance, the suffixes in the Extended-Light stemmer include the letter (pronounced as TAA), which is used in Arabic for the first person singular masculine, second person masculine and third person feminine, as in (meanings can be: I played, you played and she played). Heavy stemming always tries to extract the stem or the root from the input word. The new stemmers are compared to the best reported light stemmer, which is light-10. This can be achieved by first matching the word against the prefixes listed in the defined set. It may also, meaning Arabic suffix, include postfixes which are used to indicate pronouns (i.e. dialects generally preserve the bulk of these forms, but may lack some of them If the word is classified as a verb, then a root-based stemmer, particularly Khoja, will be employed. The technique is based on some Arabic morphological rules and syntactic knowledge. Diacritics and vowels are usually omitted in Arabic script. The main rhyme in Arabic, as previously illustrated, is the pattern (f--l), in which the pattern preserves f, and l in the same order. The varieties of Arabic differ in gender is differentiated, yielding paradigms of 13 forms.
Mustafa and Al-Radaideh [25] reported that using di-grams is better than using tri-grams, but the richness of the Arabic language makes such an approach a poor option for indexing. their use of particles. As a number of words can be formed from a single root or pattern, the opposite process, which is known as stemming, is the task of rendering all the conflated forms of a word into a single form known as the stem. During the same phase, the words preceding the word under processing are also employed to distinguish verbs from nouns, under the hypothesis that some words, especially those that are imperfect verbs (like , ) and stop words, precede nouns, e.g. It was also noticed that most strippable prefixes and suffixes in light 10 are mainly focused on nouns. First, in that study only a single list of patterns was used for all words. Buckwalter depends on stem tables that include prefixes, possible stems and suffixes. It should be noted that the prefix can also be used with nouns. The answer to this question lies in the use of the proposed heuristic rules, which remove only certain prefixes and suffixes under certain conditions. Khoja depends on stored predefined patterns and list-driven roots. MSA is also used in official speech and communication, and it is the formal language of the media and education across the Arab world. During stemming, some rules are applied based on these dictionaries to extract roots. to create a wider range of tenses. Their techniques for removing prefixes and suffixes and their stemmers are based on very restricted Arabic rules. Classical Arabic was the language of old Arabic-speaking people, e.g. However, although a similar approach has been implemented in a small number of studies, our work is different. Following this, a new linguistic stemmer has also been proposed. The work presented above shows the importance of stemming for highly morphological languages such as Arabic.
The work is different from the best-known approach to stemming in two points. Traditionally, Arabic grammarians have used the root f--l 'do' In their CLIR (Cross-Language Information Retrieval) experiments, Darwish and Oard [15] implemented a brute-force removal of the most common suffixes and prefixes, but with no particular rules for when these affixes are to be removed. The authors claimed that 95% accuracy was achieved, but only a few words were chosen to test the algorithm.

Let us consider the following nouns: , , , and (meanings respectively: the republic of the Sudan, the festival, spatial, the US leader Barack Obama). On the other hand, stemming may erroneously group words with different meanings and concepts into a single stem. For instance, consider the words (meaning: cell-phone), (meaning: path), and (meaning: pleasure). [29] . In fact, light stemming approaches are the most dominant among the existing approaches for stemming Arabic. All experiments were conducted using the Lucene IR System that uses the Okapi IBM BM25 weighting. For instance, a word like (meaning: Table 1. template taCaaCaC, and verbs of this form often have the meaning of a reciprocal It is the native language of more than four hundred million people [1] centered in the Arabic region, which includes North Africa and Middle East countries.

[22] proposes to match each word with a set of predefined Arabic patterns. On the other hand, light stemming attempts to stem the input word lightly by stripping off affixes (i.e. The researchers used a TREC (Text Retrieval Conference) corpus to decompose each word into its possible stems. Accordingly, if a word is tagged as a verb, then it will be stemmed using Khoja, which is a root-based stemmer. Shalabi et al. [14], in their root-based stemmer, proposed to extract roots and patterns based on excessive letter positions. Although the argument here indicates that this is a good feature of the light 10 stemmer, it is also the major reason for the under-stemming problem, in which words with the same meaning may be clustered into different groups, as described earlier. known as a "form" or "measure". Step 2: in the second step, the algorithm truncates the prefixes. Nevertheless, Xu et al. [26] reported contradictory results, in which tri-grams were found to be better than bi-grams. For example, tags like NN (produced by the tagger for nouns), DTNN (for a definite article attached to a noun) and PTNNS (for plural nouns that are attached to a definite article) are all collapsed into a single tag, noun, while the different categories of verbs like VB (for the surface form of verbs) and VBG (for present verbs) are classified into a single tag called verb. Three official runs were conducted. The next subsection describes the second technique that has also been proposed in this paper for stemming Arabic words. Parallel corpora, which contain several monolingual sub-collections in different languages, have also been explored [18]. On the other hand, for a word like (meaning: for one hour), the letter will be eliminated, as the number of remaining letters, after removing the letter , is greater than 3.
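The tag collapsing and stemmer routing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tag sets contain only the tags named in the text, the function names `collapse_tag` and `choose_stemmer` are hypothetical, and the handling of any other tag (returned here as "other" and left unresolved) is an assumption, since the text does not say how such words are treated.

```python
# Tag sets built only from the tags named in the text; a real tagger emits more.
NOUN_TAGS = {"NN", "DTNN", "PTNNS"}
VERB_TAGS = {"VB", "VBG"}


def collapse_tag(tag: str) -> str:
    """Collapse a fine-grained POS tag into 'noun', 'verb' or 'other'."""
    if tag in NOUN_TAGS:
        return "noun"
    if tag in VERB_TAGS:
        return "verb"
    # Assumption: tags not mentioned in the text are left unclassified.
    return "other"


def choose_stemmer(tag: str) -> str:
    """Return a label for the stemmer that would handle a word with this tag."""
    coarse = collapse_tag(tag)
    if coarse == "verb":
        return "root-based"      # e.g. Khoja, as stated in the text
    if coarse == "noun":
        return "extended-light"  # the proposed Extended-Light stemmer
    return "unresolved"          # assumption: decided by some other rule


if __name__ == "__main__":
    for t in ("DTNN", "VBG", "JJ"):
        print(t, collapse_tag(t), choose_stemmer(t))
```

Returning labels rather than calling a stemmer keeps the sketch independent of any particular Khoja or light-stemmer implementation.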
At first, we carried out a detailed analysis to identify the types of problems that may occur when using the light 10 stemmer, besides those already discussed. In the experiments, the Arabic topics were used. Table 2 shows some examples of Arabic words that were stemmed with both the proposed Extended-Light and light 10 stemmers. Using only a single list is an invalid assumption because many verbs and nouns may share the same pattern. The major difficulty here is that, if a prefix or a suffix is found, the decision to remove it should be taken only after applying some rules, in order to avoid removing an affix that is part of the word under stemming. This happens when a word does not match any of the rhymed patterns or has entries in both the noun and verb pattern lists. Developing Two Different Novel Techniques for Arabic Text Stemming. Arabicization is the process of writing words from other languages in Arabic letters, e.g. The major pattern from which the majority of Arabic words are derived is the pattern (transliterated as f--l), which corresponds to tri-literal roots. This relatively poor performance was caused by the fact that light 10 clusters words with the same meaning (those that are semantically related to each other) into different conflation classes, although Arabic derives many words from a single stem or a single verb. In the proposed stemmer, prefixes are definite articles, conjunctions (prepositions, clitics and clitics attached to prepositions) or some letters that are often added to verbs, such as , according to additive verb rules in Arabic. The imperfective conjugation of Standard Arabic has a system As in many other Semitic languages, Arabic verb formation is based on a (usually) An explanation of this phenomenon in light 10 is that the stemmer avoids removing letters like since many proper nouns begin with this letter. , (the Arabic letter ).
This is done after removing the longest prefixes and suffixes that match. To achieve this goal, the proposed linguistic stemmer is a combined approach that considers the analysis level of the words that are to be stemmed with the proposed Extended-Light stemmer. When a part of the word under stemming matches a suffix, the algorithm removes that suffix. Due to this classification, the term Arabic refers to both MSA and Dialectal Arabic [3] [5]. This could reduce the need for a POS tagger. On one hand, the use of the proposed Extended-Light stemmer would reduce the impact of the under-stemming problem, because the stemmer has been extended to include more prefixes and suffixes that were not covered by light 10. In the same study, the authors also developed a light stemmer which strips off the most common prefixes and suffixes, with removal based on some heuristic rules. Artificial intelligence techniques such as Genetic Algorithms (GA) [30] and Back-Propagation Neural Network (BPNN) with multi-class classification [31] have also been investigated. Lucene is an experimental information retrieval system that has been extensively used in previous editions of the CLEF, NTCIR and TREC joint evaluation experiments. Each of these templates is associated with a As a result of this analysis, it was concluded that the set of affixes stated in the light 10 stemmer is not sufficient for the best stemming performance. Such affixes make the stemmer able to group a variety of verbs and/or words into the same conflation class, unlike light 10, which always suffers from the under-stemming problem. At the end of this step, words are clustered into two different classes: verbs and nouns. It contains 383,872 documents compiled from Agence France Presse (AFP) Arabic Newswire during the period from 1994 to 2000. They also attempt to manage arabicized words. Results showed that the light stemmer is better than the clustering-based stemmer.
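The longest-match affix removal mentioned above, combined with a guard on the number of remaining letters, can be sketched as follows. This is a hedged illustration only: the default affix lists are a small illustrative sample and are not the Extended-Light or light 10 lists, the function name `light_strip` is hypothetical, and applying a single "more than 3 remaining letters" threshold to every affix is an assumption generalized from the one-letter example given in the text.

```python
# Illustrative affix samples only; NOT the affix sets of light 10 or Extended-Light.
SAMPLE_PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")
SAMPLE_SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ة", "ه", "ي")


def light_strip(word, prefixes=SAMPLE_PREFIXES, suffixes=SAMPLE_SUFFIXES,
                min_remaining=3):
    """Strip at most one prefix and one suffix, trying the longest match first.

    An affix is removed only if more than `min_remaining` letters would be
    left (assumed threshold, taken from the example rule in the text).
    """
    # Try longer prefixes first so that the longest matching prefix wins.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) > min_remaining:
            word = word[len(prefix):]
            break
    # Same longest-first policy for suffixes, applied to the remaining word.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) > min_remaining:
            word = word[:len(word) - len(suffix)]
            break
    return word


if __name__ == "__main__":
    # Latin placeholders make the guard easy to follow.
    print(light_strip("abcdef", prefixes=("ab",), suffixes=("ef",)))
    print(light_strip("السودانيون"))
```

In the first call the prefix is removed because four letters remain, while the suffix is kept because removing it would leave only two; the guard is what prevents short words from being over-stemmed.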
On the other hand, since only nouns (not every word, as in light 10) will be stemmed by the proposed Extended-Light stemmer, the effect of the under-stemming difficulty will also be reduced, as the problem originates from stemming verbs into different clusters. For instance, from the three-consonant triliteral root (meaning: to farm), several words can be formed, such as: (meaning: farmed), (meaning: farmer), (for singular feminine in nominative, accusative and genitive cases), (for dual masculine in nominative case), (meaning: farm), etc. On the other hand, linguistic rules often take the input word and attempt to remove its prefixes and suffixes after matching them against a pre-stored list of affixes. The most widely used in the set is light 10, and each stemmer differs from the others in the total number of prefixes and suffixes that are to be removed. The three curves of the average precision at 11 recall points of the 75 queries. (2019) Developing Two Different Novel Techniques for Arabic Text Stemming. This would help in deciding which stemming approach is to be used. However, the two major approaches are heavy stemming (also known as root-based stemming) and light stemming. This experimental run was called ExtendedS. The problem, however, may be even worse when such a word is erroneously grouped with the verb (meaning: to stuff). Texts in documents were tokenized on white space and punctuation marks. Second, when further analysis is done on light 10, it is noticed that Arabic verbs are partially ignored in the listed prefixes and suffixes. These two key sets of patterns are different, as the patterns used for nouns in Arabic are not similar to those used for verbs. The modern For instance, a word like (meaning: Sudanese) has been formed by adding the definite prefix () and the plural masculine suffix (), resulting in . The next section describes the proposed stemmers in more detail.



