subject
Computers and Technology, 03.03.2020 01:28 196336

In word segmentation, you are given as input a string of alphabetical characters ([a-z]) without whitespace, and your goal is to insert spaces into this string such that the result is the most fluent according to the language model.

a. Consider the following greedy algorithm: Begin at the front of the string. Find the ending position for the next word that minimizes the language model cost. Repeat, beginning at the end of this chosen segment.
Show that this greedy search is suboptimal. In particular, provide an example input string on which the greedy approach would fail to find the lowest-cost segmentation of the input.
In creating this example, you are free to design the n-gram cost function (both the choice of n and the cost of any n-gram sequences) but costs must be positive and lower cost should indicate better fluency. Note that the cost function doesn't need to be explicitly defined. You can just point out the relative cost of different word sequences that are relevant to the example you provide. And your example should be based on a realistic English word sequence — don't simply use abstract symbols with designated costs.

b. Implement an algorithm that, unlike greedy, finds the optimal word segmentation of an input character sequence. Your algorithm will consider costs based simply on a unigram cost function.
Before jumping into code, you should think about how to frame this problem as a state-space search problem. How would you represent a state? What are the successors of a state? What are the state transition costs? (You don't need to answer these questions in your writeup.)
Fill in the member functions of the SegmentationProblem class and the segmentWords function. The argument unigramCost is a function that takes in a single string representing a word and outputs its unigram cost. You can assume that all the inputs would be in lower case. The function segmentWords should return the segmented sentence with spaces as delimiters, i. e. ' '.join(words).
For convenience, you can actually run python submission. py to enter a console in which you can type character sequences that will be segmented by your implementation of segmentWords. To request a segmentation, type seg mystring into the prompt. For example:

>> seg
Query (seg):
this is not my beautiful house

Console commands other than seg — namely ins and both — will be used for the upcoming parts of the assignment. Other commands that might help with debugging can be found by typing help at the prompt.
Hint: You are encouraged to refer to NumberLineSearchProblem and GridSearchProblem implemented in util. py for reference. They don't contribute to testing your submitted code but only serve as a guideline for what your code should look like.
Hint: the final actions that ucs (a UniformCostSearch object) takes can be accessed through ucs. actions.

ansver
Answers: 2

Another question on Computers and Technology

question
Computers and Technology, 22.06.2019 11:30
Hassan is writing his master’s thesis, which is a thirty-page document. he received some feedback from his professor in the form of comments, but does not see where the comments are. what is the fastest way for hassan to find the feedback?
Answers: 3
question
Computers and Technology, 22.06.2019 22:50
Assume the existence of a bankaccount class. define a derived class, savingsaccount that contains two instance variables: the first a double, named interestrate, and the second an integer named interesttype. the value of the interesttype variable can be 1 for simple interest and 2 for compound interest. there is also a constructor that accepts two parameters: a double that is used to initialize the interestrate variable, and a string that you may assume will contain either "simple", or "compound", and which should be used to initialize the interesttype variable appropriately. there should also be a pair of functions getinterestrate and getinteresttype that return the values of the corresponding data members (as double and int respectively).
Answers: 2
question
Computers and Technology, 23.06.2019 06:20
What is a point-in-time measurement of system performance?
Answers: 3
question
Computers and Technology, 24.06.2019 07:40
What type of multimedia are live news feeds? live news feeds are examples of multimedia.
Answers: 2
You know the right answer?
In word segmentation, you are given as input a string of alphabetical characters ([a-z]) without whi...
Questions
question
Mathematics, 27.10.2019 17:43
question
Mathematics, 27.10.2019 17:43
question
Mathematics, 27.10.2019 17:43
Questions on the website: 13722367