========================
Project 2: Search Engine
========================

**CS2621 – Web Science**

*100 points*

You are to create a web search engine that works at the command line.
To do this, you will write two Python scripts, indexer.py and
search.py.

Indexer
=======

indexer.py should do the following (a rough sketch of one possible
approach appears at the end of this section):

1. After performing a crawl (using your other Python script), read all
   the HTML files that were stored in the “pages” directory. For each
   document, extract the title and the text from the body of the page
   (read the Beautiful Soup documentation to find out how). Beautiful
   Soup will include the title text in the text it extracts from the
   page, and that is OK. Beautiful Soup may also break on some pages
   and include HTML as text, but we will not worry about these
   exceptions or bugs.

2. All text should be converted to lowercase, and non-alphanumeric
   characters should be ignored. So “123-456” would become “123” and
   “456”, and “joe@yahoo.com” would become “joe”, “yahoo”, “com”.
   Ignore the following stop words: a, an, and, are, as, at, be, by,
   for, from, has, he, in, is, it, its, of, on, that, the, to, was,
   were, will, with. Do not perform stemming.

3. A single inverted index should be created for the document corpus.
   For each term it maintains the document ID (numbered 1…n in the
   order the pages appear in the “pages” directory), a 1 or 0
   indicating whether the term appears in the title, and the term
   frequency from the body (normalized by the total number of tokens
   in the document after removing stop words).

4. After indexer.py has finished indexing all the web pages, it should
   output the index to index.dat, which looks like this:

::

  arkansas
     6 0 0.022
  model
     1 0 0.309
     3 0 0.015
     5 1 0.001
  tuesday
     2 0 0.082
  white
     2 1 0.018
  etc…

.. note::
   The indexed words are alphabetized, and there are 3 spaces before
   sets of three numbers (each separated by a single space) which are:
   doc ID, title (0 or 1), and normalized body TF (rounded to 3 decimal
   places). For example, the term white was found only in document 2;
   it was somewhere in the title and made up 1.8% of all the words in
   the document.

5. It may take some time for your program to run, so you should output
   information about the program’s status as it indexes the crawled
   pages. Outputting which file is currently being processed would be
   helpful to the user who is waiting for the program to finish its
   work.

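How you structure the code is up to you; the following is only a
minimal sketch of one possible approach to the indexing steps above.
The function names, the use of a sorted directory listing to assign
document IDs, and the plain-dict index structure are illustrative
assumptions, not requirements.

.. code:: python

   # indexer sketch -- illustration only; adapt it to your own crawler's output
   import os
   import re
   from bs4 import BeautifulSoup

   STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for",
                 "from", "has", "he", "in", "is", "it", "its", "of", "on",
                 "that", "the", "to", "was", "were", "will", "with"}

   def tokenize(text):
       """Lowercase, split on non-alphanumeric characters, drop stop words."""
       tokens = re.split(r"[^a-z0-9]+", text.lower())
       return [t for t in tokens if t and t not in STOP_WORDS]

   def parse_page(path):
       """Return (title text, page text) extracted with Beautiful Soup."""
       with open(path, encoding="utf-8", errors="ignore") as f:
           soup = BeautifulSoup(f, "html.parser")
       title = soup.title.get_text() if soup.title else ""
       # get_text() on the whole document includes the title text, which is OK.
       return title, soup.get_text()

   def build_index(pages_dir="pages"):
       """index[term][doc_id] = (in_title, normalized body TF)"""
       index = {}
       for doc_id, name in enumerate(sorted(os.listdir(pages_dir)), start=1):
           print("Indexing", name)          # progress output (step 5)
           title, text = parse_page(os.path.join(pages_dir, name))
           title_tokens = set(tokenize(title))
           tokens = tokenize(text)
           for term in set(tokens):
               tf = tokens.count(term) / len(tokens)
               in_title = 1 if term in title_tokens else 0
               index.setdefault(term, {})[doc_id] = (in_title, tf)
       return index

   def write_index(index, out_file="index.dat"):
       """Terms alphabetized; postings prefixed with 3 spaces (step 4)."""
       with open(out_file, "w") as out:
           for term in sorted(index):
               out.write(term + "\n")
               for doc_id, (in_title, tf) in sorted(index[term].items()):
                   out.write("   %d %d %.3f\n" % (doc_id, in_title, tf))

   if __name__ == "__main__":
       write_index(build_index())
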
Search
======

After the index is written to index.dat, the search.py script will
allow the user to search the corpus for specific words. Here is how it
should operate (a rough sketch of one possible approach appears after
this list):

1. First, read the search phrase at the command line. Examples:

   .. code:: bash

      $ search.py bisons
      $ search.py "landmark college"

   If no command line argument is supplied, the program should tell
   the user a search term is required and terminate. Ignore any
   command-line arguments after the first.

2. Next, the program should read the index from index.dat into memory.
   Note that you may want to use data structures similar to those in
   indexer.py, so write your programs in a way that shares code rather
   than duplicating it in each script. (It’s OK to introduce new .py
   files to your project.)

3. For simplicity, all queries will be assumed to use boolean ANDs,
   and we will not implement phrase search. For example, the query
   landmark college should generate a boolean search for landmark AND
   college, so only documents containing both terms should be
   considered a match.

4. Remove any stop words from the query as was done when indexing the
   documents.

5. After determining which documents match the search terms, calculate
   the relevancy score of each matching document for each query term:
   relevancy score = 0.9 * body TF + 0.1 * title score. Then compute
   the average of the per-term scores. So if the search was for
   landmark college, you would compute the score for landmark and the
   score for college and average the two to determine the overall
   relevancy score.

6. The total number of results should be displayed first. Then display
   every document ID and score (to 3 decimal places) ordered by score,
   and number the results. Example:

   .. code:: bash

      Results: 4
      1. docID 3, score 0.830
      2. docID 1, score 0.814
      3. docID 5, score 0.350
      4. docID 8, score 0.108

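As with the indexer, the following is only a minimal sketch of one way
search.py might tie these steps together; it assumes the index.dat
format shown above and repeats the stop word list instead of importing
shared code, purely to keep the example self-contained.

.. code:: python

   # search sketch -- illustration only
   import sys

   STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for",
                 "from", "has", "he", "in", "is", "it", "its", "of", "on",
                 "that", "the", "to", "was", "were", "will", "with"}

   def load_index(path="index.dat"):
       """index[term][doc_id] = (in_title, body TF), read back from index.dat."""
       index, term = {}, None
       with open(path) as f:
           for line in f:
               if line.startswith("   "):      # posting: doc ID, title flag, TF
                   doc_id, in_title, tf = line.split()
                   index[term][doc_id] = (int(in_title), float(tf))
               else:                           # term line
                   term = line.strip()
                   index[term] = {}
       return index

   def search(query, index):
       terms = [t for t in query.lower().split() if t not in STOP_WORDS]
       if not terms:
           return []
       postings = [index.get(t, {}) for t in terms]
       # Boolean AND: keep only documents containing every query term (step 3).
       docs = set(postings[0]).intersection(*map(set, postings[1:]))
       results = []
       for doc in docs:
           # relevancy = 0.9 * body TF + 0.1 * title score, averaged over terms (step 5)
           score = sum(0.9 * p[doc][1] + 0.1 * p[doc][0] for p in postings) / len(terms)
           results.append((doc, score))
       return sorted(results, key=lambda r: r[1], reverse=True)

   if __name__ == "__main__":
       if len(sys.argv) < 2:
           print("A search term is required, e.g.: search.py \"landmark college\"")
           sys.exit(1)
       results = search(sys.argv[1], load_index())
       print("Results:", len(results))
       for rank, (doc, score) in enumerate(results, start=1):
           print("%d. docID %s, score %.3f" % (rank, doc, score))
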
**Bonus:** You can receive 5 bonus points by implementing phrase search.
So when the user searches for “landmark college”, assume they want only
documents containing that exact phrase. To accomplish this, you will
need to store the positions of each term occurrence in the inverted
index, and then use those positions to ensure the query terms appear in
successive positions.
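
A minimal sketch of the adjacency check, assuming you extend each
posting to also carry a list of the term's token positions in that
document (this positional layout is an assumption, not part of the
required index.dat format):

.. code:: python

   def phrase_match(positions_per_term):
       """positions_per_term: one list of token positions per query term,
       in query order, all for the same document. Returns True if some
       position p of the first term is followed by p+1 for the second
       term, p+2 for the third, and so on."""
       first, rest = positions_per_term[0], positions_per_term[1:]
       for p in first:
           if all(p + offset in positions
                  for offset, positions in enumerate(rest, start=1)):
               return True
       return False

   # e.g. "landmark college": landmark at [4, 17], college at [5, 40] -> True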


Zip your entire project directory and submit it to Canvas before it is
due. Make sure your output matches the specifications precisely to
avoid losing any points. If you use any code you find on the Web, you
must document the source in your program.

Test Data
=========

*a.html*

.. code:: html

   <title>cool!!! test!!!</title>
   <body>
   this 123-456.
   </body>

*b.html*

.. code:: html

   <html>
   <head>
   <title>Go Patriots!</title>
   </head>
   <body>
   And another test and test!
   </body>
   </html>

*c.html*

.. code:: html

   <body>
   This is a test.
   </body>

*Inverted index:*

.. code::

   123
      a 0 0.200
   456
      a 0 0.200
   another
      b 0 0.200
   cool
      a 1 0.200
   go
      b 1 0.200
   patriots
      b 1 0.200
   test
      a 1 0.200
      b 0 0.400
      c 0 0.500
   this
      a 0 0.200
      c 0 0.500

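As a check on the normalization: document b contains five tokens after
stop-word removal (go, patriots, another, test, test, with the title
text included), so the normalized TF of test is 2/5 = 0.400 and each of
the other terms is 1/5 = 0.200.
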
Search for "test this" results in the following:

::

   Results: 2
   1. docID c, score 0.450
   2. docID a, score 0.230

Search for "test patriots go" results in the following:

::

   Results: 1
   1. docID b, score 0.307
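
As a check, applying the formula from step 5 of the Search section to
document b:

::

   test:     0.9 * 0.400 + 0.1 * 0 = 0.360
   patriots: 0.9 * 0.200 + 0.1 * 1 = 0.280
   go:       0.9 * 0.200 + 0.1 * 1 = 0.280
   average:  (0.360 + 0.280 + 0.280) / 3 = 0.307 (to 3 decimal places)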

Search for "cool patriots" results in the following:

::

   Results: 0