Write a program to create an index of a small collection of World Wide Web pages. Each “page” is a text file in a special format called HTML (HyperText Markup Language). The HTML format includes regular text and special HTML commands, which are always enclosed in angle braces. For example, the string is an HTML command meaning that the following text should be highlighted; a user click on the highlighted text would cause a web browser to fetch and display the file layout.htm.

Your program’s job is to read an HTML file called index.htm, all the files referenced within index.htm by the HREF command, all the files referenced by those files, and so on until there are no new files to read. Your program should also read the file webpage.in, which contains a list of words, and show, for each word, a list of all the files referenced from index.htm that contain it (see the Sample Output).

Assumptions:
"word" can be found in the following pages:
     filename1
     filename2
     filename3

"word" can not be found in any page.

Where word is the word from the input file, and filename1, filename2, and so on, are the names of the files containing the word. Each file name should be indented five spaces; a single blank line should separate each listing.

Note: There are three files in the Sample Input below (index.htm, layout.htm, and webpage.in).

Sample Input
Don't forget that links can be self-referential!
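One way to stay safe with self-referential links is to remember every file name that has already been queued and never queue it twice. The sketch below is only an illustration under stated assumptions: the HREF="name" pattern is a guess at the link syntax, and crawl and read_page are hypothetical names (read_page maps a file name to its text, or None if the file cannot be read).

```python
import re
from collections import deque

# Assumption: a link is written as HREF="name" somewhere inside a tag.
# The statement does not show the exact syntax, so this pattern is a guess.
HREF_PATTERN = re.compile(r'HREF\s*=\s*"([^"]*)"', re.IGNORECASE)


def crawl(start, read_page):
    """Return {filename: text} for every page reachable from start.

    read_page is a hypothetical callable that maps a file name to its
    text, or returns None when the file cannot be read.
    """
    pages = {}
    queue = deque([start])
    seen = {start}  # names already queued, so each file is read at most once
    while queue:
        name = queue.popleft()
        text = read_page(name)
        if text is None:
            continue  # unreadable file: skip it (an assumption, not a rule)
        pages[name] = text
        for target in HREF_PATTERN.findall(text):
            if target not in seen:  # skips self links and longer cycles
                seen.add(target)
                queue.append(target)
    return pages
```

Passing the reader in as a parameter keeps the traversal independent of the file system, so it can be tried on an in-memory dictionary, for example crawl("index.htm", files.get).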
Note that there is no rule that the file needs to be legal HTML (if you know the rules), or that words really be wordseiwlaoieu;a. Watch out for mutual references!
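Because neither the pages nor the words need to be well formed, a forgiving scanner is probably the safest choice. The following sketch makes several assumptions that the statement does not confirm: a word is any whitespace-separated token outside angle braces, matching is case-sensitive, and files are listed in the order they were read. The names visible_words and report are hypothetical.

```python
import re

# Anything between "<" and the next ">" is treated as a command and ignored.
# A "<" with no closing ">" is left in the text rather than rejected.
TAG_PATTERN = re.compile(r"<[^>]*>")


def visible_words(text):
    """Return the set of whitespace-separated tokens outside any tag."""
    return set(TAG_PATTERN.sub(" ", text).split())


def report(words, pages):
    """Build the listing for each word; pages maps file name to text."""
    listings = []
    for word in words:
        hits = [name for name, text in pages.items()
                if word in visible_words(text)]
        if hits:
            lines = ['"%s" can be found in the following pages:' % word]
            lines += ["     " + name for name in hits]  # five-space indent
        else:
            lines = ['"%s" can not be found in any page.' % word]
        listings.append("\n".join(lines))
    # A single blank line separates consecutive listings.
    return "\n\n".join(listings)
```

Recomputing the token set for every word is wasteful on large inputs; caching one set per page would be an easy refinement for a bigger collection.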