Suggested Improvements


Bugs Noted

  • No known bugs -- please report any bugs encountered.

Suggestions for small improvements

  • Smart Delete Button
    1. Purpose: delete an item from a page while leaving a trail to it for people who have cited it
    2. Before removing a highlighted item, create a new page for it
    3. Use the publication year, not the current year, in the name of the new page
  • Collect New Information with "Submit Content"
    1. Specify type of material on submission form
       Non-exclusive check buttons for (a) Videos = Lecture, Demo, Interview, Panel, TVshow, MovieClip, Other and (b) Text = Book, Article, NewsStory, Transcript, Other. Other categories to help with sorting? Add to tags automatically.
    2. Add a button for "Cancel" (not sure why we need Reset)
  • Option to Sort Items by Date, Title, etc.
Display items retrieved from search in chronological order / alphabetic order / etc.
  • Focus Results of a Search
In items returned from a search, highlight the search terms. Place the most specific pages first in the list of hits. May require creating a new page for every item or at least every featured item.
  • Page Editor: Reclassify a Page
    Maintainers of a page containing text + video removed the video from their site. The text is worth keeping but the page should be reclassified as "article" and removed from the video index.
  • Add a short checklist of tags to the Submission Page for Videos
    Useful to tag videos as Demo, Lecture, Interview, Panel, <maybe more>

Ideas for improving the Wiki

  • Code to browse selected online journals to find well-written overviews of individual topics. E.g., AI Magazine, JAIR. Fill out the "Submit Content" info and submit.
  • Once we know the form in which archivists want to receive metadata, develop code so that:
    1. Users input the metadata using something like (:metadata x, y, z :), and;
    2. An administrator can create an XML metadata document of the form desired by archivists.
    The tags implementation would be a useful starting point.
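
The two steps above could be prototyped along these lines. This is only a sketch: it assumes the markup is literally (:metadata x, y, z :) with comma-separated values, and since the archivists' format is not yet known, the XML element names (metadata, field) and the function names are placeholders.

```python
import re
import xml.etree.ElementTree as ET

# Assumed markup pattern: (:metadata value1, value2, ... :)
METADATA_RE = re.compile(r"\(:metadata\s+(.*?)\s*:\)", re.DOTALL)

def extract_metadata(page_text):
    """Return the comma-separated values from every (:metadata ... :) directive."""
    values = []
    for match in METADATA_RE.finditer(page_text):
        values.extend(v.strip() for v in match.group(1).split(",") if v.strip())
    return values

def metadata_to_xml(page_name, values):
    """Build a placeholder XML document; the element names are illustrative only."""
    root = ET.Element("metadata", {"page": page_name})
    for value in values:
        ET.SubElement(root, "field").text = value
    return ET.tostring(root, encoding="unicode")
```

Once the archivists specify a format, only metadata_to_xml would need to change; the extraction step stays the same.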
  • Collect ideas for successful homework assignments and term projects
Of special interest would be those that use AITopics
  • Student Project: Land Search on Specific Phrase on a Page
Searching for a phrase now lands a person at the top of the page containing that phrase. Much more helpful to return the right line(s) with the phrase highlighted. Allow refinement of search with an option to "Search within these results".
Use browser's on-page search capability -- although that means searching in two steps, not one.
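
A student project might start from something like the following sketch, which returns only the lines containing the phrase, with each match wrapped in a marker. The function name and the ** marker are assumptions for illustration; a real version would emit whatever highlight markup the wiki uses.

```python
import re

def find_phrase_lines(page_text, phrase, marker="**"):
    """Return (line_number, highlighted_line) for each line containing the phrase."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    hits = []
    for number, line in enumerate(page_text.splitlines(), start=1):
        if pattern.search(line):
            # Wrap every case-insensitive match in the marker, keeping its original case.
            highlighted = pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", line)
            hits.append((number, highlighted))
    return hits
```

"Search within these results" could then be approximated by running the same function again over the lines already returned.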
  • Student Project: Automatic Page Creation for Editors
Write PmWiki code that will send new submissions and edits to the subject-area editors for their review. When signed off by editor, a page will be added to the wiki.
All are sent now to Buchanan & Smith -- ok for now, but distributing the vetting will be important.
  • Student Project: Page for Submitting Comments on Any Video
Keep track of comments & reviews by persons who view a video and want to comment. See amazon.com.
  • Student Project: Detect Duplicate Submissions
Detect that a new submission duplicates an existing entry based on heuristics that do not depend solely on exact match of URL or title.
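
One possible set of heuristics, sketched under assumptions: submissions are represented as dicts with "url" and "title" keys, URLs are compared after dropping scheme, "www.", query string, and trailing slash, and titles are compared with a fuzzy ratio. The 0.9 threshold is an arbitrary starting point, not a tuned value.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit

def normalize_url(url):
    """Reduce a URL to host + path, ignoring scheme, 'www.', query, and trailing slash."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    return host + parts.path.rstrip("/")

def normalize_title(title):
    """Lowercase a title and keep only letters, digits, and single spaces."""
    cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
    return " ".join(cleaned.split())

def is_probable_duplicate(new, existing, title_threshold=0.9):
    """Heuristic: same normalized URL, or near-identical normalized titles."""
    if normalize_url(new["url"]) == normalize_url(existing["url"]):
        return True
    ratio = SequenceMatcher(None, normalize_title(new["title"]),
                            normalize_title(existing["title"])).ratio()
    return ratio >= title_threshold
```

Lowercasing the whole URL can merge paths that differ only in case, so a flagged pair is best shown to an editor for confirmation rather than rejected automatically.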
  • Student Project: Easy Clipping for Specified Video Clips
Once a short segment is identified as interesting, copy it and move it to our own server
Create new page for each clip, linked to page for video
  • Editors
Move many of the important readings to their own pages, with tags and all the info we request for videos. This way, a search result will contain pointers to exactly the items requested as well as the general pages of secondary relevance. (See 2009-0014, and listing under Robots > General Readings.)
  • Student Project: Incorporate WikiScanner Software to Identify Source of Changes
"WikiScanner, a new Web site that traces the source of millions of changes to Wikipedia, the online encyclopedia. The site, wikiscanner.virgil.gr, created by a computer science graduate student, cross-references an edited entry on Wikipedia with the owner of the computer network where the change originated, using the Internet protocol address of the editor’s network. The address information was already available on Wikipedia, but the new site makes it much easier to connect those numbers with the names of network owners." From http://www.nytimes.com/2007/08/19/technology/19wikipedia.html?pagewanted=2&_r=1
  • Student Project: Create Incentives to Submit & Edit
Provide point scores, iPods, or other incentives for submitting videos & editing pages. Make a quarterly announcement of top-submitter of videos added to the site. Display a list of top scores; give people their own tally when they edit. See ACS Video Contest for ideas about a contest to collect 3 min. videos.
  • Big Project: Fingerprinting and Text Mining
Consider the kind of text mining that is done by Collexis software for the NIH. (Samy Uthurusamy).
Collexis® refers to a family of intelligent, text-searching tools for examining vast quantities of data to identify patterns and establish relationships. As bio-medical data grows to petabytes (millions of gigabytes), managing this data becomes increasingly important. Intelligent text mining holds promise for promoting health research and accelerating discoveries by automating the integration of multiple data bases to find linkages and make hypotheses.
In January 2004, after evaluating available systems, NIH procured a site license for Collexis software. This software is based on the principle of “fingerprinting” each piece of text that contains relevant information, such as an article in a scientific journal. The fingerprinting process makes use of the professional terminology of a particular field. For example, the system can fingerprint an article based on the National Library of Medicine Medical Subject Headings (MeSH®) Thesaurus. Collexis then can condense the fingerprints of all of the researcher’s publications into a knowledge profile of that individual.
Once Collexis has completed the fingerprinting/profiling of all sources of input, the system can make associations based on criteria established by the user. Consider this application. A busy Helpdesk receives several hundred e-mails daily that require responses from an expert. KM helps the Helpdesk by building knowledge profiles of all its employees. From then on, routing an incoming email is a matter of matching its fingerprint with the catalog of employee knowledge profiles.

Ideas for extending the scope of the project

  • Big Project: Videotape Oral Histories
Interview important persons in the history of AI and create a library of videos containing them. (Not on critical path for video archive, but useful in the future.)
  • Big Project: Create Film on Women in AI
Interview women with successful careers in AI who are good examples of smart women succeeding in CS. (Not on critical path for video archive, but useful to give girls and young women some positive illustrations to think about.)
  • Ongoing Project: Add Photos to the Video Archive
Collect, date, and identify photographs and slides that show people and projects related to AI. (Not on critical path for video archive, but useful in the future. Some links to collections are now up on Resources > Reference Shelf).
  • Student Project: Thesaurus
Create a thesaurus of synonyms, variant spellings, and related terms in AI. Then adjust the search engine to include the thesaurus to find tags in the wiki from the search terms specified by the user.
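
A rough sketch of the query-expansion step, assuming the thesaurus is a simple mapping from a preferred term to its synonyms and variants. The entries below are invented examples, and the function names are hypothetical.

```python
# Hypothetical thesaurus entries: each key maps to terms treated as equivalent.
THESAURUS = {
    "machine learning": {"ml", "statistical learning"},
    "natural language processing": {"nlp", "computational linguistics"},
    "robot": {"robots", "robotics"},
}

def build_lookup(thesaurus):
    """Map every term (preferred or variant) to its full set of related terms."""
    lookup = {}
    for preferred, variants in thesaurus.items():
        group = {preferred} | set(variants)
        for term in group:
            lookup.setdefault(term, set()).update(group)
    return lookup

def expand_query(query, lookup):
    """Return the query term plus all thesaurus terms related to it, sorted."""
    term = query.strip().lower()
    return sorted(lookup.get(term, set()) | {term})
```

The search engine would then look for tags matching any of the expanded terms instead of only the term the user typed.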
  • Student Project: Create Template for Teaching Aids (e.g., study guides & classroom lesson plans)
Develop a template and a wiki submission form for saving ideas about using materials in the classroom. See AITopics - Resources for Teachers for ideas. Better yet, crawl the web with a program similar to NewsFinder to locate course syllabi, tutorial notes, instructional materials.

Ideas for a future intern

  • Database Backend
All news article information is stored in the database tables urllist (for crawled articles) and cat_corpus (for articles that have been upgraded to "training" status in the AINewsAdmin interface and articles that were crawled from the AINews archive produced by Jon Glick). Category (topic) membership is stored in categories (for crawled articles) and cat_corpus_cats (for corpus articles).
However, information about how NewsFinder processed each story is not stored in the database. Information that is not stored includes the occurrences of whitelisted terms, an article's category distribution (scores returned by the support vector machines), a list of its duplicates, an article's scores (from the star ratings), and reasons the article was ultimately published or not published. All this information is present on an article's "info" page, however (e.g. http://aaai.org/AITopics/AIArticles/2011-2844).
If all that information were stored in the database, then "deep analytics" could be applied to the article collection.
  • Deep analytics. Here are some ideas about what we might want to learn about the article collection:
of the published articles, which whitelisted terms were most common?
of the published articles, which sources were most common?
of the published articles, which categories were most common?
how closely are sources aligned with categories?
how closely are whitelisted terms aligned with categories?
how has whitelisted term usage changed over time?
which sources contribute the most unpublished articles?
how can we visualize article duplicates? (can we show clusters, marked by topic, on a timeline?)
how can we visualize whitelisted term covariance? (do some terms typically appear in the same articles?)
how have published article scores changed over time? (are we publishing more or less interesting articles over time?)
are there more or fewer article scores over time?
how are article scores distributed by category, by whitelisted terms, and by sources?
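
As a sketch of how the first question might be answered once processing details are stored, the following uses SQLite with invented tables. Every table and column name here is an assumption (including the urlid key, which is presumed to refer to rows in urllist); the real schema would need to match the existing database.

```python
import sqlite3

# Hypothetical tables for the processing details that are not stored today.
SCHEMA = """
CREATE TABLE article_terms (urlid INTEGER, term TEXT, occurrences INTEGER);
CREATE TABLE article_category_scores (urlid INTEGER, category TEXT, svm_score REAL);
CREATE TABLE article_duplicates (urlid INTEGER, duplicate_urlid INTEGER);
CREATE TABLE article_decisions (urlid INTEGER, published INTEGER, reason TEXT);
"""

def most_common_terms(conn, limit=10):
    """Whitelisted terms ranked by how many published articles contain them."""
    return conn.execute(
        """SELECT t.term, COUNT(DISTINCT t.urlid) AS n
           FROM article_terms t JOIN article_decisions d ON d.urlid = t.urlid
           WHERE d.published = 1
           GROUP BY t.term ORDER BY n DESC, t.term LIMIT ?""",
        (limit,),
    ).fetchall()
```

Most of the other questions in the list reduce to similar GROUP BY queries over the same tables, with a date column added for the over-time questions.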
  • More "social". Obviously, the internet-connected world is head-over-heels for "social" websites. Here are some ideas for integrating AINews with various "social" technologies:
add a Facebook "Like" button to each published article [Done]
add "Tweet this" [Done] and "Google +1" buttons to published articles
push one or a handful of published articles to the AAAI Facebook page or other locations
show "relevant" news articles throughout the week, before the publication deadline, and elicit user feedback (like scores) -- perhaps this can influence which articles are ultimately published
find ways to optimize search engine presence and the various metrics found on Google Analytics and Feedburner Analytics
  • Better categorization, duplicate detection. There is a vast literature for automatic document classification (categorization) and duplicate detection. We are using standard, effective techniques (one support vector machine (SVM) per category, and a cosine-similarity duplicate detection technique, both based on an article's TF-IDF vector representation). However, there are more advanced techniques that may work better for our purposes:
look into feature selection techniques in order to minimize the number of terms given to the SVMs (to prevent over-fitting and minimize training time)
try including more metadata like an article's title and source into the training process to improve categorization
use a more "semantically-aware" document representation, especially for duplicate detection, since TF-IDF vectors don't seem to capture much "meaning"
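
For reference, the TF-IDF and cosine-similarity approach to duplicate detection can be sketched in a few lines. This is a generic illustration, not the NewsFinder code: tokenization is a plain whitespace split, the weighting is raw count times log inverse document frequency, and the 0.5 threshold is arbitrary.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using raw term counts times log IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()})
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def find_duplicates(documents, threshold=0.5):
    """Return index pairs whose TF-IDF cosine similarity meets the threshold."""
    vectors = tfidf_vectors(documents)
    return [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))
            if cosine_similarity(vectors[i], vectors[j]) >= threshold]
```

Because one rare word can dominate the vector of a short text, two near-identical headlines may score well below 1.0, which is one symptom of the weak "meaning" capture noted above.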
  • Another domain. The NewsFinder code should be more-or-less easily altered to process articles from a different domain, such as biotech news.
  • Bigger data. An interesting project may be to dramatically increase the number of sources crawled. Using some "cloud" computing service like Amazon's Elastic Compute Cloud, and a MapReduce framework (like Hadoop) to split the processing across several machines, one could crawl thousands or hundreds-of-thousands of websites, looking for news about AI (or whatever domain).

Last edited by Joshua Eckroth, September 09, 2011


Bugs Fixed & Suggestions Implemented

  • Problems with tags implementation [Fixed] Formerly, tags with interior caps or characters like ' caused a preg_match() "unknown modifier" error. The problem is that words with these characteristics (e.g., CaseBasedReasoning, AAAIFellowsSymposium, D'Andrea) are captured as WikiWords by PmWiki, resulting in snippets like <span class='wikiword'>AAAIFellowsSymposium</span> instead of simply AAAIFellowsSymposium. The problem has been "solved" by removing these tags and updating the video submission form to transform words like "CaseBasedReasoning" to "casebasedreasoning" and "D'Andrea" to "dandrea." However, Editors must still use caution and not manually edit tags to include the offending characteristics.
  • Submission Date. Automatically add a field to each video for the date submitted; display with name of contributor. [Done] June 10, 2008
  • Create a prototype web crawler that will find news stories mentioning AI, ordered by presumed relevance [Done] August 18, 2008
  • Relabel names of sections in TOC for every page [Done] Sept. 1, 2008
  • Add a section for photos of AI researchers [Done] Nov. 27, 2009 (last section in Resources > Reference Shelf)
  • Show Date of Last Edit ... as a way of assuring readers that the site is not rusting, show the date of last edit at the bottom of each page. [Done] Jan. 2, 2010.
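
The tag transformation described in the first item above ("CaseBasedReasoning" to "casebasedreasoning", "D'Andrea" to "dandrea") amounts to lowercasing and dropping non-alphanumeric characters. The sketch below is an illustration of that rule in Python, not the code used by the submission form, and the function name is hypothetical.

```python
def normalize_tag(tag):
    """Lowercase a tag and drop every character that is not a letter or digit."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())
```

Running manually edited tags through the same function before saving would remove the need for editors to avoid the offending characters by hand.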

  • Student Project: Track Changes By Persons Making Them
Create a summary of changes made by collecting PmWiki edits from the Contributions links of everyone from the user profiles and adding their new submissions. Allow editors to select changes to pages within a specified topic (or set of topics) and within a specified time frame. [Done] Jan. 2, 2010 RGS: This is available via the Contributions tag. To see an example, visit the Profile page for Bruce Buchanan and click on Contributions. Edit the page to see the markup.
  • Big Project: Improve The Web Crawler For Finding News Items
The prototype works modestly well, but it probably needs a complete rethinking. Also needs code to assess the degree of interest of items and then code to add items to AI in the News. [Done] August, 2010 (Liang Dong)
  • Student Project: Collect & Maintain Rating System
Collect ratings from users based on content and/or form. Show ratings & reviews to other users. See Netflix.com. [Done] August, 2010 (Liang Dong)
  • Show Dates [Done] August, 2010 (Liang Dong)
Show the date of the article, not the date of submission, before each headline for News items.
Show the date of each article pointed to. (Will follow from asking editors to put articles on their own pages.)
  • Problem with PmWiki and Submit Content. Occasionally PmWiki fails to parse some part of the URL provided on the submission form and returns an error message. All of the information entered on the submission page is then lost, with no explanation of how to fix the problem. [Done] December, 2010
  • Collect course syllabi and course notes
Add to resources for educators <<some already here>> [Done] December, 2010 (need to keep adding to orig. set of 20)
  • Collect classic papers and Extend classic book collection
    1. Search for online versions of papers recognized as classic by AI Jnl, AAAI award, Science Citation Index
    2. Solicit classic books from authors and scan them
[Done] March 11, 2010. (Colleen McCarthy). More papers needed. May need to scan some (with permissions).
  • Correct Formatting Errors on Submission Page
A common syntactic error that causes problems for PmWiki is to include <cr> (carriage return) within multi-line descriptions on the submit page. If each <cr> could easily be replaced by <sp> (space) before the text is sent on to PmWiki, considerable editing time would be saved.
Add aioverview to list of topics in pull-down menu
Create new page for 2009 videos
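
The <cr>-to-<sp> replacement suggested in this item is a small text transformation. A minimal sketch, assuming the description arrives as a single string and that runs of line breaks should collapse to one space (the function name is hypothetical, and the actual fix would live in the PmWiki form handler):

```python
import re

def flatten_description(text):
    """Replace each run of carriage returns / line feeds with one space."""
    return re.sub(r"\s*[\r\n]+\s*", " ", text).strip()
```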
  • Reorganize Form for "Submit Content" [Done] September, 2011 (Josh Eckroth). Form for news items is greatly simplified; program infers nearly everything it needs.
    1. Put check box for "News Item" on the initial page
    2. Reorder information boxes as: Name, E-mail, Article Title, URL, Most Relevant Topic, Description, Author, Date Written, Source of Article, Tags for Searching, Comments
  • Establish correspondences between Facebook and AITopics with both AAAI and AITopics FB pages. E.g., Add Facebook "Like" buttons to AITopics pages. [Done] September, 2011 (Reid Smith)
  • Add Twitter feeds to the RSS feeds options. [Done] September, 2011 (Reid Smith)

Simple Form Examples ... testing


Select doesn't work in the current PmWiki version, only in beta.

  • There is other more complicated/comprehensive code for forms. For now, perhaps e-mail will suffice.
  • There is also code for better management of user accounts. For now, we use e-mail.
Page last modified on December 17, 2011, at 08:15 AM