Suggested Improvements
Bugs Noted
Suggestions for small improvements
Non-exclusive check buttons for (a) Videos = Lecture, Demo, Interview, Panel, TVshow, MovieClip, Other and (b) Text = Book, Article, NewsStory, Transcript, Other. Other categories to help with sorting? Add to tags automatically.
Display items retrieved from search in chronological order / alphabetic order / etc.
In items returned from a search, highlight the search terms. Place the most specific pages first in the list of hits. May require creating a new page for every item or at least every featured item.
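The highlighting suggestion above, together with the related idea of returning only the matching lines of a page, could be prototyped roughly as follows. This is a minimal sketch in Python rather than PmWiki's own PHP. The function names and the `**` markers are placeholders chosen for illustration; they are not part of the existing site code.

```python
import re


def highlight_terms(text, terms, open_mark="**", close_mark="**"):
    """Wrap each case-insensitive occurrence of a search term in markers."""
    # Try longer terms first so "neural network" wins over "neural".
    cleaned = sorted({t for t in terms if t}, key=len, reverse=True)
    if not cleaned:
        return text
    pattern = re.compile("|".join(re.escape(t) for t in cleaned), re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_mark}{m.group(0)}{close_mark}", text)


def matching_lines(page_text, terms):
    """Return (line_number, highlighted_line) for every line containing a term."""
    hits = []
    for number, line in enumerate(page_text.splitlines(), start=1):
        marked = highlight_terms(line, terms)
        if marked != line:
            hits.append((number, marked))
    return hits


page = "Intro to AI\nRobots use planning.\nrobot arms"
print(matching_lines(page, ["robot"]))
# [(2, '**Robot**s use planning.'), (3, '**robot** arms')]
```

Returning line numbers along with the marked text would let a results page link to the right place on a long page instead of its top.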
Ideas for improving the Wiki
Of special interest would be those that use AITopics
Searching for a phrase now lands a person at the top of the page containing that phrase. Much more helpful to return the right line(s) with the phrase highlighted. Allow refinement of search with an option to "Search within these results".
Use browser's on-page search capability -- although that means searching in two steps, not one.
Write PmWiki code that will send new submissions and edits to the subject-area editors for their review. When signed off by editor, a page will be added to the wiki.
All are sent now to Buchanan & Smith -- ok for now, but distributing the vetting will be important.
Keep track of comments & reviews by persons who view a video and want to comment. See amazon.com.
Detect that a new submission duplicates an existing entry based on heuristics that do not depend solely on exact match of URL or title.
Once a short segment is identified as interesting, copy it and move it to our own server
Create new page for each clip, linked to page for video
Move many of the important readings to their own pages, with tags and all the info we request for videos. This way, a search result will contain pointers to exactly the items requested as well as the general pages of secondary relevance. (See 2009-0014, and listing under Robots > General Readings.)
"WikiScanner, a new Web site that traces the source of millions of changes to Wikipedia, the online encyclopedia. The site, wikiscanner.virgil.gr, created by a computer science graduate student, cross-references an edited entry on Wikipedia with the owner of the computer network where the change originated, using the Internet protocol address of the editor’s network. The address information was already available on Wikipedia, but the new site makes it much easier to connect those numbers with the names of network owners." From http://www.nytimes.com/2007/08/19/technology/19wikipedia.html?pagewanted=2&_r=1
Provide point scores, iPods, or other incentives for submitting videos & editing pages. Make a quarterly announcement of top-submitter of videos added to the site. Display a list of top scores; give people their own tally when they edit. See ACS Video Contest for ideas about a contest to collect 3 min. videos.
Consider the kind of text mining that is done by Collexis software for the NIH. (Samy Uthurusamy).
Collexis® refers to a family of intelligent, text-searching tools for examining vast quantities of data to identify patterns and establish relationships. As bio-medical data grows to petabytes (millions of gigabytes), managing this data becomes increasingly important. Intelligent text mining holds promise for promoting health research and accelerating discoveries by automating the integration of multiple data bases to find linkages and make hypotheses.
In January 2004, after evaluating available systems, NIH procured a site license for Collexis software. This software is based on the principle of “fingerprinting” each piece of text that contains relevant information, such as an article in a scientific journal. The fingerprinting process makes use of the professional terminology of a particular field. For example, the system can fingerprint an article based on the National Library of Medicine Medical Subject Headings (MeSH®) Thesaurus. Collexis then can condense the fingerprints of all of the researcher’s publications into a knowledge profile of that individual.
Once Collexis has completed the fingerprinting/profiling of all sources of input, the system can make associations based on criteria established by the user. Consider this application. A busy Helpdesk receives several hundred e-mails daily that require responses from an expert. KM helps the Helpdesk by building knowledge profiles of all its employees. From then on, routing an incoming email is a matter of matching its fingerprint with the catalog of employee knowledge profiles.
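The fingerprint-and-match idea described above can be illustrated with a small sketch. This is not Collexis's actual algorithm, which is not described here in enough detail to reproduce. It assumes a fingerprint is simply a count of controlled-vocabulary terms (single words only) and that matching uses cosine similarity. All names and sample data are hypothetical.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"


def fingerprint(text, vocabulary):
    """Count occurrences of controlled-vocabulary terms in a text."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return Counter(w for w in words if w in vocabulary)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(documents, vocabulary):
    """Condense the fingerprints of a person's documents into one profile."""
    profile = Counter()
    for doc in documents:
        profile.update(fingerprint(doc, vocabulary))
    return profile


def route(message, profiles, vocabulary):
    """Return the name whose profile best matches the message, or None."""
    fp = fingerprint(message, vocabulary)
    scored = [(cosine(fp, prof), name) for name, prof in profiles.items()]
    if not scored:
        return None
    best_score, best_name = max(scored)
    return best_name if best_score > 0 else None
```

For AITopics, the same matching step could route a new submission to the subject-area editor whose topic pages it most resembles, with the topic tags serving as the vocabulary.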
Ideas for extending the scope of the project
Interview important persons in the history of AI and create a library of videos containing them. (Not on critical path for video archive, but useful in the future.)
Interview women with successful careers in AI who are good examples of smart women succeeding in CS. (Not on critical path for video archive, but useful to give girls and young women some positive illustrations to think about.)
Collect, date, and identify photographs and slides that show people and projects related to AI. (Not on critical path for video archive, but useful in the future. Some links to collections are now up on Resources > Reference Shelf).
Create a thesaurus of synonyms, variant spellings, and related terms in AI. Then adjust the search engine to include the thesaurus to find tags in the wiki from the search terms specified by the user.
Develop a template and a wiki submission form for saving ideas about using materials in the classroom. See AITopics - Resources for Teachers for ideas. Better yet, crawl the web with a program similar to NewsFinder to locate course syllabi, tutorial notes, instructional materials.
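The thesaurus idea in the list above amounts to mapping whatever the user types onto the tags actually used in the wiki. A minimal sketch follows, assuming the thesaurus is a simple mapping from each tag to its synonyms and variant spellings. The tag names and entries shown are invented for illustration.

```python
def build_index(thesaurus):
    """Map every variant (lowercased) to its canonical wiki tag."""
    index = {}
    for tag, variants in thesaurus.items():
        index[tag.lower()] = tag
        for variant in variants:
            index[variant.lower()] = tag
    return index


def expand_query(query, index):
    """Return canonical tags found in a query, preferring the longest phrase."""
    words = query.lower().split()
    tags = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at word i, then shorter ones.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in index:
                if index[phrase] not in tags:
                    tags.append(index[phrase])
                i = j
                break
        else:
            i += 1  # no thesaurus entry starts at this word
    return tags


# Hypothetical thesaurus entries, for illustration only.
thesaurus = {
    "MachineLearning": ["machine learning", "ML", "statistical learning"],
    "Robots": ["robot", "robotics"],
}
index = build_index(thesaurus)
print(expand_query("robotics and machine learning", index))
# ['Robots', 'MachineLearning']
```

The search engine would then look up pages by the returned tags in addition to the literal search terms.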
Ideas for a future intern
All news article information is stored in the database tables urllist (for crawled articles) and cat_corpus (for articles that have been upgraded to "training" status in the AINewsAdmin interface and articles that were crawled from the AINews archive produced by Jon Glick). Category (topic) membership is stored in categories (for crawled articles) and cat_corpus_cats (for corpus articles).
However, information about how NewsFinder processed each story is not stored in the database. Information that is not stored includes the occurrences of whitelisted terms, an article's category distribution (scores returned by the support vector machines), a list of its duplicates, an article's scores (from the star ratings), and reasons the article was ultimately published or not published. All this information is present on an article's "info" page, however (e.g. http://aaai.org/AITopics/AIArticles/2011-2844).
If all that information were stored in the database, "deep analytics" could be applied to the article collection, for example:
of the published articles, which whitelisted terms were most common?
of the published articles, which sources were most common?
of the published articles, which categories were most common?
how closely are sources aligned with categories?
how closely are whitelisted terms aligned with categories?
how has whitelisted term usage changed over time?
which sources contribute the most unpublished articles?
how can we visualize article duplicates? (can we show clusters, marked by topic, on a timeline?)
how can we visualize whitelisted term covariance? (do some terms typically appear in the same articles?)
how have published article scores changed over time? (are we publishing more or less interesting articles over time?)
are there more or fewer article scores over time?
how are article scores distributed by category, by whitelisted terms, and by sources?
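Questions like these become simple queries once the processing details are stored. The sketch below uses Python's sqlite3 module with an in-memory database. The table and column names (`processing_info`, `whitelist_hits`) are assumptions made for illustration and are not the existing NewsFinder schema. It answers the first question in the list above.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical tables for per-article processing details."""
    conn.executescript(
        """
        CREATE TABLE processing_info (
            article_id TEXT PRIMARY KEY,
            source     TEXT,
            published  INTEGER,   -- 1 if published, 0 otherwise
            reason     TEXT       -- why it was or was not published
        );
        CREATE TABLE whitelist_hits (
            article_id  TEXT REFERENCES processing_info(article_id),
            term        TEXT,
            occurrences INTEGER
        );
        """
    )


def top_terms_in_published(conn, limit=10):
    """Whitelisted terms ranked by the number of published articles using them."""
    rows = conn.execute(
        """
        SELECT h.term, COUNT(DISTINCT h.article_id) AS articles
        FROM whitelist_hits AS h
        JOIN processing_info AS p ON p.article_id = h.article_id
        WHERE p.published = 1
        GROUP BY h.term
        ORDER BY articles DESC, h.term ASC
        LIMIT ?
        """,
        (limit,),
    )
    return rows.fetchall()
```

Grouping by `p.source` instead of `h.term`, or adding a date column, would cover several of the other questions in the same way.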
add a Facebook "Like" button to each published article [Done]
add "Tweet this" [Done] and "Google +1" buttons to published articles
push one or a handful of published articles to the AAAI Facebook page or other locations
show "relevant" news articles throughout the week, before the publication deadline, and elicit user feedback (like scores) -- perhaps this can influence which articles are ultimately published
find ways to optimize search engine presence and the various metrics found on Google Analytics and Feedburner Analytics
look into feature selection techniques in order to minimize the number of terms given to the SVMs (to prevent over-fitting and minimize training time)
try including more metadata like an article's title and source into the training process to improve categorization
use a more "semantically-aware" document representation, especially for duplicate detection, since TF-IDF vectors don't seem to capture much "meaning"
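For reference, the TF-IDF baseline mentioned in the last item can be sketched in a few lines. This is a simplified illustration, not NewsFinder's actual code. It assumes whitespace tokenization, a smoothed IDF weight, and a cosine-similarity threshold of 0.8 that is arbitrary.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    tokens = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in documents]
    n = len(tokenized)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: terms found in every document still get weight 1.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def near_duplicates(documents, threshold=0.8):
    """Return index pairs (i, j) whose TF-IDF cosine similarity meets the threshold."""
    vectors = tfidf_vectors(documents)
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

Two headlines that report the same event in different words, such as "Robot wins chess match against champion" and "Computer defeats grandmaster at chess", share only one token and fall well below the threshold. That is the weakness a more semantically aware representation would need to address.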
Last edited by Joshua Eckroth, September 09, 2011
Bugs Fixed & Suggestions Implemented
Create a summary of changes made by collecting PmWiki edits from the Contributions links of everyone from the user profiles and adding their new submissions. Allow editors to select changes to pages within a specified topic (or set of topics) and within a specified time frame. [Done] Jan. 2, 2010 RGS: This is available via the Contributions tag. (To see an example, visit the Profile page for Bruce Buchanan and click on Contributions. Edit the page to see the markup.)
The prototype works modestly well, but it probably needs a complete rethinking. Also needs code to assess the degree of interest of items and then code to add items to AI in the News. [Done] August, 2010 (Liang Dong)
Collect ratings from users based on content and/or form. Show ratings & reviews to other users. See Netflix.com. [Done] August, 2010 (Liang Dong)
Show the date of the article, not the date of submission, before each headline for News items.
Show the date of each article pointed to. (Will follow from asking editors to put articles on their own pages.)
Add to resources for educators <<some already here>> [Done] December, 2010 (need to keep adding to orig. set of 20)
[Done] March 11, 2010. (Colleen McCarthy). More papers needed. May need to scan some (with permissions).
A common syntactic error that causes problems for PmWiki is to include <cr> within multi-line descriptions on the submit page. If each <cr> could easily be replaced by <sp> before the text is sent on to PmWiki, considerable editing time would be saved.
Add aioverview to list of topics in pull-down menu
Create new page for 2009 videos
Simple Form Examples ... testing
Select doesn't work in the current PmWiki version, only in beta.
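The <cr> item in the list above is a small text-cleaning step. A minimal sketch of the replacement follows, written in Python for illustration. The actual submit page would need the equivalent in PmWiki's own PHP, and the function name is hypothetical.

```python
import re


def flatten_description(text):
    """Replace each run of line breaks (and adjacent whitespace) with one space."""
    return re.sub(r"\s*[\r\n]+\s*", " ", text).strip()


print(flatten_description("First line\r\nsecond line\n\nthird"))
# First line second line third
```

Collapsing the whitespace around each line break avoids leaving double spaces where a line ended with a trailing blank.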
