New Technology Extracts Opinions from Text Data
NEC announced today the development of a "sentence characteristic distribution calculation method" that extracts "opinion sentences," which express the feelings or subjective views of a writer, from written text.
The technology is designed to analyze sentences containing individuals' evaluations of corporate brands or products (reputation information) across a wide range of written content, including blogs and questionnaires.
In recent years, with the proliferation of the Internet, users have become able to transmit a wide range of information. This information includes many opinions and comments on news, products, and services, and it has attracted the attention of companies seeking information that can be used effectively in market surveys and in improving products and services. In the past, technologies extracted reputation information from blogs and other forms of user-generated content by specifying evaluation expressions (e.g., "good," "bad," "expensive," "cheap") along with the subject of the evaluation. In some cases, however, these technologies were unable to obtain evaluation information (opinion sentences) when the sentences were very short (e.g., when the subject of the sentence was not included) or complex, with the subject of the evaluation separated from the evaluation expressions. There has thus been a demand for technologies that offer greater coverage of these types of sentences.
To judge whether a sentence should be considered an opinion sentence or a topic-related sentence, NEC's method focuses on the continuity of topics and calculates the subjectivity or topicality of several sentences that appear before or after the target sentence. It relies on the general tendency of sentences to be written with continuity on a given topic. Machine learning technologies are used to determine how many opinion sentences there are in a given group of continuous sentences in a text (a "block" of sentences), in order to extract rules for evaluating the subjectivity or topicality of the block. These rules are then applied to the blocks being evaluated to calculate a score.
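The idea of scoring a sentence by the sentences around it can be illustrated with a small Python sketch. This is not NEC's method: the announcement does not disclose the features, the learned rules, or the scoring formula. The cue lexicon OPINION_CUES, the functions own_score and block_scores, and the window and context_weight parameters are all assumptions made for illustration, and a hand-set weight stands in for the rules that the announcement says are obtained by machine learning.

```python
from typing import List

# Hypothetical cue lexicon for illustration only; the real system's
# features and learned rules are not described in the announcement.
OPINION_CUES = {"good", "bad", "expensive", "cheap"}


def own_score(sentence: str) -> float:
    """Return 1.0 if the sentence contains an evaluation expression, else 0.0."""
    words = {w.strip(".,!?\"'").lower() for w in sentence.split()}
    return 1.0 if words & OPINION_CUES else 0.0


def block_scores(
    sentences: List[str], window: int = 1, context_weight: float = 0.5
) -> List[float]:
    """Blend each sentence's own score with the share of opinion-like
    sentences among its neighbours (up to `window` before and after).

    The fixed `context_weight` is an assumed stand-in for learned rules.
    """
    base = [own_score(s) for s in sentences]
    scores = []
    for i in range(len(sentences)):
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        # Neighbouring sentences in the block, excluding the target itself.
        neighbours = base[lo:i] + base[i + 1:hi]
        context = sum(neighbours) / len(neighbours) if neighbours else 0.0
        scores.append((1 - context_weight) * base[i] + context_weight * context)
    return scores
```

Under these assumptions, a short sentence with no evaluation expression of its own (for example "Lasts all day too." placed between two sentences that contain "good" and "cheap") still receives a non-zero score from its neighbours, while an isolated factual sentence does not.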
For example, this method could identify an individual's views, calculate evaluation scores, or determine the ratio of approval to disapproval regarding a certain event, product, or service, using information from the Internet, such as blogs and electronic bulletin boards, as well as questionnaire data or records of inquiries at call centers. This information could then be used for corporate marketing activities.
NEC plans to apply these technologies in new search services, analysis services for marketing activities, and customer relationship management solutions, and to strengthen research and development activities aimed at further expanding these application fields in the future.