Microsoft
Senior Risk Integration Manager
Amazon Jul 2013 - Mar 2014
Senior Technical Program Manager
PayPal Feb 2010 - Jul 2013
Manager, Brand Risk Management Technology
PayPal Apr 2007 - Apr 2010
Lead Software Engineer
PayPal Sep 2005 - Apr 2007
Research Engineer, Risk Analytics
Education:
Stanford University 1992 - 1998
Doctorates, Doctor of Philosophy, Philosophy
University of Science and Technology of China 1987 - 1992
Bachelor of Applied Science, Bachelors, Computer Science
Skills:
E-Commerce Agile Methodologies Scalability Enterprise Software Development and Architecture Data Mining Fraud Detection and Risk Management Software Development Cloud Computing Linux MySQL Software Engineering Statistical Modeling Perl Risk Management Distributed Systems Management Fraud Detection Credit Cards XML Scrum
Peter J. Dehlinger - Palo Alto CA, US Shao Chin - Santa Cruz CA, US
Assignee:
Word Data Corp. - Palo Alto CA
International Classification:
G06F 17/30 G06F 7/00
US Classification:
707/5, 707/3, 707/4, 704/7, 704/9
Abstract:
Disclosed are a computer-readable code, system and method for classifying a target document in the form of a digitally encoded natural-language text as belonging to one or more of two or more different classes. Each of a plurality of non-generic words and, optionally, word groups characterizing the target document is selected as a descriptive term if the term has an above-threshold selectivity value in at least one library of texts in a field, where the selectivity value of a term is a measure of the field-specificity of that term. There is then determined, for each of a plurality of sample texts having associated classification identifiers, a match score related to the number of descriptive terms present in or derived from that text that match those in the target text. From the selected matched texts, and the associated classification identifiers, a classification determination of the target document is made.
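The matching and classification steps in the abstract above can be sketched in Python. This is a minimal illustration, not the patented method: the function names, the use of a plain shared-term count as the match score, and the majority vote over the top_n best-matching sample texts are all assumptions, since the abstract does not say how the final determination is made.

```python
from collections import Counter

def match_score(target_terms, sample_terms):
    """Count the descriptive terms shared by the target and a sample text."""
    return len(set(target_terms) & set(sample_terms))

def classify(target_terms, samples, top_n=3):
    """Return the class identifier most common among the best matches.

    samples is a list of (terms, class_id) pairs. Majority voting over
    the top_n highest-scoring samples is an assumed decision rule.
    """
    ranked = sorted(
        samples,
        key=lambda sample: match_score(target_terms, sample[0]),
        reverse=True,
    )
    votes = Counter(class_id for _, class_id in ranked[:top_n])
    return votes.most_common(1)[0][0]
```

A call such as classify({"enzyme", "assay"}, samples) would return the class identifier that dominates among the sample texts sharing the most descriptive terms with the target.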
Peter J. Dehlinger - Palo Alto CA, US Shao Chin - Santa Cruz CA, US
Assignee:
Word Data Corp. - Palo Alto CA
International Classification:
G06F 17/30
US Classification:
707/5, 707/6, 704/7, 704/9
Abstract:
Disclosed are a computer-readable code, system and method for classifying a target document in the form of a digitally encoded natural-language text as belonging to one or more of two or more different classes. For each of a plurality of non-generic words and/or word groups characterizing the target document, there is determined a selectivity value calculated as the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively, and the document is represented as a vector of terms, where the coefficient assigned to each term is a function of the selectivity value determined for that term. There is then determined, for each of a plurality of sample texts having associated classification identifiers, a match score related to the number of descriptive terms present in or derived from that text that match those in the target text. From the selected matched texts, and the associated classification identifiers, a classification determination of the target document is made.
Processing Input Text To Generate The Selectivity Value Of A Word Or Word Group In A Library Of Texts In A Field Is Related To The Frequency Of Occurrence Of That Word Or Word Group In Library
Disclosed is an automated system, machine-readable storage medium embodying computer-executable code, and method for generating descriptive words and, optionally, multi-word groups derived from a digitally encoded, natural-language input text that describes a concept, invention, or event in a selected field. The system includes (a) an electronic digital computer, (b) a database of words and, optionally, word groups derived from a plurality of texts, and (c) a machine-readable storage medium embodying computer-executable code for accessing the database. The database provides, or can be used to calculate, a selectivity value for each of the words and, optionally, word groups contained in or derived from the input text. Words and, optionally, word groups having an above-threshold selectivity value are selected as descriptive terms from the input text.
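The threshold selection described in this abstract might look like the following Python sketch. The dictionary lookup standing in for the database, the default threshold of 1.5, and the function name are hypothetical; the abstract specifies only that terms with an above-threshold selectivity value are kept.

```python
def select_descriptive_terms(words, selectivity, threshold=1.5):
    """Keep words whose stored selectivity value exceeds the threshold.

    selectivity is a dict mapping a lowercase word to its selectivity
    value (a stand-in for the database). Words missing from it are
    treated as having a value of 0.0. Results keep first-seen order.
    """
    seen = set()
    selected = []
    for word in words:
        word = word.lower()
        if word in seen:
            continue
        seen.add(word)
        if selectivity.get(word, 0.0) > threshold:
            selected.append(word)
    return selected
```

Word groups could be handled the same way by passing multi-word strings as keys, assuming the database stores them alongside single words.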
Code, System And Method For Representing A Natural-Language Text In A Form Suitable For Text Manipulation
Peter J. Dehlinger - Palo Alto CA, US Shao Chin - Santa Cruz CA, US
Assignee:
Word Data Corp. - Palo Alto CA
International Classification:
G06F 7/00 G06F 17/20 G06F 17/21 G06F 17/27
US Classification:
704/10, 704/7, 704/9, 707/3, 707/4, 707/5, 707/6
Abstract:
A computer method, system and code, for representing a natural-language document in a vector form suitable for text manipulation operations are disclosed. The method involves determining (a) for each of a plurality of terms selected from one of (i) non-generic words in the document, (ii) proximately arranged word groups in the document, and (iii) a combination of (i) and (ii), a selectivity value of the term related to the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively. The document is represented as a vector of terms, where the coefficient assigned to each term includes a function of the selectivity value determined for that term.
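As a rough illustration of the vector representation described above, the Python sketch below computes a selectivity value as a ratio of relative frequencies in two libraries. Several choices here are assumptions not stated in the abstract: each library is a dict with a "size" (number of texts) and "counts" (term to number of texts containing it), only one comparison library is used, a floor of one occurrence avoids division by zero, and the coefficient is the selectivity value itself.

```python
def selectivity_value(term, field_lib, other_lib):
    """Relative frequency of term in field_lib divided by that in other_lib.

    The count in other_lib is floored at 1 (an assumed smoothing choice)
    so that terms absent from it do not cause a division by zero.
    """
    freq_field = field_lib["counts"].get(term, 0) / field_lib["size"]
    freq_other = max(other_lib["counts"].get(term, 0), 1) / other_lib["size"]
    return freq_field / freq_other

def term_vector(terms, field_lib, other_lib):
    """Represent a document as a dict of term -> coefficient.

    The coefficient here is simply the selectivity value; the abstract
    only requires that it include some function of that value.
    """
    return {t: selectivity_value(t, field_lib, other_lib) for t in set(terms)}
```

Under these assumptions a term that is equally common in both libraries gets a coefficient near 1, while a field-specific term gets a much larger one.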
Peter Dehlinger - Palo Alto CA, US Shao Chin - Stanford CA, US
International Classification:
G06F017/27
US Classification:
704/010000
Abstract:
Disclosed are a computer-readable code, system and method for comparing a target concept, invention, or event with each of a plurality of texts. Each of a plurality of non-generic words and, optionally, word groups characterizing the target concept, invention, or event is selected as a descriptive term if the term has an above-threshold selectivity value in at least one library of texts in a field, where the selectivity value of a term is a measure of the field-specificity of that term. There is then determined, for each of the plurality of texts, a match score related to the number of descriptive terms present in or derived from that text that match those in the target concept, invention, or event. Texts having the highest match scores are selected.
Peter Dehlinger - Palo Alto CA, US Shao Chin - Stanford CA, US
International Classification:
G06F007/00
US Classification:
707/001000
Abstract:
Disclosed is a computer-accessible database composed of a list of non-generic words contained in a plurality of digitally encoded texts. Associated with each term is a selectivity value or values that are related to the frequency of occurrence of that word in at least one library of texts in a field, relative to the frequency of occurrence of the same word in one or more libraries of texts in one or more other fields, respectively. Also associated with each term are one or more text identifiers identifying one or more of the digitally processed texts containing that word. Each text identifier may be further associated with sentence and word-number identifiers that identify the sentence and word number(s) of a given database word.
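One possible in-memory layout for such a database is sketched below in Python. The class names, the whitespace tokenizer, the punctuation stripping, and the generic_words set are hypothetical; the sketch records only text, sentence, and word-number identifiers and leaves the per-field selectivity values to be filled in separately.

```python
from dataclasses import dataclass, field

@dataclass
class Occurrence:
    text_id: str       # identifier of the text containing the word
    sentence: int      # 1-based sentence number within the text
    word_number: int   # 1-based word position within the sentence

@dataclass
class WordEntry:
    selectivity: dict = field(default_factory=dict)   # field name -> value
    occurrences: list = field(default_factory=list)   # list of Occurrence

def build_index(texts, generic_words):
    """Index non-generic words by text, sentence, and word number.

    texts maps a text identifier to a list of sentence strings.
    """
    index = {}
    for text_id, sentences in texts.items():
        for s_no, sentence in enumerate(sentences, start=1):
            for w_no, raw in enumerate(sentence.lower().split(), start=1):
                word = raw.strip(".,;:()")
                if not word or word in generic_words:
                    continue
                entry = index.setdefault(word, WordEntry())
                entry.occurrences.append(Occurrence(text_id, s_no, w_no))
    return index
```

Storing word numbers per sentence makes it possible to test whether two indexed words appear close together, which is one way proximately arranged word groups could be recovered from such an index.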
Peter Dehlinger - Palo Alto CA, US Shao Chin - Santa Cruz CA, US
International Classification:
G06F017/28
US Classification:
704/005000
Abstract:
Disclosed are a computer-readable code, system and method for comparing a target concept, invention, or event with each of a plurality of texts. Each of a plurality of non-generic words and, optionally, word groups characterizing the target concept, invention, or event is selected as a vector term if the term has an above-threshold selectivity value in at least one library of texts in a field, where the selectivity value of a term is a measure of the field-specificity of that term. There is then determined, for each of the plurality of texts, a match score related to the number of vector terms present in or derived from that text that match those in the target concept, invention, or event. Texts having the highest match scores are selected.
Peter Dehlinger - Palo Alto CA, US Shao Chin - Santa Cruz CA, US
Assignee:
Word Data Corp. - Palo Alto CA
International Classification:
G06F017/27
US Classification:
704/009000
Abstract:
A computer method for representing a natural-language document in a vector form suitable for text manipulation operations is disclosed. The method involves determining (a) for each of a plurality of terms composed of non-generic words and, optionally, proximately arranged word groups in the document, a selectivity value of the term related to the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively. The document is represented as a vector of terms, where the coefficient assigned to each term includes a function of the selectivity value determined for that term, and optionally related to the inverse document frequency of that word in one or more libraries of texts. Also disclosed are a computer-readable code for carrying out the method, a computer system that employs the code, and a vector produced by the method.
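The optional inverse-document-frequency weighting mentioned above could be combined with the selectivity value as in this short Python sketch. The product form and the natural-log IDF are assumptions; the abstract says only that the coefficient includes a function of the selectivity value and may also be related to the inverse document frequency.

```python
import math

def coefficient(selectivity, doc_freq, n_docs):
    """Combine a selectivity value with an inverse document frequency.

    doc_freq is the number of texts in the library containing the word
    and n_docs is the library size. The product of the two factors is
    an assumed combination, not one given in the source.
    """
    if doc_freq <= 0 or n_docs <= 0:
        raise ValueError("doc_freq and n_docs must be positive")
    return selectivity * math.log(n_docs / doc_freq)
```

With this choice a word that appears in every text of the library gets a coefficient of zero regardless of its selectivity value, so rare and field-specific words dominate the vector.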