CSPC-712 Natural Language Processing

| Teaching Scheme: L | Teaching Scheme: T | Teaching Scheme: P | Credit | Marks Distribution: Internal Assessment | Marks Distribution: End Semester Examination | Marks Distribution: Total | Duration of End Semester Examination |
|---|---|---|---|---|---|---|---|
| 3 | 1 | 0 | 4 | Maximum Marks: 40 | Maximum Marks: 60 | 100 | 3 Hours |
| | | | | Minimum Marks: 16 | Minimum Marks: 24 | 40 | |
Unit-I
Introduction to Natural Language Processing: What is Natural Language Processing? Why is NLP hard? Programming languages vs. natural languages, Are natural languages regular? Finite automata for NLP, Stages of NLP, Challenges and Issues (Open Problems) in NLP.
Basics of text processing: Tokenization, Stemming, Lemmatization, Part of Speech Tagging.
Unit-II
Language Syntax and Semantics: Morphological Analysis: What is Morphology? Types of Morphemes, Inflectional morphology & Derivational morphology, Morphological parsing with Finite State Transducers (FST).
Syntactic Analysis: Syntactic Representations of Natural Language, Parsing Algorithms, Probabilistic context-free grammars, and Statistical parsing.
Semantic Analysis: Lexical Semantics, Relations among lexemes and their senses (Homonymy, Polysemy, Synonymy, Hyponymy), WordNet, Word Sense Disambiguation (WSD), Dictionary-based approach, Latent Semantic Analysis.
Unit-III
Language Modelling: Probabilistic language modelling, Markov models, Generative models of language, Log-Linear Models, Graph-based Models.
N-gram models: Simple n-gram models, Parameter estimation and smoothing, Evaluating language models.
Word Embeddings / Vector Semantics: Bag-of-words, TF-IDF, word2vec, doc2vec, Contextualized representations (BERT).
Topic Modelling: Latent Dirichlet Allocation (LDA), Latent Semantic Analysis, Non-negative Matrix Factorization.
Information Retrieval using NLP: Introduction to Information Retrieval, Vector Space Model, Named Entity Recognition (NER System Building Process, Evaluating NER System), Entity Extraction, Relation Extraction, Reference Resolution, Coreference Resolution, Cross-Lingual Information Retrieval.
Unit-IV
NLP Tools and Techniques: Prominent NLP Libraries: Natural Language Toolkit (NLTK), spaCy, TextBlob, Gensim, etc.
Linguistic Resources: Lexical Knowledge Networks, WordNets, Indian Language WordNet (IndoWordNet), VerbNets, PropBank, Treebanks, Universal Dependencies Treebanks.
Word Sense Disambiguation: Lesk Algorithm, WordNets for Word Sense Disambiguation.
Applications of NLP: Machine Translation (Rule-based techniques, Statistical Machine Translation (SMT), Cross-Lingual Translation), Sentiment Analysis, Question Answering, Textual Entailment, Discourse Processing, Dialog and Conversational Agents, Natural Language Generation.